A study on the impact of pre-processing techniques in Spanish and English text classification over short and large text documents

Gerardo Orellana, Belen Arias, Marcos Orellana, Victor Saquicela, Fernando Baculima, Nelson Piedra

Research output: Chapter in book/report/conference proceeding; Conference contribution; peer-reviewed

19 Citations (Scopus)

Abstract

Text mining is a long-studied field, and the vast amount of text resources available has led scientists to explore several domains through many different techniques. Among the main processes in text mining are cleaning, reduction, and transformation, which are applied before a classification algorithm. Preprocessing has a large impact on classification algorithms because text is an unstructured form of data with a very large number of dimensions, and it can therefore be represented as a very sparse matrix. These characteristics, which make text so complex, are addressed by preprocessing algorithms that extract the main data features. We present a comparison of the performance of different preprocessing algorithms on a classification problem over two datasets, written in Spanish and English.
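The point about dimensionality can be illustrated with a minimal sketch. It is not the pipeline evaluated in the paper: the toy corpus, the stop-word list, and the helper names (`tokenize`, `bag_of_words`, `sparsity`) are all assumptions made for illustration, and only cleaning and stop-word removal are shown (no stemming or other reduction steps).

```python
import re
from collections import Counter

# Hypothetical toy corpus and stop-word list, chosen only for illustration.
DOCS = [
    "The cats are running in the garden!",
    "A cat runs in a garden.",
    "Los gatos corren en el jardín.",
]
STOP_WORDS = {"the", "a", "are", "in", "los", "en", "el"}


def tokenize(text, clean=False):
    """Split on whitespace; if clean, lowercase, strip punctuation, drop stop words."""
    if not clean:
        return text.split()
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def bag_of_words(docs, clean=False):
    """Return the sorted vocabulary and a document-term count matrix (list of rows)."""
    tokenized = [tokenize(d, clean) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        rows.append([counts[term] for term in vocab])
    return vocab, rows


def sparsity(rows):
    """Fraction of zero entries in the document-term matrix."""
    cells = [v for row in rows for v in row]
    return sum(1 for v in cells if v == 0) / len(cells)


raw_vocab, raw_rows = bag_of_words(DOCS)
clean_vocab, clean_rows = bag_of_words(DOCS, clean=True)

# Each vocabulary term is one dimension of the feature space.
print("raw:  ", len(raw_vocab), "terms, sparsity", round(sparsity(raw_rows), 2))
print("clean:", len(clean_vocab), "terms, sparsity", round(sparsity(clean_rows), 2))
```

In this toy case the cleaned vocabulary is smaller than the raw one because case variants, trailing punctuation, and stop words no longer create separate columns. Whether such a reduction helps a classifier depends on the dataset and algorithm, which is the question the paper investigates.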

Original language: English
Title of host publication: Proceedings - 3rd International Conference on Information Systems and Computer Science, INCISCOS 2018
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 277-283
Number of pages: 7
ISBN (electronic): 9781538676127
DOI
State: Published - 5 Dec 2018
Externally published
Event: 3rd International Conference on Information Systems and Computer Science, INCISCOS 2018 - Quito, Ecuador
Duration: 14 Nov 2018 to 16 Nov 2018

Publication series

Name: Proceedings - 3rd International Conference on Information Systems and Computer Science, INCISCOS 2018
Volume: 2018-December

Conference

Conference: 3rd International Conference on Information Systems and Computer Science, INCISCOS 2018
Country/Territory: Ecuador
City: Quito
Period: 14/11/18 to 16/11/18
