A Comparative Evaluation of Preprocessing Techniques for Short Texts in Spanish

Marcos Orellana, Andrea Trujillo, Priscila Cedillo

Research output: Chapter in book/report/conference proceeding; Conference contribution; Peer-reviewed

2 Citations (Scopus)

Abstract

Natural Language Processing (NLP) is used to identify key information, generate predictive models, and explain global events or trends. It also supports the process of knowledge creation. It is therefore important to apply refinement techniques at major stages such as preprocessing, where data are often produced and processed with poor results. This document analyzes and measures the impact of combinations of preprocessing techniques and libraries on short texts written in Spanish. The techniques were applied to tweets for sentiment analysis, taking into account the evaluation parameters, the processing time, and the characteristics of the techniques in each library. The experiments give readers insight for choosing an appropriate combination of techniques during preprocessing. The results show improvements of 5% to 9% in classification performance.

Original language: English
Title of host publication: Advances in Information and Communication - Proceedings of the 2020 Future of Information and Communication Conference FICC
Editors: Kohei Arai, Supriya Kapoor, Rahul Bhatia
Publisher: Springer
Pages: 111-124
Number of pages: 14
ISBN (print): 9783030394417
DOI
Status: Published - 2020
Event: Future of Information and Communication Conference, FICC 2020 - San Francisco, United States
Duration: 5 Mar 2020 to 6 Mar 2020

Publication series

Name: Advances in Intelligent Systems and Computing
Volume: 1130 AISC
ISSN (print): 2194-5357
ISSN (electronic): 2194-5365

Conference

Conference: Future of Information and Communication Conference, FICC 2020
Country/Territory: United States
City: San Francisco
Period: 5/03/20 to 6/03/20
