TY - GEN
T1 - Automatic speech-to-text transcription in an Ecuadorian radio broadcast context
AU - Sigcha, Erik
AU - Medina, José
AU - Vega, Francisco
AU - Saquicela, Víctor
AU - Espinoza, Mauricio
N1 - Publisher Copyright: © Springer International Publishing AG 2017.
PY - 2017
Y1 - 2017
AB - A key element in enabling the analysis of and access to radio broadcast content is the development of automatic speech-to-text systems. Building these systems has become possible thanks to the current availability of speech resources, models, and open source services designed mainly for the English language. Most of these tools have been migrated to other languages, such as Spanish, to avoid creating such systems from scratch. Despite existing efforts, there is no clear evidence of which tools can be used to convert audio to text in other dialects of Spanish. In addition, most of these systems are trained for a specific context; therefore, audio transcription systems tailored to a language and a specific context are needed. This article describes the implementation of an architecture for automatic speech-to-text transcription applied to Ecuadorian radio broadcasters, using freely available tools for audio segmentation and transcription. The selected tools were evaluated by measuring their performance and how easily they could be adapted to the defined architecture. Finally, a Web application was developed and its performance was compared with the IBM Watson Speech to Text service; the results show that the proposed system improves accuracy and achieves a Word Error Rate of around 10%. These results suggest that a set of free tools can be used to train models for specific speech-to-text transcription scenarios.
KW - Audio content analysis
KW - Automatic audio segmentation
KW - Automatic speech recognition
KW - Python
KW - Speech to text
UR - https://www.scopus.com/pages/publications/85028800153
U2 - 10.1007/978-3-319-66562-7_49
DO - 10.1007/978-3-319-66562-7_49
M3 - Conference contribution
AN - SCOPUS:85028800153
SN - 9783319665610
T3 - Communications in Computer and Information Science
SP - 695
EP - 709
BT - Advances in Computing - 12th Colombian Conference, CCC 2017, Proceedings
A2 - Solano, Andres
A2 - Ordonez, Hugo
PB - Springer Verlag
T2 - 12th Colombian Conference on Computing, CCC 2017
Y2 - 19 September 2017 through 22 September 2017
ER -