Towards better subtitles: A multilingual approach for punctuation restoration of speech transcripts

Guerreiro, N. M.; Rei, R.; Batista, F.

doi:10.1016/j.eswa.2021.115740

Use this identifier to cite this record: http://hdl.handle.net/10071/23064

Authors:	Guerreiro, N. M.; Rei, R.; Batista, F.
Date:	2021
Title:	Towards better subtitles: A multilingual approach for punctuation restoration of speech transcripts
Volume:	186
ISSN:	0957-4174
DOI (Digital Object Identifier):	10.1016/j.eswa.2021.115740
Keywords:	Punctuation marks; Intelligent subtitles; Pre-trained embeddings; Speech transcripts; Sentence boundaries; Multilingual embeddings
Abstract:	This paper proposes a flexible approach for punctuation prediction that can be used to produce state-of-the-art results in a multilingual scenario. We have performed experiments using transcripts of TED Talks from the IWSLT 2017 and IWSLT 2011 evaluation campaigns. Our experiments show that the recognition errors of the ASR output degrade the performance of our models, in line with related literature. Our monolingual models perform consistently in Human-edited transcripts of German, Dutch, Portuguese and Romanian, suggesting that commas may be more difficult to predict than periods, using pre-trained contextual models. We have trained a single multilingual model that predicts punctuation in multiple languages that achieves results comparable with the ones achieved by monolingual models, revealing evidence of the potential of using a single multilingual model to solve the task for multiple languages. Then, we argue that usage of current punctuation systems in the literature are implicitly dependent on correct segmentation of ASR outputs for they rely on positional information to solve the punctuation task. This is too big of a requirement for use in a real life application. Through several experiments, we show that our method to train and test models is more robust to different segmentation. These contributions are of particular importance in our multilingual pipeline, since they avoid training a different model for each of the involved languages, and they guarantee that the model will be more robust to incorrect segmentation of the ASR outputs in comparison with other methods in the literature. To the best of our knowledge, we report the first experiments using a single multilingual model for punctuation restoration in multiple languages.
Peer reviewed:	yes
Access:	Open Access
Appears in collections:	CTI-RI - Articles in international peer-reviewed scientific journals

Files in this record:

File	Description	Size	Format
article_82888.pdf	Accepted version	439.95 kB	Adobe PDF
