SABER

Authors
Amjad - Maaz
Sidorov Grigori

Title	Data augmentation using machine translation for fake news detection in the Urdu language
Type	Conference
Sub-type	Proceedings
Description	12th International Conference on Language Resources and Evaluation, LREC 2020
Abstract	The task of fake news detection is to distinguish legitimate news articles that describe real facts from those that convey deceptive and fictitious information. As the fake news phenomenon is present across all languages, it is crucial to be able to solve this problem efficiently for languages other than English. A common approach to this task is supervised classification using features of varying complexity. Yet supervised machine learning requires a substantial amount of annotated data. For English and a small number of other languages, annotated data is far more available, whereas for the vast majority of languages it is scarce. We investigate whether machine translation in its present state could be successfully used as an automated technique for annotated corpora creation and augmentation for fake news detection, focusing on the English-Urdu language pair. We train a fake news classifier for Urdu on (1) the manually annotated dataset originally in Urdu and (2) the machine-translated version of an existing annotated fake news dataset originally in English. We show that, at the present state of machine translation quality for the English-Urdu language pair, fully automated data augmentation through machine translation did not improve fake news detection in Urdu. © European Language Resources Association (ELRA), licensed under CC-BY-NC
Remarks
Place	Marseille
Country	France
Pages	2537-2542
Vol. / Ch.
Start	2020-05-11
End	2020-05-16
ISBN/ISSN	9791095546344