Authors
Gelbukh Alexander
Tamayo Herrera Antonio Jesús
Title Using Transformers on Noisy vs. Clean Data for Paraphrase Identification in Mexican Spanish
Type Conference
Sub-type Proceedings paper
Description 2022 Iberian Languages Evaluation Forum, IberLEF 2022
Abstract Paraphrase identification is relevant for plagiarism detection, question answering, and machine translation, among other applications. In this work, we report a transfer learning approach using transformers to tackle paraphrase identification on noisy vs. clean data in Spanish as our contribution to the PAR-MEX 2022 shared task. We carried out fine-tuning as well as hyperparameter tuning on BERTIN, a model pre-trained on the Spanish portion of a massive multilingual web corpus. We achieved the best performance in the competition (F1 = 0.94) by fine-tuning BERTIN on noisy data and using it to identify paraphrases in clean data. © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
Notes CEUR Workshop Proceedings, v. 3202
Place A Coruña
Country Spain
No. of pages
Vol. / Chap.
Start 2022-09-20
End
ISBN/ISSN
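
The abstract describes fine-tuning BERTIN for sentence-pair paraphrase classification. The sketch below illustrates that general setup with Hugging Face Transformers; it is not the authors' code, and the checkpoint name, file names, column names, and hyperparameter values are assumptions for illustration only.

```python
# Minimal sketch of fine-tuning a Spanish RoBERTa-style checkpoint (BERTIN)
# for binary paraphrase identification on sentence pairs.
# Checkpoint, file paths, and hyperparameters are illustrative assumptions.
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

MODEL = "bertinproject/bertin-roberta-base-spanish"  # assumed BERTIN checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=2)

# Hypothetical CSV files with columns: sentence1, sentence2, label (0/1),
# mirroring the "train on noisy, evaluate on clean" setting of the abstract.
data = load_dataset("csv", data_files={"train": "parmex_train_noisy.csv",
                                       "validation": "parmex_dev_clean.csv"})

def encode(batch):
    # Encode each sentence pair jointly so the model attends to both sentences.
    return tokenizer(batch["sentence1"], batch["sentence2"],
                     truncation=True, max_length=128)

data = data.map(encode, batched=True)

args = TrainingArguments(
    output_dir="bertin-parmex",
    learning_rate=2e-5,              # illustrative values, not the tuned ones
    per_device_train_batch_size=16,
    num_train_epochs=3,
    evaluation_strategy="epoch",
)

trainer = Trainer(model=model, args=args,
                  train_dataset=data["train"],
                  eval_dataset=data["validation"],
                  tokenizer=tokenizer)   # enables dynamic padding per batch
trainer.train()
```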