Title |
Using Transformers on Noisy vs. Clean Data for Paraphrase Identification in Mexican Spanish |
Type |
Conference |
Sub-type |
Proceedings |
Description |
2022 Iberian Languages Evaluation Forum, IberLEF 2022 |
Abstract |
Paraphrase identification is relevant for plagiarism detection, question answering, and machine translation, among other applications. In this work, we report a transfer learning approach using transformers to tackle paraphrase identification on noisy vs. clean data in Spanish as our contribution to the PAR-MEX 2022 shared task. We carried out fine-tuning as well as hyperparameter tuning on BERTIN, a model pre-trained on the Spanish portion of a massive multilingual web corpus. We achieved the best performance in the competition (F1 = 0.94) by fine-tuning BERTIN on noisy data and using it to identify paraphrases in clean data. © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). |
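Illustrative sketch (editor's addition, not from the paper): the abstract describes fine-tuning BERTIN for sentence-pair paraphrase classification, which in the Hugging Face transformers library typically looks like the snippet below. The checkpoint name, the two-label setup, and the example sentences are assumptions for demonstration; the paper's actual PAR-MEX 2022 data, splits, and hyperparameters are not reproduced here.

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Assumed BERTIN checkpoint from the Hugging Face Hub (not confirmed by the paper).
MODEL_NAME = "bertin-project/bertin-roberta-base-spanish"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
# The classification head added here is randomly initialized; it only becomes
# useful after fine-tuning on paraphrase pairs (e.g., a PAR-MEX training split).
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)
model.eval()

# Hypothetical sentence pair, not taken from the shared task data.
sentence_a = "El gato duerme en el sofá."
sentence_b = "El felino está dormido sobre el sillón."

# Encode the pair as a single sequence and score it with the classifier head.
inputs = tokenizer(sentence_a, sentence_b, truncation=True, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
probs = torch.softmax(logits, dim=-1).squeeze()
print(f"P(not paraphrase) = {probs[0]:.3f}, P(paraphrase) = {probs[1]:.3f}")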
Notes |
CEUR Workshop Proceedings, v. 3202 |
Location |
A Coruña |
Country |
Spain |
No. of pages |
|
Vol. / Chap. |
|
Start date |
2022-09-20 |
End date |
|
ISBN/ISSN |
|