Authors
Meque Abdul Gafar Manuel
Balouchzahi Fazlourrahman
Sidorov Grigori
Gelbukh Alexander
Title Mexican Spanish Paraphrase Identification using Data Augmentation
Type Conference
Subtype Proceedings
Description 2022 Iberian Languages Evaluation Forum, IberLEF 2022
Abstract Rewording a passage with synonyms and different words, without changing the main message of the original sentence, is called paraphrasing. Paraphrasing serves purposes such as simplification, clarification, and quotation. In this paper, we present a Paraphrase Identification model for Mexican Spanish text pairs. A data augmentation step was performed using the Google Translate API, and three similarity measures (Jaccard, cosine, and spaCy similarity) were then used to build a similarity vector for each text pair. The paraphrase identification task was modeled as binary classification of text pairs into two classes: Paraphrases and Not-Paraphrases. The proposed methodology, which uses a voting classifier over three machine learning classifiers, obtained an F1-score of 0.8754 for the Paraphrases category. © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
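As a minimal sketch of how a per-pair similarity vector and a voting step might look, the Python code below computes Jaccard and cosine similarity over simple lowercase word tokens and combines three classifier predictions by hard majority vote. The tokenizer, the bag-of-words cosine, the hard-voting rule, and all function names are assumptions for illustration, not the authors' implementation; the spaCy similarity feature and the Google Translate augmentation step are omitted.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumed preprocessing: lowercase, then keep runs of word characters.
    # In Python 3, \w on str patterns also matches accented letters (á, é, ñ).
    return re.findall(r"\w+", text.lower())


def jaccard_similarity(a: str, b: str) -> float:
    # |A ∩ B| / |A ∪ B| over the sets of distinct tokens.
    sa, sb = set(tokenize(a)), set(tokenize(b))
    if not sa and not sb:
        return 1.0  # two empty texts are treated as identical
    return len(sa & sb) / len(sa | sb)


def cosine_similarity(a: str, b: str) -> float:
    # Cosine of the angle between raw term-frequency vectors.
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def similarity_vector(a: str, b: str) -> list[float]:
    # Hypothetical feature vector for one text pair (spaCy similarity omitted).
    return [jaccard_similarity(a, b), cosine_similarity(a, b)]


def majority_vote(predictions: list[int]) -> int:
    # Hard voting: return the label predicted by most classifiers
    # (1 = Paraphrases, 0 = Not-Paraphrases; an assumed encoding).
    return Counter(predictions).most_common(1)[0][0]


if __name__ == "__main__":
    pair = ("el gato come pescado", "el gato come carne")
    print(similarity_vector(*pair))  # [0.6, 0.75]
    print(majority_vote([1, 0, 1]))  # 1
```

For the example pair, three of five distinct tokens are shared (Jaccard 0.6), and the term-frequency dot product of 3 divided by the product of norms 2 × 2 gives a cosine of 0.75. With three classifiers and two classes, hard voting can never tie, which is one reason an odd number of voters is convenient.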
Notes CEUR Workshop Proceedings, vol. 3202
Place Coruña
Country Spain
No. of pages
Vol. / Chap.
Start 2022-09-20
End
ISBN/ISSN