Authors
Ojo Olumide Ebenezer
Adebanji Olaronke Oluwayemisi
Calvo Castro Francisco Hiram
Gelbukh Alexander
Sidorov Grigori
Title Hate and Offensive Content Identification in Indo-Aryan Languages using Transformer-based Models
Type Conference
Subtype Proceedings paper
Description 15th Forum for Information Retrieval Evaluation, FIRE 2023
Abstract Open exchange of hate speech, insults, derogatory remarks, and obscenities on social media platforms can undermine objective discourse and facilitate radicalization by spreading propaganda and exposing people to danger. People targeted by such offensive and hateful content often experience physiological effects as a result. In this work, we present the models we submitted to HASOC 2023 for detecting hate speech and offensive content in two Indo-Aryan languages, Gujarati and Sinhala. Although both are considered low-resource languages, our models performed well at detecting hate speech after being fine-tuned on language-specific hate speech datasets. We fine-tuned two transformer models, DistilBERT and mBERT, and show that they are effective at detecting hate speech in Indo-Aryan texts. mBERT achieved a macro F1-score of 0.6 on Sinhala text classification and a higher score of 0.8 on Gujarati text classification. © 2023 Copyright for this paper by its authors.
Notes CEUR Workshop Proceedings, v. 3681
Location Goa
Country India
Pages 383-392
Vol. / Chap.
Start 2023-12-15
End 2023-12-18
ISBN/ISSN