Authors
Balouchzahi Fazlourrahman
Sidorov Grigori
Gelbukh Alexander
Title A comparative study of syllables and character level N-grams for Dravidian multi-script and code-mixed offensive language identification
Type Book
Subtype JCR
Description Journal of Intelligent and Fuzzy Systems
Abstract Curfews and lockdowns around the world during the Covid-19 era have drastically increased internet usage and, accordingly, the amount of data shared on social media. While social media is used to share useful information, some miscreants exploit its reach to spread hate speech and offensive content. Filtering offensive content manually is laborious because of the huge volume of data. Further, rapid developments in hardware and software technology have allowed users to post comments not only in English but also in their native language scripts. However, because Roman script is easy to use, social media users, particularly in multilingual countries like India, prefer to comment in code-mixed and multi-script texts. Typical systems built to process and analyze monolingual texts are usually not appropriate for these kinds of texts. Further, because these texts do not adhere to the rules of any single language in forming words and sentences, analyzing them is more complex. The novelty of the present study lies in addressing the Offensive Language Identification (OLI) task in code-mixed and multi-script texts: this paper proposes using relevant syllable and character n-gram features to train Machine Learning (ML) classifiers. The performance of the proposed models is evaluated on three Dravidian language pairs: Malayalam-English, Tamil-English, and Kannada-English. The performance of the ML classifiers demonstrates the effectiveness of syllable and character n-gram features for analyzing code-mixed and multi-script texts.
Notes DOI 10.3233/JIFS-212872
Place Amsterdam
Country Netherlands
Pages 6995-7005
Vol. / Ch. v. 43 no. 6
Start 2022-11-11
End
ISBN/ISSN