Authors
Tonja Atnafu Lambebo
Belay Tadesse Destaw
Yigezu Mesay Gemeda
Kolesnikova Olga
Title EthioLLM: Multilingual Large Language Models for Ethiopian Languages with Task Evaluation
Type Conference
Sub-type Proceedings
Description 30th International Conference on Computational Linguistics and 14th International Conference on Language Resources and Evaluation, LREC-COLING 2024
Abstract Large language models (LLMs) have gained popularity recently due to their outstanding performance in various downstream Natural Language Processing (NLP) tasks. However, low-resource languages still lag behind current state-of-the-art (SOTA) developments in NLP due to insufficient resources for training LLMs. Ethiopian languages exhibit remarkable linguistic diversity, encompassing a wide array of scripts, and are imbued with profound religious and cultural significance. This paper introduces EthioLLM - multilingual large language models for five Ethiopian languages (Amharic, Ge'ez, Afan Oromo, Somali, and Tigrinya) and English - and Ethiobenchmark, a new benchmark dataset for various downstream NLP tasks. We evaluate the performance of these models across five downstream NLP tasks. We open-source our multilingual language models, new benchmark datasets for various downstream tasks, and task-specific fine-tuned language models, and discuss the models' performance. Our dataset and models are available at the EthioNLP HuggingFace repository. © 2024 ELRA Language Resource Association: CC BY-NC 4.0.
Notes
Location Hybrid, Torino
Country Italy
Pages 6341-6352
Vol. / Chap.
Start date 2024-05-20
End date 2024-05-25
ISBN/ISSN 9782493814104