Authors
Gelbukh Alexander
Title An Open-Source Lemmatizer for Russian Language based on Tree Regression Models
Type Journal
Subtype Undefined
Description Research in Computing Science
Abstract In this article, we consider the problem of supervised morphological analysis using an approach that differs from the analogs widely used in industry. The article describes a new lemmatization method based on machine learning algorithms, in particular regression analysis algorithms, trained on an open grammatical dictionary of the Russian language. We compare the results with existing alternative applications currently used for lemmatization in NLP tasks for the Russian language. The proposed method shows potential for further development: it achieves comparable quality while using a relatively simple machine learning algorithm, and it is not rule-based, so it involves no manual work. The source code for our lemmatizer is publicly available.
Notes
Place Ciudad de México
Country Mexico
Pages 147-153
Vol. / Ch. v. 149 no. 3
Start 2020-03-03
End
ISBN/ISSN