Authors
Calvo Castro Francisco Hiram
Hernández Castañeda Angel
Title Author identification using latent Dirichlet allocation
Type Conference
Subtype Proceedings
Description 18th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2017
Abstract We tackle the PAN 2015 author identification task with a Latent Dirichlet Allocation (LDA) model. This method takes the vocabulary and the context of words into account at the same time and, through a statistical process, estimates to what extent the relations between words are present in each document. Processing a set of documents with LDA returns a set of topic distributions; each distribution can be seen as a feature vector and a fingerprint of its document within the collection. We then applied a Naïve Bayes classifier to the resulting patterns, with varying performance. We obtained state-of-the-art performance for English, surpassing the best FS score reported in PAN 2015, and mixed results for the other languages. © Springer Nature Switzerland AG 2018.
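The pipeline the abstract outlines (documents, then LDA topic distributions, then a Naïve Bayes classifier over those vectors) can be illustrated with a minimal sketch. This is not the authors' implementation: the collapsed Gibbs sampler, the Gaussian form of Naïve Bayes, the toy corpus, the author labels, and all hyperparameters (`n_topics`, `alpha`, `beta`, `n_iter`, `var_floor`) are assumptions chosen only to keep the example self-contained.

```python
import math
import random
from collections import defaultdict


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns one topic distribution per document."""
    rng = random.Random(seed)
    vocab = {w: i for i, w in enumerate(sorted({w for d in docs for w in d}))}
    n_words = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]               # tokens of doc d in topic k
    topic_word = [[0] * n_words for _ in range(n_topics)]    # tokens of word v in topic k
    topic_total = [0] * n_topics                             # tokens in topic k
    assign = []                                              # topic of every token
    for d, doc in enumerate(docs):
        row = []
        for w in doc:
            k = rng.randrange(n_topics)                      # random initial topic
            row.append(k)
            doc_topic[d][k] += 1
            topic_word[k][vocab[w]] += 1
            topic_total[k] += 1
        assign.append(row)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k, v = assign[d][i], vocab[w]
                # Remove the token, then resample its topic from the conditional.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta) / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    return [
        [(doc_topic[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in range(n_topics)]
        for d, doc in enumerate(docs)
    ]


def fit_gaussian_nb(X, y, var_floor=1e-3):
    """Store the log prior and per-feature mean and variance for each class."""
    by_class = defaultdict(list)
    for x, label in zip(X, y):
        by_class[label].append(x)
    model = {}
    for label, rows in by_class.items():
        columns = list(zip(*rows))
        means = [sum(col) / len(rows) for col in columns]
        variances = [
            sum((value - m) ** 2 for value in col) / len(rows) + var_floor
            for col, m in zip(columns, means)
        ]
        model[label] = (math.log(len(rows) / len(X)), means, variances)
    return model


def predict_nb(model, x):
    """Return the class with the highest Gaussian Naive Bayes log posterior."""
    def log_posterior(label):
        prior, means, variances = model[label]
        return prior + sum(
            -0.5 * math.log(2 * math.pi * s) - (value - m) ** 2 / (2 * s)
            for value, m, s in zip(x, means, variances)
        )
    return max(model, key=log_posterior)


# Hypothetical toy corpus: two documents for each of two invented authors.
docs = [
    "the ship sailed the sea under the storm".split(),
    "sea and storm and ship and sail".split(),
    "the garden rose bloomed in the garden".split(),
    "rose and garden and bloom and petal".split(),
]
authors = ["A", "A", "B", "B"]

theta = lda_gibbs(docs, n_topics=2)        # one topic-distribution "fingerprint" per document
model = fit_gaussian_nb(theta, authors)
predicted = [predict_nb(model, t) for t in theta]
```

Here LDA is fitted over the whole collection at once, so every document, labelled or not, receives a topic vector before classification. A real experiment would need far more text per author and held-out documents for evaluation.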
Notes DOI 10.1007/978-3-319-77116-8_22, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), V. 10762
Place Budapest
Country Hungary
Pages 303-312
Vol. / Ch. 10762 LNCS
Start 2017-04-17
End 2017-04-23
ISBN/ISSN 9783319771151