SABER

Authors
Calvo Castro Francisco Hiram
Hernández Castañeda Angel

Title	Author identification using Latent Dirichlet Allocation
Type	Conference
Subtype	Proceedings
Description	18th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2017
Abstract	We tackle the PAN 2015 author identification task with a Latent Dirichlet Allocation (LDA) model. This method takes into account the vocabulary and the context of words at the same time and, through a statistical process, determines to what extent the relations between words hold in each document; processing a set of documents with LDA returns a set of topic distributions. Each distribution can be seen as a feature vector and as a fingerprint of its document within the collection. We then applied a Naïve Bayes classifier to the resulting patterns, with varying performance. We obtained state-of-the-art performance for English, surpassing the best FS score reported in PAN 2015, while obtaining mixed results for other languages. © Springer Nature Switzerland AG 2018.
Notes	DOI 10.1007/978-3-319-77116-8_22, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 10762
Place	Budapest
Country	Hungary
Pages	303-312
Vol. / Ch.	10762 LNCS
Start	2017-04-17
End	2017-04-23
ISBN/ISSN	9783319771151