Title |
Author identification using latent Dirichlet allocation |
Type |
Conference |
Subtype |
Proceedings |
Description |
18th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2017 |
Abstract |
We tackle the PAN 2015 author identification task with a Latent Dirichlet Allocation (LDA) model. This method takes into account the vocabulary and the context of words at the same time and, through a statistical process, estimates to what extent the relations between words hold in each document; processing a set of documents with LDA returns a set of topic distributions. Each distribution can be seen as a feature vector and as a fingerprint of its document within the collection. We then applied a Naïve Bayes classifier to the resulting patterns, with varying performance. We obtained state-of-the-art performance for English, surpassing the best FS score reported in PAN 2015, while obtaining mixed results for other languages. © Springer Nature Switzerland AG 2018. |
Notes |
DOI 10.1007/978-3-319-77116-8_22, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 10762 |
Place |
Budapest |
Country |
Hungary |
Pages |
303-312 |
Vol. / Ch. |
10762 LNCS |
Start |
2017-04-17 |
End |
2017-04-23 |
ISBN/ISSN |
9783319771151 |