Abstract |
In this paper we propose a method for automatic author clustering called Document Authoring Link Retriever (DALIR). Documents are represented using Doc2Vec under several parameter settings; the resulting vectors are then clustered (or linked together) using K-means and Hierarchical Agglomerative Clustering. We experimented with different vector sizes, different fixed numbers of clusters, and both clustering methods. We evaluated our method on the author clustering task of PAN @ CLEF 2017 using the task's BCubed F-score evaluation scheme. Our method outperforms some of the results reported by the top-ranked systems in this challenge, although it requires the number of clusters to be set manually a priori. © 2020, Springer Nature Switzerland AG. |
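As an illustration of the evaluation measure named above, the following is a minimal sketch of the standard BCubed precision, recall, and F-score for non-overlapping clusterings. It is not the authors' code or the official PAN evaluator; the function name `bcubed_scores` and the dictionary-based inputs are assumptions made for this example.

```python
from collections import Counter


def bcubed_scores(predicted, gold):
    """Return (precision, recall, f_score) under the BCubed scheme.

    predicted: dict mapping document id -> predicted cluster id
    gold:      dict mapping document id -> true author id
    Both names and the input layout are hypothetical, chosen for illustration.
    """
    if not predicted or predicted.keys() != gold.keys():
        raise ValueError("predicted and gold must cover the same non-empty set of documents")

    # Size of each predicted cluster, each true author group, and each overlap.
    cluster_sizes = Counter(predicted.values())
    author_sizes = Counter(gold.values())
    overlap_sizes = Counter((predicted[d], gold[d]) for d in predicted)

    n = len(predicted)
    precision = 0.0
    recall = 0.0
    for doc in predicted:
        # Documents sharing both the cluster and the author of this document.
        correct = overlap_sizes[(predicted[doc], gold[doc])]
        precision += correct / cluster_sizes[predicted[doc]]
        recall += correct / author_sizes[gold[doc]]
    precision /= n
    recall /= n

    # Harmonic mean of the averaged precision and recall.
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Toy example: two authors with two documents each, one document misplaced.
gold = {"d1": "A", "d2": "A", "d3": "B", "d4": "B"}
predicted = {"d1": 0, "d2": 0, "d3": 0, "d4": 1}
p, r, f = bcubed_scores(predicted, gold)
print(f"precision={p:.4f} recall={r:.4f} f={f:.4f}")
```

Because every document counts itself as a correct match, precision and recall are always positive, so the harmonic mean is well defined. The same function can score a clustering produced with any fixed number of clusters, which makes it easy to compare several settings of that parameter.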