Authors
Sánchez Pérez Miguel Ángel
Gelbukh Alexander
Sidorov Grigori
Title Adaptive Algorithm for Plagiarism Detection: The Best-Performing Approach at PAN 2014 Text Alignment Competition
Type Journal
Subtype SCOPUS
Description Lecture Notes in Computer Science
Abstract The task of (monolingual) text alignment consists in finding similar text fragments between two given documents. It has applications in plagiarism detection, detection of text reuse, author identification, authoring aid, and information retrieval, to mention only a few. We describe our approach to the text alignment subtask of the plagiarism detection competition at PAN 2014, which resulted in the best-performing system at the PAN 2014 competition and outperforms the best-performing system of the PAN 2013 competition by the cumulative evaluation measure Plagdet. Our method relies on a sentence similarity measure based on a tf-idf-like weighting scheme that permits us to consider stopwords without increasing the rate of false positives. We introduce a recursive algorithm to extend the ranges of matching sentences to maximal-length passages. We also introduce a novel filtering method to resolve overlapping plagiarism cases. Our system is available as open source.
Notes http://dx.doi.org/10.1007/978-3-319-24027-5_42
Place
Country Germany
Pages 402–413
Vol. / Ch. Vol. 9283
Start 2015-11-20
End
ISBN/ISSN