SABER

Title	The Winning Approach to Text Alignment for Text Reuse Detection at PAN 2014
Type	Conference
Subtype	Proceedings
Description	Notebook for PAN at CLEF 2014. CLEF 2014. CLEF2014 Working Notes
Abstract	The task of (monolingual) text alignment consists in finding similar text fragments between two given documents. It has applications in plagiarism detection, detection of text reuse, author identification, authoring aid, and information retrieval, to mention only a few. We describe our approach to the text alignment subtask at the PAN 2014 plagiarism detection competition. Our method relies on a sentence similarity measure based on a tf-idf-like weighting scheme that permits us to keep stopwords without increasing the false positive rate. We introduce a recursive algorithm to extend the matching sentences to maximal-length passages. We also introduce a novel filtering method to resolve overlapping plagiarism cases. By the cumulative measure (Plagdet), our approach outperforms the best-performing system of the PAN 2013 competition, and was the best-performing (on the first corpus) and third best-performing (on the second corpus) system according to the official results of the PAN 2014 competition. Our system is publicly available in open-source form.
Notes	Drive: The-winning-approach_2014
Place	Sheffield
Country	United Kingdom
Pages	1004-1011
Vol. / Ch.	1180
Start	2014-09-15
End
ISBN/ISSN