Authors
Sidorov Grigori
Posadas Durán Juan Pablo Francisco
Jiménez Salazar Héctor
Chanona Hernández Liliana
Title A New Combined Lexical and Statistical based Sentence Level Alignment Algorithm for Parallel Texts
Type Journal
Subtype CONACYT
Description International Journal of Computational Linguistics and Applications
Abstract Parallel text alignment is an active research area in the field of Natural Language Processing. In this paper, we propose a method for the sentence alignment of parallel texts that is based on both lexical and statistical information. The alignment procedure uses dynamic programming. We conducted our experiments on Spanish and English texts. We use lexical information from a bilingual Spanish-English dictionary, as well as sentence length measured both in words and in characters. The proposed method was tested on a corpus of fiction texts, in which multiple alignments, omissions, and insertions are more frequent than in other types of texts. We obtained better results than the standard Vanilla aligner, which uses a purely statistical approach.
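The abstract describes dynamic-programming sentence alignment that combines sentence length (in words and characters) with bilingual dictionary matches. The following Python sketch illustrates that general idea only; it is not the authors' algorithm. The toy dictionary TOY_DICT, the alignment types and penalties in MOVES, the weights alpha and beta, and the length and lexical cost functions are all hypothetical choices made for illustration.

```python
from math import inf

# Hypothetical toy Spanish->English dictionary (illustration only, not from the paper).
TOY_DICT = {
    "el": {"the"}, "gato": {"cat"}, "duerme": {"sleeps"},
    "perro": {"dog"}, "ladra": {"barks"},
}

# Assumed alignment types (source count, target count) with assumed fixed penalties:
# 1-1 match, omission (1-0), insertion (0-1), and 2-1 / 1-2 merges.
MOVES = {(1, 1): 0.0, (1, 0): 3.0, (0, 1): 3.0, (2, 1): 1.5, (1, 2): 1.5}

PUNCT = ".,;:!?¡¿\"'()"


def tokens(sentences):
    """Lowercased words of a group of sentences, with surrounding punctuation stripped."""
    return [w.lower().strip(PUNCT) for s in sentences for w in s.split()]


def length_cost(src, tgt):
    """Relative length mismatch, averaged over character and word counts (0 = equal)."""
    c_s, c_t = sum(len(s) for s in src), sum(len(t) for t in tgt)
    w_s, w_t = len(tokens(src)), len(tokens(tgt))
    char_diff = abs(c_s - c_t) / max(c_s, c_t, 1)
    word_diff = abs(w_s - w_t) / max(w_s, w_t, 1)
    return 0.5 * (char_diff + word_diff)


def lexical_score(src, tgt, dictionary):
    """Fraction of source words with at least one dictionary translation in the target."""
    src_words = tokens(src)
    tgt_words = set(tokens(tgt))
    if not src_words:
        return 0.0
    hits = sum(1 for w in src_words if dictionary.get(w, set()) & tgt_words)
    return hits / len(src_words)


def align(src, tgt, dictionary, alpha=1.0, beta=1.0):
    """Minimum-cost alignment by dynamic programming; returns (src indices, tgt indices) pairs."""
    n, m = len(src), len(tgt)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Row-major order guarantees every predecessor cell is final before it is expanded.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj), penalty in MOVES.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                s, t = src[i:ni], tgt[j:nj]
                c = penalty
                if s and t:
                    # Length mismatch raises the cost; dictionary matches lower it.
                    c += alpha * length_cost(s, t) - beta * lexical_score(s, t, dictionary)
                if cost[i][j] + c < cost[ni][nj]:
                    cost[ni][nj] = cost[i][j] + c
                    back[ni][nj] = (i, j)
    # Walk back-pointers from the end to recover the chosen alignment.
    pairs, i, j = [], n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        pairs.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    return pairs[::-1]
```

For example, align(["El gato duerme.", "El perro ladra."], ["The cat sleeps.", "The dog barks."], TOY_DICT) links each Spanish sentence to its English counterpart as two 1-1 pairs. On real fiction corpora, the penalties and weights would need tuning, since omissions, insertions, and merges are more frequent there.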
Notes
Place
Country
Pages 257-263
Vol. / Ch. Vol. 2, No. 1-2
Start 2011-12-01
End
ISBN/ISSN