Abstract
Open Information Extraction (Open IE) is the task of extracting relational tuples that represent facts from text, with no prior specification of relations, no prespecified vocabulary, and no manually tagged training corpus. Systems based on part-of-speech (POS) tagging have been shown to be competitive with parsing-based systems on this task and to run faster on large-scale corpora. Nevertheless, implementing such a system requires language-specific information, and so far all work has been done for English. We present a relation extraction algorithm for Open IE in Spanish, based on POS-tagged input and semantic constraints. We describe its implementation in ExtrHech, an Open IE system for Spanish. We compare its performance with that of Open IE systems for English, including a comparison on a parallel English-Spanish dataset, and show that its performance is comparable with that of state-of-the-art systems, while it is more robust to noisy input. We also give a comparative analysis of extraction errors for both languages.