Abstract
In this work, we report the results of our experiments on the task of distinguishing the semantics of verb-noun collocations in a Spanish corpus. These semantics were represented by four lexical functions of the Meaning-Text Theory; each lexical function specifies a certain universal semantic concept found in any natural language. Knowledge of collocations and their semantic content is important for natural language processing, since collocations encode restrictions on how words can be used together. We experimented with word2vec embeddings and six supervised machine learning methods commonly used in a wide range of natural language processing tasks. Our objective was to study the ability of word2vec embeddings to represent the context of collocations in a way that discriminates among lexical functions. Unlike previous work with word embeddings, we trained word2vec on a lemmatized corpus with stopwords removed, on the assumption that the resulting vectors would capture a more accurate semantic characterization. The experiments were performed on a collection of 1,131 issues of the Excelsior newspaper. The experimental results showed that the word2vec representation of collocations outperformed the classical bag-of-words context representation implemented in a vector space model and fed into the same supervised learning methods.
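The pipeline described above, embedding lemmas with word2vec and feeding collocation vectors to supervised classifiers, can be sketched roughly as follows. This is a minimal illustration only: the hand-crafted toy vectors stand in for embeddings actually trained by word2vec on a lemmatized, stopword-free corpus, a nearest-centroid rule stands in for the six supervised methods, and the lemmas, collocations, and lexical-function labels (`Oper1`, `Real1`) are hypothetical examples, not the paper's data.

```python
import numpy as np

# Toy lemma embeddings (assumption: stand-ins for vectors that word2vec
# would learn from a lemmatized corpus with stopwords removed).
emb = {
    "dar": np.array([0.9, 0.1]), "paso": np.array([0.8, 0.2]),
    "tomar": np.array([0.2, 0.9]), "decision": np.array([0.1, 0.8]),
    "hacer": np.array([0.85, 0.15]), "pregunta": np.array([0.9, 0.05]),
    "poner": np.array([0.15, 0.85]), "atencion": np.array([0.2, 0.95]),
}

def collocation_vector(verb, noun):
    """Represent a verb-noun collocation as the mean of its lemma embeddings."""
    return (emb[verb] + emb[noun]) / 2.0

# Tiny labeled set: collocations tagged with illustrative lexical functions.
train = [(("dar", "paso"), "Oper1"), (("tomar", "decision"), "Real1")]

# Nearest-centroid classification as a stand-in for the supervised methods.
centroids = {}
for (verb, noun), lf in train:
    centroids.setdefault(lf, []).append(collocation_vector(verb, noun))
centroids = {lf: np.mean(vecs, axis=0) for lf, vecs in centroids.items()}

def predict(verb, noun):
    """Assign an unseen collocation the lexical function of the nearest centroid."""
    x = collocation_vector(verb, noun)
    return min(centroids, key=lambda lf: np.linalg.norm(x - centroids[lf]))
```

For example, `predict("hacer", "pregunta")` returns the label whose centroid lies closest to the averaged embedding of the two lemmas; in the real setting, a trained classifier would play this role over word2vec vectors.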