Abstract
Measuring lexical similarity with WordNet has a long tradition. Over the last decade it has been challenged by distributional methods and, more recently, by neural word embeddings. Several large lexical similarity benchmarks have been introduced in recent years, on which word embeddings have achieved state-of-the-art results; the success of such methods has eclipsed the use of WordNet for predicting human judgments of lexical similarity. We propose a new set-cardinality-based method for measuring lexical similarity that exploits the WordNet graph: each word is represented, in a representation we call word2set, by the set of its related neighboring words. We show that features extracted from set cardinalities computed over this representation, when fed into a support vector regression model trained on a dataset of common synonyms and antonyms, yield results competitive with those of word-embedding approaches. On the task of predicting lexical sentiment polarity, our WordNet set-based representation significantly outperforms the classical measures and matches the performance of neural embeddings. Although word embeddings remain the best approach for these tasks, our method significantly narrows the gap between knowledge-based approaches and distributional representations, without requiring a large training corpus, and it is more effective for less-frequent words.
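To make the idea concrete, the following is a minimal sketch (not the authors' implementation) of a word2set-style representation: each word is mapped to the set of its neighbors in a lexical graph, and similarity features are derived from set cardinalities. The tiny `TOY_GRAPH` below is a hand-made stand-in for WordNet neighborhoods, and the feature names are illustrative assumptions.

```python
# Hypothetical sketch of a word2set-style representation.
# TOY_GRAPH stands in for WordNet relations (synonyms, hypernyms, etc.);
# in the actual method the neighbor sets come from the WordNet graph.

TOY_GRAPH = {
    "car": {"auto", "automobile", "vehicle", "motorcar", "wheel"},
    "automobile": {"auto", "car", "vehicle", "motorcar", "engine"},
    "boat": {"vessel", "ship", "watercraft", "hull"},
}


def word2set(word, graph=TOY_GRAPH):
    """Represent a word as the set containing itself and its graph neighbors."""
    return {word} | graph.get(word, set())


def cardinality_features(a, b, graph=TOY_GRAPH):
    """Set-cardinality features for a word pair; such features could be
    fed into a regression model trained on similarity judgments."""
    sa, sb = word2set(a, graph), word2set(b, graph)
    inter = len(sa & sb)
    union = len(sa | sb)
    return {
        "size_a": len(sa),
        "size_b": len(sb),
        "intersection": inter,
        "union": union,
        # Jaccard index as one simple cardinality-based similarity score.
        "jaccard": inter / union if union else 0.0,
    }


if __name__ == "__main__":
    print(cardinality_features("car", "automobile")["jaccard"])  # high overlap
    print(cardinality_features("car", "boat")["jaccard"])        # no overlap
```

With a real WordNet backend (e.g. via NLTK's `wordnet` corpus reader), the neighbor sets would be built from synset relations instead of a hand-coded dictionary, but the cardinality features are computed the same way.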