Discovering hidden collocations in a bilingual Spanish–English dictionary

Author: Margarita Alonso Ramos

This paper addresses the problem of how to exploit the collocational information included in an online Spanish–English dictionary. Although collocations are not identified as such in this dictionary, abundant collocational information is used as a means of distinguishing senses. Because this information is structured in XML markup, converting it into a bilingual collocation database seems viable, and the result could serve as the seed of a first Spanish–English collocation dictionary. The concept of collocation used here comes from Explanatory and Combinatorial Lexicology (Mel’čuk, 2012). In this framework, collocations are understood as recurrent phrases composed of two lexical units: the base, which is selected according to its meaning, and the collocate, whose selection is determined by the base. The methodology I propose consists of reorganizing the links between words so that each bilingual collocational correspondence is included in the entry for the base. The lexical tool obtained from this reorganization could be exploited in different natural language processing applications, ranging from machine translation to computer-assisted language learning systems.
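
The reorganization described above can be illustrated with a minimal sketch. The dictionary's actual XML schema is not given here, so the element and attribute names below (`entry`, `headword`, `sense`, `indicator`, `translation`) and the sample data are hypothetical, chosen only to show the idea: sense indicators stored under the collocate's entry are re-indexed under the base.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical markup: the real dictionary's schema is not specified in the
# abstract. Here each entry is headed by the collocate (a verb), and each
# sense carries an indicator naming the noun (the base) it combines with.
SAMPLE_XML = """<dictionary>
  <entry headword="dar">
    <sense>
      <indicator type="object">paseo</indicator>
      <translation>take</translation>
    </sense>
    <sense>
      <indicator type="object">conferencia</indicator>
      <translation>give</translation>
    </sense>
  </entry>
  <entry headword="cometer">
    <sense>
      <indicator type="object">error</indicator>
      <translation>make</translation>
    </sense>
  </entry>
</dictionary>"""


def collocations_by_base(xml_text):
    """Re-index collocate entries so each base lists its collocates.

    Returns a dict mapping a base to a list of
    (Spanish collocate, English translation of the collocate) pairs.
    """
    root = ET.fromstring(xml_text)
    by_base = defaultdict(list)
    for entry in root.iter("entry"):
        collocate = entry.get("headword")
        for sense in entry.iter("sense"):
            translation = (sense.findtext("translation") or "").strip()
            for indicator in sense.iter("indicator"):
                base = (indicator.text or "").strip()
                # Skip senses that lack either a base or a translation.
                if base and translation:
                    by_base[base].append((collocate, translation))
    return dict(by_base)


index = collocations_by_base(SAMPLE_XML)
for base, pairs in sorted(index.items()):
    print(base, pairs)
```

In this toy data the entry for "dar" distinguishes two senses by their typical objects; after re-indexing, a lookup under the base "paseo" returns the collocate "dar" together with its translation "take". A real conversion would also need the translation of the base itself and some filtering of indicators that are not collocational, neither of which this sketch attempts.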

Keywords: collocations; bilingual dictionary; reusability of lexical resources

Reference: In Kosem, I., Jakubíček, M., Kallas, J., Krek, S. (eds.) Electronic lexicography in the 21st century: linking lexical data in the digital age. Proceedings of the eLex 2015 conference, 11–13 August 2015, Herstmonceux Castle, United Kingdom. Ljubljana/Brighton: Trojina, Institute for Applied Slovene Studies/Lexical Computing Ltd., pp. 170–185.


Published: 2015