An Unsupervised Approach to Characterize the Adjectival Microstructure in a Hungarian Monolingual Explanatory Dictionary
Keywords:
automatic sense induction, monolingual lexicography, polysemy, unsupervised graph-based approach, adjectives
Abstract
The present paper describes the initial phase of a collaboration between Hungarian lexicographers and computational linguists aimed at compiling the new version of The Explanatory Dictionary of the Hungarian Language. This research thread focuses on the automatic sense induction of Hungarian adjectives in attributive position and of their salient nominal contexts, with particular emphasis on polysemy. The proposed methodology is intended to facilitate lexicographers’ work in characterizing both the micro- and macrostructure of adjectives in a monolingual setting. We employ a corpus-driven, unsupervised graph-based approach, which we expect to reduce the reliance on human intuition, especially in the ambiguous domain of polysemous sense distinctions. We first introduce distributional criteria for meaning distinction and then describe the algorithm. The algorithm models adjectival semantics using two kinds of subgraph: connected graph components model adjectival semantic domains, while maximally connected subgraphs, so-called cliques, model polysemy. Automatically induced meaning distinctions were validated using salient nominal context candidates extracted from corpus data. We expect connected graph components to aid in characterizing the adjectival macrostructure and cliques to provide lexicographers with useful insights for establishing the adjectival microstructure. These hypotheses were also tested: we investigated the extent to which the proposed framework can assist expert lexicographers during dictionary compilation by comparing a sample of our automatically obtained results to the previous version of The Explanatory Dictionary of the Hungarian Language.
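The two graph structures named in the abstract can be illustrated with a minimal sketch. It assumes an undirected graph whose edges link adjectives judged semantically similar; how the paper actually builds its graph is not stated here. The edge list, the English example words, and all function names are invented for illustration. Connected components are found by depth-first traversal, and maximal cliques by the basic Bron-Kerbosch recursion, which stands in for whatever clique-finding procedure the authors use.

```python
from collections import defaultdict

# Hypothetical similarity edges between adjectives (invented toy data).
EDGES = [
    ("fresh", "new"), ("fresh", "recent"), ("new", "recent"),
    ("fresh", "cool"), ("fresh", "crisp"), ("cool", "crisp"),
    ("heavy", "weighty"),
]


def build_graph(edges):
    """Build an undirected adjacency map from a list of word pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def connected_components(graph):
    """Return the connected components as a list of node sets."""
    seen = set()
    components = []
    for start in sorted(graph):
        if start in seen:
            continue
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        components.append(component)
    return components


def maximal_cliques(graph):
    """Return all maximal cliques (basic Bron-Kerbosch, no pivoting)."""
    cliques = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(set(current))
            return
        for v in list(candidates):
            expand(current | {v}, candidates & graph[v], excluded & graph[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(graph), set())
    return cliques


if __name__ == "__main__":
    g = build_graph(EDGES)
    for comp in connected_components(g):
        print("component:", sorted(comp))
    for clique in maximal_cliques(g):
        print("clique:", sorted(clique))
```

In this toy graph, "fresh" belongs to a single connected component but to two distinct cliques, which mirrors the idea that components delimit a broad semantic domain while membership in several cliques signals candidate sense distinctions for a polysemous adjective.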
License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.