From experiments to an application: the first prototype of an adjective detector for Estonian

Authors

  • Geda Paulsen
  • Ahti Lohk
  • Maria Tuulik
  • Ene Vainik

Keywords:

language technology, lexicography, corpus linguistics, adjective, the Estonian language

Abstract

In this study, we discuss the development of a multi-parameter application, the adjective similarity calculator (ASC), which determines the relative adjectivity of a word or word form. The tool compares a statistical summary of the word (form)’s corpus behaviour with the adjectival corpus profile, that is, the most typical and central aspects of the Estonian adjective. To establish this profile, we use close-context patterns that characterise adjectives and are detectable in the corpus (see the experiments in Tuulik et al. 2022, Paulsen et al. 2022, and Vainik et al. 2023). The first prototype of the ASC will be evaluated on clear cases of adjectives, on representatives of parts of speech (PoS) whose properties overlap with those of adjectives, and on words from more distant classes. The main purpose of the application is to support lexicographic work in categorising words from lexical categories that partly overlap with the adjective, particularly in ambiguous cases such as adjectivised participles, nouns and adverbs.

Published

2023-06-29