From mouth to keyboard: the place of non-canonical written and spoken structures in lexicography

Authors: Ana Zwitter Vitez, Darja Fišer

As user-generated content grows in both volume and importance, the long-established relation between spoken and written communication needs to be re-examined in lexicography. This paper addresses that need with a corpus-based analysis of typical non-canonical words in spoken and computer-mediated communication in Slovene. The results show that the spoken corpus and the Twitter corpus contain similar proportions of non-standard pronunciation/spelling variants, interaction words and informal lexemes. At the opposite end of the spectrum are news comments, which contain a higher proportion of nouns and a smaller proportion of non-canonical words. The study presents a language-independent methodology for identifying typical elements of spoken and written informal texts.

Keywords: lexicography; non-canonical language; computer-mediated communication; spoken language

Reference: In Kosem, I., Jakubíček, M., Kallas, J., Krek, S. (eds.) Electronic lexicography in the 21st century: linking lexical data in the digital age. Proceedings of the eLex 2015 conference, 11-13 August 2015, Herstmonceux Castle, United Kingdom. Ljubljana/Brighton: Trojina, Institute for Applied Slovene Studies/Lexical Computing Ltd., pp. 250-267.


Published: 2015