Building a CEFR-Labeled Core Vocabulary and Developing a Lexical Resource for Slovenian as a Second and Foreign Language


  • Matej Klemen
  • Špela Arhar Holdt
  • Iztok Kosem
  • Eva Pori
  • Polona Gantar
  • Mihaela Knez


Lexicography and CEFR, Slovenian, second and foreign language, textbook corpus, core vocabulary


This article introduces two newly available datasets: the KUUS 1.0 corpus and the list Core Vocabulary for Slovenian as L2 1.0. The KUUS 1.0 corpus consists of seventeen textbooks published by the Center for Slovene as a Second and Foreign Language at the University of Ljubljana and contains a total of 520,796 words, accompanied by linguistic tags and metadata. Using the corpus, we compiled the list Core Vocabulary for Slovenian as L2 1.0, which includes 350 words labeled as A1-core, 864 words as A1-larger, 1,451 words as A2, and 2,608 words as B1. The A1 vocabulary served as pilot data for a project on developing a lexical description for learning Slovenian as a second and foreign language. Our methodology combined the data from the new datasets with existing, openly available lexical information on modern Slovenian, aiming for didactic adaptation and maximal reusability of the results.


