Building a CEFR-Labeled Core Vocabulary and Developing a Lexical Resource for Slovenian as a Second and Foreign Language

Matej Klemen; Špela Arhar Holdt; Iztok Kosem; Eva Pori; Polona Gantar; Mihaela Knez

Keywords:

Lexicography and CEFR, Slovenian, second and foreign language, textbook corpus, core vocabulary

Abstract

This article introduces two newly available datasets: the KUUS 1.0 corpus and the list Core Vocabulary for Slovenian as L2 1.0. The KUUS 1.0 corpus consists of seventeen textbooks published by the Center for Slovene as a Second and Foreign Language at the University of Ljubljana. It contains a total of 520,796 words, accompanied by various linguistic tags and metadata. Using the KUUS 1.0 corpus, we compiled the list Core Vocabulary for Slovenian as L2 1.0, which includes 350 words labeled as A1-core, 864 words labeled as A1-larger, 1,451 words labeled as A2, and 2,608 words labeled as B1. The A1 vocabulary served as pilot data for a project focused on developing a lexical description for learning Slovenian as a second and foreign language. Our methodology combined the data from the new datasets with existing, openly available lexical information on modern Slovenian, with the aims of adapting the description for didactic use and maximizing the reusability of the results.

Published

2023-06-29

Issue

Proceedings of the eLex 2023 conference.

Section

Articles

License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

How to Cite

Building a CEFR-Labeled Core Vocabulary and Developing a Lexical Resource for Slovenian as a Second and Foreign Language. (2023). Electronic Lexicography in the 21st Century, 664-678. https://elex.link/ojs/index.php/elex/article/view/55