Multiple Access Paths for Digital Collections of Lexicographic Paper Slips

Authors: Toma Tasovac, Snežana Petrović

The paper describes the process of digitizing and annotating some 23,000 lexicographic paper slips compiled by the amateur lexicographer Dimitrije Čemerikić (1882-1960) to document the Serbian dialect of the historic city of Prizren. This previously unpublished dictionary of the Prizren dialect is an important resource not only for dialectologists and linguists, but also for ethnolinguists and ethnologists interested in various aspects of popular culture and urban life in Prizren. The alphabetic arrangement of the macrostructure, however, is not conducive to exploratory searches. Users who want to find out which dialect word corresponds to a standard Serbian word, or to explore a certain type of vocabulary, need access paths to the dictionary content that go beyond the indexing of the macrostructure. The paper therefore presents an elaborate annotation strategy: headwords are marked up with standardized orthographic alternatives, entries are given lexical equivalents, and semantic fields are assigned to entries. This strategy achieves robust navigability and searchability of the collection without full-text transcription and/or structural data modeling.
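To make the idea of "multiple access paths" concrete, the following is a minimal sketch, not the authors' implementation. It assumes each scanned slip carries three kinds of annotation (standardized spellings of the headword, standard Serbian equivalents, and semantic fields) and builds one inverted index per access path, so that a slip can be reached from any of them rather than only through the alphabetic headword list. The `SlipAnnotation` record, its field names, the slip IDs, and all sample values are hypothetical placeholders, not data from the Čemerikić collection.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class SlipAnnotation:
    """Hypothetical annotation record for one scanned paper slip.

    Field names are illustrative assumptions, not the project's actual schema.
    """
    slip_id: str
    headword: str  # headword as written on the slip
    orthographic_variants: list[str] = field(default_factory=list)  # standardized spellings
    equivalents: list[str] = field(default_factory=list)  # standard Serbian equivalents
    semantic_fields: list[str] = field(default_factory=list)  # topical categories


def build_access_paths(slips):
    """Build one inverted index per access path, mapping a key to sorted slip IDs."""
    indexes = {
        "headword": defaultdict(set),
        "equivalent": defaultdict(set),
        "semantic_field": defaultdict(set),
    }
    for s in slips:
        # Index the original headword together with its standardized spellings,
        # so a lookup succeeds whichever spelling the user types.
        for form in [s.headword, *s.orthographic_variants]:
            indexes["headword"][form.lower()].add(s.slip_id)
        # Standard-language equivalents give a "standard word -> dialect slip" path.
        for eq in s.equivalents:
            indexes["equivalent"][eq.lower()].add(s.slip_id)
        # Semantic fields let users browse a whole type of vocabulary.
        for sf in s.semantic_fields:
            indexes["semantic_field"][sf.lower()].add(s.slip_id)
    # Freeze into plain dicts with sorted ID lists for stable output.
    return {path: {k: sorted(v) for k, v in idx.items()} for path, idx in indexes.items()}


# Hypothetical placeholder records, for illustration only.
slips = [
    SlipAnnotation("slip-0001", "dialect_form_1", ["standard_spelling_1"], ["standard_word_1"], ["food"]),
    SlipAnnotation("slip-0002", "dialect_form_2", [], ["standard_word_1"], ["food", "trade"]),
]
paths = build_access_paths(slips)
print(paths["equivalent"]["standard_word_1"])  # ['slip-0001', 'slip-0002']
print(paths["semantic_field"]["trade"])  # ['slip-0002']
```

Because every index points back to slip IDs (and hence to slip images), none of these paths requires transcribing the slip text in full, which matches the abstract's point about avoiding full-text transcription.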

Keywords: digitization; dialect dictionaries; navigation; searchability; access paths

Reference: In Kosem, I., Jakubiček, M., Kallas, J., Krek, S. (eds.) Electronic lexicography in the 21st century: linking lexical data in the digital age. Proceedings of the eLex 2015 conference, 11-13 August 2015, Herstmonceux Castle, United Kingdom. Ljubljana/Brighton: Trojina, Institute for Applied Slovene Studies/Lexical Computing Ltd., pp. 384-396.


Published: 2015