Carole Tiberius & Jesse de Does
Dutch Language Institute, Netherlands
Carole Tiberius is professor of Computational Linguistics at the Leiden University Centre for Linguistics (LUCL) and a senior computational linguist at the Instituut voor de Nederlandse Taal (INT, Dutch Language Institute).
She holds degrees from the Higher Institute for Translators and Interpreters in Antwerp and the University of Nijmegen (MA in Language, Speech and Computer Science) as well as a PhD from the University of Brighton on research into ‘multilingual lexical knowledge representation’.
Her research interests lie in the domains of computational lexicography and corpus linguistics. At the Dutch Language Institute, she is primarily involved in contemporary lexicographic projects such as the Vertaalwoordenschat, an online platform for bilingual dictionaries, and Woordcombinaties, a project combining collocations and pattern analysis for Dutch. She is one of the authors of A Frequency Dictionary of Dutch.
Jesse de Does is a senior computational linguist at the Instituut voor de Nederlandse Taal (INT, Dutch Language Institute).
He holds degrees in mathematics and Slavic linguistics. From 1986 to 1990, he worked as a research assistant in the Department of Slavic Language and Literature at Leiden University. In 1995, he obtained his PhD in applied mathematics.
His professional interests are historical language processing, linguistic annotation, corpus retrieval and language resource development. Since 2008, he has been closely involved in various international and national projects, such as IMPACT, SUCCEED, tranScriptorium, CLARIN-NL, CLARIAH and SSHOC-NL. He is currently the SSHOC-NL project leader for the institute and a member of the senior team.
KEYNOTE: LLMs and lexicography at the Dutch Language Institute
Marko Robnik-Šikonja
Faculty of Computer and Information Science, University of Ljubljana, Slovenia
Marko Robnik-Šikonja is a Professor of Computer Science and Informatics at the University of Ljubljana, Faculty of Computer and Information Science, and head of the Machine Learning and Language Technology Lab. His research interests span machine learning, data mining, natural language processing, and explainable artificial intelligence. His most notable scientific results concern deep learning, natural language analysis, feature evaluation, ensemble learning, predictive model explanation, information network analysis, and data generation. He is (co)author of over 250 scientific publications cited more than 9,500 times. He has contributed to several national and EU projects and has authored a number of data mining software packages and language resources.
KEYNOTE: Large language models for lexicography
Currently, large language models (LLMs) are redefining methodological approaches in many scientific areas, including linguistics and lexicography. LLMs are pretrained on huge text corpora by predicting the next token and are then adapted for human interaction using instruction-following datasets. This training does not make them immune to hallucinations and biases, so a human-in-the-loop approach is required. In the context of lexicography, LLMs can be used to support several tasks. We will present how the information contained in language databases can be used to improve LLMs on lexicographic tasks. Our current methodology is based on knowledge graph extraction, continued pretraining of LLMs, prompt engineering, and semi-automatic evaluation.
Michal Měchura
Lexical Computing and Dublin City University
Michal Měchura is a language technologist with two decades of experience building IT solutions for lexicography, terminology and onomastics. He has worked on projects such as the National Terminology Database for Irish, the Placenames Database of Ireland and the New English–Irish Dictionary. He is the founder of the open-source dictionary writing system Lexonomy and the author of Terminologue, an open-source terminology management platform. Recently, Michal has been chairing the LEXIDMA technical committee in OASIS, which has created DMLex, a modern data model for lexicography.
KEYNOTE: We need to talk about data structures in lexicography