Kris Heylen and Vincent Vandeghinste
Dutch Language Institute (INT), Netherlands
A White Paper on the Future of Academic Lexicography
Academic, evidence-based lexicography has a long tradition of analyzing large amounts of language data in a scientific way in order to compile concise, high-quality knowledge about words and their usage with an eye to serving the entire language community. However, lexicography increasingly faces challenges with respect to:
- its role in society, science and the knowledge economy,
- the scalability of both the analysis and production process, and
- the customizability and accessibility of its content for a diverse audience and for integration in new IT applications.
The Lorentz workshop on the “Future of Academic Lexicography” (Leiden, 4-9 November 2019) brought together lexicographers and experts from neighboring disciplines such as Data Analytics, Artificial Intelligence, Citizen Science, Human-Computer Interaction and Sociology to explore how each of these challenges can be tackled in a multidisciplinary way, so as to strengthen the position of academic lexicography as a locus for scientific research with direct relevance for, and impact on, society. The workshop's conclusions and recommendations were summarized in a White Paper, which will be presented at the start of the panel session and will serve as the point of departure for the discussion, moderated by Kris Heylen and Vincent Vandeghinste.
Dr. Kris Heylen is a senior researcher at the Instituut voor de Nederlandse Taal (Dutch Language Institute) where he supports the ideation, proposal writing and follow-up of research projects in the domains of computational lexicology and applied linguistics. He holds a Master in Linguistics, a Master in Artificial Intelligence and a PhD in Linguistics from the KU Leuven (University of Leuven). He is also a research fellow at the KU Leuven research group Quantitative Lexicology and Variational Linguistics (QLVL) and specializes in the corpus-based, statistical modeling of lexical semantics and lexical variation.
Dr. Vincent Vandeghinste is a senior researcher at the Instituut voor de Nederlandse Taal (INT, Dutch Language Institute), where he coordinates the tasks on Contemporary Dutch, and is working on topics such as CLARIN and other linguistic infrastructure, treebanking, machine translation, and language technology for inclusion. He has a PhD in Linguistics and a Master’s in (Experimental) Psychology. He is also affiliated with the Centre for Computational Linguistics and Leuven.AI at the University of Leuven, where he is involved in courses on Machine Translation, Computational Linguistics, Computational Lexicography and Language Engineering Applications.
Pavel Rychlý
Lexical Computing / Masaryk University, Czechia
Scalability of maths for lexicography
Lexicography relies on many mathematical methods and formulas to identify interesting or important information about words, contexts, and other elements of natural language. The presentation will examine several examples of such methods from the perspective of scalability and highlight why scalability matters in practical lexicography.
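As one concrete illustration of the kind of formula the talk refers to, the sketch below computes logDice, the collocation association score used in the Sketch Engine (Rychlý 2008: 14 + log2(2·f_xy / (f_x + f_y))). The frequency counts in the usage example are hypothetical, chosen only to show the calculation; they do not come from any real corpus.

```python
import math

def log_dice(f_xy: int, f_x: int, f_y: int) -> float:
    """logDice association score (Rychlý 2008).

    f_xy -- co-occurrence frequency of the two words
    f_x, f_y -- individual frequencies of each word

    The theoretical maximum is 14, and the score does not
    depend on corpus size, which makes it convenient for
    comparing collocations across corpora.
    """
    return 14 + math.log2(2 * f_xy / (f_x + f_y))

# Hypothetical counts: a word pair co-occurring 100 times,
# each word occurring 1000 times overall.
print(round(log_dice(100, 1000, 1000), 3))  # → 10.678
```

Scalability enters exactly here: the formula itself is trivial, but obtaining f_xy, f_x and f_y for every candidate pair in a multi-billion-word corpus requires efficient indexing, which is what systems like Manatee provide.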
Pavel Rychlý is a computer scientist and researcher in natural language processing. He is an Associate Professor at the Faculty of Informatics, Masaryk University (Brno, Czech Republic), where he heads the Natural Language Processing Centre. Since his PhD on indexing text corpora, he has focused on efficient large-scale text processing. Pavel is the main software architect of the Sketch Engine and the original author of many of its components, most notably the Manatee corpus indexing system.
Pilar León Araúz
University of Granada, Spain
Designing and populating specialized knowledge resources: EcoLexicon and by-products
The design and population of specialized knowledge resources require a dynamic framework for knowledge extraction and representation. Frame-based term analysis and concept modelling allow specialized knowledge to be represented in a way that is meaningful for target users, who need to accommodate specialized notions into previously stored knowledge structures. In EcoLexicon, a terminological knowledge base on the environmental domain, this translates into extracting, organizing and describing specialized concepts and terms in a wide range of formats. This talk presents EcoLexicon together with the methods employed in its construction, as well as the by-products that have emerged from it, namely the EcoLexicon English corpus, the EcoLexicon Semantic Sketch Grammar and EcoLexiCAT, a terminology-enhanced translation tool.
Pilar León-Araúz is a lecturer and researcher in terminology, lexicography, corpus linguistics and translation technologies at the University of Granada (Spain). She holds degrees from the University of Granada, Northumbria University (UK) and Université de Provence (France). Her research focuses on knowledge extraction and representation for the development of terminological resources using corpus-based techniques. She is also the winner of the Adam Kilgarriff Prize 2020.