Margarita Correia

CELGA-ILTEC, University of Coimbra / University of Lisbon

Margarita Correia has a PhD in Portuguese Linguistics from the University of Lisbon, and a Post-Doc in Computational Lexicography at the Federal University of São Carlos (UFSCar – Brazil). She has been a professor at the Department of General and Romance Languages of the Faculty of Letters of the University of Lisbon since 1990, where she has taught several courses (including Lexicology, Lexicography and Terminology) at undergraduate and graduate levels.

She is a member of the Direction Board of the Centre for General and Applied Linguistics Studies (CELGA-ILTEC, University of Coimbra), where she coordinates the research group Lexicon and Computational Modeling.

She works mainly in Applied Linguistics, with a focus on Lexicography, Terminology, Neology and Language Policy. With José Pedro Ferreira, she directed the projects VOP – Vocabulário Ortográfico do Português [the Spelling Dictionary of Portuguese] (1st and 2nd editions) and Lince – Conversor para a Nova Ortografia [Spelling Converter] (2010), which are official instruments for the implementation of the 1990 spelling reform in Portugal. With José Pedro Ferreira and Gladis Maria de Barcellos Almeida, she coordinated the VOC – Vocabulário Ortográfico Comum da Língua Portuguesa [the Common Spelling Dictionary of the Portuguese Language; Ferreira, Correia, & Almeida (Orgs.), 2017], under the supervision of the Instituto Internacional da Língua Portuguesa (IILP) [International Institute of the Portuguese Language]. She has been the president of the Scientific Board of the IILP since 2018.

Matt Kohl


Matt Kohl began his career at the Oxford English Dictionary. He then continued working at Oxford University Press in the field of language technology, where he led the development of LEAP (Lexical Engine and Platform), a platform to store, optimise and deliver lexical data for projects such as Oxford Global Languages. This work also laid the foundations for the Oxford Dictionaries API program. He has since transitioned into software and knowledge engineering, and is currently helping to build out the data architecture at GeoPhy. Matt is the creator of The Right Rhymes, a hip-hop dictionary based on rap lyrics. He lives and works in London.

Matt Kohl is the winner of the Adam Kilgarriff Prize.

Alexander Geyken

Berlin-Brandenburg Academy of Sciences and the Humanities

The Center for digital lexicography of the German Language:
new perspectives for smart lexicography

The Zentrum für digitale Lexikographie der deutschen Sprache (ZDL, Center for digital lexicography of the German Language) aims to provide a comprehensive and empirically reliable description of the German language from its origins to the present. To this end, four German academies in Berlin (BBAW, coordinator), Göttingen (AdGW), Leipzig (SAW), and Mainz (AdWL) have joined forces. The academies have a rich tradition of dictionary projects, encompassing historical as well as modern dictionaries and including the Grimmsches Wörterbuch, the dictionaries of Old High German, Middle High German, Early New High German and the Digital Dictionary (DWDS) of contemporary German. In addition, the center is cooperating with the Leibniz Institute for the German Language (IDS) for neologisms and contemporary text corpora. In order to provide a ubiquitous search interface to these diverse dictionary sources, a considerable amount of integration work will be necessary in the coming years, including work on common formats, lemma lists, as well as cross-linking references from dictionaries to corpora.

Alexander Geyken has worked at the Berlin-Brandenburg Academy of Sciences and the Humanities (BBAW) since 1999, where he directs the long-term research project “Digital Dictionary of the German Language” (DWDS) as well as the Berlin part of the “Zentrum für digitale Lexikographie der deutschen Sprache” (ZDL). He received his Ph.D. in “Computational Linguistics” from the University of Munich in 1998, and obtained his habilitation (post-doctoral degree) in 2017 in the field of “Linguistics” at the University of Potsdam, where he has also held a teaching position since May 2018. His main research interests are computational lexicography, corpus linguistics, and the use of syntactic and semantic resources for the mining of large textual data.

David Baines

SIL International

SIL’s language data collection

SIL linguists have studied minority languages since 1934. This talk will describe the extent of SIL’s language data holdings and give a brief history of its data collection methods and tools.

The translation of the Bible into many languages represents a multilingual parallel corpus. Complete translations of the New and Old Testaments exist in 690 languages. New Testament translations exist in an additional 1550 languages. SIL is considering how to give academic linguists greater access to those translations for which it holds the copyright.

SIL has also published lexicons for 660 languages and vocabulary lists in an additional 200 languages and is considering possibilities for sharing that data more widely.

SIL’s FieldWorks software is a tool for managing lexical data and has been used to create many of the more recent dictionaries.

Keywords: multilingual corpus, lexicon, FieldWorks, Rapid word collection.

Related sites

FieldWorks: Open-source dictionary editing software.

FLEx Tools: Programs for manipulating FLEx data.

LanguageDepot: FieldWorks data hosting.

Language Forge: Online dictionary creation and collaboration.

Rapid Word Collection: Create dictionaries in minority languages.

David Baines began working with SIL in the Philippines in 2000 and later worked with SIL in Chad. He joined the Language Software Development department of SIL International in 2007 as a software tester for FieldWorks. Many of his roles at SIL have involved liaising between linguists and developers. For the past couple of years he has been importing dictionaries from Shoebox/Toolbox into FieldWorks prior to publication on Webonary and as mobile apps. Part of his current role is to design interactions between translators’ software and FieldWorks so that the translators can make the fullest use of linguistic data. He has a particular interest in finding beneficial partnerships between SIL and other individuals or organisations and has encouraged SIL International to apply for Observer status with ELEXIS.

Maciej Piasecki

Wroclaw University of Science and Technology

Maciej Piasecki is an Assistant Professor at the Wroclaw University of Science and Technology (Department of Computational Intelligence, Faculty of Computer Science and Management), Poland, the Polish National Coordinator of CLARIN (European language technology research infrastructure), the Chair of the CLARIN ERIC National Coordinators Forum (since 04.2018) and the coordinator of CLARIN-PL (Polish consortium, a part of CLARIN). He is the leader of the G4.19 Research Group: Computational Linguistics and Language Technology, one of the largest Polish research teams in these areas. The main mission of G4.19 is the development of open, robust language technology for Polish, in both monolingual and bilingual settings.
Since 2008 he has coordinated 14 large projects or their work packages (national and funded from EU structural funds, including 3 projects in cooperation with companies) on language technology and its different applications. He is also a member of the DARIAH-PL Board and the Global WordNet Association Board.
His main research areas include Computational Linguistics, Natural Language Engineering and Human Language Technology. His main research topics are the automated extraction of lexico-semantic knowledge from text, semi-automated wordnet expansion, Distributional Semantics, relational lexical semantics and shallow semantic processing of text. He has also worked on morpho-syntactic processing of Polish (as a co-author of the first publicly available morpho-syntactic tagger of Polish, with many applications), Information Extraction, Question Answering, Formal Semantics and Machine Translation. He has been the leader of the Polish wordnet project, plWordNet, the largest language resource of this type in the world.