From Structured Textual Data to Semantic Linked-data for Georgian Verbal Knowledge

Authors

  • Archil Elizbarashvili
  • Mireille Ducassé
  • Manana Khachidze
  • Magda Tsintsadze

Keywords:

Data transformation, Data validation, Machine learning, Decision tree, Georgian language

Abstract

The Georgian language has a complex verbal system. To help foreigners learn Georgian, a linked-data base of inflected forms of Georgian verbs, KartuVerbs, is being built. It draws on structured textual knowledge developed by Meurer (2007), whose scope is much broader than that of KartuVerbs. Using this resource raises several difficulties, however: its lexicographic data is hard to access; work on its base has stopped; not all properties are systematically present for every verb; and some properties that are important for us do not exist. After filtering and reconstructing some properties, KartuVerbs currently contains more than 5 million inflected forms related to more than 16 000 verbs, and the base holds more than 80 million links. Response times are acceptable when running on a private machine, thus validating the feasibility of the linked-data approach. The data still needs to be validated, corrected and expanded and, given its volume, this requires tools. This paper presents a process to transform structured textual knowledge into semantic linked data, applied to Georgian verbal knowledge. The process successively applies improvement tools. One of them uses a decision tree technique to fill in occasional missing values. The scripts produced so far are freely available. They can be adapted to other applications to help transform data produced for given objectives into data suited to different objectives.

Published

2023-06-29