From Structured Textual Data to Semantic Linked-data for Georgian Verbal Knowledge
Keywords:
Data transformation, Data validation, Machine learning, Decision tree, Georgian language
Abstract
The Georgian language has a complex verbal system. To help foreigners learn Georgian, a linked-data base of inflected forms of Georgian verbs, KartuVerbs, is being built. We use structured textual knowledge developed by Meurer (2007), which has a much broader scope than KartuVerbs. However, this resource raises several difficulties: accessing its lexicographic data is challenging, work on its base has stopped, not all properties are systematically present for every verb, and some properties that are important for us do not exist. After filtering the data and reconstructing some properties, KartuVerbs currently contains more than 5 million inflected forms related to more than 16 000 verbs, with more than 80 million links in the base. Response times are acceptable when running on a private machine, which validates the feasibility of the linked-data approach. The data still need to be validated, corrected and expanded, and given their volume, this requires tools. This paper presents a process to transform structured textual knowledge into semantic linked data, applied to Georgian verbal knowledge. The process successively applies improvement tools. One of them uses a decision-tree technique to fill in occasional missing values. The scripts produced so far are freely available. They can be adapted to other applications to help transform data produced for given objectives into data suited to different objectives.
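The abstract does not specify how the decision tree is built or which properties it predicts. As a minimal sketch, assuming each verb record is a flat dictionary of categorical properties, the following ID3-style tree (splitting on information gain) learns a missing property from the records where it is present and predicts it where it is absent. The functions, property names (`feat_a`, `feat_b`, `feat_c`, `target`) and toy values are all hypothetical: they do not reflect the schema of Meurer (2007), the KartuVerbs data, or the authors' actual scripts.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of categorical labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def best_attribute(rows, attrs, target):
    """Pick the attribute with the highest information gain (first one on ties)."""
    base = entropy([r[target] for r in rows])

    def gain(attr):
        remainder = 0.0
        for value in {r[attr] for r in rows}:
            subset = [r[target] for r in rows if r[attr] == value]
            remainder += len(subset) / len(rows) * entropy(subset)
        return base - remainder

    return max(attrs, key=gain)


def build_tree(rows, attrs, target):
    """Return a leaf label, or a tuple (attribute, branches, majority_label)."""
    labels = [r[target] for r in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs:
        return majority
    attr = best_attribute(rows, attrs, target)
    rest = [a for a in attrs if a != attr]
    branches = {
        value: build_tree([r for r in rows if r[attr] == value], rest, target)
        for value in {r[attr] for r in rows}
    }
    return (attr, branches, majority)


def predict(tree, record):
    """Walk the tree; unseen attribute values fall back to the node's majority label."""
    while isinstance(tree, tuple):
        attr, branches, majority = tree
        tree = branches.get(record.get(attr), majority)
    return tree


def fill_missing(records, attrs, target):
    """Train on records where `target` is known; fill it in where it is None."""
    known = [r for r in records if r.get(target) is not None]
    tree = build_tree(known, attrs, target)
    return [r if r.get(target) is not None else {**r, target: predict(tree, r)}
            for r in records]


# Hypothetical toy records: property names and values are invented for illustration.
verbs = [
    {"id": "v1", "feat_a": "A", "feat_b": "1", "feat_c": "yes", "target": "X"},
    {"id": "v2", "feat_a": "A", "feat_b": "2", "feat_c": "yes", "target": "X"},
    {"id": "v3", "feat_a": "B", "feat_b": "1", "feat_c": "no", "target": "Y"},
    {"id": "v4", "feat_a": "B", "feat_b": "2", "feat_c": "no", "target": "Y"},
    {"id": "v5", "feat_a": "A", "feat_b": "1", "feat_c": "no", "target": None},
]
filled = fill_missing(verbs, ["feat_a", "feat_b", "feat_c"], "target")
print(filled[4]["target"])  # prints "X"
```

Here `feat_a` and `feat_c` both separate the known records perfectly; the tie is broken by attribute order, so the tree splits on `feat_a` and assigns `"X"` to `v5`. In a real pipeline, predicted values would presumably be flagged for human validation rather than trusted blindly, in line with the abstract's emphasis on validating and correcting data.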
License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.