{"id":413,"date":"2015-07-26T22:15:00","date_gmt":"2015-07-26T22:15:00","guid":{"rendered":"https:\/\/elex.link\/elex2015\/?page_id=413"},"modified":"2015-08-08T17:53:30","modified_gmt":"2015-08-08T17:53:30","slug":"paper-32","status":"publish","type":"page","link":"https:\/\/elex.link\/elex2015\/conference-proceedings\/paper-32\/","title":{"rendered":"paper-32"},"content":{"rendered":"<h3><strong>Predicting corpus example quality via supervised machine learning<\/strong><\/h3>\n<p><strong>Authors:<\/strong> Nikola Ljube\u0161i\u0107, Mario Peronja<\/p>\n<p style=\"text-align: justify;\"><strong>Abstract:<\/strong><br \/>\nIn this paper we present a supervised-learning approach to extracting good dictionary examples from corpora. We train our predictor of quality on a dataset of corpus examples annotated with a four-level ordinal variable, ranging from a very bad to a very good example. Each example is formally described through 23 variables, and the dependence of example quality on these variables is modelled using a regression model. The evaluation of the ranked results for each of the collocations in the annotated dataset shows that we obtain a precision of ~80% on the 10 top-ranked examples and a precision of ~90% on the three top-ranked examples. Our approach is highly language independent as well, suffering almost no loss on the 10 top-ranked examples and a loss of ~4% on the three highest-ranked examples once the language-dependent and knowledge-source-dependent features are removed.<\/p>\n<p><strong>Keywords:<\/strong> dictionary example; corpus extraction; supervised machine learning<\/p>\n<p><strong>Reference:<\/strong> In Kosem, I., Jakubi\u010dek, M., Kallas, J.,\u00a0Krek, S. (eds.) <em>Electronic lexicography in the 21st century: linking lexical data in the digital age. Proceedings of the eLex 2015 conference, 11-13 August 2015, Herstmonceux Castle, United Kingdom<\/em>. Ljubljana\/Brighton: Trojina, Institute for Applied Slovene Studies\/Lexical Computing Ltd., pp. 
477-485.<\/p>\n<p><strong>URL:<\/strong> <a href=\"https:\/\/elex.link\/elex2015\/proceedings\/eLex_2015_32_Ljubesic+Peronja.pdf\">https:\/\/elex.link\/elex2015\/proceedings\/eLex_2015_32_Ljubesic+Peronja.pdf<\/a><\/p>\n<p><strong>Published: <\/strong>2015<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Predicting corpus example quality via supervised machine learning Authors: Nikola Ljube\u0161i\u0107, Mario Peronja Abstract: In this paper we present a supervised-learning approach to extracting good dictionary examples from corpora. We train our predictor of quality on a dataset of corpus examples &hellip; <a class=\"more-link\" href=\"https:\/\/elex.link\/elex2015\/conference-proceedings\/paper-32\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":4,"featured_media":0,"parent":327,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-413","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/elex.link\/elex2015\/wp-json\/wp\/v2\/pages\/413","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/elex.link\/elex2015\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/elex.link\/elex2015\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/elex.link\/elex2015\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/elex.link\/elex2015\/wp-json\/wp\/v2\/comments?post=413"}],"version-history":[{"count":4,"href":"https:\/\/elex.link\/elex2015\/wp-json\/wp\/v2\/pages\/413\/revisions"}],"predecessor-version":[{"id":568,"href":"https:\/\/elex.link\/elex2015\/wp-json\/wp\/v2\/pages\/413\/revisions\/568"}],"up":[{"embeddable":true,"href":"https:\/\/elex.link\/elex2015\/wp-json\/wp\/v2\/pages\/327"}],"wp:attachment":[{"href":"https:\/\/elex.link\/elex2015\/wp-json\/wp\/v2\/media?parent=413"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{re
l}","templated":true}]}}