Word sense induction for (French) verb valency discovery

Authors

  • Naïma Hassert
  • François Lareau

Abstract

We explore the use of Transformers in word sense induction for the automatic construction of a valency dictionary of French verbs. To account for the way the arguments of a verb change depending on its sense, this type of dictionary must distinguish at least the main senses of a lemma. However, constructing such a resource manually is very costly and requires highly trained staff, so an important subtask in its construction is to identify the polysemy of verbs automatically. For each of the 2,000 most frequent French verbs, we extract the word embeddings of 20,000 occurrences in context, retrieved with Sketch Engine, and cluster those embeddings to find the different senses of each verb. To identify the language model and clustering algorithm best suited to our task, we extract the word embeddings of the sentences in the FrenchSemEval evaluation dataset with one language-specific model, CamemBERT, and two multilingual models, XLM-RoBERTa and T5. These vectors are then clustered with three algorithms that do not require a predetermined number of clusters: Affinity Propagation, Agglomerative Clustering and HDBSCAN. Our experiments confirm the potential of unsupervised methods to identify verb senses, and indicate that monolingual language models are better than multilingual ones for word sense induction tasks involving a single language.
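
The clustering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes small hand-made vectors in place of real contextual embeddings (which would come from a model such as CamemBERT), and it implements only a plain agglomerative clustering with average linkage over cosine distance. The function names and the `distance_threshold` value of 0.5 are hypothetical choices made for illustration. The point it shows is that the number of induced senses is not fixed in advance but follows from where merging stops.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """Return 1 - cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def average_linkage(cluster_a, cluster_b, vectors):
    """Mean pairwise cosine distance between the members of two clusters."""
    total = sum(
        cosine_distance(vectors[i], vectors[j])
        for i in cluster_a
        for j in cluster_b
    )
    return total / (len(cluster_a) * len(cluster_b))


def induce_senses(vectors, distance_threshold=0.5):
    """Agglomerative clustering that stops at a distance threshold.

    The number of clusters (induced senses) is not given in advance; it
    results from the threshold, which is a hypothetical value here.
    """
    # Start with one cluster per occurrence of the verb.
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > 1:
        # Find the closest pair of clusters under average linkage.
        (a, b), dist = min(
            (
                ((a, b), average_linkage(clusters[a], clusters[b], vectors))
                for a, b in combinations(range(len(clusters)), 2)
            ),
            key=lambda item: item[1],
        )
        if dist > distance_threshold:
            break  # Remaining clusters are too far apart to merge.
        # a < b is guaranteed by combinations, so deleting b is safe.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    # Turn the cluster membership lists into one label per occurrence.
    labels = [0] * len(vectors)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Stand-in vectors for five occurrences of one verb (assumed toy data).
toy_vectors = [
    [1.0, 0.1, 0.0],
    [0.9, 0.2, 0.0],
    [0.0, 0.1, 1.0],
    [0.1, 0.0, 0.9],
    [0.0, 1.0, 0.1],
]
labels = induce_senses(toy_vectors)
print(labels)  # [0, 0, 1, 1, 2]
```

On this toy input the sketch yields three groups, which would correspond to three induced senses. With real data, a library implementation of Affinity Propagation, Agglomerative Clustering or HDBSCAN would be the more practical choice, and the stopping criterion would need tuning against an evaluation set such as FrenchSemEval.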

Published

2023-06-29