About

The aim of this project is to make existing FrameNet (FN) resources computationally accessible for multilingual natural language generation and controlled semantic parsing via a shared semantico-syntactic grammar and lexicon API.

We provide a currently bilingual but potentially multilingual FN-based grammar and lexicon library implemented in Grammatical Framework (GF) on top of the GF Resource Grammar Library (RGL). The API of the FN-based library represents a shared set of semantico-syntactic verb valence patterns, automatically extracted from 66,918 annotated sentences in Berkeley FrameNet (BFN 1.5) and 4,267 sentences in Swedish FrameNet (SweFN, a snapshot taken on December 3, 2014). The concise set of 869 patterns covers 483 shared frames (using BFN frames as interlingua) and 77.5% of sentences evoking the shared frames in both BFN and SweFN (44,645 and 2,596 sentences respectively).
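To make the notion of a shared pattern set concrete, here is a minimal Python sketch. It is an illustration under strong assumptions, not the project's extraction pipeline: the tuple representation of a pattern, the helper names (`pattern_counts`, `shared_patterns`, `coverage`) and the toy sentences are all hypothetical, and the real extraction involves normalisation and pruning steps that are not modelled here.

```python
from collections import Counter

# Hypothetical representation: a valence pattern is
# (frame, RGL verb type, ((frame element, syntactic role), ...)).
# The "sentences" below are made-up toy data, not taken from BFN or SweFN.
DESIRE_TRANS = ("Desiring", "V2",
                (("Experiencer", "Subj"), ("Focal_participant", "DObj")))
DESIRE_VP = ("Desiring", "VV",
             (("Experiencer", "Subj"), ("Event", "VComp")))
RESIDE = ("Residence", "V",
          (("Resident", "Subj"), ("Location", "Adv")))


def pattern_counts(annotated):
    """Count how many annotated sentences realise each valence pattern."""
    return Counter(annotated)


def shared_patterns(counts_a, counts_b):
    """Keep only the patterns attested in both corpora."""
    return set(counts_a) & set(counts_b)


def coverage(counts, shared):
    """Share of sentences whose pattern belongs to the shared set."""
    total = sum(counts.values())
    covered = sum(n for pattern, n in counts.items() if pattern in shared)
    return covered / total if total else 0.0


# One pattern per annotated sentence (toy corpora).
english = pattern_counts([DESIRE_TRANS, DESIRE_TRANS, DESIRE_VP, RESIDE])
swedish = pattern_counts([DESIRE_TRANS, RESIDE, RESIDE])

shared = shared_patterns(english, swedish)
english_coverage = coverage(english, shared)
swedish_coverage = coverage(swedish, shared)
```

On this toy data the shared set contains two patterns, which cover 3 of the 4 English sentences and all 3 Swedish sentences. The reported figures (869 patterns, 77.5% coverage) come from the same kind of count over the real corpora.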

Based on the FN-annotated sentences covered by the shared valence patterns, and the GF RGL type system for verbs, we have extracted 3,432 lexical entries (subcategorized lexical units, LUs) from BFN, and 1,899 entries from SweFN. LUs in BFN and SweFN are not directly aligned, so a separate lexicon is generated for each language. However, a partial shared lexicon has been automatically derived on top of the language-specific lexicons, currently providing a mapping between 703 LUs in BFN and 900 LUs in SweFN. The shared lexicon covers 25.1% (11,223) of the BFN sentences and 35.8% (928) of the SweFN sentences that are represented by the shared valence patterns (the 44,645 and 2,596 sentences mentioned above).
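The percentages for the shared lexicon are taken relative to the sentences covered by the shared valence patterns, not to the full corpora. A short Python check of that reading, using only the counts given above (the `percent` helper is just an illustrative name):

```python
def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Sentences covered by the shared lexicon, relative to the sentences
# covered by the shared valence patterns (counts from the text above).
bfn_share = percent(11_223, 44_645)
swefn_share = percent(928, 2_596)

print(f"BFN:   {bfn_share:.1f}%")
print(f"SweFN: {swefn_share:.1f}%")
```

Under this reading the BFN result reproduces the reported 25.1%, and the SweFN result comes within 0.1 percentage point of the reported 35.8%.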

All numbers are indicative and subject to change if more corpus examples, translation equivalents or improved heuristics are provided.

As a side result, a unified method for comparing and mapping semantic and syntactic valence patterns and lexical units across framenets is proposed. Thus, from the perspective of developers of FN-annotated corpora, this can be seen as a tool providing cross-lingual hints on how to improve the coverage.

Publications

Normunds Gruzitis and Dana Dannélls. A multilingual FrameNet-based grammar and lexicon for controlled natural language. Journal of Language Resources and Evaluation, 2015 (preprint)
Dana Dannélls and Normunds Gruzitis. Controlled natural language generation from a multilingual FrameNet-based grammar. In: Controlled Natural Language, volume 8625 of LNCS, pages 155-166. Springer, 2014 (preprint, slides, video)
Dana Dannélls and Normunds Gruzitis. Extracting a bilingual semantic grammar from FrameNet-annotated corpora. In: Proceedings of the 9th International Language Resources and Evaluation Conference (LREC), pages 2466-2473, 2014
Normunds Gruzitis, Peteris Paikens and Guntis Barzdins. FrameNet Resource Grammar Library for GF. In: Controlled Natural Language, volume 7427 of LNCS, pages 121-137. Springer, 2012 (preprint)

Related work

Swedish Constructicon in GF

Credits

This work has been supported by the Swedish Research Council under Grant No. 2012-5746 (Reliable Multilingual Digital Communication: Methods and Applications) and by the Centre for Language Technology in Gothenburg. The research leading to these results has also received funding from the Latvian State Research Programme NexIT.

Main contributors:

Normunds Grūzītis
Dana Dannélls