REMU Retreat 2015

Dates: 23-24 April 2015
Venue: Stenungsbaden Yacht Club

Participants

Aarne Ranta
~~Gerardo Schneider~~
Koen Claessen
Ramona Enache
Normunds Gruzitis
Prasanth Kolachina
Inari Listenmaa
John Camilleri
Thomas Hallgren
Maria-Emilia Cambronero

Program

Thursday

8:20 Meet at Nils Ericsson terminal
8:30–9:09 Bus (Orust Express towards Henån) to Stenungsön.
9:15 Fika
10:30 Session 1
- Aarne
- Prasanth
12:00 Lunch & walk
14:00 Session 2
- Normunds
- Maria-Emilia
- John
16:00 Fika & check in
16:30 Session X1
- GF Summer School Planning
- Open discussion, groupwork, private meetings
Spa time
20:00 Dinner

Friday

9:00 Session 3
- Koen
- Inari
10:20 Fika & check out
10:40 Session 4
- Ramona
- Thomas
12:00 Summary meeting
12:30 Lunch & walk
14:30 Session X2
- Open discussion, groupwork, private meetings
15:30 Fika
16:20–17:03 Bus (Orust Express towards Göteborg) to Nils Ericsson terminal

Format

Everyone gets a slot of up to 40 minutes for combined talk and discussion.
Talks should not be conference-level research presentations; people should talk about their work (within REMU) so far, including difficulties, disappointments, unexplored ideas, future plans etc.
Everyone attends all talks and discussions.
The talks are interesting and accessible to everyone at the retreat.
Informative abstracts are published a week before the retreat.
Everyone reads the abstracts before the retreat.

Abstracts

Deadline for submitting abstracts is Thursday 16th April. Titles are optional! Please send them to John or edit this page directly (see bottom).

Aarne: Is GF Translation Competitive? (Slides)

GF translation was originally targeted at controlled languages. There is very little competition in this area, since mainstream machine translation focuses on wide-coverage tasks and mainstream work on controlled languages is monolingual. In the last couple of years, however, GF translation has been extended to wide-coverage translation, and the question arises of how it compares to other approaches, Statistical MT in particular.

The most obvious way to assess competitiveness is to participate in competitions. We are preparing for two competitions at the moment, both associated with the EMNLP conference to be held in Lisbon in September:

WMT 2015 translation task English-Finnish, https://groups.google.com/forum/#!topic/wmt-tasks/LxyPH6r2psQ with Prasanth
DiscoMT 2015 shared task on pronoun translation English-French, https://www.idiap.ch/workshop/DiscoMT/shared-task with Liza Zimina (U Tampere)

The talk will explain the objectives of these competitions and how GF can be used in them.

In the English-Finnish task, the main idea is to exploit GF’s ability to

bridge the gap between extremely unrelated languages
deal with rich morphology

We are building a hybrid system in which GF translation results are post-filtered by a statistical n-gram model for the target language.
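As a rough illustration of n-gram post-filtering, the sketch below reranks candidate translations with an add-one smoothed bigram model. It is not the system described in the talk: the function names, the toy Finnish corpus, the candidate list and the vocabulary size are all assumptions made for the example.

```python
import math
from collections import Counter

def train_bigram(corpus):
    """Count bigram contexts and bigrams over whitespace-tokenised sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens[:-1])             # every token that starts a bigram
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent token pairs
    return unigrams, bigrams

def log_prob(sentence, unigrams, bigrams, vocab_size):
    """Add-one smoothed bigram log-probability of a sentence."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab_size))
        for a, b in zip(tokens, tokens[1:])
    )

def pick_best(candidates, unigrams, bigrams, vocab_size):
    """Keep the candidate that the target-language model scores highest."""
    return max(candidates, key=lambda c: log_prob(c, unigrams, bigrams, vocab_size))

# Toy target-language corpus and candidates (hypothetical data).
corpus = ["minä näen talon", "minä näen puun"]
unigrams, bigrams = train_bigram(corpus)
candidates = ["minä talon näen", "minä näen talon"]
best = pick_best(candidates, unigrams, bigrams, vocab_size=6)
```

Here the candidates stand in for alternative GF linearizations; the statistical model only chooses among them and never generates text itself.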

In the pronoun translation task, the main idea is to use GF abstract syntax trees to identify the possible interpretations of pronouns so that we can pick the right gender in the target language. For instance:

I see a house. It is green. → Je vois une maison. Elle est verte.

I see a tree. It is green. → Je vois un arbre. Il est vert.
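The two examples above can be mimicked with a toy lookup, given here only to make the gender-agreement idea concrete. It does not use GF or abstract syntax trees; the lexicon, the tables and the function name are hypothetical, and the antecedent of "it" is assumed to be already known.

```python
# Toy lexicon: English noun -> (French noun, grammatical gender). Illustrative only.
LEXICON = {"house": ("maison", "f"), "tree": ("arbre", "m")}
ARTICLE = {"m": "un", "f": "une"}
PRONOUN = {"m": "Il", "f": "Elle"}
GREEN = {"m": "vert", "f": "verte"}

def translate_pair(antecedent):
    """Translate 'I see a <antecedent>. It is green.' into French.

    The gender of the antecedent noun decides the article, the pronoun
    that translates 'it', and the form of the adjective.
    """
    noun, gender = LEXICON[antecedent]
    first = f"Je vois {ARTICLE[gender]} {noun}."
    second = f"{PRONOUN[gender]} est {GREEN[gender]}."
    return f"{first} {second}"
```

The hard part in practice is finding the antecedent in the first place, which is where the abstract syntax trees come in.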

Koen: Grammar Analysis and Synthesis using SAT

I will report on my (successful) attempts to do symbolic analysis of grammars using a SAT-solver, as well as my (partially successful) attempts at automatic synthesis of grammars from examples.

Ramona: Is there life after PhD studies? Highlights of my time in REMU

The presentation is an overview of some of the projects I was involved in during my time as a postdoc. It will feature a high-level presentation of the research I've been involved in and my experience with research and innovation.

My work in REMU included continuing two directions from the MOLTO project:

a multilingual interface for a Cultural Heritage IR system
representing multiword expressions in GF and extracting them from phrase tables

In addition to this, I have also worked on formalizing and extracting ambiguities from GF grammars, together with Koen Claessen.

Recently, I have started a collaboration with Brian Davis and Hazem Safwat from the Insight Centre for Data Analytics, National University of Ireland, and with Normunds Gruzitis. The purpose is to develop a CNL for e-government policies, using GF and Attempto, and a method to map existing policies to their CNL representation.

Another direction of my work has been the commercialization of research. I will say more about my work on building an app for maritime communication, the Innovation Office grant that funded it, and my participation in the national program Mentor4Research, where I was a finalist.

Normunds: Implementing semantic frames and constructions in GF (Slides)

Berkeley FrameNet is a lexico-semantic resource based on the theory of frame semantics. It is being exploited in a range of NLP application areas (e.g. information extraction) and has inspired the development of framenets for other languages including Swedish.

FrameNet Constructicon is an emerging complementary resource to FrameNet. Just as a lexicon is a collection of lexical units (word-meaning pairs), a constructicon is a collection of lexico-syntactic constructions (form-meaning pairs at varying levels of complexity and abstraction). It sits between syntax and the lexicon.

In this presentation, I will briefly summarize my work (together with Dana Dannélls) on the automatic extraction and generation of a computational bilingual English-Swedish FrameNet-based grammar and lexicon in GF. The approach leverages existing FrameNet-annotated corpora to extract a set of cross-lingual semantico-syntactic valence patterns. The resulting grammar is a step towards providing a frame semantic abstraction layer, a general-purpose semantic API, over the interlingual syntactic API provided by the GF Resource Grammar Library.

I will continue with an introduction to, and some details on, the ongoing work on the formalization and implementation of the Swedish Constructicon in GF, based on the SweCxn resource that is being developed in parallel at Språkbanken. To some extent, this work follows a methodology similar to the one used for FrameNet. However, many new issues and challenges have to be addressed.

Prasanth: From GF Abstract Syntax to Dependencies

The Universal Dependency (UD) Annotation project is an effort at harmonizing typological differences across languages through a uniform annotation, much like the shared abstract syntax used in the Resource Grammar Library (RGL). The project annotates corpora from different languages using information shared across languages, plus the specific extensions needed for each language.

In this presentation, I will briefly present an idea for linking the abstract syntax in the RGL with the annotation schema used in the UD annotation. I will present multiple examples from corpora to highlight both the symmetries and the differences between the two schemas. The implications of this project are numerous, and many are still under exploration. However, I will present two simple case studies: one that benefits the GF community and another that is useful to the UD community.

Inari: Constraint Grammar as a SAT problem

We represent Constraint Grammar (CG) [Karlsson et al., 1995] as a Boolean satisfiability (SAT) problem. A logic-based approach to CG has possible advantages over more traditional approaches. Constraint rules encoded in logic capture richer dependencies between the tags than standard CG; this means that a SAT-solver may disambiguate more words, and may do so more precisely. Also, the SAT-solver requires fewer rules, and these rules are simpler. We found that the ordering of rules is incompatible with a logic-based approach, and we compensate for this by maximising rule applications. Whether this is an advantage or a disadvantage remains to be seen. Our initial experimental results are promising.
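To give a feel for the encoding, here is a minimal sketch under strong simplifying assumptions; it is not the encoding used in the talk. Each candidate reading becomes a Boolean variable, every word must keep at least one reading, and a rule of the hypothetical form "REMOVE target IF the previous word is context" becomes a hard clause. Brute-force enumeration stands in for a real SAT solver, and the maximisation of rule applications is not modelled.

```python
from itertools import product

def disambiguate(sentence, rules):
    """sentence: list of (word, set of candidate tags).
    rules: list of (target_tag, context_tag), read as
    'REMOVE target_tag IF the previous word is context_tag'.
    Returns the sentence with the surviving readings, or None if unsatisfiable.
    """
    # One Boolean variable per (word position, candidate tag).
    variables = [(i, tag) for i, (_, tags) in enumerate(sentence) for tag in sorted(tags)]
    best = None
    for values in product([True, False], repeat=len(variables)):
        model = dict(zip(variables, values))
        # Hard constraint: every word keeps at least one reading.
        if not all(any(model[(i, t)] for t in tags) for i, (_, tags) in enumerate(sentence)):
            continue
        # Each rule instance is the clause: not context(i-1) or not target(i).
        violated = any(
            model.get((i - 1, context), False) and model.get((i, target), False)
            for target, context in rules
            for i in range(1, len(sentence))
        )
        if violated:
            continue
        kept = sum(values)  # prefer the model that removes as few readings as possible
        if best is None or kept > best[0]:
            best = (kept, model)
    if best is None:
        return None
    return [(word, {t for t in tags if best[1][(i, t)]}) for i, (word, tags) in enumerate(sentence)]

# Toy input: "run" is ambiguous between noun and verb after a determiner.
result = disambiguate([("the", {"det"}), ("run", {"noun", "verb"})], [("verb", "det")])
```

In this toy input the clause forces the verb reading of "run" to be dropped, because "the" can only be a determiner and must keep that reading.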

John: Analysis of normative texts — my work in REMU so far

My work within REMU has focused on the formal analysis of normative texts ("contracts") such as terms of use, privacy policies, and service agreements. We do this by modelling such documents in terms of the obligations, permissions and prohibitions of agents over actions. For this we use the Contract-Oriented (C-O) Diagram formalism [Díaz et al., 2014] which models contracts as trees of clauses. This formalism also comes with a translation function from C-O Diagrams to networks of timed automata (NTA), which are amenable to model checking using the UPPAAL tool.

Much of my time has been spent on writing a working implementation of the formalism, including the translation function. This is written in Haskell, and includes:

A data-type for C-O Diagrams
Pretty-printer and parser to/from a shorthand notation
Serialisation to/from XML (COML)
Translation function to UPPAAL NTA (in XML format)

Some time was also spent on developing front-end tools for working with C-O Diagrams:

A CNL, written in GF, together with a simple web-based interface for working with contracts as CNL
A visual editor for editing contracts as diagrams [Gabriele Paganelli and Filippo del Tedesco]
ConPar: a tool for parsing NL texts and identifying agents, actions and modalities [Normunds Gruzitis]

In developing the above we have made a number of changes to the original syntax and its translation to NTA. We have also written a trace semantics for C-O diagrams, defining what it means for a trace (a sequence of observed events) to satisfy or violate a contract. This trace semantics is independent of the translation to NTA, but should accept exactly the same traces as the equivalent translation. We have been attempting to check this correspondence, primarily by testing with QuickCheck. A lot of work has gone into building all the necessary support, but some implementation issues still remain.
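The idea of a trace satisfying or violating a contract can be illustrated with a deliberately simplified sketch. It is not the C-O Diagram trace semantics: it ignores timing constraints, the tree structure of clauses and reparations, and the `Clause` type and `satisfies` function are hypothetical names introduced for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Clause:
    modality: str  # "obligation", "permission" or "prohibition"
    agent: str
    action: str

def satisfies(trace, clauses):
    """Check a trace, given as a list of (agent, action) events, against flat clauses.

    An obligation is met if its event occurs somewhere in the trace, a
    prohibition is violated if its event occurs, and a permission
    constrains nothing.
    """
    events = set(trace)
    for clause in clauses:
        happened = (clause.agent, clause.action) in events
        if clause.modality == "obligation" and not happened:
            return False
        if clause.modality == "prohibition" and happened:
            return False
    return True

# Hypothetical contract: the client must pay, the provider must not share data.
contract = [
    Clause("obligation", "client", "pay"),
    Clause("prohibition", "provider", "share_data"),
]
```

A property-based test in the spirit of the one described above would generate random contracts and traces, then compare such a checker with the verdict obtained from the automata translation.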

Many directions for future work exist:

Complete testing work and proof of trace semantics
Extensions to language, particularly declarative clauses and repetition
Optimisations in NTA translation to minimise model size
Improvements to ConPar and CNL to make front-end more mature
Application to larger case studies
Work on finding query patterns, and producing a query tool which abstracts away from the low-level automata details

Thomas: gf -output-format=haskell -haskell=concrete (Slides)

GF has for many years had an option -output-format=haskell that converts the abstract syntax of a GF grammar into the corresponding data types in Haskell. Recently, I have added the suboption -haskell=concrete to also convert concrete syntax to Haskell, i.e. in addition to a module containing data types for the abstract syntax, you get a module containing Haskell translations of the linearization types and linearization functions, for each concrete syntax in the grammar.

While GF and Haskell are both functional languages, there are several language features in GF that have no direct correspondence in Haskell, e.g. parameterized modules, variants, anonymous record types, record subtyping, record extension and pattern matching with regular expressions. With the current approach, many of these are eliminated before translation to Haskell, but some of them require special care. I will describe the solution I have chosen, and show some concrete examples of translated grammars.

Maria-Emilia: C-O Diagrams (Slides)

In my presentation I will talk about a graphical representation not only for electronic contracts (e-contracts) but also for the specification of any kind of normative text (Web service composition behavior, software product lines engineering, requirements engineering, etc.), called Contract-Oriented Diagrams (C-O Diagrams, for short).

They were introduced as a way to further close the gap between contracts and their representation, where we consider that three criteria must be met:

the representation must be usable and understandable for non-expert users
the logic behind this representation must provide reasoning techniques, and
the internal machine encoding must be easy for programmers to manipulate and must allow runtime monitoring.

Therefore, C-O diagrams use deontic logic as a source of inspiration to define formal languages for specifying contracts in which (legal) obligations, permissions, and prohibitions, as well as the events and consequences resulting from violations of obligations and prohibitions, are of importance. In such diagrams we are also able to represent absolute and relative timing constraints.

Editing this page

git pull
Edit www/retreat/2015/index.md
Run make in the above directory
Commit and push both index.md and index.html