
Template Generation

The Template Generation module takes a natural language question together with a language tag and determines the structure of candidate SPARQL queries, along with information about the slots in each structure.

Input

Input is a question string and a language tag:

{ "string": " ", "language": " " }

Output

Output is a list of templates that specify a query together with information about the slots and a score:

[
  {
    "query": "SELECT|ASK v1 WHERE {v1 ...}",
    "slots": [ { "s": "v1", "p": "...", "o": "..." } ],
    "score": 0.0
  }
]
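
As a minimal sketch of how a client might consume this output, the following Python reads the JSON list into typed records and orders the templates by score. The names Slot, Template, parse_templates and best_first are hypothetical and introduced here only for illustration; they are not part of the module's interface.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Slot:
    s: str  # variable the slot describes, e.g. "v1"
    p: str  # kind of information, e.g. "verbalization" or "is"
    o: str  # value, e.g. a natural language phrase or an OWL type


@dataclass
class Template:
    query: str
    slots: List[Slot]
    score: float


def parse_templates(payload: str) -> List[Template]:
    """Turn the JSON output of the module into Template records."""
    return [
        Template(
            query=item["query"],
            slots=[Slot(**slot) for slot in item["slots"]],
            score=float(item["score"]),
        )
        for item in json.loads(payload)
    ]


def best_first(templates: List[Template]) -> List[Template]:
    """Order templates from highest to lowest score."""
    return sorted(templates, key=lambda t: t.score, reverse=True)
```

A caller would pass the raw response body to parse_templates and then iterate over best_first(...) to try the most promising template first.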

Example

For the example question "How many students does the Free University in Amsterdam have?", templates like the following could be generated:

[
 {
   "query": "SELECT COUNT(?v3) WHERE {
                ?v1 ?p1 ?v2 .
                ?v1 ?p2 ?v3 .
                ?v3 ?p3 ?v4 .
             }",
   "slots": [
     {"s": "v2", "p": "verbalization", "o": "Amsterdam"},
     {"s": "v2", "p": "is", "o": "owl:NamedIndividual" },
     {"s": "p1", "p": "verbalization", "o": "in"},
     {"s": "p1", "p": "is", "o": "owl:Property"},
     {"s": "p2", "p": "verbalization", "o": "have"},
     {"s": "p2", "p": "is", "o": "owl:ObjectProperty"},
     {"s": "v4", "p": "verbalization", "o": "students"},
     {"s": "v4", "p": "is", "o": "owl:Class"},
     {"s": "p3", "p": "is", "o": "<http://lodqa.org/vocabulary/sort_of>"}
   ], 
   "score": 1.0
 },
 {
   "query": "SELECT ?v2 WHERE { ?v1 ?p1 ?v2 . }",
   "slots": [
     {"s": "v1", "p": "verbalization", "o": "Free University in Amsterdam"},
     {"s": "v1", "p": "is", "o": "owl:NamedIndividual"},
     {"s": "p1", "p": "verbalization", "o": "students"},
     {"s": "p1", "p": "is", "o": "owl:DatatypeProperty"}
   ],
   "score": 0.8
 },
... 
]
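
Because the slot information for one variable is spread over several entries, a downstream component may find it convenient to group the entries by variable first. The following is a small sketch of that step; the function name slots_by_variable is hypothetical, and the sketch assumes each variable carries at most one entry per "p" value (a later entry would overwrite an earlier one).

```python
from collections import defaultdict


def slots_by_variable(slots):
    """Group slot entries by their variable: {"v1": {"verbalization": ..., "is": ...}}."""
    grouped = defaultdict(dict)
    for slot in slots:
        grouped[slot["s"]][slot["p"]] = slot["o"]
    return dict(grouped)


# Slots of the second example template above.
example_slots = [
    {"s": "v1", "p": "verbalization", "o": "Free University in Amsterdam"},
    {"s": "v1", "p": "is", "o": "owl:NamedIndividual"},
    {"s": "p1", "p": "verbalization", "o": "students"},
    {"s": "p1", "p": "is", "o": "owl:DatatypeProperty"},
]
grouped = slots_by_variable(example_slots)
```

Here grouped["v1"] holds both the verbalization and the expected type of v1, which is the pair of facts needed to look up a matching resource.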

Approach

First, the question is analysed linguistically and annotated with part-of-speech tags, dependency relations, and semantic role labels. Second, the resulting parse tree is transformed into a template that covers one possible way in which the natural language expressions correspond to constructs in the target SPARQL query. This is the template that is most faithful to the linguistic structure of the question.

In order to also account for structural differences between the question and the target query, the template is modified by a sequence of steps that collapse or expand triples, yielding additional templates.
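
The concrete rewriting operations are not specified here. Purely as an illustration of what a collapse step could look like, the sketch below merges a two-triple chain through an intermediate variable into a single triple. The function collapse_chain, the tuple representation of triples, and the choice to keep the first predicate are all assumptions made for this example, not a description of the actual implementation.

```python
def collapse_chain(triples, via):
    """Merge `a p via . via q c` into `a p c`, dropping the variable `via`.

    `triples` is a list of (subject, predicate, object) tuples of variable
    names. Returns a new list, or None if `via` is not the middle of a
    simple two-triple chain.
    """
    incoming = [t for t in triples if t[2] == via]
    outgoing = [t for t in triples if t[0] == via]
    if len(incoming) != 1 or len(outgoing) != 1:
        return None  # not a simple chain, so this rewrite does not apply
    (a, p, _), (_, _, c) = incoming[0], outgoing[0]
    rest = [t for t in triples if t not in (incoming[0], outgoing[0])]
    return rest + [(a, p, c)]


chain = [("v1", "p1", "v2"), ("v2", "p2", "v3")]
collapsed = collapse_chain(chain, "v2")
```

An expand step would be the inverse: replacing one triple with a chain through a fresh variable. Each application would yield one additional candidate template.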

The templates are scored with a simple heuristic that counts the nodes in the query body that are neither projection variables nor slots. In addition, each rewriting operation reduces the score by a predetermined factor.
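
The two ingredients of this heuristic can be sketched as follows. How the node count is turned into a base score is not stated here, so the sketch keeps the two parts separate. The function names and the factor value 0.8 are assumptions chosen for illustration only.

```python
import re

VARIABLE = re.compile(r"\?(\w+)")


def count_free_nodes(body, projection, slot_vars):
    """Count variables in the query body that are neither projected nor slots."""
    nodes = set(VARIABLE.findall(body))
    return len(nodes - set(projection) - set(slot_vars))


def apply_rewrite_penalty(score, rewrites, factor=0.8):
    """Reduce a score once per rewriting operation (factor is hypothetical)."""
    return score * factor ** rewrites


# Query body, projection variable and slot variables of the first example template.
body = "?v1 ?p1 ?v2 . ?v1 ?p2 ?v3 . ?v3 ?p3 ?v4 ."
free = count_free_nodes(body, {"v3"}, {"v2", "p1", "p2", "v4", "p3"})
```

For the first example template, the only node that is neither the projection variable nor a slot is v1, so free is 1.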

Code

Implementations are available on GitHub: 

Related work

The idea of template generation is also exploited in the following question answering systems:
