One of the initial goals of the OKBQA collaboration was to devise a preliminary instantiation of a generic architecture for question answering on Linked Data. In the following, we begin by presenting the key requirements for the architecture. We then present the architecture itself and describe the functionality of each of its modules. We include preliminary implementations and identify components that must still be implemented. Finally, we describe our approach to evaluating our system.
The architecture was designed to be as generic as possible while remaining easy to understand, implement and use.
- Our first key requirement was to ensure that no programming language is imposed on the user. The motivation behind this requirement was simply that certain programming languages are better suited for certain tasks. Given the variety of tasks that are required to achieve high-quality question answering, enforcing a single programming language would have limited the functionality and extensibility of the framework. We thus decided that all modules would be implemented as web services.
- Our second requirement was to reuse existing standards as much as possible. We thus decided that all services are to generate and consume JSON objects according to the architectural design below.
- Our third requirement was that of provenance tracking. We thus chose to add the ID of each service to its JSON output, making the contribution of each module easy to track throughout the QA process.
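The provenance requirement can be illustrated with a minimal sketch. The field names `service_id` and `payload`, and the service IDs used in the usage note, are assumptions made for this example; they are not taken from the OKBQA specification.

```python
import json


def tag_output(service_id: str, payload: dict) -> str:
    """Serialize a module's result as JSON, stamped with the ID of the producing service.

    The keys "service_id" and "payload" are hypothetical names chosen for this sketch.
    """
    return json.dumps({"service_id": service_id, "payload": payload}, sort_keys=True)


def provenance(messages: list[str]) -> list[str]:
    """Return the service IDs of a sequence of JSON messages, in the order given."""
    return [json.loads(message)["service_id"] for message in messages]
```

Calling `provenance` on the messages collected during one run, for example those produced by `tag_output("template-generation-1", ...)` and `tag_output("disambiguation-1", ...)`, lists which service contributed at each step.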
Several types of architecture can be envisaged for QA. We assumed the QA process to be a workflow in which a controller decides on the workflow to employ, stores metadata on the current workflow and is free to call components in the order it requires. Each component, on the other hand, assumes a particular type of JSON object as input and returns JSON as output. Depending on their implementation, components are free to access as many other components as required.
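The controller-driven workflow described above might be sketched as follows. This is an illustration under stated assumptions, not the OKBQA controller: components are modelled as plain Python callables from dict to dict, whereas in the architecture they are web services exchanging JSON, and the names `Controller`, `run` and `trace` are invented for this example.

```python
from typing import Callable

# A component consumes one JSON-like object and returns another (hypothetical type alias).
Component = Callable[[dict], dict]


class Controller:
    """Calls registered components in whatever order it is given and records that order."""

    def __init__(self, components: dict[str, Component]):
        self.components = components
        self.trace: list[str] = []  # metadata on the current workflow

    def run(self, order: list[str], message: dict) -> dict:
        """Pass the message through the named components, one after another."""
        self.trace = []
        for name in order:
            message = self.components[name](message)
            self.trace.append(name)
        return message
```

Because the order is an argument to `run` rather than fixed in the class, the same controller can drive different workflows over the same set of components.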
Overall, 8 modules were specified as integral parts of the QA process.
- Question Generation: This module takes a question or question fragment as input and returns a set of scored questions or question fragments as output. Questions are generated from sample templates and word dictionaries. The module uses an autocomplete user interface.
- Template Generation: This module takes in a question and generates pseudo-queries as well as a list of strings (so-called slots) for which data from the knowledge base is needed. The pseudo-queries are scored. This module can be agnostic of the underlying knowledge base.
- Disambiguation: This module takes the output of the template generation and returns URIs that match the slots.
- Query Generation: This module combines the results of the disambiguation and the template generation, and composes the corresponding SPARQL queries.
- Answer Generation: The answer generation module takes a list of queries and endpoints as input and returns the results of the QA system.
- Rendering: This module renders the results of the QA system in a user-friendly manner.
- Control: This module links the inputs and outputs of all other modules (template generation, disambiguation and query generation modules) and returns the final answers for an input question string, using an answer generation module embedded in the control module. It also supports configuring the address of each module and of the SPARQL endpoints.
- Evaluation: This module takes in a control module and a benchmark data set, runs the control module over the data set, and reports the performance of the control module against the gold answers encoded in the data set.
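The hand-off between template generation, disambiguation and query generation can be made concrete with a toy sketch. Everything specific here is an assumption made for illustration: the `<s1>`-style slot placeholders, the dictionary-based lexicon, the function names and the `example.org` URIs are not part of the OKBQA specification, and real disambiguation would query a knowledge base rather than a lookup table.

```python
def disambiguate(slots: dict[str, str], lexicon: dict[str, str]) -> dict[str, str]:
    """Map each slot name to a URI by looking its label up in a toy lexicon.

    Slots whose label is not in the lexicon are left out of the result.
    """
    return {slot: lexicon[label] for slot, label in slots.items() if label in lexicon}


def build_query(pseudo_query: str, uris: dict[str, str]) -> str:
    """Turn a pseudo-query into a SPARQL query by replacing <slot> placeholders with <URI>."""
    query = pseudo_query
    for slot, uri in uris.items():
        query = query.replace(f"<{slot}>", f"<{uri}>")
    return query


# Hypothetical output of a template generation step for "Who is the mayor of Berlin?"
template = {
    "pseudo_query": "SELECT ?x WHERE { <s1> <p1> ?x }",
    "slots": {"s1": "Berlin", "p1": "mayor"},
    "score": 1.0,
}

# Made-up lexicon standing in for a knowledge base lookup.
lexicon = {
    "Berlin": "http://example.org/resource/Berlin",
    "mayor": "http://example.org/ontology/mayor",
}

sparql = build_query(template["pseudo_query"], disambiguate(template["slots"], lexicon))
```

The resulting `sparql` string is what an answer generation step would send to a SPARQL endpoint. Note that the template itself contains no URIs, which is what allows template generation to stay agnostic of the knowledge base.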
Other modules that might be considered when building a QA system include:
- Named Entity Recognition
- Path Finder
- Full-Text Index
OKBQA (Core) Module Flow