YodaQA on Docker

Martin Matulík October 13, 2016 at 11:35 am

YodaQA is a question answering system started by Petr Baudis and currently developed in eClub. It is a fairly complex set of NLU and ML components: datasets for looking up candidate answers, parsers and other text processing tools for analysing a question, web front-ends and so on. All components are implemented as web services that can be used separately as well as in the main QA pipeline. The main pipeline is built on top of Apache UIMA, which supports communication between the components. All components also offer REST APIs so that other projects can use them. My task was to make the components portable and easy to use, and I decided to deploy them with Docker.

What is Docker?

Docker is a platform that runs applications, such as web services, in sandboxed environments (called containers) hosted by an operating system such as Linux. The advantage of containers over the more commonly used virtual machines is their efficiency and low overhead, because containers share the kernel of the host system instead of booting a full guest operating system. Another advantage of Docker is how simple it is to create, configure and deploy the applications inside the containers: it takes only a few commands. The Docker Compose tool also lets the user run several applications that communicate with each other. Containers are created from blueprints called images. Docker users share images and can improve existing ones.

Deploying to Docker

To run a container, you need an image. Each image is derived from a base image; for example, the image of an application written in Python will be based on a Python image. Base images are available from the public Docker repository (Docker Hub).

The images are created using scripts called Dockerfiles.

The Dockerfile is a sequence of commands that sets up the environment of a container. This includes obtaining the base image (FROM keyword), adding everything required to run the application, such as the code itself, its dependencies or a basic shell for manual control (ADD keyword), and declaring the communication port on which the APIs listen (EXPOSE keyword).
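To make the three keywords concrete, here is a minimal sketch in Python that renders the text of a small Dockerfile. It is not one of the actual YodaQA Dockerfiles: the base image `python:3`, the `/app` directory and the `app.py` start command are assumptions chosen for illustration, and the WORKDIR and CMD lines are additions that the description above does not mention.

```python
import json


def render_dockerfile(base_image, source_dir, target_dir, port, command):
    """Return the text of a minimal Dockerfile.

    All argument values are supplied by the caller; nothing here is taken
    from the real YodaQA images.
    """
    lines = [
        f"FROM {base_image}",               # obtain the base image
        f"ADD {source_dir} {target_dir}",   # copy the application into the image
        f"WORKDIR {target_dir}",            # addition: run from the copied directory
        f"EXPOSE {port}",                   # declare the port the API listens on
        "CMD " + json.dumps(command),       # addition: start command in exec (JSON array) form
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical values for a small Python web service.
    print(render_dockerfile("python:3", ".", "/app", 4567, ["python", "app.py"]))
```

Writing the returned text to a file named `Dockerfile` next to the application code would give the build step something to work with.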

Once the Dockerfile is written or obtained, the image can be built with the docker build command. A container can then be started from that image with the docker run command.
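The two steps can be sketched as a small Python helper that assembles the command lines and, only if asked to, executes them. The image name `my-app` and the port mapping are hypothetical; the `-t` (tag), `-d` (detached) and `-p` (publish a port) options are standard Docker CLI flags, but the options used for the YodaQA containers are not stated here.

```python
import subprocess


def build_command(tag, context_dir="."):
    """Command that builds an image from the Dockerfile in context_dir."""
    return ["docker", "build", "-t", tag, context_dir]


def run_command(tag, host_port, container_port):
    """Command that starts a detached container and publishes one port."""
    return ["docker", "run", "-d", "-p", f"{host_port}:{container_port}", tag]


def deploy(tag, host_port, container_port, execute=False):
    """Build the image, then start a container from it.

    With execute=False (the default) the commands are only returned,
    so the sketch can be tried on a machine without Docker.
    """
    commands = [build_command(tag), run_command(tag, host_port, container_port)]
    if execute:
        for cmd in commands:
            subprocess.run(cmd, check=True)  # raises if docker reports an error
    return commands


if __name__ == "__main__":
    # Hypothetical image name and ports.
    for cmd in deploy("my-app", 4567, 4567):
        print(" ".join(cmd))
```

Calling `deploy("my-app", 4567, 4567, execute=True)` would actually invoke Docker, so it requires a working Docker installation and a Dockerfile in the current directory.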

YodaQA Docker cloud

The address of the server is cloud.ailao.eu. Currently, YodaQA consists of the following Docker components:

  • Live demo – the main question answering client, available on port 4567.
  • Other versions of the demo – two more versions of YodaQA. The first answers questions related to movies (4000); the other answers questions in Czech.
  • Datasets – DBpedia (3037) and Wikipedia (8983) data dumps.
  • Labels – lets the user link a query string to a DBpedia entity (5000, 5001).
  • Czech parser – a parser built on Google’s SyntaxNet that assigns part-of-speech tags to Czech words (4571).
  • Javadoc – the complete generated documentation for the main YodaQA client (13880).
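For scripts that need to reach these services, the list above can be kept as a small lookup table. The sketch below only assembles addresses from the host name and port numbers given above; the `http` scheme is an assumption, the component keys are made-up labels, and no API paths are included because none are given here. The Czech demo is left out because its port is not listed.

```python
HOST = "cloud.ailao.eu"

# Port numbers as listed above; the keys are illustrative labels only.
PORTS = {
    "live-demo": [4567],
    "movies-demo": [4000],
    "dbpedia": [3037],
    "wikipedia": [8983],
    "labels": [5000, 5001],
    "czech-parser": [4571],
    "javadoc": [13880],
}


def component_urls(name, scheme="http"):
    """Return the base address for every port of the given component."""
    return [f"{scheme}://{HOST}:{port}" for port in PORTS[name]]


if __name__ == "__main__":
    for name in PORTS:
        print(name, component_urls(name))
```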

Documentation

The YodaQA Javadoc documentation is hosted on the server. More detailed documentation is available as wiki pages at http://3c.felk.cvut.cz/dokuwiki/doku.php?id=yodaqa. They cover all parts of the question answering pipeline as well as the setup and operation of various external components.