Experience taken from Alexa prize

Petr Marek
November 26, 2017 at 6:08 am

I spent last year working on the chatbot Alquist for the Amazon Alexa Prize 2016/17. I would like to share the current state of the field of conversational AI from my practical point of view. We learned all of this the hard way; I am sharing it with you so you don’t have to.

Focus on the content, not machine learning only

It is tempting to try to solve conversation with machine learning alone. Our initial idea was to collect as many message-response pairs as possible and use an information-retrieval method (find the pair closest to the user’s message). We chose Reddit comments as our source, hoping that with enough pairs we could recreate any conversation. But this approach failed to create interesting dialogues. We thought the reason was that we didn’t have enough pairs, so our next step was to limit the dialogue to movies only. Again, no success. The main problem was that the dialogues were not coherent: the AI jumped from topic to topic or referenced something that was never said. Individual answers could pass as OK, but together they didn’t form a coherent dialogue.

A big problem with this approach is that you have no control over what the AI will say; it learns only what is in the data. It is not able to hold a sensible dialogue about any entity. It will state an opinion and deny it in the next dialogue turn. It is not able to ask for your opinion and react to it in any way, and it will not give you any useful information. Dialogue with such an AI is neither useful nor funny. It is just ridiculous.

We used machine learning for dialogue management only. Machine learning does not generate responses for us (with one exception): it classifies intents, finds entities, and detects sentiment. The content of the dialogue was handcrafted by us. This way we can ensure that the dialogue is engaging and coherent, and if it isn’t, we can tweak it.

Initial meeting with Amazon

Seq2Seq model doesn’t work

This is closely related to the previous point, but I consider it important enough to make it a separate one. The paper A Neural Conversational Model describes how to apply the standard Seq2Seq architecture from machine translation to dialogues. It contains truly astonishing examples of dialogues produced by the model, and we were amazed when we first saw them. However, when we tried it ourselves, it didn’t produce results of that kind: the model’s responses were much shorter and more generic. We used movie subtitles and Reddit comments as training data and spent at least two months trying to make it work, without any significant success.

We did use this model in the end. We call it the “chitchat,” and it is used only when all other methods of response generation fail. The chitchat usually responds to questions like “How are you?” or “What will you do tonight?” We allow at most three responses in a row produced by this model and then recommend a new topic to the user. This rule is important; otherwise the user gets stuck in a dull and boring conversation about nothing. The chitchat is our only machine-learning-based method that generates answers.
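The fallback chain with the three-in-a-row rule can be sketched like this (the function names, the streak bookkeeping, and the suggestion text are illustrative, not our actual code):

```python
CHITCHAT_LIMIT = 3  # at most three chitchat replies in a row

def respond(message, handlers, chitchat, state):
    """Try the main response generators first; fall back to chitchat,
    but after three chitchat replies in a row, suggest a new topic."""
    for handler in handlers:
        reply = handler(message)
        if reply is not None:
            state["chitchat_streak"] = 0  # a real handler answered, reset
            return reply
    state["chitchat_streak"] = state.get("chitchat_streak", 0) + 1
    if state["chitchat_streak"] > CHITCHAT_LIMIT:
        state["chitchat_streak"] = 0
        return "Let's talk about something else. How about movies?"
    return chitchat(message)
```

The key point is the streak counter: the generated small talk is tolerable in small doses, so the system forces a topic change before the conversation goes stale.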

Build dialogues with premade pieces

Our final approach was to represent dialogue as a state automaton. We have an automaton for each topic (sports, movies, music, etc.). Each state of an automaton has an “execute” and a “transition” function. The “execute” function can, for example, process the user’s input or access some API, and the “transition” function decides which state to go to next, based on the result of “execute.”
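Sketched in Python, a state with these two functions might look like this (the class names and the keyword matching are illustrative; real yes/no recognition would use a trained classifier rather than substring checks):

```python
class State:
    """One node of a topic automaton."""
    def __init__(self, name):
        self.name = name

    def execute(self, context):
        """Process the user's input or call an API; store results in context."""
        raise NotImplementedError

    def transition(self, context):
        """Pick the next state's name based on the result of execute()."""
        raise NotImplementedError

class YesNoState(State):
    """Branch to one of two states based on agreement in the last message."""
    def __init__(self, name, yes_state, no_state):
        super().__init__(name)
        self.yes_state, self.no_state = yes_state, no_state

    def execute(self, context):
        message = context["last_message"].lower()
        # Toy agreement detection; a real system classifies the intent instead
        context["agreed"] = any(w in message for w in ("yes", "yeah", "sure", "ok"))

    def transition(self, context):
        return self.yes_state if context["agreed"] else self.no_state
```

Running a dialogue then means repeatedly calling `execute()` on the current state and following `transition()` to the next one.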

This simple design of our states allows us to do various things. We realized during development that we were creating the same states over and over, so we decided to prepare the most used ones as premade building blocks for dialogues. This sped up development and made changes much easier. If we wanted to add yes/no recognition to an automaton, we just used the “YesNo” state. If we wanted to improve how yes/no is recognized in the user’s messages, we had to do it in only a single place.

Our premade states are:

  • Say a sentence, wait for the user’s message, and transition to the next state
  • Say a sentence, don’t wait for the user’s message, and transition to the next state
  • Say sentence A on every n-th visit of the state and sentence B otherwise, then transition to the next state (this is useful when you want to remind the user of some information, like “Remember that you can say ‘Let’s talk about sports.’ Anyway, let’s go back to the books. What is your favorite?”)
  • Wait for the user’s message and transition to the next state
  • Recognize yes or no (agreement or disagreement) in the last user message and transition to one of two next states based on the result
  • Recognize an entity in the last user message
  • Switch to a different automaton

These states make up approximately 60–80% of our automata. The remaining states are rare, so it didn’t pay off to premake them.

Alexa Prize Alquist Team

Divide dialogues as much as possible for easy maintenance

We created a dialogue about movies. It is huge, around 300 states, and its development took one person on our team at least a month. Maintenance of such a dialogue is a nightmare. The dialogue covers favorite movies, TV series, actors, directors, and genres, plus some chitchat about movies in general, like where the user usually watches movies. And all of this is mixed into a single huge automaton with a lot of transitions.

We also developed a dialogue about fashion. It covers favorite clothes, make-up, and hairstyles, and it can give tips, for example on how to photograph a new outfit. All of these parts are divided into smaller state automata, each containing around 20 states (the biggest has 60). Development took a single person around 14 days, and maintenance is a piece of cake because you only need to debug a single small automaton. My advice is to split topics as much as possible.

The goal isn’t to give as much information as possible but to entertain the user

Our initial approach to dialogue design was to ask the user for a favorite entity (movie, video game, sport…), give him some information about it (the genre it belongs to, an actor who starred in it…), ask a generic question (“Why do you like it?”, “What is your favorite part?”, “Do you play this game often?”…), and then ask for the next favorite entity. The user was kept in a conversation loop, and the duration of the conversation increased. However, this works only up to a point: the user does about two rounds and then becomes bored. The main causes are the same or very similar structure of the dialogue; the fact that we tell the user things about his favorite entity which he probably already knows (it is his favorite, after all); and the fact that we don’t react to the answers to our generic questions (we replied only with “I see.” or “It is interesting.”).

The first and second problems are quite easy to solve; the third one is challenging. You solve the first by making more variants of the dialogue: ask the user for a favorite genre and movie, then transition to his favorite director of that genre, or ask whether he cares about online movie ratings, for example. You will need a really good conversation designer. Such a person is usually not a programmer, so you will also need a way for him to create dialogues with as little programming as possible.

We found that giving the user trivia and fun-facts solves the second problem. Don’t give him only raw data; put it into perspective. A great source of fun-facts is Reddit (https://www.reddit.com/r/todayilearned or https://www.reddit.com/r/Showerthoughts), which we used a lot. Is the user talking about The Matrix? Search the todayilearned subreddit for the word “matrix” and insert “Did you know? Will Smith turned down the role of Neo in The Matrix (1999). He instead took part in the film Wild Wild West, which was a huge flop at the box office.” into the conversation. This is much better than a generic “The Matrix was released in 1999.”, isn’t it?
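A sketch of how such a lookup could work, using Reddit’s public JSON search endpoint (the formatting helper, parameters, and “TIL” prefix handling are illustrative, not our production pipeline):

```python
from urllib.parse import urlencode

def todayilearned_search_url(entity, limit=5):
    """Build a Reddit JSON search URL for fun-facts about an entity."""
    query = urlencode({"q": entity, "restrict_sr": 1, "sort": "top", "limit": limit})
    return "https://www.reddit.com/r/todayilearned/search.json?" + query

def as_fun_fact(til_title):
    """Turn a 'TIL that ...' post title into a conversational fun-fact."""
    fact = til_title
    for prefix in ("TIL that ", "TIL: ", "TIL "):
        if fact.startswith(prefix):
            fact = fact[len(prefix):]
            break
    return "Did you know? " + fact[0].upper() + fact[1:]
```

Fetching the URL and picking a top post title, then passing it through `as_fun_fact()`, yields a line you can drop straight into the conversation.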

The problem of not reacting to answers is hard to solve, because the user can answer with anything; there is an almost infinite number of possibilities. He can give a relevant answer that you still can’t react to:

– Alquist: “Do you like this movie?”
– User: “Yes, but my girlfriend hates it.”
– Alquist: “I see!”

He can answer you and ask a question at the same time:

– Alquist: “Do you like this movie?”
– User: “Yes, I like this movie. Why do you ask?”
– Alquist: “I see!”

Or he can say something completely irrelevant:

– Alquist: “Do you like this movie?”
– User: “I had an egg for breakfast.”
– Alquist: “I see!”

The options are to ask fewer questions (which I don’t recommend, because users like being asked for their opinion), to prepare responses for the most common answers, and to try to detect irrelevant answers. However, how to correctly react to arbitrary user input is still an open question.

Alexa Prize Summit 2017

Make the dialogue independent of the user (sometimes)

Sometimes you want to lead the user to a certain point in the conversation. For example, we wanted to lead the user through an initial chat at the beginning of the conversation, asking how his day was or what his name is. If he replied with nonsense, we just continued with something like: “OK, you don’t have to tell me. It is not such important information after all.” You can lead the user to more interesting parts of the dialogue this way. However, don’t overuse it.

If you ask “Do you have any favorite actor?”, don’t expect a name only. Expect “yes” or “no” too

This problem probably appeared because our team has no native English speaker. We used the question formulation from the headline and expected the user to tell us the name of his favorite actor. That happened, but there were also cases in which the user answered simply “yes.” And we answered, “I don’t know this actor.” Once we discovered this problem, we added the reaction “Which one?” to all such states. However, you also have to count on a few users answering “no”; some form of recommendation is a good idea in such cases to push the dialogue forward.

Alquist team presenting at Machine Learning Meetup Prague

Look into data for unexpected user’s inputs and new topic suggestions

This advice is tightly connected to the previous one: look into the data. Examine whole conversations that users have with the system. How do they answer the questions? Which topics are the most popular? Is everything working as expected? Does your topic classification work correctly for all inputs? Do you recognize all entities?

All of this is very important, so build some way to collect data about conversations. Visualize the data, look into it often, and tweak the system. You will probably have a lot of data, and clustering helps in such cases. We used clustering to group similar user answers; it let us prepare responses to the most common answers, and it also helped us annotate our datasets, because we didn’t have to label each user message, only the individual clusters.
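As a toy illustration of grouping similar answers (our real setup used proper clustering over larger data; this standard-library sketch greedily groups answers by surface string similarity and is only meant to show the idea):

```python
from difflib import SequenceMatcher

def cluster_answers(answers, threshold=0.7):
    """Greedily group similar user answers.

    Each cluster is a list; an answer joins the first cluster whose
    representative (first member) is similar enough, else starts a new one.
    """
    clusters = []
    for answer in answers:
        text = answer.lower().strip()
        for cluster in clusters:
            if SequenceMatcher(None, text, cluster[0].lower()).ratio() >= threshold:
                cluster.append(answer)
                break
        else:
            clusters.append([answer])
    return clusters
```

With clusters in hand, you label one representative per cluster instead of every message, which is where the annotation savings come from.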

End every response with a question or a suggestion of what to do next

THIS ONE IS SUPER IMPORTANT! Every time you say something to the user, end it with a question or a suggestion of what to do next. This has several benefits: it keeps the conversation going, it lowers the number of possible user messages (most of the time the user will do what you suggest), and you can lead the user to a conversational topic he hasn’t tried yet.

Be ready to jump out of context and return back to it

We included a question-answering module and a so-called “single-line answers” module in our system. The question-answering module answered general-knowledge questions, and the “single-line answers” module handled questions for which we had hardcoded answers, like “What is your name?” or “How old are you?” Both modules could be used at any time during the conversation, and it originally looked like this:

– Alquist: “Have you been in Brazil?”
– User: “What is the population of Brazil?”
– Alquist: “It is around 200 000 000.”
– User: “Well ok.”

What is wrong with this conversation? It ignores the previous advice: the answer doesn’t offer any suggestion of what to do next, and the conversation loses track. Even worse, our system was not able to produce any interesting response at this point of the conversation. We solved this problem by returning to the previous context with the line “anyway, I was saying”:

– Alquist: “Have you been in Brazil?”
– User: “What is the population of Brazil?”
– Alquist: “It is around 200 000 000. Anyway, I was saying, have you been in Brazil?”
– User: “No, not yet.”
– Alquist: “Brazil is a fascinating country…”
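The “return to context” trick boils down to remembering the last question we asked. A minimal sketch (class and method names are illustrative, not our actual implementation):

```python
class ContextStack:
    """Remember the question we asked so we can return to it after a detour."""
    def __init__(self):
        self._pending = []

    def ask(self, question):
        """Record a question as the current conversational context."""
        self._pending.append(question)
        return question

    def answer_and_return(self, factual_answer):
        """Answer an out-of-context question, then steer back to the topic."""
        if self._pending:
            return f"{factual_answer} Anyway, I was saying: {self._pending[-1]}"
        return factual_answer
```

The question-answering and single-line modules produce `factual_answer`; the stack glues the detour back onto the interrupted dialogue.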

Team Leader Honza tests Alquist

Make responses as variable as possible

One way not to bore the user with the same answers over and over is to add variance to your responses. We load responses from YAML files in which we use our own syntax. Thanks to it, you can write several responses on a single line, each with several variants. It looks like this:

– “(|Hmm.) I didn’t (enjoy|like) (it|that one) (|very) much.”
– “That one didn’t impress me (|very) much.”

So when you want to comment on a bad book, you just call the function make_response("book_not_enjoy"), and it randomly chooses one of responses such as:
  • Hmm. I didn’t enjoy it much.
  • I didn’t enjoy that one much.
  • I didn’t like that one very much.
  • That one didn’t impress me much.
  • That one didn’t impress me very much.

This saves time because you don’t have to write whole sentences; you just write variants of phrases.
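A minimal sketch of such a template expander (our real parser differs; here `expand()` produces every variant and `make_response()` picks one at random from a toy template table):

```python
import random
import re

TEMPLATES = {
    # hypothetical key -> template, mirroring the YAML syntax above
    "book_not_enjoy": "(|Hmm.) I didn't (enjoy|like) (it|that one) (|very) much.",
}

def expand(template):
    """Recursively expand '(a|b|c)' groups into all variant sentences."""
    match = re.search(r"\(([^()]*)\)", template)
    if not match:
        return [" ".join(template.split())]  # normalize doubled spaces
    results = []
    for option in match.group(1).split("|"):
        variant = template[:match.start()] + option + template[match.end():]
        results.extend(expand(variant))
    return results

def make_response(key):
    """Pick one random variant for the given response key."""
    return random.choice(expand(TEMPLATES[key]))
```

Note that empty options like `(|Hmm.)` simply expand to nothing, which is how optional words are expressed.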

ASR is not perfect

ASR is not perfect. It struggles mainly in noisy environments or when the user is not a native English speaker, which was the case for our whole team. I really struggled with phrases like “Let’s talk about movies,” which was recognized as “What’s talk about movies” and classified by the dialogue manager as input to our question answering instead of the “movies” intent. (I had to use the phrase “Tell me about movies,” which worked fine.)

ASR errors were responsible for a lot of our problems with intent and entity recognition. We tried to mitigate them by looking at confidence scores: if the confidence score was not above our threshold, we asked the user to repeat the message (“I didn’t understand you. Can you repeat it?”).
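The check itself is trivial; a sketch (the threshold value and the `classify_intent` stub are illustrative assumptions, not our tuned values):

```python
REPROMPT = "I didn't understand you. Can you repeat it?"

def classify_intent(transcript):
    """Stand-in for the real intent classifier."""
    return "intent:" + transcript.split()[0].lower()

def handle_asr(transcript, confidence, threshold=0.5):
    """Ask the user to repeat when ASR confidence falls below the threshold;
    otherwise hand the transcript to intent classification."""
    if confidence < threshold:
        return REPROMPT
    return classify_intent(transcript)
```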

Later in development we tried A/B testing and disabled the confidence-score check for half of the users. Do you know what happened? Ratings for both groups remained almost the same. So either solve ASR errors with a cleverer method than ours, or don’t bother with it at all. Amazon is improving ASR all the time, so this problem may not exist in the future.

Alquist team

Filter all responses from any source you don’t control for profanities

Profanities… this was the most common reason why Amazon stopped us. Saying profanities was not our intention, of course; the reason was that we used a lot of text from the internet: Reddit comments for the chitchat, fun-facts from Reddit, etc.

We used a combination of two ways to filter texts containing profanities. The first was simple string comparison against a list of profanities. The list combined several profanity lists we found on the internet, plus a list provided later by Amazon. It was several hundred phrases long, and many of them I had never heard before.

This worked up to a point (although we had to add many more phrases which we discovered over time). But this approach doesn’t detect hate speech without profanities; one example is the response “Kill your parents.” So we used machine learning to train a hate-speech detector, trained on two datasets of hate speech from Twitter (you will surely be able to find them). This improved detection, but there are still some problems which we are gradually removing. I recommend spending time on developing a way to check all texts which you didn’t write yourself.
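The string-comparison layer can be sketched like this (the word list here is tiny and deliberately mild for illustration; ours was several hundred phrases long, and the fallback text is invented):

```python
import re

PROFANITY_LIST = {"damn", "hell"}  # in reality: several hundred phrases

def contains_profanity(text):
    """True if any banned phrase appears as a whole word/phrase in the text."""
    lowered = text.lower()
    return any(re.search(r"\b" + re.escape(phrase) + r"\b", lowered)
               for phrase in PROFANITY_LIST)

def filter_response(text, fallback="Let's talk about something else."):
    """Drop an unsafe response and fall back to a safe topic change."""
    return fallback if contains_profanity(text) else text
```

The word-boundary regex matters: plain substring matching would flag harmless words (“hello” contains “hell”), which is a classic source of false positives in such filters.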


These are my 13 practical recommendations for anyone trying to create their own conversational AI. They are based on the experience our team and I gathered over one year of competing in the first Amazon Alexa Prize 2016/17. I hope this list helps you create something really clever and human-like, because one of my dreams is to have a truly intelligent conversation with an AI. Good luck and keep pushing the frontiers!

Originally published on www.petr-marek.com

Python interface for Turris Gadgets

Alexej Popovic
September 5, 2017 at 12:29 pm

We have developed a Python 3 library for communication with Turris Gadgets. It’s easy to use and provides great flexibility for creating complex Internet of Things applications. The library allows complete control over Turris Gadgets devices, including managing them, requesting their states, adding listeners to the events they invoke, etc.

In recent weeks, we have been working on several projects dealing with IoT and voice assistants like Amazon Alexa or Google Assistant. Our general target was the ability to control smart-home devices by voice with these assistants. A simple diagram shows the basic principle: several IoT devices are connected to a local server, in this case a Raspberry Pi 3. The server then communicates with an AWS Lambda service using the MQTT protocol. AWS Lambda provides a way to access many different Amazon services. On the left of the diagram is an Amazon Echo, which provides voice input to AWS Lambda, which in turn forwards the request to our local server over MQTT. The next simplified diagram shows the local-server design that we wanted to implement:


The local server does all the hard work, parsing requests coming either from Amazon, from the devices themselves (like a button press), or from other scripts and applications. One significant advantage of this layout is that the system keeps working even when one of the controllers fails: for example, when the Internet connection is unavailable, you can still control the devices manually using mechanical buttons, and automated and scheduled tasks keep working too. As can be seen, the primary communication protocol for the IoT devices is MQTT. In the real world there is a problem: not all IoT devices speak MQTT, so some protocol translator is needed. Our translator is a library that converts MQTT to other, manufacturer-specific protocols.
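A protocol translator of this kind is essentially a mapping function. A hypothetical sketch (the MQTT topic scheme and the serial command format here are invented for illustration; they are not the real Jablotron protocol):

```python
def mqtt_to_turris(topic, payload):
    """Translate an MQTT command message into a (hypothetical) serial command
    for the Turris Gadgets dongle.

    Example mapping: topic 'home/outlet/1/set' with payload 'on'
    becomes a device command string for outlet 1.
    """
    parts = topic.split("/")
    if len(parts) == 4 and parts[1] == "outlet" and parts[3] == "set":
        state = "1" if payload.lower() == "on" else "0"
        return f"TX DEVICE:{parts[2]} STATE:{state}"
    raise ValueError(f"unsupported topic: {topic}")
```

In the real library, the right-hand side of this mapping is the manufacturer’s proprietary wireless protocol spoken through the USB dongle.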

One such group of devices that we wanted to use for our projects is Turris Gadgets. For those who don’t know, Turris Gadgets is a joint activity of CZ.NIC and the Jablotron Group. The project aims to create a smart-home network around the Turris router. The Turris Gadgets set contains several sensors, such as PIR motion detectors and shock detectors, and actuators like remote-controlled outlets, relays, and a wireless siren. All of these sensors and actuators use the manufacturer’s proprietary wireless communication protocol. Jablotron provides a dongle with a radio, which plugs into the Turris router’s USB port. With this and a little software installed on the router, we can set up a smart home very quickly.

There are some open-source solutions supporting Turris Gadgets, but they are usually ready-made applications for complete home automation, and connecting them to our application would be too much of a hassle. Also, the communication protocol used by Turris Gadgets is rather simple, and developing an interface for it is pretty straightforward. That’s why we decided to create our own library for communicating with them.

We chose to implement the library in Python, which made development convenient and fast; the main control-program APIs that we used in our project were written in Python too. The library is elementary, and starting communication with the devices requires just a few lines of code.


import jablotron.events as events
import jablotron.devices as devices

# Open the serial port and start the transmitting/receiving threads
dongle = devices.Dongle(port="/dev/ttyUSB0")
dongle.init()  # fetch all registered peripherals from the dongle's memory

def blink(event):
    # Turn an outlet on and off three times when motion is detected
    for i in range(3):
        dongle.req_send_state(...)  # broadcast the required outlet states

events.bind(events.Event.ev_PIR_motion, blink)

This simple example demonstrates just how easy it is to work with the library. First, we import the necessary sub-modules; then we create the dongle instance, which opens a serial port and starts the transmitting and receiving threads. Next we call dongle.init(), which fetches all the information the dongle has in its memory, i.e. all registered peripherals. Then we define an example function which turns an outlet on and off three times in a row by calling dongle.req_send_state(); this broadcasts a state message to all registered devices, in which we can specify the required states of the outlets, alarm, beeper, etc. Finally, we bind this function to the event called ev_PIR_motion, which is triggered when any of the PIR sensors detects motion. The event object passed to the function contains the time of the event, the exact ID of the PIR sensor, and other useful information.

As can be seen, the library provides a simple way to incorporate Turris Gadgets in any IoT project, and it offers great flexibility for controlling the devices. It’s also easy to register new peripherals in the dongle’s memory or delete them.

In the future, we will further improve the library and add new features, like keeping track of every device’s previous state or storing the dongle’s memory in a file so it doesn’t need to be fetched on every start of the application. Stay tuned: we will soon show how to use Alexa and Google Home to take advantage of this library.

Alquist made it to the Alexa finals

Jan Sedivy
August 30, 2017 at 2:55 pm


The CVUT Alquist team, along with two other teams, made it to the finals of the $2.5 million Alexa Prize university competition. Our team developed the Alquist social bot.

The whole team met in the eClub during the summer of 2016. At that time we were working on the question-answering system YodaQA. YodaQA is a somewhat complex system, and the students learned classic NLP on it, though of course everybody wanted to use neural networks and design end-to-end systems. We had also been playing with simple conversational systems for home automation. Then, surprisingly, Amazon announced the Alexa Prize, and it all clicked together. We quickly put together a team and submitted a proposal: one Ph.D., three MSc, and one BSc student, a team with strong experience in NLP.

In the beginning we were competing with more than a hundred academic teams trying to get into the top twelve and receive the $100k scholarship funding. We were lucky, and once we were selected in November 2016, we began working hard. We started with many different incarnations of neural networks (LSTM, GRU, attention NNs, …), but soon realized the bigger problem: a lack of high-quality training data. We tried movie scripts, Reddit dialogues, and many other sources, with mixed results. The systems performed poorly; sometimes they picked an interesting answer, but mostly the replies were very generic and boring. We humbly returned to the classical information-retrieval approach with a bunch of rules. The final design is a combination of the traditional approach and some neural networks.

We finally managed to put together an at least somewhat reasonable system that can keep up with a human for tens of seconds. Then the forced labor started: we invented and implemented several paradigms for authoring the dialogues and acquiring knowledge from the Internet. As the first topic we chose movies, since it is also our favorite, and then, step by step, we added more and more dialogues. While perfecting the dialogues, we kept improving the IR algorithms. The user experience improved when Amazon introduced SSML; since then, Alexa’s voice has sounded more natural.

While developing Alquist, we have gained a lot of experience. A significant change is that we have to look at Alquist more as a product than as an interesting university experiment, and the consequences are dramatic. We need to keep Alquist running, which means we must test every new version very well. Testing conversational applications is a research problem in itself. We have designed software to statistically evaluate user behavior: first, to find dialogue problems, misunderstandings, etc.; second, to estimate how happy users are with particular parts of the conversation so we can make further improvements. Thanks to Amazon we have reasonably significant traffic, and since we store all conversations, we can accumulate a large amount of data for new experiments. Extensive data is a necessary condition for training more advanced systems. We have many new ideas in mind for enhancing the dialogues and will report on them in future posts.

Many thanks go to Amazon for the scholarship; it was a real blessing for our team. It helped us keep the team together with a single focus on a real task. The students worked hard for more than ten months, and that helped us to be successful.

Today we are thrilled that we made it to the finals together with the University of Washington in Seattle with their Sounding Board, and the wild-card team from Heriot-Watt University in Edinburgh, Scotland, with their What’s up Bot. Celebrate with us and keep your fingers crossed; there is half a million at stake.


Originally published at jsedivy.blogspot.com

ALEXA TUTORIAL – How to create a Google Drive note by voice

Jan Presperin
July 1, 2017 at 4:02 pm

Difficulty – Beginner Skill

Time – 2 – 3 hours with the tutorial

In this post we will look at how you can upload files with your desired text content using your voice and an Echo-enabled device, which could be the Echo itself or your phone running the Reverb app.

The article assumes that the reader has some elementary experience with making Alexa Skills and understands how skills are built, but we will cover all the aspects important for making this skill.

This tutorial consists of 3 parts:

  1. Uploading the code to the AWS Lambda service
  2. Setting up the Google Drive API and Account Linking
  3. Authenticating the user

So let’s get started with the first section.

1. Uploading the code to AWS Lambda service

First, we need to make a Lambda function and put our Python 3.6 code into it. This is what does all the backend logic for us when we ask Alexa to do something.

The first issue is that the skill has certain dependencies (Python modules that the skill uses), so we cannot just paste the code into the code editor; we need to make a so-called Deployment Package (http://docs.aws.amazon.com/lambda/latest/dg/lambda-python-how-to-create-deployment-package.html). To make a long story short, we just need to download the ’googleapiclient’, ’httplib2’ and ’oauth2client’ packages, put them into the same location on our computer as the file with the Lambda function code itself, and compress everything into a .zip file.

We will then upload this .zip file into AWS lambda.
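Building the deployment package can itself be scripted in Python (the paths are illustrative; point it at the folder holding your handler file and the unpacked dependency packages):

```python
import zipfile
from pathlib import Path

def make_deployment_package(source_dir, zip_path):
    """Zip the Lambda handler together with its dependency packages.

    Everything must sit at the archive root, next to the handler file,
    which is what AWS Lambda expects for a Python deployment package.
    """
    source = Path(source_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in source.rglob("*"):
            if path.is_file():
                # Store paths relative to source_dir so packages land at the root
                archive.write(path, path.relative_to(source))
```

Note that zipping the folder itself (so the archive contains an extra top-level directory) is a common mistake; Lambda then fails to find the handler.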

The code itself is in the ConnectToDriveApp.py file (https://drive.google.com/drive/folders/0B6brXj4ch4-ycnNocXd6WjNaRVE?usp=sharing).

Here is the page where we upload the .zip file with ConnectToDriveApp.py and the ’googleapiclient’, ’httplib2’ and ’oauth2client’ packages.


Copy the ARN, the text in the red field (*1); we will use it later in the Developer Portal.

2. Setting up the Google Drive API and Account Linking

Next we want to set up the application on Google’s side, where we obtain the IDs that we will use later for linking the Amazon skill and our Google account.

We have to go to https://console.developers.google.com, log in with our Google account and Create a project. Choose any name you want.

Then we go to Credentials > Create credentials > OAuth Client ID and there we select Application type: Web Application

Then we choose a name, leave the other fields empty, and click Create. A pop-up window will appear; copy both the Client ID (*2) and the Client Secret (*3), we will use them later.

We open another tab and go here: https://developer.amazon.com

We go to the Alexa tab and create a new skill, setting up its name and invocation name.


Then we open the Interaction Model tab and the Alexa Skill Builder opens. We open the Code Editor tab and paste in the content of InteractionModel.json. We click Build Model, wait a couple of minutes, and then go to the Configuration tab, which closes the Skill Builder interface.


On the Configuration page, we choose the AWS Lambda endpoint, select North America, and copy the previously saved ARN (*1) from the Amazon Console into the field.


Now we are going to set up Account Linking, which will enable the skill to call the Drive API after receiving an access token from Google; the token authorizes the user on each request.


Authorization URL – this is where you authenticate and allow the Amazon skill to access the Drive API. You give the permission only once, through the Alexa app on your phone.

It is:


Be careful to change the redirect_uri parameter to the pitangui link from the Redirect URLs section under Scope.

Client ID ( *2 ) and Client Secret ( *3 ) are the ones you have from Google Console.

Access Token URI is: https://accounts.google.com/o/oauth2/token?grant_type=authorization_code

Scope: https://www.googleapis.com/auth/drive

Now comes an important step. Copy both Redirect URLs from the Account Linking section and go back to your project in https://console.developers.google.com. Go to Credentials, click on the credentials you created, and paste both Redirect URLs from the Amazon Developer Console into the Redirect URIs field in the Google Developer Console. This is the final step in linking the two services together.


3. Authenticating the user

Now you only need to download the Alexa app on your phone (if you are using an iPhone, you need an Apple ID with a U.S. location to download it), log in with your account, click Skills > Your Skills, enable the skill you just created, and link the accounts. You should be redirected to the Authorization URL you set in the Amazon Developer Console. Sign in with your Google account there, and from then on each request to the Alexa skill service should contain the access token.

Good job, you are now ready to make notes with your Echo and upload them to your Google Drive. 🙂 

Alquist Editor

Vaclav Pritzl
June 13, 2017 at 2:10 pm

In this blog post I would like to describe a project I have been working on: the Alquist Editor. Alquist is an intelligent assistant interpreting programs written in simple YAML. At the heart of Alquist is a dialogue manager, programmed in the YAML language, which defines the dialogue states, i.e. the flow of a conversation between the user and the machine. Sometimes these dialogues become quite complicated, so to ease development I have designed a web editor that helps the programmer visualize the graph of the dialogue flow. The graph describes the dialogue: nodes represent different states (e.g. user input or bot response) and edges represent transitions between these states. The Alquist Editor displays this graph structure and, simultaneously, the code in the same window.

How do you use the editor? First it must be installed; to do so, please refer to the GitHub documentation. The editor is designed as a web application: the server stores the code as well as the whole Alquist dialogue manager. To access the editor, open the /editor path at the URL where the server is running.

The editor opens with an index page with options for selecting an existing project or creating a new one. When you create a new project, you can import existing files to it. Let’s create a new project from scratch.

Empty project

The editor window has three panes. The left pane contains a file manager for the project. It supports basic file and folder management and drag-and-drop upload. So far it contains only two folders – flows and states. The flows folder stores the .yml files defining the bot structure. Many bot applications require custom states implemented in Python; all Python code is stored in the states folder.

The middle pane displays the dialogue graph. At the beginning it is empty because there are no states defined.

Finally, the right pane contains a code editor with the YAML code. Files are selected in the file manager, edited, and saved by clicking the button below the editor. Furthermore, you can revert unsaved changes or download the whole project as a .zip file using the remaining buttons.

During the development of your bot, you can test it at any time: just click the button in the bottom left corner and a dialogue with your bot will open in a new tab.

Now let’s look at an example project.

Example project

You can see that the graph is divided into groups, each representing one YAML file where the corresponding part of the dialogue is defined. The editor displays the initial state in yellow and unreachable states in red, so mistakes in the dialogue structure can be detected easily. Furthermore, targets of intent transitions are painted green.
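Detecting unreachable states boils down to a graph search from the initial state. A minimal sketch of the idea (the state names and transitions below are invented; Alquist's real schema differs):

```python
from collections import deque

def unreachable_states(states, transitions, initial):
    """Return the set of states with no path from the initial state (BFS)."""
    reachable, queue = {initial}, deque([initial])
    while queue:
        node = queue.popleft()
        for target in transitions.get(node, []):
            if target not in reachable:
                reachable.add(target)
                queue.append(target)
    return set(states) - reachable

# Invented toy dialogue: "orphan" has no incoming transition.
states = ["start", "ask_name", "greet", "orphan"]
transitions = {"start": ["ask_name"], "ask_name": ["greet"]}
print(unreachable_states(states, transitions, "start"))  # {'orphan'}
```

Any state left over after the traversal is exactly what the editor would paint red.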

You can highlight any node of the graph by clicking on it, and the code editor on the right will automatically open the appropriate file and scroll to the definition of the selected node. In the picture, the node “xkcd_set_context” is selected.

If you would like to try to create your own bot, you can download the whole project from GitHub. For a more detailed explanation, refer to the GitHub documentation of Alquist and the Alquist Editor. Detailed instructions on how to write your own bot can be found here.

Enjoy playing with the bot and the editor and let me know how you like it.

What is new in eClub

Jan Sedivy
June 4, 2017 at 3:35 pm



The 2017 eClub Summer Camp is starting, for the first time in the new CIIRC building. We will focus on AI, IoT, and the Internet: in particular Conversational AI, programming Amazon Echo to control your household, Natural Language Processing and other topics; see the projects.

The main topic this summer is everything we need to create a great Echo application. Echo is a voice-controlled smart speaker made by Amazon. You can simply ask it to play music, answer factoid questions, carry on a simple dialogue or control your household. There is amazing technology behind the set of new Amazon devices: speech recognition, directional microphones, conversational AI, a knowledge database, etc. The eClub team is among the first in the world working directly with the Amazon research group on making Alexa even smarter. We want to make her sexy, catchy and entertaining, and that requires many different skills, from linguistics up to neural network design. We have many well-separated problems for any level of expertise. Come to see us; we are preparing an introductory course to teach you how it is done. We will help you create your first app with interesting skills. You can meet many students working on Conversational AI who will help you get over the initial problems.

We want to make conversational apps not only entertaining but also knowledgeable. Alexa must be very informative as well. It must know, for example, the latest news in politics, the Stanley Cup results, the best movies, and surely many other topics. The knowledge is endless, and it is steadily growing. Handling this ever-increasing data requires processing many news feeds and different sources, accessing various databases, accessing the web, etc. The news streams must be understood, and the essential information must be extracted. There are many steps before we retrieve the information. Especially today we need to be careful, and every piece of information must be verified; we try to create canonical information using many sources of the same news. As soon as the information is clean, we need to store it in a knowledge database, and the facts need to be linked to information already in the database. And how about fake news: how do we recognize it?

Building Conversational AI does not only include voice-controlled devices. We may want to build a system that automatically replies to user emails or social media requests. Imagine, for example, a helpdesk where users ask many different questions on topics from IT to HR: very frequently how to reset a password, or how to operate a printer or a projector. Why not answer them automatically? And we can be much more ambitious. Many devices are quite complex, and it is not easy to read the manual. It is much faster to ask a question such as “How do I reset my iPad” or “How do I share my calendar”. These apps are put together from two major parts: understanding the question and preparing the answer. Both use the NLP pipeline extensively. If you expand on this idea, you may find a million applications with a similar scenario. An automated assistant can at least partly handle every company-customer interaction. To make qualified decisions, executives need fast access to business intelligence. Why not ask questions such as “What was the company's performance last week” or “What is the revenue of my competitors”?

Let me mention another aspect of our effort. The latest manufacturing lines extensively use robots, manipulators, etc. (Industry 4.0), and the whole process is controlled by a large number of computers. What if something stops working? Fixing a line like this is a very complicated task. Every robot or manipulator might be from a different manufacturer, programmable in a slightly different dialect. Is there anybody in the company who can absorb all this knowledge and localize the problem effectively? Yes: a robot, which has all the knowledge in a structured form. The robot can apply optimization to find the best set of measurements or tests to help the maintenance technician. To make this happen we need, in addition to a productive dialogue and a knowledge database, optimization that suggests the shortest path to fixing a problem. The robot can then guide humans to repair the problem most efficiently.
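One way to picture the "shortest path to a fix" is as a shortest-path search over diagnostic steps. A toy sketch, assuming each edge weight is the expected minutes a test takes (the graph below is entirely invented):

```python
import heapq

def cheapest_path(graph, start, goal):
    """Dijkstra's algorithm: cheapest sequence of diagnostic steps."""
    queue, seen = [(0, start, [start])], set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nxt, weight in graph.get(node, {}).items():
            heapq.heappush(queue, (cost + weight, nxt, path + [nxt]))
    return None

# Hypothetical diagnostic graph; weights are expected minutes per step.
graph = {"line stopped":  {"check PLC": 5, "check conveyor": 2},
         "check PLC":     {"replace fuse": 1},
         "check conveyor": {"replace fuse": 6},
         "replace fuse":  {}}
print(cheapest_path(graph, "line stopped", "replace fuse"))
```

Here the search prefers the PLC route (5 + 1 = 6 minutes) over the conveyor route (2 + 6 = 8), which is exactly the kind of suggestion the maintenance assistant would voice.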

Yes, I have almost forgotten. It has recently become very popular to use robots to control the household. Alexa, turn off all the lights. Alexa, what is the temperature in the wine cellar? We want to invent and build some of these goodies into our new eClub space during the summer. Our colleagues have developed a Robot Barista application that shakes drinks on demand; a voice user interface will make it even more entertaining. We have other exciting devices and small gizmos deserving voice control, and you may also come with your own ideas. Join us and we will help you succeed.

These are just a few use cases we will try to tackle this season. If you want to learn the know-how behind them, join us: we will help you, and we also award scholarships.

Dungeons and Dragons players wanted

Jan Sedivy
May 21, 2017 at 10:00 am


Join us in designing an interactive Alexa D&D handbook and improving level and XP progression. We want to teach Alexa to quickly answer questions like:

  • Tell me about grappling.
  • How much is a longsword?
  • What is the level 3 experience threshold?
  • List all one handed weapons.

We plan to design and implement a conversational AI application for Amazon Alexa products. The Alquist team works hard on Alexa conversations. Last autumn Amazon selected Alquist as one of the top 12 teams to compete in the Alexa Prize competition, and we have great support from Amazon. Enlarge the team and experience the fun of designing catchy dialogues. Enjoy free AWS access and scholarships. We have extensive experience with all of Natural Language Processing, including the latest neural network algorithms. Meet the team and learn the latest technology.
D&D is highly engaging and addictive, and we believe we can make the players' experience even deeper. Join us, we are starting!

Neural Network Based Dialogue Management

Jan Pichl
May 14, 2017 at 8:41 pm

There are plenty of sub-tasks we need to deal with when developing chatbots. One of the most important is called intent recognition or dialogue management. It is the part of the chatbot which decides what topic the user wants to talk about and then runs the corresponding module or function. Since it manages which part of the software should be triggered, it is called “dialogue management”. The name “intent recognition” comes from the fact that the decision is made according to the user's intention, which needs to be recognised.

All we want to do is take the sentence the user just said and decide which class is the most suitable. In Alquist, we have 16 classes, corresponding to the topics the chatbot is capable of talking about.

We have experimented with several approaches to this task. The first approach combines a logistic regression classifier with cosine similarity of GloVe word embeddings (similar to word2vec). The input of the classifier consists of one-hot vectors of word uni- and bi-grams. The classifier estimates the probabilities of coarse-grained classes such as chitchat, question answering, etc. More fine-grained classes are estimated using the cosine similarity between the average embedding vector of the words from the input sentence and the average embedding of the words from reference sentences. The accuracy of this combined approach is 78%.
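The fine-grained matching step can be sketched in a few lines: average the word vectors of the sentence and pick the reference class with the highest cosine similarity. The tiny 3-dimensional "embeddings" below are made up for illustration; real GloVe vectors have 50 to 300 dimensions.

```python
from math import sqrt

# Toy, hand-made "embeddings" standing in for real GloVe vectors.
emb = {"play":  [1.0, 0.1, 0.0],
       "music": [0.9, 0.2, 0.1],
       "score": [0.0, 1.0, 0.2],
       "game":  [0.1, 0.9, 0.1]}

def sentence_vector(words):
    """Average the embeddings of all known words in the sentence."""
    vecs = [emb[w] for w in words if w in emb]
    return [sum(dim) / len(vecs) for dim in zip(*vecs)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

# Reference sentences, one per fine-grained class (invented examples).
refs = {"music": ["play", "music"], "sports": ["game", "score"]}
query = sentence_vector(["play", "music"])
best = max(refs, key=lambda c: cosine(query, sentence_vector(refs[c])))
print(best)  # music
```

The same idea scales to the real system by swapping in pretrained GloVe vectors and one reference set per class.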

Another approach uses a neural network as the intent recognizer. The neural network has three different inputs: the actual utterance, the sequence of concepts, and the class of the previous utterance. The concepts are retrieved using heuristic linguistic rules and the Microsoft Concept Graph. The previous label is the output of the very same neural network for the previous utterance, or “START” if the utterance is the first message. The structure of the neural network is shown in the following figure.

Network Structure

The network consists of separate convolutional filters for the input sentence and the list of concepts. We use filters of lengths 1 to 5. A max-pooling layer follows the convolutional layer, and the outputs of the sentence and concept branches are concatenated. Additionally, the class of the previous utterance is concatenated to this vector as well. Finally, the architecture ends with two fully connected layers with dropout.
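To make the convolution-plus-max-pooling branch concrete, here is a dependency-free sketch of a single filter: it slides over the word-embedding sequence and max-pooling keeps the strongest response. All weights and embeddings below are made-up toy numbers, not the trained model.

```python
def conv_max(embeddings, filt):
    """Apply one 1-d convolutional filter and max-pool over positions."""
    n = len(filt)  # filter length in words
    scores = []
    for i in range(len(embeddings) - n + 1):
        window = embeddings[i:i + n]
        scores.append(sum(w * x
                          for row, frow in zip(window, filt)
                          for x, w in zip(row, frow)))
    return max(scores)  # max pooling

sentence = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]  # 3 words, 2-d embeddings
features = [conv_max(sentence, f) for f in (
    [[1.0, 0.0]],               # length-1 filter
    [[0.5, 0.5], [0.5, 0.5]],   # length-2 filter
)]
print(features)  # [1.0, 1.0]
```

In the real network there are many such filters of lengths 1 to 5 per branch, and their pooled outputs form the feature vector that is concatenated with the previous-class input.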

This neural network based approach achieves an accuracy of 84% and represents a more robust solution to the presented task. Additionally, it takes advantage of the information about the previous class, which is often a crucial feature.


Recap of the last months aka. how we teach Alquist to talk

Petr Marek
April 8, 2017 at 5:09 pm
Roman working on Alquist

Lots of things have happened during the last months. The biggest news is that Alquist goes into testing this Monday. Contrary to the original plan, it will be available only to Amazon employees for the next thirty days, but still, they are real users. I can't wait to see how Alquist will perform.

Alquist has evolved a lot since the last blog post. It progressed from simple conversations without any purpose to a focused speaker. Alquist can now speak about movies, sports results, news, holidays and encyclopedic knowledge, and very soon about books and video games.

How do we know which topics Alquist should learn? Amazon offered all teams the possibility to run a closed beta test. As some of you might know, we used this opportunity, of course. We decided to make the beta test more “open” because we had space for 2,000 testers, so we publicly announced the test and waited for the results. To be honest, we used only a tiny fraction of the available space, but it was still enough to learn from mistakes and to find ideas for improving Alquist. I would like to thank all of you who helped us. Thank you!

The public launch should happen at the beginning of May. Until then you can follow the progress of Alquist on Twitter or Facebook, where you can find some cool demo videos of Alquist in action.

Voice Controlled Smart Home

Petr Kovar
March 24, 2017 at 3:29 pm

Do you remember Iron Man's personal assistant J.A.R.V.I.S.? It is just fictional technology from a superhero movie, but I am getting close to it with HomeVoice. HomeVoice is designed to become your personal voice-controlled assistant whose primary task is to control and secure your smart home. You can switch the lights, ask for a broad range of values (temperature, humidity, light states, etc.), manage your smart home devices and also give HomeVoice feedback to make it even better.

Let’s start at the beginning. My name is Petr Kovar, and I study cybernetics and robotics at CTU in Prague. I came to eClub Prague more than a year ago to participate in the development of a household intelligent assistant called Phoenix. Under the supervision of Jan Sedivy I built up sufficient know-how about speech recognition, natural language understanding, speech synthesis and bots in general. A few months later I turned to Jan Sedivy again for help with the specification of my master's thesis.

As time went on, we decided to use the accumulated experience for the development of a voice-controlled smart home. I started with the selection of smart home technology and decided to use Z-Wave, the leading wireless home-control technology on the market. As the controller I selected a Raspberry Pi running Raspbian, equipped with a Z-Wave module and the Z-Way control software.

The main task was to monitor my house by voice using a mobile device, so I decided to write an Android app called HomeVoice. The app turns any Android tablet or smartphone into a smart home remote control. It works both locally and over the internet (using remote access via find.z-wave.me). Whereas other Z-Way Android apps offer only one-way communication (the tablet downloads data from the control unit on demand), HomeVoice receives push notifications informing the user as soon as the control unit discovers an alarm or something urgent. Imagine that you are at work when suddenly a fire breaks out in your home. HomeVoice informs you about it in less than 500 ms, which gives you enough time to arrange appropriate rescue actions.

HomeVoice supports custom hot-word detection (similar to “Hey, Siri” or “Ok, Google”), transcribes speech to text, understands natural language and responds using synthesized speech. Many different technologies are used to achieve this behavior, from CMUSphinx (hot-word detection), through the SpeechRecognizer API and the cloud service wit.ai (natural language understanding), to the TextToSpeech API (speech synthesis). HomeVoice interconnects all these technologies into a complex app and adds its own context processing and dialog management.
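The overall flow of such a pipeline can be sketched as a simple loop. Every function below is a hypothetical stub standing in for the real component named in its comment; the canned sensor reading and intent are invented for illustration:

```python
def wait_for_hotword():
    return True  # stub for CMUSphinx hot-word detection

def transcribe():
    return "what is the temperature"  # stub for SpeechRecognizer ASR

def understand(text):
    # Stub for wit.ai natural language understanding.
    return {"intent": "get_value", "sensor": "temperature"}

def query_smart_home(intent):
    return "21 degrees"  # stub for a Z-Way request to the control unit

def speak(answer):
    print(answer)  # stub for TextToSpeech synthesis

# One turn of the dialogue: hot-word, ASR, NLU, device query, TTS.
if wait_for_hotword():
    intent = understand(transcribe())
    speak(query_smart_home(intent))
```

The real app keeps this loop running continuously and layers context processing and dialog management on top of it.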

It is still quite far from Iron Man's J.A.R.V.I.S., but I hope that someday HomeVoice will become a useful smart home assistant.