ALEXA TUTORIAL – How to create a Google Drive note by voice

Jan Presperin
July 1, 2017 at 4:02 pm

Difficulty – Beginner Skill

Time – 2 – 3 hours with the tutorial

In this post we will look at how you can upload files with your desired text content using your voice and Echo enabled device, which could be the Echo itself,  or your phone using Reverb app.

The article assumes that the reader has previous elementary experience with making Alexa Skills and understands the concept of how skills are made, but we will cover all of the important aspects to make this skill.

This tutorial consists of 3 parts:

  1. Uploading the code to AWS Lambda service
  2. Setting up the Google Drive API and Account Linking

  3. Authenticating the user

So let’s get started with the first section.

1. Uploading the code to AWS Lambda service

First, we need to make a Lambda function and put our Python 3.6 code into it. This is the stuff that does all the backend logic for us when we ask Alexa to do something.

The first issue is, given that the skill has certain dependancies (python modules that the skill uses), we cannot just paste the code into the code editor, we need to make a so called Deployment Package ( To make a long story short, we just need to download ’googleapiclient’ , ’httplib2’ and ’oauth2client’ packages and put them into the same location on our computer as the file with the lambda function code itself is and compress it (make a .zip file).

We will then upload this .zip file into AWS lambda.

The code itself is found in the file  ( )

Here is the page where we upload .zip file with and ’googleapiclient’ , ’httplib2’ and ’oauth2client’ packages.

Screen Shot 2017-06-21 at 14.12.19-12

Copy ARN  the text in the red field( *1 )– we will use this later in the Development portal

2. Setting up the Google Drive API and Account Linking

Next we want to set up the application on Google’s side, where we obtain certain IDs, which we will use later for linking the Amazon Skill and our Google account.

We have to go to, log in with our Google account and Create a project. Choose any name you want.

Then we go to Credentials > Create credentials > OAuth Client ID and there we select Application type: Web Application

Then we choose some name, leave the other parts empty and click Create. A pop-up windows will appear and we copy both ClientID ( *2 ) and Client Secret ( *3 ), we will use that later.

We open another tab and go here:

We go to Alexa tab and create a new skill, set up the name and Invocation name.

Screen Shot 2017-06-21 at 14.53.21-14

Then we open the Interaction Model tab and Alexa Skill Builder opens. We open the Code editor tab and paste there the content of InteractionModel.json. We click on Build Model, wait couple of minutes and then go to Configurations tab, this will close the Skill Builder interface.

Screen Shot 2017-06-21 at 14.51.27-16

On the Configuration page, we choose AWS Lambda endpoint, select North America and copy previously saved ARN ( *1) from Amazon Console into the field.

Screen Shot 2017-06-21 at 14.58.45-18

Now we are going to set up the Account Linking, which will enable the Skill to call the Drive API after receiving an access token from Google, which will authorise the user upon each request.

Screen Shot 2017-06-21 at 15.12.28-20

Authorization URL – This is where you have to authenticate and allow the Amazon Skill to access the DRIVE API. You give the permission only once through the Alexa app on your phone.

It is:

Be careful to change the redirect_uri part to the pitangui link in Redirect URIs section under Scope.

Client ID ( *2 ) and Client Secret ( *3 ) are the ones you have from Google Console.

Access Token URI is:


Now comes an important step. Copy both Redirect URIs from the Account Linking Section and go back to your Project in Go to Credentials and click on the Credentials you created. Paste both Redirect URIs from Amazon Developer Console to Redirect URIs field in Google Developer Console. This was the final step in linking both services together.

Screen Shot 2017-06-21 at 21.53.47-22

3. Authenticating the user

Now you only need to download the Alexa app on your phone ( you need to use an Apple ID with U.S. location to be able to download the app if you are using iPhone), log in with your account, click on Skills > Your Skills and enable the skill you just created and link the accounts. You should be redirected to the Authorization URI you set in the Amazon Developer Console. You sign in with your Google account there and from then on each request to Alexa Skill Service should now contain the Access token.

Good job, you are now ready to make notes with your Echo and upload them to your Google Drive. :) 

Alquist Editor

Vaclav Pritzl
June 13, 2017 at 2:10 pm

I would like to describe a project I have been working on – Alquist Editor in this blog. Alquist is an intelligent assistant interpreting program written in a simple YAML. In the heart of the Alquist is a dialogue manager, which is programmed in YAML language defining the dialog states i.e. the flow of a conversation between the user and machine. Sometimes these dialogs become quite complicated. To ease the development I have designed a web editor helping the programer to visualize the graph of the dialog flow. The graph structure describes the dialogue where the nodes represent different states of the dialogue (e.g.  user input or bot response) and  edges represent transitions between these states. Alquist Editor displays this graph structure and simultaneously the code in the same window.

How to use the editor? First the editor must be installed. To install it, please, refer to the GitHub documentation. The editor is designed as a web application. The server stores the code and  the whole Alquist dialogue manager as well. To access the editor, open the address /editor on the URL where it is running (for example

The editor opens with an index page with options for selecting an existing project or creating a new one. When you create a new project, you can import existing files to it. Let’s create a new project from scratch.

Empty project

The editor window has three panes. The left pane contains a file manager for the project. It supports basic file and folder management and drag and drop upload. So far it contains only two folders – flows and states. Flow is a folder where you  store the yml files defining the bot structure. Many bot applications require custom states implemented in python. All python code is stored in the states folder.

The middle pane displays a dialogue graph. At the beginning it is empty because

there are no states defined.

Finally, the right pane contains a code editor with the YAML code. The code can be edited. The files are selected in the file manager and saved by clicking on the button below. Furthermore, you can revert unsaved changes or download the whole project in a .zip file using the remaining buttons.

During the development of your bot you have the option to test it. You can just click the button in the bottom left corner and the dialogue with your bot will open in a new tab.

Now let’s look at an example project.

Example project

You can see that the graph is divided into groups each representing one YAML file where the appropriate parts of the dialogue are defined. The editor displays the initial state in yellow and unreachable states in red. This way, mistakes in dialogue structure can be easily detected. Furthermore, it paints targets of intent transitions in green.

You can highlight any node of the graph by clicking on it and the code editor on the right will automatically open the appropriate file and scroll to the definition of the selected node. You can see that in the picture node “xkcd_set_context” is selected.

If you would like to try to create your own bot you can download the whole project from GitHub. For more detailed explanation refer to the GitHub documentation of Alquist and Alquist Editor.  Detailed instructions how to write your own bot can be found here.

Enjoy playing with the bot and the editor and let me know how you like it.

What is new in eClub

Jan Sedivy
June 4, 2017 at 3:35 pm



The 2017 eClub Summer Camp is starting. For the first time in the new CIIRC building. We will focus on AI, IoT, and Internet. In particular Conversational IA, how to program Amazon Echo to control your household, Natural Language Processing and other topics, see the projects.

The main topic this summer is all that we need to create a great Echo application. Echo is a voice controlled smart speaker made by Amazon. You can simply ask to play music, ask factoid question, carry a simple dialog or control your household. There is an amazing technology behind the set of new Amazon devices. First of all the speech recognition, directional microphone, conversational AI, knowledge database, etc. The eClub team is among the first in the world working directly with the Amazon research group on making the Alexa even smarter. We want to make her sexy, catchy and entertaining and it requires a lot of different skills. Starting with the linguistics up to Neural Networks design. We have many well separated problems for any level of expertise. Come to see us, we are preparing an introductory course to teach you how they do it. We will help you to create your first app with interesting skills. You can meet lot of students who work in the Conversational AI who will help you to get over the basic problems.

We want to make the Conversational apps not only entertaining but also knowledgeable. Alexa must also be very informative. It must know for example the latest news in politics, the Stanley Cup results, what are the best movies and I am sure we can continue with many other topics. The knowledge is endless, and it is steadily growing. To handle to alway increasing data requires processing many news feeds, different sources, accessing different databases, accessing the web etc. The news streams must be understood, and the essential information must be extracted. There are many steps before we retrieve the information. Especially today we need to be careful and every piece of information must be verified. We try to create a canonical information using many sources of the same news. As soon as the information is clean we need to store it in a knowledge database. The facts need to be linked to information already in the database. And how about the fake news, how to recognize them?

Building the Conversational AI does not include only the voice controlled devices. We may want to build a system automatically replying to the user email or social media requests. Imagine for example a helpdesk where users are asking many different question from IT to HR topics. For example very frequently how to reset a password, or how to operate a printer or a projector, why not to answer them automatically? And we can be much more ambitious. Many devices are quite complex and it is not easy to read a manual. It is much faster to ask a question such as “How do I reset my iPad”, or “How do I share my calendar”. These apps are put together from two major parts. The understanding of the question and a preparation of the answers. Both use extensively the NLP pipeline. If you expand on this idea, you may find a million of applications with a similar scenario. An automated assistant can at least partly handle every company-customer interaction. To make a qualified decision, the executives need fast access to business intelligence. Why not ask questions such as “What was the company performance last week”, “What is the revenue of my competitors” etc.

Let me mention another aspect of our effort. The latest manufacturing lines are extensively using robots, manipulators etc. (INDUSTRY 4.0) The whole process is controlled by a large number of computers. What if something stops working, it is a very complicated task to fix a line like this? Every robot or manipulator might be from a different manufacturer, programmable in a slightly different dialect. Is there anybody in the company who can absorb the whole knowledge to be effective in localizing the problem? Yes it is a robot, which has all the knowledge in a structured form. The robot can apply optimization to find the best set of measurements or tests to help the maintenance technician. To make this happen we need in addition to a productive dialog and knowledge database also an optimization to suggest the shortest path for fixing a problem. The robot can guide humans to repair the problem most efficiently.

Yes, I have almost forgotten. It is recently very popular to use the robots to control the household. Alexa, turn off all the lights. Alexa, what is the temperature in the wine seller? We want to invent and build some of these goodies to our new eClub space during the summer. Our colleagues have developed a Robot Barista application shaking drinks on demand. A voice user interface will make it even more entertaining. We have other exciting devices and small gizmos deserving voice control. You also may come with your ideas. Join us we will assist you to be successful.

These are just few use cases we will try to tackle during this season. If you want to learn the know-how behind joins we will help you and we also will award a scholarship.

Dungeons and Dragons players wanted

Jan Sedivy
May 21, 2017 at 10:00 am

Screen Shot 2017-05-21 at 11.32.36 AM

Join us in designing an interactive Alexa D&D handbook and to improve level and XP progression. We want to teach Alexa quickly answering questions like:

  • Tell me about grappling?
  • How much is a longsword?
  • What is level 3 experience threshold?
  • List all one handed weapons.

We plan to design and implement a conversational AI application for Amazon Alexa products. The Alquist team works hard on Alexa conversations. Last autumn Amazon selected Alquist as one of the top 12 teams to compete in the Alexa Award competition. We have great support from Amazon. Enlarge the team and experience the fun in designing catchy dialogs. Enjoy a free AWS access and scholarships. We have an extensive experience with all Natural Language Processing including the latest Neural Network algorithms. Meet the team and learn the latest technology.
D&D is a highly engaging and addictive and we believe we can make the players experience even deeper. Join us, we are starting!

Neural Network Based Dialogue Management

Jan Pichl
May 14, 2017 at 8:41 pm

There are plenty of sub-tasks we need to deal with when we are developing chatbots. One of the most important sub-task is called intent recognition or dialogue management. It is a part of the chatbot which decides what the topic a user want to talk about is and then it runs the corresponding module or function.  Since it manages which part of the software should be triggered, it is called “dialogue management”. The “intent recognition” name comes from the fact that the decision is made according to the user intention which needs to be recognised.

All we want to do is to take the sentence of whatever just the user said and then decide which class is the most suitable. In Alquist, we have 16 classes. These classes correspond to the topics which the chatbot is capable of talking about.

We have experimented with several approaches to deal with this task. The first approach combines a logistic regressions classifier and cosine similarity of the GloVe word embeddings (similar to word2vec). The input of the classifier consists of the one-hot vectors of word uni- and bi-grams. The classifier estimates the probabilities of the coarse-grained classes such as chitchat, question answering, etc. More fine-grained classes are estimated using the cosine similarity distance between the average vector of the embeddings of the words from input sentence and the average of the embeddings of the words from reference sentences. The accuracy of this combined approach is 78%.

Another approach uses a neural network as an intent recognizer. The neural network has three different inputs. The first one is the actual utterance, the second one consists of the sequence of the concepts and the third one is the class of the previous utterance. The concepts are retrieved using heuristic linguistic rules and Microsoft Concept Graph. The previous label is the output of the very same neural network for the previous utterance or “START”  if the utterance is the first message. The structure of the neural network is shown in the following figure.

Network Structure

The network consists of separate convolutional filter for input sentence and the list of concepts. We use the filters of lengths from 1 to 5. Max pooling layer follows the convolutional layer and the outputs of the input sentence and the concepts branches are concatenated. Additionally, the class of the previous utterance is concatenated to the vector as well. Finally, we use two fully connected layers with dropout in the architecture.

This neural network based approach achieves the accuracy of 84% and it represents a more robust solution for the presented task. Additionally, it takes advantage of the information about the previous class which is often a crucial feature.


Recap of the last months aka. how we teach Alquist to talk

Petr Marek
April 8, 2017 at 5:09 pm
Roman working on Alquist

Lots of things happened during the last months. The biggest new is that Alquist will go to testing this Monday. Despite the original plan, it will be available only to the Amazon employees for next thirty days, but still, they are the real users. I can’t wait to see how Alquist will perform.

The Alquist evolved a lot since the last blog post. It progressed from some simple and without any purpose conversations to the focused speaker. Alquist can now speak about movies, sports results, news, holidays and encyclopedic knowledge, and about books and video games very soon.

How do we know which topics Alquist should learn? Amazon offered all teams the possibility to run closed beta test. We used this opportunity of course, as some of you might know. We decided to make more “open” beta test because we had space for 2000 testers. So we publicly announced the test and waited for the result. We used an only tiny fraction of space available, to be honest. But it was still enough to learn from mistakes and to find ideas how to improve Alquist. I would like to thank all of you, who helped us. Thank you!

The public launch should happen at the beginning of May. Until then you can follow the progress of Alquist on the Twitter or Facebook, where you can find some cool demo videos of Alquist in action.

Voice Controlled Smart Home

Petr Kovar
March 24, 2017 at 3:29 pm

Do you remember Iron Man’s personal assistant called J.A.R.V.I.S.? It is just a fictional technology from a superhero movie, and I am getting close with HomeVoice. HomeVoice is designed to become your personal voice controlled assistant whose primary task is to control and secure your smart home. You can switch the lights, ask for a broad range of values (temperature, humidity, light states, etc.), manage your smart home devices and also provide the HomeVoice with your feedback to make it even better.

Let’s start at the beginning. My name is Petr Kovar, and I study cybernetics and robotics at CTU in Prague. I came to eClub Prague more than a year ago to participate in the development of Household Intelligent Assistant called Phoenix. Under the supervision of Jan Sedivy I built-up sufficient know-how about speech recognition, natural language understanding, speech synthesis and bots in general. A few months later I turned to Jan Sedivy again for help with a specification of my master’s thesis.

As time went on, we decided to utilize the accumulated experience for the development of a voice controlled smart home. I started with the selection of smart home technology. I decided to use Z-Wave the leading wireless home control technology in the market. I have selected the Raspberry Pi as a controller. It runs the Raspbian equipped with Z-Wave module and Z-Way control software.

The main task was to monitor my house by voice using a mobile device. I decided to write an Android app called HomeVoice. The app turns any Android tablet or smartphone into a smart home remote control. It works both locally and over the internet (using remote access via Whereas other Z-Way Android apps offer only one-way communication (tablet downloads data from the control unit on demand), HomeVoice receives the push notifications informing the user as soon as a control unit discovers an alarm or something urgent. Imagine that you are at work when suddenly a fire occurs in your home. HomeVoice informs you about it in less than 500 ms which gives you enough time to ensure appropriate rescure actions.

HomeVoice supports custom hot-word detection (similar to “Hey, Siri” or “Ok, Google”), transcribes speech to text, understands natural language and responds using synthesized speech. Many different technologies are used to achieve this behavior from CMUSphinx (hot-word detection), through SpeechRecognizer API and cloud service (natural language understanding) to TextToSpeech API (speech synthesis). HomeVoice interconnects all these technologies into a complex app and adds its context processing and dialog management.

It is still quite far from Iron Man’s J.A.R.V.I.S., but I hope that someday HomeVoice will become the usefull smart home assistant.

Automatic ontology learning from semi-structured data

Filip Masri
March 16, 2017 at 5:48 pm
Reconstruction of the relations in the table.

Today I am going to write about the topic of my diploma thesis “Automatic ontology learning from semi-structured data.” I try to exploit semi-structured data like web tables (<html>) to create domain specific ontologies.

What is an ontology?

The term ontology was specified by Thomas Gruber as “An ontology is a specification of a conceptualization. That is, an ontology is a description (like a formal specification of a program) of the concepts and relationships that can exist for an agent or a community of agents.”
Two basic building blocks of an ontology are concepts and relations. Concepts represent classes of entities, and their individual members are called instances. Relations among concepts are called semantic relations. Moreover, ontologies can be created for different domains and serve as a foundation for a knowledge database containing instances of given concepts.

What is the approach of the proposed work?

Lots of domain specific information is presented on web pages in tabular data, for example in HTML <table> elements. However, retrieving suitable web tables from pages and reconstructing relations among its entities consists of several subtasks.
First, we have to identify proper tables have for retrieval from the pages. The process is called WEB table type classification. The WEB table header classification identifies rows/columns headers. Finally, the table has to be transformed into an ontology, and that process is called Table understanding.


WEB table type classification

There are several types of tables on the web, such as ENTITY, RELATION, MATRIX, LAYOUT, OTHER tables. A machine learning algorithm (Random forest) classifies these types. The classifier uses several features such as an average number of cells, image elements, header elements, cell length deviation, etc.


ENTITY TABLE - referring to a single entity

ENTITY TABLE – referring to a single entity


LAYOUT TABLE - positioning elements in the web page

LAYOUT TABLE – positioning elements in the web page


MATRIX TABLE - taking more complex relations

MATRIX TABLE – taking more complex relations


RELATION TABLE - taking more instances of the same class

RELATION TABLE – taking more instances of the same class


OTHER TABLE – tables where one is unsure about the content

OTHER TABLE – tables where one is unsure about the content

WEB table header classification

Once we identify the table type, we have to locate the table header. One would object that all header cells are marked with an <th> element. Unluckily, that is not true. Thus, a classification method (again Random Forest) was chosen in order to predict whether a table column/row is HEADER/DATA column/row. Table understanding depends a lot on a correct header location.

Table understanding

The final process is to mine relations among entities from the table. The relations are derived from a table annotated with header location marks. More specifically, the reconstruction of relations uses heuristic rules, resulting in a graph of entities, as shown in the following figure. The MobilePhone is a class. RAM and Item Weight are properties belonging to the MobilePhone, and they have a Quantitative value as its range. Finally, iOS is an instance of an OS class (Operating system) and belongs to the MobilePhone class.

Reconstruction of the relations in the table.

Reconstruction of the relations in the table.

What is the application?

This method can be applied when building domain specific knowledge databases that should be later integrated with more general ontologies/concepts. More domain ontologies are learned by crawling sites with similar content (like mobile phones on,, etc…). Derived ontologies differ in structure and content. Therefore, methods for merging the ontologies should be the next step in the project.

Part of the ontology generated by crawling

Part of the ontology generated by crawling

New projects – join us!

Jan Sedivy
March 13, 2017 at 8:03 am

eClub will again organize the Summer Camp (ESC). The ESC 2016 was incredibly successful. Five eClubbers worked on a question answering system YodaQA. At the end of summer, they have entered the Amazon Alexa Prize competition and got in between top twelve selected for development of a social bot. They received $100k scholarship for a social bot development. Currently, they are busy working on the first version of Alquist.

We would like to continue in the direction of developing dialog applications in ESC 2017. The social bot task is very challenging, and many required technologies are still in development. While developing YodaQA, we have looked at the well known classical NLP algorithms as well as to new mainly neural networks based ones, such as LSTM, GRU, etc. to process text in many different ways.

It is beginning of March, but we are already prepared to incubate new, eager students interested joining us on this journey toward smarter systems. We have strong support not only from Amazon but also from the local company Seznam. Seznam is one of the few competing successfully with Google on the domestic market. They are 100% Machine Learning company with a lot of problems of mutual interest.

Here is a sneak preview of the projects for this year. Join us, and work with us tomorrow, if you are interested. We are offering scholarship equivalent to what you would earn working for a company. We are moving to the new CVUT building, it is gorgeous, as soon as it opens, it is just a matter of weeks. Join us! You can start any time.

Automatic email reply generation
In this project, we want to research methods for automatic generation of short responses to emails or social networks messages. Specifically, on a cell phone, it can be a great advantage to select from a selection of semantically diverse replies. We want first cover few words long messages. The initial steps will include a review of Recurrent Neural Networks architectures and a meaningful training set construction.

Amazon Echo conversational application
We have a set of tasks we would like to cover as a spoken dialog. We want to design interactive conversational bots for Amazon Echo. We want the application to be engaging, entertaining and informative, bringing the user latest news from specific areas, such as sports, celebrities, movies, etc. This project is OK for students who are only entering the field with no or small experience.

Knowledge extraction
There is vast of information on the Internet. A lot of the information is in the form of a text; the information is unstructured. In this project, we want to review the methods for retrieving and extracting the information, learning the dependencies of statements in the texts. We want to create ontologies from a selected, limited content and store the knowledge for further use. These are very challenging problems but do not hesitate to join. We have students currently working on these topics. We know what the first steps are.

Text summarization
Journalists write the Internet news in a particular language, frequently using idioms, slang or infrequent expressions. In this project, we want to extract what is important and create a summary in a clear language. Initially, we want to summarize long sentences. Next, select a suitable method, implement and test it on a chosen domain.

Events Extraction from Text
This project is only an extension of the previous one. We want to design and implement a system for extracting events from the Internet. The primary goal is selecting news messages based on identified topics (or events). Extraction of economic events like mergers & acquisitions, stock splits, dividend announcements, etc., play an essential role in decision making, in risk analysis applications and monitoring systems.

Young Transatlantic Innovation Leaders Initiative

Radka Lamková
January 19, 2017 at 4:44 pm
young innovation leaders - sedivy


Emerging entrepreneurs and innovative community leaders can now apply for the 2017 Young Transatlantic Innovation Leaders Initiative (YTILI).  Sponsored by the U.S. Department of State., it will give 100 outstanding leaders from across Europe the chance to expand their leadership and entrepreneurial experience through fellowships at businesses and civil society organizations across the U.S. in 2017. Selected Fellows will build networks and lasting partnerships to attract investments and support for their entrepreneurial ventures.

Applications are due by February 6, 2017.  You can find more information on the YTILI  Fellowship eligibility criteria and the application instructions at:

In addition to the YTILI Fellowship Program, the YTILI Network will offer ongoing professional and networking opportunities to anyone interested in being connected.  As part of the YTILI Network, you’ll have the chance to connect with senior business and NGO leaders in the United States and Europe working to create change in their communities.

Copied from ytili