RDF databases

Filip Masri September 19, 2016 at 10:39 am


We are working on many different projects in eClub this summer. I am on the Chatbot team, and I am going to briefly describe my task.


What is the project about?

The general goal of the project is to create a chatbot application that helps users choose a suitable smartphone. The project is divided into several sub-problems: an application that communicates with the client in a specific workflow, entity recognition for parsing the intent of the client’s request, and finally retrieval of relevant information from a domain-specific database. I am focusing on analysing and solving the last sub-problem: creating a database of smartphone data. This problem can itself be divided into several parts. First, the smartphone information has to be extracted from a chosen domain. To store the retrieved information effectively, we need to create a good database schema. Once the schema is created, the extracted data has to be transformed into a suitable format, and properties have to be assigned to relevant data types. Finally, the data is uploaded to the database. The task seems pretty simple. However, it is interesting because I have chosen to use an RDF database and the SPARQL query language, a combination that is becoming popular for storing knowledge in the Semantic Web and Linked Data.
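To make the "assign properties to data types" step more concrete, here is a minimal sketch of how one scraped record might be converted into typed values before upload. The field names, the `ex:` prefix, and the sample record are all invented for illustration; the real schema of the project is not shown in this post.

```python
import re

def to_number(text):
    """Extract the first number from a string such as '5.1 in' or '4 GB'."""
    match = re.search(r"\d+(?:\.\d+)?", text)
    if match is None:
        raise ValueError(f"no number found in {text!r}")
    number = match.group()
    return float(number) if "." in number else int(number)

# Hypothetical mapping from raw field name to the function that types its value.
CONVERTERS = {
    "name": str.strip,
    "display": to_number,
    "ram": to_number,
    "dual_sim": lambda text: text.strip().lower() == "yes",
}

def to_triples(subject, raw_record):
    """Turn one raw record into (subject, predicate, typed object) statements."""
    return [
        (subject, f"ex:{key}", CONVERTERS[key](value))
        for key, value in raw_record.items()
    ]

# Made-up raw record, as it might come out of a scraper (everything is a string).
raw = {"name": " Phone A ", "display": "5.1 in", "ram": "4 GB", "dual_sim": "yes"}

for statement in to_triples("ex:PhoneA", raw):
    print(statement)
```

Keeping the converters in one table means a new property only needs one new entry, which fits the idea of settling the schema first and transforming the data afterwards.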


What is RDF and why do we use it?

The Resource Description Framework (RDF) is a framework based on triples. A triple consists of a subject, a predicate and an object. The subject is the source node of the relation, the predicate is the edge, and the object is the destination node. A triple is also called a statement. A statement generally captures a relation between two entities. An RDF database is therefore a store of interconnected statements that together form a directed graph. Such structures, also called Linked Data, simply mirror the relations between entities in the real world. RDF is easy to understand because it is close to human thinking. Also, the subject-predicate-object syntax matches the usual sentence structure, which could be relevant to text processing applications. More to read here.
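As a rough illustration of the triple model, the sketch below stores a few statements as plain Python tuples and walks the resulting directed graph. It is not a real RDF store, and every identifier (`ex:PhoneA`, `ex:madeBy`, and so on) is made up for the example.

```python
from collections import defaultdict

# Each statement is a (subject, predicate, object) triple.
triples = [
    ("ex:PhoneA", "ex:madeBy", "ex:BrandX"),
    ("ex:PhoneA", "ex:displayInches", 5.1),
    ("ex:PhoneB", "ex:madeBy", "ex:BrandX"),
    ("ex:BrandX", "ex:basedIn", "ex:CountryY"),
]

def outgoing_edges(triples):
    """Index the triples as a directed graph: subject -> [(predicate, object), ...]."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return graph

def follow(graph, start, *predicates):
    """Walk from `start` along a chain of predicates and return the nodes reached."""
    frontier = [start]
    for predicate in predicates:
        frontier = [
            obj
            for node in frontier
            for edge, obj in graph.get(node, [])
            if edge == predicate
        ]
    return frontier

graph = outgoing_edges(triples)

# "Where is the maker of PhoneA based?" is two hops through the graph.
print(follow(graph, "ex:PhoneA", "ex:madeBy", "ex:basedIn"))
```

Because the object of one statement can be the subject of another, the statements link up into a graph without any extra join tables.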


What is SPARQL and why do we use it?

SPARQL is a query language for RDF databases. It is human-readable (easy to understand even for non-programmers) and makes it simple to express complex queries, which is a big advantage compared to SQL.
More to read here.
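To give a feel for how a SPARQL query reads, the sketch below shows a small query as a string, followed by a toy matcher that evaluates the same two triple patterns over an in-memory list. The matcher only imitates basic pattern matching and is not a SPARQL engine; the data and the `ex:` names are invented for the example.

```python
# The query as it would be written in SPARQL (shown for reference, not executed).
SPARQL_QUERY = """
PREFIX ex: <http://example.org/>
SELECT ?phone ?size
WHERE {
  ?phone ex:madeBy ex:BrandX .
  ?phone ex:displayInches ?size .
}
"""

# Made-up data.
triples = [
    ("ex:PhoneA", "ex:madeBy", "ex:BrandX"),
    ("ex:PhoneA", "ex:displayInches", 5.1),
    ("ex:PhoneB", "ex:madeBy", "ex:BrandX"),
    ("ex:PhoneB", "ex:displayInches", 4.7),
    ("ex:PhoneC", "ex:madeBy", "ex:BrandZ"),
    ("ex:PhoneC", "ex:displayInches", 5.5),
]

def is_variable(term):
    return isinstance(term, str) and term.startswith("?")

def match(patterns, triples, binding=None):
    """Yield every variable binding that satisfies all triple patterns."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        for term, value in zip(first, triple):
            if is_variable(term):
                # Bind the variable, or reject if it is already bound differently.
                if candidate.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield from match(rest, triples, candidate)

# The WHERE clause above, written as Python tuples.
patterns = [
    ("?phone", "ex:madeBy", "ex:BrandX"),
    ("?phone", "ex:displayInches", "?size"),
]

for row in match(patterns, triples):
    print(row["?phone"], row["?size"])
```

The point to notice is that the query is just a list of statements with holes in them; the shared variable `?phone` is what connects the two patterns.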


Is it worth it?

RDF + SPARQL is really easy to use and also extensible, in the sense that you can combine your data and queries with ontologies available on the web, such as http://schema.org. However, our knowledge domain (mobile phone data) is quite specific and small, so the next question would be: “What about the performance?”


What about the performance?

According to the Berlin SPARQL Benchmark, the combination of SPARQL + RDF database performs better than SPARQL + relational database. However, the results also show that SQL + relational database is much faster than SPARQL + RDF database for certain queries. That could be a stumbling block for the SPARQL + RDF combination in production environments for domains with few entities.


And further steps?

I am going to compare the performance of SPARQL + RDF database with SQL + relational database on my own server. The relational database will be a simplified database of mobile phones.
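A comparison like this could start from a small timing harness such as the sketch below. It only covers the relational side, using an in-memory SQLite database filled with synthetic rows; the table layout, the query, and the row count are assumptions made for the example, not the actual setup. The RDF side would plug its own query function into the same `time_query` helper.

```python
import sqlite3
import timeit

# Simplified, hypothetical phone table with synthetic data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE phone (id INTEGER PRIMARY KEY, brand TEXT, display_inches REAL)"
)
rows = [(i, f"Brand{i % 5}", 4.0 + (i % 20) / 10) for i in range(1000)]
conn.executemany("INSERT INTO phone VALUES (?, ?, ?)", rows)
conn.commit()

def run_sql_query():
    """One representative lookup: phones of a brand with a large enough display."""
    cursor = conn.execute(
        "SELECT id FROM phone WHERE brand = ? AND display_inches >= ?",
        ("Brand1", 5.0),
    )
    return cursor.fetchall()

def time_query(query_fn, repeats=100):
    """Average wall-clock seconds per call over `repeats` runs."""
    return timeit.timeit(query_fn, number=repeats) / repeats

seconds = time_query(run_sql_query)
print(f"SQL + relational: {seconds * 1e6:.1f} microseconds per query")
```

Running the same logical query through both stacks with the same helper keeps the measurement method identical, so any difference comes from the databases themselves.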


I hope you found this article interesting! Have a nice rest of the day :)