
Big Data

Data is growing at an enormous rate, and we need to develop infrastructure and tools for processing it and extracting information from it. These are the main areas of our work:

    • Big Data and analytics are almost synonymous these days. We focus on many aspects of both.
    • We offer the Metacenter cluster with a large number of computers and a large amount of storage.
    • Data center optimization and modelling of power usage in virtualized data centers.
    • Autoscaling, Hadoop scaling, cloud and web programming.
    • Monitoring and visualization of infrastructure in OpenStack and the cloud.
    • Combining Apache Spark and REST APIs for large AI systems such as a question answering engine.

Big Data hints

We have a great introduction to Big Data. The materials come from the CVUT course Big Data Technologies (BDT). All lectures and the accompanying materials are available online.

    • BDT introduces the basics of creating an account, locating data, etc. in Metacentrum, the large data center we use in our research.
    • BDT introduces Hadoop and MapReduce, and includes practical examples of the simplest algorithms, such as dictionary creation, a histogram of words, an inverted index for full-text search, and simple HBase usage.
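
To give a flavour of the simplest MapReduce algorithms mentioned above, here is a minimal in-memory sketch in plain Python. It is not taken from the BDT course materials and does not use Hadoop itself. The function names (`map_word_count`, `map_inverted_index`, `shuffle`, `run_job`) and the toy corpus are hypothetical and chosen only for illustration. It shows how a word histogram and an inverted index can both be expressed as a map step, a shuffle step, and a reduce step; the dictionary is simply the set of keys of the histogram.

```python
from collections import defaultdict
from itertools import chain


def map_word_count(doc_id, text):
    """Map step: emit (word, 1) for every word occurrence in one document."""
    for word in text.lower().split():
        yield word, 1


def map_inverted_index(doc_id, text):
    """Map step: emit (word, doc_id) once per distinct word in one document."""
    for word in set(text.lower().split()):
        yield word, doc_id


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def run_job(documents, mapper, reducer):
    """Run map, shuffle, and reduce in memory (a real cluster distributes these)."""
    pairs = chain.from_iterable(
        mapper(doc_id, text) for doc_id, text in documents.items()
    )
    return {key: reducer(values) for key, values in shuffle(pairs).items()}


# Hypothetical toy corpus, for illustration only.
DOCUMENTS = {
    "doc1": "big data needs big tools",
    "doc2": "hadoop tools process data",
}

# Histogram of words: the reducer sums the emitted ones.
histogram = run_job(DOCUMENTS, map_word_count, sum)

# Inverted index: the reducer sorts the document ids that contain each word.
inverted_index = run_job(DOCUMENTS, map_inverted_index, sorted)

# Dictionary creation: the distinct words are the keys of the histogram.
dictionary = sorted(histogram)
```

On a Hadoop cluster the mapper and reducer would run as distributed tasks and the shuffle would be handled by the framework; the sketch only mirrors the data flow.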



Our interest in Big Data analytics spans many different directions. Here is a list of the technologies we have worked on recently:

    • Search results ranking – SERP ranking
    • Advanced search query processing
    • Learning to Rank algorithms for ordering SERPs or questions
    • Categorization of text documents and product descriptions
    • Search engines based on Solr, Elasticsearch, etc.
    • Automatic categorization and catalog generation from e-shop web pages
    • Picture categorization
    • REST APIs for analytics, creation and testing on Amazon Web Services
    • Creation of training databases
    • Selection and utilization of deep neural network frameworks (Caffe, Keras, TensorFlow, etc.)
    • Running experiments
    • Web page JavaScript programming for text extraction
    • Spam filtering – classification for filtering mail, newsletter, phishing, and other spam
    • Focused web crawling – finding all mentions of a given item