Data volumes are growing enormously, and we need to develop infrastructure and tools for processing the data and extracting information from it. These are the main areas of our work:
- Big Data and analytics are almost synonymous these days. We focus on many aspects of both.
- We offer the Metacentrum cluster with a large number of computers and storage.
- Data center optimization and modelling of power usage in virtualized data centers.
- Autoscaling / Hadoop scaling / cloud and web programming.
- Monitoring and visualization of infrastructure in OpenStack / cloud.
- Combining Apache Spark and REST APIs for large AI systems such as a question answering engine.
Big Data hints
We have a great introduction to Big Data. The materials come from the CVUT course Big Data Technologies (BDT). All lectures and accompanying materials are available online.
- BDT introduces the basics of creating an account, locating data, etc. in Metacentrum, the large data center we use in our research.
- BDT introduces Hadoop and MapReduce, including practical examples of the simplest algorithms, such as dictionary creation, word histograms, an inverted index for full-text search, and simple HBase usage.
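The simplest MapReduce algorithms mentioned above (dictionary creation, word histogram, inverted index) can be illustrated without a cluster. The following is only a minimal single-machine sketch in plain Python, not the course's actual code and not the Hadoop API: the toy corpus `DOCS` and the helper names `run_job`, `shuffle`, `map_word_count` and `map_inverted_index` are hypothetical and chosen for illustration.

```python
from collections import defaultdict

# Hypothetical toy corpus: document id -> text.
DOCS = {
    "d1": "big data needs big tools",
    "d2": "data tools for search",
}

def map_word_count(doc_id, text):
    """Map step: emit (word, 1) for every word in a document."""
    for word in text.lower().split():
        yield word, 1

def map_inverted_index(doc_id, text):
    """Map step: emit (word, doc_id) for every word in a document."""
    for word in text.lower().split():
        yield word, doc_id

def shuffle(pairs):
    """Group emitted values by key, as a framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def run_job(docs, mapper, reducer):
    """Run one map -> shuffle -> reduce pass over all documents."""
    pairs = (pair for doc_id, text in docs.items() for pair in mapper(doc_id, text))
    return {key: reducer(values) for key, values in shuffle(pairs).items()}

# Word histogram: the reducer sums the 1s emitted for each word.
histogram = run_job(DOCS, map_word_count, sum)

# Inverted index: the reducer keeps the distinct documents containing each word.
inverted_index = run_job(DOCS, map_inverted_index, lambda ids: sorted(set(ids)))

# Dictionary: the sorted set of distinct words is just the histogram's keys.
dictionary = sorted(histogram)

print(histogram["big"])        # 2
print(inverted_index["data"])  # ['d1', 'd2']
```

On a real Hadoop cluster the map and reduce functions keep the same shape, while the framework takes care of distributing the input and performing the shuffle between nodes.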
Our interest in Big Data analytics covers many different directions. Here is a list of the technologies we have worked on recently:
- Search results ranking – SERP ranking
- Advanced search query processing
- Learning to Rank algorithms for ordering SERP or Questions
- Categorization of text documents and product descriptions
- Search engines based on Solr, Elasticsearch etc.
- Automatic categorization and catalog generation from e-shop web pages
- Picture categorization
- REST APIs for analytics, creation and testing on Amazon Web Services
- Creation of training databases
- Selection and utilization of deep neural network frameworks (Caffe, Keras, TensorFlow etc.)
- Running experiments
- Spam filtering – classification for filtering mail, newsletter, phishing, and other spam.
- Focused web crawling – find all mentions of an XY item.