Released the date-extraction library for Ruby

Hi all, on my GitHub account you will find a Ruby version of the date.extraction library: https://github.com/raimonbosch/ruby-date-extraction This project uses regular grammars to understand raw dates and convert them into the string format "%Y-%m-%d %H:%M:%S", although it can also deliver a timestamp or any other format you may need. The methodology can…
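To give an idea of the approach, here is a minimal Python sketch of regular-grammar date extraction. It is not the grammar used in ruby-date-extraction: the two patterns, the English month table and the helper names (`extract_dates`, `to_string`, `to_timestamp`) are hypothetical choices for this example. Each pattern is a regular expression that recognizes one way of writing a date; matches are normalized into a `datetime` and then rendered as "%Y-%m-%d %H:%M:%S", any other strftime format, or a Unix timestamp.

```python
import re
from datetime import datetime, timezone

# Month names recognized by this sketch (English only; an assumption).
MONTHS = {name: number for number, name in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"], start=1)}

# Each regular expression is one small regular grammar for a date shape:
#   "12 March 2016" with an optional "14:30" time, and "2016/03/12".
PATTERNS = [
    re.compile(r"\b(?P<day>\d{1,2})\s+(?P<month>[A-Za-z]+)\s+(?P<year>\d{4})\b"
               r"(?:\s+(?P<hour>\d{1,2}):(?P<minute>\d{2}))?"),
    re.compile(r"\b(?P<year>\d{4})/(?P<month>\d{1,2})/(?P<day>\d{1,2})\b"),
]


def extract_dates(text):
    """Return the dates found in text, in order of appearance, as datetimes."""
    found = []
    for pattern in PATTERNS:
        for match in pattern.finditer(text):
            parts = match.groupdict()
            month = parts["month"]
            month_number = int(month) if month.isdigit() else MONTHS.get(month.lower())
            if month_number is None:
                continue  # the word in the month slot is not a month name
            try:
                date = datetime(int(parts["year"]), month_number, int(parts["day"]),
                                int(parts.get("hour") or 0), int(parts.get("minute") or 0))
            except ValueError:
                continue  # e.g. "31 February 2016" or an hour above 23
            found.append((match.start(), date))
    return [date for _, date in sorted(found)]


def to_string(date, fmt="%Y-%m-%d %H:%M:%S"):
    """Render a date in the default "%Y-%m-%d %H:%M:%S" format or any other."""
    return date.strftime(fmt)


def to_timestamp(date):
    """Unix timestamp, treating the extracted date as UTC (an assumption)."""
    return int(date.replace(tzinfo=timezone.utc).timestamp())
```

Supporting a new way of writing dates means adding one more pattern to `PATTERNS`, which keeps each rule small and easy to test in isolation.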

UCC 2016, Shanghai, A Methodology for Full-System Power Modeling in Heterogeneous Data Centers

On 6 December I was in Shanghai presenting one of our latest papers, written at the Barcelona Supercomputing Center, at the 9th IEEE/ACM International Conference on Utility and Cloud Computing (UCC 2016). Our current research focuses on energy-aware management for modern distributed computing systems. Our goal is to develop management algorithms for…

Multihost configuration on Docker

Full tutorial here: http://chunqi.li/2015/11/09/docker-multi-host-networking/ This is an excellent option if your system is based on Docker and you want to enable communication between containers placed on different hosts. Using the overlay network, each container gets a 10.0.0.X IP address that is visible to all containers regardless of their host. In our use case we have used…

Probabilistic actions to extract artists from a text

Author: Raimon Bosch
Abstract: One of the main problems of software technologies is that they often provide only one possible solution to a problem. This forces us to build human validation systems to cover the less general cases, those exceptions that a computer can't detect. But what if we provide several solutions…

Sentiment Analysis: Incremental learning to build domain models

Raimon Bosch, Master's thesis, Intelligent Interactive Systems, Universitat Pompeu Fabra (2013), Prof. Dr. Leo Wanner
Abstract: Nowadays, social contacts are vital for finding relevant content. We need to connect with people who share our interests because they provide content that matters. It becomes clearer every day that, in the future, document recommendation will need to…

Personalization on search engines using social signals

Raimon Bosch, Departament de Tecnologies de la Informació i les Comunicacions (DTIC), Pompeu Fabra University
Abstract: This study analyzes state-of-the-art techniques for generating personalized search results. We focus on how users' interactions in social networks are being used to improve the user experience. We also investigate whether sentiment analysis…

Text Categorization with K-Nearest Neighbors using Lucene

Text categorization (also known as text classification or topic spotting) is the task of automatically sorting a set of documents into categories from a predefined set. Text categorization is a complex problem: to solve it, you need a variable for each important word in your text, though perhaps not for stopwords or very common…
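The K-Nearest Neighbors idea can be sketched in a few lines of Python. Note that this sketch does not use Lucene: it swaps in plain cosine similarity over word counts, whereas a Lucene-based classifier would rely on the index's own scoring to find the nearest documents. The stopword list, the toy training set and the function names are assumptions made for illustration. Each document becomes a vector with one variable per non-stopword word; a new text takes the majority category among its k most similar training documents.

```python
import math
import re
from collections import Counter

# A tiny, illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "on", "for"}


def vectorize(text):
    """Bag of words: one variable (its count) per non-stopword token."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(token for token in tokens if token not in STOPWORDS)


def cosine(u, v):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * v[word] for word, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def knn_classify(text, training, k=3):
    """training is a list of (document, category) pairs.

    Rank the training documents by similarity to text and return the
    most common category among the k nearest ones.
    """
    query = vectorize(text)
    scored = sorted(((cosine(query, vectorize(doc)), category)
                     for doc, category in training), reverse=True)
    votes = Counter(category for _, category in scored[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    training = [
        ("the striker scored a goal in the football match", "sports"),
        ("the team won the league match", "sports"),
        ("the goalkeeper saved a penalty", "sports"),
        ("the central bank raised interest rates", "economy"),
        ("inflation and interest rates worry the market", "economy"),
        ("the stock market fell on bank news", "economy"),
    ]
    print(knn_classify("a late goal decided the match", training))
```

With a real corpus you would keep the training vectors precomputed (or indexed) instead of re-vectorizing them on every query, which is exactly the kind of work a search engine like Lucene does well.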

How to use Near Real Time Search in Solr

As you might know, Solr 4.0 introduces a cool new feature: Near Real Time (NRT) Search. With it, our search engine can perform in-memory commits, a.k.a. soft commits, without having to perform a full (hard) commit, which can cause several seconds of poor performance for your users. If you…
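Soft commits can be requested on an individual update call. Below is a minimal Python sketch, assuming a Solr 4 instance at a hypothetical `http://localhost:8983/solr` with a core called `collection1`: the `softCommit=true` parameter asks for an in-memory commit that makes newly added documents searchable, while `commit=true` triggers a hard commit that flushes to disk. The helper `commit_url` is our own name, not part of any Solr client library.

```python
from urllib.parse import urlencode


def commit_url(base_url, core, soft=True):
    """Build a Solr update URL requesting a soft (default) or hard commit."""
    params = {"softCommit": "true"} if soft else {"commit": "true"}
    return f"{base_url.rstrip('/')}/{core}/update?{urlencode(params)}"


if __name__ == "__main__":
    # Sending the request requires a running Solr instance, e.g.:
    #   from urllib.request import urlopen
    #   urlopen(commit_url("http://localhost:8983/solr", "collection1"))
    print(commit_url("http://localhost:8983/solr", "collection1"))
```

In practice you may prefer to let Solr trigger soft commits automatically with the `autoSoftCommit` setting in solrconfig.xml, keeping hard commits infrequent so users never notice the slowdown.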

How to create a Solr index and speed up your data

If you are designing a website and want a solid backend, Solr is an exceptional choice, not only because of its search capabilities and its integration with the Lucene ecosystem, but also because of its capacity to shard your data and deliver very good response times. But which is the best approach in order…