How to use Near Real Time Search in Solr

As you might know, Solr is preparing a cool new feature for its 4.0 release: Near Real Time Search. With it, the search engine can perform in-memory commits (a.k.a. soft commits) that make new documents searchable without a real commit, which can cost your users a few seconds of bad performance.

To use this feature you first have to work with the Solr trunk, since it has not been released yet (Solr is still at version 3.6; hopefully 4.0 will be released at the end of this year). If you want one of the trunk versions, read the nightly builds section, where you can download the version most suitable for your purposes. In my case I decided to use the apache.snapshots repository for Maven.

In the <repositories> section of your pom.xml you must add:

<repository>
  <id>apache.snapshots</id>
  <name>Apache Snapshots Repository</name>
  <url>http://repository.apache.org/snapshots</url>
  <releases>
    <enabled>false</enabled>
  </releases>
</repository>

And in the <dependencies> section of your pom.xml you must add:

<dependency>
  <groupId>org.apache.solr</groupId>
  <artifactId>solr-core</artifactId>
  <version>4.0-SNAPSHOT</version>
  <scope>compile</scope>
  <type>jar</type>
</dependency>

With these two changes you can run your code against the new version of Solr, and you will probably find a large (if you’re lucky, not so large) list of incompatibilities. In my case I had to make these changes in solrconfig.xml:

  1. The sections <indexDefaults> and <mainIndex> are replaced by a single section called <indexConfig>.
  2. You must change the Lucene version in the <luceneMatchVersion> tag to LUCENE_40.
  3. All the lib dependencies must point to libraries from Solr 4.0, so you must download a compiled version of Solr 4.0 to have these libraries in your installation: <lib dir="${link_to_your_installation}" />
  4. Add the 4.0 update handler <requestHandler name="/update" class="solr.UpdateRequestHandler"/> in the requestHandler area. This handler is designed for the new version and accepts updates in both JSON and XML.

Furthermore, in schema.xml I had to replace my instances of ISOLatin1AccentFilterFactory with ASCIIFoldingFilterFactory. ISOLatin1AccentFilterFactory was the filter responsible for replacing accented characters in the ISO Latin 1 character set (ISO-8859-1) with their unaccented equivalents. ASCIIFoldingFilterFactory is a little more complete, but in my case I wanted to preserve the same behaviour, so I renamed my file mapping-ISOLatin1Accent.txt to mapping-FoldToASCII.txt.
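
To get a feel for what accent folding does, here is a small stand-alone Java sketch. It does not use Solr's filters at all: it relies on the JDK's java.text.Normalizer as a stand-in, and the names AccentFoldingDemo and fold are made up for this illustration. The real ASCIIFoldingFilterFactory covers many more characters than this.

```java
import java.text.Normalizer;

public class AccentFoldingDemo {

    // Decomposes accented characters (NFD) and drops the combining marks,
    // which leaves the unaccented base letters. This is only an illustration
    // of the idea, not the algorithm used by the Solr filters.
    static String fold(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    public static void main(String[] args) {
        // The input contains an e with an acute accent and an n with a tilde.
        System.out.println(fold("caf\u00e9 espa\u00f1ol")); // prints: cafe espanol
    }
}
```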

It is also very likely that you will have to make some changes in your code, since there are a lot of changes between Solr 3.x and 4.0. In my case I had to slightly change the way I initialize the EmbeddedSolrServer, which now requires a solr.xml that defines the cores you want to create.

Once your compatibility issues are solved you can write your soft commit code. After looking at different options for performing the soft commit, I followed the instructions in this guide (UpdateJSON). This way you can send your data to Solr as JSON via curl, which is much easier than using SolrJ or a curl update via XML.

For each SolrInputDocument that you want to commit you must create a JSON version of it:

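// Copy every non-null field of the SolrInputDocument into a JSON object.
JSONObject j = new JSONObject();
for (String key : doc.keySet()) {
  Object v = doc.getFieldValue(key);
  if (v != null) {
    // Values are sent as strings.
    j.put(key, v.toString());
  }
}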

This would be the structure of the JSON you have to send:

{
  "add": {"doc": {"id" : "TestDoc1", "title": "test1"} },
  "add": {"doc": {"id" : "TestDoc2", "title": "another test"} }
}
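
One practical detail: the body repeats the "add" key, and map-based JSON objects generally keep only one value per key, so the body may have to be assembled as text. A minimal sketch, using plain Map<String, String> values as stand-ins for the documents; the class and method names are hypothetical and the escaping covers only the basic cases:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class UpdateBodyBuilder {

    // Wraps a value in double quotes and escapes the characters that
    // must not appear raw inside a JSON string.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    // Serializes one document (field name -> value) as a flat JSON object.
    static String docToJson(Map<String, String> doc) {
        StringBuilder sb = new StringBuilder("{");
        String sep = "";
        for (Map.Entry<String, String> e : doc.entrySet()) {
            sb.append(sep).append(quote(e.getKey())).append(":").append(quote(e.getValue()));
            sep = ",";
        }
        return sb.append("}").toString();
    }

    // Emits one "add" command per document, repeating the "add" key.
    static String buildUpdateBody(List<Map<String, String>> docs) {
        StringBuilder sb = new StringBuilder("{");
        String sep = "";
        for (Map<String, String> doc : docs) {
            sb.append(sep).append("\"add\":{\"doc\":").append(docToJson(doc)).append("}");
            sep = ",";
        }
        return sb.append("}").toString();
    }

    public static void main(String[] args) {
        Map<String, String> doc1 = new LinkedHashMap<>();
        doc1.put("id", "TestDoc1");
        doc1.put("title", "test1");
        Map<String, String> doc2 = new LinkedHashMap<>();
        doc2.put("id", "TestDoc2");
        doc2.put("title", "another test");
        System.out.println(buildUpdateBody(Arrays.asList(doc1, doc2)));
    }
}
```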

And finally you can use the solr.UpdateRequestHandler to perform a soft commit via curl:

String cmd = "curl 'http://localhost:8389/solr/update?commit=true&softCommit=true' " + 
             " -H 'Content-type:application/json' -d '" + jsonObject.toString() + "'";
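
If you would rather not shell out to curl (a command line built by concatenation breaks as soon as the JSON contains a single quote), the same request can be sent from Java with the JDK's HttpURLConnection. A sketch, assuming the same host, port and parameters as the curl command above; SoftCommitClient and its methods are names made up for this example:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class SoftCommitClient {

    // Builds the update URL with the same parameters as the curl command.
    static String updateUrl(String baseUrl) {
        return baseUrl + "/update?commit=true&softCommit=true";
    }

    // Posts a JSON update body to Solr and returns the HTTP status code.
    static int postUpdate(String baseUrl, String jsonBody) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) new URL(updateUrl(baseUrl)).openConnection();
        try {
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type", "application/json");
            try (OutputStream out = conn.getOutputStream()) {
                out.write(jsonBody.getBytes(StandardCharsets.UTF_8));
            }
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) {
        String body = "{\"add\":{\"doc\":{\"id\":\"TestDoc1\",\"title\":\"test1\"}}}";
        try {
            // Base URL taken from the curl example; adjust it to your installation.
            int status = postUpdate("http://localhost:8389/solr", body);
            System.out.println("Solr answered with HTTP " + status);
        } catch (IOException e) {
            System.err.println("Could not reach Solr: " + e.getMessage());
        }
    }
}
```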

With these changes you will be able to see and search documents in your index in near real time. This kind of feature is perfect for applications centered on user interaction. Keep in mind that soft commits are in-memory, so from time to time you will have to perform a real commit, or run a full import of your user data after a crash. Feel free to comment if you follow the tutorial.

