[2/25] Searchable: Full text indexed search in grails

Are you reading my blog’s feed? Be the first to know when I publish a new article by signing up to my feed and following me on Twitter!

Introduction

Groovy Version: 1.6
Grails Version: 1.1
Plugin Version: 0.5.3
Plugin Docs: http://www.grails.org/plugin/searchable
Download resources: source code screencast

-

Overview

The Searchable Plugin integrates Grails with what is, IMO, one of the most powerful open source libraries we have: the Apache Lucene project. I must admit that I’m a Lucene lover, ever since my last project, where I led a technical team for the largest Brazilian e-commerce company (and the fourth largest worldwide). The project was totally Lucene-driven and stored everything you see there in the index (yes, no database, believe me!): products, prices, categories, everything. Of course, the integration processes running backstage took full responsibility for updating product prices and other data. For this project we also used other important frameworks such as Apache Solr. I recommend you all look into Apache Lucene. It’s the base of the Compass project, which is the framework that the Searchable Plugin integrates into our app.

All of this gives us an excellent tool for indexing our domain classes so that they are searchable across our application. Searching the Lucene index is usually much lighter and faster than running a “LIKE” select against a relational database, because a LIKE with a leading wildcard generally has to scan every row while the index looks terms up directly, and that’s why it is so awesome. So, let’s do it!
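To make the idea concrete, here is a toy sketch of an inverted index in plain Groovy. It is only an illustration of the principle and not how Lucene is actually implemented; the sample posts and the `search` closure are made up for this example. Note also that it matches whole terms, while LIKE matches substrings.

```groovy
// Toy documents standing in for indexed posts (hypothetical data).
def posts = [
    1: 'Grails makes web development fun',
    2: 'Searching with Lucene and Compass',
    3: 'Groovy and Grails in production'
]

// Build the inverted index once: term -> sorted set of post ids containing it.
def index = [:]
posts.each { id, text ->
    text.toLowerCase().split(/\W+/).each { term ->
        if (!index.containsKey(term)) {
            index[term] = new TreeSet()
        }
        index[term] << id
    }
}

// A lookup is a single map access; nothing is scanned at query time.
def search = { String term -> index[term.toLowerCase()] ?: new TreeSet() }

println search('grails')   // ids of posts 1 and 3
println search('lucene')   // id of post 2
```

The cost moves from query time to indexing time, which is the trade-off the plugin makes for us whenever a domain instance is saved.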

-

Download and Install

For this example, we’ll create an application that searches our posts archive! I won’t save a lot of fake news articles in our bootstrap (as everybody is used to doing). Instead, I’ll use this tutorial to also show how to read a remote RSS feed! So, I will ask Technorati what people are writing about Groovy, and we’ll search this data. I believe that this is a more realistic example :)

We’ll have a simple Post class that has only the post title, link and text, and make it searchable.

Creating the application

grails create-app postsearch
cd postsearch
grails install-plugin searchable
grails create-domain-class Post

This is the Post class

class Post {
    String title
    String link
    String body

    static searchable = true
    static constraints = {
        //constraints...
    }
}

Note that doing this:

static searchable = true

we are telling the Searchable plugin that all instances of this domain class have to be indexed so we can search them later.

Take a look, now in action:

screencast

-

Technorati Integration

To get the Technorati feed we’ll search, I built a simple class that fetches the search results feed, iterates over the results and saves one post for each entry. On Technorati, I’ll search for the following words: groovy, grails, java, griffon, springsource, g2one, acegi, groovyws, and codehaus. This will give us approximately 200 posts. I’ll create a simple controller that does just this.

grails create-controller technorati

and this is its content

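class TechnoratiController {
    // Fetches the Technorati search feed for each word and saves one Post per entry
    def index = {
        def totalPosts = 0
        def words = ['groovy', 'grails', 'java', 'griffon', 'springsource',
                'g2one', 'acegi', 'groovyws', 'codehaus']
        words.each { word ->
            def rss = "http://feeds.technorati.com/search/${word}"
            // parse() downloads the feed from the URL and parses it
            def rssObj = new XmlSlurper().parse(rss)
            // the root node is <rss>, so channel.item walks every <item> entry
            rssObj.channel.item.each { item ->
                def post = new Post(title: item.title.toString(),
                        link: item.link.toString(),
                        body: item.description.toString())
                // save() returns null when validation fails, so only saved posts are counted
                if (post.save())
                    totalPosts++
            }
        }
        render "${totalPosts} posts indexed"
    }
}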

Maybe we can turn this into a plugin later! :) That’s it, no view for it, we just need to request it to feed our database.
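If you want to try the feed parsing without hitting Technorati, here is a minimal sketch that runs the same GPath navigation over a small hand-written RSS string. The sample XML and the `posts` list are made up for illustration, and it assumes the Groovy version used in this tutorial, where XmlSlurper is available without an import (newer Groovy versions moved it to the groovy.xml package).

```groovy
// A tiny hand-written RSS 2.0 document (hypothetical sample data).
def xml = '''<rss version="2.0">
  <channel>
    <title>Sample feed</title>
    <item>
      <title>Grails 1.1 released</title>
      <link>http://example.com/grails-1-1</link>
      <description>What is new in Grails 1.1</description>
    </item>
    <item>
      <title>Groovy tips</title>
      <link>http://example.com/groovy-tips</link>
      <description>Closures and builders</description>
    </item>
  </channel>
</rss>'''

// parseText works on a String; parse(uri) in the controller downloads the document first.
def rss = new XmlSlurper().parseText(xml)

// The root node is <rss>, so channel.item addresses every <item> element.
def posts = []
rss.channel.item.each { item ->
    posts << [title: item.title.text(), link: item.link.text(), body: item.description.text()]
}

posts.each { println "${it.title} -> ${it.link}" }
```

In the controller, each of these maps corresponds to one Post instance that gets saved and, because the class is searchable, indexed.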

-

Searching with SearchableController

After this you can go to the SearchableController that is installed in our application:

http://localhost:8080/postsearch/searchable

Try searching for “grails” or any other word that may have been in our Technorati posts.

Note that this view uses the toString() method, so let’s beautify it.

String toString() {
    return "${title}: ${body}"
}

SearchableController screen

-

Changing the way fields are indexed

Our Post class is indexed using the default configuration of the Searchable plugin, and that’s not the best way, since the post URL is indexed as well and currently has the same relevance as the title (this is wrong, believe me). IMO, the link should not be indexed, just the title and the text of the post, and the title is much more important than the description.

To do this, we’ll use some plugin options. This plugin has A LOT of options (it deserves a book of its own, really), and all the options are described here. I strongly recommend that you read this if you use this plugin in your production environment.

Here we’ll just stick to the basics: we’ll exclude a property from being indexed and boost one field (title) that is more important. This means that when you search for “grails”, posts with “grails” in the title will get a higher score than posts with “grails” only in the body.

Excluding link from being indexed

This is easy! We’ll replace static searchable = true with a mapping closure that uses the ‘except’ property.

static searchable = {
    except = ['link']
}

That’s it, the link will not be indexed anymore. It’s recommended to index ONLY the properties you’ll really need, otherwise your Lucene index can grow quite large.

Boosting the title

This is even easier than the last one (I don’t remember anything being difficult with Grails). We’ll add the boost option to our title, and this is the final mapping closure:

static searchable = {
    except = ['link']
    title boost: 2.0
}

This will give our searches what we really want.
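Putting the pieces from this tutorial together, the whole domain class would look roughly like the sketch below. Nothing here is new; it just combines the fields, the mapping closure and the toString() method shown earlier.

```groovy
class Post {
    String title
    String link
    String body

    // Index everything except the link, and give title matches twice the weight
    static searchable = {
        except = ['link']
        title boost: 2.0
    }

    static constraints = {
        //constraints...
    }

    // Used by the SearchableController view to display each hit
    String toString() {
        return "${title}: ${body}"
    }
}
```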

-

Searching – Domain classes

Once installed, the plugin adds some methods to the domain classes marked as searchable, and these methods search the index. Here I’ll explain some of the most important ones.

search

The main method of this plugin. It searches across all instances of this domain class for the requested string (and options).

def postsSearchResult = Post.search("grails")
def postsOrderedSearchResult = Post.search("grails", [sort: 'title'])

Remember that sorting search results by a field is often not a good idea, since the hits are then no longer ordered by the relevance score that Lucene gives each one.
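To show what you can do with the object that search returns, here is a minimal sketch of a controller action that pages through the hits. The controller name and the `q` request parameter are hypothetical; `offset` and `max` are search options, and `results` and `total` are properties of the search result as I understand the plugin docs, so double check them against the version you use.

```groovy
class PostSearchController {    // hypothetical controller, not generated by the plugin
    def index = {
        if (!params.q?.trim()) {
            render "Please pass a query in the q parameter"
            return
        }
        // offset and max select which slice of the hits is returned
        def searchResult = Post.search(params.q, [offset: 0, max: 10])
        // results holds the matching Post instances, total the overall hit count
        def titles = searchResult.results.collect { post -> post.title }
        render "${searchResult.total} hits: ${titles.join(', ')}"
    }
}
```

With the postsearch application from this tutorial, requesting /postsearch/postSearch?q=grails should then list the titles of the first ten hits.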

countHits

This method returns just the number of hits that your query finds in the index, which is useful to know how many entries would be returned if the search method were used instead. You can call it the same way as the search method.

def postsSearchResultCount = Post.countHits("grails")
def postsOrderedSearchResultCount = Post.countHits("grails", [sort: 'title'])

moreLikeThis and suggestQuery

“moreLikeThis” and “suggestQuery” (aka spell checking) can be done easily with the Searchable Plugin. All you have to do is set the corresponding properties in the mapping closure.

Take a look here and here for more information.

-

Conclusion

This plugin is one of my favorites. If you’re planning a Grails website for a production environment, this one will be your friend.

Oh, and remember that this plugin is much more powerful than shown here. Most of the configuration options available for Compass and Lucene have not been demonstrated. This is just a small part of it!


Last Tutorial:  [1/25] Acegi: Secure your grails application with no pain

Next Tutorial: [3/25] Quartz: Easy job scheduling plugin.

13 Comments so far

  1. Henrique Weissmann (Kico) on March 25th, 2009

    Excelent tutorial! Thanks!

  2. Robert Anderson (ranophoenix) on March 25th, 2009

    Thanks! Very good!

  3. Uday on March 26th, 2009

    Hi,

    I was using the searchable plugin, but when i used many-to-many mappings the plugin started throwing Hibernate errors. Have you had an opportunity to try a many-to-many mappings and search using the searchable plugin?

    thank you.

  4. robjames on March 26th, 2009

    Well done Lucas! Another great tutorial, and good to see the code snippets and screencasts are now embedded!!

  5. Antoine on March 31st, 2009

    Once again, a very nice tutorial.
    I would be curious to know how to use Lucene to store and retrieve any data of an application…

  6. lucastex on March 31st, 2009

    @Antoine

    That’s what we did here! :) using the search method, you can retrieve any information indexed! :)

  7. Antoine on April 8th, 2009

    I was more curious about the storage without a database, just Lucene. In this case, we store the Post objects in a database.

  8. Cuneyt on May 1st, 2009

    Excellent implementation – couldn’t have gone smoother. What web site did you work on in Brazil? My company’s CMS is used for Estadao and Canal 13 in Chile. Cheers mate!

  9. Cuneyt on May 1st, 2009

    And I’d hate to ask, but where can we find more documentation on the returned searchResult object, as I’m looking to display fields/properties of each “hit” in the view. Cheers.

  10. Cuneyt on May 1st, 2009

    Forget it, I was able to dig it out. Obviously, it’s getting a bit late in my timezone!

    ${result.getProperty(“title”)}

    Perhaps for newbies, this would be a nice addition to the “Controller / View” section.

  11. Sreeraj on December 16th, 2009

    Great tutorial!! But I have a doubt. How can we implement partial word search using searchable plugin (without using wild card search). ie. If I search for ‘wash’ it should return washington.

  12. lucastex on December 16th, 2009

    Yes you can. You’ll change this using different analyzers in the corresponding fields you want. Take a look in the different Analyzers lucene provide and the other options in indexing runtime like Tokenization.

  13. Sreeraj on December 16th, 2009

    Thanks for your reply.
    Yes I have tried different analyzers. Some forums are telling it can be done using NgramTokenizer. But Its not working for me. May be I am doing in the wrong way. Can you show an example how to use NgramTokenizer.
