[3/25] Building an RSS Reader with the Quartz Plugin – Grails Tutorial

Are you reading my blog’s feed? Be the first to know when I publish a new article by subscribing to my feed and following me on Twitter!

 

Groovy Version: 1.6
Grails Version: 1.1
Plugin Version: 0.4.1-SNAPSHOT
Plugin Docs: http://grails.org/plugin/quartz
Download Resources: source code screencast 1 screencast 2

Hello,

In this tutorial, we’ll talk about the Quartz plugin, which is used to schedule job executions in your application. The plugin is built on top of the Quartz Job Scheduler library from OpenSymphony. OpenSymphony is the group that built the WebWork framework, which is now called Struts 2 after its “acquisition” by Apache.

“Scheduling jobs” is very useful for covering the background needs of your application. Some tasks have to run behind the scenes from time to time (for example, invalidating users that have not logged in for more than a month). Others are asynchronous processes that you have to handle yourself if you do not have a JMS infrastructure, for example, sending e-mails to a lot of people.

In our example, we’ll build a simple RSS reader that uses the Quartz plugin to schedule fetches of the feeds and insert the results in the database. Our application will mainly have one domain class called Post (seen in the last tutorial), a Feed domain class to store our feeds, and a similar RSS parser for Technorati feeds. (Yes, I love the RSS format.)

First, we’ll create the app, install the Quartz plugin, and create the domain classes and the Feed scaffold structure:

grails create-app feedreader
cd feedreader
grails install-plugin quartz
grails create-domain-class Post

Then we’ll fill in the Post domain class code:

class Post {
    String title
    String link
    String body
}

We have to create the Feed domain class and its scaffold structure.

grails create-domain-class Feed

class Feed {
    String word
    String url
}

Scaffolding…

grails generate-all Feed

[Screencast 1]

After this, we’ll create our Technorati feed parser as a service (for example with grails create-service Technorati), using the code below.

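class TechnoratiService {
    // No database transaction is needed around the whole parse
    boolean transactional = false

    // Fetches the RSS document at the given URL and stores each item as a Post
    def parseAndSave(rss) {
        def rssObj = new XmlSlurper().parse(rss)
        rssObj.channel.item.each {
            def post = new Post(title: it.title.toString(),
                    link: it.link.toString(),
                    body: it.description.toString())
            // save() returns null when validation fails, so report that case too
            if (post.save()) {
                println "Post [${post}] saved."
            } else {
                println "Post [${post.link}] not saved: ${post.errors}"
            }
        }
    }
}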

We’ll run our application using the grails run-app command and insert some feeds. Note that we’ve configured our datasource to store the HSQLDB database on the filesystem instead of using the default in-memory setup.
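The datasource change mentioned above is not shown in this post, so here is a minimal sketch of what it might look like in grails-app/conf/DataSource.groovy. The database file name (devDB) and the dbCreate value are assumptions chosen for illustration, not necessarily what the example project uses.

```groovy
// grails-app/conf/DataSource.groovy (development environment only)
// Assumption: "devDB" is a hypothetical file name; pick any path you like.
environments {
    development {
        dataSource {
            // "update" keeps existing data between restarts
            dbCreate = "update"
            // file: stores the HSQLDB database on disk instead of in memory
            url = "jdbc:hsqldb:file:devDB;shutdown=true"
        }
    }
}
```

With a file-based URL, the feeds you insert should survive an application restart, which is handy while testing the job.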

Note that Quartz installs a JobController for us. Forget about it, OK? We’ll create our own job after the second screencast.

[Screencast 2]

Now we have to understand some Quartz properties and commands.

When we install the Quartz plugin, it adds another command for us, grails create-job MyJob, which we’ll use to create our FeedParserJob. Note that convention over configuration applies here: all job class names end in Job.

grails create-job FeedParser

Job classes have to implement the execute() method, which Quartz calls when it’s time to run the job. To define when the job runs and the interval between executions, I suggest you read the plugin documentation, which shows several ways to do this. In our example we’ll use a cron expression, similar to the ones used by cron on *nix systems, setting our job to execute once every five minutes.

Our cron expression will be like this: "0 0/5 * * * ?"
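To make the expression easier to read, here is a small standalone Groovy sketch that labels its six fields and expands the "start/step" minute field by hand. It does not use Quartz at all; the field labels and variable names are my own, chosen only for illustration.

```groovy
// Illustrative only: this is not how Quartz parses the expression internally.
def expression = "0 0/5 * * * ?"
def fieldNames = ["seconds", "minutes", "hours", "dayOfMonth", "month", "dayOfWeek"]

// Pair each field label with its value from the expression
def values = expression.split(" ")
def fields = [:]
fieldNames.eachWithIndex { name, i -> fields[name] = values[i] }

// "0/5" in the minutes field means: start at minute 0, then every 5 minutes
def parts = fields.minutes.split("/")
int start = parts[0].toInteger()
int step = parts[1].toInteger()
def firingMinutes = (0..59).findAll { it >= start && (it - start) % step == 0 }

println fields
println "Fires at second ${fields.seconds} of minutes ${firingMinutes}"
```

Unlike classic cron on *nix, a Quartz expression starts with a seconds field, and the "?" in the last position means "no specific day of the week".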

Depending on your job’s requirements, it may or may not be allowed to run concurrently with another instance of itself. In our case, we won’t start another execution if the last one is still running. To prevent concurrent executions, we can set the concurrent property to false:

def concurrent = false

Our job will essentially look up the feeds we’ve inserted in the database, and for each one it will call the Technorati service asking for new posts. The final source for our job is the one below:

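class FeedParserJob {
    // Do not start a new execution while the previous one is still running
    def concurrent = false
    // Quartz cron expression: at second 0 of every fifth minute
    def cronExpression = "0 0/5 * * * ?"

    // Injected by Spring because the property name matches the service bean
    def technoratiService

    def execute() {
        def feedList = Feed.findAll()
        for (Feed feed : feedList) {
            println "Reading feed ${feed.word} @ ${feed.url}"
            technoratiService.parseAndSave(feed.url)
        }
    }
}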

As you can see in the example above, you can inject any Spring bean into your job; just declare it, as I did with my TechnoratiService! :) (this is really great!)

That’s it. If you run your application, you’ll see that every 5 minutes (minutes 0, 5, 10, 15…) the job is called and every post Technorati returns is inserted in your database. Note that in this simple example we do not check whether a post has already been inserted before inserting it, so the database will just grow with many instances representing the same post. This can be avoided by checking whether the post already exists before inserting it (just check if you have any Post with the same link), but I’ll leave this for you!
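If you want a starting point for that check, here is a minimal standalone sketch of the idea. It uses an in-memory map as a stand-in for the Post table so it can run outside Grails; the PostStore class and its saveIfNew method are hypothetical names of mine. Inside the real application, the same test could be written with the GORM dynamic finder Post.findByLink(link) before calling save().

```groovy
// Hypothetical in-memory stand-in for the Post table (not part of the tutorial app)
class PostStore {
    private final Map postsByLink = [:]

    // Saves the post only if no post with the same link exists; returns true when saved
    boolean saveIfNew(Map post) {
        if (postsByLink.containsKey(post.link)) {
            return false
        }
        postsByLink[post.link] = post
        return true
    }

    int size() { postsByLink.size() }
}

def store = new PostStore()
def fetched = [
    [title: "First",  link: "http://example.com/1", body: "Hello"],
    [title: "Second", link: "http://example.com/2", body: "World"]
]

// Simulate two consecutive job runs that return the same items
def savedFirstRun = fetched.findAll { store.saveIfNew(it) }.size()
def savedSecondRun = fetched.findAll { store.saveIfNew(it) }.size()

println "First run saved ${savedFirstRun}, second run saved ${savedSecondRun}, total ${store.size()}"
```

Using the link as the key is the same choice suggested above: two items with the same link are treated as the same post, even if the title or body changed.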

Before finishing, let’s improve our post list view a little bit.

 

[Screenshot: post list view]

 

 

Now, try to enrich its interface by adding some Ajax to get only the new posts since the last fetch! Maybe this can be the start of your new Google Reader killer! :P

Now, let me know: are you using this plugin in your production environment? What for? What kind of jobs do you run with it?

Thanks!


Next tutorial: [4of25] Jasper Plugin

Past tutorials:
        [2of25] Searchable Plugin
        [1of25] AcegiSecurity Plugin

4 Comments so far

  1. Rob James on April 2nd, 2009

    Well done Lucas, another great tutorial!! Keep it up.

    I just wanted to add a couple of things that I had issues with in this plugin that may help others. Firstly, if you set the CRON expression on the job, this will schedule the job to run at certain intervals (such as your example above will run every 5 minutes), and is set at startup. You can also schedule a job at runtime. To do this, you will need to import the following libraries (in your controller or service);

    import org.quartz.JobDetail
    import org.quartz.TriggerUtils
    import org.quartz.SimpleTrigger
    import org.quartz.Scheduler

    and then do something like this
    // Scheduler is an interface, so it cannot be created with new Scheduler().
    // Declare "def quartzScheduler" as a property of your controller or service
    // and let Grails inject the scheduler bean registered by the plugin.
    def myScheduledJob = new JobDetail("My Job Name", "My Job Group", nameOfJobClass)
    quartzScheduler.addJob(myScheduledJob, true)
    def trigger = new SimpleTrigger()
    trigger.startTime = new Date() //set to when you want the job to run
    trigger.name = "My Trigger Name"
    trigger.setJobGroup("My Job Group")
    trigger.setJobName("My Job Name")
    quartzScheduler.scheduleJob(trigger)

    The other thing that you may want to do, is pass variables to the job. For example, in your RSS example above, you may want to pass the RSS Feed URLS to the job, so the job will only query those URLs – this is also possible by passing data in the jobMap.

    To do this, you can simply add the following line right after the JobDetail is created:
    myScheduledJob.getJobDataMap().put("MyVariableName", "MyVariableValue")

    The above passes a string (but you have other options; see the Quartz docs). Then, to use this in your Job class, import the following classes:

    import org.quartz.JobExecutionContext
    import org.quartz.JobDataMap

    And in the execute() method, do the following;

    void execute(JobExecutionContext context) {
        JobDataMap dataMap = context.getJobDetail().getJobDataMap()
        // The value was stored as a plain string, so read it back with getString
        def myParsedVariable = dataMap.getString("MyVariableName")
    }

    Notice how we used the execute method that allows passing the context, this is not the default one used when creating jobs.

    Anyway – I hope this helps others if they need to know this, and BTW: I am sure there are many ways of handling this, but this is the method that I used.

  2. lucastex on April 2nd, 2009

    @Rob

    Excellent Rob! Thanks so much!
    When we need some context in our jobs, (e.g. in the e-mail example I gave), we can use the context.

    This is very very useful to avoid jobs querying the database for more info.

    I’ll try to format the source code you’ve posted here.

    Thanks again!

  3. Sergey Nebolsin on April 6th, 2009

    Btw, there’s a simpler way of scheduling jobs dynamically in the SNAPSHOT versions of the plugin.

    It’s described here: http://www.nabble.com/Quartz-Plugin%3A-Dynamic-Jobs-Scheduling-Implemented-td18643469.html#a18643469

  4. Ryan Green on May 20th, 2010

    RSS Feeds are really very helpful and you could get site and news updates from it.
