We recently talked to various people interested in providing search over parliamentary data for other parliaments. One thing I hope came across when we described search on the site is that search doesn’t have to be complicated or expensive to implement, and certainly shouldn’t be complicated for users.
By keeping our pages simple and self-descriptive and by making sure that each one reflects a logical piece of content (e.g. one debate rather than a physical page from the original Hansard volume), we try to make relevant information from Hansard easy to find from search engines outside the site.
The search on the site itself is implemented through Solr, a web service wrapper around Lucene, the long-established open source Java search engine library. Solr supports faceted search, so we can show people using the site how the speeches relevant to their query break down over time, by speaker and by the type of debate they appeared in, and let them use these facets to home in on specific results.
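To give a rough idea of what a faceted query looks like at the Solr level, here is a minimal Ruby sketch. The parameter names (`q`, `rows`, `wt`, `facet`, `facet.field`, `facet.mincount`, `fq`) are standard Solr request parameters, but the base URL, the field names (`speaker`, `debate_type`, `decade`) and both helper methods are assumptions made up for illustration, not the site's actual schema or code.

```ruby
require "uri"

# Hypothetical Solr endpoint; adjust to wherever your Solr instance lives.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"

# Build a select URL that asks Solr for results plus facet counts.
# facet_fields: field names to break the results down by (assumed names).
# filters: facet values the user has already clicked on, sent as fq filters.
def facet_query_url(terms, facet_fields, filters = {})
  params = [
    ["q", terms],
    ["rows", "10"],
    ["wt", "json"],
    ["facet", "true"],
    ["facet.mincount", "1"] # hide facet values with no matching speeches
  ]
  facet_fields.each { |field| params << ["facet.field", field] }
  # Naive quoting: a real application would also escape quotes in the value.
  filters.each { |field, value| params << ["fq", %(#{field}:"#{value}")] }
  "#{SOLR_SELECT_URL}?#{URI.encode_www_form(params)}"
end

# With Solr's default JSON settings, the counts for one facet field come back
# as a flat array of alternating values and counts. Pair them up into a Hash.
def facet_counts(flat)
  flat.each_slice(2).to_h
end

url = facet_query_url("corn laws", %w[speaker debate_type decade],
                      "speaker" => "Mr Gladstone")
puts url
```

Each facet value a user clicks on simply becomes another `fq` filter on the next request, which is what lets people narrow a large result set step by step.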
We integrate our Rails application with Solr using the acts_as_solr plugin. We’ve made a few changes to the plugin, mostly to speed up the process of indexing content, but basically we’re using Solr out of the box. We have 13 million speeches indexed at the moment, and queries on the site usually return in under 10 seconds.
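As a general illustration only (this is not the actual set of changes made to the plugin), one common way to speed up indexing is to send documents to Solr in batches rather than one request per record. The sketch below builds Solr XML `<add>` update messages in plain Ruby. The `<add>`, `<doc>` and `<field name="...">` elements are Solr's standard XML update format; the helper names, the batch size and the sample field names are hypothetical.

```ruby
# Escape the characters that are unsafe inside XML text and attribute values.
XML_ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;" }.freeze

def xml_escape(text)
  text.to_s.gsub(/[&<>"]/, XML_ESCAPES)
end

# Hypothetical helper: turn one record (a Hash of field name => value)
# into a Solr <doc> element.
def solr_doc(fields)
  body = fields.map do |name, value|
    %(<field name="#{xml_escape(name)}">#{xml_escape(value)}</field>)
  end
  "<doc>#{body.join}</doc>"
end

# Group records into batches and build one <add> message per batch,
# so that many documents travel in a single HTTP request.
def solr_add_messages(records, batch_size = 500)
  records.each_slice(batch_size).map do |batch|
    "<add>#{batch.map { |record| solr_doc(record) }.join}</add>"
  end
end

speeches = [
  { id: 1, text: "Supply & Ways and Means" },
  { id: 2, text: "Adjournment" },
  { id: 3, text: "Orders of the Day" }
]
solr_add_messages(speeches, 2).each { |message| puts message }
```

Each message would then be POSTed to Solr's update handler, followed by a single commit at the end instead of one per document.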
Our initial focus was on making sure that our internal search was providing something useful. Now we’ll also be working on making it speedy!