What was the aim of the work?
hansardsearch, hosted on heroku.com, was a prototype used to examine –
- methods of consuming and indexing contemporary Hansard published on the web
- separating the index data from the content
- methods of displaying information from the generated index
What is the current status of the work?
A site currently exists at http://hansardsearch.heroku.com. It provides a basic, publicly visible HTML front-end to the generated Solr index, written as lightweight Sinatra code.
Simple queries can be made over the data set and results are displayed in a style similar to the majority of public search engine results.
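As a rough illustration of the query side of this setup, the sketch below builds the kind of request a thin front-end might send to Solr's standard `/select` handler, with highlighting switched on so that excerpts come back with the query term marked. It is written in Python rather than the Sinatra/Ruby of the actual front-end, and the endpoint URL and the `text` field name are hypothetical assumptions, not taken from the hansardsearch code.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint; the real WebSolr URL is not given in the source.
SOLR_SELECT_URL = "http://localhost:8983/solr/hansard/select"


def build_search_url(query, rows=10, start=0):
    """Build a Solr select URL for a simple free-text query.

    Assumes the indexed Hansard text lives in a hypothetical field
    named "text"; the actual hansardsearch schema may differ.
    """
    params = [
        ("q", query),
        ("df", "text"),     # default field searched when q names no field
        ("rows", rows),     # page size
        ("start", start),   # offset of the first result, for paging
        ("wt", "json"),     # ask Solr for a JSON response
        ("hl", "true"),     # turn on highlighting for the excerpts
        ("hl.fl", "text"),  # field to build highlighted snippets from
    ]
    return SOLR_SELECT_URL + "?" + urlencode(params)


print(build_search_url("fishing quotas"))
```

Keeping the front-end to little more than request building and result rendering matches the "lightweight" description: the search logic itself stays in Solr.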
Each returned result consists of –
- a title, deep-linked to the result on parliament.uk
- a generated breadcrumb trail, linked to containing pages
- a generated Hansard reference
- generated metadata where available, for example PQ (Parliamentary Question) numbers
- a text excerpt, with the query term highlighted
- generated named contributors, where possible
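One way to assemble these per-result fields is to merge each document in Solr's JSON response with the snippets from the response's `highlighting` section, which Solr keys by document id. The sketch assumes the scraped and generated data was stored as separate indexed fields; every field name used (`title`, `url`, `breadcrumb_labels`, `pq_number` and so on) is a hypothetical assumption rather than the actual hansardsearch schema.

```python
from dataclasses import dataclass, field

# Hypothetical optional metadata fields, present only for some documents.
OPTIONAL_METADATA = ("pq_number",)


@dataclass
class Result:
    title: str
    url: str            # deep link to the page on parliament.uk
    breadcrumb: list    # [(label, url), ...] for the containing pages
    hansard_ref: str    # generated Hansard reference
    metadata: dict      # e.g. {"pq_number": ...} when available
    excerpt: str        # highlighted snippet, or "" if Solr returned none
    contributors: list = field(default_factory=list)


def results_from_solr(response):
    """Turn a parsed Solr JSON response into Result records."""
    highlights = response.get("highlighting", {})
    results = []
    for doc in response["response"]["docs"]:
        # Snippets for the hypothetical "text" field, keyed by document id.
        snippets = highlights.get(doc["id"], {}).get("text", [])
        results.append(Result(
            title=doc["title"],
            url=doc["url"],
            breadcrumb=list(zip(doc.get("breadcrumb_labels", []),
                                doc.get("breadcrumb_urls", []))),
            hansard_ref=doc.get("hansard_ref", ""),
            metadata={k: doc[k] for k in OPTIONAL_METADATA if k in doc},
            excerpt=snippets[0] if snippets else "",
            contributors=doc.get("contributors", []),
        ))
    return results
```

Fields that could not be generated for a given page simply fall back to empty values, which fits the "where possible" wording for metadata and contributors.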
Basic generated faceting is also available where the result set supports it. A result set can be faceted to show only Written Answers, only Debates and Oral Answers, or only Written Ministerial Statements, or to show all results.
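Solr's standard faceting parameters fit this kind of filtering. A minimal sketch, assuming each document was indexed with a hypothetical `section` field holding one of the three labels above: `facet.field` asks Solr for a count per section, an `fq` filter restricts results to one section, and omitting the filter gives all results. With its default JSON output, Solr returns facet counts as a flat list of alternating values and counts, which `facet_counts` pairs back up.

```python
# Section labels from the facets described above; the indexed field
# name "section" is a hypothetical assumption.
SECTIONS = ("Written Answers", "Debates and Oral Answers",
            "Written Ministerial Statements")


def facet_params(section=None):
    """Solr parameters for section facet counts, optionally filtered."""
    params = [("facet", "true"), ("facet.field", "section")]
    if section is not None:
        if section not in SECTIONS:
            raise ValueError(f"unknown section: {section!r}")
        # Quote the value so a multi-word label is matched as one phrase.
        params.append(("fq", f'section:"{section}"'))
    return params


def facet_counts(response, field="section"):
    """Convert Solr's flat [value, count, value, count, ...] list to a dict."""
    flat = response["facet_counts"]["facet_fields"][field]
    return dict(zip(flat[0::2], flat[1::2]))
```

Because the counts come back with every query, the front-end can offer a section filter only when the current result set actually contains matches in that section.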
All data used was scraped or otherwise inferred from the Hansard web pages published on the official parliament.uk site.
No publicly visible work has been done on the site since 2011. There is no intention to do any more work on the site at present.
There is no substantial risk in allowing the site to continue, so there are currently no plans to close it. However, the site depends on the free-to-use services currently offered by Heroku and WebSolr; if either of these services is withdrawn (or substantially altered), the site is unlikely to be revived.
The source code for the front-end is currently available at https://github.com/lizconlan/websolr-demo