Plug into HyperEstraier with acts_as_searchable
Posted by marcel April 06, 2006 @ 09:18 PM
Patrick Lenz has announced his acts_as_searchable plugin which integrates ActiveRecord models with HyperEstraier, an open source fulltext search engine.
It’s available as a gem so you can just do sudo gem install acts_as_searchable.
You can then take a look at the API docs, which provide a few examples.
Full text searching just got as simple as:
class Article < ActiveRecord::Base
  acts_as_searchable
end

Article.fulltext_search('biscuits AND gravy')

Nice!
How does HyperEstraier compare to Ferret? Ferret is a Ruby port of the Apache Lucene project, and I’m wondering if anyone has done a good comparison. Both look great, and my main curiosity is if anyone has any suggestions on when to use one vs. the other.
Clearly, implementing this gem with acts_as_searchable makes it dead simple for Rails developers.
Note, to install ferret:
Ferret integrates just as easily into Rails with the plugin acts_as_ferret. Ferret is actively being developed by David Balmain (here) and acts_as_ferret by Jens Krämer and myself (here).
Yeah, I’d like a comparison too.
One great thing about HyperEstraier is that it scales really well if you have a lot of data to index, thanks to its built-in P2P clustering of index servers.
First look at the HyperEstraier setup docs is a bit scary. I wonder if there is a quickstart guide?
I’m certainly conscious of the scalability issues, but at this point I want something nice and simple to implement. I’m a wimp like that.
pop this into any active record class def you have if you want something simple to implement ;)
Is it possible to add new indexes at runtime for HyperEstraier? Dealing with multiple clients, I’d like to be able to use “an index per client” rather than “an index per model type”.
Of course, the point of having an index is to find things quickly. While working, the code random8r provided doesn’t use indexes and will therefore resort to a full table scan to locate the matching records, potentially multiple times for lots of keywords.
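To make the difference concrete, here is a minimal sketch in plain Ruby. It is not the code random8r posted and not how HyperEstraier or Ferret work internally; the method names and the sample data are made up for illustration. The scan version inspects every record for every keyword, while the indexed version builds a word-to-ids map once and then only intersects short id lists.

```ruby
# Hypothetical example: full scan versus a tiny in-memory inverted index.

# Naive approach: look at every record body for every keyword.
def scan_search(records, keywords)
  records.select { |_id, body| keywords.all? { |kw| body.include?(kw) } }.keys
end

# Indexed approach, step 1: build a word => [ids] map once, up front.
def build_index(records)
  index = Hash.new { |hash, word| hash[word] = [] }
  records.each do |id, body|
    body.downcase.scan(/\w+/).uniq.each { |word| index[word] << id }
  end
  index
end

# Indexed approach, step 2: fetch the id list per keyword and intersect them.
# Keywords must be lowercase whole words, matching how the index was built.
def index_search(index, keywords)
  keywords.map { |kw| index.fetch(kw, []) }.reduce { |a, b| a & b } || []
end

articles = {
  1 => "biscuits and gravy for breakfast",
  2 => "gravy recipes from the south",
  3 => "plain biscuits"
}
index = build_index(articles)

puts scan_search(articles, %w[biscuits gravy]).inspect
puts index_search(index, %w[biscuits gravy]).inspect
```

Both calls return the same ids here, but the scan does work proportional to the number of records times the number of keywords on every query, while the index pays that cost once when it is built.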
I’m confused. What are the various tradeoffs involved in considering HyperEstraier compared to Ferret compared to Tsearch2 (postgres) or mysql’s own full text index? I know that adding acts_as_searchable is convenient, but I’m curious about overall complexity, capabilities, etc.
In the app I’m developing I am using tsearch2 with a series of UNIONs to query the various models based on a single search box… with an ORDER BY at the end to sort by the rank that tsearch2 assigns.
I realize that tsearch2 is highly database specific, but it’s fortunately quite simple to activate, implement and use.
Any thoughts on the various tradeoffs would be much appreciated!
Here’s another curious wonderer. Some information on the Ferret vs. Estraier debate is up at http://groups.google.com/group/comp.lang.ruby/browse_thread/thread/574efe4d2a43eb01 though it certainly doesn’t seem to be exhaustive or authoritative.
Anyone already in possession willing to mirror the Windows binary?
Thanks Kasper and everyone else who has posted information about Ferret, which is based on the Lucene search engine. I’ve read some interesting notes about Lucene and its technology. Proximity searching is one of the features which can be highly useful to find structured queries and segment data.
Part of the huge promise I see for Ruby on Rails is building dynamic applications and portals for focused communities on the web. High-quality search engines such as these will give sites and communities efficient ways to search and find material within their sites. The search engines for popular content management systems such as Drupal (based on PHP) have not always been up to par, and do not appear to be as easy to customize and extend to the unique needs of a project.
Do HyperEstraier or Ferret handle accents in Western languages, so that searching for “ubercool” may find “übercool”?
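Whatever either engine does on its own, one way to get this behaviour is to fold accents in the application before text is indexed and before a query is sent. The sketch below is only an illustration under that assumption: fold_accents is a made-up helper, not part of HyperEstraier, Ferret or either plugin, and it relies on String#unicode_normalize, which needs Ruby 2.2 or later.

```ruby
# Hypothetical helper: reduce accented Latin letters to their base letters.
def fold_accents(text)
  # NFD normalization splits a letter such as U+00FC (u with diaeresis) into
  # a plain "u" followed by a combining mark; \p{Mn} matches those combining
  # marks so they can be removed.
  text.unicode_normalize(:nfd).gsub(/\p{Mn}/, "").downcase
end

# Apply the same folding to the indexed text and to the query string.
puts fold_accents("\u00FCbercool")
puts fold_accents("Kr\u00E4mer")
```

Letters that have no Unicode decomposition, such as “ß” or “ø”, pass through unchanged and would need an explicit mapping.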