Plug into HyperEstraier with acts_as_searchable

Posted by marcel April 06, 2006 @ 09:18 PM

Patrick Lenz has announced his acts_as_searchable plugin which integrates ActiveRecord models with HyperEstraier, an open source fulltext search engine.

It’s available as a gem so you can just do sudo gem install acts_as_searchable.

You can then take a look at the API docs, which provide a few examples.

Full text searching just got as simple as:
class Article < ActiveRecord::Base
  acts_as_searchable
end

Article.fulltext_search('biscuits AND gravy')

Posted in Releases, Tools | 16 comments

Comments

  1. Brian on 06 Apr 21:57:

    Nice!

    How does HyperEstraier compare to Ferret? Ferret is a Ruby port of the Apache Lucene project, and I’m wondering if anyone has done a good comparison. Both look great, and my main curiosity is if anyone has any suggestions on when to use one vs. the other.

    Clearly, implementing this gem with acts_as_searchable makes it dead simple for Rails developers.

    Note, to install ferret:
      gem install ferret

  2. Kasper Weibel on 06 Apr 23:09:

    Ferret integrates just as easily into Rails with the plugin acts_as_ferret. Ferret is actively being developed by David Balmain (here) and acts_as_ferret by Jens Krämer and myself (here).

  3. Branstrom on 07 Apr 00:28:

    Yeah, I’d like a comparison too.

  4. Wayne on 07 Apr 00:30:

    One great thing about HyperEstraier is that it scales really well if you have a lot of data to index, thanks to its built-in P2P clustering of index servers.

  5. Mr eel on 07 Apr 02:25:

    First look at the HyperEstraier setup docs is a bit scary. I wonder if there is a quickstart guide?

    I’m certainly conscious of the scalability issues, but at this point I want something nice and simple to implement. I’m a wimp like that.

  6. random8r on 07 Apr 04:34:

    pop this into any active record class def you have if you want something simple to implement ;)

    def self.easyfind(argHash)
        fieldnames = argHash[:fieldnames]
        keywords = argHash[:keywords]
        order = argHash[:order]
        incl = argHash[:include]
        unless keywords.empty?
            keywordArray = []
            theSqlArray = keywords.inject([]) do |agg, keyword| 
                aLineArray = fieldnames.inject([]) {|lineSectionsArray, aFieldname| 
                        # Bind the keyword as a lowercase substring pattern and lowercase the column side.
                        keywordArray << '%' + keyword.downcase + '%'
                        lineSectionsArray << "LOWER(" + aFieldname + ") LIKE ?"
                    }
                aLine = aLineArray.join(" OR ")
                aLine = "(" + aLine + ")" 
                agg << aLine
            end
            theSql = theSqlArray.join(" AND ")
            result = self.find(:all, :conditions => [theSql] + keywordArray, :order => order, :include => incl)
        else
            result = []
        end
        return result
    end
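
The condition-building part of a helper like this can be pulled out and checked without a database. What follows is a minimal sketch under stated assumptions: `like_conditions` is a hypothetical name, not part of ActiveRecord, and it treats every keyword as a case-insensitive substring pattern.

```ruby
# Builds an ActiveRecord-style conditions array: [sql, bind1, bind2, ...].
# Each keyword must match at least one field (OR within a keyword,
# AND across keywords). `like_conditions` is a hypothetical helper name.
def like_conditions(fieldnames, keywords)
  return nil if fieldnames.empty? || keywords.empty?

  binds = []
  groups = keywords.map do |keyword|
    alternatives = fieldnames.map do |field|
      binds << "%#{keyword.downcase}%" # one bind value per placeholder
      "LOWER(#{field}) LIKE ?"
    end
    "(#{alternatives.join(' OR ')})"
  end

  [groups.join(' AND ')] + binds
end
```

Calling `like_conditions(%w[title body], %w[Biscuits gravy])` yields one parenthesised OR group per keyword, joined with AND, followed by four bind values. Field names are interpolated straight into the SQL, so they should come from code and never from user input, and `%` or `_` inside a keyword is not escaped here.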
  7. Morten on 07 Apr 07:32:

    Is it possible to add new indexes at runtime with HyperEstraier? Dealing with multiple clients, I’d like to be able to use “an index per client” rather than “an index per model type”.

  8. Wayne on 07 Apr 10:31:

    Of course, the point of having an index is to find things quickly. While it works, the code random8r provided doesn’t use indexes and will therefore resort to a full table scan to locate the matching records, potentially multiple times for lots of keywords.
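
The contrast can be sketched in plain Ruby. This is a toy inverted index, not a description of how HyperEstraier or Ferret store their data, and `InvertedIndex` with its methods is a hypothetical name. Each word is mapped once to the set of record ids that contain it, so a query intersects a few small sets instead of re-reading every record.

```ruby
require 'set'

# Toy inverted index: word => Set of record ids containing that word.
class InvertedIndex
  def initialize
    @postings = Hash.new { |hash, word| hash[word] = Set.new }
  end

  # Index one record; tokenising on ASCII letters and digits is a simplification.
  def add(id, text)
    text.downcase.scan(/[a-z0-9]+/).each { |word| @postings[word] << id }
  end

  # Sorted ids of records containing every keyword (AND semantics).
  def search(*keywords)
    sets = keywords.map { |keyword| @postings.fetch(keyword.downcase, Set.new) }
    return [] if sets.empty?

    sets.reduce { |a, b| a & b }.to_a.sort
  end
end
```

After indexing a handful of records with `add`, a call such as `search("biscuits", "gravy")` only touches the two posting sets for those words, however many records were indexed.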

  10. Grandalf on 07 Apr 11:05:

    I’m confused. What are the various tradeoffs involved in considering HyperEstraier compared to Ferret compared to Tsearch2 (PostgreSQL) or MySQL’s own full-text index? I know that adding acts_as_searchable is convenient, but I’m curious about overall complexity, capabilities, etc.

    In the app I’m developing I am using tsearch2 with a series of UNIONs to query the various models based on a single search box… with an ORDER BY at the end to sort by the rank that tsearch2 assigns.

    I realize that tsearch2 is highly database specific, but it’s fortunately quite simple to activate, implement and use.

    Any thoughts on the various tradeoffs would be much appreciated!

  12. Roderick van Domburg on 07 Apr 11:38:

    Here’s another curious wonderer. Some information on the Ferret vs. Estraier debate is up at http://groups.google.com/group/comp.lang.ruby/browse_thread/thread/574efe4d2a43eb01 though it certainly doesn’t seem to be exhaustive or authoritative.

  13. DoH on 07 Apr 13:59:

    Is anyone already in possession of the Windows binary willing to mirror it?

  14. SEO G on 07 Apr 19:41:

    Thanks Kasper and everyone else who has posted information about Ferret, which is based on the Lucene search engine. I’ve read some interesting notes about Lucene and its technology. Proximity searching is one of the features which can be highly useful to find structured queries and segment data.

    Part of the huge promise I see for Ruby on Rails is building dynamic applications/portals for focused communities of the web. High quality search engines such as these will allow sites and communities to have efficient ways to search and find material within their sites. The search engines for popular content management systems such as Drupal (based on PHP) have not always been up to par and do not appear to be as easy to customize and extend to the unique needs of the project.

  15. Jerome on 11 Apr 12:17:

    Do HyperEstraier or Ferret handle accents in Western languages, so that searching for “ubercool” may find “übercool”?
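
Whether either engine does this internally is not answered in this thread. One application-side option is to fold accents in both the indexed text and the query before they reach the search engine. A minimal sketch, assuming Ruby 2.2 or later for `String#unicode_normalize`; `fold_accents` is a hypothetical helper name.

```ruby
# Folds accented Latin letters to their base letters and lowercases the result,
# so "übercool" and "ubercool" compare equal. `fold_accents` is a hypothetical name.
def fold_accents(text)
  decomposed = text.unicode_normalize(:nfd) # "ü" becomes "u" + combining diaeresis
  decomposed.gsub(/\p{Mn}/, '').downcase    # drop combining marks, then lowercase
end
```

This only covers letters that decompose into a base letter plus combining marks. Characters such as “ø” or “ß” have no such decomposition and would need an explicit mapping.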
