Insoshi Ruby on Rails blog

July 17, 2008

Searching a Ruby on Rails application with Sphinx and Ultrasphinx

Filed under: Ferret, Insoshi, Ruby on Rails, Sphinx, Ultrasphinx — mhartl @ 4:46 pm

We recently switched the Insoshi social networking platform from the Ferret search engine to Sphinx (and Ultrasphinx), due both to the well-known problems encountered with Ferret and to our own experience of its instability on the Insoshi developer site. (Sphinx is currently running on our demo site, and anyone who wants the Sphinx-enabled source can grab edge Insoshi as described in the Rails 2.1 upgrade post. We’ll merge it into the master branch within a couple of weeks.)

The switch did not always go smoothly, and there are several gotchas that I thought might be helpful to discuss in case other people run into them. I’ve also included some material on using Ultrasphinx, since its documentation is a bit sparse. For pedagogical purposes, I’ve simplified the Insoshi source slightly for this discussion; you don’t have to be familiar with the Insoshi codebase to follow this post. (N.B. The actual production code contains a trick for dealing with more advanced filtering requirements, which will probably be the subject of a future post.)

Installing Sphinx

The first step, naturally enough, is to install Sphinx. You can get the latest and greatest version at the Sphinx download page. (This blog post uses version 0.9.8, which was released just a couple of days before this post was written.) Download the source, and then install it as follows:

$ tar zxf sphinx-0.9.8.tar.gz
$ cd sphinx-0.9.8
$ ./configure --with-pgsql
$ make
$ sudo make install

The --with-pgsql option compiles Sphinx with PostgreSQL support (MySQL support comes for free). We’ve had trouble getting all the Postgres stuff to work properly, but it doesn’t hurt to have it. If you’d rather omit the Postgres support, just use ./configure by itself.
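Before moving on, it can be worth confirming that make install put the indexer and searchd binaries somewhere on your PATH (the dyld error quoted later references /usr/local/bin/indexer, which is where autoconf’s default /usr/local prefix puts it). Here is a minimal Ruby sketch of such a check; find_executable is a hypothetical helper written for illustration, not part of Sphinx or Rails:

```ruby
# Hypothetical helper: return the full path of the first executable file
# named +name+ found in a PATH-style string, or nil if there is none.
def find_executable(name, path = ENV['PATH'].to_s)
  path.split(File::PATH_SEPARATOR).each do |dir|
    candidate = File.join(dir, name)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

# Report where (or whether) the two Sphinx binaries were installed.
%w(indexer searchd).each do |bin|
  puts "#{bin}: #{find_executable(bin) || 'not found on PATH'}"
end
```

If either binary shows up as not found, check that the install prefix’s bin directory is on your PATH.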

Installing Ultrasphinx

The second step is to install the Ultrasphinx plugin, which has one gem dependency:

$ sudo gem install chronic

The installation itself is trickier than it sounds; although there are plenty of tutorials that tell you how to do it, as far as I can tell they don’t work. I tried a couple of different tacks, both of which failed. First, I tried

$ svn export svn://rubyforge.org/var/svn/fauna/ultrasphinx/trunk vendor/plugins/ultrasphinx
Export complete.

The only problem is, this didn’t do anything; there was literally no change to my working copy. I then tried a plugin install:

$ script/plugin install svn://rubyforge.org/var/svn/fauna/ultrasphinx/trunk
Export complete.

Still nothing. After some time flailing about, I finally found a James on Software Sphinx/Ultrasphinx post, which suggested cloning his GitHub fork of Ultrasphinx. That worked at first, but later on I encountered a clash with the latest version of will_paginate:

WillPaginate: You are using a paginated collection of class
Ultrasphinx::Search which conforms to the old API of WillPaginate::Collection
by using `page_count`, while the current method name is `total_pages`. Please
upgrade yours or 3rd-party code that provides the paginated collection.

Luckily, with some judicious Googling I was able to find a second repository at GitHub whose most recent commit, as of this writing, updates the code to work with the latest will_paginate, which certainly looked promising. And, indeed, it worked beautifully, so I’m happy to recommend it:

$ git clone git://github.com/DrMark/ultrasphinx.git vendor/plugins/ultrasphinx
$ rm -rf vendor/plugins/ultrasphinx/.git

(This is one of the many reasons GitHub rocks; if the “official” version of a plugin is unavailable or out of date, you still might be able to find an updated fork on GitHub.)

Configuring Ultrasphinx

To configure Ultrasphinx, I followed the config instructions at the main Ultrasphinx site:

Next, copy the examples/default.base file to RAILS_ROOT/config/ultrasphinx/default.base.
This file sets up the Sphinx daemon options such as port, host, and index location.

Since many of the Insoshi fields allow HTML, the search results are better if we strip HTML tags first:

config/ultrasphinx/default.base

index
{
  .
  .
  .
  # HTML-specific options
  html_strip = 1
}

N.B. This is a replacement for the older strip_html syntax, used inside the source section:

config/ultrasphinx/default.base

source
{
  # Individual SQL source options
  sql_ranged_throttle = 0
  sql_range_step = 5000
  sql_query_post =
  strip_html = 1
}

If you get a warning like

WARNING: key 'strip_html' is deprecated in config/ultrasphinx/development.conf line 24;
use 'html_strip (per-index)' instead.

just remove the strip_html line and put an html_strip line in its place (taking care to put it in the index section of the configuration file).
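If you have several .base files to update, the edit is mechanical enough to script. The following is a rough Ruby sketch, assuming the simple layout shown in the config excerpts above (one key per line, no nested braces); migrate_strip_html is a hypothetical name, and it’s worth diffing its output before trusting it:

```ruby
# Hypothetical helper: drop the deprecated per-source `strip_html` key and,
# if it was enabled, add `html_strip = 1` at the end of each index section.
# Assumes one key per line and no nested braces.
def migrate_strip_html(config)
  lines = config.lines.map(&:chomp)
  enabled = lines.any? { |l| l =~ /\A\s*strip_html\s*=\s*1\s*\z/ }
  lines.reject! { |l| l =~ /\A\s*strip_html\s*=/ }

  out = []
  in_index = false
  lines.each do |line|
    in_index = true if line =~ /\A\s*index\b/
    if in_index && line.strip == '}'
      # Closing brace of an index section: add html_strip unless present.
      section_start = out.rindex { |l| l =~ /\A\s*index\b/ }
      already = out[section_start..-1].any? { |l| l =~ /\A\s*html_strip\s*=/ }
      out << '  html_strip = 1' if enabled && !already
      in_index = false
    end
    out << line
  end
  out.join("\n") + "\n"
end
```

Usage would be something like File.write(path, migrate_strip_html(File.read(path))), after making a backup. Running it a second time leaves the file unchanged.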

Bootstrapping Ultrasphinx

Now we’re ready to fire up Ultrasphinx, which uses Sphinx to build up a search index of our database:

$ rake ultrasphinx:bootstrap

There’s just one hitch: many people (including me) get an error at this stage:

dyld: Library not loaded: /usr/local/mysql/lib/mysql/libmysqlclient.15.dylib
  Referenced from: /usr/local/bin/indexer
  Reason: image not found

I found a solution using the canonical “Google the error message” method. There’s something screwy with the location of the MySQL libraries, but it’s nothing a little symlink couldn’t fix:

$ sudo ln -s /usr/local/mysql/lib /usr/local/mysql/lib/mysql
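The symlink looks odd at first, since it makes lib/mysql point back at lib itself, but that is exactly what the loader needs: any path of the form lib/mysql/<file> now resolves to lib/<file>. This Ruby sketch reproduces the situation in a temporary directory (the layout is illustrative, not a copy of a real MySQL install):

```ruby
require 'tmpdir'
require 'fileutils'

# Recreate the problem in a throwaway directory: the library sits in lib/,
# but the binary was linked against lib/mysql/. Layout is illustrative only.
# Returns [resolves_before_symlink, resolves_after_symlink].
def check_symlink_fix
  Dir.mktmpdir do |root|
    lib = File.join(root, 'lib')
    FileUtils.mkdir_p(lib)
    FileUtils.touch(File.join(lib, 'libmysqlclient.15.dylib'))
    wanted = File.join(lib, 'mysql', 'libmysqlclient.15.dylib')

    before = File.exist?(wanted)
    # Same shape as: sudo ln -s /usr/local/mysql/lib /usr/local/mysql/lib/mysql
    File.symlink(lib, File.join(lib, 'mysql'))
    after = File.exist?(wanted)

    [before, after]
  end
end

p check_symlink_fix  # => [false, true]
```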

Testing Sphinx and Ultrasphinx

In principle, things are working now under the hood; we just need to add in some code to our models and controllers to execute the searches. I prefer test-driven development, though, so the next priority is to get Sphinx and Ultrasphinx working in a test environment.

It’s important to stop the Ultrasphinx daemon, which might be running in development mode if you used rake ultrasphinx:bootstrap above:

$ rake ultrasphinx:daemon:stop

Then make a test-specific configuration file:

config/ultrasphinx/test.base

source
{
  # Individual SQL source options
  sql_ranged_throttle = 0
  sql_range_step = 999999999
  sql_query_post =
}
.
.
.
index
{
  .
  .
  .
  # HTML-specific options
  html_strip = 1
}

The line sql_range_step = 999999999 here is key. The sql_range_step option tells the indexer how many ids to cover in each ranged query as it walks from the smallest id to the largest; in the default Ultrasphinx configuration it’s 5000. Insoshi uses foxy fixtures, though, which often create objects with huge ids, so the indexer ends up issuing an enormous number of mostly empty queries, and indexing can take a long time (several minutes) even for a tiny test database. Setting sql_range_step to a much larger value solves the problem.
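To see why the step size matters: a ranged indexer issues one query per step whether or not any rows fall in that range, so the number of queries is (max_id - min_id + 1) / step, rounded up. A small Ruby sketch of that arithmetic, assuming for illustration ids spread between 1 and 1,000,000,000 (the actual foxy-fixture ids depend on your fixture labels):

```ruby
# Number of id ranges (and hence SQL queries) needed to cover
# min_id..max_id in chunks of +step+ ids: ceiling of span / step.
def ranged_query_count(min_id, max_id, step)
  return 0 if max_id < min_id
  (max_id - min_id + step) / step
end

puts ranged_query_count(1, 1_000_000_000, 5_000)        # => 200000 (default.base step)
puts ranged_query_count(1, 1_000_000_000, 999_999_999)  # => 2 (test.base step)
```

With the default step that is 200,000 queries for a handful of fixture rows, which is consistent with the multi-minute indexing described above; with the larger step it drops to two.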

With that done, we’re ready to fire things up:

$ rake ultrasphinx:bootstrap RAILS_ENV=test

One problem we run into is that the Sphinx test daemon might not always be running, so it would be nice to skip the search tests (or specs) if this is the case. For example, suppose that we have a Searches controller (whose index action will handle searches). Here is a skeleton for the Searches controller specs that runs only when Sphinx is running:

spec/controllers/searches_controller_spec.rb


# Return a list of system processes.
def processes
  process_cmd = case RUBY_PLATFORM
                when /djgpp|(cyg|ms|bcc)win|mingw/ then 'tasklist /v'
                when /solaris/                     then 'ps -ef'
                else
                  'ps aux'
                end
  `#{process_cmd}`
end

# Return true if the search daemon is running.
def testing_search?
  processes.include?('searchd')
end

describe SearchesController do
  .
  .
  .
end if testing_search?

(A blog post on testing with Ultrasphinx proved useful in this context.)
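Scanning the process list works, but it also matches any process that merely has searchd somewhere in its command line. An alternative sketch, not what Insoshi uses, is to try connecting to the port the daemon listens on. The default port below (3312, which as far as we know is the Sphinx 0.9.8 default) is an assumption; use whatever port your .base file specifies. Like the process check, this can’t tell a development daemon from a test daemon.

```ruby
require 'socket'
require 'timeout'

# Hypothetical helper: true if something accepts TCP connections on host:port.
# The default port 3312 is an assumption; match it to your Ultrasphinx .base file.
def search_daemon_listening?(host = '127.0.0.1', port = 3312, seconds = 1)
  Timeout.timeout(seconds) { TCPSocket.new(host, port).close }
  true
rescue SystemCallError, SocketError, Timeout::Error
  false
end
```

The spec skeleton would then end with end if search_daemon_listening? in place of end if testing_search?.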

Writing the first tests

OK, now we’re ready to write some concrete tests. Some basic tests (using RSpec) might look like these:

spec/controllers/searches_controller_spec.rb


describe SearchesController do

  describe "Person searches" do

    it "should search by name" do
      get :index, :q => "quentin", :model => "Person"
      assigns(:results).should == [people(:quentin)].paginate
    end

    it "should search by description" do
      get :index, :q => "I'm Quentin", :model => "Person"
      assigns(:results).should == [people(:quentin)].paginate
    end
  end
end if testing_search?

Here we’ve passed a model parameter in anticipation of using a single action to search multiple models.

The specs fail, of course:

$ script/spec spec/controllers/searches_controller_spec.rb
2 examples, 2 failures

Apart from the if testing_search? clause, there’s nothing here beyond vanilla RSpec, so in what follows I won’t bother showing any more specs.

Person: Basic indexing

Now we’re ready for some basic searching. Suppose we have a Person model with name and description fields, which we want to enable for searching. We need the is_indexed method from Ultrasphinx:

app/models/person.rb


class Person < ActiveRecord::Base
  is_indexed :fields => [ 'name', 'description' ]
  .
  .
  .
end

Then a sample Searches controller index might look like this:

app/controllers/searches_controller.rb


def index
  query = params[:q].strip
  page  = params[:page] || 1
  model = params[:model]
  filters = {}
  @search = Ultrasphinx::Search.new(:query => query,
                                    :page => page,
                                    :class_names => model,
                                    :filters => filters)
  @search.run
  @results = @search.results
end

Note the use of a :page option; Ultrasphinx works with the will_paginate plugin out of the box.
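Two small details are worth knowing here. First, params[:page] arrives as a string (or nil), so params[:page] || 1 passes along whatever the user typed; you may want to normalize it before handing it to Ultrasphinx. Second, the total_pages figure that will_paginate expects is just a ceiling division. A hypothetical helper for each (none of these are part of Ultrasphinx or will_paginate):

```ruby
# Hypothetical: turn a raw page parameter (nil, "3", "abc", "-1") into a
# page number >= 1.
def normalize_page(raw)
  page = raw.to_s.to_i
  page < 1 ? 1 : page
end

# Number of pages needed for total_entries results at per_page per page.
def total_pages(total_entries, per_page)
  (total_entries + per_page - 1) / per_page
end

# Zero-based index of the first result on a given (1-based) page.
def page_offset(page, per_page)
  (page - 1) * per_page
end
```

In the controller, that would mean page = normalize_page(params[:page]).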

A sample search box partial might look like this:

app/views/searches/_box.html.erb


<% form_tag searches_path, :method => :get do %>
  <fieldset>
    <%= text_field_tag :q, h(params[:q]), :maxlength => 50 %>
    <%= submit_tag "Search" %>
    <%= hidden_field_tag "model", search_model %>
  </fieldset>
<% end %>

where search_model is just a helper that inspects params and returns the name of the model being searched. (For example:

app/helpers/searches_helper.rb


module SearchesHelper

  # Return the model to be searched based on params.
  def search_model
    return "Person"    if params[:controller] =~ /home/
    return "ForumPost" if params[:controller] =~ /forums/
    params[:model] || params[:controller].classify
  end
end

where params[:controller].classify automagically returns the string “Person” inside the People controller and “Message” inside the Messages controller.)

As long as the test database contains the appropriate user (in our case, Quentin from restful_authentication), the specs should pass once we reindex:

$ rake ultrasphinx:bootstrap RAILS_ENV=test
$ script/spec spec/controllers/searches_controller_spec.rb
2 examples, 0 failures

If they fail, chances are that either (1) there’s some rogue development daemon running or (2) we forgot to reindex the test database after changing a model. If this happens, you can be extra paranoid by recycling everything:

$ rake ultrasphinx:daemon:stop
$ rake ultrasphinx:bootstrap RAILS_ENV=test

Message: Ultrasphinx with conditions and filtering

One common task is to put a condition on a search result. For example, suppose we have a Message model with a subject and content we want to index, but with “trashed” messages we want to exclude. Suppose further that recipients trash messages by setting a recipient_deleted_at attribute in the Message model. Untrashed messages would then have a NULL value for recipient_deleted_at:

app/models/message.rb


class Message < ActiveRecord::Base
  is_indexed :fields => [ 'subject', 'content', 'recipient_id' ],
             :conditions => "recipient_deleted_at IS NULL"
  .
  .
  .
end

Of course, when searching through messages for a particular person, we should only return messages actually sent to that person. This is why we added the recipient_id to the index fields above; this way, we can use an Ultrasphinx filter to restrict the results appropriately in the Searches controller:

app/controllers/searches_controller.rb


def index
  query = params[:q].strip
  page  = params[:page] || 1
  model = params[:model]
  filters = {}
  if model == "Message"
    # Restrict message results to those sent to the current person.
    filters['recipient_id'] = current_person.id
  end
  @search = Ultrasphinx::Search.new(:query => query,
                                    :page => page,
                                    :class_names => model,
                                    :filters => filters)
  @search.run
  @results = @search.results
end

Of course, this requires an appropriately defined current_person (used to set the recipient_id filter), which we assume is taken care of by the application’s authentication scheme.
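It may help to keep in mind that the two restrictions act at different times. The :conditions clause is applied when the index is built, so trashed messages never make it into the index; the recipient_id filter is applied on each query, so one index can serve every user. The plain-Ruby simulation below mirrors that split; MessageRow and the helper methods are hypothetical stand-ins, not Ultrasphinx code:

```ruby
# Stand-in for a row of the messages table.
MessageRow = Struct.new(:id, :recipient_id, :recipient_deleted_at)

# Mirrors the controller: only Message searches get a recipient filter.
def search_filters(model, person_id)
  model == 'Message' ? { 'recipient_id' => person_id } : {}
end

# Index time: the :conditions clause decides which rows enter the index.
def build_index(rows)
  rows.select { |m| m.recipient_deleted_at.nil? }
end

# Query time: every filter attribute must equal the requested value.
def apply_filters(indexed, filters)
  indexed.select { |doc| filters.all? { |attr, value| doc.public_send(attr) == value } }
end

rows = [
  MessageRow.new(1, 7, nil),       # to person 7, not trashed
  MessageRow.new(2, 7, Time.now),  # to person 7, trashed
  MessageRow.new(3, 8, nil)        # to person 8
]
p apply_filters(build_index(rows), search_filters('Message', 7)).map(&:id)  # => [1]
```

One consequence of the index-time condition, if the model above is right, is that a message trashed after the last reindex can still show up until the next reindex runs.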

ForumPost: Ultrasphinx with Single Table Inheritance (STI) and associations

Our final example combines conditions with an include. Insoshi has a ForumPost model that inherits from a Post base class (which is also used for blog posts) using Single Table Inheritance (STI). We want to restrict forum searches to the body of forum posts, excluding blog posts. We also want to include the topic name in searches, so that a post “Lorem ipsum” under topic “Foobar” will show up for both the queries “Lorem” and “Foobar”. We can achieve this by using a conditions clause on the STI type, while using an include for the topic association:


class ForumPost < Post
  is_indexed :fields => [ 'body' ],
             :conditions => "type = 'ForumPost'",
             :include => [{:association_name => 'topic', :field => 'name'}]
  belongs_to :topic
  .
  .
  .
end

(If we leave out the type condition, Ultrasphinx happily indexes all the blog posts as well. Rails then complains when it tries to load a ForumPost using a BlogPost’s id.)
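To make the effect of the type condition and the topic include concrete, here is a toy model in plain Ruby. PostRow, forum_documents, and matching_ids are hypothetical, and real indexing happens in SQL rather than like this: only ForumPost rows become documents, and each document’s text includes the topic name, so both “Lorem” and “Foobar” find the post.

```ruby
# Stand-in for a row of the posts table (STI: the type column holds the class).
PostRow = Struct.new(:id, :type, :body, :topic_name)

# Keep only ForumPost rows (the :conditions clause) and append the topic
# name to the searchable text (the :include of topic.name).
def forum_documents(posts)
  posts.select { |post| post.type == 'ForumPost' }
       .map { |post| { id: post.id, text: [post.body, post.topic_name].compact.join(' ') } }
end

# Case-insensitive whole-word match; a crude stand-in for a Sphinx query.
def matching_ids(documents, word)
  documents.select { |doc| doc[:text].downcase.split.include?(word.downcase) }
           .map { |doc| doc[:id] }
end

posts = [
  PostRow.new(1, 'ForumPost', 'Lorem ipsum', 'Foobar'),
  PostRow.new(2, 'BlogPost',  'Lorem dolor', nil)
]
docs = forum_documents(posts)
p matching_ids(docs, 'Lorem')   # => [1]
p matching_ids(docs, 'Foobar')  # => [1]
```

Without the type filter, the blog post (id 2) would also match “Lorem”, which is the situation that trips up Rails.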

With that, we’ve covered all our basic search needs. As noted above, there’s one more advanced technique being used at Insoshi (handling searches on boolean attributes such as deactivated), which I’ll probably cover in a later post. It’s also worth noting that, unlike Ferret, Sphinx doesn’t update the search index with every Active Record update; you need to update the index periodically with a cron job. Take a look at the Ultrasphinx deployment notes for more details.
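As a rough illustration of the cron side, a small Ruby wrapper could shell out to the Ultrasphinx indexing task. This sketch assumes the task is named ultrasphinx:index (check rake -T in your app) and that it should run from the application root; reindex_command and reindex! are hypothetical helpers:

```ruby
# Build the command line for reindexing in the given Rails environment.
# Assumes the Ultrasphinx rake task is called ultrasphinx:index.
def reindex_command(env = 'production')
  ['rake', 'ultrasphinx:index', "RAILS_ENV=#{env}"]
end

# Run the reindex from the application root.
# Returns true on success, false or nil otherwise (as Kernel#system does).
def reindex!(rails_root, env = 'production')
  Dir.chdir(rails_root) { system(*reindex_command(env)) }
end
```

A cron job would then call something like reindex!('/path/to/app') periodically; how often depends on how stale you can tolerate search results being.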

TextMate Footnotes and Ultrasphinx

Finally, there’s a minor incompatibility between Ultrasphinx and the latest (Rails 2.1-compatible) TextMate Footnotes, which gives the following error (at least when using vendored Rails):

activesupport/lib/active_support/dependencies.rb:275:in `load_missing_constant':
uninitialized constant Footnotes::Filter (NameError)

This is because Ultrasphinx is looking for the Rails file initializer.rb, but it finds the initializer.rb defined by Footnotes instead. The fix is to change “initializer” to something else (say, “loader”) everywhere in Footnotes; see my fork of Footnotes at GitHub for an example.
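The underlying problem is a name collision: when two directories on Ruby’s load path both contain initializer.rb, an unqualified require 'initializer' loads whichever comes first and silently ignores the other. The sketch below demonstrates that generic mechanism in a temporary directory; it assumes, rather than confirms, that this is how the Ultrasphinx lookup goes wrong, and which_initializer_wins is a hypothetical name:

```ruby
require 'tmpdir'

# Put two directories that both contain initializer.rb on $LOAD_PATH and
# report which one a bare require picks up. Directory names are illustrative.
def which_initializer_wins
  Dir.mktmpdir do |root|
    dirs = {}
    %w(rails_lib footnotes_plugin).each do |tag|
      dir = File.join(root, tag)
      Dir.mkdir(dir)
      File.write(File.join(dir, 'initializer.rb'), "$initializer_source = #{tag.inspect}\n")
      dirs[tag] = dir
    end

    # Whichever directory is searched first wins; here it is the plugin's.
    $LOAD_PATH.unshift(dirs['rails_lib'])
    $LOAD_PATH.unshift(dirs['footnotes_plugin'])
    begin
      require 'initializer'
      $initializer_source
    ensure
      dirs.each_value { |dir| $LOAD_PATH.delete(dir) }
    end
  end
end
```

Calling which_initializer_wins returns "footnotes_plugin": the Rails-side file never loads, which is why renaming one of the files makes the collision go away.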

15 Comments

  1. Thanks for the write-up. Very helpful.

    I’m currently using Sphinx and Thinking_Sphinx in one of my apps and it’s running great.

    Comment by MikeInAZ — July 17, 2008 @ 5:15 pm

  2. @MikeInAZ: Glad it helped! I looked at thinking_sphinx, but only briefly; if you know of any big advantages over Ultrasphinx, please let me know.

    Comment by mhartl — July 17, 2008 @ 6:17 pm

  3. For me, it was easier to understand and setup.

    This blog post goes more in-depth:
    http://reinh.com/blog/2008/07/14/a-thinking-mans-sphinx.html

    Comment by MikeInAZ — July 17, 2008 @ 7:30 pm

  4. Nice post, Mike.

    Busy week for searching in Rails. In addition to Rein’s post, I posted earlier in the week about us implementing UltraSphinx on MindBites on Thursday and then replacing the entire thing with Xapian and deploying that following Monday.

    http://locomotivation.com/2008/07/15/mulling-over-our-ruby-on-rails-full-text-search-options

    I’m still working on a post describing our Xapian thoughts but so far so good. Much simpler than UltraSphinx for us. We never did look at ThinkingSphinx.

    Comment by Jim — July 17, 2008 @ 9:11 pm

  5. One thing that’s still not clear to me is how you get ultrasphinx working simultaneously in test and development. (assuming you want to do autotest-style BDD).

    Do you start two daemons, one for each environment? Does that require them to be manually set for different ports? Or can one searchd instance handle both? What command-line commands do you execute to make this happen?

    Comment by Evan Dorn — July 22, 2008 @ 1:06 pm

  6. @Evan: As far as I know, there’s no way to run development and test daemons simultaneously. Yes, that kinda bites. Let me know if you find a workaround.

    Comment by mhartl — July 22, 2008 @ 1:32 pm

  7. @mhartl: this is an interesting, if S L O W solution to the problem:

    http://tadatoshi.blogspot.com/2008/05/ultrasphinx-setup-part2.html

    Of course, it’s hard for me to test any of these approaches, since I’m having trouble getting sphinx to work at all (as per the email I sent you a while back).

    Comment by idahoev — July 22, 2008 @ 2:59 pm

  8. @Evan: You’ll need to run an instance of the searchd daemon for each environment, on different ports. Personally I just stub the search calls though, saves having to worry about the test instance.

    Comment by Pat Allan — July 23, 2008 @ 6:53 pm

  9. I just put up a blog post detailing the fix to two-daemon problem, along with two other big issues I faced while implementing the insoshi sphinx upgrade in an Insoshi fork I’m working on.

    There’s a lot of good stuff in there that I’m hoping will help insoshi upgraders and other ultrasphinx users.

    Comment by Evan Dorn — July 24, 2008 @ 10:46 am

  10. Thanks for the writeup! Very helpful and practical! - Cheri

    Comment by Cheri — July 24, 2008 @ 3:00 pm

  11. [...] with this post if you’re just starting out with sphinx. Instead, go read this much better introductory tutorial from the guys over at Insoshi. Then if you have problems, come back here and you may find [...]

    Pingback by LRBlog » Blog Archive » Fixing problems with sphinx search — July 25, 2008 @ 6:45 pm

  12. @Cheri: You’re welcome!

    Comment by mhartl — July 27, 2008 @ 5:28 pm

  13. Thanks a lot! However, I am still struggling with the following error after trying to run rake ultrasphinx:index:

    FATAL: no sources found in config file.

    Any ideas? I followed all the instructions up to that point. (ultrasphinx:configure gets me this: Rebuilding configurations for development environment
    Available models are
    Generating SQL

    Thanks again!
    Justus

    Comment by Justus — July 30, 2008 @ 1:09 am

  14. [...] fix is detailed at the bottom of Searching a Ruby on Rails application with Sphinx and Ultrasphinx with the specific implementation details available via this github [...]

    Pingback by Binary Code » Collision Between Ultrasphinx and TextMate Footnotes — August 7, 2008 @ 11:18 am

  15. [...] installing Ultraspinx (perhaps per these instructions from Inoshi, which are the best I’ve found thus far) and you run into this error when time [...]

    Pingback by Binary Code » Ultrasphinx Bootstrap Error — August 7, 2008 @ 2:29 pm

