Insoshi Ruby on Rails blog

July 17, 2008

Searching a Ruby on Rails application with Sphinx and Ultrasphinx

Filed under: Ferret, Insoshi, Ruby on Rails, Sphinx, Ultrasphinx — mhartl @ 4:46 pm

We recently switched the Insoshi social networking platform from a Ferret search engine to Sphinx (and Ultrasphinx), due to the well-known problems encountered with Ferret and due to our own experience of its instability on the Insoshi developer site. (Sphinx is currently running on our demo site, and anyone who wants the Sphinx-enabled source can grab edge Insoshi as described in the Rails 2.1 upgrade post. We’ll merge it into the master branch within a couple weeks.)

The switch did not always go smoothly, and there are several gotchas that I thought might be helpful to discuss in case other people run into them. I’ve also included some material on using Ultrasphinx, since its documentation is a bit sparse. For pedagogical purposes, I’ve simplified the Insoshi source slightly for this discussion; you don’t have to be familiar with the Insoshi codebase to follow this post. (N.B. The actual production code contains a trick for dealing with more advanced filtering requirements, which will probably be the subject of a future post.)

Installing Sphinx

The first step, naturally enough, is to install Sphinx. You can get the latest and greatest version at the Sphinx download page. (This blog post uses version 0.9.8, which was released just a couple of days before this post was written.) Download the source, and then install it as follows:

$ tar zxf sphinx-0.9.8.tar.gz
$ cd sphinx-0.9.8
$ ./configure --with-pgsql
$ make
$ sudo make install

The configure step ensures that Sphinx gets compiled with PostgreSQL support (MySQL comes for free). We’ve had trouble getting all the Postgres stuff to work properly, but it doesn’t hurt to have it. If you’d rather omit the Postgres support, just use ./configure by itself.

Installing Ultrasphinx

The second step is to install the Ultrasphinx plugin, which has one gem dependency:

$ sudo gem install chronic

The installation itself is trickier than it sounds; although there are plenty of tutorials that tell you how to do it, as far as I can tell they don’t work. I tried a couple of different tacks, both of which failed. First, I tried

$ svn export svn://rubyforge.org/var/svn/fauna/ultrasphinx/trunk vendor/plugins/ultrasphinx
Export complete.

The only problem is, this didn’t do anything; there was literally no change to my working copy. I then tried a plugin install:

$ script/plugin install svn://rubyforge.org/var/svn/fauna/ultrasphinx/trunk
Export complete.

Still nothing. After some time flailing about, I finally found a James on Software Sphinx/Ultrasphinx post, which suggested cloning his GitHub fork of Ultrasphinx. That worked at first, but later on I encountered a clash with the latest version of will_paginate:

WillPaginate: You are using a paginated collection of class
Ultrasphinx::Search which conforms to the old API of WillPaginate::Collection
by using `page_count`, while the current method name is `total_pages`. Please
upgrade yours or 3rd-party code that provides the paginated collection.

Luckily, with some judicious Googling I was able to find a second repository at GitHub, whose most recent commit as of this writing is updating the code to work with the latest will_paginate, which certainly looked promising. And, indeed, it worked beautifully, so I’m happy to recommend it:

$ git clone git://github.com/DrMark/ultrasphinx.git vendor/plugins/ultrasphinx
$ rm -rf vendor/plugins/ultrasphinx/.git

(This is one of the many reasons GitHub rocks; if the “official” version of a plugin is unavailable or out of date, you still might be able to find an updated fork on GitHub.)
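The will_paginate warning above boils down to a method rename. As a rough illustration only (the class below is a hypothetical stand-in, not Ultrasphinx code and not the actual change made in the fork), a collection that only knows the old page_count name can be taught to answer to total_pages with an alias:

```ruby
# Hypothetical stand-in for a paginated collection that still uses the
# old method name; it is not part of Ultrasphinx or will_paginate.
class LegacySearchResults
  def initialize(total_entries, per_page)
    @total_entries = total_entries
    @per_page = per_page
  end

  # Old-style name: number of pages needed to show every entry.
  def page_count
    (@total_entries + @per_page - 1) / @per_page # ceiling division
  end

  # New-style name that the warning asks for.
  alias_method :total_pages, :page_count
end
```

With 45 entries at 20 per page, both page_count and total_pages report 3 pages.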

Configuring Ultrasphinx

To configure Ultrasphinx, I followed the config instructions at the main Ultrasphinx site:

Next, copy the examples/default.base file to RAILS_ROOT/config/ultrasphinx/default.base.
This file sets up the Sphinx daemon options such as port, host, and index location.

Since many of the Insoshi fields allow HTML, the search results are better if we strip HTML tags first:

config/ultrasphinx/default.base

index
{
  .
  .
  .
  # HTML-specific options
  html_strip = 1
}

N.B. This is a replacement for the older strip_html syntax, used inside the source section:

config/ultrasphinx/default.base

source
{
  # Individual SQL source options
  sql_ranged_throttle = 0
  sql_range_step = 5000
  sql_query_post =
  strip_html = 1
}

If you get a warning like

WARNING: key 'strip_html' is deprecated in config/ultrasphinx/development.conf line 24;
use 'html_strip (per-index)' instead.

just remove the strip_html line and put an html_strip line in its place (taking care to put it in the index section of the configuration file).

Bootstrapping Ultrasphinx

Now we’re ready to fire up Ultrasphinx, which uses Sphinx to build up a search index of our database:

$ rake ultrasphinx:bootstrap

There’s just one hitch: many people (including me) get an error at this stage:

dyld: Library not loaded: /usr/local/mysql/lib/mysql/libmysqlclient.15.dylib
  Referenced from: /usr/local/bin/indexer
  Reason: image not found

I found a solution using the canonical “Google the error message” method. There’s something screwy with the location of the MySQL libraries, but it’s nothing a little symlink couldn’t fix:

$ sudo ln -s /usr/local/mysql/lib /usr/local/mysql/lib/mysql

Testing Sphinx and Ultrasphinx

In principle, things are working now under the hood; we just need to add in some code to our models and controllers to execute the searches. I prefer test-driven development, though, so the next priority is to get Sphinx and Ultrasphinx working in a test environment.

It’s important to stop the Ultrasphinx daemon, which might be running in development mode if you used rake ultrasphinx:bootstrap above:

$ rake ultrasphinx:daemon:stop

Then make a test-specific configuration file:

config/ultrasphinx/test.base

source
{
  # Individual SQL source options
  sql_ranged_throttle = 0
  sql_range_step = 999999999
  sql_query_post =
}
.
.
.
index
{
  .
  .
  .
  # HTML-specific options
  html_strip = 1
}

The line sql_range_step = 999999999 here is key. The sql_range_step variable controls how many ids Sphinx covers in each ranged query as it steps through the table during indexing; by default, it’s 5000, but Insoshi uses foxy fixtures, which often create objects with huge ids. With a step of 5000, covering such a wide (and mostly empty) id range takes a great many queries, so the indexing step can take a long time (several minutes), even for a tiny test database. Setting sql_range_step to a larger step size solves the problem.
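To get a feel for the numbers, here is a back-of-the-envelope sketch. It assumes a simple model of ranged indexing in which one fetch query is issued per chunk of sql_range_step ids between the smallest and largest id; the method name and the sample ids are made up for illustration.

```ruby
# Rough model of ranged indexing: the id span from min_id to max_id is
# walked in chunks of `step` ids, with one fetch query per chunk.
def ranged_query_count(min_id, max_id, step)
  span = max_id - min_id + 1
  (span + step - 1) / step # ceiling division
end
```

Under this model, a table whose ids run from 1 to 1,000,000,000 needs 200,000 queries at the default step of 5000 (nearly all of them returning nothing), but only 2 with a step of 999999999.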

With that done, we’re ready to fire things up:

$ rake ultrasphinx:bootstrap RAILS_ENV=test

One problem we run into is that the Sphinx test daemon might not always be running, so it would be nice to skip the search tests (or specs) if this is the case. For example, suppose that we have a Searches controller (whose index action will handle searches). Here is a skeleton for the Searches controller specs that runs only when Sphinx is running:

spec/controllers/searches_controller_spec.rb


# Return a list of system processes.
def processes
  process_cmd = case RUBY_PLATFORM
                when /djgpp|(cyg|ms|bcc)win|mingw/ then 'tasklist /v'
                when /solaris/                     then 'ps -ef'
                else
                  'ps aux'
                end
  `#{process_cmd}`
end

# Return true if the search daemon is running.
def testing_search?
  processes.include?('searchd')
end

describe SearchesController do
  .
  .
  .
end if testing_search?

(A blog post on testing with Ultrasphinx proved useful in this context.)

Writing the first tests

OK, now we’re ready to write some concrete tests. Some basic tests (using RSpec) might look like these:

spec/controllers/searches_controller_spec.rb


describe SearchesController do

  describe "Person searches" do

    it "should search by name" do
      get :index, :q => "quentin", :model => "Person"
      assigns(:results).should == [people(:quentin)].paginate
    end

    it "should search by description" do
      get :index, :q => "I'm Quentin", :model => "Person"
      assigns(:results).should == [people(:quentin)].paginate
    end
  end
end if testing_search?

Here we’ve passed a model parameter in anticipation of using a single action to search multiple models.

The specs fail, of course:

$ script/spec spec/controllers/searches_controller_spec.rb
2 examples, 2 failures

Apart from the if testing_search? clause, there’s nothing here beyond vanilla RSpec, so in what follows I won’t bother showing any more specs.

Person: Basic indexing

Now we’re ready for some basic searching. Suppose we have a Person model with name and description fields, which we want to enable for searching. We need the is_indexed method from Ultrasphinx:

app/models/person.rb


class Person < ActiveRecord::Base
  is_indexed :fields => [ 'name', 'description' ]
  .
  .
  .
end

Then a sample Searches controller index might look like this:

app/controllers/searches_controller.rb


def index
  query = params[:q].strip
  page  = params[:page] || 1
  model = params[:model]
  filters = {}
  @search = Ultrasphinx::Search.new(:query => query,
                                    :page => page,
                                    :class_names => model,
                                    :filters => filters)
  @search.run
  @results = @search.results
end

Note the use of a :page option; Ultrasphinx works with the will_paginate plugin out of the box.

A sample search box partial might look like this:

app/views/searches/_box.html.erb


<% form_tag searches_path, :method => :get do %>
  <fieldset>
    <%= text_field_tag :q, h(params[:q]), :maxlength => 50 %>
    <%= submit_tag "Search" %>
    <%= hidden_field_tag "model", search_model %>
  </fieldset>
<% end %>

where search_model is just a helper that inspects params and returns the name of the model being searched. (For example:

app/helpers/searches_helper.rb


module SearchesHelper

  # Return the model to be searched based on params.
  def search_model
    return "Person"    if params[:controller] =~ /home/
    return "ForumPost" if params[:controller] =~ /forums/
    params[:model] || params[:controller].classify
  end
end

where params[:controller].classify automagically returns the string “Person” inside the People controller and “Message” inside the Messages controller.)
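To make the helper's decision order easy to try outside Rails, here is a plain-Ruby sketch. The classify_stub method is a deliberately tiny stand-in for ActiveSupport's classify that knows only the controller names mentioned in this post, and search_model_for is a hypothetical name that takes params as an argument instead of reading it from the controller.

```ruby
# Simplified stand-in for String#classify, covering only the controller
# names used in this post. The real method applies general inflection rules.
CLASSIFIED = { "people" => "Person", "messages" => "Message" }

def classify_stub(controller_name)
  CLASSIFIED.fetch(controller_name) { controller_name.capitalize }
end

# Same decision order as the search_model helper, with params passed in
# explicitly so the method can run without Rails.
def search_model_for(params)
  return "Person"    if params[:controller] =~ /home/
  return "ForumPost" if params[:controller] =~ /forums/
  params[:model] || classify_stub(params[:controller])
end
```

Note that an explicit params[:model] wins over the controller name, except on the home and forums pages, where the model is fixed.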

As long as the test database contains the appropriate user (in our case, Quentin from restful_authentication), the specs should pass once we reindex:

$ rake ultrasphinx:bootstrap RAILS_ENV=test
$ script/spec spec/controllers/searches_controller_spec.rb
2 examples, 0 failures

If they fail, chances are that either (1) there’s some rogue development daemon running or (2) we forgot to reindex the test database after changing a model. If this happens, you can be extra paranoid by recycling everything:

$ rake ultrasphinx:daemon:stop
$ rake ultrasphinx:bootstrap RAILS_ENV=test

Message: Ultrasphinx with conditions and filtering

One common task is to put a condition on a search result. For example, suppose we have a Message model with a subject and content we want to index, but with “trashed” messages we want to exclude. Suppose further that recipients trash messages by setting a recipient_deleted_at attribute in the Message model. Untrashed messages would then have a NULL value for recipient_deleted_at:

app/models/message.rb


class Message < ActiveRecord::Base
  is_indexed :fields => [ 'subject', 'content', 'recipient_id' ],
             :conditions => "recipient_deleted_at IS NULL"
  .
  .
  .
end

Of course, when searching through messages for a particular person, we should only return messages actually sent to that person. This is why we added the recipient_id to the index fields above; this way, we can use an Ultrasphinx filter to restrict the results appropriately in the Searches controller:

app/controllers/searches_controller.rb


def index
  query = params[:q].strip
  page  = params[:page] || 1
  model = params[:model]
  filters = {}
  if model == "Message"
    # Restrict message results to those sent to the current person.
    filters['recipient_id'] = current_person.id
  end
  @search = Ultrasphinx::Search.new(:query => query,
                                    :page => page,
                                    :class_names => model,
                                    :filters => filters)
                                    :filters => filters)
  @search.run
  @results = @search.results
end

Of course, the recipient_id filter requires an appropriately defined current_person object, which we assume is taken care of by the application’s authentication scheme.
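One way to keep this logic testable without a running Sphinx daemon is to pull the option-building into a plain method. The sketch below is only a suggestion: build_search_options is a hypothetical name, the current person's id is passed in explicitly, and the to_s guard against a missing q parameter is an addition that the controller above doesn't have.

```ruby
# Build the options hash for a search from the request params.
# The method name and the explicit current_person_id argument are
# illustrative, not part of Insoshi or Ultrasphinx.
def build_search_options(params, current_person_id)
  filters = {}
  if params[:model] == "Message"
    # Restrict message results to those sent to the current person.
    filters['recipient_id'] = current_person_id
  end
  {
    :query       => params[:q].to_s.strip, # to_s guards against a nil query
    :page        => params[:page] || 1,
    :class_names => params[:model],
    :filters     => filters
  }
end
```

The controller could then pass the resulting hash straight to Ultrasphinx::Search.new, since that constructor takes the same keys shown in the action above.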

ForumPost: Ultrasphinx with Single Table Inheritance (STI) and associations

Our final example combines conditions with an include. Insoshi has a ForumPost model that inherits from a Post base class (which is also used for blog posts) using Single Table Inheritance (STI). We want to restrict forum searches to the body of forum posts, excluding blog posts. We also want to include the topic name in searches, so that a post “Lorem ipsum” under topic “Foobar” will show up for both the queries “Lorem” and “Foobar”. We can achieve this by using a conditions clause on the STI type, while using an include for the topic association:


class ForumPost < Post
  is_indexed :fields => [ 'body' ],
             :conditions => "type = 'ForumPost'",
             :include => [{:association_name => 'topic', :field => 'name'}]
  belongs_to :topic
  .
  .
  .
end

(If we leave out the type condition, Ultrasphinx happily indexes all the blog posts as well. Rails then complains when trying to make a new ForumPost using a BlogPost id.)
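The effect of the type condition is easy to picture with a few made-up rows. The following sketch uses an in-memory array as a stand-in for the shared posts table; the rows and the indexed_ids method are hypothetical and only mimic what the SQL condition selects.

```ruby
# Hypothetical rows from a shared posts table. With STI, forum posts and
# blog posts live in one table and are distinguished by the type column.
POSTS = [
  { :id => 1, :type => "ForumPost", :body => "Lorem ipsum" },
  { :id => 2, :type => "BlogPost",  :body => "Dolor sit amet" },
  { :id => 3, :type => "ForumPost", :body => "Consectetur" }
]

# Ids that would be picked up for indexing, with or without a type condition.
def indexed_ids(rows, type = nil)
  rows.select { |row| type.nil? || row[:type] == type }.map { |row| row[:id] }
end
```

Without the condition all three ids are selected, including the blog post with id 2; with type "ForumPost" only ids 1 and 3 remain.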

With that, we’ve covered all our basic search needs. As noted above, there’s one more advanced technique being used at Insoshi (handling searches on boolean attributes such as deactivated), which I’ll probably cover in a later post. It’s also worth noting that, unlike Ferret, Sphinx doesn’t update the search index with every Active Record update; you need to update the index periodically with a cron job. Take a look at the Ultrasphinx deployment notes for more details.

TextMate Footnotes and Ultrasphinx

Finally, there’s a minor incompatibility between Ultrasphinx and the latest (Rails 2.1-compatible) TextMate Footnotes, which gives the following error (at least when using vendored Rails):

activesupport/lib/active_support/dependencies.rb:275:in `load_missing_constant':
uninitialized constant Footnotes::Filter (NameError)

This is because Ultrasphinx is looking for the Rails file initializer.rb, but instead it finds initializer.rb as defined by Footnotes. The fix is to change “initializer” to something else (say, “loader”) everywhere; see my fork of Footnotes at GitHub for an example.

July 3, 2008

A Rails 2.1 case study: upgrading the Insoshi social networking platform

Filed under: Git, Insoshi, Ruby on Rails — mhartl @ 10:58 am

I’m happy to announce the release of a new edge branch of the Insoshi social networking platform, which is fully updated with Rails 2.1 support. (The Insoshi demo site is currently running off this edge branch.) The result is a good example of upgrading a real-life application to Rails 2.1, so I thought a blog post detailing the steps might be useful.

Most of the changes here are quite generic—updates to widely used plugins, for example—and are not specific to Insoshi. I’ve been especially careful to include error messages when applicable, so that search engines can index them; I’m sure I’m not the only programmer who follows the “Google the error message” algorithm when debugging. I’ve also included the Git commands I used, since I know a lot of Rails developers are working to learn Git and I thought some examples might be helpful.

It’s important to note at the outset that you do not need to follow these steps yourself to upgrade Insoshi. Insoshi contributors, and anyone else who wants the edge version of Insoshi, should follow the instructions at the Insoshi wiki (if they haven’t already) and then issue the following commands:

$ git fetch origin
$ git branch --track edge origin/edge
$ git checkout edge

This way, you’ll get all these changes for free. (Once hosting support for Rails 2.1 is more widespread (especially at Heroku), we’ll merge the Rails 2.1 Insoshi edge into the master branch.)

And now, on with our show. Here’s what it took to get Insoshi running under Rails 2.1.

Update RubyGems

Upgrading Insoshi to Rails 2.1 involves updating a bunch of plugins, many of which are now hosted at GitHub. Unfortunately, older versions of RubyGems can’t install directly from GitHub sources; for example, trying to install will_paginate with RubyGems 1.1.0 gives you this error message:

$ gem --version
1.1.0
$ sudo gem install mislav-will_paginate -s http://gems.github.com/
ERROR:  could not find mislav-will_paginate locally or in a repository

The solution is to update the system gems (including RubyGems) as follows:

$ sudo gem update --system
$ gem --version
1.2.1

Install Rails 2.1

Installing Rails 2.1 itself is the easiest step in the entire upgrade. (I think when people wonder, “How hard could upgrading to Rails 2.1 possibly be?”, they have mainly this step in mind.)

$ sudo gem update rails

Things now get a little trickier, since we want to freeze Rails 2.1 in vendor/rails to follow the best practice for production Rails apps. The first part goes smoothly:

$ git rm -r vendor/rails
$ git commit -a -m "Cleared out vendor/rails"

There’s a hitch with the second step. If you happen to have only Rails 2.1 (but not 2.0.2) on your machine, there’s a bootstrapping problem due to a variable in environment.rb:

$ rake rails:freeze:gems
Missing the Rails 2.0.2 gem. Please `gem install -v=2.0.2 rails`, update your
RAILS_GEM_VERSION setting in config/environment.rb for the Rails version you
do have installed, or comment out RAILS_GEM_VERSION to use the latest version
installed.

The solution is to update environment.rb with the proper Rails gem version:


# Specifies gem version of Rails to use when vendor/rails is not present
RAILS_GEM_VERSION = '2.1.0' unless defined? RAILS_GEM_VERSION

Then the freeze works fine:

$ rake rails:freeze:gems
$ git add .
$ git commit -a -m "Updated vendor/rails to Rails 2.1"

Finally, following the advice from Akita’s Rolling With Rails 2.1, we add a new file called config/initializers/new_defaults.rb:


# These settings change the behavior of Rails 2 apps and will be defaults
# for Rails 3. You can remove this initializer when Rails 3 is released.

# Only save the attributes that have changed since the record was loaded.
ActiveRecord::Base.partial_updates = true

# Include ActiveRecord class name as root for JSON serialized output.
ActiveRecord::Base.include_root_in_json = true

# Use ISO 8601 format for JSON serialized times and dates
ActiveSupport.use_standard_json_time_format = true

# Don't escape HTML entities in JSON, leave that for the #json_escape helper
# if you're including raw json in an HTML page.
ActiveSupport.escape_html_entities_in_json = false

$ git add config/initializers/new_defaults.rb
$ git commit -m "Added new defaults initializer"

Update RSpec for Rails 2.1

A necessary step in verifying that the Insoshi application is working under Rails 2.1 is to get the test suite to pass. Our tests are written using RSpec, but older versions of RSpec don’t work with Rails 2.1, so we can’t even run the test suite. D’oh!

Since the old plugins don’t work, first we remove them:

$ git rm -r vendor/plugins/rspec*
$ git commit -a -m "removed outdated RSpec plugins"

Then we need to install the most recent versions of the RSpec plugins from GitHub:

$ script/plugin install git://github.com/dchelimsky/rspec.git
$ script/plugin install git://github.com/dchelimsky/rspec-rails.git
$ script/generate rspec

In my case, I was careful not to overwrite spec_helper.rb when running script/generate rspec in the last step, as the current spec helper contains several custom modifications that I didn’t want to lose.

Then we need to add the changes:

$ git add .
$ git commit -a -m "Added latest RSpec plugins from GitHub"

By the way, running specs from the command line won’t work:

$ spec spec/models/person_spec.rb
Your RSpec on Rails plugin is incompatible with your installed RSpec.

RSpec          : 20080526202855
RSpec on Rails : 20080628203842

To fix this, you can use script/spec in place of spec, or you can upgrade to the latest RSpec gem using the RSpec source from GitHub:

$ cd ~/tmp
$ git clone git://github.com/dchelimsky/rspec.git
$ cd rspec
$ rake gem
$ sudo gem install pkg/rspec-1.1.4.gem

(All of Insoshi’s tests use RSpec, but if you use Test::Unit you should know that as of Rails 2.1 default tests can’t be run at the command line using the ruby executable; for tests generated by Rails 2.1, you now have to include the test directory explicitly using the -I test flag. For a hypothetical resource foobar, for example, you would get this:

$ script/generate scaffold foobar baz:string
$ ruby test/functional/foobars_controller_test.rb
test/functional/foobars_controller_test.rb:1:in `require':
no such file to load -- test_helper (LoadError)
        from test/functional/foobars_controller_test.rb:1

It passes if you tell ruby about the test directory:

$ ruby -I test test/functional/foobars_controller_test.rb
Loaded suite test/functional/foobars_controller_test
Started
.......
Finished in 0.24375 seconds.

7 tests, 13 assertions, 0 failures, 0 errors

(N.B. Using rake still works fine.) It’s unclear why the Rails Core team decided to make this change, but it’s a definite gotcha so I thought it deserved mention.)
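The -I flag simply adds a directory to Ruby's load path, which is where require looks for files given by bare name. Here is a small self-contained sketch of the same effect; it uses a throwaway temp directory and a made-up file name (sketch_test_helper) rather than a real Rails app.

```ruby
require 'tmpdir'

# Return true if `require name` succeeds and false if it raises LoadError.
def loadable?(name)
  require name
  true
rescue LoadError
  false
end

# Create a throwaway directory containing a helper file, then try to
# require the helper before and after adding the directory to the load path.
def load_path_demo
  Dir.mktmpdir do |dir|
    File.write(File.join(dir, 'sketch_test_helper.rb'), "SKETCH_HELPER = true\n")
    before = loadable?('sketch_test_helper')
    $LOAD_PATH.unshift(dir) # what `ruby -I dir` does for you
    after = loadable?('sketch_test_helper')
    $LOAD_PATH.delete(dir)
    [before, after]
  end
end
```

The first require fails with a LoadError, just like the test_helper error above, and the second succeeds once the directory is on the load path.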

With the new RSpec installed we can at least run the test suite, but unfortunately there are huge amounts of breakage. This is mainly due to several plugin incompatibilities and a slight Rails 2.1/Insoshi conflict (discussed below). Let’s get started fixing them.

Update obsolete helper specs

One source of breakage is the helper specs (in the spec/helpers/ directory). All these specs have obsolete code, resulting in errors such as

NoMethodError in 'TopicsHelper should include the TopicHelper'
undefined method `metaclass'

By going to a temp directory and running a sample rspec_scaffold with the updated RSpec to get a sample spec template file, you can discover that the line


included_modules = self.metaclass.send :included_modules

needs to be changed to


included_modules = (class << helper; self; end).send :included_modules

in each helper spec. You can use search-and-replace in your text editor to make all the changes, and then commit them:

$ git commit -a -m "Updated outdated spec helpers"
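If the (class << helper; self; end) idiom looks cryptic, the following plain-Ruby sketch may help. It opens an object's singleton class and returns it, and then shows that a module mixed in with extend appears in that class's included_modules. The module and method names here are invented for the example and have nothing to do with RSpec itself.

```ruby
# A made-up helper module standing in for something like TopicsHelper.
module SketchHelper
  def greeting
    "hello"
  end
end

# Return the object's singleton class using the idiom from the spec template.
def singleton_class_of(object)
  (class << object; self; end)
end

# Extend a fresh object with the module and check whether the module shows
# up in the singleton class's list of included modules.
def helper_includes_module?
  helper = Object.new
  helper.extend(SketchHelper)
  singleton_class_of(helper).send(:included_modules).include?(SketchHelper)
end
```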

This won’t fix everything in the helper specs; there are still warning messages like this:

Modules will no longer be automatically included in RSpec version 1.1.4

This is because the ActivitiesHelper module is not explicitly included in the helper spec. The fix is to add the relevant include:

spec/helpers/activities_helper_spec.rb


require File.dirname(__FILE__) + '/../spec_helper'
include ActivitiesHelper
.
.
.

Update will_paginate for Rails 2.1

The old will_paginate plugin won’t work with Rails 2.1. You get tons of errors like

SystemStackError in 'PeopleController people pages should have a working show page'
stack level too deep

We need to remove the old plugin and install an update from GitHub. The current recommended method is to install it as a gem, and luckily Rails 2.1 has a slick new method for dealing with gem dependencies. In principle, we just need to add the following lines to environment.rb


Rails::Initializer.run do |config|
  .
  .
  .
  # Custom gem requirements
  config.gem 'mislav-will_paginate', :version => '~> 2.3.2',
                                     :lib => 'will_paginate',
                                     :source => 'http://gems.github.com'
end

This tells Rails that our application requires will_paginate version 2.3.2 or a later 2.3.x release (the ~> operator rules out 2.4 and above), and that it can be found at GitHub. We can do the installation like this:

$ sudo rake gems:install
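A quick aside on the ~> operator in the config.gem line: it is the RubyGems "pessimistic" version constraint, and you can check what it accepts using the Gem::Requirement and Gem::Version classes that ship with RubyGems. The wrapper method below is just a convenience for this example.

```ruby
require 'rubygems'

# Does the given version string satisfy the '~> 2.3.2' constraint?
def satisfies_pessimistic?(version)
  Gem::Requirement.new('~> 2.3.2').satisfied_by?(Gem::Version.new(version))
end
```

Versions 2.3.2 and 2.3.11 satisfy the constraint, while 2.3.1 and 2.4.0 do not.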

Unfortunately, in our case this won’t work, since will_paginate doesn’t work as a gem if the application has Rails in vendor/rails. (This gotcha is buried in the will_paginate wiki at GitHub.) We have to install will_paginate as a plugin after all:

$ git rm -r vendor/plugins/will_paginate
$ script/plugin install git://github.com/mislav/will_paginate.git
$ git add .
$ git commit -a -m "Updated will_paginate plugin"

(Unfortunately, this still didn’t fix the problem for Insoshi, because there were extra files in the lib/ directory:

$ git rm -r lib/will_paginate*
$ git commit -a -m "Removed will_paginate from lib"

I’m not sure how that happened, but it sure took a while to figure out…)

Update TextMate footnotes for Rails 2.1

Now that will_paginate is fixed (and the corresponding specs pass), you’d think the relevant pages would work. You’d be wrong. We were using the edge version of TextMate footnotes, and it turns out that the footnotes-edge plugin breaks horribly in Rails 2.1. Basically, every page gives you something like this in the browser:

ActionController::RenderError in HomeController#index

You called render with invalid options : {:layout=>false, :action=>"index"}, nil

To fix this, we need to install an update from GitHub (are you seeing a pattern here?):

$ git rm -r vendor/plugins/footnotes-edge
$ script/plugin install git://github.com/drnic/rails-footnotes.git
$ mv vendor/plugins/rails-footnotes vendor/plugins/footnotes
$ git add .
$ git commit -a -m "Updated TextMate footnotes"

Update attachment_fu for Rails 2.1

We’re almost there. A few photo specs still fail, with messages like

NoMethodError in 'PhotosController when logged in should create photo'
undefined method `callbacks_for' for #<Photo:0x529a61c>

The solution is to update attachment_fu:

$ git rm -r vendor/plugins/attachment_fu/
$ script/plugin install http://svn.techno-weenie.net/projects/plugins/attachment_fu/
$ git add .
$ git commit -a -m "Updated attachment_fu"

Fix the broken verify action

This would seem to complete the update, but unfortunately this isn’t the end; there’s one more (Insoshi-specific) problem. Insoshi includes the option to verify the email addresses of new members, using a custom action called verify inside the People controller. Unfortunately, the specs that test the email verification fail with the error

No action responded to verify

This looked to be a general problem with custom actions in Rails 2.1 tests, but it turns out that the culprit is the word verify. For some reason, an action called verify causes problems in Rails 2.1, but only in tests. (This took a long time to figure out.) The (rather inelegant) fix is to rename the controller action to verify_email, and then add a line in the routes file so that old email verification links still work:

app/controllers/people_controller.rb


def verify_email
  .
  .
  .
end

config/routes.rb


map.resources :people, :member => { :verify_email => :get,
                                    :common_contacts => :get }
map.connect '/people/verify/:id', :controller => 'people',
                                  :action => 'verify_email'

Of course, we also need to change get :verify to get :verify_email in the spec file (spec/controllers/people_controller_spec.rb). Once that is done, all the specs pass, and the upgrade is complete:

$ rake spec
..............................................................................
..............................................................................
..............................................................................
..............................................................................
..............................

Finished in 12.337675 seconds

342 examples, 0 failures

Phew!

Postscript: Rails 2.1 migrations, schema_info, and schema_migrations

One benefit of moving to Rails 2.1 is the new method for handling migrations. We’ve already run into instances with the Insoshi project where we needed to merge updates with conflicting migration numbers, and it’s really not any fun with Rails 2.0. There is a potential gotcha, though, since in order to perform the new migration cleverness Rails 2.1 uses a table called schema_migrations in place of the old schema_info table. The change is supposed to happen automatically when you first migrate after installing Rails 2.1, but we ran into some difficulties…

When making a Rails 2.0 to 2.1 upgrade, you shouldn’t run into any problems if you start from a clean database and run

$ rake db:migrate

If you have an existing database (for instance, our demo site database), the migration should automatically convert the schema_info table used in 2.0 (which stores only a single integer value) to schema_migrations (which has one row for each migration that has been performed).

While that worked for us in development, we ran into an issue on our staging server (where we test everything before installing it on our production servers): the expected table conversion didn’t happen. Instead, the migration tried to recreate the tables from scratch. We got around this by bootstrapping the conversion before the migration, as follows.

  1. Create the schema_migrations table via SQL:
    
    $ mysql insoshi_production -u root
    mysql> CREATE TABLE schema_migrations (
           version VARCHAR(255) NOT NULL,
           UNIQUE KEY unique_schema_migrations (version)
           );
    
  2. Find out the current migration version number:
    
    mysql> SELECT version FROM schema_info;
    +---------+
    | version |
    +---------+
    |      26 |
    +---------+
    1 row in set (0.07 sec)
    
  3. Insert values from 1 to the current version as strings (since the version column is a VARCHAR):
    
    mysql> INSERT INTO schema_migrations VALUES ('1'),('2'),('3'),('4'),('5'),
    ('6'),('7'),('8'),('9'),('10'),('11'),('12'),('13'),('14'),('15'),
    ('16'),('17'),('18'),('19'),('20'),('21'),('22'),('23'),('24'),
    ('25'),('26');
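Typing out the values by hand gets tedious for larger version numbers. Here is a small Ruby sketch that generates the same INSERT statement for any current version; the method name is made up, and it assumes (as in the steps above) old-style sequential migration numbers starting at 1.

```ruby
# Build the INSERT statement that fills schema_migrations with the
# versions 1..current_version as quoted strings.
def schema_migrations_insert(current_version)
  values = (1..current_version).map { |version| "('#{version}')" }.join(',')
  "INSERT INTO schema_migrations VALUES #{values};"
end
```

For example, schema_migrations_insert(26) produces the statement from step 3 (on a single line), which can be pasted into the mysql prompt.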
    

Of course, you shouldn’t have to do this—things should Just Work™—but then again, upgrading to Rails 2.1 wasn’t supposed to be difficult, either. :-)

July 1, 2008

Using Git to pull in a patch from a single commit

Filed under: Git, Insoshi, Ruby on Rails — mhartl @ 1:06 pm

Git is awesome at merging and branching, but what if you want to pull in just one patch from a single commit?

We ran into this recently with Insoshi at GitHub, where piotrj updated the README to be in RDoc format. Why not just merge in his changes? Well, Piotr has also been working on image galleries, but in the meantime billsaysthis has picked up that torch and run with it. As a result, Piotr’s image gallery changes would cause conflicts with the current master branch, and in any case we don’t want those changes just yet; we only want the RDoc-ified README for now. If only there were a way to use Git to cherry-pick just one commit…

Aha, git cherry-pick to the rescue!  Here are the steps for my particular case:

  1. I use
     $ git fetch piotrj

    to fetch Piotr’s changes to my local machine. (I had already connected to his GitHub fork using the steps from the relevant Insoshi Git guide.)

  2. Looking at GitHub for the commit label, I see that it’s 3b4257f0454fc31349a0505c9a883f691fe8889d. (I could also check out Piotr’s branch locally and use git log to see the commits.) So all I need to do is switch to the master branch and cherry-pick the change:
    $ git checkout master
    $ git cherry-pick 3b4257f0454fc31349a0505c9a883f691fe8889d

    Note that I don’t have to reference Piotr’s branch explicitly; the commit label is a hash that identifies the commit uniquely, so Git can find it among the objects fetched in step 1.

That’s it! Amazingly, I don’t even have to do a commit; Git adds Piotr’s message to my log automatically. After pushing the updated master branch to GitHub, the Insoshi README is noticeably improved.  Thanks to piotrj—and to git cherry-pick!

June 26, 2008

Working around the validates_uniqueness_of bug in Ruby on Rails

Filed under: Ruby on Rails — mhartl @ 8:16 pm

The useful validates_uniqueness_of validation in Active Record has a well-known flaw that bit us recently. (Apparently it wasn’t well-known enough. :-) Insoshi uses email addresses as unique logins, which naturally means we have a validation enforcing email uniqueness. And yet, until recently our database contained instances of people with the same email address. How can this be?

The answer is simple: despite its name, validates_uniqueness_of doesn’t actually guarantee uniqueness. Suppose that a new user registers using the address “foobar@example.com”. Rails performs the following steps:

  1. Check the database to see if there is already a person with email address “foobar@example.com”
  2. If not, insert the new record

The problem arises when the following sequence occurs:

  1. HTTP request #1 tries to create a new record with email “foobar@example.com”, which Rails marks as valid
  2. HTTP request #2 tries to create a new record with email “foobar@example.com”, which Rails marks as valid
  3. The process from request #1 saves to the database
  4. The process from request #2 saves to the database

The result, contrary to the supposed “uniqueness validation”, is two records with the address “foobar@example.com”! Since the duplication happens silently, this can badly corrupt databases in some cases. (Luckily for us, our email verification uses a save! to save the person, which raises an exception for duplicate records.)
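The interleaving above can be reproduced without Rails or a database. The following is only a sketch: FakeTable and its methods are hypothetical stand-ins for the people table and for the check that validates_uniqueness_of performs, and the two "requests" are simply run in the problematic order by hand.

```ruby
# A minimal in-memory stand-in for the people table (no unique index).
# FakeTable and its methods are hypothetical names used only for illustration.
class FakeTable
  attr_reader :rows

  def initialize
    @rows = []
  end

  # Step 1 of the validation: check for an existing row with this email.
  def email_taken?(email)
    @rows.include?(email)
  end

  # Step 2: insert the row. Nothing here stops a duplicate.
  def insert(email)
    @rows << email
  end
end

table = FakeTable.new
email = "foobar@example.com"

# Both requests run the uniqueness check before either one inserts.
request_1_valid = !table.email_taken?(email)
request_2_valid = !table.email_taken?(email)

# Both checks passed, so both requests go on to save.
table.insert(email) if request_1_valid
table.insert(email) if request_2_valid

puts table.rows.inspect
# prints ["foobar@example.com", "foobar@example.com"]
```

Because the check and the insert are separate steps, nothing ties the answer from the check to the state of the table at insert time.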

We’re not sure exactly how the problem arose—possibly from double-clicks on the registration button—but fixing it involves making changes at the database level. For the sake of illustration, consider a stripped-down Person model with only an email address and password. Here’s what the migration might look like:


class CreatePeople < ActiveRecord::Migration
  def self.up
    create_table :people do |t|
      t.column :email,    :string
      t.column :password, :string
      t.timestamps
    end
  end

  def self.down
    drop_table :people
  end
end

The validations look like this, including validates_uniqueness_of for the email attribute:


class Person < ActiveRecord::Base

  validates_presence_of     :email
  validates_uniqueness_of   :email
  validates_confirmation_of :password

  attr_accessible :email, :password, :password_confirmation
end

As we’ve seen, this doesn’t actually validate uniqueness in all cases. Let’s write an RSpec test to catch the problem. We’ll be enforcing email uniqueness at the database level, so we expect the database to raise an exception if we try to create two records with the same email address. As we’ll see once we add the constraint, the actual exception raised is of type ActiveRecord::StatementInvalid, so we’ll test for that:


require File.dirname(__FILE__) + '/../spec_helper'

describe Person do
  it "should prevent duplicate emails" do
    person = new_person
    person.save
    duplicate = new_person
    lambda do
      # Pass 'false' to 'save' in order to skip the validations.
      duplicate.save(false)
    end.should raise_error(ActiveRecord::StatementInvalid)
  end

  private

    def new_person(options = {})
      Person.new({ :email    => 'foobar@example.com',
                   :password => 'pass',
                   :password_confirmation => 'pass' }.merge(options))
    end
end

The key here is the line that passes false to Active Record’s save method, which skips the validations (including validates_uniqueness_of). Initially, this test will fail:

$ spec spec/models/person_spec.rb

F

1)
'Person should prevent duplicate emails' FAILED
expected ActiveRecord::StatementInvalid but nothing was raised

Now we enforce data integrity by putting a unique index on the email field:

$ script/generate migration add_email_unique_index

class AddEmailUniqueIndex < ActiveRecord::Migration
  def self.up
    add_index :people, :email, :unique => true
  end

  def self.down
    remove_index :people, :email
  end
end

Now the test should pass:

$ rake db:migrate; rake db:test:prepare
$ spec spec/models/person_spec.rb
.

Finished in 0.138142 seconds

1 example, 0 failures

This catches the problem in the model, but the original problem—the (attempted) creation of an email duplicate—will still raise an exception in our application. The mere attempt to create a dupe isn’t by itself a problem, so it’s probably best simply to ignore the exception by catching it and redirecting somewhere sensible:


  def create
    @person = Person.new(params[:person])
    if @person.save
      self.current_person = @person
      flash[:notice] = "Thanks for signing up!"
      redirect_to '/'
    else
      flash[:error]  = "We couldn't set up that account, sorry."
      render :action => 'new'
    end
  rescue ActiveRecord::StatementInvalid
    # Raised when the unique index rejects a duplicate email; ignore it and redirect.
    redirect_to '/'
  end

With that, we’ve prevented duplication and handled any errors gracefully. Huzzah!
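To see the two pieces working together outside Rails, here is a small sketch under stated assumptions: UniqueEmailTable is a hypothetical in-memory table that behaves like a table with a unique index on email, DuplicateKeyError is a hypothetical stand-in for ActiveRecord::StatementInvalid, and register mirrors the shape of the create action (attempt the insert, rescue the duplicate).

```ruby
require "set"

# Hypothetical stand-in for ActiveRecord::StatementInvalid.
class DuplicateKeyError < StandardError; end

# Hypothetical in-memory table whose insert enforces a unique index on email.
class UniqueEmailTable
  def initialize
    @emails = Set.new
  end

  def insert(email)
    # Set#add? returns nil when the element is already present.
    raise DuplicateKeyError, "duplicate email: #{email}" unless @emails.add?(email)
    email
  end

  def count
    @emails.size
  end
end

# Mirrors the shape of the create action: attempt the insert and treat a
# duplicate-key error as something to ignore rather than a crash.
def register(table, email)
  table.insert(email)
  :created
rescue DuplicateKeyError
  :duplicate_ignored
end

table  = UniqueEmailTable.new
first  = register(table, "foobar@example.com")
second = register(table, "foobar@example.com")

puts [first, second, table.count].inspect
# prints [:created, :duplicate_ignored, 1]
```

The point of the design is that the uniqueness check and the insert happen as one step inside the table, so there is no window in which a second request can slip through.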

Postscript: If your database is infected with duplicate entries, there’s a quick way to find them using the console. Since people with non-unique email addresses are invalid, we can find and destroy them as follows:

$ script/console
>> duplicates = Person.find(:all).reject { |person| person.valid? }; 0
=> 0
>> duplicates.map(&:email)
=> ["foobar@example.com", "foobar@example.com", "bazquux@example.com", "bazquux@example.com"]
>> duplicates[0].destroy; duplicates[2].destroy

(The ; 0 in the second line just suppresses the (potentially long) list of Person objects.) If you have more duplicates than this, you might want to write a script or rake task to scrub your database, but using the console was sufficient to solve our problem at the Insoshi developer site.
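For a larger cleanup, the selection logic of such a script might look like the sketch below. It is plain Ruby rather than Active Record: PersonRow is a hypothetical stand-in for Person records, the sample data is made up, and keeping the record with the lowest id in each group is an assumption (you may prefer to keep the most recently active account instead).

```ruby
# Hypothetical stand-in for Person records; a real script would load them
# from the database instead.
PersonRow = Struct.new(:id, :email)

people = [
  PersonRow.new(1, "foobar@example.com"),
  PersonRow.new(2, "foobar@example.com"),
  PersonRow.new(3, "bazquux@example.com"),
  PersonRow.new(4, "bazquux@example.com"),
  PersonRow.new(5, "unique@example.com")
]

# Group by email, keep the record with the lowest id in each group,
# and collect the rest as candidates for deletion.
to_destroy = people.group_by(&:email).values.flat_map do |group|
  group.sort_by(&:id).drop(1)
end

puts to_destroy.map(&:id).inspect
# prints [2, 4]
```

In a real rake task, the records in to_destroy would then be destroyed one by one, ideally after reviewing the list and before adding the unique index.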

June 20, 2008

Insoshi “weekly” update

Filed under: Insoshi — mhartl @ 7:38 pm

This week’s version of Insoshi fixes a significant annoyance, the creation of accounts with duplicate email addresses due to a limitation in the Active Record validates_uniqueness_of validation. You can check out the diffs at GitHub if you’re really motivated, but I’ll be making a blog post about it next week, so interested readers might want to wait for that.

The focus of Insoshi development has sharpened in the past several weeks, reflected in the updated roadmap and new milestones.  Now that the validates_uniqueness_of issue has been fixed, there are only four outstanding tickets to be resolved to solidify the Insoshi foundations. (Briefly, there are a couple of speed bottlenecks to clear, a necessary improvement to messaging, and a change from Ferret to Sphinx for search. Note: this last change will break compatibility with SQLite.) We also have a slew of mostly minor bits of polish to apply. Finally, a surprise contributor has made some significant improvements to the user experience (UX), so expect a significant site upgrade once we merge in his branch. We expect these three milestones to be met by the end of July.

We’re quite busy these days answering email on the Google group and tending to the developer site, as well as gearing up to talk to investors.  So, given the multitude of our non-development obligations, we’re going to turn down the heat on ourselves a bit by forgoing weekly updates* and only making new Insoshi posts when there are important changes to note. In concert with this reduced Insoshi-blogging schedule, we’ll be ramping up our Ruby on Rails blogging, with occasional illuminating Rails posts starting next week. Watch this space.  :-)

*Previous updates were posted to the Google group (and my internal Insoshi blog) rather than to this blog.

May 23, 2008

Insoshi social networking platform update

Filed under: Insoshi — mhartl @ 4:36 pm

We’ve hit a couple of big milestones here at Insoshi, so we thought now would be a good time for an update on where we’ve been, where we’re going, and what we are.

The first milestone was the launch of our open-source project. We pulled out all the stops preparing for it: installing and configuring the servers; setting up the source repository, issue tracker, and documentation; and, most important, making the Insoshi platform good enough to host the Insoshi developer site. The result has been robust growth with continuous improvement and minimal downtime. The second milestone was the completion of the initial administrative interface, especially the ability of the admin to communicate with users and manage various aspects of the community. These features have been largely invisible, since Long and I are the only admins at the developer site, but anyone downloading the source code and deploying it will benefit directly from these efforts.

In addition to the admin features, we also had the usual fires to put out: squashing subtle bugs, managing stability issues (particularly with Ferret), and implementing the kind of missing features you can only identify by having a live site with real users. There are still a few loose ends, such as a nasty bug in the validates_uniqueness_of Rails validation, a couple of speed bottlenecks in the feed-generation and message-sending subroutines, and a move from Ferret to Sphinx for search. We’ll be addressing these issues as soon as possible.

In the short run, in addition to the loose ends mentioned above, we’ll be focusing on ancillary issues such as polishing up current features, staying active on the developer site, and maybe even contributing to this blog! (We’re also both moving soon, a short-lived but significant disruption.) Due to these various factors, in the coming weeks we’ll have limited time for implementing new features, and as always we’ll gladly and gratefully accept contributions from the Insoshi community.

Finally, it’s worth noting that Insoshi is more than a project or even a product: Insoshi is a startup backed by Y Combinator, a leading seed-stage venture capital firm (or, as Paul Graham puts it, “a small, furry steam catapult”). It’s likely that we’ll soon start looking to raise a round of funding, which would allow us to hire some more developers and take Insoshi to the next level. We hope you’ll be a part of it!

April 28, 2008

Welcome to the Insoshi blog!

Filed under: Insoshi — mhartl @ 11:21 am

This is the blog for the Insoshi open-source social networking platform. We will occasionally mention things specific to Insoshi, but our main focus will be on Ruby on Rails, the web framework used to make Insoshi.

We’ve been focused on launching the Insoshi developer site, but in the coming weeks we’ll be working hard to make this a great Rails blog—so you should definitely subscribe to our feed. :-)
