I wanted to add ‘fuzzy’ searching to my Rails app without having to resort to heavier solutions like Solr or Sphinx. This is more than a simple ‘like’ clause, since you want to match on things that sound similar or are otherwise ‘very close’.
I came across this blog post
http://unirec.blogspot.com/2007/12/live-fuzzy-search-using-n-grams-in.html
which took me to https://github.com/bmaland/no_fuzz, but that version doesn’t seem to work with Rails 3. After looking through the plugin’s forks, I managed to get one installed and running with my users table.
In your Rails app directory, pull the plugin and run its generator for the model you want to index:
rails plugin install git://github.com/beno/no_fuzz.git
rails generate no_fuzz User
Then we need to actually create the table to hold our trigrams
rake db:migrate
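For anyone unfamiliar with trigrams, here is a rough sketch of how a word gets split up. It mirrors the padding-and-slicing line in the plugin code further down; the method name trigrams_for is my own and is not part of no_fuzz.

```ruby
# Split a word into overlapping three-character chunks (trigrams).
# The word is padded with one space on each side so that the start and
# end of the word produce their own trigrams.
# NOTE: trigrams_for is a hypothetical helper name, not a no_fuzz method.
def trigrams_for(word)
  padded = " #{word} "
  (0..padded.length - 3).map { |idx| padded[idx, 3] }
end

p trigrams_for("matt")  # [" ma", "mat", "att", "tt "]
p trigrams_for("mat")   # [" ma", "mat", "at "]
```

"mat" and "matt" share two trigrams (" ma" and "mat"), which is what lets a partial or slightly misspelled term still find the record.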
Now we need to tell our model to populate the trigrams when it’s created, and I added a simple wrapper method for doing the fuzzy searches
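class User < ActiveRecord::Base
  include NoFuzz
  fuzzy :username

  # Accept whatever arguments ActiveRecord passes to new; a bare
  # `def initialize` would reject User.new(attributes).
  # Note: this runs populate_trigram_index every time a User is instantiated.
  def initialize(*args, &block)
    super
    User.populate_trigram_index
  end

  # This returns an array of User objects
  def self.fuzzy_search(search_term)
    fuzzy_find(search_term, 10)
  end

  # This returns an active relation object
  def self.search(search_term)
    # Test the argument, not the method name (`if search` would call
    # this method again with no arguments)
    if search_term.present?
      where('username LIKE ?', "%#{search_term}%")
    else
      scoped
    end
  end
end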
Now we can use both search methods for our data, as shown here:
User.fuzzy_find("ma", 10)   # returns an array of records
User.search("Mat")          # returns an active record relation
Edit. I wasn’t fully happy with the previous solution, since it wouldn’t play well with the sorting and pagination code that I use elsewhere. So, I ‘fixed’ the no_fuzz plugin to play nice with the rest of the app.
I edited the no_fuzz.rb file, replacing the fuzzy_find method with the following three methods.
def get_fuzzy_matches(word, limit = 0)
  word = " #{word} "
  trigram_list = (0..word.length-3).collect { |idx| word[idx,3] }
  trigrams = @fuzzy_trigram_model.where(["tg IN (?)", trigram_list])
  trigrams = trigrams.group(@fuzzy_ref_id)
  trigrams = trigrams.order('SUM(score) DESC')
  trigrams = trigrams.includes(belongs_to_association)
  trigrams = trigrams.limit(limit) if limit > 0
  trigrams
end

# This just returns the ids for the records that match
def fuzzy_find_ids(word, limit = 0)
  trigrams = get_fuzzy_matches(word, limit)
  trigrams.all.collect do |trigram|
    trigram[belongs_to_association + '_id']
  end
end

# This returns the actual records that match
def fuzzy_find(word, limit = 0)
  trigrams = get_fuzzy_matches(word, limit)
  trigrams.all.collect do |trigram|
    trigram.send(belongs_to_association)
  end
end
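To make the query in get_fuzzy_matches a little more concrete, here is a plain-Ruby sketch of the same idea: find every stored trigram that appears in the search term, add up the scores per record, and return the record ids with the highest totals first. It is only an illustration under my own assumptions. The in-memory array stands in for the trigrams table, every row is given a score of 1, and the names build_index and fuzzy_ids are made up for this example.

```ruby
# NOTE: all names here are hypothetical; this is not the plugin's code.
def trigrams_for(word)
  padded = " #{word} "
  (0..padded.length - 3).map { |idx| padded[idx, 3] }
end

# Stand-in for the trigrams table: one row per (trigram, record id).
# Assumption: each row carries a score of 1.
def build_index(names_by_id)
  names_by_id.flat_map do |id, name|
    trigrams_for(name).map { |tg| { tg: tg, ref_id: id, score: 1 } }
  end
end

# Rough equivalent of: WHERE tg IN (...) GROUP BY ref_id
#                      ORDER BY SUM(score) DESC LIMIT n
def fuzzy_ids(index, word, limit = 0)
  wanted = trigrams_for(word)
  totals = Hash.new(0)
  index.each do |row|
    totals[row[:ref_id]] += row[:score] if wanted.include?(row[:tg])
  end
  # Highest total first; ties are broken by id so the result is stable.
  ranked = totals.sort_by { |id, total| [-total, id] }.map(&:first)
  limit > 0 ? ranked.first(limit) : ranked
end

index = build_index({ 1 => "matt", 2 => "matthew", 3 => "martin", 4 => "bob" })
p fuzzy_ids(index, "mat")     # [1, 2, 3]
p fuzzy_ids(index, "mat", 2)  # [1, 2]
```

"matt" and "matthew" each share two trigrams with "mat", "martin" shares one, and "bob" shares none, so it never shows up.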
Then I modified my user model to the following:
# This returns an array of User objects
def self.fuzzy_search(search)
  fuzzy_find(search, 10)
end

# This returns an active relation object
def self.search(search)
  if search.present?
    # fuzzy_find_ids returns just the ids, which is what the IN clause needs
    user_ids = fuzzy_find_ids(search, 10)
    where(["id IN (?)", user_ids])
  else
    scoped
  end
end
Finally, in my controller I was able to start using it like this (I’m using Kaminari for my pagination):
@users = User.search(params[:search]).page(params[:page]).per(10)
Now that the code plays nice with AR, I was able to use it in my ajax’ified search box, so it provides lookahead suggestions as a user is typing (using jQuery, of course). It also integrates with the dynamic table view I have, updating the table’s results based on what’s being searched for.
One side effect should be noted: because the matches are fetched again with an id IN (...) clause, whatever ordering you specify replaces the relevance ordering returned by no_fuzz. If you’re only returning a small number of possible matches, that’s not a very big deal.
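If the relevance ordering does matter, one option is to put the fetched records back into the order of the ids that fuzzy_find_ids returned. The sketch below does this in plain Ruby; Record is a stand-in struct for a model and in_ranked_order is a name I made up, so treat it as an illustration rather than part of the plugin.

```ruby
# Stand-in for an ActiveRecord model; only the id matters here.
Record = Struct.new(:id, :username)

# Reorder records to follow ranked_ids (best match first).
# Records whose id is not in ranked_ids are placed after the ranked ones.
# NOTE: in_ranked_order is a hypothetical helper, not part of no_fuzz.
def in_ranked_order(records, ranked_ids)
  position = ranked_ids.each_with_index.to_h  # id => rank
  records.sort_by { |record| position.fetch(record.id, ranked_ids.length) }
end

ranked_ids = [7, 2, 5]                       # e.g. what fuzzy_find_ids returned
fetched = [Record.new(2, "matthew"),         # the database returns rows in
           Record.new(5, "martin"),          # whatever order it likes
           Record.new(7, "matt")]

p in_ranked_order(fetched, ranked_ids).map(&:id)  # [7, 2, 5]
```

The result is a plain array rather than a relation, so this only makes sense as a final step on the handful of records you are about to display.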