Inside Beijing’s Forbidden City.

posted on January 17, 2010

The Forbidden City.

posted on January 15, 2010

Thinking Sphinx: Searching By Location And Keyword

Let’s say that we’re building a Rails app to index independent coffee shops in our town. Every coffee shop has a name, a description and some comments, as well as a latitude and longitude value so that we can place it on a map. We want to allow our users to search through the coffee shops in our database by providing keywords and a location. We want our searches to take into account both the relevance of the results based on keyword matches and their proximity to the location given.

Thinking Sphinx

For this tutorial we’ll be using Thinking Sphinx, a great search library for Ruby projects.

First, we’ll need to install Sphinx and Thinking Sphinx. If you’re new to Sphinx, you might want to check out this great tutorial which explains the basics of indexing and searching.

Indexing the Model

Once Sphinx and Thinking Sphinx are installed, we’re ready to define the indexes on our model. This tells Sphinx which fields to store in its index for searching, which attributes we want to have available for sorting and filtering, as well as any other properties we want to define.

# Table name: coffee_shops
#
#  id              :integer(4)      not null, primary key
#  name            :string(255)
#  description     :text
#  lat             :float
#  lng             :float

class CoffeeShop < ActiveRecord::Base
  has_many :comments

  define_index do
    # fields
    indexes :name
    indexes :description
    indexes comments.body, :as => :comments

    # attributes
    has 'RADIANS(lat)', :as => :lat,  :type => :float
    has 'RADIANS(lng)', :as => :lng,  :type => :float

    # properties
    set_property :latitude_attr  => 'lat'
    set_property :longitude_attr => 'lng'
    set_property :field_weights  => { 'name'        => 10,
                                      'description' => 2,
                                      'comments'    => 1 }
  end
end

The fields in the index tell Sphinx that we want our search to look at the name and description of our coffee shops, as well as the body of any comments that have been made. Notice that we’re able to index not only fields that are in our coffee_shops table, but also fields from associated records - in this case the bodies of our comments.

Defining lat and lng attributes is necessary for doing geography-based searches. The big gotcha here is that Sphinx needs these attributes to be stored as radians, whereas most geocoding APIs (such as Google) use decimal degrees. The SQL ‘RADIANS(lat)’ will automatically do this conversion for you at index time. If you happen to have your lat and lng stored as radians already, however, you can just define your attributes like this:

# attributes
has :lat
has :lng
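If you want to sanity-check the degrees-to-radians conversion outside of SQL, here is a minimal plain-Ruby sketch. The helper name degrees_to_radians and the sample coordinates are made up for illustration and are not part of Thinking Sphinx.

```ruby
# Hypothetical helper (not part of Thinking Sphinx): decimal degrees -> radians.
# This mirrors what SQL's RADIANS() does for the indexed attributes.
def degrees_to_radians(degrees)
  degrees * Math::PI / 180.0
end

# Illustrative coordinates only.
lat_deg = 45.0
lng_deg = -90.0

lat_rad = degrees_to_radians(lat_deg)
lng_rad = degrees_to_radians(lng_deg)

puts "lat: #{lat_deg} degrees = #{lat_rad} radians"
puts "lng: #{lng_deg} degrees = #{lng_rad} radians"
```

A latitude of 45 degrees should come out as a quarter of pi, which is a quick way to confirm that a column really holds radians and not degrees.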

Finally, we define our properties. The :latitude_attr and :longitude_attr properties tell Sphinx which attributes we’re using for our geography calculations. The :field_weights property defines how much weight we want to give to each indexed field. A match on the name of one of our coffee shops should count for more toward relevance than a match on one of its comment bodies.

The Search Method

Next, we’ll override the CoffeeShop::search class method, allowing us to use a custom sort expression and a geocode object.

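class CoffeeShop < ActiveRecord::Base

  # ...

  PER_PAGE = 20                 # results per page (assumed value; use your own)
  METERS_PER_MILE = 1609.344
  SORT_EXPRESSION = "@weight * @weight / @geodist"
  RADIUS = 5                    # search radius, in miles

  def self.search(keywords, options = {})
    search_options = {:page => options[:page] || 1, :per_page => PER_PAGE}

    # Only add the geo options when we have a successfully geocoded location.
    if options[:geocode] && options[:geocode].success?
      # The geocoder returns decimal degrees; Sphinx expects radians.
      lat = (options[:geocode].lat / 180.0) * Math::PI
      lng = (options[:geocode].lng / 180.0) * Math::PI

      search_options[:geo] = [lat, lng]
      search_options[:sort_mode] = :expr
      search_options[:sort_by] = SORT_EXPRESSION
      # @geodist is in meters, so convert the radius from miles.
      search_options[:with] = {"@geodist" => 0.0..(RADIUS * METERS_PER_MILE)}
    end

    super(keywords, search_options)
  end
end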

That’s a lot of awesome. Let’s walk through our new search method and see what’s going on.

@geodist, @weight and SORT_EXPRESSION

Sphinx gives you some special attributes for sorting and filtering, including @weight and @geodist. @weight is the relevance of a search result (the larger the number, the more relevant the result) and is the default sorting option. @geodist is the distance (in meters) of the search result from the anchor point. By defining SORT_EXPRESSION to use both the @weight and @geodist attributes, we can sort in a way that takes both into account: squaring @weight rewards relevance, and dividing by @geodist penalizes distance. You can add any other operators and attributes to this expression to tailor how your results are sorted. For instance, if you had a ‘popularity’ attribute on your model and wanted more popular coffee shops to rank better, you could define your sort expression as ‘@weight * @weight / @geodist + popularity’ (just make sure you add ‘popularity’ to the list of attributes in your model index).
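To get a feel for how the expression trades relevance against distance, here is a rough plain-Ruby sketch that applies the same arithmetic to a few made-up results. The Result struct, the sort_score helper and all of the numbers are invented for illustration; Sphinx evaluates the real expression itself, and actual @weight values depend on your field weights and ranking mode.

```ruby
# Hypothetical stand-in for a Sphinx match: a relevance weight and a
# distance from the anchor point in meters.
Result = Struct.new(:name, :weight, :geodist)

# Plain-Ruby mirror of the sort expression "@weight * @weight / @geodist".
def sort_score(result)
  (result.weight * result.weight) / result.geodist.to_f
end

# Made-up sample data.
results = [
  Result.new("Far but relevant", 20, 8000.0),
  Result.new("Near and decent",  10, 500.0),
  Result.new("Near but weak",     2, 100.0)
]

# Highest score first, as a descending sort on the expression would do.
ranked = results.sort_by { |r| -sort_score(r) }

ranked.each { |r| puts "#{r.name}: #{sort_score(r)}" }
```

With these numbers the nearby, moderately relevant shop comes first, the distant but highly relevant one second, and the very close but barely relevant one last. Squaring the weight is what keeps a weak match from winning just by being next door.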

:page and :per_page

If you have WillPaginate installed, Thinking Sphinx will automatically wrap your search results in a WillPaginate collection, allowing you to use all your normal WillPaginate view helpers. Neat!

:geo, :sort_mode, :sort_by and :with

These options affect the sorting and filtering of our search. :geo tells Sphinx that we are doing a geography search around a specific lat and lng (in radians) and tells it to add the @geodist attribute to each result. :sort_mode and :sort_by tell Sphinx that we want to sort results by our SORT_EXPRESSION constant. The :with option tells Sphinx that we only want to return results within RADIUS miles (five, in our case) of our anchor point. Because @geodist is measured in meters, we multiply the radius by METERS_PER_MILE.

Finally, the last line performs the actual search based on the keywords and search options we’ve set up.
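To make the radius filter more concrete, here is a small plain-Ruby sketch of the same idea: convert the radius to meters, compute a distance from the anchor point, and check whether it falls inside the range. It assumes a haversine great-circle formula and a mean Earth radius of 6,371 km as a stand-in for Sphinx’s own distance calculation, so the exact figures may differ slightly from @geodist. The helper names and coordinates are hypothetical.

```ruby
METERS_PER_MILE = 1609.344
RADIUS = 5                         # miles, as in the model above
EARTH_RADIUS_METERS = 6_371_000.0  # mean Earth radius (assumption for this sketch)

# Hypothetical helper: decimal degrees -> radians.
def to_radians(degrees)
  degrees * Math::PI / 180.0
end

# Haversine great-circle distance in meters; all inputs are in radians.
def haversine_meters(lat1, lng1, lat2, lng2)
  dlat = lat2 - lat1
  dlng = lng2 - lng1
  a = Math.sin(dlat / 2)**2 +
      Math.cos(lat1) * Math.cos(lat2) * Math.sin(dlng / 2)**2
  2 * EARTH_RADIUS_METERS * Math.asin(Math.sqrt(a))
end

# The same range we pass to :with for "@geodist".
radius_range = 0.0..(RADIUS * METERS_PER_MILE)

# Made-up points: an anchor, one shop a few km north, one much further north.
anchor  = [to_radians(40.0),  to_radians(-75.0)]
nearby  = [to_radians(40.05), to_radians(-75.0)]
distant = [to_radians(40.5),  to_radians(-75.0)]

puts "radius in meters: #{radius_range.end}"
puts "nearby included?  #{radius_range.include?(haversine_meters(*anchor, *nearby))}"
puts "distant included? #{radius_range.include?(haversine_meters(*anchor, *distant))}"
```

The nearby point is roughly 5.6 km from the anchor and falls inside the five-mile (about 8 km) range, while the distant point, roughly 56 km away, would be filtered out.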

Executing Searches

Now that our model is set up, executing searches is as simple as:

@geocode = MultiGeocoder.geocode(params[:query_location])

@coffee_shops = CoffeeShop.search(params[:query_keywords],
                                  :page => params[:page],
                                  :geocode => @geocode)

This will execute our query with all the parameters we care about and return a paginated collection of coffee shops sorted by relevance and distance. Also, if any or all of the parameters are blank, nothing breaks! If all parameters are blank, a Sphinx search won’t even be performed, and a WillPaginate collection will be returned with our model’s default sorting and PER_PAGE attributes.

Happy searching!

posted on June 29, 2009

The desert outside the Huacachina Oasis in Peru.

posted on June 15, 2008

Angkor Wat.

posted on May 28, 2007