Chaining pg_search_scope using tsvector column with regular Model scope returns incorrect results

76 views
Skip to first unread message

Moncef Belyamani

unread,
May 7, 2014, 7:57:04 PM5/7/14
to casecom...@googlegroups.com
Hi,

I'm running into an issue that only happens when combining the pg_search_scope with another regular scope. Here's my pg_search_scope:

pg_search_scope :keyword_search, against: :tsv_body,
    using: {
      tsearch: {
        dictionary: "english",
        any_word: false,
        prefix: false,
        tsvector_column: 'tsv_body'
      }
    }

I also have other non-pg_search scopes defined, such as this one:

scope :has_language, ->(lang) { where("languages @@ :q", q: lang) if lang.present? }

If I combine them both, more often than not, it returns documents that don't match the keyword parameter.
Here is a sample failing query from the log:

SELECT "locations".*, ((ts_rank(("locations"."tsv_body"), (to_tsquery('english', ''' ' || 'food' || ' ''')), 0))) AS pg_search_rank FROM "locations" WHERE ((("locations"."tsv_body") @@ (to_tsquery('english', ''' ' || 'food' || ' ''')))) AND (languages @@ 'french') ORDER BY pg_search_rank DESC, "locations"."id" ASC LIMIT 30 OFFSET 0

Any idea what might be causing this? Is it because it's ordering by pg_search_rank, which is not something it can order by with the has_language scope? Does this all come back to this issue: https://github.com/Casecommons/pg_search/issues/1 ?

Thanks!

Moncef Belyamani

unread,
May 7, 2014, 8:20:17 PM5/7/14
to casecom...@googlegroups.com
I found something interesting. I looked at the pg_search_rank for the results and noticed that the ones that don't have any occurrence of the keyword parameter all have the same rank of 0.398494.

Here is what I ran in the console:

res = Location.has_keyword('food').has_language('french')

res.map(&:pg_search_rank)
 => [0.917802, 0.894996, 0.854458, 0.854458, 0.76326, 0.76326, 0.76326, 0.76326, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494]

So then I ran the query with just the pg_search_scope:

Location.has_keyword('food')

and noticed the same thing! It returns a ton of results, but only a fraction are actually relevant!

So, the question is, is it normal for pg_search to return results that don't match the string? I would think not. It seems like pg_search is returning results even if they don't match, but just ranking them lower. Is that the default behavior? If so, how can I change it? Would it be safe to assume that any rank below 0.398494 is not relevant?

Moncef

Moncef Belyamani

unread,
May 7, 2014, 8:47:14 PM5/7/14
to casecom...@googlegroups.com
One more update. It occurred to me that maybe the tsvector column did in fact contain 'food', and after looking in the tsv_body column, I can see 'food' in there, as well as other terms that aren't present anywhere else in the data! This makes me think something is wrong with how I populate my tsvector column.

I would greatly appreciate it if someone could tell me what I'm doing wrong here. Thanks!

class AddSearchVectorToLocations < ActiveRecord::Migration
  def up
    add_column :locations, :tsv_body, :tsvector
    add_index  :locations, :tsv_body, using: 'gin'

    execute <<-SQL
      CREATE OR REPLACE FUNCTION fill_search_vector_for_location() RETURNS trigger LANGUAGE plpgsql AS $$
      declare
        location_organization record;
        location_services_keywords record;
        location_services_description record;
        location_services_name record;
        service_categories record;

      begin
        select name into location_organization from organizations where id = new.organization_id;
        select string_agg(keywords, ' ') as keywords into location_services_keywords from services where location_id = new.id;
        select string_agg(description, ' ') as description into location_services_description from services where location_id = new.id;
        select string_agg(name, ' ') as name into location_services_name from services where location_id = new.id;
        select string_agg(categories.name, ' ') as name into service_categories from locations
        JOIN services ON services.location_id = locations.id
        JOIN categories_services ON categories_services.service_id = services.id
        JOIN categories ON categories.id = categories_services.category_id;

        new.tsv_body :=
          setweight(to_tsvector('pg_catalog.english', coalesce(new.name, '')), 'B')                  ||
          setweight(to_tsvector('pg_catalog.english', coalesce(new.description, '')), 'A')                ||
          setweight(to_tsvector('pg_catalog.english', coalesce(location_organization.name, '')), 'B')        ||
          setweight(to_tsvector('pg_catalog.english', coalesce(location_services_description.description, '')), 'A')  ||
          setweight(to_tsvector('pg_catalog.english', coalesce(location_services_name.name, '')), 'B')  ||
          setweight(to_tsvector('pg_catalog.english', coalesce(location_services_keywords.keywords, '')), 'A') ||
          setweight(to_tsvector('pg_catalog.english', coalesce(service_categories.name, '')), 'B');

        return new;
      end
      $$;
    SQL

    execute <<-SQL
      CREATE TRIGGER locations_search_content_trigger BEFORE INSERT OR UPDATE
        ON locations FOR EACH ROW EXECUTE PROCEDURE fill_search_vector_for_location();
    SQL

    Location.find_each(&:touch)
  end

  def down
    execute <<-SQL
      DROP TRIGGER locations_search_content_trigger ON locations;
      DROP FUNCTION fill_search_vector_for_location();
    SQL
    remove_column :locations, :tsv_body
  end
end

On Wednesday, May 7, 2014 7:57:04 PM UTC-4, Moncef Belyamani wrote:

Moncef Belyamani

unread,
May 7, 2014, 9:29:00 PM5/7/14
to casecom...@googlegroups.com
Doh! I feel so stupid now. I realize the mistake I made in the function. I had "locations.id" instead of "new.id"

Now everything is working as expected. Sorry for the false alarm!


On Wednesday, May 7, 2014 7:57:04 PM UTC-4, Moncef Belyamani wrote:
Reply all
Reply to author
Forward
0 new messages