Chaining pg_search_scope using tsvector column with regular Model scope returns incorrect results

Moncef Belyamani

unread,

May 7, 2014, 7:57:04 PM5/7/14

to casecom...@googlegroups.com

Hi,

I'm running into an issue that only happens when combining the pg_search_scope with another regular scope. Here's my pg_search_scope:

pg_search_scope :keyword_search, against: :tsv_body,

using: {

tsearch: {

dictionary: "english",

any_word: false,

prefix: false,

tsvector_column: 'tsv_body'

}

I also have other non-pg_search scopes defined, such as this one:

scope :has_language, ->(lang) { where("languages @@ :q", q: lang) if lang.present? }

If I combine them both, more often than not, it returns documents that don't match the keyword parameter.

Here is a sample failing query from the log:

SELECT "locations".*, ((ts_rank(("locations"."tsv_body"), (to_tsquery('english', ''' ' || 'food' || ' ''')), 0))) AS pg_search_rank FROM "locations" WHERE ((("locations"."tsv_body") @@ (to_tsquery('english', ''' ' || 'food' || ' ''')))) AND (languages @@ 'french') ORDER BY pg_search_rank DESC, "locations"."id" ASC LIMIT 30 OFFSET 0

Any idea what might be causing this? Is it because it's ordering by pg_search_rank, which is not something it can order by with the has_language scope? Does this all come back to this issue: https://github.com/Casecommons/pg_search/issues/1 ?

Thanks!

Moncef Belyamani

unread,

May 7, 2014, 8:20:17 PM5/7/14

to casecom...@googlegroups.com

I found something interesting. I looked at the pg_search_rank for the results and noticed that the ones that don't have any occurrence of the keyword parameter all have the same rank of 0.398494.

Here is what I ran in the console:

res = Location.has_keyword('food').has_language('french')

res.map(&:pg_search_rank)

=> [0.917802, 0.894996, 0.854458, 0.854458, 0.76326, 0.76326, 0.76326, 0.76326, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494, 0.398494]

So then I ran the query with just the pg_search_scope:

Location.has_keyword('food')

and noticed the same thing! It returns a ton of results, but only a fraction are actually relevant!

So, the question is, is it normal for pg_search to return results that don't match the string? I would think not. It seems like pg_search is returning results even if they don't match, but just ranking them lower. Is that the default behavior? If so, how can I change it? Would it be safe to assume that any rank below 0.398494 is not relevant?

Moncef

Moncef Belyamani

unread,

May 7, 2014, 8:47:14 PM5/7/14

to casecom...@googlegroups.com

One more update. It occurred to me that maybe the tsvector column did in fact contain 'food', and after looking in the tsv_body column, I can see 'food' in there, as well as other terms that aren't present anywhere else in the data! This makes me think something is wrong with how I populate my tsvector column.

I would greatly appreciate it if someone could tell me what I'm doing wrong here. Thanks!

class AddSearchVectorToLocations < ActiveRecord::Migration

def up

add_column :locations, :tsv_body, :tsvector

add_index :locations, :tsv_body, using: 'gin'

execute <<-SQL

CREATE OR REPLACE FUNCTION fill_search_vector_for_location() RETURNS trigger LANGUAGE plpgsql AS $$

declare

location_organization record;

location_services_keywords record;

location_services_description record;

location_services_name record;

service_categories record;

begin

select name into location_organization from organizations where id = new.organization_id;

select string_agg(keywords, ' ') as keywords into location_services_keywords from services where location_id = new.id;

select string_agg(description, ' ') as description into location_services_description from services where location_id = new.id;

select string_agg(name, ' ') as name into location_services_name from services where location_id = new.id;

select string_agg(categories.name, ' ') as name into service_categories from locations

JOIN services ON services.location_id = locations.id

JOIN categories_services ON categories_services.service_id = services.id

JOIN categories ON categories.id = categories_services.category_id;

new.tsv_body :=

setweight(to_tsvector('pg_catalog.english', coalesce(new.name, '')), 'B') ||

setweight(to_tsvector('pg_catalog.english', coalesce(new.description, '')), 'A') ||

setweight(to_tsvector('pg_catalog.english', coalesce(location_organization.name, '')), 'B') ||

setweight(to_tsvector('pg_catalog.english', coalesce(location_services_description.description, '')), 'A') ||

setweight(to_tsvector('pg_catalog.english', coalesce(location_services_name.name, '')), 'B') ||

setweight(to_tsvector('pg_catalog.english', coalesce(location_services_keywords.keywords, '')), 'A') ||

setweight(to_tsvector('pg_catalog.english', coalesce(service_categories.name, '')), 'B');

return new;

end

$$;

SQL

execute <<-SQL

CREATE TRIGGER locations_search_content_trigger BEFORE INSERT OR UPDATE

ON locations FOR EACH ROW EXECUTE PROCEDURE fill_search_vector_for_location();

SQL

Location.find_each(&:touch)

end

def down

execute <<-SQL

DROP TRIGGER locations_search_content_trigger ON locations;

DROP FUNCTION fill_search_vector_for_location();

SQL

remove_column :locations, :tsv_body

end

On Wednesday, May 7, 2014 7:57:04 PM UTC-4, Moncef Belyamani wrote:

Moncef Belyamani

unread,

May 7, 2014, 9:29:00 PM5/7/14

to casecom...@googlegroups.com

Doh! I feel so stupid now. I realize the mistake I made in the function. I had "locations.id" instead of "new.id"