pg_search compared to one tsvector column

658 views
Skip to first unread message

Karl Baum

unread,
Dec 14, 2011, 11:37:50 AM12/14/11
to casecom...@googlegroups.com
The peepcode postgres video had a great section on implementing full text search with rails and postgres. In their example they use a tsvector column which indexes the search fields with a trigger on creation of the object. There's a few benefits to this approach:

1) All of the stemming/indexing is done on creation/update of the object instead of per query
2) Because of #1, the size of the query is much smaller.
3) The tsvector column can be indexed

Because I am by no means an expert on postgres or postgres search, before I switch from pg_search to using one tsvector column, i wanted to check in with the members of this mailing list. What are the advantages to using pg_search dynamic queries vs populating a tsvector column on creation of the entity?

thx!

-karl

Kris Hicks

unread,
Dec 14, 2011, 3:03:14 PM12/14/11
to Case Commons Development
You don't need to switch away from using pg_search for this as
pg_search (specifically, pg_search_scope) recently began supporting
using a tsvector column for dmetaphone and tsearch search methods. The
pg_search README explains how to use the tsvector column option.

Briefly, using a tsvector column will greatly increase the speed of
your searches should you have many rows to search across. The reason
the support was added to pg_search was that searching across 1.6
million records with an expression-based index took around 18-22
seconds depending on the number of rows the query matched. Using
tsvector columns this went down to 400-1000ms, again depending on the
number of matching rows. Also with this dataset we chose to use GIN
indexes instead of GIST; I'd suggest looking into the PostgreSQL
documentation to find out which will work best for you.

Grant Hutchins

unread,
Dec 14, 2011, 3:13:04 PM12/14/11
to casecom...@googlegroups.com
By the way, I still need to push out a new gem version. So for now you will have to use the HEAD of git to use the tsvector functionality.

I'll get to this in short order, sorry for any confusion.

Grant
--
Grant Hutchins
Pivotal Labs

Karl Baum

unread,
Dec 14, 2011, 3:19:01 PM12/14/11
to casecom...@googlegroups.com
Gotcha.  I'll take a look.  Thanks!

Tim Uckun

unread,
Dec 14, 2011, 3:42:21 PM12/14/11
to casecom...@googlegroups.com
If you are going to roll your own solution do what pg_search does and
put your tsvector field in another table. That way AR does not fetch
them when you do a Person.find() and pg_admin does not fetch them when
you are running select * queries.

That goes for all large fields actually not just tsvector.

Grant Hutchins

unread,
Dec 14, 2011, 8:52:44 PM12/14/11
to casecom...@googlegroups.com
Version 0.4 is now released with tsvector support.

Grant

Karl Baum

unread,
Dec 29, 2011, 8:51:39 PM12/29/11
to casecom...@googlegroups.com
Trying to change over to the tsvector approach but I am currently using against and associated_against to search columns from multiple tables.  Is it possible to search agains two tsvector columns on two different tables?

thx!

dieg...@gmail.com

unread,
Oct 9, 2013, 8:47:44 AM10/9/13
to casecom...@googlegroups.com
I would love to hear an answer to that question!
I'm trying to do exactly that. Use associated_against with a tsvector option.
Reply all
Reply to author
Forward
0 new messages