Re: [gpdb-users] Re: Text search in Greenplum Database

429 views
Skip to first unread message
Message has been deleted
Message has been deleted

Jason Champion

unread,
Jan 18, 2016, 1:05:59 PM1/18/16
to gpdb-...@greenplum.org
I use this very heavily in Postgres and it's the biggest thing causing me to maintain both PG and GP databases in the same data center. Having full GIN/GIST support would eliminate the need for PG in my applications.

On 01/18/2016 09:53 AM, EManchester wrote:
How about pg_trgm gin and gist index support?  Any plans for this in the near term?

gpadmin=# CREATE TABLE test_trgm (id integer, t text) distributed by (id);
CREATE TABLE

gpadmin=# CREATE INDEX trgm_idx ON test_trgm USING gist (t gist_trgm_ops);
ERROR:  operator class "gist_trgm_ops" does not exist for access method "gist"

gpadmin=# CREATE INDEX trgm_idx ON test_trgm USING gin (t gin_trgm_ops);
ERROR:  GIN indexes are not supported




The pg_trgm module provides GiST and GIN index operator classes that allow you to create an index over a text column for the purpose of very fast similarity searches. These index types support the above-described similarity operators, and additionally support trigram-based index searches for LIKE, ILIKE, ~ and ~* queries. (These indexes do not support equality nor simple comparison operators, so you may need a regular B-tree index too.)

Example:

CREATE TABLE test_trgm (t text);
CREATE INDEX trgm_idx ON test_trgm USING gist (t gist_trgm_ops);

or

CREATE INDEX trgm_idx ON test_trgm USING gin (t gin_trgm_ops);

At this point, you will have an index on the t column that you can use for similarity searching. A typical query is

SELECT t, similarity(t, 'word') AS sml
  FROM test_trgm
  WHERE t % 'word'
  ORDER BY sml DESC, t;



On Monday, January 18, 2016 at 8:58:19 AM UTC-7, EManchester wrote:
I need an update on GP's capabilities specifically GPText or whatever it's called today.  The need stems from needing to index large files and being able to search on keywords.  The files would not be stored in GP but rather a link to whatever storage engine we select.

The want is
Incoming files -> index on keywords & phrases -> store keywords in DB w/ link to document & raw on the other side of the link

The need to store in a DB is to be able to correlate with other information best suited for a residing in a RDBMS.  Open to other possibilities too of course!

Thanks.


--
You received this message because you are subscribed to the Google Groups "Greenplum Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gpdb-users+...@greenplum.org.
To post to this group, send email to gpdb-...@greenplum.org.
Visit this group at https://groups.google.com/a/greenplum.org/group/gpdb-users/.
For more options, visit https://groups.google.com/a/greenplum.org/d/optout.

-- 
Jason Champion
Senior Software Engineer
Mobile 971-266-3538
jason.c...@rentrakmail.com

Jon Roberts

unread,
Jan 18, 2016, 2:02:00 PM1/18/16
to Jason Champion, Greenplum Users
Have you tried the fuzzystrmatch package instead?  It is located in /usr/local/greenplum-db/share/postgresql/contrib


You get the benefits of matching text based on similarities without the overhead or maintenance of an index.  

Jon 

Jimmy Yih

unread,
Jan 18, 2016, 2:54:52 PM1/18/16
to Jon Roberts, Jason Champion, Greenplum Users
There definitely is interest in getting GIN enabled:
https://github.com/greenplum-db/gpdb/issues/239
Reply all
Reply to author
Forward
0 new messages