Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

9.2.4: Strange behavior when wildcard is on left side of search string

7 views
Skip to first unread message

ERR ORR

unread,
Apr 6, 2013, 7:12:25 PM4/6/13
to
Hi,

this is with 9.2.4_PGDG / FC18 / 64bit upgraded from 9.1.8 via dump/restore, settings kept for the most part.

Table has 1.5M records, the varchar(100) field in question has a varchar_ops and a varchar_pattern_ops btree index.

3 Cases:
  • "MYFIELD" like 'BLA BLA %': OK, about 7 msec
  • "MYFIELD" like 'BLA % BLA': OK, about 20 msec
  • "MYFIELD" like '% BLA BLA': NOT OK
In the third case, the query will take anywhere between 4200ms and over 83Kms ms to deliver 2 results, in one case I broke it off after 10 MINUTES. I never noticed this sort of behavior in the 9.x series.

According to explain, the query resolves into an index-only scan. I tried to turn this off to
see how it behaves but the toggle in the .conf apparently has no effect. 

What I find interesting is that, whereas with the default it would resolve into an index-only scan on the varchar_pattern_ops index, after setting indexscan_only=off, it would resolve into
an index-only scan on the varchar_ops index.

Another peculiarity is that the Explain for the bad case does not display the "Sort" icon for the order-by clause, whereas the OK cases do display it.

Also, the problem does not appear to be a resource problem, as I have the settings and resources to pull that whole table plus indexes into RAM, yet still, there is constant disk activity during the query in the bad case.

And, yes, I DID reindex and/or vacuum verbose analyze the table after each relevant change.

Thanks for any feedback on this. If you need any further info I'll be happy to help as possible.

RD

David Johnston

unread,
Apr 6, 2013, 9:19:21 PM4/6/13
to
r d-3 wrote
> Hi,
>
> this is with 9.2.4_PGDG / FC18 / 64bit upgraded from 9.1.8 via
> dump/restore, settings kept for the most part.
>
> Table has 1.5M records, the varchar(100) field in question has a *
> varchar_ops* and a *varchar_pattern_ops* btree index.
>
> 3 Cases:
>
> - "MYFIELD" like 'BLA BLA *%*': *OK, about 7 msec*
> - "MYFIELD" like 'BLA *%* BLA': *OK, about 20 msec*
> - "MYFIELD" like '*%* BLA BLA': *NOT OK*
>
> In the third case, the query will take anywhere between 4200ms and over
> 83Kms ms to deliver 2 results, in one case I broke it off after 10
> MINUTES.
> I never noticed this sort of behavior in the 9.x series.
>
> According to explain, the query resolves into an index-only scan. I tried
> to turn this off to
> see how it behaves but the toggle in the .conf apparently has no effect.
>
> What I find interesting is that, whereas with the default it would resolve
> into an index-only scan on the *varchar_pattern_ops* index, after setting
> indexscan_only=off, it would resolve into
> an index-only scan on the *varchar_ops* index.
>
> Another peculiarity is that the Explain for the bad case does not display
> the "Sort" icon for the order-by clause, whereas the OK cases do display
> it.
>
> Also, the problem does not appear to be a resource problem, as I have the
> settings and resources to pull that whole table plus indexes into RAM, yet
> still, there is constant disk activity during the query in the bad case.
>
> And, yes, I DID reindex and/or *vacuum verbose analyze* the table after
> each relevant change.
>
> Thanks for any feedback on this. If you need any further info I'll be
> happy
> to help as possible.
>
> RD

Going from slightly dated memory here but...

The further left the wildcard the larger the portion of the index that has
to be scanned. In the worse case, your third example, everything has to be
scanned and basically the index is only acting as a table surrogate as
opposed to an index. If you really want to anchor the start you should
index and search on the reverse of the string so you can write it as a
postfix wildcard. The other option to index words via the full-text search
capabilities.

David J.




--
View this message in context: http://postgresql.1045698.n5.nabble.com/9-2-4-Strange-behavior-when-wildcard-is-on-left-side-of-search-string-tp5751086p5751095.html
Sent from the PostgreSQL - bugs mailing list archive at Nabble.com.


--
Sent via pgsql-bugs mailing list (pgsql...@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Kevin Grittner

unread,
Apr 7, 2013, 9:52:08 AM4/7/13
to
David Johnston <pol...@yahoo.com> wrote:

> varchar(100)

> The other option to index words via the full-text search capabilities.

Or for columns this short,� a similarity search on a trigram index.

http://www.postgresql.org/docs/current/interactive/pgtrgm.html

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Kevin Grittner

unread,
Apr 7, 2013, 11:22:46 AM4/7/13
to
ERR ORR <rd0...@gmail.com> wrote:

> the fact that in 9.2.4 it is so bad that a simple query sometimes
> does not finish after 10+ minutes with incessant disk activity
> tends to look like a problem to me.

Try running a VACUUM FREEZE ANALYZE; command on the database, under
a database superuser ID.� If you upgraded with pg_dump (or some
other logical population of the data, versus a pg_upgrade run) you
are probably getting bitten by the initial setting of hint bits.

http://wiki.postgresql.org/wiki/Hint_Bits

See if it is still slow after that....

Christopher Browne

unread,
Apr 7, 2013, 11:45:49 AM4/7/13
to

This doesn't seem either buggy or strange...

An index on the ordering of that column is not helpful in handling a leading wildcard, and so the query optimizer will, in such cases, revert, correctly, to using a sequential scan and filtering the results.

If you have cases where this sort of wildcard needs to be fast, a functional index could help.

Create index foo on tbl (reverse(col));

And reverse the wildcard so the index is usable:

Select * from tbl where reverse(col) like 'esrever%';

That query can harness the reversed index.

Unfortunately, no ordered index helps with

Select * from too where col like '%something%';

For that, a full text search index can help, but that is a longer story.

At any rate, what you are observing is no surprise, and consistent with what many database systems do.

Tom Lane

unread,
Apr 7, 2013, 11:48:19 AM4/7/13
to
ERR ORR <rd0...@gmail.com> writes:
> My understanding from your replies is that this behavior with *b-tree
> indices* is not considered a bug but rather a case of "works as designed",
> yet still and apart from the solution of my particular problem, the fact
> that in 9.2.4 it is so bad that a simple query sometimes does not finish
> after 10+ minutes with incessant disk activity tends to look like a problem
> to me.

It is no better or worse in 9.2.4 than in any prior release.

regards, tom lane
0 new messages