PostgreSQL OR performance

Віталій Тимчишин

unread,

Nov 5, 2008, 6:12:32 AM11/5/08

to

Hello.

For a long time already I can see very poor OR performance in postgres.
If one have query like "select something from table where condition1 or condition2" it may take ages to execute while
"select something from table where condition1" and "select something from table where condition2" are executed very fast and
"select something from table where condition1 and not condition2 union all select something from table where condition2" gives required results fast

For example, in my current query for "condition1" optimizer gives 88252, for "condition1 and not condition2" it is 88258, for "condition2" it is 99814.
And for "condition1 or condition2" it is 961499627680. And it does perform this way.

All is more or less good when "select" part is easy and query can be easily rewritten. But for complex queries it looks ugly and if the query is autogenerated, moving autogeneration mechanism from creating simple clean "where" to unions is not an easy task.

So the question is: Do I miss something? Can this be optimized?

Tom Lane

unread,

Nov 5, 2008, 10:34:51 AM11/5/08

to

"=?ISO-8859-5?B?svbi0Nv22SDC2Nzn2OjY3Q==?=" <tiv...@gmail.com> writes:
> For a long time already I can see very poor OR performance in postgres.

If you would provide a concrete example rather than handwaving, we might
be able to offer some advice ...

regards, tom lane

--
Sent via pgsql-performance mailing list (pgsql-pe...@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Jeff Davis

unread,

Nov 5, 2008, 11:31:40 AM11/5/08

to

On Wed, 2008-11-05 at 13:12 +0200, Віталій Тимчишин wrote:
> For a long time already I can see very poor OR performance in
> postgres.
> If one have query like "select something from table where condition1
> or condition2" it may take ages to execute while
> "select something from table where condition1" and "select something
> from table where condition2" are executed very fast and
> "select something from table where condition1 and not condition2 union
> all select something from table where condition2" gives required
> results fast
>

What version are you using?

Have you run "VACUUM ANALYZE"?

Next, do:

EXPLAIN ANALYZE select something from table where condition1 or
condition2;

for each of the queries, unless that query takes so long you don't want
to wait for the result. In that case, omit the "ANALYZE" and just do
"EXPLAIN ...".

Then post those results to the list. These tell us what plans PostgreSQL
is choosing and what it estimates the costs to be. If it's the output of
EXPLAIN ANALYZE, it also runs the query and tells us what the costs
really are.

>From that, we can see where the planner is going wrong, and what you
might do to change it.

Regards,
Jeff Davis

Віталій Тимчишин

unread,

Nov 6, 2008, 5:46:47 AM11/6/08

to

My main message is that I can see this in many queries and many times. But OK, I can present exact example.

2008/11/5 Jeff Davis <pg...@j-davis.com>

On Wed, 2008-11-05 at 13:12 +0200, Віталій Тимчишин wrote:
> For a long time already I can see very poor OR performance in
> postgres.
> If one have query like "select something from table where condition1
> or condition2" it may take ages to execute while
> "select something from table where condition1" and "select something
> from table where condition2" are executed very fast and
> "select something from table where condition1 and not condition2 union
> all select something from table where condition2" gives required
> results fast
>

What version are you using?

Server version 8.3.3

Have you run "VACUUM ANALYZE"?

I have autovacuum, but for this example I did vacuum analyze of the whole DB.
The real-life query (autogenerated) looks like the next:
select t0.id as pk1,t1.id as pk2 ,t0.run_id as f1_run_id,t1.run_id as f2_run_id
from tmpv_unproc_null_production_company_dup_cons_company as t0, (select * from production.company where run_id in (select id from production.run where name='test')) as t1
where
t0.name = t1.name
or
(t0.name,t1.name) in (select s1.name, s2.name from atom_match inner join atoms_string s1 on atom_match.atom1_id = s1.id inner join atoms_string s2 on atom_match.atom2_id = s2.id where s1.atom_type_id = -1 and match_function_id = 2)

with tmpv_unproc_null_production_company_dup_cons_company:

create temporary view tmpv_unproc_null_production_company_dup_cons_company as select * from production.company where 1=1 and status='unprocessed' and run_id in (select id from production.run where name='test')

Next, do:

EXPLAIN ANALYZE select something from table where condition1 or
condition2;

without analyze is in OR-plan.txt
Also plans for only condition1, only condition2 and union is attached

OR-plan.txt

condition1-plan.txt

condition2-plan.txt

union-plan.txt

Helio Campos Mello de Andrade

unread,

Nov 6, 2008, 7:26:20 AM11/6/08

to

For what i see in four OR-plan.txt tou are doing too much "sequencial scan" . Create some indexes for those tables using the fields that you use an it may help you.

OBS: If you already have lots of indexes in your tables it may be a good time for you re-think your strategy because it´s ot working.
Tips:
1 - create indexes for the tables with the fields that you will use in the query if it is your most important query. If you have others querys that are used please post those here and we can help you to desing a better plan.
2 - You cold give us the configuration os the hardware and the posgresql configuration file and we can see what is going on.

Regards

--
Sent via pgsql-performance mailing list (pgsql-pe...@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

--
Helio Campos Mello de Andrade

Віталій Тимчишин

unread,

Nov 6, 2008, 11:33:27 AM11/6/08

to

2008/11/6 Helio Campos Mello de Andrade <helio....@gmail.com>

For what i see in four OR-plan.txt tou are doing too much "sequencial scan" . Create some indexes for those tables using the fields that you use an it may help you.

OBS: If you already have lots of indexes in your tables it may be a good time for you re-think your strategy because it´s ot working.
Tips:
1 - create indexes for the tables with the fields that you will use in the query if it is your most important query. If you have others querys that are used please post those here and we can help you to desing a better plan.

As you can see from other plans, it do have all the indexes to perform it's work fast (when given part by part). It simply do not wish to use them. My question: Is this a configuration problem or postgresql optimizer simply can't do such a query rewrite?

Actually I did rewrite the query to work properly as you can see from union-plan.txt. My question is if postgresql can do this automatically because such a rewrite is not always easy/possible (esp. for generated queries)?

Richard Huxton

unread,

Nov 6, 2008, 11:44:05 AM11/6/08

to

Віталій Тимчишин wrote:
> As you can see from other plans, it do have all the indexes to perform it's
> work fast (when given part by part). It simply do not wish to use them. My
> question: Is this a configuration problem or postgresql optimizer simply
> can't do such a query rewrite?

I must admit, I haven't managed to figure out what your query is trying
to do, but then that's a common problem with autogenerated queries.

The main question that needs answering is why the planner thinks you're
going to get 1.3 billion rows in the "or" query:

"Nested Loop (cost=4588.13..960900482668.95 rows=1386158171 width=32)"

You don't show "explain analyse" for this query, so there's no way of
knowing how many rows get returned but presumably you're expecting
around 88000. What does "explain analyse" return?

--
Richard Huxton
Archonet Ltd

Helio Campos Mello de Andrade

unread,

Nov 6, 2008, 11:44:18 AM11/6/08

to

As far as i know if you created the indexes properly and postgres sees that it will give some improvement he will use those.
- Look at the page of index creation that we may be forgeting some thing.

http://www.postgresql.org/docs/8.3/static/indexes.html

I have to go to the hospital know. Tomorrow i will take a look at the manual and try to understand all the necessary for the postgresql use an index.

Regards

--

Віталій Тимчишин

unread,

Nov 6, 2008, 12:37:46 PM11/6/08

to

2008/11/6 Richard Huxton <d...@archonet.com>

Віталій Тимчишин wrote:
> As you can see from other plans, it do have all the indexes to perform it's
> work fast (when given part by part). It simply do not wish to use them. My
> question: Is this a configuration problem or postgresql optimizer simply
> can't do such a query rewrite?

I must admit, I haven't managed to figure out what your query is trying
to do, but then that's a common problem with autogenerated queries.

That's easy - I am looking for duplicates from subset of companies. Two companies are equal when there names are simply equal or there is an entry in "match" table for names.

The main question that needs answering is why the planner thinks you're
going to get 1.3 billion rows in the "or" query:

"Nested Loop (cost=4588.13..960900482668.95 rows=1386158171 width=32)"

You don't show "explain analyse" for this query, so there's no way of
knowing how many rows get returned but presumably you're expecting
around 88000. What does "explain analyse" return?

Yes, the query should output exactly same result as in "Union" plan. I will run "slow" explain analyze now and will repost after it will complete (tomorrow?).
BTW: I'd say planner should think rows estimated as sum of "ORs" estimation minus intersection, but no more then sum or ORs (if intersection is 0). For first condition it has rows=525975, for second it has rows=2403 (with other plans, of course), so it's strange it has such a high estimation.... It's exactly 50% of full cartesian join of merge, so it does think that every second pair would succeed, that is not true.

Віталій Тимчишин

unread,

Nov 7, 2008, 4:14:01 AM11/7/08

to

Yes, the query should output exactly same result as in "Union" plan. I will run "slow" explain analyze now and will repost after it will complete (tomorrow?).
BTW: I'd say planner should think rows estimated as sum of "ORs" estimation minus intersection, but no more then sum or ORs (if intersection is 0). For first condition it has rows=525975, for second it has rows=2403 (with other plans, of course), so it's strange it has such a high estimation.... It's exactly 50% of full cartesian join of merge, so it does think that every second pair would succeed, that is not true.

I am sorry, I've emptied atom_match table, so one part produce 0 result, but anyway here is explain:

"Merge Join (cost=518771.07..62884559.80 rows=1386158171 width=32) (actual time=30292.802..755751.242 rows=34749 loops=1)"
" Merge Cond: (production.run.id = (production.company.run_id)::bigint)"
" Join Filter: (((production.company.name)::text = (production.company.name)::text) OR (hashed subplan))"
" -> Sort (cost=45474.92..45606.54 rows=52648 width=38) (actual time=562.928..595.128 rows=15507 loops=1)"
"        Sort Key: production.run.id"
"        Sort Method: external sort Disk: 880kB"
"        -> Nested Loop (cost=1184.82..39904.24 rows=52648 width=38) (actual time=90.571..530.925 rows=15507 loops=1)"
"              -> HashAggregate (cost=1.55..1.56 rows=1 width=8) (actual time=3.077..3.078 rows=1 loops=1)"
"                    -> Seq Scan on run (cost=0.00..1.55 rows=1 width=8) (actual time=3.066..3.068 rows=1 loops=1)"
"                          Filter: ((name)::text = 'test'::text)"
"              -> Nested Loop (cost=1183.27..39376.19 rows=52648 width=30) (actual time=87.489..484.605 rows=15507 loops=1)"
"                    -> HashAggregate (cost=1.55..1.56 rows=1 width=8) (actual time=0.016..0.019 rows=1 loops=1)"
"                          -> Seq Scan on run (cost=0.00..1.55 rows=1 width=8) (actual time=0.009..0.011 rows=1 loops=1)"
"                                Filter: ((name)::text = 'test'::text)"
"                    -> Bitmap Heap Scan on company (cost=1181.72..38592.03 rows=62608 width=30) (actual time=87.465..441.014 rows=15507 loops=1)"
"                          Recheck Cond: ((production.company.run_id)::bigint = production.run.id)"
"                          Filter: ((production.company.status)::text = 'unprocessed'::text)"
"                          -> Bitmap Index Scan on comp_run (cost=0.00..1166.07 rows=62608 width=0) (actual time=65.828..65.828 rows=15507 loops=1)"
"                                Index Cond: ((production.company.run_id)::bigint = production.run.id)"
" -> Materialize (cost=469981.13..498937.42 rows=2316503 width=30) (actual time=15915.639..391938.338 rows=242752539 loops=1)"
"        -> Sort (cost=469981.13..475772.39 rows=2316503 width=30) (actual time=15915.599..19920.912 rows=2316503 loops=1)"
"              Sort Key: production.company.run_id"
"              Sort Method: external merge Disk: 104896kB"
"              -> Seq Scan on company (cost=0.00..58808.03 rows=2316503 width=30) (actual time=22.244..7476.954 rows=2316503 loops=1)"
" SubPlan"
"    -> Nested Loop (cost=2267.65..3314.94 rows=22 width=1038) (actual time=0.009..0.009 rows=0 loops=1)"
"          -> Hash Join (cost=2267.65..3141.36 rows=22 width=523) (actual time=0.006..0.006 rows=0 loops=1)"
"                Hash Cond: ((atom_match.atom1_id)::integer = s1.id)"
"                -> Seq Scan on atom_match (cost=0.00..30.38 rows=1630 width=8) (actual time=0.002..0.002 rows=0 loops=1)"
"                      Filter: ((match_function_id)::integer = 2)"
"                -> Hash (cost=1292.04..1292.04 rows=12209 width=523) (never executed)"
"                      -> Index Scan using atomstr_typ on atoms_string s1 (cost=0.00..1292.04 rows=12209 width=523) (never executed)"
"                            Index Cond: ((atom_type_id)::integer = (-1))"
"          -> Index Scan using pk_atoms_string on atoms_string s2 (cost=0.00..7.88 rows=1 width=523) (never executed)"
"                Index Cond: (s2.id = (atom_match.atom2_id)::integer)"
"Total runtime: 755802.686 ms"

P.S. May be I've chosen wrong list and my Q better belongs to -hackers?

David Wilson

unread,

Nov 7, 2008, 5:07:32 AM11/7/08

to

On Fri, Nov 7, 2008 at 4:14 AM, Віталій Тимчишин <tiv...@gmail.com> wrote:
> "Merge Join (cost=518771.07..62884559.80 rows=1386158171 width=32) (actual
> time=30292.802..755751.242 rows=34749 loops=1)"

Have you tried increasing the default_statistics_target? The planner
is expecting 1.3 billion rows to be produced from a query that's only
actually producting 35k, which probably indicates some very bad
statistics. At the same time, the materialize step produces 242
million rows when the planner only expects to produce 2.3, indicating
a similar problem in the opposite direction. This probably means that
the planner is choosing plans that would be optimal if it was making
good guesses but are decidedly sub-optimal for your actual data.

--
- David T. Wilson
david.t...@gmail.com

Richard Huxton

unread,

Nov 7, 2008, 5:45:50 AM11/7/08

to

Віталій Тимчишин wrote:
> I am sorry, I've emptied atom_match table, so one part produce 0 result, but
> anyway here is explain:

David's right - the total estimate is horribly wrong

> "Merge Join (cost=518771.07..62884559.80 rows=1386158171 width=32) (actual
> time=30292.802..755751.242 rows=34749 loops=1)"

But it's this materialize that's taking the biggest piece of the time.

> " -> Materialize (cost=469981.13..498937.42 rows=2316503 width=30)
> (actual time=15915.639..391938.338 rows=242752539 loops=1)"

15.9 seconds to 391.9 seconds. That's half your time right there. The
fact that it's ending up with 242 million rows isn't promising - are you
sure the query is doing what you think it is?

> " -> Sort (cost=469981.13..475772.39 rows=2316503 width=30) (actual
> time=15915.599..19920.912 rows=2316503 loops=1)"
> " Sort Key: production.company.run_id"
> " Sort Method: external merge Disk: 104896kB"

By constrast, this on-disk sort of 104MB is comparatively fast.

> P.S. May be I've chosen wrong list and my Q better belongs to -hackers?

No - hackers is if you want to discuss the code of the database server
itself.

Віталій Тимчишин

unread,

Nov 15, 2008, 8:55:38 AM11/15/08

to

Sorry, for delayed response - It was very busy week.

2008/11/7 David Wilson <david.t...@gmail.com>

On Fri, Nov 7, 2008 at 4:14 AM, Віталій Тимчишин <tiv...@gmail.com> wrote:
> "Merge Join (cost=518771.07..62884559.80 rows=1386158171 width=32) (actual
> time=30292.802..755751.242 rows=34749 loops=1)"

Have you tried increasing the default_statistics_target? The planner
is expecting 1.3 billion rows to be produced from a query that's only
actually producting 35k, which probably indicates some very bad
statistics.

The planner seems to think that every second pair from company<->company join will succeed with this join expression (1386158171 ~ 52648^2 / 2). That is not true.
Anyway, I've tried to set default_statistics_target to 1000, then analyze. Nothing've changed

At the same time, the materialize step produces 242
million rows when the planner only expects to produce 2.3, indicating
a similar problem in the opposite direction. This probably means that
the planner is choosing plans that would be optimal if it was making
good guesses but are decidedly sub-optimal for your actual data.

That is even more strange, because materialize step must produce exactly the rows it takes from sort, that is 2316503, so I don't get how table scan + sort + materialize can multiply number of rows by 100.

Віталій Тимчишин

unread,

Nov 15, 2008, 8:57:03 AM11/15/08

to

2008/11/7 Richard Huxton <d...@archonet.com>

But it's this materialize that's taking the biggest piece of the time.

> " -> Materialize (cost=469981.13..498937.42 rows=2316503 width=30)
> (actual time=15915.639..391938.338 rows=242752539 loops=1)"

15.9 seconds to 391.9 seconds. That's half your time right there. The
fact that it's ending up with 242 million rows isn't promising - are you
sure the query is doing what you think it is?

I am not. I can't see how materialize can multiply number of rows it gets from sort by 100.

Tom Lane

unread,

Nov 15, 2008, 12:07:31 PM11/15/08

to

"=?ISO-8859-5?B?svbi0Nv22SDC2Nzn2OjY3Q==?=" <tiv...@gmail.com> writes:
> I am not. I can't see how materialize can multiply number of rows it gets
> from sort by 100.

Is it the right-hand input of a merge join? If so you're looking at
mark/restore rescans, ie, repeated fetches of the same tuples. There
must be a huge number of duplicate join keys in that relation to make
for such an increase though. Normally the planner avoids putting a
table with lots of duplicates as the RHS of a merge, but if it doesn't
have good statistics for the join key then it might not realize the
problem.

regards, tom lane

Віталій Тимчишин

unread,

Nov 17, 2008, 4:52:21 AM11/17/08

to

2008/11/15 Tom Lane <t...@sss.pgh.pa.us>

"=?ISO-8859-5?B?svbi0Nv22SDC2Nzn2OjY3Q==?=" <tiv...@gmail.com> writes:

> I am not. I can't see how materialize can multiply number of rows it gets
> from sort by 100.

Is it the right-hand input of a merge join? If so you're looking at
mark/restore rescans, ie, repeated fetches of the same tuples. There
must be a huge number of duplicate join keys in that relation to make
for such an increase though. Normally the planner avoids putting a
table with lots of duplicates as the RHS of a merge, but if it doesn't
have good statistics for the join key then it might not realize the
problem.

OK, thanks for cleaning-up some mystery.
But, returning to original Q: Do anyone known why does it choose plan from OR-plan.txt instead of union-plan.txt? The first is cost=4588.13..960900482668.95, the latter is cost=266348.42..272953.14 according to statistics postgres have, so I suppose planner would select it if it could evaluate it.