If you would provide a concrete example rather than handwaving, we might
be able to offer some advice ...
regards, tom lane
--
Sent via pgsql-performance mailing list (pgsql-pe...@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
What version are you using?
Have you run "VACUUM ANALYZE"?
Next, do:
EXPLAIN ANALYZE select something from table where condition1 or
condition2;
for each of the queries, unless that query takes so long you don't want
to wait for the result. In that case, omit the "ANALYZE" and just do
"EXPLAIN ...".
Then post those results to the list. These tell us what plans PostgreSQL
is choosing and what it estimates the costs to be. If it's the output of
EXPLAIN ANALYZE, it also runs the query and tells us what the costs
really are.
>From that, we can see where the planner is going wrong, and what you
might do to change it.
Regards,
Jeff Davis
On Wed, 2008-11-05 at 13:12 +0200, Віталій Тимчишин wrote:What version are you using?
> For a long time already I can see very poor OR performance in
> postgres.
> If one have query like "select something from table where condition1
> or condition2" it may take ages to execute while
> "select something from table where condition1" and "select something
> from table where condition2" are executed very fast and
> "select something from table where condition1 and not condition2 union
> all select something from table where condition2" gives required
> results fast
>
Have you run "VACUUM ANALYZE"?
Next, do:
EXPLAIN ANALYZE select something from table where condition1 or
condition2;
--
Sent via pgsql-performance mailing list (pgsql-pe...@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
For what i see in four OR-plan.txt tou are doing too much "sequencial scan" . Create some indexes for those tables using the fields that you use an it may help you.
OBS: If you already have lots of indexes in your tables it may be a good time for you re-think your strategy because it´s ot working.
Tips:
1 - create indexes for the tables with the fields that you will use in the query if it is your most important query. If you have others querys that are used please post those here and we can help you to desing a better plan.
I must admit, I haven't managed to figure out what your query is trying
to do, but then that's a common problem with autogenerated queries.
The main question that needs answering is why the planner thinks you're
going to get 1.3 billion rows in the "or" query:
"Nested Loop (cost=4588.13..960900482668.95 rows=1386158171 width=32)"
You don't show "explain analyse" for this query, so there's no way of
knowing how many rows get returned but presumably you're expecting
around 88000. What does "explain analyse" return?
--
Richard Huxton
Archonet Ltd
Віталій Тимчишин wrote:I must admit, I haven't managed to figure out what your query is trying
> As you can see from other plans, it do have all the indexes to perform it's
> work fast (when given part by part). It simply do not wish to use them. My
> question: Is this a configuration problem or postgresql optimizer simply
> can't do such a query rewrite?
to do, but then that's a common problem with autogenerated queries.
The main question that needs answering is why the planner thinks you're
going to get 1.3 billion rows in the "or" query:
"Nested Loop (cost=4588.13..960900482668.95 rows=1386158171 width=32)"
You don't show "explain analyse" for this query, so there's no way of
knowing how many rows get returned but presumably you're expecting
around 88000. What does "explain analyse" return?
Yes, the query should output exactly same result as in "Union" plan. I will run "slow" explain analyze now and will repost after it will complete (tomorrow?).
BTW: I'd say planner should think rows estimated as sum of "ORs" estimation minus intersection, but no more then sum or ORs (if intersection is 0). For first condition it has rows=525975, for second it has rows=2403 (with other plans, of course), so it's strange it has such a high estimation.... It's exactly 50% of full cartesian join of merge, so it does think that every second pair would succeed, that is not true.
Have you tried increasing the default_statistics_target? The planner
is expecting 1.3 billion rows to be produced from a query that's only
actually producting 35k, which probably indicates some very bad
statistics. At the same time, the materialize step produces 242
million rows when the planner only expects to produce 2.3, indicating
a similar problem in the opposite direction. This probably means that
the planner is choosing plans that would be optimal if it was making
good guesses but are decidedly sub-optimal for your actual data.
--
- David T. Wilson
david.t...@gmail.com
David's right - the total estimate is horribly wrong
> "Merge Join (cost=518771.07..62884559.80 rows=1386158171 width=32) (actual
> time=30292.802..755751.242 rows=34749 loops=1)"
But it's this materialize that's taking the biggest piece of the time.
> " -> Materialize (cost=469981.13..498937.42 rows=2316503 width=30)
> (actual time=15915.639..391938.338 rows=242752539 loops=1)"
15.9 seconds to 391.9 seconds. That's half your time right there. The
fact that it's ending up with 242 million rows isn't promising - are you
sure the query is doing what you think it is?
> " -> Sort (cost=469981.13..475772.39 rows=2316503 width=30) (actual
> time=15915.599..19920.912 rows=2316503 loops=1)"
> " Sort Key: production.company.run_id"
> " Sort Method: external merge Disk: 104896kB"
By constrast, this on-disk sort of 104MB is comparatively fast.
> P.S. May be I've chosen wrong list and my Q better belongs to -hackers?
No - hackers is if you want to discuss the code of the database server
itself.
On Fri, Nov 7, 2008 at 4:14 AM, Віталій Тимчишин <tiv...@gmail.com> wrote:Have you tried increasing the default_statistics_target? The planner
> "Merge Join (cost=518771.07..62884559.80 rows=1386158171 width=32) (actual
> time=30292.802..755751.242 rows=34749 loops=1)"
is expecting 1.3 billion rows to be produced from a query that's only
actually producting 35k, which probably indicates some very bad
statistics.
At the same time, the materialize step produces 242
million rows when the planner only expects to produce 2.3, indicating
a similar problem in the opposite direction. This probably means that
the planner is choosing plans that would be optimal if it was making
good guesses but are decidedly sub-optimal for your actual data.
But it's this materialize that's taking the biggest piece of the time.
15.9 seconds to 391.9 seconds. That's half your time right there. The
> " -> Materialize (cost=469981.13..498937.42 rows=2316503 width=30)
> (actual time=15915.639..391938.338 rows=242752539 loops=1)"
fact that it's ending up with 242 million rows isn't promising - are you
sure the query is doing what you think it is?
Is it the right-hand input of a merge join? If so you're looking at
mark/restore rescans, ie, repeated fetches of the same tuples. There
must be a huge number of duplicate join keys in that relation to make
for such an increase though. Normally the planner avoids putting a
table with lots of duplicates as the RHS of a merge, but if it doesn't
have good statistics for the join key then it might not realize the
problem.
regards, tom lane
"=?ISO-8859-5?B?svbi0Nv22SDC2Nzn2OjY3Q==?=" <tiv...@gmail.com> writes:
> I am not. I can't see how materialize can multiply number of rows it getsIs it the right-hand input of a merge join? If so you're looking at
> from sort by 100.
mark/restore rescans, ie, repeated fetches of the same tuples. There
must be a huge number of duplicate join keys in that relation to make
for such an increase though. Normally the planner avoids putting a
table with lots of duplicates as the RHS of a merge, but if it doesn't
have good statistics for the join key then it might not realize the
problem.