the only difference is in the final statement in the where clause
Short Query
-----------------------
explain ANALYZE select SUM((st.stop_time - st.start_time))
from shift s
join shift_time st on s.shift_time_id = st.id
where s.employee_id = 38645
and st.start_time >= TIMESTAMP '2009-05-17 00:00:00'
and st.stop_time <= TIMESTAMP '2009-05-23 23:59:59';
QUERY
PLAN
Aggregate (cost=1604.31..1604.32 rows=1 width=16) (actual
time=33.274..33.277 rows=1 loops=1)
-> Hash Join (cost=192.28..1604.28 rows=8 width=16) (actual
time=33.266..33.266 rows=0 loops=1)
Hash Cond: (s.shift_time_id = st.id)
-> Seq Scan on shift s (cost=0.00..1411.76 rows=43 width=4)
(actual time=33.254..33.254 rows=0 loops=1)
Filter: (employee_id = 38645)
-> Hash (cost=177.66..177.66 rows=1170 width=20) (never
executed)
-> Seq Scan on shift_time st (cost=0.00..177.66
rows=1170 width=20) (never executed)
Filter: ((start_time >= '2009-05-17
00:00:00'::timestamp without time zone) AND (stop_time <= '2009-05-23
23:59:59'::timestamp without time zone))
Total runtime: 33.413 ms
(9 rows)
Long Query
-----------------------
explain ANALYZE select SUM((st.stop_time - st.start_time))
from shift s
join shift_time st on s.shift_time_id = st.id
where s.employee_id = 38645
and st.start_time >= TIMESTAMP '2009-05-17 00:00:00'
and st.start_time <= TIMESTAMP '2009-05-23 23:59:59';
QUERY
PLAN
Aggregate (cost=1589.96..1589.97 rows=1 width=16) (actual
time=1056.213..1056.215 rows=1 loops=1)
-> Nested Loop (cost=0.00..1589.95 rows=1 width=16) (actual
time=1056.199..1056.199 rows=0 loops=1)
Join Filter: (s.shift_time_id = st.id)
-> Seq Scan on shift_time st (cost=0.00..177.66 rows=1
width=20) (actual time=4.950..5.446 rows=85 loops=1)
Filter: ((start_time >= '2009-05-17
00:00:00'::timestamp without time zone) AND (start_time <= '2009-05-23
23:59:59'::timestamp without time zone))
-> Seq Scan on shift s (cost=0.00..1411.76 rows=43 width=4)
(actual time=12.354..12.354 rows=0 loops=85)
Filter: (s.employee_id = 38645)
Total runtime: 1056.499 ms
(8 rows)
hoping someone can explain why the latter is using a Nested Loop - and
how / if i can optimize it
thanks
It's hard to say without knowing the table definitions,
indexes and statistics, but one thing I notice is that the
planner estimates 1170 rows for the scan on table shift_time
in the first query and only 1 row in the second query.
These estimates probably lead to the decision which join
type to use.
The first query is lucky because there are no
results and the query can figure that out without actually
having to execute the join.
Are the table statistics current? Does an ANALYZE change
the estimates or execution plans?
As to the question how these queries can be optimized,
that is much easier:
CREATE INDEX shift_empid_idx ON shift(employee_id);
That would avoid the expensive sequential scan on the
table, and the NULL value should be returned
almost immediately :^)
Yours,
Laurenz Albe
thanks Laurenz that helped tremendously - updating the statistics had
no bearing on the query plan, but creating the index took the query
down to 1.512 ms
reading the query plan seems to be a powerful, majestic art
i saw the Seq Scan on shift's actual time was 12.354 - but didn't
realize this cost was being incurred for each loop
can anyone recommend a good 'how to read a postgres query plan'
reference
here are the results with the new index, thanks again Laurenz
Aggregate (cost=153.87..153.88 rows=1 width=16) (actual
time=1.389..1.391 rows=1 loops=1)
-> Nested Loop (cost=0.00..153.87 rows=1 width=16) (actual
time=1.381..1.381 rows=0 loops=1)
Join Filter: (s.shift_time_id = st.id)
-> Index Scan using speedy_index2 on shift_time st
(cost=0.00..8.27 rows=1 width=20) (actual time=0.047..0.348 rows=85
loops=1)
Index Cond: ((start_time >= '2009-05-17
00:00:00'::timestamp without time zone) AND (start_time <= '2009-05-23
23:59:59'::timestamp without time zone))
-> Index Scan using shift_empid_idx on shift s
(cost=0.00..145.04 rows=44 width=4) (actual time=0.008..0.008 rows=0
loops=85)
Index Cond: (s.employee_id = 38645)
Total runtime: 1.512 ms
(8 rows)
There, there. Don't exaggerate.
> i saw the Seq Scan on shift's actual time was 12.354 - but didn't
> realize this cost was being incurred for each loop
>
> can anyone recommend a good 'how to read a postgres query plan'
> reference
I found two things on the PostgreSQL Wiki:
http://wiki.postgresql.org/wiki/Introduction_to_VACUUM%2C_ANALYZE%2C_EXPLAIN%2C_and_COUNT
This explains some of the principles and walks you through some
interesting EXPLAINs.
http://wiki.postgresql.org/wiki/Using_EXPLAIN
This has links to the above and other documentation.
I find that it is not too hard to read an EXPLAIN output
once you have understood how a database works, what an index
is, what a table scan is, how a "nested loop join" or a
"hash join" or a "merge join" work.
These are things you can find in many places, Wikipedia
being a good starting point.
Yours,
Laurenz Albe