Both queries should return the same rows, but the biggest difference is that PostgreSQL's query planner produces completely different plans. The performance difference can be huge, even for a relatively modest data set. (I believe my data set has something like 270k "parent" instances and 80k "child" instances.)
I tried searching this group as well as the ticket system, and couldn't find a solution to this exact problem (other than using the alternative query).
Thanks for taking the time to read through all of this! I am more than happy to open a ticket, but I thought I should post here first.
More Technical Stuff
Here's the output of EXPLAIN ANALYZE on my actual models/data. I had to apply a LIMIT 100 because if I try to return the entire result, the second one completely hangs my computer. I bolded some relevant bits
Limit (cost=0.71..9.70 rows=100 width=111) (actual time=0.074..0.483 rows=100 loops=1) -> Merge Anti Join (cost=0.71..22435.69 rows=249613 width=111) (actual time=0.072..0.465 rows=100 loops=1) Merge Cond: (catalogue_product.id = catalogue_productcategory.product_id)" -> Index Scan using catalogue_product_pkey on catalogue_product (cost=0.42..16986.70 rows=292339 width=111) (actual time=0.020..0.285 rows=150 loops=1) -> Index Only Scan using catalogue_productcategory_9bea82de on catalogue_productcategory (cost=0.29..3861.27 rows=81671 width=4) (actual time=0.037..0.070 rows=101 loops=1) Heap Fetches: 0 Total runtime: 0.568 ms
Here's the plan for the second query: Limit (cost=0.00..234229.95 rows=100 width=111) (actual time=31.087..1165.022 rows=100 loops=1) -> Seq Scan on catalogue_product (cost=0.00..342373919.34 rows=146170 width=111) (actual time=31.087..1164.992 rows=100 loops=1) Filter: (NOT (SubPlan 1)) Rows Removed by Filter: 5 SubPlan 1 -> Materialize (cost=0.00..2138.07 rows=81671 width=4) (actual time=0.001..5.539 rows=80818 loops=105) -> Seq Scan on catalogue_productcategory u0 (cost=0.00..1409.71 rows=81671 width=4) (actual time=0.005..10.574 rows=81671 loops=1) Total runtime: 1165.255 ms
As you can see the former is about ~2000 times faster than the latter.
John P.
unread,
Feb 3, 2016, 11:09:24 AM2/3/16
Reply to author
Sign in to reply to author
Forward
Sign in to forward
Delete
You do not have permission to delete messages in this group
Copy link
Report message
Show original message
Either email addresses are anonymous for this group or you need the view member email addresses permission to view the original message
I don't think this answers my specific question, because mine is more about trying to explicitly invoke the WHERE EXISTS clause instead of automatically replacing IN with WHERE EXISTS, but I thought I should put them here for anyone interested.
Thanks.
John
John P.
unread,
Feb 10, 2016, 11:02:18 AM2/10/16
Reply to author
Sign in to reply to author
Forward
Sign in to forward
Delete
You do not have permission to delete messages in this group
Copy link
Report message
Show original message
Either email addresses are anonymous for this group or you need the view member email addresses permission to view the original message