Feature Request (have sample implemenation): ALL subquery

48 views
Skip to first unread message

David Sanders

unread,
Aug 24, 2022, 9:53:46 AM8/24/22
to Django developers (Contributions to Django itself)
Hi folks!

I have a piece of low hanging fruit I'd like to suggest and contribute: ALL subqueries.

Here's the PG docs on ALL subqueries but basically it causes the subquery expression to evaluate to true only if all rows themselves evaluate to true when compared against the supplied value & comparison operator.

The syntax is:

<value> <operator> ALL (<subquery>)

I've played around with this idea in my "Stupid Django Tricks" repo and have taken the liberty of also creating a small patch for Django.

There are a couple of points to note:

1. Postgres, MySQL & SQL Server all support ALL subquerying but unfortunately SQLite does not. A cursory search about Oracle suggests it may support it but couldn't see anything in the official docs.

2. An equivalent expression is to use NOT EXISTS and move the comparison logic into the subquery & invert it (but leaving any correlating logic uninverted).

A couple of further points on the NOT EXISTS approach:
  1. It uses double negatives and hinders the readability.
  2. It's not always obvious what the correct way to invert the subquery is – I found I had to be careful with filters involving many relationship lookups.

That being said I found that getting ALL working in Django only took a few lines. From my initial experimentation I saw 3 possible ways of doing this:

1. Lookups: A lookup could be defined for each operator supported.

eg using Exact:

class All(Exact):
    lookup_name = "all"
    def get_rhs_op(self, connection, rhs):
        return connection.operators[super().lookup_name] % f"ALL {rhs}"

Parent.objects.annotate(test=Value("test")).filter(test__all=Subquery(subquery))

2. Subclass Subquery

This is my favoured approach as it resembles Python's all() builtin. It uses almost identical code to Exists except for one line that defines the exists expression. The idea is that subqueries supply a single boolean column created with annotate:

class All(Subquery):
    template = "'t' = ALL (%(subquery)s)"
    output_field = fields.BooleanField()
    # similar methods to Exists omitted

Parent.objects.filter(
    All(
        Child.objects.annotate(is_result=Q(<expression evaluating to bool>)).filter(something=OuterRef('something').values('is_result')
    )
)

2. Translate the expression to NOT EXISTS.

I'm not entirely sure what the best approach is for this, or even whether it's possible, but it would need to negate the subquery filtering while leaving the correlating filter (ie the OuterRef) alone.

My naive attempt here works for the simplest of cases but doesn't take into account anything more complex.


I found this work well for my codebase and I'd love to explore contributing it back to Django. I'd suggest the subquery subclass as it results in more readable code – but there is the question about lack of support for SQLite. One approach would be to note in the documentation on how to write the equivalent using NOT EXISTS.

What do folks think?

Cheers,
David



charettes

unread,
Aug 24, 2022, 10:41:43 AM8/24/22
to Django developers (Contributions to Django itself)
Hello David,

Do you know if ALL provides any performance benefits over NOT EXISTS? Given we already support

Employee.objects.filter(
    ~Exists(Employee.objects.filter(
         manager=OuterRef("pk"),
         join_date__gt=OuterRef("join_date")
     )
)

If it's not the case I'm not sure adding support for ALL to core is worth it, particularly because we need to emulate it on SQLite.

If we were to add it I think All(lookup, subquery) would make the most sense as it still looks a bit like how you'd do things In Python and it's relatively easy to translate to NOT EXISTS as you basically take the lookup and the selected value and turn it into ~Exists(subquery.exclude(Lookup(OuterRef(field, value))).

Employee.objects.filter(
    All(
       "join_date__lte", Employee.objects.filter(
           manager=OuterRef("pk")
       ).values("join_date")
    )
)

Generally I'm +0 on the idea though unless we can prove there are use cases for it that Exists can't solve.

Cheers,
Simon

David Sanders

unread,
Aug 24, 2022, 12:26:27 PM8/24/22
to Django developers (Contributions to Django itself)
I'm no expert on how PG optimises queries but NOT EXISTS appears to produce the desired execution plan.

My original rationale was to improve the readability and "solution accessibility" for lack of a better term. This all came about because a colleague couldn't figure out how to do it as a Django query so resorted to app-level rigmarole & multiple queries (our use case involved multiple many-to-many's). I showed them the ~Exists() approach then the All() approach and they picked All() because of the natural readability 🤷‍♂️.

This is just something I quickly whipped up (ignore the execution time when I ran them through pgbench they were roughly the same on average):

# update pgbench_accounts set abalance = 10 where bid in (4, 7);
UPDATE 200000

# explain analyze SELECT * FROM pgbench_branches b WHERE 't' = ALL (SELECT abalance > 0 FROM pgbench_accounts a WHERE b.bid = a.bid );
                                                             QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on pgbench_branches b  (cost=0.00..163366.10 rows=5 width=364) (actual time=486.614..602.467 rows=2 loops=1)
   Filter: (SubPlan 1)
   Rows Removed by Filter: 8
   SubPlan 1
     ->  Seq Scan on pgbench_accounts a  (cost=0.00..32423.00 rows=100000 width=1) (actual time=46.929..58.739 rows=20001 loops=10)
           Filter: (b.bid = bid)
           Rows Removed by Filter: 430258
 Planning Time: 0.360 ms
 Execution Time: 602.516 ms
(9 rows)

# explain analyze SELECT * FROM pgbench_branches b WHERE NOT EXISTS (SELECT 1 FROM pgbench_accounts a WHERE b.bid = a.bid and abalance = 0);
                                                            QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Anti Join  (cost=0.00..32183.84 rows=1 width=364) (actual time=753.089..951.244 rows=2 loops=1)
   Join Filter: (b.bid = a.bid)
   Rows Removed by Join Filter: 3702576
   ->  Seq Scan on pgbench_branches b  (cost=0.00..1.10 rows=10 width=364) (actual time=0.008..0.013 rows=10 loops=1)
   ->  Seq Scan on pgbench_accounts a  (cost=0.00..32173.00 rows=799867 width=4) (actual time=0.002..68.581 rows=370258 loops=10)
         Filter: (abalance = 0)
         Rows Removed by Filter: 80000
 Planning Time: 0.206 ms
 Execution Time: 951.263 ms
(9 rows)

Reply all
Reply to author
Forward
0 new messages