"Unify by values" setting in Oracle's base.py

214 views
Skip to first unread message

NPB

unread,
Apr 3, 2021, 6:30:28 PM4/3/21
to Django developers (Contributions to Django itself)
Hi,

Can you tell me why execute in .../backends/oracle/base.py sets unify_by_values=True when it calls _fix_for_params? It has an interesting effect on the Oracle cursor cache.

For example, if I use a Django model called Logger like this:

from . import models
...
   a = models.Logger(t1="1", t2="2", t3="3", i1=1, i2=2, i3=3)
   a.save()
   b = models.Logger(t3="3", t2="2", t1="1", i3=3, i2=2, i1=1)
   b.save()
   c = models.Logger(t1="1", t2="2", t3="3", i1=1,i2=1,i3=3)
   c.save()
   d = models.Logger(t1="1", t2="2", t3="3", i1=3,i2=1,i3=3)
   d.save()
   e = models.Logger(t1="1", t2="1", t3="1", i1=1,i2=1,i3=1)
   e.save()

It results in the following SQL statements in the Oracle cursor cache:

INSERT INTO "POLLS_LOGGER" ("T1", "T2", "T3", "I1", "I2", "I3") VALUES (:arg0, :arg1, :arg2, :arg3, :arg4, :arg3) RETURNING "POLLS_LOGGER"."ID" INTO :arg5
INSERT INTO "POLLS_LOGGER" ("T1", "T2", "T3", "I1", "I2", "I3") VALUES (:arg0, :arg0, :arg0, :arg1, :arg1, :arg1) RETURNING "POLLS_LOGGER"."ID" INTO :arg2
INSERT INTO "POLLS_LOGGER" ("T1", "T2", "T3", "I1", "I2", "I3") VALUES (:arg0, :arg1, :arg2, :arg3, :arg4, :arg5) RETURNING "POLLS_LOGGER"."ID" INTO :arg6
INSERT INTO "POLLS_LOGGER" ("T1", "T2", "T3", "I1", "I2", "I3") VALUES (:arg0, :arg1, :arg2, :arg3, :arg3, :arg4) RETURNING "POLLS_LOGGER"."ID" INTO :arg5

Bind variable names are re-used and shuffled around if any share the same value. This results in multiple SQL IDs.

If, instead, execute used  unify_by_values=False, then the Oracle buffer cache would have a single version of the statement irrespective of bind values:

INSERT INTO "POLLS_LOGGER" ("T1", "T2", "T3", "I1", "I2", "I3") VALUES (:arg0, :arg1, :arg2, :arg3, :arg4, :arg5) RETURNING "POLLS_LOGGER"."ID" INTO :arg6

On the face of it, the use of unify_by_values=False is more efficient (at least from the database's perspective) because there is only one hard parse. I guess it doesn't look like a big deal in this example, but I have seen cases where the cursor cache has (literally) tens of thousands of repeated cursor cache entries where one would have been used if bind names were kept consistent and independent of bind value.

Do you think there is a valid case for this default behavior to be changed?

Regards,
Nigel

charettes

unread,
Apr 5, 2021, 10:45:31 AM4/5/21
to Django developers (Contributions to Django itself)
Hello Nigel,

Through git blame for unify_by_values I figured it was introduced[0] to deal with an issue during aggregation[1].

Mariusz might be able to provide more context here as I don't have much knowledge around Oracle cursor cache but it seems you'll have to find another way to address the aggregation issue[1] if you'd like to change this behaviour.

Cheers,
Simon

NPB

unread,
Apr 12, 2021, 7:15:25 AM4/12/21
to Django developers (Contributions to Django itself)
Hi Simon,

Thanks for tracking that down.  I see - an interesting problem that lacks an obvious solution. Nevertheless, there is a generic way to deal with this and it will optimize OK in Oracle (at least in the vast majority of cases). 

select x, sum(sal) from (select :arg0 * deptno as x, sal from emp) group by x;

There would be a need to generate suitably obscure column aliases. So - not "x"! Also, more than one alias per SQL statement might be needed too of course.

I don't think this should be used for all cases of aggregation. IMHO it would be best to target it based on how the Django model is used. I wonder if it would be possible to catch cases where a bind variable would normally be included in both the SELECT list and the GROUP BY.

Regards,
Nigel

charettes

unread,
Apr 12, 2021, 10:47:49 AM4/12/21
to Django developers (Contributions to Django itself)
Hello Nigel,

> I wonder if it would be possible to catch cases where a bind variable would normally be included in both the SELECT list and the GROUP BY.

It should be possible at the compiler level which is overriddable by backend (e.g. here's how it's done for a different purpose on MySQL backend [0]).

Thinking more about it while writing this response I do wonder if changing BaseExpression.get_group_by_cols to return Ref(alias) when an alias is specified could make things easier[1].

diff --git a/django/db/models/expressions.py b/django/db/models/expressions.py
index bdd23617ac..8dbfaa13fe 100644
--- a/django/db/models/expressions.py
+++ b/django/db/models/expressions.py
@@ -348,6 +348,8 @@ def copy(self):

     def get_group_by_cols(self, alias=None):
         if not self.contains_aggregate:
+            if alias:
+                return [Ref(alias, self)]
             return [self]
         cols = []
         for source in self.get_source_expressions():

Taking the ticket's query as an example

Book.objects.annotate(
    discount_price=F('price') * 0.75
).values(
    'discount_price'
).annotate(sum_discount=Sum('price'))

It should change its generated SQL from

SELECT (book.price * :arg0) discount_price, SUM(book.price) sum_discount
FROM book
GROUP BY (book.price * :arg1)

to

SELECT (book.price * :arg0) discount_price, SUM(book.price) sum_discount
FROM book
GROUP BY discount_price

Which circumvents the reported issue.

I'm not sure Oracle supports or require a subquery pushdown as you demonstrated in your response. If it does require adjustment though they'll be way easier to make if a proper list of references need to be pushed down is already provided by the expression layer.

I cannot commit to writing a full patch myself but hopefully this gives you enough pointers to give it a stab yourself.

FWIW the queries, aggregation, and aggregation_regress parts of the suite all pass with the proposed patch on both SQLite and PostgreSQL. I don't have a local Oracle setup anymore but I spun up a PR to give our CI a run against it[2].

Cheers,
Simon

Mariusz Felisiak

unread,
Apr 13, 2021, 1:31:51 AM4/13/21
to Django developers (Contributions to Django itself)
Hi Nigel,

    Creating a subquery only looks like a good solution for a limited number of cases. As far as I'm aware it will also change an execution plan and can cause a performance regression in complicated queries. We also cannot group by column aliases on Oracle as proposed in PR14251. You can try to tune cursor sharing but I don't have any advice here. I would be happy to review a patch in this area, so feel-free to prepare an optimization.

Best,
Mariusz

NPB

unread,
Apr 13, 2021, 8:34:30 AM4/13/21
to Django developers (Contributions to Django itself)
Hi Mariusz and Simon,

Yes, agreed, it is true that Oracle doesn't support column aliases in GROUP BY. It is possible to use "GROUP BY <column-ordinal>", but only if a specific Oracle session parameter setting is used. IMO it would be undesirable to make this setting mandatory for Django, though.

Oracle definitely deals well with the example I gave, and the optimizer is very good at avoiding performance regressions when it is used (provided that optimizer transformations such as simple view merging are not disabled). It is, after all, an easy transformation to spot and transformations are applied even in very complex cases of course. IMHO, the effect on query performance is unlikely to be a significant concern.  

Tuning Oracle cursor sharing is something I've considered, but there is no solution in cases where bind variable names are used and are jumbled. Each version of the SQL statement will be hard parsed (rather than soft parsed) and a new cursor will be created. Sadly, the performance implications of this are almost always underestimated; probably because they don't become apparent until the application feels some heat. I don't want to over-state the issue too much though - when I say "some heat" I really mean "significant heat", but as a long-term Oracle user it is a no-brainer to empower cursor sharing to do its part in making sure that the database runs smoothly all the way up to machine/VM limits.

Regards,
Nigel.

NPB

unread,
Apr 13, 2021, 9:23:15 AM4/13/21
to Django developers (Contributions to Django itself)
P.S.

I think I will have a stab at creating a solution. I'll run it through the queries, aggregation, and aggregation_regress tests.

Thanks,
Nigel

NPB

unread,
Apr 14, 2021, 11:19:54 AM4/14/21
to Django developers (Contributions to Django itself)
Hi -

Here is something I tried out:


I am no Python expert, and having never done this before I'm not sure if it would be worthy of a pull request. Let me know.

I added a test to compiler.py  to detect GROUP BYs with a combined expression. This test is a little less specific than I'd like because it would be nice to target combined expressions that will include a named parameter placeholder. I don't know how to implement this, though. Nevertheless, it looks like the execute_with_val_placeholder_in_gb method added to handle GROUP BYs with expressions (in Oracle's base.py) is called only once (via aggregation_regress). 

No unit test issues were seen for Oracle or SQLLite (I ran all enabled tests).

The implementation idea avoids having to wrap the GROUP BY query in an outer query block and makes use of the existing unify by values solution.

Regards,
Nigel

NPB

unread,
Apr 27, 2021, 1:56:34 PM4/27/21
to Django developers (Contributions to Django itself)
Hi there -

Should I create a ticket?

Thanks and regards,
Nigel.

NPB

unread,
Aug 4, 2021, 10:16:53 AM8/4/21
to Django developers (Contributions to Django itself)
Hi there - 

I think this question/idea is still relevant.  I made these changes:


What's the right thing to do next? I don't expect this to be anywhere near the best solution, but I think it's an idea that's worth some consideration. Should I - for example - create a pull request? 

Regards,
Nigel

Reply all
Reply to author
Forward
0 new messages