[Django] #24096: GROUP BY on oracle fails with TextField and BinaryField

38 views
Skip to first unread message

Django

unread,
Jan 7, 2015, 10:54:32 PM1/7/15
to django-...@googlegroups.com
#24096: GROUP BY on oracle fails with TextField and BinaryField
----------------------------------------------+--------------------
Reporter: jarshwah | Owner: nobody
Type: Bug | Status: new
Component: Database layer (models, ORM) | Version: master
Severity: Normal | Keywords:
Triage Stage: Unreviewed | Has patch: 0
Easy pickings: 0 | UI/UX: 0
----------------------------------------------+--------------------
When a GROUP BY is generated that includes a column of type LOB
(BLOB/NCLOB) such as TextField or BinaryField, then an error is thrown on
Oracle. See the test case below:

{{{

class Author(models.Model):
name = models.CharField(max_length=100)
bio = models.TextField(default='')
age = models.IntegerField()
friends = models.ManyToManyField('self', blank=True)


def test_annotate_with_textfield_in_values(self):
qs = Author.objects.annotate(min=Min('pk')).order_by('pk')
self.assertQuerysetEqual(
qs, [ 1, 1, 1, 1, 1, 1, 1 ],
transform=lambda a: a.min
)

django.db.utils.DatabaseError: ORA-00932: inconsistent datatypes: expected
- got NCLOB
}}}

Adding a `defer('bio')` does not work either, as it's still included in
the GROUP BY (but not the SELECT). The only way I could get this to work
is by providing a `values()`, and listing all fields except LOB type
columns.

This isn't a regression either - the same testcase fails on the 1.6
branch.

--
Ticket URL: <https://code.djangoproject.com/ticket/24096>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
Jan 7, 2015, 10:55:04 PM1/7/15
to django-...@googlegroups.com
#24096: GROUP BY on oracle fails with TextField and BinaryField
-------------------------------------+-------------------------------------

Reporter: jarshwah | Owner: nobody
Type: Bug | Status: new
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by jarshwah):

* cc: josh.smeaton@… (added)
* needs_better_patch: => 0
* needs_tests: => 0
* needs_docs: => 0


--
Ticket URL: <https://code.djangoproject.com/ticket/24096#comment:1>

Django

unread,
Jan 8, 2015, 12:32:54 AM1/8/15
to django-...@googlegroups.com
#24096: GROUP BY on oracle fails with TextField and BinaryField
-------------------------------------+-------------------------------------

Reporter: jarshwah | Owner: nobody
Type: Bug | Status: new
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted

Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by charettes):

* stage: Unreviewed => Accepted


Comment:

Looks related to #20971 and #19259.

Do you know if Oracle also supports grouping by primary key only? This is
the suggested fix for #19259.

--
Ticket URL: <https://code.djangoproject.com/ticket/24096#comment:2>

Django

unread,
Jan 8, 2015, 1:09:59 AM1/8/15
to django-...@googlegroups.com
#24096: GROUP BY on oracle fails with TextField and BinaryField
-------------------------------------+-------------------------------------

Reporter: jarshwah | Owner: nobody
Type: Bug | Status: new
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by jarshwah):

Yes, this is the general case of #20971, so it is almost a duplicate. IMO
the linked ticket is a bit confusing. Not sure if we should close this as
a duplicate and add this information to the other or not.

In any case, Oracle does not support grouping by primary key. All fields
listed in the select list must also be present in the group by list.

There are a few options I can think of:

1. Document the limitation (must use values()) on Oracle
2. Remove LOB type columns from the select and group by list
3. Convert (with lossy behaviour) the LOBs to reasonable types in both
lists
4. Remove deferred columns from GROUP BY when the PK is available
5. A combo of 2 and 4 - automatically defer LOB type columns and remove
them from GROUP BY

Option 5 provides the most reasonable behaviour here I think, as it
shouldn't change results, and won't be lossy with regards to data. It
should also allow the same django code to run on all backends, at the
expense of an extra query on Oracle to fetch LOB results on demand.

--
Ticket URL: <https://code.djangoproject.com/ticket/24096#comment:3>

Django

unread,
Jan 8, 2015, 2:29:11 AM1/8/15
to django-...@googlegroups.com
#24096: GROUP BY on oracle fails with TextField and BinaryField
-------------------------------------+-------------------------------------

Reporter: jarshwah | Owner: nobody
Type: Bug | Status: new
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by shaib):

* cc: shaib (added)


Comment:

I've just closed #20971 as a duplicate of this one, because the issue is
presented more clearly here.

Quoting my comment from that ticket:

> This seems to be a correctness problem on Oracle, but may cause
performance issues elsewhere (other backends will group over deferred
fields, in particular text-fields, which may indeed bring back the
problems of #17144).

Of jarshwah's options, I think option 4 (removing deferred columns from
the group-by) is a no-brainer (even if the PK is not available). I find
option 3 unacceptable (except on MySql, perhaps... no, I'm kidding), and
option 1 vastly inferior.

With respect to removal, I don't like the idea of completely automatic
removal. Too much magic for my tastes. But -- having actually implemented
such automatic removal for a project I was involved with (in a custom
query-set class) -- I do see the need for it to be easy; and per my
comment above, I also think we should tell non-Oracle users about it.

So my suggestion is:
1. Add a database-feature `can_compare_LOBs`
2. When a query groups over a LOB -- if `can_compare_LOBs`, warn about
performance implications; else, error out (with a clear Django exception
rather than an obscure Oracle error)
3. Add a query-set method to defer all LOBs, and/or a query-set subclass
that does it automatically.

--
Ticket URL: <https://code.djangoproject.com/ticket/24096#comment:4>

Django

unread,
Jan 8, 2015, 7:19:43 PM1/8/15
to django-...@googlegroups.com
#24096: GROUP BY on oracle fails with TextField and BinaryField
-------------------------------------+-------------------------------------

Reporter: jarshwah | Owner: nobody
Type: Bug | Status: new
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by jarshwah):

The problem with 4 is that natural queries that run on most backends will
fail on Oracle:

{{{
Model.objects.annotate(Count('field')) # error on Oracle with
TextField, fine on others..
Model.objects.defer('text_field').annotate(Count('field')) # required
for Oracle, and suggested for others
}}}

While option 5 is somewhat magical, it's also the best way to have cross-
database querysets work without modification. Oracle users will always
need to defer LOB-like columns, and will never need to un-defer (since
it's an error). Raising a warning on other backends should be enough if
they haven't deferred. We could control this behaviour with a setting, but
that seems a little wasteful.

Another thing to remember is that any models added with select_related
will also require their LOB like columns to be deferred.

I'm not a fan of adding a new queryset method to defer a certain class of
column, and I'm not too partial about adding a queryset subclass either -
since it'd require special knowledge to recognise that it'd be needed, and
will be a source of bugs in libraries that define models with TextFields,
but do not test annotations on Oracle.

Another direction could be a meta option: Meta.always_defer = ('field',
'text_field'), with a recommendation to always defer large, rarely used
fields. This will have applications larger than this somewhat-narrow
usecase, and will be a lot less magical.

--
Ticket URL: <https://code.djangoproject.com/ticket/24096#comment:5>

Django

unread,
Jan 8, 2015, 10:07:25 PM1/8/15
to django-...@googlegroups.com
#24096: GROUP BY on oracle fails with TextField and BinaryField
-------------------------------------+-------------------------------------

Reporter: jarshwah | Owner: nobody
Type: Bug | Status: new
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by charettes):

> Another direction could be a meta option: Meta.always_defer = ('field',
'text_field'), with a recommendation to always defer large, rarely used
fields. This will have applications larger than this somewhat-narrow
usecase, and will be a lot less magical.

I know that Adrian [https://groups.google.com/d/msg/django-
developers/C3qoHEfeeUg/cwBNV5snQPMJ proposed something similar] on
@developpers.

--
Ticket URL: <https://code.djangoproject.com/ticket/24096#comment:6>

Django

unread,
Jan 9, 2015, 5:38:44 AM1/9/15
to django-...@googlegroups.com
#24096: GROUP BY on oracle fails with TextField and BinaryField
-------------------------------------+-------------------------------------

Reporter: jarshwah | Owner: nobody
Type: Bug | Status: new
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by akaariai):

I guess there is also option 6: rework the query to something like:
{{{
SELECT innerq.*, basetable.lob
FROM (SELECT <original_query_without_lob_fields>) innerq
INNER JOIN basetable on basetable.id = innerq.id
}}}
That is, run the original select query without LOB fields, then join the
LOB fields back. Needless to say this solution is likely complex to
implement. I think the solution would likely look something like this:

- Generate the select and group by of the query. If the select list has
any LOB fields, then remove them from the select and group by and memoize
what was removed.
- Generate the inner query
- Join back the memoized columns in an outer query
- You can't actually use SQL using innerq.* as mentioned above, the
SELECT list must be in the same order as it was for the original query so
that select_related works properly.

As this solution is likely too complex, I think I like the
Meta.always_defer approach instead. There should be some way to override
the defer though (.defer(None) would clear always_defers, too?)

--
Ticket URL: <https://code.djangoproject.com/ticket/24096#comment:7>

Django

unread,
Jan 11, 2015, 8:50:10 AM1/11/15
to django-...@googlegroups.com
#24096: GROUP BY on oracle fails with TextField and BinaryField
-------------------------------------+-------------------------------------

Reporter: jarshwah | Owner: nobody
Type: Bug | Status: new
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by shaib):

The meta option is indeed attractive. I would add a twist: Allow a field
to declare itself deferred (this would somewhat parallel), that is, a new
field option `deferred=False`.

Build into that some API which will let the LOB field find out that
they're on Oracle (or, more generally, a backend with
`can_compare_lobs=False`), and you also get the automatic-compatibility.

--
Ticket URL: <https://code.djangoproject.com/ticket/24096#comment:8>

Django

unread,
Apr 7, 2016, 9:23:25 AM4/7/16
to django-...@googlegroups.com
#24096: GROUP BY on oracle fails with TextField and BinaryField
-------------------------------------+-------------------------------------

Reporter: jarshwah | Owner: nobody
Type: Bug | Status: new
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by jaywink):

* cc: mail@… (added)


--
Ticket URL: <https://code.djangoproject.com/ticket/24096#comment:9>

Django

unread,
Jun 29, 2016, 5:08:52 AM6/29/16
to django-...@googlegroups.com
#24096: GROUP BY on oracle fails with TextField and BinaryField
-------------------------------------+-------------------------------------

Reporter: jarshwah | Owner: nobody
Type: Bug | Status: new
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by robin):

* cc: robinchew@… (added)


--
Ticket URL: <https://code.djangoproject.com/ticket/24096#comment:10>

Django

unread,
Jun 25, 2021, 6:35:25 AM6/25/21
to django-...@googlegroups.com
#24096: GROUP BY on oracle fails with TextField and BinaryField
-------------------------------------+-------------------------------------
Reporter: Josh Smeaton | Owner: nobody
Type: Bug | Status: closed
Component: Database layer | Version: dev
(models, ORM) |
Severity: Normal | Resolution: duplicate

Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Mariusz Felisiak):

* status: new => closed
* resolution: => duplicate


Comment:

I agree with Shai and Anssi that a new meta option is the best way to
solve this ticket. Closing as a duplicate of #23816.

--
Ticket URL: <https://code.djangoproject.com/ticket/24096#comment:11>

Reply all
Reply to author
Forward
0 new messages