[Django] #26683: Oracle DBs: Add option to use VARCHAR2 data type

39 views
Skip to first unread message

Django

unread,
May 30, 2016, 11:49:16 AM5/30/16
to django-...@googlegroups.com
#26683: Oracle DBs: Add option to use VARCHAR2 data type
--------------------------------+--------------------
Reporter: elarandjelovic | Owner: nobody
Type: New feature | Status: new
Component: Migrations | Version: 1.9
Severity: Normal | Keywords: Oracle
Triage Stage: Unreviewed | Has patch: 0
Easy pickings: 0 | UI/UX: 0
--------------------------------+--------------------
CharFields are currently created in Oracle databases using the NVARCHAR2
data type. If possible, we would like to have an option to use the
VARCHAR2 data type instead. This is preferable when using a Unicode
database, to avoid any performance degradation caused by data conversions
when the database contains a mix of the two types.

--
Ticket URL: <https://code.djangoproject.com/ticket/26683>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
May 30, 2016, 11:57:47 AM5/30/16
to django-...@googlegroups.com
#26683: Oracle DBs: Add option to use VARCHAR2 data type
--------------------------------+--------------------------------------

Reporter: elarandjelovic | Owner: nobody
Type: New feature | Status: new
Component: Migrations | Version: 1.9
Severity: Normal | Resolution:

Keywords: Oracle | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
--------------------------------+--------------------------------------
Changes (by timgraham):

* needs_better_patch: => 0
* needs_tests: => 0
* needs_docs: => 0


Comment:

Did you consider writing a custom field? I don't think Django would ship
an Oracle specific field or add some option to `CharField` that's Oracle
specific. Do you have a proposal about how to offer this feature?

Actually what you might want is a custom Oracle database backend that
overrides the default
[https://github.com/django/django/blob/f1e408ff40d2c1753f92515c70a44634b4d47467/django/db/backends/oracle/base.py#L93-L121
DatabaseWrapper.data_types].

--
Ticket URL: <https://code.djangoproject.com/ticket/26683#comment:1>

Django

unread,
May 31, 2016, 6:27:34 AM5/31/16
to django-...@googlegroups.com
#26683: Oracle DBs: Add option to use VARCHAR2 data type
--------------------------------+--------------------------------------

Reporter: elarandjelovic | Owner: nobody
Type: New feature | Status: new
Component: Migrations | Version: 1.9
Severity: Normal | Resolution:

Keywords: Oracle | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
--------------------------------+--------------------------------------

Comment (by elarandjelovic):

Thanks for the reply. We hadn't looked at custom fields, but were
planning on using a custom backend as you suggested. We'd just like to
minimise the amount of custom code wherever possible and thought other
people might also want this feature.

I was thinking an additional setting in settings.py might be appropriate?
There are already some Oracle specific settings in there.

--
Ticket URL: <https://code.djangoproject.com/ticket/26683#comment:2>

Django

unread,
Jun 1, 2016, 10:18:45 AM6/1/16
to django-...@googlegroups.com
#26683: Oracle DBs: Add option to use VARCHAR2 data type
--------------------------------+--------------------------------------
Reporter: elarandjelovic | Owner: nobody
Type: New feature | Status: closed
Component: Migrations | Version: 1.9
Severity: Normal | Resolution: wontfix

Keywords: Oracle | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
--------------------------------+--------------------------------------
Changes (by timgraham):

* status: new => closed
* cc: jtiai, jarshwah, shaib (added)
* resolution: => wontfix


Comment:

I'll add some known Oracle users to the CC to perhaps get some insight
about whether trying to make the use case easier is worth it. I have some
concerns that if this works at the database backend level, then we
couldn't support migrating an existing database from one to the other, but
maybe that's not important.

--
Ticket URL: <https://code.djangoproject.com/ticket/26683#comment:3>

Django

unread,
Jun 1, 2016, 7:21:50 PM6/1/16
to django-...@googlegroups.com
#26683: Oracle DBs: Add option to use VARCHAR2 data type
--------------------------------+--------------------------------------
Reporter: elarandjelovic | Owner: nobody
Type: New feature | Status: closed
Component: Migrations | Version: 1.9
Severity: Normal | Resolution: wontfix

Keywords: Oracle | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
--------------------------------+--------------------------------------

Comment (by shaib):

Personally I do not understand this:

> This is preferable when using a Unicode database, to avoid any
performance degradation caused by data conversions when the database
contains a mix of the two types

What two types? VARCHAR2 and NVARCHAR2?

Either way, the request is for a feature at the database (rather than
field) level; I'm not sure that is the right granularity. I'm also not
sure what should happen if the proposed setting is changed on a database
which already has some fields (of the "wrong" type).

Could you please elaborate on these points?

--
Ticket URL: <https://code.djangoproject.com/ticket/26683#comment:4>

Django

unread,
Jun 2, 2016, 7:17:25 AM6/2/16
to django-...@googlegroups.com
#26683: Oracle DBs: Add option to use VARCHAR2 data type
--------------------------------+--------------------------------------
Reporter: elarandjelovic | Owner: nobody
Type: New feature | Status: closed
Component: Migrations | Version: 1.9
Severity: Normal | Resolution: wontfix

Keywords: Oracle | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
--------------------------------+--------------------------------------

Comment (by VackarAfzal):

The industry standard seems to to be VARCHAR2, with support for NVARCHAR2
really there for legacy purposes.
I've encountered issues where existing oracle code was all VARCHAR2, as
per our (and probably most) DBA's spec, but django fields were NVARCHAR2,
when joining data accross these schemas it would take 20 minutes to
complete, changing the field to VARCHAR2 resulted in the same query
executing in 200ms.

My suggestion would be to default to using VARCHAR2 for everything UNLESS
the database has not been setup with unicode support. Checking for this is
pretty straightforward:


{{{
SELECT value$ FROM sys.props$ WHERE name = 'NLS_CHARACTERSET' ;
}}}


If you get this is a result, you're OK


{{{
AL32UTF8
}}}

If not then fallback to NVARCHAR2 if unicode support is required.

--
Ticket URL: <https://code.djangoproject.com/ticket/26683#comment:5>

Django

unread,
Jun 2, 2016, 7:25:29 AM6/2/16
to django-...@googlegroups.com
#26683: Oracle DBs: Add option to use VARCHAR2 data type
--------------------------------+--------------------------------------

Reporter: elarandjelovic | Owner: nobody
Type: New feature | Status: new
Component: Migrations | Version: 1.9
Severity: Normal | Resolution:

Keywords: Oracle | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
--------------------------------+--------------------------------------
Changes (by VackarAfzal):

* status: closed => new
* resolution: wontfix =>


--
Ticket URL: <https://code.djangoproject.com/ticket/26683#comment:6>

Django

unread,
Jun 2, 2016, 8:12:51 AM6/2/16
to django-...@googlegroups.com
#26683: Oracle DBs: Add option to use VARCHAR2 data type
--------------------------------+--------------------------------------

Reporter: elarandjelovic | Owner: nobody
Type: New feature | Status: new
Component: Migrations | Version: 1.9
Severity: Normal | Resolution:

Keywords: Oracle | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
--------------------------------+--------------------------------------

Comment (by jarshwah):

Just about every database I've worked with uses VARCHAR rather than
NVARCHAR. I wasn't aware of performance issues between the two though.
Vackar, were the joins you're talking about joins between NVAR and VAR
fields, or just two tables that contained those fields. I'm fairly sure
you mean joining NCHAR to CHAR, but just want to be clear.

I guess the original decision to support NVARCHAR was due to the
possibility/frequency of oracle databases at the time not using utf8 (or
similar) encoding by default, so the NVARCHAR was a way to make unicode
work in all circumstances.

Recommendations mostly follow this from Ask Tom
(https://asktom.oracle.com/pls/apex/f%3Fp%3D100:11:::NO:RP:P11_QUESTION_ID:9462837200346048883)

> NVARCHAR2 is a unicode-only data type. It's useful if you want to have
unicode for some columns (those that are NVARCHAR2) but have the rest of
the database use a different characterset. In all other cases stick with
using VARCHAR2.

I think this is probably worth fixing provided a palatable solution can be
found. Simply querying for the character set is probably not enough. You
need to know this information every time you run a migration (or cache it
somewhere). If the character set is changed between migrations, now you
have different character types within the same application which is
arguably worse. I'm unsure if django needs to know what the type of
varchar column is at runtime or not, but if it does, that'd incur an extra
query at connection creation time.

Also, looking into the code a bit it looks like the
startswith/istartswith/contains based operators all expect NCHAR, and do
the appropriate conversions:

{{{
_standard_operators = {
'contains': "LIKE TRANSLATE(%s USING NCHAR_CS) ESCAPE
TRANSLATE('\\' USING NCHAR_CS)",
'icontains': "LIKE UPPER(TRANSLATE(%s USING NCHAR_CS)) ESCAPE
TRANSLATE('\\' USING NCHAR_CS)",
'startswith': "LIKE TRANSLATE(%s USING NCHAR_CS) ESCAPE
TRANSLATE('\\' USING NCHAR_CS)",
'endswith': "LIKE TRANSLATE(%s USING NCHAR_CS) ESCAPE
TRANSLATE('\\' USING NCHAR_CS)",
'istartswith': "LIKE UPPER(TRANSLATE(%s USING NCHAR_CS)) ESCAPE
TRANSLATE('\\' USING NCHAR_CS)",
'iendswith': "LIKE UPPER(TRANSLATE(%s USING NCHAR_CS)) ESCAPE
TRANSLATE('\\' USING NCHAR_CS)",
}
}}}

I would think most of that TRANSLATE/USING would be unnecessary if we were
simply using varchar to begin with. There's probably other optimisations
we can make elsewhere if varchar was used. But these operators probably
prevent you from creating a useful model field yourself - I haven't tried
to see what happens if you do LIKE comparisons by translating char using
nchar.

Migrations between the two are possible, but it requires creating a new
field, migrating the data across, and then dropping the existing field.
For big tables that's going to be very expensive.

It seems to me that this is an all or nothing proposition. Since the
character encoding is database wide (not per user/schema), and there are
many very old installations of oracle around, we can't guarantee that an
existing django deployment will be able to continue on a version using
varchar if they aren't able to change the global character set.

I think this is going to require a new backend since there is no easy way
to dynamically switch behaviours. A new backend would be able to sweep out
a tonne of old work arounds that are probably not useful anymore. It'd
also be able to break backwards compatibility, because I don't see the
django organisation maintaining two oracle backends when maintaining the
current one is hard enough due to lack of oracle maintainers.

Sorry for the wall of text, but I kept jumping down new rabbit holes. As I
said earlier though, if a palatable solution can be found, I think it's
definitely worth looking into.

--
Ticket URL: <https://code.djangoproject.com/ticket/26683#comment:7>

Django

unread,
Jun 2, 2016, 11:02:48 AM6/2/16
to django-...@googlegroups.com
#26683: Oracle DBs: Add option to use VARCHAR2 data type
--------------------------------+--------------------------------------

Reporter: elarandjelovic | Owner: nobody
Type: New feature | Status: new
Component: Migrations | Version: 1.9
Severity: Normal | Resolution:

Keywords: Oracle | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
--------------------------------+--------------------------------------

Comment (by VackarAfzal):

With regards to the slow queries it was VARCHAR2 to NVARCHAR2 translation
that was the killer, one table had around 100 million rows, and was
executing as a full table scan, so understandably took a while.

I agree with and understand all of your points. It's a complex situation
and there doesn't appear to be a simple solution.
My recommendation would be to create a new backend with the desired
behaviour, and mark the current one as deprecated in version x.x.x

Then in version x.x.x+1, switch to having the new backend as the default
and drop support for the older one. Migration can be done at the Python
level as you suggested, or alternatively there is this tool from Oracle
that any dba (or even dev) can use to convert the DB to unicode. Literally
just need to click a few buttons then wait a while for it to convert.
{{{
http://www.oracle.com/technetwork/database/database-
technologies/globalization/dmu/overview/index.html
}}}

There will be some initial pain for sites using the legacy approach, but I
think in the long run this will benefit the Django project, and in turn
the community.

--
Ticket URL: <https://code.djangoproject.com/ticket/26683#comment:8>

Django

unread,
Jun 2, 2016, 2:14:03 PM6/2/16
to django-...@googlegroups.com
#26683: Oracle DBs: Add option to use VARCHAR2 data type
--------------------------------+--------------------------------------

Reporter: elarandjelovic | Owner: nobody
Type: New feature | Status: new
Component: Migrations | Version: 1.9
Severity: Normal | Resolution:

Keywords: Oracle | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
--------------------------------+--------------------------------------

Comment (by timgraham):

If we move forward with this, I prefer the approach of a temporary setting
to opt-in to the new behavior (which eventually becomes mandatory) rather
than creating a backend with a new name (unless there are compelling
arguments against this approach).

--
Ticket URL: <https://code.djangoproject.com/ticket/26683#comment:9>

Django

unread,
Jun 3, 2016, 3:51:43 AM6/3/16
to django-...@googlegroups.com
#26683: Oracle DBs: Add option to use VARCHAR2 data type
--------------------------------+--------------------------------------

Reporter: elarandjelovic | Owner: nobody
Type: New feature | Status: new
Component: Migrations | Version: 1.9
Severity: Normal | Resolution:

Keywords: Oracle | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
--------------------------------+--------------------------------------

Comment (by jtiai):

Oracle database encoding is slightly problematic case since once you have
set it it's quite impossible to change it without reinstalling whole
server.

At least we and our customers do have servers that run with WE8ISO8859P1
encoding, and very probably will have them for next few years at least.
From my perspective I would like to have that datatype switch to be
permanent so I could use that per customer basis.

It's not that we could use tools like DMU. It's just that some of our
customers do still have non-unicode aware applications that requires
VARCHAR2 to be something else but unicode.

I could do Proof of Concept with a switch and to see if it wouldn't be too
complex to achieve.

Sidenote: There is also in use NCLOB which supposed to be CLOB as well?

--
Ticket URL: <https://code.djangoproject.com/ticket/26683#comment:10>

Django

unread,
Jun 3, 2016, 4:02:25 AM6/3/16
to django-...@googlegroups.com
#26683: Oracle DBs: Add option to use VARCHAR2 data type
--------------------------------+--------------------------------------

Reporter: elarandjelovic | Owner: nobody
Type: New feature | Status: new
Component: Migrations | Version: 1.9
Severity: Normal | Resolution:

Keywords: Oracle | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
--------------------------------+--------------------------------------

Comment (by VackarAfzal):

Replying to [comment:9 timgraham]:


> If we move forward with this, I prefer the approach of a temporary
setting to opt-in to the new behavior (which eventually becomes mandatory)
rather than creating a backend with a new name (unless there are
compelling arguments against this approach).

Thinking about it, this actually would be more sensible.

--
Ticket URL: <https://code.djangoproject.com/ticket/26683#comment:11>

Django

unread,
Jun 3, 2016, 4:07:01 AM6/3/16
to django-...@googlegroups.com
#26683: Oracle DBs: Add option to use VARCHAR2 data type
--------------------------------+--------------------------------------

Reporter: elarandjelovic | Owner: nobody
Type: New feature | Status: new
Component: Migrations | Version: 1.9
Severity: Normal | Resolution:

Keywords: Oracle | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
--------------------------------+--------------------------------------

Comment (by VackarAfzal):

Replying to [comment:10 jtiai]:


> Oracle database encoding is slightly problematic case since once you
have set it it's quite impossible to change it without reinstalling whole
server.

I can only speak for Oracle12c, but you can change the charecterset
without having to re-install the server, not sure what the situation is
with older versions.


>
> At least we and our customers do have servers that run with WE8ISO8859P1
encoding, and very probably will have them for next few years at least.
From my perspective I would like to have that datatype switch to be
permanent so I could use that per customer basis.
>
> It's not that we could use tools like DMU. It's just that some of our
customers do still have non-unicode aware applications that requires
VARCHAR2 to be something else but unicode.
>
> I could do Proof of Concept with a switch and to see if it wouldn't be
too complex to achieve.
>
> Sidenote: There is also in use NCLOB which supposed to be CLOB as well?

I've never had a use case that required NCLOB, but I'm sure there will be
others who do. Suppose it makes sense for the same reason NVARCHAR2 was
used.

--
Ticket URL: <https://code.djangoproject.com/ticket/26683#comment:12>

Django

unread,
Jun 3, 2016, 5:28:56 AM6/3/16
to django-...@googlegroups.com
#26683: Oracle DBs: Add option to use VARCHAR2 data type
--------------------------------+--------------------------------------

Reporter: elarandjelovic | Owner: nobody
Type: New feature | Status: new
Component: Migrations | Version: 1.9
Severity: Normal | Resolution:

Keywords: Oracle | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
--------------------------------+--------------------------------------

Comment (by jtiai):

Replying to [comment:12 VackarAfzal]:


> Replying to [comment:10 jtiai]:
> > Oracle database encoding is slightly problematic case since once you
have set it it's quite impossible to change it without reinstalling whole
server.
>
> I can only speak for Oracle12c, but you can change the charecterset
without having to re-install the server, not sure what the situation is
with older versions.

Well we do have Oracle 11g mostly, as well as our customers.

> >
> > At least we and our customers do have servers that run with
WE8ISO8859P1 encoding, and very probably will have them for next few years
at least. From my perspective I would like to have that datatype switch to
be permanent so I could use that per customer basis.

This is our main problem for unseen future. There exists, mainly windows
applications that doesn't handle unicode and customers do use them still.

> > I could do Proof of Concept with a switch and to see if it wouldn't be
too complex to achieve.
> >
> > Sidenote: There is also in use NCLOB which supposed to be CLOB as
well?
> I've never had a use case that required NCLOB, but I'm sure there will
be others who do. Suppose it makes sense for the same reason NVARCHAR2 was
used.

Django uses NCLOB as database datatype for models.TextFields.

--
Ticket URL: <https://code.djangoproject.com/ticket/26683#comment:13>

Django

unread,
Jun 3, 2016, 12:59:33 PM6/3/16
to django-...@googlegroups.com
#26683: Oracle DBs: Add option to use VARCHAR2 data type
--------------------------------+--------------------------------------

Reporter: elarandjelovic | Owner: nobody
Type: New feature | Status: new
Component: Migrations | Version: 1.9
Severity: Normal | Resolution:

Keywords: Oracle | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
--------------------------------+--------------------------------------

Comment (by shaib):

A few points:

1) Oracle 11.2 "extended support" is until 2020. We may decide to drop our
support of 11.2 earlier, but that wouldn't be nice to our users.

2) Varchar vs. NVarchar performance issues are well documented; such an
issue was [https://sourceforge.net/p/cx-oracle/mailman/message/35134630/
raised] only a few hours ago on the cx-oracle users group.

3) Actually, if anyone has the time to put into it, a compatibility-
breaking new Oracle backend could have a lot of benefits -- two important
problems that are hard to fix in a backward-compatible manner are the
performance issues around numbers (we translate many numbers to strings in
order to figure out the right Python type) and the case issues around
database object (tables etc) names.

--
Ticket URL: <https://code.djangoproject.com/ticket/26683#comment:14>

Django

unread,
Jun 6, 2016, 8:25:30 AM6/6/16
to django-...@googlegroups.com
#26683: Oracle DBs: Add option to use VARCHAR2 data type
--------------------------------+-----------------------------------------

Reporter: elarandjelovic | Owner: nobody
Type: New feature | Status: new
Component: Migrations | Version: 1.9
Severity: Normal | Resolution:
Keywords: Oracle | Triage Stage: Someday/Maybe

Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
--------------------------------+-----------------------------------------
Changes (by timgraham):

* stage: Unreviewed => Someday/Maybe


Comment:

If there are advantages to dropping Oracle 11.2 support, we might consider
making Django 1.11 LTS (supported until at least April 2020) the last
version to support it.

I'll bump this ticket to "Someday/Maybe" to get it off the unreviewed
queue. If we have a specific plan for how to move it forward, we can move
it to "Accepted".

--
Ticket URL: <https://code.djangoproject.com/ticket/26683#comment:15>

Reply all
Reply to author
Forward
0 new messages