ORM, Oracle and UTF-8 encoding problem.

ORM, Oracle and UTF-8 encoding problem. Jani Tiainen 1/8/13 7:34 AM
Hi,

I've been trying to save UTF-8 characters to an Oracle database without
success.

I've verified that the database is indeed UTF-8 capable.

I can insert UTF-8 characters directly using cx_Oracle.

But when I use the ORM, it trashes the characters.

The model I use:

from django.db import models

class MyTest(models.Model):
    txt = models.CharField(max_length=128)


s = u'0 \u0442\u0435\u0441\u0442 test'

i = MyTest()
i.txt = s
i.save()

i2 = MyTest.objects.get(id=i.id)
print i2.txt

u'0 \xbf\xbf\xbf\xbf test'


So what happens here? It looks like Django trashes my unicode string at
some (unknown) point.
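
The \xbf in the output above is the inverted question mark. That is what a lossy conversion into a single-byte character set would typically leave behind, and Oracle is generally described as substituting it for characters the target character set cannot represent. Below is a minimal pure-Python sketch that reproduces the same result. The handler name oracle_like_replace is made up for illustration, and latin-1 only stands in for whatever single-byte charset is actually involved:

```python
import codecs


def oracle_like_replace(error):
    # Hypothetical error handler: replace every unencodable character
    # with U+00BF (inverted question mark) instead of '?'.
    if isinstance(error, UnicodeEncodeError):
        return (u'\xbf' * (error.end - error.start), error.end)
    raise error


codecs.register_error('oracle_like_replace', oracle_like_replace)

s = u'0 \u0442\u0435\u0441\u0442 test'

# Round-trip through a single-byte charset that has no Cyrillic letters.
lossy = s.encode('latin-1', 'oracle_like_replace').decode('latin-1')
print(repr(lossy))
```

If this is what is happening, the string is being squeezed through a non-Unicode character set somewhere between Django and the database, rather than being corrupted at random.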

Additional note:

If I use cursor() from the Django connection object, strings get broken too.
So it must be the Django Oracle backend doing something evil to me.

--
Jani Tiainen

- Well planned is half done and a half done has been sufficient before...
Re: ORM, Oracle and UTF-8 encoding problem. Anssi Kääriäinen 1/8/13 11:00 AM
I created the following test case in Django's test suite, in
modeltests/basic/tests.py:

    def test_unicode(self):
        # Note: from __future__ import unicode_literals is in effect...
        a = Article.objects.create(
            headline='0 \u0442\u0435\u0441\u0442 test',
            pub_date=datetime.now())
        self.assertEqual(
            Article.objects.get(pk=a.pk).headline,
            '0 \u0442\u0435\u0441\u0442 test')

This does pass on Oracle when using Django's master branch, both with
Python 2.7 and 3.3.

Django's backend is doing all sorts of trickery behind the scenes to
get correct unicode handling. I am not sure where the problem is. What
Django version are you using?

 - Anssi
Re: ORM, Oracle and UTF-8 encoding problem. Jani Tiainen 1/8/13 10:56 PM
On 8.1.2013 21:00, akaariai wrote:
> I created the following test case into django's test suite modeltests/
> basic/tests.py:
>     def test_unicode(self):
>         # Note: from __future__ import unicode_literals is in effect...
>         a = Article.objects.create(
>             headline='0 \u0442\u0435\u0441\u0442 test',
>             pub_date=datetime.now())
>         self.assertEqual(
>             Article.objects.get(pk=a.pk).headline,
>             '0 \u0442\u0435\u0441\u0442 test')
>
> This does pass on Oracle when using Django's master branch, both with
> Python 2.7 and 3.3.
>
> Django's backend is doing all sorts of trickery behind the scenes to
> get correct unicode handling. I am not sure where the problem is. What
> Django version are you using?

Sorry for forgetting the version info. I tested with 1.3.1 and 1.4.1 and
both gave the same behaviour.

And I know that there is quite a lot of trickery going on. I'll try to
figure out what causes that problem.
Re: ORM, Oracle and UTF-8 encoding problem. Jani Tiainen 1/8/13 11:38 PM
Tested against latest master. Same behaviour.

In the Oracle backend's base.py there is the following piece of code:

# Check whether cx_Oracle was compiled with the WITH_UNICODE option.
# This will also be True in Python 3.0.
if (int(Database.version.split('.', 1)[0]) >= 5
        and not hasattr(Database, 'UNICODE')):
    convert_unicode = force_text
else:
    convert_unicode = force_bytes

Which was added in <https://github.com/django/django/commit/dcf3be7a62>

The thing is that my cx_Oracle is version 5.1.2, and it does have the
cx_Oracle.UNICODE definition.

So Django uses smart_str / force_bytes.

If I remove that check and set convert_unicode to force_text / force_unicode,
everything works as expected.
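
To make the selection logic quoted above concrete, here is a small standalone sketch of the same condition. pick_converter is a hypothetical helper and not part of Django. It takes the cx_Oracle version string and whether the module exposes a UNICODE attribute, and returns the name of the converter that the quoted check would choose:

```python
def pick_converter(version, has_unicode_constant):
    # Mirrors the condition quoted from base.py: only the major version
    # is inspected, together with the presence of Database.UNICODE.
    major = int(version.split('.', 1)[0])
    if major >= 5 and not has_unicode_constant:
        return 'force_text'
    return 'force_bytes'


# cx_Oracle 5.1.2 with cx_Oracle.UNICODE defined, as described above.
print(pick_converter('5.1.2', True))
```

With the values reported in this message, the check falls through to the force_bytes branch, which matches the observed behaviour.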
Re: ORM, Oracle and UTF-8 encoding problem. Jani Tiainen 1/9/13 12:53 AM
Ok, found source of the problem - but I don't know the solution.

I'm using Oracle client 10.2.0.3.0. It seems that unicode doesn't work
there.

I compiled cx_Oracle against 11g instantclient 11.2 and it worked just fine.

So it must be something that Django assumes with Oracle and unicode
capability.

I had cx_Oracle.UNICODE defined always which is checked in the code. I
don't really know why.


Re: ORM, Oracle and UTF-8 encoding problem. Ian 1/9/13 2:28 AM
On Wednesday, January 9, 2013 12:38:28 AM UTC-7, Jani Tiainen wrote:
> Tested against latest master. Same behaviour.
>
> In Oracle backend base.py is following piece of code:
>
> # Check whether cx_Oracle was compiled with the WITH_UNICODE option.
> # This will also be True in Python 3.0.
> if (int(Database.version.split('.', 1)[0]) >= 5
>         and not hasattr(Database, 'UNICODE')):
>     convert_unicode = force_text
> else:
>     convert_unicode = force_bytes
>
> Which was added in <https://github.com/django/django/commit/dcf3be7a62>
>
> Thing is that my cx_Oracle is version 5.1.2, it has cx_Oracle.UNICODE
> definition.

That sounds correct.  The cx_Oracle.UNICODE type constant is present when cx_Oracle is compiled *without* the WITH_UNICODE option (which no longer exists in 5.1 anyway).

 
> And Django uses smart_str / force_bytes.
>
> If I remove that and use convert_unicode as force_text / force_unicode
> everything works as expected.

Strange, in 5.1 it shouldn't make any difference which is used, as long as your NLS_LANG is getting set properly in the backend.  What is your server setup?  It seems that sometimes that can get interfered with if you have other services using Oracle in the same process.  It shouldn't hurt anything though for us to do an additional check for cx_Oracle 5.1+ and always use force_text in that case.
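
One possible shape for the additional check suggested above, written as a standalone sketch rather than an actual patch. version_tuple and pick_converter are hypothetical helpers, and the exact form any real change in the backend would take is an assumption:

```python
def version_tuple(version):
    # Parse the leading numeric components, e.g. '5.1.2' -> (5, 1, 2).
    parts = []
    for piece in version.split('.'):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def pick_converter(version, has_unicode_constant):
    parsed = version_tuple(version)
    # Proposed extra rule: 5.1 and later always get text.
    if parsed >= (5, 1):
        return 'force_text'
    # Otherwise fall back to the existing check on the UNICODE constant.
    if parsed >= (5,) and not has_unicode_constant:
        return 'force_text'
    return 'force_bytes'


print(pick_converter('5.1.2', True))
print(pick_converter('5.0.4', True))
```

Under this rule, 5.1.2 would select force_text regardless of the UNICODE constant, while 5.0.x builds would keep the current behaviour.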

Re: ORM, Oracle and UTF-8 encoding problem. Jani Tiainen 1/9/13 2:55 AM
Server is running Oracle Database 10g Release 10.2.0.5.0 - 64bit
Production. (EE edition)

and charset info:
NLS_CHARACTERSET        WE8ISO8859P1
NLS_NCHAR_CHARACTERSET        AL16UTF16
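
For reference, WE8ISO8859P1 is Oracle's name for ISO 8859-1, which has no Cyrillic letters, while AL16UTF16 is a UTF-16 encoding used for the national character set (NCHAR/NVARCHAR2 columns). A quick check using Python's own codecs as stand-ins (an assumption; they are not Oracle's converters) shows where the test string can and cannot be represented:

```python
def representable(text, codec):
    # True if every character of text can be encoded with the codec.
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


s = u'0 \u0442\u0435\u0441\u0442 test'

in_latin1 = representable(s, 'iso-8859-1')  # stand-in for WE8ISO8859P1
in_utf16 = representable(s, 'utf-16-be')    # stand-in for AL16UTF16
print(in_latin1, in_utf16)
```

So the Cyrillic text can only survive if it travels as national-character-set data all the way to the column. Any step that converts it to the database character set would lose it.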

When cx_Oracle (Version 5.1.2) is compiled against 10.2.0.3 client:

I can insert unicode characters directly using cx_Oracle.
I can't insert unicode characters using ORM
I can't insert unicode characters using Django connection.cursor()

When cx_Oracle is compiled against instantclient 11.2 (multinational
version), I can do all of the above without problems.
Re: ORM, Oracle and UTF-8 encoding problem. Ian Kelly 1/9/13 9:21 AM
On Wed, Jan 9, 2013 at 3:55 AM, Jani Tiainen <red...@gmail.com> wrote:
> Server is running Oracle Database 10g Release 10.2.0.5.0 - 64bit Production.
> (EE edition)
>
> and charset info:
> NLS_CHARACTERSET        WE8ISO8859P1
> NLS_NCHAR_CHARACTERSET  AL16UTF16

Sorry, I meant your web server setup.
Re: ORM, Oracle and UTF-8 encoding problem. Jani Tiainen 1/9/13 10:40 PM
Windows 7, development server.

The staging server runs Ubuntu something (probably 10.04 LTS), 64-bit.

And symptoms were consistent. For some reason Django does something bad
when it uses smart_str (and whatever that is in 1.5).

If we just force using force_unicode everything works except in older
versions of cx_Oracle (our server had 5.0.4 or something) connection
strings can't be unicode for some reason.
Re: ORM, Oracle and UTF-8 encoding problem. Ian Kelly 1/9/13 10:59 PM
On Wed, Jan 9, 2013 at 11:40 PM, Jani Tiainen <red...@gmail.com> wrote:
> If we just force using force_unicode everything works except in older
> versions of cx_Oracle (our server had 5.0.4 or something) connection strings
> can't be unicode for some reason.

Sure, that's why the check exists in the first place.  Prior to 5.1
cx_Oracle could be built either with Unicode or without.  If the
former, it would accept only unicode strings and would raise an
exception on byte strings.  If the latter, it would be exactly the
opposite.

Does it work for you using force_bytes with 5.0.4?
Re: ORM, Oracle and UTF-8 encoding problem. Jani Tiainen 1/10/13 12:14 AM
That's on my production server that runs 1.3.x version. smart_str (which
detection selects) does not work.

using force_unicode works (except for connection string).

The result also depends on which client cx_Oracle is compiled against:
with OCI client 10.2.0.5, smart_str doesn't work, while with instant
client 11.2 it does.

Both can take plain unicode (u'<some unicode stuff here>') when using
just cx_Oracle commands without any problems.

Note:

If I manually add some unicode to the database, Django can read it without
any problems.