UnicodeDecode error on sqlalchemy select query

1,013 views
Skip to first unread message

Abhishek Sharma

unread,
Jul 29, 2015, 2:23:33 PM7/29/15
to sqlalchemy
We are using sqlalchemy version 0.7, python 2.7 and oracle Database.
We have ASCII as default python encoding and DB have ISO-8052 encoding. Our DB tables contains some of characters which are out of ASCII range. So when we are running query on those tables we are getting Unicode Decode error saying "ASCII" codec can not decode. This error we are getting without accessing model attributes.

How i can handle these errors without changing python default encoding.

Mike Bayer

unread,
Jul 29, 2015, 5:25:29 PM7/29/15
to sqlal...@googlegroups.com
Oracle's client encoding is controlled by the NLS_LANG environment variable.    That has to be set correctly first off (see http://www.oracle.com/technetwork/products/globalization/nls-lang-099431.html).  If you have non-ASCII strings encoded in datatypes that are explicitly not of type NVARCHAR or NCLOB , or you're relying on a lot of raw SQL, and you are still getting errors, I would set the "coerce_to_unicode=True" flag on create_engine(), which allows cx_Oracle's unicode facilities to take place fully for all string data being returned, at the expense of some performance.  See http://docs.sqlalchemy.org/en/rel_1_0/dialects/oracle.html#unicode for background.



--
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+...@googlegroups.com.
To post to this group, send email to sqlal...@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

Abhishek Sharma

unread,
Aug 1, 2015, 12:12:14 PM8/1/15
to sqlal...@googlegroups.com
Thanks for help. But still i have confusion over encoding and decoding procedure which will take place before retrieving and storing the results in DB.

In case if i am not using convert_unicode option and data type is String so python process will give str object to sqlalchemy at the time of insert record in DB using ORM. So will alchemy store that object in encoded form?. So at the time of retrieving ORM will give str object for String type column to python and python decode that object with default encoding?

Can i simply use Unicode Data type for columns where there might be chance of using non ascii data?

--
You received this message because you are subscribed to a topic in the Google Groups "sqlalchemy" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/sqlalchemy/eThubIMnL4o/unsubscribe.
To unsubscribe from this group and all its topics, send an email to sqlalchemy+...@googlegroups.com.

Mike Bayer

unread,
Aug 3, 2015, 9:26:57 AM8/3/15
to sqlal...@googlegroups.com


On 8/1/15 12:12 PM, Abhishek Sharma wrote:
Thanks for help. But still i have confusion over encoding and decoding procedure which will take place before retrieving and storing the results in DB.

In case if i am not using convert_unicode option and data type is String so python process will give str object to sqlalchemy at the time of insert record in DB using ORM. So will alchemy store that object in encoded form?. So at the time of retrieving ORM will give str object for String type column to python and python decode that object with default encoding?

Can i simply use Unicode Data type for columns where there might be chance of using non ascii data?

if you know that your unicode data is on specific columns then yes, the Unicode type plugs in an encoder/decoder for those backends that require it.

Abhishek Sharma

unread,
Aug 3, 2015, 1:04:43 PM8/3/15
to sqlal...@googlegroups.com
what about CLOB type?  Unicode only handles String type. Do i need to use convert_unicode there?

Mike Bayer

unread,
Aug 3, 2015, 1:27:40 PM8/3/15
to sqlal...@googlegroups.com


On 8/3/15 1:04 PM, Abhishek Sharma wrote:
what about CLOB type?  Unicode only handles String type. Do i need to use convert_unicode there?

if your CLOB expects non-ascii characters then yes. 

though on Oracle I thought you really need to be using NCLOB for a col that stores unicode.

Abhishek Sharma

unread,
Aug 4, 2015, 5:47:56 AM8/4/15
to sqlal...@googlegroups.com
applying convert_unicode to CLOB type does not have any effect. Still I am getting str type object from sqlalchemy for CLOB type column

Mike Bayer

unread,
Aug 4, 2015, 6:15:15 AM8/4/15
to sqlal...@googlegroups.com


On 8/4/15 5:47 AM, Abhishek Sharma wrote:
applying convert_unicode to CLOB type does not have any effect. Still I am getting str type object from sqlalchemy for CLOB type column

have you tried the coerce_to_unicode flag on create_engine() ?

Abhishek Sharma

unread,
Aug 4, 2015, 7:41:20 AM8/4/15
to sqlal...@googlegroups.com
in case lot of overhead will be there so it is better to use that column label only

Mike Bayer

unread,
Aug 4, 2015, 11:30:05 AM8/4/15
to sqlal...@googlegroups.com


On 8/4/15 7:41 AM, Abhishek Sharma wrote:
in case lot of overhead will be there so it is better to use that column label only
well it doesn't work anyway because data from a CLOB is not in cx_oracle's world a "String", it's a LOB.   The CLOB / NCLOB types for cx_oracle are organized in their own way where only NCLOB actually has unicode handling capability, regardless of the coerce_to_unicode or convert_unicode flags; CLOB does not.   So either use NCLOB, or build out your own convert unicode, here is a demo:

from sqlalchemy import create_engine, CLOB, Table, MetaData, Column, select, TypeDecorator
from sqlalchemy.dialects.oracle import NCLOB

e = create_engine("oracle+cx_oracle://scott:tiger@xe", echo='debug')

class ForceUnicodeClob(TypeDecorator):
    impl = CLOB
    def process_result_value(self, value, dialect):
        if value is not None:
            value = value.decode('utf-8')
        return value

m = MetaData()
t = Table('test', m, Column('data1', NCLOB()), Column('data2', ForceUnicodeClob()))

m.drop_all(e)
m.create_all(e)

e.execute(t.insert(), data1=u'unicode', data2=u'unicode')

result = e.execute(select([t.c.data1, t.c.data2]))
value1, value2 = result.fetchone()
print repr(value1), repr(value2)

Mike Bayer

unread,
Aug 4, 2015, 11:36:43 AM8/4/15
to sqlal...@googlegroups.com


On 8/4/15 11:29 AM, Mike Bayer wrote:


On 8/4/15 7:41 AM, Abhishek Sharma wrote:
in case lot of overhead will be there so it is better to use that column label only
well it doesn't work anyway because data from a CLOB is not in cx_oracle's world a "String", it's a LOB.   The CLOB / NCLOB types for cx_oracle are organized in their own way where only NCLOB actually has unicode handling capability, regardless of the coerce_to_unicode or convert_unicode flags; CLOB does not.   So either use NCLOB, or build out your own convert unicode, here is a demo:

https://bitbucket.org/zzzeek/sqlalchemy/issues/3507/revisit-cx_oracle-unicode-handling is added to eventually revisit cx_oracle's situation here and get the string flags to work as documented for these special types.

Abhishek Sharma

unread,
Aug 4, 2015, 1:41:18 PM8/4/15
to sqlal...@googlegroups.com
is this followings two instructions compulsory  while defining new type?
m.drop_all(e)
m.create_all(e)

this instructions are not feasible , because DB team already defined schema and normal user can not drop and create table.

Mike Bayer

unread,
Aug 4, 2015, 3:49:55 PM8/4/15
to sqlal...@googlegroups.com


On 8/4/15 1:41 PM, Abhishek Sharma wrote:
is this followings two instructions compulsory  while defining new type?
m.drop_all(e)
m.create_all(e)

no that is just part of the demonstration script.




this instructions are not feasible , because DB team already defined schema and normal user can not drop and create table.
yup, that is a given

Abhishek Sharma

unread,
Aug 5, 2015, 9:24:38 AM8/5/15
to sqlal...@googlegroups.com
Hi Team,
I have created customized data type using TypeDecorator approach.

from sqlalchemy import TypeDecorator, CLOB
class ForceUnicodeClob(TypeDecorator):
     impl = CLOB
     def process_bind_param(self, value, dialect):
         if isinstance(value, str):
            value = value.decode('utf-8', 'ignore')
         return value
     def process_result_value(self, value, dialect):
         if value is not None:
             value = "PREFIX" + value.decode('utf-8', 'ignore')
         return value


After this in my table definition I declared the type of one of column as ForceUnicodeClob.
dlr_dclmr = Table('dlr_dclmr', metadata,
Column('dclmr_ds', ForceUnicodeClob(), primary_key=False))

After this I am executing the query on this table with session.query(Model) , but when I am accessing result.dclmr_ds i am not getting response prepended with "PREFIX".

am i missing some  steps?

SQLAlchemy Version: 0.9.9
We are using classic mapping for attributes mapping with DB columns.



Mike Bayer

unread,
Aug 5, 2015, 10:11:28 AM8/5/15
to sqlal...@googlegroups.com


On 8/5/15 9:24 AM, Abhishek Sharma wrote:
Hi Team,
I have created customized data type using TypeDecorator approach.

from sqlalchemy import TypeDecorator, CLOB
class ForceUnicodeClob(TypeDecorator):
     impl = CLOB
     def process_bind_param(self, value, dialect):
         if isinstance(value, str):
            value = value.decode('utf-8', 'ignore')
         return value
     def process_result_value(self, value, dialect):
         if value is not None:
             value = "PREFIX" + value.decode('utf-8', 'ignore')
         return value


After this in my table definition I declared the type of one of column as ForceUnicodeClob.
dlr_dclmr = Table('dlr_dclmr', metadata,
Column('dclmr_ds', ForceUnicodeClob(), primary_key=False))

After this I am executing the query on this table with session.query(Model) , but when I am accessing result.dclmr_ds i am not getting response prepended with "PREFIX".
this can happen in the ORM if you retrieve a Model object that is already present in the Session, which you created as a pending object:


model = Model(dclmr_ds='somestring')
session.add(model)
session.flush()

m2 = session.query(Model).filter(...).first()

assert model is m2


that is, you get the same object back.  The values of the attributes are not updated unless you first expire that object, which normally happens when you say session.commit().

Abhishek Sharma

unread,
Aug 5, 2015, 11:45:24 AM8/5/15
to sqlal...@googlegroups.com
that object already got saved in DB after session.commit(). After that i am retrieving object which already stored in db. still same issue.

Mike Bayer

unread,
Aug 5, 2015, 1:07:26 PM8/5/15
to sqlal...@googlegroups.com
Attached is a test script illustrating the code you have below along with a round trip verification using a simple Model class mapped to the table.  Please confirm this script works as expected, as it does here.  Assuming that works, determine what's different about your real-world environment versus this test case.
test.py

Abhishek Sharma

unread,
Aug 5, 2015, 2:13:44 PM8/5/15
to sqlal...@googlegroups.com
thanks for your help.
its seems to be working. i will troubleshoot in my development environment.

Abhishek Sharma

unread,
Aug 11, 2015, 1:58:19 PM8/11/15
to sqlal...@googlegroups.com

Hi Team,
                With asynchronous request my model object Unicode type column not returning Unicode object but if I do same action using synchronous I am getting Unicode object

Mike Bayer

unread,
Aug 11, 2015, 2:22:52 PM8/11/15
to sqlal...@googlegroups.com


On 8/11/15 1:58 PM, Abhishek Sharma wrote:

Hi Team,


                With asynchronous request my model object Unicode type column not returning Unicode object but if I do same action using synchronous I am getting Unicode object

SQLAlchemy has no asynchrnous API itself so this has to do with the asynchronous framework you're using and how you are integrating it with SQLAlchemy.

Abhishek Sharma

unread,
Aug 12, 2015, 9:55:23 AM8/12/15
to sqlal...@googlegroups.com
Hi Team,
               We are not calling all or first method on got query object. We are just passing query object to set method of python and we are getting Unicode Error "ASCII CODEC Can not decode" this means I am not getting Unicode Object from SQLAlchemy side.

Column('dlrprod_name_pri', Unicode(length=100), primary_key=False),

Do I need to call all or first method on query object to get Unicode object rather than str object.

Thanks
Abhishek Sharma

Mike Bayer

unread,
Aug 12, 2015, 10:07:24 AM8/12/15
to sqlal...@googlegroups.com


On 8/12/15 9:55 AM, Abhishek Sharma wrote:
Hi Team,
               We are not calling all or first method on got query object. We are just passing query object to set method of python and we are getting Unicode Error "ASCII CODEC Can not decode" this means I am not getting Unicode Object from SQLAlchemy side.

Column('dlrprod_name_pri', Unicode(length=100), primary_key=False),
Do I need to call all or first method on query object to get Unicode object rather than str object.

the unicode conversion requires that you are explicitly referring to your model, or the Table or Column metadata, or are using Text given a datatype.  a plain SQL string will NOT decode unless you add additional typing information up front

query.from_statement("select * from table")    #  <-- will NOT decode


query.from_statement(text("select * from table"))    #  <-- will NOT decode


query.from_statement(text("select * from table").columns(dlrprod_name_pri=Unicode)    #  <-- WILL decode

query(MyModel)   # <-- WILL decode

query(MyModel.dlrprod_name_pri, ...)  # <-- WILL decode

Abhishek Sharma

unread,
Aug 12, 2015, 11:07:23 AM8/12/15
to sqlal...@googlegroups.com
I am using session.query(Model).filter(conditions) but still getting UnicodeErrors

Mike Bayer

unread,
Aug 12, 2015, 11:19:18 AM8/12/15
to sqlal...@googlegroups.com


On 8/12/15 11:07 AM, Abhishek Sharma wrote:
I am using session.query(Model).filter(conditions) but still getting UnicodeErrors

unfortunately this all goes back to the test script I shared with you; that script works.  What's different in your application that distinguishes the system of querying from that of the test script?

Mike Bayer

unread,
Aug 12, 2015, 12:12:53 PM8/12/15
to sqlal...@googlegroups.com


On 8/12/15 11:19 AM, Mike Bayer wrote:


On 8/12/15 11:07 AM, Abhishek Sharma wrote:
I am using session.query(Model).filter(conditions) but still getting UnicodeErrors

unfortunately this all goes back to the test script I shared with you; that script works.  What's different in your application that distinguishes the system of querying from that of the test script?

I see we're talking about Unicode now, not CLOB.   Attached is a test script illustrating unicode conversion for all three of CLOB, VARCHAR2, NVARCHAR2.  Work with this script to find where your program isn't configured the same.


from sqlalchemy import create_engine, CLOB, Table, MetaData, \
    Column, TypeDecorator, Integer, Sequence, Unicode
from sqlalchemy.orm import Session, mapper



class ForceUnicodeClob(TypeDecorator):
    impl = CLOB

    def process_bind_param(self, value, dialect):
        if isinstance(value, str):
            value = value.decode('utf-8', 'ignore')
        return value

    def process_result_value(self, value, dialect):
        if value is not None:
            value = value.decode('utf-8', 'ignore')
        return value


e = create_engine("oracle+cx_oracle://scott:tiger@xe", echo='debug')

try:
    e.execute("drop table test_unicode")
except:
    pass
e.execute("""
create table test_unicode(
    id integer primary key,
    clob_data CLOB,
    varchar_data VARCHAR2(50),
    nvarchar_data NVARCHAR2(50)
)
""")


from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()
class Model(Base):
    __table__ = Table(
        "test_unicode", Base.metadata,
        Column('id', Integer, primary_key=True),
        Column('clob_data', ForceUnicodeClob()),
        Column('varchar_data', Unicode()),
        Column('nvarchar_data', Unicode()),
    )


s = Session(e)
m1 = Model(id=1, clob_data='data', varchar_data='data', nvarchar_data='data')
s.add(m1)
s.commit()

m1 = s.query(Model).first()
print repr(m1.clob_data)
print repr(m1.varchar_data)
print repr(m1.nvarchar_data)

assert isinstance(m1.clob_data, unicode)
assert isinstance(m1.varchar_data, unicode)
assert isinstance(m1.nvarchar_data, unicode)
test.py

Abhishek Sharma

unread,
Aug 17, 2015, 10:13:02 AM8/17/15
to sqlal...@googlegroups.com
Hi Mike,
              Sorry for repetitive same questions.
After adding TypeDecorator I am getting Unicode Object as expected.

Lets say I have one model:
Products=Table("products",metadata,
                          Column("product_id",Numeric,primary_key=True),
                          Column("product_name",Unicode(),nullable=False))

I have one for Table in which above specified model product_id act as foreign key.
prod_survey = Table('prod_survey', metadata,
                Column(
                        u'survey_prod_id',
                        NUMERIC(
                            precision=18,
                            scale=0,
                            asdecimal=False),
                        primary_key=True,
                        nullable=False),
                    Column(
                        u'survey_id',
                        NUMERIC(
                            precision=9,
                            scale=0,
                            asdecimal=False),

                        primary_key=False),

                  

               Column(
                        u'prod_id',
                        NUMERIC(
                            precision=9,
                            scale=0,
                            asdecimal=False),
                        primary_key=False,

                        nullable=False),

ForeignKeyConstraint(
                        ['prod_id'],
                        ['md.dlr_cstm_prod.prod_id'],

                        name=u'FK503_PROD_SURVEY'),


Now When I am assigning Products from Product table to survey table I am getting ascii encoding can not encode character.

It means python is getting Unicode object but trying to encode before sending to DB param.

Please suggest                    

Mike Bayer

unread,
Aug 17, 2015, 11:29:24 AM8/17/15
to sqlal...@googlegroups.com
these are Numeric columns so unicode encoding should not be involved.

It would be helpful if you could produce isolated demonstrations of these issues, using the format given by the test scripts provided.  Without seeing code that actually causes the problem as well as stack traces, those of us reading this thread can only guess as to the various issues occurring within your environment.

Abhishek Sharma

unread,
Aug 18, 2015, 6:31:29 AM8/18/15
to sqlal...@googlegroups.com
Hi Mike,
              I am attaching the Code and Model Info in which I am facing issue.
Please let me know if you need more info from my side
Survey_Assignment.txt

Mike Bayer

unread,
Aug 18, 2015, 11:16:02 AM8/18/15
to sqlal...@googlegroups.com


On 8/18/15 6:31 AM, Abhishek Sharma wrote:
Hi Mike,
              I am attaching the Code and Model Info in which I am facing issue.
Please let me know if you need more info from my side
well this is still not the level of test script that I can do anything with, it's only a fragment of the actual program, so at least if you can turn on echo=True it will show the SQL statement and the parameters being emitted, that will at least illustrate what column or element of the SQL string itself has non-ascii characters in it that match the exception.

Abhishek Sharma

unread,
Aug 25, 2015, 1:44:08 PM8/25/15
to sqlal...@googlegroups.com

Hi Team,
               We are executing select query using self.session.query(Model).filter(filter_conditions).first()

Then we are storing the about query result in result variable.
Then we are trying to update one of model attribute like
result.description=value
Even though we have not added that object in session to update the DB and there is no commit but sqlalchemy emitting update query on DB.

...

Simon King

unread,
Aug 26, 2015, 4:13:58 AM8/26/15
to sqlal...@googlegroups.com
On Tue, Aug 25, 2015 at 6:43 PM, Abhishek Sharma <abhisha...@gmail.com> wrote:

Hi Team,
               We are executing select query using self.session.query(Model).filter(filter_conditions).first()

Then we are storing the about query result in result variable.
Then we are trying to update one of model attribute like
result.description=value
Even though we have not added that object in session to update the DB and there is no commit but sqlalchemy emitting update query on DB.



Objects loaded via session.query(...) are already attached to a session; session.add(...) is for *new* objects.

If you modify an object attached to a session, SQLAlchemy will consider it "dirty", and will UPDATE the database when the session is next flushed.

By default, SQLAlchemy will autoflush the session whenever you use it to query the database, to synchronise the application state with the database. If it didn't do this, the query could potentially return incorrect results. (Imagine you change an attribute on your object from 1 to 2, then you query for all objects where the attribute equals 1. If SA didn't flush the session first, your modified object would match the query, even though in the application it has changed.)

There's some information about that in the docs at http://docs.sqlalchemy.org/en/rel_1_0/orm/session_basics.html#flushing

Hope that helps,

Simon
Reply all
Reply to author
Forward
0 new messages