mssql: UnicodeDecodeError: 'utf8' codec can't decode byte

784 views
Skip to first unread message

Tim Pierson

unread,
Mar 4, 2016, 2:51:15 PM3/4/16
to sqlalchemy
I've seen a couple of issues regarding mssql and drivers handling of strings but I haven't been able to resolve my problem.  

I have a flask app with flask-sqlalchemy models defined and a simple query is throwing the above complaint about decoding the string. 

My connection string:
mssql+pyodbc://[...]/[...]?trusted_connection=yes&charset=utf8&deprecate_large_types=True A 

Which is given to the SQLAlchemy class instance db after the initialization of the app with:

db.init_app(app)
The offending model:

class Annotations(db.Model):
id = db.Column(db.Integer, primary_key=True)
creation_date = db.Column(db.DateTime, default=datetime.datetime.now)
created_by = db.Column(db.String, nullable=False)
annotation = db.Column(sqlalchemy.Unicode, nullable=False)
annotation_type_id = db.Column(db.Integer, db.ForeignKey('AnnotationTypes.id'))


The query:

db.session.query(Annotations.annotation).all()

I've replaced the original column type db.String with sqlalchemy.Unicode on the field that throws the error as per an earlier question on this list but it hasn't resolved the problem.  

Could someone offer any pointers?

Thanks,

Tim Pierson

unread,
Mar 4, 2016, 6:14:26 PM3/4/16
to sqlalchemy
Update: I fixed the symptom by changing the column datatypes back to db.String and casting when a UnicodeDecodError is raised. 

db.session.query(cast(Annotations.annotation, sa.Unicode)

However, my feeling is that this is not a great situation as it looks like I have two different string encodings (?) in the same table-field.  This is because I was running into an issue where long strings passed as query parameters to _.in() or `==` filters were raising ProgrammingError exceptions about varchar(max) and text types not being comparable.  To work around, I trapped the exceptions and casted the parameters as db.String in the filter

.filter(cast(AnnotatedData.annotated_string, sa.String) == cast(annotation_instance, sa.String)).all()

and inserted the cast'd strings (which shouldn't have been a problem since the column types were defined as db.String in the first place).

I wonder if this situation is recognizable and if anyone has any advice on dealing with the situation?

Thanks

Mike Bayer

unread,
Mar 6, 2016, 6:59:44 PM3/6/16
to sqlal...@googlegroups.com
Can you share a stack trace please ?   Encoding operations can occur in many places and I don't see that identified here.   Also this is Python 3?  What OS platform and ODBC driver / configuration as well?  If freetds please share your freetds config too.
--
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+...@googlegroups.com.
To post to this group, send email to sqlal...@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

Tim Pierson

unread,
Mar 9, 2016, 11:13:01 AM3/9/16
to sqlalchemy
Python 2.7, on windows using pyodbc and whatever the default driver is.  Not sure if the problem exists on FreeTDS.

Here's an example of the ProgrammingError that is raised when I don't cast the filter() query parameter text:

File "C:\Python27\lib\site-packages\sqlalchemy\orm\query.py", line 2320, in all
    return list(self)
  File "C:\Python27\lib\site-packages\sqlalchemy\orm\query.py", line 2438, in __iter__
    return self._execute_and_instances(context)
  File "C:\Python27\lib\site-packages\sqlalchemy\orm\query.py", line 2453, in _execute_and_instances
    result = conn.execute(querycontext.statement, self._params)
  File "C:\Python27\lib\site-packages\sqlalchemy\engine\base.py", line 729, in execute
    return meth(self, multiparams, params)
  File "C:\Python27\lib\site-packages\sqlalchemy\sql\elements.py", line 322, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "C:\Python27\lib\site-packages\sqlalchemy\engine\base.py", line 826, in _execute_clauseelement
    compiled_sql, distilled_params
  File "C:\Python27\lib\site-packages\sqlalchemy\engine\base.py", line 958, in _execute_context
    context)
  File "C:\Python27\lib\site-packages\sqlalchemy\engine\base.py", line 1159, in _handle_dbapi_exception
    exc_info
  File "C:\Python27\lib\site-packages\sqlalchemy\util\compat.py", line 199, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb)
  File "C:\Python27\lib\site-packages\sqlalchemy\engine\base.py", line 951, in _execute_context
    context)
  File "C:\Python27\lib\site-packages\sqlalchemy\engine\default.py", line 436, in do_execute
    cursor.execute(statement, parameters)
ProgrammingError: (ProgrammingError) ('42000', '[42000] [Microsoft][ODBC SQL Server Driver][SQL Server]The data types varchar(max) and ntext are incompatible in the equal to operator. (402) (SQLExecDirectW); [42000] [Microsoft][ODBC SQL Server Driver][SQL Server]Statement(s) could not be prepared. (8180)') [....]



And here's the relevant stack trace for the UnicodeDecodeError that thows on a similar (but different) query with a text parameter passed to filter.  I believe the problem is when the database string comes back and fails to be decoded:

[Script ... ]  line 450, in get
    AnnotationTypes.tag_or_inline == self.tag_or_inline).all()
  File "C:\Python27\lib\site-packages\sqlalchemy\orm\query.py", line 2320, in all
    return list(self)
  File "C:\Python27\lib\site-packages\sqlalchemy\orm\loading.py", line 76, in instances
    labels) for row in fetch]
  File "C:\Python27\lib\site-packages\sqlalchemy\orm\query.py", line 3523, in proc
    return row[column]
UnicodeDecodeError: 'utf8' codec can't decode byte 0x93 in position 506: invalid start byte 

Mike Bayer

unread,
Mar 9, 2016, 2:49:35 PM3/9/16
to sqlal...@googlegroups.com
OK what you can try here that might resolve all the issues is to only
pass encoded utf8 to the app (and also receive it on the way back); you
can do that by setting convert_unicode=True at the create_engine() level
where will take effect for all String columns. Technically if you're
storing non-ascii data you'd be using an NVARCHAR column so you wouldn't
get that comparison error. For the result rows coming back, again the
VARCHAR is not giving pyodbc the clue it needs to know that it should
decode, so SQLA's convert_unicode will get you that also.
>> send an email to sqlalchemy+...@googlegroups.com <javascript:>.
>> To post to this group, send email to sqlal...@googlegroups.com
>> <javascript:>.
>> <https://groups.google.com/group/sqlalchemy>.
>> For more options, visit https://groups.google.com/d/optout
>> <https://groups.google.com/d/optout>.
>
> --
> You received this message because you are subscribed to the Google
> Groups "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to sqlalchemy+...@googlegroups.com
> <mailto:sqlalchemy+...@googlegroups.com>.
> To post to this group, send email to sqlal...@googlegroups.com
> <mailto:sqlal...@googlegroups.com>.

Tim Pierson

unread,
Mar 9, 2016, 3:03:12 PM3/9/16
to sqlalchemy
Thanks.  

So that would be a matter of adding convert_unicode=True to the connection string?

. . . ?trusted_connection=yes&charset=utf8&deprecate_large_types=True&convert_unicode=True

Mike Bayer

unread,
Mar 9, 2016, 3:14:21 PM3/9/16
to sqlal...@googlegroups.com
ah no that one looks like this

e = create_engine("mssql+pyodbc://user:pass@dsn", convert_unicode=True,
encoding="utf8")
> > an email to sqlalchemy+...@googlegroups.com <javascript:>
> > <mailto:sqlalchemy+...@googlegroups.com <javascript:>>.
> > To post to this group, send email to sqlal...@googlegroups.com
> <javascript:>
> > <mailto:sqlal...@googlegroups.com <javascript:>>.
> > Visit this group at https://groups.google.com/group/sqlalchemy
> <https://groups.google.com/group/sqlalchemy>.
> > For more options, visit https://groups.google.com/d/optout
> <https://groups.google.com/d/optout>.
>
> --
> You received this message because you are subscribed to the Google
> Groups "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to sqlalchemy+...@googlegroups.com
> <mailto:sqlalchemy+...@googlegroups.com>.
Reply all
Reply to author
Forward
0 new messages