Error with SQL Server and utf-8 encoding

1,074 views
Skip to first unread message

Massi

unread,
Sep 1, 2015, 11:28:26 AM9/1/15
to sqlalchemy
Hi everyone,

I'm trying to manage read and write operations of utf-8 unicode strings with SQL Server (sqlalchemy 0.9.10), but I'm having some problems. I correctly write the strings to the database, but when I read them back and try to convert to unicode I get the following error:

Traceback (most recent call last):
  File "C:\Users\Impara 01\Desktop\t.py", line 18, in <module>
    print unicode(row[0], "utf-8")
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe0 in position 0: invalid continuation byte
Process terminated with an exit code of 1

Here is a sample code showing the problem:

# -*- coding: utf-8 -*-
import sqlalchemy
from sqlalchemy import select, create_engine, MetaData, Table, Column
import datetime

engine = create_engine('mssql+pyodbc://MYHOST\SQLEXPRESS/user?trusted_connection=True&charset=utf8')
metadata = MetaData(engine)
t = Table('test', metadata,
          Column('unicode', sqlalchemy.dialects.mssql.VARCHAR())
)
t.create()
s = "àèìòù"
s = unicode(s, "utf-8")
t.insert().values(unicode=s).execute()
res = select([t.c.unicode]).execute().fetchall()
for i, row in enumerate(res):
    print unicode(row[0], "utf-8")

Can anyone point me out what I'm doing wrong?
Thanks in advance!



Mike Bayer

unread,
Sep 1, 2015, 11:42:08 AM9/1/15
to sqlal...@googlegroups.com
pyodbc and SQL Server are very particular about unicode.  In this case it seems like you are passing a Python unicode literal into Pyodbc, and assuming Pyodbc knows how to handle that, but then on the reception side you're assuming that you're getting a bytestring back, and not again a Python Unicode object.

to do a unicode round trip, use the SQLAlchemy Unicode type, and deal only with Python unicode literals in your script:

t = Table(.... Column('x', sqlalchemy.Unicode()))

s = u'àèìòù'

t.insert().values(unicode=s) ...

for row in res:
   print row[0]

SQLAlchemy will make sure that the value is passed to pyodbc in the expected format, in this case it is likely encoding to utf-8 on the way in and decoding from utf-8 on the way out.





Thanks in advance!



--
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+...@googlegroups.com.
To post to this group, send email to sqlal...@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

Massi

unread,
Sep 2, 2015, 3:52:56 AM9/2/15
to sqlalchemy
It works! Thanks a lot!
Reply all
Reply to author
Forward
0 new messages