UnicodeDecodeError during reflection

Dirk Makowski

unread,

Nov 29, 2011, 9:51:37 PM11/29/11

to sqlal...@googlegroups.com

Hello,

using SQLAlchemy 0.7.3's Inspector, I'm reflecting the columns of a table, and it fails with this trace:

File "./main.py", line 106, in <module>

dbmgr.reflect()

File "./main.py", line 32, in reflect

self.fetch_tables(schema='eg')

File "./main.py", line 58, in fetch_tables

cols0 = self.inspector.get_columns(t, schema=schema)

File "/home/dm/myprojects/Runcible-env/lib/python2.7/site-packages/SQLAlchemy-0.7.3-py2.7-linux-x86_64.egg/sqlalchemy/engine/reflection.py", line 230, in get_columns

**kw)

File "<string>", line 1, in <lambda>

File "/home/dm/myprojects/Runcible-env/lib/python2.7/site-packages/SQLAlchemy-0.7.3-py2.7-linux-x86_64.egg/sqlalchemy/engine/reflection.py", line 46, in cache

ret = fn(self, con, *args, **kw)

File "/home/dm/myprojects/Runcible-env/lib/python2.7/site-packages/SQLAlchemy-0.7.3-py2.7-linux-x86_64.egg/sqlalchemy/dialects/postgresql/base.py", line 1191, in get_columns

rows = c.fetchall()

File "/home/dm/myprojects/Runcible-env/lib/python2.7/site-packages/SQLAlchemy-0.7.3-py2.7-linux-x86_64.egg/sqlalchemy/engine/base.py", line 2985, in fetchall

l = self.process_rows(self._fetchall_impl())

File "/home/dm/myprojects/Runcible-env/lib/python2.7/site-packages/SQLAlchemy-0.7.3-py2.7-linux-x86_64.egg/sqlalchemy/engine/base.py", line 2952, in _fetchall_impl

return self.cursor.fetchall()

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 3: ordinal not in range(128)

I have pinned down the cause to be a column's default value which contains a "ü" character (u umlaut). The query SA emits to get the columns should give for that column:

-----------------+-----------------------------+------------------------------------------------+------------+--------+-----------

min_amount_unit | character varying(16) | 'Stück'::character varying | t | 10 | 43091

Running the query from psql prompt, or from a simple script that uses psycopg2 directly, gives no error. So the problem is SA-related.

(The database's encoding is SQLASCII.)

Is there a fix or workaround for this?

Thanks in advance,

- Dirk

Michael Bayer

unread,

Nov 29, 2011, 10:01:10 PM11/29/11

to sqlal...@googlegroups.com

You'd need to ensure your client charset is utf-8. This would be in your postgresql.conf file. That is also psycopg2 raising the error in the stack trace - the difference is SQLAlchemy uses psycopg2's psycopg2.extensions.UNICODE extension to coerce multibyte characters into Python unicode objects, and if you added this into your psycopg2-only script you'd probably get the same error. Passing use_native_unicode=False to create_engine() would disable the usage of this extension, in which case SQLAlchemy would by default use the utf-8 codec to decode the incoming row.

Dirk Makowski

unread,

Nov 29, 2011, 10:39:00 PM11/29/11

to sqlal...@googlegroups.com

Thanks for the quick reply. Passing use_native_unicode=False to create_engine() indeed prevents this error. Is this parameter missing in the docs for create_engine() (http://www.sqlalchemy.org/docs/core/engines.html#engine-creation-api)?

I had assumed the "encoding" parameter to set the wanted client-encoding, but changing it had no effect on the decode error. What is the purpose of this parameter, then?

The psycopg2 script detects the database encoding as "SQLASCII" and prints out "'St\xc3\xbcck'::character varying".

After adding

dbapi.extensions.register_type(dbapi.extensions.UNICODE, c)

indeed the decode error happens.

However, if I explicitly do set_client_encoding('utf8'), then it prints without an error u"'St\xfcck'::character varying"

I had expected the "encoding" parameter of create_engine() to have the same purpose as set_client_encoding() of psycopg2. Did I misunderstand sth.?

-- Dirk

Michael Bayer

unread,

Nov 29, 2011, 10:52:38 PM11/29/11

to sqlal...@googlegroups.com

On Nov 29, 2011, at 10:39 PM, Dirk Makowski wrote:

Thanks for the quick reply. Passing use_native_unicode=False to create_engine() indeed prevents this error. Is this parameter missing in the docs for create_engine() (http://www.sqlalchemy.org/docs/core/engines.html#engine-creation-api)?

you need to look at the docs specific to the DBAPI in use (we support like 20): http://www.sqlalchemy.org/docs/dialects/postgresql.html#module-sqlalchemy.dialects.postgresql.psycopg2

I had assumed the "encoding" parameter to set the wanted client-encoding, but changing it had no effect on the decode error. What is the purpose of this parameter, then?

"encoding" as passed to create_engine() only affects those encode/decode operations which SQLAlchemy performs itself; when a DBAPI does not accept Python unicode objects, or does not return them, SQLAlchemy uses this as the charset to coerce to/from Python unicode. Even DBAPIs that do "support" unicode still often expose areas where SQLA has to do things by hand; bound parameter names, bound parameter values, result set values, the string statement itself, the string names within cursor.description (many DBAPIs especially miss that last one).

As we all move to Python 3 and those DBAPIs will at last all be forced to fully support unicode natively (which has already been happening for several years), this parameter will likely no longer be needed.

The psycopg2 script detects the database encoding as "SQLASCII" and prints out "'St\xc3\xbcck'::character varying".
After adding
dbapi.extensions.register_type(dbapi.extensions.UNICODE, c)
indeed the decode error happens.
However, if I explicitly do set_client_encoding('utf8'), then it prints without an error u"'St\xfcck'::character varying"

I had expected the "encoding" parameter of create_engine() to have the same purpose as set_client_encoding() of psycopg2. Did I misunderstand sth.?

Yeah funny story here is that I forgot all about that setting. We added support for that in 0.7.3 and that is the "client_encoding" parameter on create_engine(), documented here: http://www.sqlalchemy.org/docs/dialects/postgresql.html#client-encoding - however quite disastrously that paragraph supercedes/contradicts the "unicode" paragraph further up, so I'm going to merge those two together right now.

Dirk Makowski

unread,

Nov 29, 2011, 11:18:33 PM11/29/11

to sqlal...@googlegroups.com

I see. Thanks for the explanation. I totally forgot to read the dialect specific docs... ;)

Maybe it would be nice to have the general docs point out that specialities for a certain subject like encodings are also handled by some dialects. E.g. for create_engine(): "Some dialects may have additional parameters concerning the client encoding. Please see..."

-- Dirk

Dirk Makowski

unread,

Nov 29, 2011, 11:25:01 PM11/29/11

to sqlal...@googlegroups.com

P.S. for completeness:

As you had suggested,

create_engine(encoding='utf-8', convert_unicode=False, client_encoding='utf8')

runs perfectly without "use_native_unicode".

-- Dirk

Dirk Makowski

unread,

Nov 29, 2011, 11:35:59 PM11/29/11

to sqlal...@googlegroups.com

Ah, there already is such a paragraph at create_engine(). Please ignore my suggestion :)

Michael Bayer

unread,

Nov 29, 2011, 11:41:03 PM11/29/11

to sqlal...@googlegroups.com

On Nov 29, 2011, at 11:18 PM, Dirk Makowski wrote:

I see. Thanks for the explanation. I totally forgot to read the dialect specific docs... ;)

Maybe it would be nice to have the general docs point out that specialities for a certain subject like encodings are also handled by some dialects. E.g. for create_engine(): "Some dialects may have additional parameters concerning the client encoding. Please see..."

the point is there just more generically and not prominently enough, near that particular section:

**kwargs takes a wide variety of options which are routed towards their appropriate components. Arguments may be specific to the Engine, the underlying Dialect, as well as the Pool. Specific dialects also accept keyword arguments that are unique to that dialect. Here, we describe the parameters that are common to most create_engine() usage.

For "encoding" it should be mentioned and it's also unfortunate that the Unicode doclink is broken there. The best description of the mechanism is then under the convert_unicode flag under the String type: http://www.sqlalchemy.org/docs/core/types.html?highlight=unicode#sqlalchemy.types.String.__init__ ...but even there there's broken formatting...ugh. But perhaps cleaning up that text and each of these points need to filter down to "convert_unicode", since that's really the main place "encoding" has any relevance.

Here's the new section (do a shift-reload): http://www.sqlalchemy.org/docs/dialects/postgresql.html#unicode

-- Dirk

--
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To view this discussion on the web visit https://groups.google.com/d/msg/sqlalchemy/-/pD17Lb4TaFsJ.
To post to this group, send email to sqlal...@googlegroups.com.
To unsubscribe from this group, send email to sqlalchemy+...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en.

Reply all

Reply to author

Forward