SQLAlchemy MYSQL query utf8 character problem


chat...@gmail.com

Sep 3, 2020, 4:55:38 AM
to sqlalchemy

Trying to query all items from a MySQL (charset: utf8) table that has a field containing Chinese and other special characters, I get the error below:

items = session.query(Item).all()

File "/root/.local/share/virtualenvs/server-WesSANjA/lib/python3.8/site-packages/MySQLdb/cursors.py", line 355, in _post_get_result
    self._rows = self._fetch_row(0)
File "/root/.local/share/virtualenvs/server-WesSANjA/lib/python3.8/site-packages/MySQLdb/cursors.py", line 328, in _fetch_row
    return self._result.fetch_row(size, self._fetch_type)
File "/usr/local/lib/python3.8/encodings/cp1252.py", line 15, in decode
    return codecs.charmap_decode(input,errors,decoding_table)

UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 30: character maps to <undefined>
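The decode failure itself can be reproduced without MySQL at all: 0x9d is one of the few bytes that cp1252 leaves unmapped, so any attempt to decode it with that codec raises the same error MySQLdb hit here.

```python
# 0x9d has no character assigned in cp1252, so decoding it raises the
# same UnicodeDecodeError seen in the traceback above.
try:
    b"\x9d".decode("cp1252")
except UnicodeDecodeError as exc:
    print(exc)
```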

Simon King

Sep 3, 2020, 5:16:52 AM
to sqlal...@googlegroups.com
You mention utf8, but the error suggests that the data is being
decoded as cp1252. Are you declaring an explicit charset when you
create your engine, as suggested here:

https://docs.sqlalchemy.org/en/13/dialects/mysql.html#unicode
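For reference, a connection URL with an explicit charset might look like the sketch below. The host, user, password and database name are placeholders, and utf8mb4 is generally preferred over utf8 on MySQL since it covers the full Unicode range.

```python
# Hypothetical connection string; only the ?charset= parameter matters here.
# It tells the MySQLdb driver to decode result rows as UTF-8 instead of
# whatever the client OS defaults to (cp1252 in the reported traceback).
DB_URL = "mysql+mysqldb://user:password@localhost/dbname?charset=utf8mb4"

# from sqlalchemy import create_engine
# engine = create_engine(DB_URL)  # requires sqlalchemy + mysqlclient
```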

What does this output:

for row in dbsession.execute("show variables like 'character%'").fetchall():
    print(row)

Warning: I've run an application for a long time where I didn't
specify the charset in the connection string. SQLAlchemy defaulted to
encoding strings as utf8 (because the dialect didn't support unicode
strings). However, my output from the above command looked something
like this:

('character_set_client', 'latin1')
('character_set_connection', 'latin1')
('character_set_database', 'utf8')
('character_set_filesystem', 'binary')
('character_set_results', 'latin1')
('character_set_server', 'latin1')
('character_set_system', 'utf8')
('character_sets_dir', '/usr/share/mysql/charsets/')

This meant that SQLAlchemy was sending utf-8 strings, but the database
was interpreting them as latin1. To make things worse, some of my
tables have a default charset of latin1, and the others are utf8. The
result is that the tables that are declared to hold latin1 data
actually hold utf8, and the tables that are declared to hold utf8
actually hold double-encoded utf8.
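As a sketch of what that mis-encoding looks like (an illustration, not a repair procedure from the thread): text written as UTF-8 bytes but read back through latin1 can be recovered by reversing the latin1 step, because latin1 maps every byte value to a character and back losslessly.

```python
# Double-encoded UTF-8: the app wrote UTF-8 bytes, but the connection
# decoded them as latin1. Re-encoding as latin1 restores the original
# bytes, which can then be decoded correctly as UTF-8.
original = "你好"
garbled = original.encode("utf-8").decode("latin1")   # mojibake the app sees
fixed = garbled.encode("latin1").decode("utf-8")      # undo the bad decode
assert fixed == original
```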

Simon