On Feb 6, 2014, at 6:59 PM, Erich Blume <
blume...@gmail.com> wrote:
> Hmm, this one has me stumped. As best I can tell after poking at it using the column_reflect event, a custom dialect, etc. - the issue here is that in pysqlite.py we (in my Python 3.3 install) are selecting `sqlite3.dbapi2` as the dbapi interface, but we aren't telling sqlite3 anything about how to treat unicode errors. From what I am reading (but it seems inconsistent, maybe?) sqlite3 automatically decodes all database retrieved values from their bytes for text fields, returning unicode strings. Except... that doesn't always seem to be true. I hex-edited a db file to change the utf-8 string "hello" to "hell" + 0x92 and sqlite3 switched from returning "hello" to b"hell\x92", or something like that - I've been poking at this for so long I've lost track of that transcript.
>
> One can override sqlite3's text factory, apparently, with (for instance) `sqlite3.text_factory = lambda x: x.decode('utf-8', errors='ignore')`. Maybe the key is to try and find a way to trigger that from sqlalchemy? I tried and failed, maybe someone else can point me back to the path?
>
> Just to re-summarize the problem: In python 3, I'm getting errors trying to read a row from a sqlite database that has a TEXT column with an invalid utf-8 sequence (specifically, the singleton bye '0x92'). I'd love to just have sqlalchemy move along and ignore the byte, but I'm not clear how to do that.
Pysqlite (e.g. sqlite3 module) returns TEXT as Python unicode out of the gate. That exception message is being raised by sqlite3 itself, SQLAlchemy is just a pass-through, as the string type knows on sqlite that the value is already unicode.
you might need to CAST() the value as BINARY perhaps, not sure. You’d first want to get a plain sqlite3 script to do what you want. Setting a “text_factory” at the module level of sqlite3 is certainly easy enough but that seems way too broad. Ideally you’d want to be able to get the value on a per-column basis.