Re: [sqlite3-ruby] Numerous issues working with UTF-16LE databases

162 views
Skip to first unread message

Aaron Patterson

unread,
May 20, 2013, 12:08:06 AM5/20/13
to sqlite...@googlegroups.com
On Sun, May 19, 2013 at 7:42 PM, Adrian Irving-Beer <wi...@wisq.net> wrote:
Hi,

So I'm trying to work with a database written by another program. It's encoded in UTF-16LE, and I'm trying to edit data in it in Ruby on both Mac and Linux.  I've been having numerous problems so far.

----

Problem #1: Incorrect encoding on queried results

Using the "sqlite3" program:

SQLite version 3.7.16.2 2013-04-12 11:52:43
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> pragma encoding;
UTF-16le
sqlite> SELECT * FROM ItemTable WHERE key = 'muted';
muted|false

Using sqlite3-ruby:

>> db = SQLite3::Database.new("file__0.localstorage")
=> #<SQLite3::Database:0x000000030906e8>
>> q = db.query("SELECT * FROM ItemTable WHERE key = 'muted'"); key, value = q.next
=> ["\u756D\u6574\x64", "f\x00a\x00l\x00s\x00e\x00"]
>> [key, value].map(&:encoding)
=> [#<Encoding:UTF-16LE>, #<Encoding:ASCII-8BIT>]

It actually has them reversed:

>> key.force_encoding("ASCII-8BIT")
=> "muted"
>> value.force_encoding("UTF-16LE")
=> "false"

It also gets the encoding wrong for values I insert myself, but in a different way:

>> db.execute("INSERT INTO ItemTable VALUES ('insertedKey', 'insertedValue')")
=> []
>> q = db.query("SELECT * FROM ItemTable WHERE key = 'insertedKey'"); key, value = q.next
=> ["\u6E69\u6573\u7472\u6465\u654B\x79", "\u6E69\u6573\u7472\u6465\u6156\u756C\x65"]
>> [key, value].map(&:encoding)
=> [#<Encoding:UTF-16LE>, #<Encoding:UTF-16LE>]
>> key.force_encoding("ASCII-8BIT")
=> "insertedKey"
>> value.force_encoding("ASCII-8BIT")
=> "insertedValue"

But sqlite3 is happy with my inserted stuff:

sqlite> select * from itemtable where key = 'insertedKey';
insertedKey|insertedValue

Note that the above occurs even if I create a DB completely from scratch, using

    SQLite3::Database.new(filename.sqlite".encode("UTF-16LE"))

so it's not just that I have a weird corrupt database or something.  (Also, is that filename encode trick documented anywhere?  I believe I randomly stumbled across it in sample code.)

Can you post the schema for your database, and a test script?  If we can get a script to reproduce the issue, I think we can fix it.

It also seems I'm not the only one to encounter this, but the author of this post just found a satisfactory workaround and went with that: http://stackoverflow.com/questions/8250171/ruby-sqlite3-encoding-problems

----

Problem #2: Binding parameters in queries only accepts UTF-8

>> q = db.query("SELECT * FROM ItemTable WHERE key = ?", ["muted".encode("US-ASCII")]); q.count
=> 0
>> q = db.query("SELECT * FROM ItemTable WHERE key = ?", ["muted".encode("UTF-16LE")]); q.count
=> 0
>> q = db.query("SELECT * FROM ItemTable WHERE key = ?", ["muted".encode("ASCII-8BIT")]); q.count
=> 0
>> q = db.query("SELECT * FROM ItemTable WHERE key = ?", ["muted".encode("UTF-8")]); q.count
=> 1

Discovered this when I tried to do queries in a source file that wasn't marked with an "enconding: utf-8" comment tag at the top.  My strings were US-ASCII by default and I couldn't do any queries with bound params.

The SQLite3 documentation doesn't mention what encoding it supports for bind values.  It seems that it may only support UTF-8.  We have code that transcodes from your source encoding to the database encoding when you bind values here:


Which means that if you open the database as UTF16, we'll transcode to UTF16 (if the string isn't UTF8).  I'm guessing the parameter must always be UTF8 (so we should always transcode to UTF8).

----

So, yeah.  What's up with the current status of sqlite3-ruby UTF-16 support?  Are these just a really unfortunate series of bugs, or is UTF-16 not supported at all and I'm basically on my own?  (I'd be happier if it was at least consistently wrong, but it seems to be wrong in different ways depending on the situation.)

We support it, you're just the first person to report UTF16 related issues.  I can reproduce the second issue, so I can fix it.  Can you give us a test program for the first issue and I can fix that too?

--
Aaron Patterson
http://tenderlovemaking.com/
Reply all
Reply to author
Forward
0 new messages