Unable to read text from SQLite text column

578 views
Skip to first unread message

Nate Bargmann

unread,
Jun 28, 2012, 10:57:51 AM6/28/12
to python...@googlegroups.com
Perhaps I am missing something obvious.

I am trying to read columns from an SQLite database created by a Qt C++ application.  The SQLite column affinity is set to text but the suggested conversion techniques I have found through Web searches have still left me with the dreaded (<read-write buffer ptr 0xac66c70, size 5 at 0xac66c50>,) result.  Trying to convert using the str() function has resulted in the same output.  I am using Python 2.7.3rc2 and APSW 3.7.6.3-r1 on Debian Testing.

Here is example output starting with the sqlite command line utility:

The schema:

sqlite> .sch log
CREATE TABLE log (nr int primary key,time text,freq int,call text,band int,date text,mode int,snt1 text,snt2 text,snt3 text,snt4 text,rcv1 text,rcv2 text,rcv3 text,rcv4 text,pts int,valid int);

and the column output:

sqlite> select * from log where band = 2 and mode = 2;
6|1611|7020000|K1SJW|2|06152012|2|1A|KS|||2B|CT|||2|true
7|1614|7020000|N4JWK|2|06152012|2|1A|KS|||1B|NFL|||2|true
8|1614|7020000|KG5ELA|2|06152012|2|1A|KS|||1C|WTX|||2|true
9|1616|7022390|KB5KWQ|2|06152012|2|1A|KS|||1A|AR|||2|true

Opening the same database in Python:

>>> import apsw
>>> connection = apsw.Connection('fd043012.log')
>>> cursor = connection.cursor()
>>> for call in cursor.execute("SELECT call FROM log WHERE band = ? AND mode = ?", (2, 2)):
    print call

   
(<read-write buffer ptr 0xac66c70, size 5 at 0xac66c50>,)
(<read-write buffer ptr 0xac66ec8, size 5 at 0xac66ea8>,)
(<read-write buffer ptr 0xac66fe0, size 6 at 0xac66fc0>,)
(<read-write buffer ptr 0xac66fb8, size 6 at 0xac66f98>,)

Wrapping the returned column in str():

>>> for call in cursor.execute("SELECT call FROM log WHERE band = ? AND mode = ?", (2, 2)):
    print str(call)

   
(<read-write buffer ptr 0xac88108, size 5 at 0xac880e8>,)
(<read-write buffer ptr 0xac66f90, size 5 at 0xac66f70>,)
(<read-write buffer ptr 0xac66fb8, size 6 at 0xac66f98>,)
(<read-write buffer ptr 0xac66f90, size 6 at 0xac66f70>,)

It appears as though the data is encoded in such a way that Python still has no idea what to do with it.  I'm not sure what to try next as slicing, which I saw suggested returns the same result:

>>> for call in cursor.execute("SELECT call FROM log WHERE band = ? AND mode = ?", (2, 2)):
    print call[:]

   
(<read-write buffer ptr 0xac88130, size 5 at 0xac88110>,)
(<read-write buffer ptr 0xac66f90, size 5 at 0xac66f70>,)
(<read-write buffer ptr 0xac66fe0, size 6 at 0xac66fc0>,)
(<read-write buffer ptr 0xac66f90, size 6 at 0xac66f70>,)

Any help is appreciated.

- Nate

Nate Bargmann

unread,
Jun 28, 2012, 1:04:35 PM6/28/12
to python...@googlegroups.com
For the record,  I discovered the answer almost by accident.  I needed to encode the returned bytes as UTF8:

str(call[0]).encode('utf8')

The key was that the Web example was encoding in 'hex' and testing with that encoding showed me the byte values making up the string.  Everything is working and I'm off and running again.

- Nate

Roger Binns

unread,
Jun 28, 2012, 1:19:56 PM6/28/12
to python...@googlegroups.com
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 28/06/12 07:57, Nate Bargmann wrote:
> The SQLite column affinity is set to text but the suggested conversion
> techniques I have found through Web searches have still left me with
> the dreaded (<read-write buffer ptr 0xac66c70, size 5 at 0xac66c50>,)
> result.

You are getting a blob back, not text. Even if the affinity is set to
text, there is no way that SQLite can perform affinity to convert a
blob(bytes) to text. The reason is quite simple: it is impossible to
convert bytes to text without knowing the encoding. The C++ app is very
broken.

And even if you wanted the affinity to do the conversion as say UTF8 it is
still impossible for it to know that for example something really was text
versus say an image.

> Here is example output starting with the sqlite command line utility:

The SQLite shell has a *very* nasty bug. For any values that are blobs,
it just outputs the bytes as is. If they happen to be text then it looks
fine. Anyway there is a command line tool included in APSW and it does
colouring, completion and whole bunch of other stuff.

http://apidoc.apsw.googlecode.com/hg/shell.html

Here I'll show you the difference:

SQLite version 3.7.13 (APSW 3.7.13-r1)
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> create table foo(col1,col2);
sqlite> insert into foo values('AB', X'4142');
sqlite> select * from foo;
AB|<Binary data>

SQLite version 3.7.4
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> create table foo(col1,col2);
sqlite> insert into foo values('AB', X'4142');
sqlite> select * from foo;
AB|AB

If you use .dump on the tables you can see what is going on, or use typeof:

sqlite> select typeof(col1), col1, typeof(col2), col2 from foo;
text|AB|blob|AB


> It appears as though the data is encoded in such a way that Python
> still has no idea what to do with it.

Python knows exactly what to do with it. It was added to the database
using sqlite3_bind_blob - ie the QT application explicitly said the data
was a bucket of bytes and did not use sqlite3_bind_text. So APSW and
Python treat your bucket of bytes as a bucket of bytes. This is
absolutely the correct thing to do.

Read this if you disagree in any way:

http://www.joelonsoftware.com/articles/Unicode.html

The correct thing to do is fix the C++ app. The SQLite internals are
somewhat ambivalent about the differences between strings and blobs and
many of the string functions (eg lower and length) treat them the same,
but others won't (eg equality).

Roger
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)

iEYEARECAAYFAk/skjwACgkQmOOfHg372QRy1ACg4UCx2ZEWppzrW1/8ohz7r4E+
XuEAnis7/EGtV1Pfm8Bs8lIg2p5IgFXe
=A+tY
-----END PGP SIGNATURE-----

Roger Binns

unread,
Jun 28, 2012, 1:22:23 PM6/28/12
to python...@googlegroups.com
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 28/06/12 10:04, Nate Bargmann wrote:
> Everything is working and I'm off and running again.

Nope, you've just swept a major issue under the rug. If you want data
treated as text then add it as text. Your workaround is broken in various
ways (eg you won't be able to use it on Python 3). You'll find yourself
getting bitten again in different ways.

Roger
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)

iEYEARECAAYFAk/sks8ACgkQmOOfHg372QTRKgCgsABbfP6mIXc8JiIl+DySMlPQ
4WgAoOW9gKfn+wdK35YaL0KFa5DpYNXQ
=9q9R
-----END PGP SIGNATURE-----

Nate Bargmann

unread,
Jun 28, 2012, 1:49:37 PM6/28/12
to python...@googlegroups.com
On Thursday, June 28, 2012 12:19:56 PM UTC-5, Roger Binns wrote:

On 28/06/12 07:57, Nate Bargmann wrote:
> The SQLite column affinity is set to text but the suggested conversion
> techniques I have found through Web searches have still left me with
> the dreaded (<read-write buffer ptr 0xac66c70, size 5 at 0xac66c50>,)
> result.

You are getting a blob back, not text.  Even if the affinity is set to
text, there is no way that SQLite can perform affinity to convert a
blob(bytes) to text.  The reason is quite simple:  it is impossible to
convert bytes to text without knowing the encoding.  The C++ app is very
broken.

I don't disagree.  I think the data being stored could become locale dependent, in fact will likely need to be locale dependent and not simple ASCII.
Thanks for the background insight.  Perhaps some of this could be included in the APSW docs, perhaps in the Tips section?  If it already is, I likely missed it.
 
> It appears as though the data is encoded in such a way that Python
> still has no idea what to do with it.

Python knows exactly what to do with it.  It was added to the database
using sqlite3_bind_blob - ie the QT application explicitly said the data
was a bucket of bytes and did not use sqlite3_bind_text.  So APSW and
Python treat your bucket of bytes as a bucket of bytes.  This is
absolutely the correct thing to do.

Read this if you disagree in any way:

  http://www.joelonsoftware.com/articles/Unicode.html

I have no disagreement, I was just trying to figure out how I might encode the bytes to get the string I was expecting.  

The correct thing to do is fix the C++ app.  The SQLite internals are
somewhat ambivalent about the differences between strings and blobs and
many of the string functions (eg lower and length) treat them the same,
but others won't (eg equality).

That is worthy of consideration.  To be fair, I doubt the author of the C++ app expected the database to be accessed outside of the C++ program, but one can expect that someone will want to do so sooner or later.  I will propose that the C++ app specifically save its strings to the DB in Unicode.  Right now I think it simply stores a QByteArray as a QString conversion, or something like that.
 
- Nate

Nate Bargmann

unread,
Jun 28, 2012, 1:51:03 PM6/28/12
to python...@googlegroups.com

On Thursday, June 28, 2012 12:22:23 PM UTC-5, Roger Binns wrote:

On 28/06/12 10:04, Nate Bargmann wrote:
> Everything is working and I'm off and running again.

Nope, you've just swept a major issue under the rug.  If you want data
treated as text then add it as text.  Your workaround is broken in various
ways (eg you won't be able to use it on Python 3).  You'll find yourself
getting bitten again in different ways.

As the data is written by an outside application the fix will need to occur there.  In fact, I am opening the DB as read only as I don't want or need to modify the DB from the Python script.  As the C++ app's use of SQLite is young, I think this can be changed with minimal pain now.

- Nate

Roger Binns

unread,
Jun 28, 2012, 3:03:20 PM6/28/12
to python...@googlegroups.com
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 28/06/12 10:49, Nate Bargmann wrote:
> I think the data being stored could become locale dependent, in fact
> will likely need to be locale dependent and not simple ASCII.

You are mixing up several things there. The issue you are experiencing is
to do with encodings not locales. SQLite only stores text as Unicode:

http://apidoc.apsw.googlecode.com/hg/tips.html#unicode

If you use SQLite text then you are using Unicode. In some locations they
don't like Unicode that much. For example some Japanese prefer the native
Shift-JIS over Unicode. If you wanted to store shift-jis then you would
need to store it as a blob and ensure that all handling of the bytes knows
that is the encoding. The reason for disliking unicode is in this
article: http://en.wikipedia.org/wiki/Han_unification

Separately locales matter too. For example they can affect rendering (the
same codepoints being drawn differently), equality, case conversion and
sorting. The usual approach taken is to pick up the locale and apply it
to the whole session. You can also apply it to SQLite tables if you use
the ICU extension:

http://apidoc.apsw.googlecode.com/hg/extensions.html#icu

Concrete examples are that Turkish has four different letter i's. Calling
upper/lower for a Turkish locale will give different results than most
other locales. Each locale also has one or more sorting rules. For
example German has different sorting depending on if the data is for a
dictionary or a telephone listing. (Then to make life fun what should
happen when a German user has Swedish city names in the list - do you use
German or Swedish rules?)

You could store a locale per string but that is unlikely to be too
helpful. This report gives you an idea of the complexity of locale
sorting issues:

http://unicode.org/reports/tr10/

> Thanks for the background insight. Perhaps some of this could be
> included in the APSW docs, perhaps in the Tips section? If it already
> is, I likely missed it.

APSW doesn't let you mess up. There is no way you can confuse it over
text versus bytes. The problem is when other developers in other
languages put garbage into the database. I'll update this section to give
some more info:

http://apidoc.apsw.googlecode.com/hg/tips.html#unicode

> To be fair, I doubt the author of the C++ app expected the database to
> be accessed outside of the C++ program,

And presumably they never intended to actually use the database for string
based queries either?

> Right now I think it simply stores a QByteArray as a QString
> conversion, or something like that.

Make sure to give them the Joel article. Have them repeat that bytes are
different than characters and you can't convert unless you know the
encoding. Don't let them write any more code until they really get it!

It is quite possible that when their app is run in Japan it uses shift-jis
which would break your Python code.

Roger
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)

iEYEARECAAYFAk/sqngACgkQmOOfHg372QTmXACfdWzYsymG3L+NarMpXC5GqcRL
OswAn2TzTThYY9sQnKhjYqPO/jd7nKdc
=CRvm
-----END PGP SIGNATURE-----
Reply all
Reply to author
Forward
0 new messages