String encoding in the SQLite DB

8 views
Skip to first unread message

Nate Bargmann

unread,
Jun 28, 2012, 2:07:32 PM6/28/12
to SO2SDR list
I set about to write a Python program to generate dupe lists for Field
Day (and as an exercise to uderstand working with SQLite and Python
better). I found that the string data--call, date, time, exchange data,
etc.--is not stored as UTF8 but as simple ASCII byte blobs. This is
probably an issue as it means any third party tool accessing the DB file
has to make a conversion that is prone to being broken, and what will it
mean for SO2SDR users outside of the 'C' locale?

It would probably be a good idea to ensure that all strings are stored
in UTF8. I see that SQLite offers the user_version PRAGMA:

http://www.sqlite.org/pragma.html#pragma_schema_version

Which can be used to set a version number. The current DB schema can be
determined by this value as it returns '0' on the database files SO2SDR
has created for me recently. Setting it higher would note a revised
schema. Some conversion code could be included for the short term and
scheduled to be removed at a later date/version.

73, de Nate >>

--

"The optimist proclaims that we live in the best of all
possible worlds. The pessimist fears this is true."

Ham radio, Linux, bikes, and more: http://www.n0nb.us

R. Torsten Clay

unread,
Jun 28, 2012, 5:51:07 PM6/28/12
to so2...@googlegroups.com
This would require quite a lot of work to fix. Internally, some of those things
(like callsign) are stored as QByteArrays, which are basically just ascii. I admit
I haven't been very consistent which are QByteArrays and which are QString. These
could all be changed to QString which does support UTF8.

But for example, how do you process a non-ascii UTF8 call when it gets sent to winkey?

Tor

Nate Bargmann

unread,
Jun 28, 2012, 6:14:22 PM6/28/12
to so2...@googlegroups.com
* On 2012 28 Jun 16:51 -0500, R. Torsten Clay wrote:
> This would require quite a lot of work to fix. Internally, some of
> those things (like callsign) are stored as QByteArrays, which are
> basically just ascii. I admit I haven't been very consistent which
> are QByteArrays and which are QString. These could all be changed to
> QString which does support UTF8.
>
> But for example, how do you process a non-ascii UTF8 call when it gets
> sent to winkey?

I'll admit, it's a tricky process, as I hadn't thought about the Morse
aspect at all. Hmmmm.

I suppose that the easiest is to ensure that values promote to QString
when writing to the DB and are read back as QStrings so the DB is UTF8.
Interally, using the QByteArray variables should be fine, no?

Files written to ASCII are automatically UTF8 as I understand it, so the
file exports/imports probably won't need to be changed.

As I recall, amateur radio contest exchanged information is confined to
values represented by ASCII. There may be local stuff that isn't but I
doubt this program will be used in such instances. But if someone does
and gets gibberish out of their keyer, I guess they'll need to invent a
Morse keyer and code that works!

Further down the road I'll hack on this a bit as it relates to SQLite
and see what happens.

Nate Bargmann

unread,
Jul 13, 2012, 3:41:11 PM7/13/12
to so2...@googlegroups.com
I found through examination of a log file that had been edited in SO2SDR
that callsign or exchange data was stored as UTF-8 by the Qt table
editing class. As I saw no issue with edited log data in SO2SDR versus
data stored as QByteArray blobs, I've tested the following patch
locally:

- newqso.setValue(SQL_COL_CALL, QVariant(qso->call));
+ newqso.setValue(SQL_COL_CALL, QVariant(QString::fromAscii(qso->call)));

Doing this for the Sent and Receive exchange fields seems to work as
well.

Looking at the sqlite dump comparing records entered before the applied
patch:

INSERT INTO "log" VALUES(19,'0154',21250400,X'573047434A',4,'06202012',4,X'3141',X'4B53',NULL,NULL,X'3241',X'4B53',NULL,NULL,1,'true');

and after the applied patch:

INSERT INTO "log" VALUES(31,'2024',14000000,'K1YT',3,'07102012',2,'1A','KS',NULL,NULL,'1D','NH',NULL,NULL,2,'true');

if I understand it, the leading X from the SQLite dump indicates a
binary blob, although in this case they are ASCII byte values for the
fields stored as strings.

As this seems to be a non-visible change, I don't think that setting the
PRAGMA user_version is necessary.

I haven't pushed this to my tree on Github yet.

Nate Bargmann

unread,
Jul 13, 2012, 4:56:28 PM7/13/12
to so2...@googlegroups.com
I have pushed the patch into my tree. I may work some of the IARU HF
Championship contest this weekend to see if any problems arise.
Reply all
Reply to author
Forward
0 new messages