Unicode UTF-8 workarounds

Edward Jahn

Nov 16, 2009, 10:01:43 AM
to ope...@googlegroups.com
This is a contribution to a discussion that has come and gone a few times in this group – Unicode support.

I have been working with Unicode UTF-8, and found some things that might be useful for those who want to use it. This applies to Windows, which is probably the most difficult environment, but the one that I am currently forced to work with.

I have not been able to get UTF-8 to display correctly in QM Console on Windows. The problem is that Windows somehow hard-codes the display font as Lucida Console, in a way that prevents the font from working properly, even when Unicode is turned on within the font. I don’t use Unix or AccuTerm, but I like to think that neither of them would have such an absurd limitation.

However – UTF-8 will store correctly in QM files. As has been pointed out before, it does not conflict with any of the mark characters. QM’s READ, WRITE, READSEQ, and WRITESEQ statements all handle it correctly as far as internal storage is concerned.
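
The absence of conflicts can be checked directly. The sketch below is in Python rather than QMBasic, and it assumes the usual Pick-style mark values (text mark 251, subvalue mark 252, value mark 253, field mark 254, item mark 255), which are not stated in this thread. The function names are made up for illustration.

```python
# Assumed mark character values (Pick convention, not taken from the thread):
# text mark 251, subvalue mark 252, value mark 253, field mark 254, item mark 255.
MARK_BYTES = frozenset(range(251, 256))


def utf8_clashes_with_marks(text: str) -> bool:
    """Return True if the UTF-8 encoding of text contains any mark byte."""
    return any(b in MARK_BYTES for b in text.encode("utf-8"))


def max_utf8_byte() -> int:
    """Return the highest byte value used when encoding any Unicode scalar value."""
    highest = 0
    for cp in range(0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates cannot be encoded as UTF-8
        highest = max(highest, max(chr(cp).encode("utf-8")))
    return highest


if __name__ == "__main__":
    print(utf8_clashes_with_marks("Zoë, København, 東京"))
    print(hex(max_utf8_byte()))
```

The exhaustive loop takes a second or so. Every byte it finds lies below 251, so under the assumed mark values a UTF-8 string can never contain a byte that QM would read as a mark.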

UTF-8 data written from QM into a file in a suitable format will give a correct display when viewing that file in suitable software. I have done this for two formats – Notepad, and Excel 2003 XML – both of which are read not just by Microsoft Office, but by Open Office, and probably other packages as well.

The secret to the Notepad solution is to put the three-byte string char(239):char(187):char(191) at the beginning of the file. These bytes (hex EF BB BF) are the UTF-8 byte order mark, which Notepad uses to recognise the file as UTF-8. Nothing else is needed. A text file can be switched back and forth between UTF-8 and ANSI mode just by inserting and deleting this string.
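
As a rough illustration of the prefix trick, here is a small Python sketch (not the attached UTF8.LIST program, and not QMBasic). The helper names `add_bom` and `strip_bom` are hypothetical.

```python
# The three bytes from the post: char(239):char(187):char(191).
UTF8_BOM = bytes([239, 187, 191])


def add_bom(data: bytes) -> bytes:
    """Prefix the marker so Notepad treats the file as UTF-8 (no-op if present)."""
    return data if data.startswith(UTF8_BOM) else UTF8_BOM + data


def strip_bom(data: bytes) -> bytes:
    """Remove the marker, leaving the payload bytes untouched."""
    return data[len(UTF8_BOM):] if data.startswith(UTF8_BOM) else data


if __name__ == "__main__":
    payload = "Zoë lives in København\r\n".encode("utf-8")
    with open("report.txt", "wb") as f:  # hypothetical output file name
        f.write(add_bom(payload))
```

Only the first three bytes change; the payload is identical with or without the prefix. Python's own "utf-8-sig" codec writes and strips the same prefix, so it can stand in for these helpers.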

The attached program, UTF8.LIST, takes a LIST statement and puts it out in the Notepad format. Martin could, if there were enough market demand, pretty easily add an option to LIST that would do the same thing.


Excel 2003 XML is a little harder. It needs XML elements to be generated, first in a startup sequence, then in a series of <Row>…<Cell>…</Cell>…</Row> elements containing the data, then a final sequence. The startup sequence needs to contain the total number of rows and columns. But most of the startup and final sequences can be stored in a file and re-used without change. The attached program, X03.LIST, does all this. It uses stored XML sequences SS.XML.BEGINNING and SS.XML.ENDING, that are also attached.
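
For readers without the attachments, the following Python sketch shows the general shape of such a file. It is not X03.LIST, and the HEADER and FOOTER strings are a minimal guess at what SS.XML.BEGINNING and SS.XML.ENDING might hold, not their actual contents. The element and attribute names come from the Excel 2003 XML Spreadsheet format; `build_workbook` is a made-up name, and every cell is written as a string for simplicity.

```python
from xml.sax.saxutils import escape

# Startup sequence: the row and column totals go on the Table element.
HEADER = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<?mso-application progid="Excel.Sheet"?>\n'
    '<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"\n'
    ' xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">\n'
    ' <Worksheet ss:Name="{sheet}">\n'
    '  <Table ss:ExpandedColumnCount="{cols}" ss:ExpandedRowCount="{rows}">\n'
)
# Final sequence: fixed text that can be reused unchanged.
FOOTER = "  </Table>\n </Worksheet>\n</Workbook>\n"


def build_workbook(rows, sheet="Sheet1"):
    """Return an Excel 2003 XML document for a list of rows of values."""
    n_rows = len(rows)
    n_cols = max((len(r) for r in rows), default=0)
    parts = [
        HEADER.format(
            sheet=escape(sheet, {'"': "&quot;"}), cols=n_cols, rows=n_rows
        )
    ]
    for row in rows:
        cells = "".join(
            '<Cell><Data ss:Type="String">{}</Data></Cell>'.format(escape(str(v)))
            for v in row
        )
        parts.append("   <Row>{}</Row>\n".format(cells))
    parts.append(FOOTER)
    return "".join(parts)


if __name__ == "__main__":
    doc = build_workbook([["Name", "City"], ["Zoë", "København & Malmö"]])
    with open("report.xml", "w", encoding="utf-8") as f:  # hypothetical file name
        f.write(doc)
```

The data must be XML-escaped (`&`, `<`, `>`) before it goes into a cell, which is easy to overlook when the source is a LIST statement.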


Excel 2003 XML is good not just for supporting UTF-8 but for producing spreadsheets containing all kinds of data. So maybe there would be more market demand for an X03 option in LIST.

None of this handles the issues involved with sorting, display of dates, and other such woes that internationalization is prey to. But for people like myself, who only need to store and report, it is unexpectedly easy and quite satisfactory.


UTF8.LIST.P
X03.LIST.P
SS.XML.BEGINNING
SS.XML.ENDING

Martin Phillips

Nov 16, 2009, 10:40:10 AM
to ope...@googlegroups.com
Hi Ed,

Extended character set support remains one of those highly desirable items
that has yet to find its way to the top of the wish list. Whether we do
UTF-8, Unicode or some other variant is open to discussion. There are strong
arguments for a fixed size character in memory but equally strong arguments
for variable length characters on disk.

QM Console uses a Windows Console (hence the name). This has a number of
limitations imposed by Windows, one of which is that code pages (needed by
some users) only work with the Lucida Console font. The largely defunct
QMTerm emulator is effectively the same as QM Console but with a network in
the middle.

As you say, UTF-8 does store correctly and, conveniently, has no clashes
with the mark characters. This is one of its strong appeals as the way to
implement extended character sets. The downside is that variable length
characters make operations such as substring extraction more complex.
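
To make the substring point concrete, here is a small Python sketch (an illustration only, not anything from QM). It extracts characters from UTF-8 bytes by counting lead bytes, which is the extra work a fixed-size character would avoid. The function name `utf8_substring` is hypothetical and indexes are 0-based.

```python
def utf8_substring(data: bytes, start: int, length: int) -> bytes:
    """Return `length` characters from character index `start` of UTF-8 bytes."""
    char_index = -1
    begin = end = len(data)
    for offset, b in enumerate(data):
        # Continuation bytes look like 10xxxxxx; anything else starts a character.
        if (b & 0xC0) != 0x80:
            char_index += 1
            if char_index == start:
                begin = offset
            if char_index == start + length:
                end = offset
                break
    return data[begin:end] if begin <= end else b""


if __name__ == "__main__":
    data = "naïve café".encode("utf-8")  # 10 characters, 12 bytes
    print(utf8_substring(data, 2, 3).decode("utf-8"))
    # A plain byte slice can cut a character in half:
    print(data[:3])
```

With one byte per character, the same extraction is a simple slice. With UTF-8, the string has to be scanned from the front to find where each character begins.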

When we finally pluck up the courage to do extended character set support,
the impact is far greater than just being able to display "foreign"
characters. As you suggest, there are associated changes to sorting, case
conversion, etc.

Your sample programs are much appreciated and we will look at these to see
what we can usefully do to improve the existing character support. The long
awaited query processor rewrite is well under way and will include the
ability to add user written back-ends to render data in ways that we do not
support natively. This may make it easy to generate UTF-8 (etc) directly
from the query processor.


Martin Phillips
Ladybridge Systems Ltd
17b Coldstream Lane, Hardingstone, Northampton, NN4 6DB
+44-(0)1604-709200

Tony G

Nov 16, 2009, 2:24:50 PM
to Ope...@googlegroups.com
> From: Edward Jahn
> Excel 2003 XML is good not just for supporting UTF-8 but for
> producing spreadsheets containing all kinds of data. So maybe
> there would be more market demand for an X03 option in LIST.

You may be interested in NebulaXLite which builds XML for
spreadsheets _exactly_ as you have described. However, that
format is for more than just definition of rows and columns: It
facilitates borders, colors, cell shading, fonts, merging of
cells, width and height settings, comments, data type details,
and lots of other features that people expect in a rich
spreadsheet (Excel, OpenOffice, or GoogleDocs).

Download the version for QM and docs now:
remove.pleaseNebula-RnD.com/products/xlite.htm

NebulaXLite is completely free for developers, no strings
attached. It's only $200 for an end-user server when you start
creating production reports. There are no support fees,
connectivity components, upsells, or other extras. We encourage
developers to embed NebulaXLite in their solutions, and some have
done so as their new "Export to Excel" enhancements.

Thanks for your time.

Tony Gravagno
Nebula Research and Development
TG@ remove.pleaseNebula-RnD.com
Nebula R&D sells mv.NET and other Pick/MultiValue products
worldwide, and provides related development services
remove.pleaseNebula-RnD.com/blog
Visit PickWiki.com! Contribute!
NEW: http://Twitter.com/TonyGravagno

sdavmor

Nov 16, 2009, 3:16:25 PM
to ope...@googlegroups.com
Tony G wrote:
>> From: Edward Jahn Excel 2003 XML is good not just for supporting
>> UTF-8 but for producing spreadsheets containing all kinds of
>> data. So maybe there would be more market demand for an X03
>> option in LIST.
>
> You may be interested in NebulaXLite which builds XML for
> spreadsheets _exactly_ as you have described.

[snip]

> Download the version for QM and docs now:
> remove.pleaseNebula-RnD.com/products/xlite.htm
>
> NebulaXLite is completely free for developers, no strings
> attached.

[snip]

I'm going to have to check this out, Tony.
--
Cheers, SDM -- a 21st Century Schizoid Man
Systems Theory music project: <www.systemstheory.net>
on MySpace: <www.myspace.com/systemstheory>
on GarageBand: <www.garageband.com/systemstheory>
on Last FM: <www.last.fm/music/Systems+Theory>
on CDBaby: <www.cdbaby.com/all/systemstheory>
get "Codetalkers" *free* at <www.mikedickson.org.uk/codetalkers>
NP: Greg MAov "Gecko Highway"

eppick77

Nov 17, 2009, 10:07:52 AM
to OpenQM
I hope that the query rewrite is going well for you!

Eugene

On Nov 16, 10:40 am, "Martin Phillips" <martinphill...@ladybridge.com>
wrote: