[...]
>>
>> Perhaps, jj.tcl isn't really utf-8? What tool or editor did
>> you use to create it?
[...]
> I was using windows notepad for quick experiment. notepad save-as says
> utf-8.... Normally I don't use notepad. MS is misleading.
>
> notepad utf-8 format hexdump: efbbbf707574732022e0a485e0a49a22
> gedit utf-8 format hexdump: 707574732022e0a485e0a49a220d0a
Argh. This is Microsoft's well-known UTF-8 idiocy. One of the
advantages of UTF-8 is that plain ASCII files don't need a change:
they still are plain ASCII and they are at the same time UTF-8.
Not so in the Wonderful World of Microsoft: their tools stupidly insist
on inserting a Byte Order Mark (BOM) at the beginning of the file. You'd
need this BOM[1] if you are reading/writing 16-bit encodings (the reader
needs to know which byte order the file was written with, to know whether
to swap the bytes of each 16-bit word). The BOM has the hexadecimal
value 0xfeff, so if you read 0xfffe instead you know to swap.
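
To make the swap check concrete, here is a minimal Python sketch. The
function name detect_utf16_order is my own invention, not a library
call, and it assumes the input starts with a BOM if it has one at all.
It reads the first 16-bit word as big-endian and compares it with
0xFEFF (no swap needed) and 0xFFFE (bytes arrived swapped):

```python
import struct

def detect_utf16_order(data: bytes) -> str:
    """Guess the byte order of a 16-bit encoded stream from its BOM.

    Hypothetical helper for illustration only.
    """
    if len(data) < 2:
        return "unknown"
    # Interpret the first two bytes as one big-endian 16-bit word.
    (word,) = struct.unpack(">H", data[:2])
    if word == 0xFEFF:
        return "big-endian"      # BOM read as written: no swap needed
    if word == 0xFFFE:
        return "little-endian"   # BOM arrived byte-swapped: swap each word
    return "unknown"             # no BOM at the start of the data
```

A file without a BOM simply yields "unknown", and the reader has to
get the byte order from somewhere else.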
Now UTF-8 is a byte stream, so it doesn't need a BOM. The cited
Wikipedia page is more polite than me ("The Unicode Standard does
permit the BOM in UTF-8,[2] but does not require or recommend its
use"). My take is that Microsoft should be banned from standards bodies
until they learn to behave themselves. Brrr.
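
If you are stuck with such a file, the three extra bytes are easy to
drop. The following is only a sketch in Python, using the two hexdumps
quoted above as test data; strip_utf8_bom is a made-up helper name.
codecs.BOM_UTF8 and the "utf-8-sig" codec are part of Python's standard
library:

```python
import codecs

# The two hexdumps from the quoted message.
notepad = bytes.fromhex("efbbbf707574732022e0a485e0a49a22")
gedit = bytes.fromhex("707574732022e0a485e0a49a220d0a")

def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if present.

    Hypothetical helper for illustration only.
    """
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

# Apart from the BOM and gedit's trailing CR LF, the files are identical.
same = strip_utf8_bom(notepad) == gedit.rstrip(b"\r\n")

# The "utf-8-sig" codec skips a leading BOM while decoding.
text = notepad.decode("utf-8-sig")
```

Here `same` ends up True, and `text` holds the script's single `puts`
line without the stray U+FEFF character in front.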
[1] <https://secure.wikimedia.org/wikipedia/en/wiki/Byte_order_mark>
Regards
-- tomás