
When do we need "encoding system"


Alexandru

Aug 2, 2022, 12:30:18 PM
Recently I thought it would be a good idea to add "encoding system utf-8" to my code. After that I realized that the icons of Windows folders in the treectrl package are no longer shown if the folder path contains special characters such as umlauts. So I had to revert. But when do we need this command anyway?

Rich

Aug 2, 2022, 2:16:27 PM
Alexandru <alexandr...@meshparts.de> wrote:
> Recently I thought it would be a good idea to add "encoding system
> utf-8" to my code.

Will only work right if the OS system call encoding is also UTF-8.

> After that I realized that the icons of Windows folders in the
> treectrl package are not shown anymore, if the folder path contains
> special chars such as umlauts. So I must revert.

Yup, expected, as Windows system calls are likely largely still UTF-16.

> But when do we need this command anyway?

When you need to change the encoding for a system call that accepts
something other than the overall default for the rest. From the man page:

encoding system ?encoding?
Set the system encoding to encoding. If encoding is omitted then
the command returns the current system encoding. The system
encoding is used whenever Tcl passes strings to system calls.

The key phrase is the last sentence.

Overall, unless you are testing obscure things, it is probably best to
leave the system encoding alone.
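
A minimal sketch of what leaving the system encoding alone can look like when the goal is only UTF-8 file content. It queries the system encoding without setting it and selects UTF-8 per channel with fconfigure. The scratch file name is hypothetical, and the per-channel approach is an illustration added here, not something stated in the man page excerpt above.

```tcl
# Query the system encoding without changing it.
puts "system encoding: [encoding system]"

# Hypothetical scratch file in the current directory.
set path [file join [pwd] encoding-demo.txt]
set text "Gr\u00fc\u00dfe"   ;# German "Gruesse" with u-umlaut and sharp s

# Write as UTF-8 by configuring the channel, not the system encoding.
set out [open $path w]
fconfigure $out -encoding utf-8
puts -nonewline $out $text
close $out

# Read back with the same explicit channel encoding.
set in [open $path r]
fconfigure $in -encoding utf-8
set roundTrip [read $in]
close $in
file delete $path

puts [expr {$roundTrip eq $text ? "round trip ok" : "mismatch"}]
```

Because the encoding is set on each channel, the file content stays UTF-8 no matter what "encoding system" reports.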

Alexandru

Aug 2, 2022, 4:42:41 PM
Thanks Rich for the explanation.
I think Windows uses cp1252.
So it's a mess: I write files typically in utf-8, read them back in utf-8.
All the application data is encoded in utf-8 although the system encoding is cp1252.
E.g. when I use CAWT to read an Excel file, its content is cp1252 but somehow this still works?

Regards
Alexandru

Rich

Aug 2, 2022, 5:21:14 PM
Alexandru <alexandr...@meshparts.de> wrote:
> Rich wrote on Tuesday, August 2, 2022 at 21:16:27 UTC+3:
>> Alexandru <alexandr...@meshparts.de> wrote:
>> > Recently I thought it would be a good idea to add "encoding system
>> > utf-8" to my code.
>>
>> Overall, unless you are testing obscure things, it is probably best to
>> leave the system encoding alone.
>
> Thanks Rich for the explanation.
> I think Windows uses cp1252.

cp1252 is a code page (a character mapping), UTF-16 is an encoding - two
different, but related, items. Character mappings define what character
each integer value represents (such as 65 meaning capital letter A in
ASCII). Encodings are how the integers are stored in memory (in the case
of UTF-16, as 16-bit code units).
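
The difference between a character's number and the bytes used to store it can be seen from Tcl itself. This is only a sketch, assuming a Tcl build that ships the cp1252 and utf-8 encodings (standard builds do); the variable names are made up for illustration.

```tcl
# One character: u-umlaut, code point U+00FC (decimal 252).
set ch "\u00fc"
scan $ch %c codePoint
puts "code point: $codePoint"

# The same character stored as bytes in two different encodings.
foreach enc {cp1252 utf-8} {
    set bytes [encoding convertto $enc $ch]
    binary scan $bytes H* hex
    set result($enc) $hex
    puts "$enc bytes: $hex"
}
```

Here the character number stays 252 in both cases, while the stored bytes differ: a single byte fc in cp1252 and the two bytes c3 bc in UTF-8.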

> So it's a mess: I write files typically in utf-8, read them back in
> utf-8.

Yep, and most new work really should be in UTF-8, unless you need
something else due to 'legacy'.

> All the application data is encoded in utf-8 although the system
> encoding is cp1252.

Again, that legacy stuff... :)

> E.g. when I use CAWT to read an Excel file, its content is cp1252
> but somehow this still works?

Yes, because Tcl transparently converts it from cp1252 (or whatever
encoding it is stored in) for you.

Ralf Fassel

Aug 3, 2022, 4:07:09 AM
* Rich <ri...@example.invalid>
| Alexandru <alexandr...@meshparts.de> wrote:
| > Recently I thought it would be a good idea to add "encoding system
| > utf-8" to my code.
|
| Will only work right if the OS system call encoding is also UTF-8.
|
| > After that I realized that the icons of Windows folders in the
| > treectrl package are not shown anymore, if the folder path contains
| > special chars such as umlauts. So I must revert.
|
| Yup, expected, as Windows system calls are likely largely still UTF-16.

I'm not convinced that this is the real reason for that error.

In my experience, the file handling functions on Windows don't care
about the system encoding when it comes to the *name* of the file - they
simply convert Tcl's internal rep to wide char
(win/tclWinFile.c:TclNativeCreateNativeRep(), using
MultiByteToWideChar() from CP_UTF8).

In contrast, the code on unix indeed uses the system encoding to get the
file name to open (unix/tclUnixFile.c:TclNativeCreateNativeRep() uses
Tcl_UtfToExternalDString(NULL,) where the NULL denotes the system
encoding).

I rather suspect that the file *reading* behind the scenes relies on the
system encoding being 'correct'. That might fail if the system encoding
is set to utf-8, but the file content is not stored in utf-8.
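
To illustrate the kind of mismatch suspected here, a small sketch that decodes bytes with an encoding other than the one they were written in. It shows the reverse direction (UTF-8 bytes read as cp1252), chosen only because the outcome is easy to predict. It is not taken from treectrl or from the Tcl sources.

```tcl
# A single character: u-umlaut (U+00FC).
set text "\u00fc"

# Bytes as a UTF-8 file would store them: C3 BC.
set utf8Bytes [encoding convertto utf-8 $text]

# Decode those bytes as if they were cp1252, i.e. with the wrong encoding.
set misread [encoding convertfrom cp1252 $utf8Bytes]

puts "original length: [string length $text]"
puts "misread length:  [string length $misread]"
puts "misread equals A-tilde plus one-quarter sign: [expr {$misread eq "\u00c3\u00bc"}]"
```

The one character turns into two unrelated ones, which is the typical symptom when stored content and assumed encoding disagree.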

R'