
Tcl, Windows, glob and encoding


Alexandru

Mar 27, 2018, 1:27:38 PM
Hi,

I'm facing a problem with the encoding of special characters in Windows file names.

I get those file names from Tcl running as a CGI script on an Apache server, send them over HTTPS to another Tcl app, and display them in a Tablelist.

What I get is that the special characters are replaced by question marks.

Now, I understand that Windows file names use the Unicode character set, so I started a Wish console on the server with

wish.exe -encoding utf-8

just to see what the Tcl interpreter sees.

When I execute "glob" in the directory with the special characters, I also get unwanted results here. So there must be an incompatibility between the character sets used by Windows and Tcl.

But why?

Thanks.
Alexandru

Brad Lanam

Mar 27, 2018, 3:40:23 PM
Specifics please.
Can you dump the filenames that are causing issues?

I have no issues with Unicode file names on Windows.
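One way to produce such a dump without the characters being mangled on the way to the group is to print every non-ASCII character as a \uXXXX escape. A minimal sketch; `escapeNonAscii` is a hypothetical helper name and the directory (here just the current one) is a placeholder:

```tcl
# Hypothetical helper: return $s with every non-ASCII character
# replaced by a \uXXXX escape, so the result is plain ASCII and can be
# pasted into a post without being mangled.
proc escapeNonAscii {s} {
    set out ""
    foreach ch [split $s ""] {
        scan $ch %c code
        if {$code > 127} {
            append out [format "\\u%04X" $code]
        } else {
            append out $ch
        }
    }
    return $out
}

# Dump every entry of a directory (placeholder: the current directory).
set dir [pwd]
foreach name [glob -nocomplain -directory $dir -tails *] {
    puts [escapeNonAscii $name]
}
```

A name containing the character discussed further down in this thread would then show up as `...W_\uF00B35-...`.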

Kevin Kenny

Mar 27, 2018, 3:50:27 PM
I don't think it's that. I suspect that somewhere in the pipeline (Tcl CGI, Apache HTTPS,
whatever HTTPS client), someone is specifying an incorrect encoding for the content, so that it arrives damaged. That would be the place to start looking, at any rate. Can you do the HTTPS transaction with wget or something, and report back what the headers and content (once the encipherment is stripped) look like?
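On the CGI side, a minimal sketch of making the encoding explicit, assuming the listing is sent as plain text: configure the output channel for UTF-8 and declare the charset in the Content-Type header. `sendFileList` is a hypothetical name, not part of any library; the receiving app then also has to decode the body as UTF-8.

```tcl
# Hypothetical CGI-side helper: write a list of file names to $chan as
# UTF-8 and declare that charset in the header, so the client does not
# have to guess it.
proc sendFileList {chan names} {
    # Header and body both go out as UTF-8 with plain LF line endings.
    fconfigure $chan -encoding utf-8 -translation lf
    puts $chan "Content-Type: text/plain; charset=utf-8"
    puts $chan ""
    foreach name $names {
        puts $chan $name
    }
    flush $chan
}

# In the CGI script itself ($dir is a placeholder):
# sendFileList stdout [glob -nocomplain -directory $dir -tails *]
```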

'wish.exe -encoding utf-8' will have no effect, by the way, unless you also have a file name on the command line. The '-encoding' there changes the encoding of the initial script that you're reading in, but not the system encoding.
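To illustrate the difference: the system encoding can be queried with `encoding system`, while `-encoding` on the command line, like `source -encoding`, only controls how a script file is read. A small sketch; the temporary script and the variable `::probe` exist only for this demonstration:

```tcl
# The system encoding (also the default encoding of newly opened
# channels) is not changed by a command-line -encoding option.
puts "system encoding: [encoding system]"

# -encoding only affects how a script file is read. Write a tiny
# script containing a non-ASCII character as UTF-8 ...
set chan [file tempfile scriptPath]
fconfigure $chan -encoding utf-8
puts $chan "set ::probe \u00e4"
close $chan

# ... and read it back with the matching encoding.
source -encoding utf-8 $scriptPath
file delete $scriptPath
puts [expr {$::probe eq "\u00e4" ? "read as UTF-8" : "mangled"}]
```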

Alexandru

Mar 27, 2018, 3:54:36 PM
This is a typical name I'm trying to read:

1xAMSABS_3B_S_35-C-G1-KC__Standard-R11-1197-18.5-18.5-CN_1xAMSABS_3B_W_35-DBdy6.02.mpprt

but the problem is, I can't even paste it here; it's already incompatible with the encoding of this group.

Here is a picture: https://www.meshparts.de/download/temp.png

Brad Lanam

Mar 27, 2018, 4:08:44 PM
So it has a \uF00B character in there (I can see it in the text when I
do a post reply in Google Groups: ...W _ \uF00B 3 5 - ...).

a) Is that character supposed to be there?
b) \uF00B is listed as a Unicode "private use area" character in the
Character Map program, so I suspect that this character isn't supposed
to be there at all.

The character map program on Linux shows an odd kind of 7 character, not a
dot in the middle. But since it's a private use area, I am guessing it
could be anything.
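A quick way to spot such characters from Tcl, assuming only the Basic Multilingual Plane private use block U+E000 to U+F8FF matters (the supplementary-plane private use areas are not checked); `findPrivateUse` is a hypothetical helper:

```tcl
# Hypothetical check: return index/code point pairs for every character
# of $s that lies in the BMP Private Use Area (U+E000..U+F8FF).
# Supplementary-plane private use characters are NOT covered.
proc findPrivateUse {s} {
    set hits {}
    set i 0
    foreach ch [split $s ""] {
        scan $ch %c code
        if {$code >= 0xE000 && $code <= 0xF8FF} {
            lappend hits $i [format "U+%04X" $code]
        }
        incr i
    }
    return $hits
}

puts [findPrivateUse "W_\uF00B35-DBdy"]   ;# prints: 2 U+F00B
```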

Alexandru

Mar 27, 2018, 4:53:12 PM
a) No, the character should not be there. The file name is not the original one; it is created by Tcl after parsing the content of a STEP file (CAD data).

I'm only realizing this now. I tried to create a ZIP archive from this file with the built-in Windows tool, and not even Windows can handle this file.

So I guess I cannot expect the Tcl in my app to deal with this.

It's interesting, though, that Windows can display the file name but cannot work with it.
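If the names are generated from parsed data anyway, one option is to sanitize them before creating the file. A sketch, assuming replacing offending characters with "_" is acceptable; `sanitizeFileName` is hypothetical, and it does not handle other Windows restrictions such as reserved names (CON, NUL, ...) or trailing dots and spaces:

```tcl
# Hypothetical helper for generated names (e.g. from STEP labels):
# replace control characters, BMP private use characters and the
# characters Windows forbids in file names with "_".
proc sanitizeFileName {name} {
    set out ""
    foreach ch [split $name ""] {
        scan $ch %c code
        if {$code < 32
                || ($code >= 0xE000 && $code <= 0xF8FF)
                || [string first $ch {<>:"/\|?*}] >= 0} {
            append out _
        } else {
            append out $ch
        }
    }
    return $out
}

puts [sanitizeFileName "W_\uF00B35-DBdy6.02.mpprt"]   ;# prints: W__35-DBdy6.02.mpprt
```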

Thanks.
Alexandru

Ralf Fassel

Mar 28, 2018, 5:54:57 AM
* Alexandru <alexandr...@meshparts.de>
| I face the problem with the encoding of special characters in Windows
| file names.
|
| I get those file names from Tcl Tcl running as CGI script on an Apache
| server, send the file names over https to another Tcl app and
| displaying the file names in a Tablelist.

Check the encoding of the channel when sending the file names.
If it is the system encoding, this is most probably some code page that
replaces the special chars with "?".

E.g.,
set c \u1234
encoding convertto cp1252 $c
=> ?
whereas:
encoding convertto utf-8 $c
=> ሴ
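The same effect can be checked as a round trip, which tells whether a given channel encoding can carry a string at all. A minimal sketch; `roundTrips` is a hypothetical helper, and the channel encoding itself is queried with `fconfigure $chan -encoding` and set with `fconfigure $chan -encoding utf-8`:

```tcl
# Hypothetical helper: does $s survive encoding to $enc and back?
# Characters the target encoding cannot represent come back as "?".
proc roundTrips {enc s} {
    if {[catch {encoding convertto $enc $s} bytes]} {
        # Newer Tcl versions may raise an error instead of substituting "?"
        return 0
    }
    expr {[encoding convertfrom $enc $bytes] eq $s}
}

puts [roundTrips cp1252 \u00e4]   ;# 1: a-umlaut exists in cp1252
puts [roundTrips cp1252 \u1234]   ;# 0: not in cp1252
puts [roundTrips utf-8  \u1234]   ;# 1: UTF-8 covers all of Unicode
```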

HTH
R'

Alexandru

Mar 28, 2018, 6:44:58 AM
Thanks, I will. Right now I see that, as Brad wrote, the character in use is non-standard Unicode. So I cannot expect it to work. For example, Windows cannot pack the file into a ZIP archive because of the name...

Ralf Fassel

Mar 28, 2018, 7:28:51 AM
* Alexandru <alexandr...@meshparts.de>
| Thanks, I will do. Right now I see that, like Brad wrote, the char in
| use non-standard Unicode is. So I cannot expect, that it will
| work. For example, Windows cannot pack the file into an Zip archive,
| because of the name...

'non-standard Unicode'? Never heard of that...

My "Send-to-ZIP-Archive" from Windows Explorer complains about \uf00b
being a char "not allowed in compressed archive file names", but in
contrast 7z creates a ZIP archive containing that file just fine.

HTH
R'

Alexandru

Mar 28, 2018, 7:50:02 AM
Sorry, I meant "private use area", not non-standard.