On 09/06/2023 6:16 PM, Rolf Ade wrote:
>
> Michael Soyka <mss...@gmail.com> writes:
>> I'm using the Magicsplat distribution of tcl 8.6.12 on a Windows 10
>> [...]
>> filenames came to include emoji characters.
>>
>> Now to the problem. When I try to access these files using Tcl, I get
>> what I consider to be nonsensical errors. For example, the "open"
>> command fails with the message "filename is invalid on this platform",
>> even though the file does exist. On the other hand, various "file"
>> commands that also take a filename argument, such as "exists" and
>> "size", return "no such file or directory". Again, the file certainly
>> does exist.
>
> You haven't shown us how you call those commands in Tcl (with the emoji
> literal in the source code, or escaped as \Uxxxxx, for example) and what
> encoding your source file has.
The filenames were obtained using the "glob" command. The files
themselves were created, I believe, by others using a mail client on
Windows.
>
> Since Tcl 8.6.10, I think, and for sure with the upcoming Tcl 9, there is
> no problem in handling such filenames (containing Unicode code points
> such as emoji, in proper UTF-8).
>
> See for example:
>
> # The following is: a\U1f972
> set fd [open a🥲 w+]
> # \U1f926
> puts $fd 🤦
> close $fd
>
> set fd [open a🥲]
> puts [read $fd]
> close $fd
>
> This script works for me with 8.6.10, 8.6.13 and 9, though this is on
> Linux.
>
>> I haven't been able to construct such a filename using Tcl commands.
>> Instead, I've used "glob" to get the filename from the filesystem
>> (NTFS) and used the result as the argument for "open" and "file".
>
> So you can construct the filenames with results of Tcl commands and
> successfully open the files?
The only reason I tried to create a file that includes emoji characters
in its name was to investigate the contradictory responses I was getting
from the "open" and "file" commands.
However, that's not the primary issue I tried to raise so I'll try to be
more specific.
I was given a collection of files on a thumb drive. One of the files
contains a sequence of three emoji characters in its name: "two hearts",
"revolving hearts" and "two hearts". The corresponding Unicode code points
are \U01F495 (two hearts) and \U01F49E (revolving hearts). I believe this
in part because of the output of the following code:
proc DisplayCharCodes {string} {
    foreach c [split $string {}] {
        puts [format {%s: %#x} $c [scan $c %c]]
    }
}
set fileList [glob -type f *.eml]
set filename [lindex $fileList 1]
DisplayCharCodes $filename
which outputs the following:
N: 0x4e
E: 0x45
X: 0x58
T: 0x54
: 0x20
S: 0x53
A: 0x41
T: 0x54
.: 0x2e
: 0x20
2: 0x32
_: 0x5f
1: 0x31
5: 0x35
_: 0x5f
: 0x20
F: 0x46
A: 0x41
D: 0x44
E: 0x45
D: 0x44
: 0x20
L: 0x4c
O: 0x4f
V: 0x56
E: 0x45
R: 0x52
S: 0x53
: 0x20
T: 0x54
O: 0x4f
U: 0x55
R: 0x52
: 0x20
i: 0x69
n: 0x6e
: 0x20
P: 0x50
R: 0x52
O: 0x4f
V: 0x56
I: 0x49
D: 0x44
E: 0x45
N: 0x4e
C: 0x43
E: 0x45
!: 0x21
: 0x20
💕: 0x1f495
💞: 0x1f49e
💕: 0x1f495
.: 0x2e
e: 0x65
m: 0x6d
l: 0x6c
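Since Rolf asked about the \Uxxxxx form, here is a rough sketch of how the same loop could render a name in ASCII-only form, so it can be pasted into a post or a test script without depending on anyone's terminal or mail encoding. EscapeNonAscii is a hypothetical helper name of my own, not a Tcl built-in, and I'm assuming that "split" and "scan %c" deliver whole code points as they did in the output above; that may differ between Tcl versions and builds.

```tcl
# Hypothetical helper (not part of Tcl): return a copy of the string in
# which every non-ASCII character is replaced by a \uXXXX or \UXXXXXXXX
# escape, so the result contains only 7-bit characters.
proc EscapeNonAscii {string} {
    set result ""
    foreach c [split $string {}] {
        # scan %c stores the numeric code of the character in "code"
        scan $c %c code
        if {$code < 0x80} {
            # plain ASCII is copied unchanged
            append result $c
        } elseif {$code <= 0xFFFF} {
            # BMP character: four hex digits
            append result [format {\u%04x} $code]
        } else {
            # beyond the BMP (e.g. emoji): eight hex digits
            append result [format {\U%08x} $code]
        }
    }
    return $result
}
```

With the variables from the code above it would be called as "puts [EscapeNonAscii $filename]".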
Given the above, this is what "open" returns:
% open $filename r
couldn't open "NEXT SAT. 2_15_ FADED LOVERS TOUR in PROVIDENCE! 💕💞💕.eml": filename is invalid on this platform
and the response of "file exists $filename" is zero.
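To make the comparison easier to reproduce, the three calls can be collected in one place. This is only a sketch: ProbeFile is a hypothetical name of my own, and it does nothing beyond wrapping "file exists", "open" and "file size" in "catch" so that each result or error message ends up in a single dict.

```tcl
# Hypothetical helper (not part of Tcl): try the commands discussed in
# this thread on one filename and collect what each of them reports.
proc ProbeFile {filename} {
    set report [dict create]

    # "file exists" never raises an error; it just returns 0 or 1
    dict set report exists [file exists $filename]

    # "open" raises an error on failure, so capture the message
    if {[catch {open $filename r} result]} {
        dict set report open "error: $result"
    } else {
        close $result
        dict set report open ok
    }

    # "file size" also raises an error if the file cannot be found
    if {[catch {file size $filename} result]} {
        dict set report size "error: $result"
    } else {
        dict set report size $result
    }
    return $report
}
```

Running "puts [ProbeFile $filename]" for each name returned by "glob" would show at a glance which commands disagree for which files.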
So I'm looking for a reason behind this inconsistent and, in my mind,
nonsensical behavior. Is it a Windows issue, a Tcl issue, a little of
both and/or something else?
I hope the above clarifies my problem.
-mike
>
> rolf