Problem with filenames that include emoji characters

Michael Soyka

Sep 1, 2023, 8:25:50 PM
I'm using the Magicsplat distribution of tcl 8.6.12 on a Windows 10 system. I recently received a collection of .eml files whose filenames include emoji characters. I assume these files were created by some email client such as Outlook. When emails are saved to a file the Subject line is used for the filename. I assume that this is how the filenames came to include emoji characters.

Now to the problem. When I try to access these files using Tcl, I get what I consider to be nonsensical errors. For example, the "open" command fails with the message "filename is invalid on this platform", even though the file does exist. On the other hand, various "file" commands that also take a filename argument, such as "exists" and "size", return "no such file or directory". Again, the file certainly does exist.

I can confirm that the emoji characters in these filenames have the code points U+1F495 and U+1F49E, "two hearts" and "revolving hearts". The filenames also include the characters "FADED LOVERS TOUR" so I suppose that justifies their inclusion. :)

I haven't been able to construct such a filename using Tcl commands. Instead, I've used "glob" to get the filename from the filesystem (NTFS) and used the result as the argument for "open" and "file".

I admit I'm inexperienced in all things UTF-8, encodings and code pages, but
is this a bug to report or do I need to fill in some gaps in my education?

Thanks in advance for any comments,
-mike

Andreas Leitgeb

Sep 2, 2023, 2:05:44 AM
Michael Soyka <mss...@gmail.com> wrote:
> I'm using the Magicsplat distribution of tcl 8.6.12 on a Windows 10
> system. I recently received a collection of .eml files whose filenames
> include emoji characters. [...]

As an immediate answer, I'd suggest looking for Christian Werner's
undroidwish (no typo; it's AndroWish for Android devices ported back
to the PC).

Unicode emoji are not well supported by standard Tcl 8.6.*.
Work is in progress to fix this for 8.7 and 9.0.

undroidwish has a "variant" of 8.6 that already supports
these to some degree, at the cost that not all extensions
can be loaded into it. (Others might be able to explain it
better.)

Michael Soyka

Sep 2, 2023, 10:40:54 AM
On 09/02/2023 2:05 AM, Andreas Leitgeb wrote:
> Michael Soyka <mss...@gmail.com> wrote:
>> I'm using the Magicsplat distribution of tcl 8.6.12 on a Windows 10
>> system. I recently received a collection of .eml files whose filenames
>> include emoji characters. [...]
>
> As an immediate answer, I'd suggest looking for Christian Werner's
> undroidwish (no typo; it's AndroWish for Android devices ported back
> to the PC).

Thank you for the information, I'll look into undroidwish.

>
> Unicode emoji are not well supported by standard Tcl 8.6.*.
> Work is in progress to fix this for 8.7 and 9.0.

So this is a known issue and there's no reason to file a bug report.

Harald Oehlmann

Sep 6, 2023, 12:21:40 PM
On 02.09.2023 at 16:40, Michael Soyka wrote:
> On 09/02/2023 2:05 AM, Andreas Leitgeb wrote:
>> Michael Soyka <mss...@gmail.com> wrote:
>>> I'm using the Magicsplat distribution of tcl 8.6.12 on a Windows 10
>>> system. I recently received a collection of .eml files whose filenames
>>> include emoji characters. [...]
>>
>> As an immediate answer, I'd suggest looking for Christian Werner's
>> undroidwish (no typo; it's AndroWish for Android devices ported back
>> to the PC).
>
> Thank you for the information, I'll look into undroidwish.
>
>>
>> Unicode emoji are not well supported by standard Tcl 8.6.*.
>> Work is in progress to fix this for 8.7 and 9.0.
>
> So this is a known issue and there's no reason to file a bug report.

Well, we recently had a fix for this issue on Linux (TIP 671).
The argument against fixing it was that there were no bug reports about it ;-).

Take care,
Harald

Rolf Ade

Sep 6, 2023, 6:16:09 PM

Michael Soyka <mss...@gmail.com> writes:
> I'm using the Magicsplat distribution of tcl 8.6.12 on a Windows 10
> [...]
> filenames came to include emoji characters.
>
> Now to the problem. When I try to access these files using Tcl, I get
> what I consider to be nonsensical errors. For example, the "open"
> command fails with the message "filename is invalid on this platform",
> even though the file does exist. On the other hand, various "file"
> commands that also take a filename argument, such as "exists" and
> "size", return "no such file or directory". Again, the file certainly
> does exist.

You haven't shown us how you call those commands in Tcl (with the emoji
literal in the source code or escaped as \Uxxxxx, for example) or what
encoding your source file has.

Since Tcl 8.6.10, I think, and for sure with the upcoming Tcl 9, there is
no problem handling such filenames (containing Unicode code points such
as emoji, properly encoded as UTF-8).

See for example:

# The following is: a\U1f972
set fd [open a🥲 w+]
# \U1f926
puts $fd 🤦
close $fd

set fd [open a🥲]
puts [read $fd]
close $fd

This script works for me with 8.6.10, 8.6.13 and 9, though I have only
tested it on Linux.

> I haven't been able to construct such a filename using Tcl commands.
> Instead, I've used "glob" to get the filename from the filesystem
> (NTFS) and used the result as the argument for "open" and "file".

So you can construct the filenames with results of Tcl commands and
successfully open the files?

rolf

Michael Soyka

Sep 7, 2023, 4:11:40 PM
On 09/06/2023 6:16 PM, Rolf Ade wrote:
>
> Michael Soyka <mss...@gmail.com> writes:
>> I'm using the Magicsplat distribution of tcl 8.6.12 on a Windows 10
>> [...]
>> filenames came to include emoji characters.
>>
>> Now to the problem. When I try to access these files using Tcl, I get
>> what I consider to be nonsensical errors. For example, the "open"
>> command fails with the message "filename is invalid on this platform",
>> even though the file does exist. On the other hand, various "file"
>> commands that also take a filename argument, such as "exists" and
>> "size", return "no such file or directory". Again, the file certainly
>> does exist.
>
> You haven't shown us how you call those commands in Tcl (with the emoji
> literal in the source code or escaped as \Uxxxxx, for example) or what
> encoding your source file has.

The filenames were obtained using the "glob" command. The files
themselves were created, I believe, by others using a mail client on
Windows.
>
> Since Tcl 8.6.10, I think, and for sure with the upcoming Tcl 9, there is
> no problem handling such filenames (containing Unicode code points such
> as emoji, properly encoded as UTF-8).
>
> See for example:
>
> # The following is: a\U1f972
> set fd [open a🥲 w+]
> # \U1f926
> puts $fd 🤦
> close $fd
>
> set fd [open a🥲]
> puts [read $fd]
> close $fd
>
> This script works for me with 8.6.10, 8.6.13 and 9, though I have only
> tested it on Linux.
>
>> I haven't been able to construct such a filename using Tcl commands.
>> Instead, I've used "glob" to get the filename from the filesystem
>> (NTFS) and used the result as the argument for "open" and "file".
>
> So you can construct the filenames with results of Tcl commands and
> successfully open the files?

The only reason I tried to create a file that includes emoji characters
in its name was to investigate the contradictory responses I was getting
from the "open" and "file" commands.

However, that's not the primary issue I tried to raise so I'll try to be
more specific.

I was given a collection of files on a thumb drive. One of the files
contains a sequence of three emoji characters in its name: "two hearts",
"revolving hearts" and "two hearts". The corresponding Unicode values
are \U01F495 and \U01F49E. I believe this based on the output of the
following code:

proc DisplayCharCodes {string} {
    foreach c [split $string {}] {
        puts [format {%s: %#x} $c [scan $c %c]]
    }
}
set fileList [glob -type f *.eml]
set filename [lindex $fileList 1]
DisplayCharCodes $filename

which outputs the following:

N: 0x4e
E: 0x45
X: 0x58
T: 0x54
: 0x20
S: 0x53
A: 0x41
T: 0x54
.: 0x2e
: 0x20
2: 0x32
_: 0x5f
1: 0x31
5: 0x35
_: 0x5f
: 0x20
F: 0x46
A: 0x41
D: 0x44
E: 0x45
D: 0x44
: 0x20
L: 0x4c
O: 0x4f
V: 0x56
E: 0x45
R: 0x52
S: 0x53
: 0x20
T: 0x54
O: 0x4f
U: 0x55
R: 0x52
: 0x20
i: 0x69
n: 0x6e
: 0x20
P: 0x50
R: 0x52
O: 0x4f
V: 0x56
I: 0x49
D: 0x44
E: 0x45
N: 0x4e
C: 0x43
E: 0x45
!: 0x21
: 0x20
💕: 0x1f495
💞: 0x1f49e
💕: 0x1f495
.: 0x2e
e: 0x65
m: 0x6d
l: 0x6c

Given the above, this is what "open" returns:

% open $filename r
couldn't open "NEXT SAT. 2_15_ FADED LOVERS TOUR in PROVIDENCE!
💕💞💕.eml": filename is invalid on this platform

and the response of "file exists $filename" is zero.

So I'm looking for a reason behind this inconsistent and, in my mind,
nonsensical behavior. Is it a Windows issue, a Tcl issue, a little of
both and/or something else?

I hope the above clarifies my problem.

-mike

>
> rolf

Rolf Ade

Sep 7, 2023, 8:15:13 PM
Thanks.

Yes, typically you should be able to use any file name returned by glob
as an argument to open or file exists. There is an exception to that rule
(the one Harald mentioned), and it may apply here.

Can you open the file in question with the file explorer? Perhaps you
can truncate it and provide it for download somewhere (in the hope that
the "strangeness" of the file name survives these actions, which is not a
given)?

The one known scenario that shows what you describe (you can't open a
filename you got from glob) is this: the file names are written in an
encoding other than the one the system uses for its filenames. However,
in the results of your own investigation I see no indication
that this is the case here.

But perhaps it is in fact a quirk of the Windows APIs involved (or of how
they are used). At least you are right in saying this is strange and
needs an explanation, if it's not the scenario described above.

rolf

Michael Soyka

Sep 8, 2023, 2:49:55 PM
Yes, using Windows Explorer I can open the file with Vim and open the
file with Outlook. I can also rename the file, deleting the 3 emoji
characters, and open it using the Tcl commands "glob" and "open".

>
> The one known scenario that shows what you describe (you can't open a
> filename you got from glob) is this: the file names are written in an
> encoding other than the one the system uses for its filenames. However,
> in the results of your own investigation I see no indication
> that this is the case here.
>
> But perhaps it is in fact a quirk of the Windows APIs involved (or of how
> they are used). At least you are right in saying this is strange and
> needs an explanation, if it's not the scenario described above.

I've since copied the files from the same thumb drive onto my Linux
system and retried the "glob" and "open" commands using 8.6.10; it all
works. My Windows version is 8.6.12, a later version, so it appears
that my problems are peculiar to Windows.

Thanks for your continuing interest- it's helped motivate me to look
deeper into the problem.

-mike

>
> rolf

Robert Heller

Sep 8, 2023, 3:52:19 PM
> >>> set fd [open aðŸÂΒ₯² w+]
> >>> # \U1f926
> >>> puts $fd 🀦
> >>> close $fd
> >>> set fd [open aðŸÂΒ₯²]
> >> ðŸ’Â‒: 0x1f495
> >> 💞: 0x1f49e
> >> ðŸ’Â‒: 0x1f495

Noticing that these are 16-bit characters...

> >> .: 0x2e
> >> e: 0x65
> >> m: 0x6d
> >> l: 0x6c
> >>
> >> Given the above, this is what "open" returns:
> >>
> >> % open $filename r
> >> couldn't open "NEXT SAT. 2_15_ FADED LOVERS TOUR in PROVIDENCE!
> >> ðŸ’Â‒💞ðŸ’Â‒.eml": filename is invalid on this platform
What does DisplayCharCodes display under Linux? Do the emoji chars display as
16-bit chars or two 8-bit characters?

>
> Thanks for your continuing interest- it's helped motivate me to look
> deeper into the problem.
>
> -mike
>
> >
> > rolf
>
>
>

--
Robert Heller -- Cell: 413-658-7953 GV: 978-633-5364
Deepwoods Software -- Custom Software Services
http://www.deepsoft.com/ -- Linux Administration Services
hel...@deepsoft.com -- Webhosting Services

Michael Soyka

Sep 8, 2023, 4:15:19 PM
>>>>> set fd [open aΓƒΖ’Γ‚Β°Γƒβ€šΓ‚ΕΈΓƒβ€šΓ‚Β₯Γƒβ€šΓ‚Β² w+]
>>>>> # \U1f926
>>>>> puts $fd ΓƒΖ’Γ‚Β°Γƒβ€šΓ‚ΕΈΓƒβ€šΓ‚Β€Γƒβ€šΓ‚Β¦
>>>>> close $fd
>>>>> set fd [open aΓƒΖ’Γ‚Β°Γƒβ€šΓ‚ΕΈΓƒβ€šΓ‚Β₯Γƒβ€šΓ‚Β²]
>>>> ΓƒΖ’Γ‚Β°Γƒβ€šΓ‚ΕΈΓƒβ€šΓ‚β€™Γƒβ€šΓ‚β€’: 0x1f495
>>>> ΓƒΖ’Γ‚Β°Γƒβ€šΓ‚ΕΈΓƒβ€šΓ‚β€™Γƒβ€šΓ‚ΕΎ: 0x1f49e
>>>> ΓƒΖ’Γ‚Β°Γƒβ€šΓ‚ΕΈΓƒβ€šΓ‚β€™Γƒβ€šΓ‚β€’: 0x1f495
>
> Noticing that these are 16-bit characters...

This doesn't look like what I see in my posts, which show the characters
themselves.

>
>>>> .: 0x2e
>>>> e: 0x65
>>>> m: 0x6d
>>>> l: 0x6c
>>>>
>>>> Given the above, this is what "open" returns:
>>>>
>>>> % open $filename r
>>>> couldn't open "NEXT SAT. 2_15_ FADED LOVERS TOUR in PROVIDENCE!
>>>> ΓƒΖ’Γ‚Β°Γƒβ€šΓ‚ΕΈΓƒβ€šΓ‚β€™Γƒβ€šΓ‚β€’ΓƒΖ’Γ‚Β°Γƒβ€šΓ‚ΕΈΓƒβ€šΓ‚β€™Γƒβ€šΓ‚ΕΎΓƒΖ’Γ‚Β°Γƒβ€šΓ‚ΕΈΓƒβ€šΓ‚β€™Γƒβ€šΓ‚β€’.eml": filename is invalid on this platform
I see the emoji characters themselves followed by the 24-bit value.
Just to be clear though, by "emoji characters themselves" I mean as they
are displayed as in my earlier post, not as they are displayed above.

If I pipe the filename into a file and octal dump the file, I see these
byte values (octal) where the emoji characters are:

360 237 222 225 360 237 222 236 360 237 222 225

which looks like UTF-8 encoding to me. Microsoft claims it uses UTF-16
for its filenames so I'd guess the end-result is the same.
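
A minimal sketch of the arithmetic behind that observation (an illustration only, not code from the thread): it decodes the first four octal bytes above as a single 4-byte UTF-8 sequence and then derives the UTF-16 surrogate pair for the same code point. The variable names are made up for this example, and it uses only integer arithmetic so it should not depend on how a given Tcl build stores non-BMP characters.

```tcl
# Octal byte values of the first emoji, copied from the octal dump.
set octalBytes {360 237 222 225}

# Convert each octal string to an integer (scan %o reads octal).
set bytes {}
foreach o $octalBytes {
    scan $o %o b
    lappend bytes $b
}

# Decode one 4-byte UTF-8 sequence: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx.
lassign $bytes b0 b1 b2 b3
set cp [expr {(($b0 & 0x07) << 18) | (($b1 & 0x3f) << 12) |
              (($b2 & 0x3f) << 6)  |  ($b3 & 0x3f)}]

# Split the code point into a UTF-16 surrogate pair.
set v  [expr {$cp - 0x10000}]
set hi [expr {0xd800 | ($v >> 10)}]
set lo [expr {0xdc00 | ($v & 0x3ff)}]

puts [format "code point U+%X, UTF-16 %04X %04X" $cp $hi $lo]
# prints: code point U+1F495, UTF-16 D83D DC95
```

The decoded value matches the 0x1f495 reported by DisplayCharCodes, so the UTF-8 bytes and the UTF-16 pair are two representations of the same character.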

Rolf Ade

Sep 8, 2023, 6:28:25 PM
For the record: I also saw the emojis as character glyphs. They are just
ordinary Unicode code points in UTF-8 encoding; your system should be
able to handle this, and Tcl certainly should.

This should be easy for any Windows user following along to test. The
file name in question is:

NEXT SAT. 2_15_ FADED LOVERS TOUR in PROVIDENCE! 💕💞💕.eml

Use the file explorer to create a file with that name. Then use Tcl
8.6.12 and look at what glob returns for the directory containing the
file. Then try to open the file name returned by glob and try file
exists on it.

On Linux this all works well. I used Emacs to create a file with the
name from above in an otherwise empty directory. Then, in an interactive
tclsh session:

glob *
{NEXT SAT. 2_15_ FADED LOVERS TOUR in PROVIDENCE! 💕💞💕.eml}
set filename [lindex [glob *] 0]
NEXT SAT. 2_15_ FADED LOVERS TOUR in PROVIDENCE! 💕💞💕.eml
set fd [open $filename]
file3
close $fd
file exists $filename
1

Mike, can you reproduce the issue with this recipe?

rolf

Michael Soyka

Sep 8, 2023, 8:40:46 PM
Assuming you meant running this in my Windows box, the answer is no- it
still fails in exactly the same way in a new, empty directory:

% set filename [lindex [glob -type f *] 0]
NEXT SAT. 2_15_ FADED LOVERS TOUR in PROVIDENCE! 💕💞💕.eml
% open $filename r
couldn't open "NEXT SAT. 2_15_ FADED LOVERS TOUR in PROVIDENCE!
💕💞💕.eml": filename is invalid on this platform

I haven't included the output from DisplayCharCodes but I promise it is
the same as shown way-back above.

Aside for the benefit of other readers.
Entering the emoji characters in Windows Explorer using the keyboard did
not work (I tried several methods). I had to create the characters in
Wordpad and paste them into Windows Explorer while renaming the file.

>
> rolf

Harald Oehlmann

Sep 9, 2023, 4:09:32 AM
I can confirm to be able to reproduce:
I am in a folder with only this file in c:\test NTFS file system

% set l [glob *]
{NEXT SAT. 2_15_ FADED LOVERS TOUR in PROVIDENCE! 💕💞💕.eml}
% set f [lindex $l 0]
NEXT SAT. 2_15_ FADED LOVERS TOUR in PROVIDENCE! 💕💞💕.eml
% file exists $f
0
% open $f r
couldn't open "NEXT SAT. 2_15_ FADED LOVERS TOUR in PROVIDENCE!
💕💞💕.eml": filename is invalid on this platform
% info patchlevel
8.6.13

This is Tcl 8.6.13, 32-bit, self-compiled with MS-VC6.

Take care,
Harald

Rolf Ade

Sep 9, 2023, 7:07:33 AM
Michael Soyka writes:
> On 09/08/2023 6:28 PM, Rolf Ade wrote:
>> This should be easy for any Windows user following along to test. The
>> file name in question is:
>>
>> NEXT SAT. 2_15_ FADED LOVERS TOUR in PROVIDENCE! 💕💞💕.eml
>>
>> Use the file explorer to create a file with that name. Then use Tcl
>> 8.6.12 and look at what glob returns for the directory containing the
>> file. Then try to open the file name returned by glob and try file
>> exists on it.
>> On Linux this all works well. [...]
>>
>> Mike, can you reproduce the issue with this recipe?
>
> Assuming you meant running this in my Windows box, the answer is no-
> it still fails in exactly the same way in a new, empty directory:
>
> % set filename [lindex [glob -type f *] 0]
> NEXT SAT. 2_15_ FADED LOVERS TOUR in PROVIDENCE! 💕💞💕.eml
> % open $filename r
> couldn't open "NEXT SAT. 2_15_ FADED LOVERS TOUR in PROVIDENCE!
> 💕💞💕.eml": filename is invalid on this platform

So you can reproduce the issue with this.

Since Harald has confirmed it, it is high time for a bug report. Please open a
ticket at https://core.tcl-lang.org/tcl. What you are trying to do should work,
and does work on Linux; this looks like a Windows platform issue.

rolf

Harald Oehlmann

Sep 9, 2023, 8:15:40 AM
Done here:

https://core.tcl-lang.org/tcl/info/43b065660532eb4a

Please continue the discussion there!

Thank you all,
Harald