
opening files with unicode characters in the file name on windows


Mathias Dahl

Aug 2, 2004, 8:53:50 AM
I cannot get Emacs to open a file that has a file name
with Unicode characters in it. I created these file
names by copy-pasting from the Character Map tool in
Windows. As Emacs has good support for reading "Unicode
formats" like UTF-8, UTF-16 etc., it is a pity that it cannot
open these files.

My emacs version:

GNU Emacs 21.3.50.1 (i386-mingw-nt5.1.2600) of 2004-07-09 on FARIBA

OS:

Windows XP

Any suggestions on how I could open these files (other than
renaming them, of course) are appreciated.

Kevin Rodgers

Aug 2, 2004, 12:14:47 PM

Does (setq file-name-coding-system 'utf-8) help?

--
Kevin Rodgers

Mathias Dahl

Aug 3, 2004, 2:32:15 AM
Kevin Rodgers <ihs_...@yahoo.com> writes:

> > I cannot get Emacs to open a file that has a file name
> > with Unicode characters in it. I created these file
> > names by copy-pasting from the Character Map tool in
> > Windows. As Emacs has good support for reading "Unicode
> > formats" like UTF-8, UTF-16 etc., it is a pity that it cannot
> > open these files.

> Does (setq file-name-coding-system 'utf-8) help?

No, even though it was a very interesting option. When I set
that variable I can *save* files, and the file names look
very cryptic in explorer.exe, probably because Windows uses
UTF-16. But when I set the variable to UTF-16, Emacs seems
to lock up and I have to press C-g almost the whole time,
VERY strange...

Anyway, if I used UTF-8 and saved a file containing Swedish
characters, the file name showed the correct characters in,
for example, dired, but Windows showed it as garbage.
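A small Python illustration (Python is not used anywhere in this thread; it is only a convenient way to show bytes) of why a UTF-8 name can look like garbage in Explorer. It assumes, without confirmation from the thread, that Emacs hands the encoded bytes to the Windows ANSI file API, which then reads them in the Western European codepage cp1252. The function name `ansi_view` is hypothetical.

```python
# Sketch only. Assumption: the bytes produced by file-name-coding-system
# are interpreted by Windows in the ANSI codepage (cp1252 here).

def ansi_view(name: str, file_name_coding: str = "utf-8",
              ansi_codepage: str = "cp1252") -> str:
    """Return `name` as Windows would display it after the byte round trip."""
    raw = name.encode(file_name_coding)                  # bytes Emacs writes
    return raw.decode(ansi_codepage, errors="replace")   # bytes read as ANSI


for name in ["åäö.txt", "räksmörgås.txt"]:
    print(name, "->", ansi_view(name))                # garbled: UTF-8 vs cp1252
    print(name, "->", ansi_view(name, "cp1252"))      # clean: encodings match
```

With UTF-8, "å" becomes the two bytes C3 A5, which cp1252 displays as "Ã¥". When the coding system and the codepage agree, the name survives unchanged.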

Is UTF-16 not supported in this case or do I have an emacs
that is buggy (I'm using CVS stuff after all)?

Eli Zaretskii

Aug 3, 2004, 3:19:31 PM
to help-gn...@gnu.org
> From: Mathias Dahl <brakj...@hotmail.com>
> Newsgroups: gnu.emacs.help
> Date: 03 Aug 2004 08:32:15 +0200

>
> > Does (setq file-name-coding-system 'utf-8) help?
>
> No, even though it was a very interesting option. When I set
> that variable I can *save* files, and the file names look
> very cryptic in explorer.exe, probably because Windows uses
> UTF-16

Your original message said ``file names with Unicode characters''.
Can you tell which characters those are, and why you think they are
encoded in some Unicode-related encoding, like UTF-16? Can you look
at the file's name as recorded in the directory with some low-level
tool that actually shows the byte values that encode the file's name?

You see, I suspect that Windows file names are encoded in the system
codepage, not in UTF-16. So perhaps setting file-name-coding-system
to that codepage would solve the problem.


Mathias Dahl

Aug 4, 2004, 3:46:35 AM
"Eli Zaretskii" <el...@gnu.org> writes:

> > From: Mathias Dahl <brakj...@hotmail.com>
> > Newsgroups: gnu.emacs.help
> > Date: 03 Aug 2004 08:32:15 +0200
> >
> > > Does (setq file-name-coding-system 'utf-8) help?
> >
> > No, even though it was a very interesting option. When I set
> > that variable I can *save* files, and the file names look
> > very cryptic in explorer.exe, probably because Windows uses
> > UTF-16

> Your original message said ``file names with Unicode
> characters''. Can you tell which characters those are, and
> why you think they are encoded in some Unicode-related
> encoding, like UTF-16?

Well, a couple of weeks ago I was surfing around quite a bit
because I had to debug some Unicode issues in our
applications. Everywhere I went I read about how Microsoft uses
Unicode internally for strings, and also in file names. And
as they say that they use UTF-16 for strings and file
content, I just assumed they used it for encoding file names
too. But of course I may be wrong. And I really mean that, as
I am a complete beginner when it comes to Unicode.

> Can you look at the file's name as recorded in the
> directory with some low-level tool that actually shows the
> byte values that encode the file's name?

No, but I would really like to. :)
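For readers who want such a tool, here is a minimal Python sketch (Python 3.8 or later; not something from the thread). It lists a folder and, for each name containing a character above U+00FF, prints the name's bytes in cp1252 (or a note that it has none) plus each character's code point and UTF-16LE bytes, the form used by the Windows wide-character ("W") APIs. The helper names, the folder argument, and the output layout are hypothetical choices for the illustration.

```python
import os
import sys


def name_bytes(name: str, encoding: str) -> str:
    """Hex dump of `name` in `encoding`, or a note if it cannot be encoded."""
    try:
        return name.encode(encoding).hex(" ")
    except UnicodeEncodeError:
        return f"(not representable in {encoding})"


def describe(name: str) -> list:
    """Per-character rows: (char, U+XXXX code point, UTF-16LE bytes)."""
    return [(ch, f"U+{ord(ch):04X}", name_bytes(ch, "utf-16-le")) for ch in name]


if __name__ == "__main__":
    folder = sys.argv[1] if len(sys.argv) > 1 else "."
    for entry in os.listdir(folder):        # str argument -> str names back
        if any(ord(ch) > 255 for ch in entry):
            print(entry)
            print("  cp1252:", name_bytes(entry, "cp1252"))
            for ch, cp, raw in describe(entry):
                print(f"  {ch}\t{cp}\t{raw}")
```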



> You see, I suspect that Windows file names are encoded in
> the system codepage, not in UTF-16. So perhaps setting
> file-name-coding-system to that codepage would solve the
> problem.

Hmm, ok. I will try that, I just have to figure out which
code page I am currently using. Thanks for the tip, I will
report back my findings here.

Btw, is there some more "low-level" way of opening files in
Emacs so that I can open ANY file regardless of how the file
name is encoded?

/Mathias

jasonr

Aug 4, 2004, 3:56:08 AM
Mathias Dahl <brakj...@hotmail.com> writes:

> Hmm, ok. I will try that, I just have to figure out which
> code page I am currently using. Thanks for the tip, I will
> report back my findings here.

Take a look at the value of locale-coding-system. That is the most
likely candidate for file-name-coding-system.

Mathias Dahl

Aug 4, 2004, 4:42:34 AM

I tried these now (I actually think they are the same):

(setq file-name-coding-system 'cp1252)
(setq file-name-coding-system 'windows-1252)

And I cannot open the files with Cyrillic, Arabic or
Hebrew characters in them. I am almost convinced that
Windows *does* encode them with UTF-16, but when I set UTF-16
as file-name-coding-system, Emacs freezes whatever I do and I
have to keep pressing C-g to unfreeze it. :(
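If Emacs 21 on Windows reaches files through the ANSI file API (an assumption, not something confirmed in the thread), then a name is only reachable when every character in it exists in the codepage that file-name-coding-system is set to. cp1252 has no Cyrillic, Arabic or Hebrew letters, which would be consistent with the result above. A small Python check; the function name is hypothetical and cp1251 is used only as an example of a Cyrillic codepage:

```python
# Sketch. Assumption: a file name must be encodable in the codepage that
# file-name-coding-system is set to before Emacs can pass it to Windows.

def fits_codepage(name: str, codepage: str = "cp1252") -> bool:
    """True if every character of `name` has a byte in `codepage`."""
    try:
        name.encode(codepage)
    except UnicodeEncodeError:
        return False
    return True


for name in ["räksmörgås.txt", "pravda_правда.txt"]:
    for cp in ["cp1252", "cp1251"]:
        print(f"{name} in {cp}: {fits_codepage(name, cp)}")
```

Neither cp1252 nor cp1251 covers both the Swedish and the Cyrillic name, so under this assumption a single codepage setting can only reach one of them.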

Mathias Dahl

Aug 4, 2004, 10:27:05 AM
"Eli Zaretskii" <el...@gnu.org> writes:

> Your original message said ``file names with Unicode characters''.
> Can you tell which characters those are, and why you think they
> are encoded in some Unicode-related encoding, like UTF-16? Can you
> look at the file's name as recorded in the directory with some
> low-level tool that actually shows the byte values that encode the
> file's name?

I have done some investigation and I am pretty sure UTF-16 is the
encoding used. The following VBScript program (sorry for pasting
non-Emacs-related stuff here) loops through all files in a folder and,
if a file name contains any character with a value above 255, displays
the name's characters with their Unicode code point values:

' -- TestUnicodeFileNames.vbs ---

Option Explicit

' --------- Main program starts

Dim oFSO
Dim oFile

Set oFSO = CreateObject("Scripting.FileSystemObject")

For Each oFile In oFSO.GetFolder("c:\document\my docs").Files
    checkUnicodeFileName oFile.Name
Next

Set oFSO = Nothing

' --------- Main program ends

' Shows a message box for a file name that contains at least one
' character above U+00FF.
Private Sub checkUnicodeFileName(fileName)

    Dim i
    Dim c
    Dim n

    For i = 1 To Len(fileName)

        c = Mid(fileName, i, 1)
        n = AscW(c)
        ' AscW returns a signed 16-bit value, so characters at or above
        ' U+8000 come back negative; shift them into 0..65535.
        If n < 0 Then n = n + 65536

        If n > 255 Then
            MsgBox "File name contains unicode characters: " & _
                Chr(10) & Chr(10) & _
                "File name: " & fileName & _
                Chr(10) & Chr(10) & _
                "Characters and their unicode code points:" & _
                Chr(10) & Chr(10) & _
                getStringInfo(fileName)
            Exit Sub
        End If

    Next

End Sub

' Returns one line per character: the character, a tab, and its code
' point as four hex digits.
Private Function getStringInfo(s)
    Dim i
    Dim n
    Dim c
    Dim result

    result = "Char" & Chr(9) & "U+NNNN" & Chr(10) & Chr(10)

    For i = 1 To Len(s)
        c = Mid(s, i, 1)
        n = AscW(c)
        If n < 0 Then n = n + 65536   ' same sign fix as above
        result = result & c & Chr(9) & Right("0000" & Hex(n), 4) & Chr(10)
    Next

    getStringInfo = result

End Function

' -- TestUnicodeFileNames.vbs ends here ---

The output looks like this (you may not see the actual characters
here; I see them when I use a "Unicode font" for message boxes):

File name contains unicode characters:

File name: pravda_правда.txt

Characters and their unicode code points:

Char U+NNNN

p 0070
r 0072
a 0061
v 0076
d 0064
a 0061
_ 005F
п 043F
р 0440
а 0430
в 0432
д 0434
а 0430
. 002E
t 0074
x 0078
t 0074

/Mathias

Eli Zaretskii

Aug 4, 2004, 12:29:53 PM
to help-gn...@gnu.org
> From: Mathias Dahl <brakj...@hotmail.com>
> Newsgroups: gnu.emacs.help
> Date: 04 Aug 2004 10:42:34 +0200

>
> I am almost convinced that Windows *does* encode them with UTF-16, but
> when I set UTF-16 as file-name-coding-system, Emacs freezes whatever
> I do and I have to keep pressing C-g to unfreeze it. :(

What value, exactly, did you try to use for file-name-coding-system?


Mathias Dahl

Aug 5, 2004, 7:28:26 AM
"Eli Zaretskii" <el...@gnu.org> writes:

> > I am almost convinced that Windows *does* encode them with UTF-16,
> > but when I set UTF-16 as file-name-coding-system, Emacs freezes
> > whatever I do and I have to keep pressing C-g to unfreeze it. :(
>
> What value, exactly, did you try to use for file-name-coding-system?

I did this:

(setq file-name-coding-system 'utf-16)

I also tested this after starting up with --no-init-file.

If I do

(setq file-name-coding-system 'utf-8)

it works, but my file names look very funny in Explorer.exe if I save
a file with, for example, Swedish characters... :)

I tried something similar with Emacs 21.3 on Mandrake at home, setting
the coding system to mule-utf-16-le, and now my PuTTY window sits
there, frozen... :)

Eli Zaretskii

Aug 6, 2004, 5:38:45 AM
to help-gn...@gnu.org
> From: Mathias Dahl <brakj...@hotmail.com>
> Newsgroups: gnu.emacs.help
> Date: 05 Aug 2004 13:28:26 +0200

>
> > > I am almost convinced that Windows *does* encode them with UTF-16,
> > > but when I set UTF-16 as file-name-coding-system, Emacs freezes
> > > whatever I do and I have to keep pressing C-g to unfreeze it. :(
> >
> > What value, exactly, did you try to use for file-name-coding-system?
>
> I did this:
>
> (setq file-name-coding-system 'utf-16)

Sounds like a bug, so please take this to gnu.emacs.bug.

Meanwhile, if you set debug-on-quit to t and repeat what you described
above, what traceback do you see after C-g? That traceback should
show which function is inflooping for utf-16.


Mathias Dahl

Aug 6, 2004, 7:44:45 AM
"Eli Zaretskii" <el...@gnu.org> writes:

> Sounds like a bug, so please take this to gnu.emacs.bug.

Last time I tried to post there, I got a reply at my e-mail
address saying that the article could not be posted. Is there
another interface than news, maybe an e-mail gateway?



> Meanwhile, if you set debug-on-quit to t and repeat what
> you described above, what traceback do you see after C-g? That
> traceback should show which function is inflooping for
> utf-16.

This is what I get:

Debugger entered--Lisp error: (quit)
utf-8-pre-write-conversion(1 4)
file-exists-p("h:/")
make-directory("h:/.emacs.d/auto-save-list/" t)

This is what I do to get this error:

1. Start Emacs using --no-init-file
2. M-: (setq debug-on-quit t)
3. C-g, to make Emacs load the debugger, which it will not be
   able to load after I change file-name-coding-system
4. (setq file-name-coding-system 'utf-16)
5. Wait for a while
6. After switching back to Emacs, it has frozen
7. "Unfreeze" with C-g

The traceback is not always the one above, even though the
make-directory part seems to be there most of the time.

Also, I'll try setting my home directory to a local drive to see if
things change.

/Mathias

Mathias Dahl

Aug 6, 2004, 9:08:03 AM
"Eli Zaretskii" <el...@gnu.org> writes:

> >
> > I did this:
> >
> > (setq file-name-coding-system 'utf-16)
>
> Sounds like a bug, so please take this to gnu.emacs.bug.

I reported the bug via e-mail.

Thanks for all the suggestions!

/Mathias
