I have to say, this report is anything but informative, and I have (had?) a hard time making sense of some sections:
> Files and directories in Unicode (UTF-8) cannot be dealt with.
Do you mean that Mercurial does not transcode file names from some original encoding to NTFS-flavored UTF-16 [-1]?
> Our Problem
You don't really explain your problem, give an example, or allow for a repro. I do not see how the core team could help you without any useful information.
I am guessing you created some non-ASCII file names on a system using UTF-8 (likely a Linux system?), and the names came out garbled on Windows? I created a test repository containing non-ASCII file names on OS X [0] and could reproduce the issue (I believe) after cloning it under Windows 7 using TortoiseHg: the names come out OK in TortoiseHg's log [1] but not on the file system [2]. There was no problem editing a file, even with the most basic text editor available on the platform (Notepad): I created a revision by opening a file in Notepad and adding a line taken from Wikipedia (Korean I believe, I just took a line from a random non-European-language Wikipedia) [3].
Have you checked the Mercurial wiki page on encoding, at least as a reference [4]? As section 4 explains, Mercurial assumes that non-ASCII file names are not portable between systems (because they are not) and (as far as I understand what I read, I may be wrong) treats file names as byte streams, without trying to manipulate them in any way. This is similar to what Unix filesystems (e.g. the ext* family) generally do.
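To illustrate the byte-stream point, here is a minimal Python sketch of how a name stored as UTF-8 bytes can come out garbled when the same bytes are interpreted under another encoding. The file name and the choice of Windows-1252 (cp1252) as the "other" encoding are assumptions for the example; the code page actually in use depends on the Windows installation.

```python
# Hypothetical file name (U+00E9 is "e with acute"); any non-ASCII name
# behaves the same way.
name = "caf\u00e9.txt"

# The bytes a UTF-8 system would hand to a tool that stores names as bytes.
stored = name.encode("utf-8")

# A system that reads those same bytes as cp1252 (assumed here) sees two
# characters where the original had one.
garbled = stored.decode("cp1252")

print(stored)            # b'caf\xc3\xa9.txt'
print(garbled == name)   # False
```

The bytes never change; only their interpretation does, which is consistent with the log looking fine while the file system does not.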
> As Unicode is not canonical (an accented letter may also be letter+accent), one probably should also use normalization like NFC / NFKC.
I see no reason to do that, short of a tool or filesystem dealing incorrectly with a normalization it does not like.
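For reference, this is the distinction the quoted remark is about, sketched with Python's standard unicodedata module. The sample character is my own choice; the point is only that the two forms look identical on screen but are different strings and different bytes, so a byte-oriented tool treats them as two distinct names.

```python
import unicodedata

# Same visible character, two representations.
composed = unicodedata.normalize("NFC", "e\u0301")   # single code point U+00E9
decomposed = unicodedata.normalize("NFD", "\u00e9")  # 'e' followed by U+0301

print(composed == decomposed)       # False
print(composed.encode("utf-8"))     # b'\xc3\xa9'
print(decomposed.encode("utf-8"))   # b'e\xcc\x81'
```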
[-1] http://blogs.msdn.com/b/michkap/archive/2006/09/10/748699.aspx
[0] https://bitbucket.org/masklinn/filenames
[1] http://imgur.com/5jkNV
[2] http://imgur.com/16Rwp
[3] https://bitbucket.org/masklinn/filenames/changeset/cbcbdaf5c8f6
[4] http://mercurial.selenic.com/wiki/EncodingStrategy
_______________________________________________
Mercurial mailing list
Merc...@selenic.com
http://selenic.com/mailman/listinfo/mercurial
> On 2011-04-14, at 13:12 , Joop Eggen wrote:
>
>> Our Problem
>
> You don't really explain your problem, give any example, allow for any
> repro or anything like that, I do not see how the core team could help
> you without you providing any useful information.
Well, there is no need for a reproduction script -- the problem he
describes is unfortunately a well-known problem in Mercurial: filenames
are read and written as bytes, whereas other systems, Subversion in
particular, write them as Unicode characters.
> Have you checked the Mercurial Wiki page on encoding, at least as
> reference? [4]. Quite simply (see section 4) mercurial simply assumes
> that non-ascii file names are not portable between systems (because
> they are not)
That is not true in general: it depends on the tool you use to access
the files. Modern tools like OpenOffice and Subversion use the current
locale settings to decode the bytes into Unicode characters.
Older tools like make and CVS have no clue about character encodings in
filenames, and Mercurial has unfortunately chosen to follow that
old tradition.
(I know that there might be filenames that cannot be decoded or encoded
with the current locale settings, but I feel that it would be better if
Mercurial dealt with that instead of punting on the issue.)
--
Martin Geisler
Mercurial links: http://mercurial.ch/
2011/4/16, Martin Geisler <m...@lazybytes.net>:
--
Sent from my mobile device
Yours sincerely,
Yonggang Luo (罗勇刚)
On 14.10.2011 at 10:59, "Andrey" <py4...@gmail.com> wrote:
>
> As far as I can see the message is clear: the problem is known but ignored.
As I understood it, it's more like "nobody knows how to fix it properly without breaking other use cases."