[patch][Win32] Problem with very long file name


Ken Takata

Oct 12, 2014, 12:23:41 PM
to vim...@googlegroups.com
Hi,

When Vim opens a file with a very long name on Windows, the file is truncated
to an empty file if 'encoding' is not utf-8.
E.g.:

Step 1: Create a file with a very long name (longer than MAX_PATH bytes and
shorter than MAX_PATH characters)

> vim あいうえおあいうえおあいうえおあいうえおあいうえおあいうえおあいうえおあいうえおあいうえおあいうえおあいうえおあいうえおあいうえおあいうえおあいうえおあいうえおあいうえおあいうえおあいうえおあいうえお.txt
(A file name with 200 multibyte characters and '.txt')
iHello<Esc>
:wq

The length of the file name is:
- 404 bytes in CP932 (longer than MAX_PATH bytes)
- 204 characters in UTF-16LE (shorter than MAX_PATH characters)
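
The two lengths above can be reproduced with a small sketch. It assumes the name is available as valid UTF-8 and that every non-ASCII character takes two bytes in the code page, which holds for the hiragana used here but not for every CP932 character (half-width katakana take one byte). `measure_name` and `name_lengths` are hypothetical names, not part of Vim.

```c
#include <stddef.h>

/* Hypothetical result type: the two ways of measuring one file name. */
typedef struct {
    size_t utf16_units;   /* what the wide (W) Win32 APIs count */
    size_t dbcs_bytes;    /* estimated bytes in a double-byte code page */
} name_lengths;

/*
 * Walk a NUL-terminated UTF-8 string and count both lengths.
 * Assumption: every non-ASCII character is 2 bytes in the code page.
 * Assumption: the input is valid UTF-8.
 */
name_lengths measure_name(const char *utf8)
{
    name_lengths r = {0, 0};
    const unsigned char *p;

    for (p = (const unsigned char *)utf8; *p != 0; p++) {
        unsigned char c = *p;

        if ((c & 0xC0) == 0x80)
            continue;               /* continuation byte of a sequence */
        if (c < 0x80) {             /* ASCII: 1 unit, 1 byte */
            r.utf16_units += 1;
            r.dbcs_bytes += 1;
        } else if (c >= 0xF0) {     /* outside the BMP: surrogate pair */
            r.utf16_units += 2;
            r.dbcs_bytes += 2;      /* rough estimate only */
        } else {                    /* other BMP character */
            r.utf16_units += 1;
            r.dbcs_bytes += 2;
        }
    }
    return r;
}
```

For a name made of 200 hiragana characters followed by ".txt", this gives 204 UTF-16 code units and 404 code-page bytes, matching the figures above.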

Step 2: Open the file again
> vim あいうえおあいうえおあいうえおあいうえおあいうえおあいうえおあいうえおあいうえおあいうえおあいうえおあいうえおあいうえおあいうえおあいうえおあいうえおあいうえおあいうえおあいうえおあいうえおあいうえお.txt

The file becomes empty (when swapfile is on).


The following block is extracted from findswapname() in memline.c:

    /*
     * If we start editing a new file, e.g. "test.doc", which resides on an
     * MSDOS compatible filesystem, it is possible that the file
     * "test.doc.swp" which we create will be exactly the same file. To avoid
     * this problem we temporarily create "test.doc". Don't do this when the
     * check below for a 8.3 file name is used.
     */
    if (!(buf->b_p_sn || buf->b_shortname) && buf_fname != NULL
            && mch_getperm(buf_fname) < 0)
        dummyfd = mch_fopen((char *)buf_fname, "w");

If buf_fname is longer than MAX_PATH bytes, mch_getperm() always returns an
error, so Vim concludes that the file does not exist. mch_fopen() with "w"
then succeeds and truncates the existing file to an empty file.


open() and fopen() are implemented with CreateFileA(), and CreateFileA() can
handle up to MAX_PATH characters (*1).
However, some other CRT functions are limited to MAX_PATH bytes (e.g. stat()).
This inconsistency causes the problem.

(*1) http://msdn.microsoft.com/en-us/library/windows/desktop/aa363858.aspx
"In the ANSI version of this function, the name is limited to MAX_PATH
characters."

To avoid this problem, the maximum length should be limited to MAX_PATH bytes.
(Anyone who wants to handle a file name that is longer than MAX_PATH bytes but
shorter than MAX_PATH characters should set 'enc' to utf-8.)
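
As an illustration of limiting the length to MAX_PATH bytes, a guard along the following lines could be applied before any ANSI API call. This is only a sketch and is not taken from the attached patch: `fname_fits_ansi_limit` is a hypothetical name, MAX_PATH is defined locally as 260 when the Windows headers are absent, and counting the terminating NUL against the limit is an assumption.

```c
#include <stddef.h>
#include <string.h>

#ifndef MAX_PATH
# define MAX_PATH 260   /* value used by the Windows headers */
#endif

/*
 * Hypothetical helper, not from the attached patch.
 * Return 1 if fname (in the active code page) fits in MAX_PATH bytes
 * including the terminating NUL, otherwise 0.
 */
int fname_fits_ansi_limit(const char *fname)
{
    return fname != NULL && strlen(fname) < MAX_PATH;
}
```

A caller would skip the temporary mch_fopen(buf_fname, "w") whenever this returns 0, so that a failed permission check on an over-long name can no longer lead to truncating an existing file.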

I wrote a patch to fix this problem.
Please check the attached patch.

Regards,
Ken Takata
fix-long-filename-with-ANSI-API.patch

Bram Moolenaar

Oct 15, 2014, 6:57:03 AM
to Ken Takata, vim...@googlegroups.com
Can't we fix the places that are restricted to MAX_PATH bytes? If this
is 256 bytes that's quite short. Perhaps we can change a few places to
use MAX_PATH * 2, since this appears to be a problem with double-byte
encodings only.

We could check somewhere that MAX_PATH is 256, and then define another
variable to 512. Perhaps MAX_PATH_BYTES.

--
Edison's greatest achievement came in 1879, when he invented the
electric company. Edison's design was a brilliant adaptation of the
simple electrical circuit: the electric company sends electricity
through a wire to a customer, then immediately gets the electricity
back through another wire

/// Bram Moolenaar -- Br...@Moolenaar.net -- http://www.Moolenaar.net \\\
/// sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
\\\ an exciting new programming language -- http://www.Zimbu.org ///
\\\ help me help AIDS victims -- http://ICCF-Holland.org ///

Ken Takata

Oct 16, 2014, 10:01:23 AM
to vim...@googlegroups.com, ktakat...@gmail.com
Hi Bram,

2014/10/15 Wed 19:57:03 UTC+9 Bram Moolenaar wrote:


> Ken Takata wrote:
>
> > open() and fopen() are implemented with CreateFileA(), and CreateFileA() can
> > handle maximum MAX_PATH characters (*1).
> > However, some other CRT functions are limited to MAX_PATH bytes (e.g. stat()).
> > This inconsistency causes the problem.
> >
> > (*1) http://msdn.microsoft.com/en-us/library/windows/desktop/aa363858.aspx
> > "In the ANSI version of this function, the name is limited to MAX_PATH
> > characters."
> >
> > To avoid this problem, maximum length should be limited to MAX_PATH bytes.
> > (If someone want to handle a file name which is longer than MAX_PATH bytes and
> > shorter than MAX_PATH characters, he should set 'enc' to utf-8.)
> >
> > I wrote a patch to fix this problem.
> > Please check the attached patch.
>
> Can't we fix the places that are restricted to MAX_PATH bytes? If this
> is 256 bytes that's quite short. Perhaps we can change a few places to
> use MAX_PATH * 2, since this appears to be a problem with double-byte
> encodings only.
>
> We could check somewhere that MAX_PATH is 256, and then define another
> variable to 512. Perhaps MAX_PATH_BYTES.

First of all, MAX_PATH (or _MAX_PATH) is 260, not 256.

I think there are two problems, if 'enc' is not utf-8:

1. Vim cannot properly handle a filename that is longer than MAX_PATH bytes
but shorter than MAX_PATH characters.
2. Vim destroys an existing file if its filename is longer than MAX_PATH bytes
but shorter than MAX_PATH characters. This is a special case of 1 and
a critical problem.

My patch fixes only problem 2.
I don't think problem 1 needs to be fixed, because it does not occur with
enc=utf-8. When 'enc' is utf-8, Vim can handle a very long filename
properly.

Using MAX_PATH * 2 bytes doesn't solve problem 1, because stat(),
FindFirstFileA() and some other API/CRT functions cannot handle names longer
than MAX_PATH bytes. E.g. FindFirstFileA() uses the WIN32_FIND_DATAA structure
(*1), whose cFileName member is only MAX_PATH bytes.
(*1) http://msdn.microsoft.com/en-us/library/windows/desktop/aa365740.aspx
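
The fixed-size member can be pictured with a simplified stand-in. The struct below is not the real WIN32_FIND_DATAA (which has more members); `find_data_sketch` and `store_found_name` are hypothetical names used only to show that a name of MAX_PATH bytes or more cannot fit, however large the caller's own buffers are.

```c
#include <stddef.h>
#include <string.h>

#ifndef MAX_PATH
# define MAX_PATH 260   /* value used by the Windows headers */
#endif

/* Simplified stand-in: only the member discussed here. */
typedef struct {
    char cFileName[MAX_PATH];
} find_data_sketch;

/*
 * Copy name into the fixed buffer.
 * Return 0 on success, -1 if the name (plus NUL) does not fit.
 */
int store_found_name(find_data_sketch *fd, const char *name)
{
    size_t len = strlen(name);

    if (len >= sizeof(fd->cFileName))
        return -1;
    memcpy(fd->cFileName, name, len + 1);   /* includes the NUL */
    return 0;
}
```

With this stand-in, a 404-byte name is rejected while a short name is stored, which is why enlarging Vim's buffers to MAX_PATH * 2 alone would not help.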

If we want to support a very long filename even when 'enc' is not utf-8, we
should always use the wide APIs on WinNT and later. That would require
modifying many parts of the code and carries a very high risk, I think.

Regards,
Ken Takata

Charles Campbell

Oct 16, 2014, 11:49:55 AM
to vim...@googlegroups.com
Ken Takata wrote:
<snip>
> To avoid this problem, maximum length should be limited to MAX_PATH
> bytes. (If someone want to handle a file name which is longer than
> MAX_PATH bytes and shorter than MAX_PATH characters, he should set
> 'enc' to utf-8.) I wrote a patch to fix this problem. Please check the
> attached patch. Regards, Ken Takata
<snip>

As an English speaker and writer, I would seldom encounter this problem
personally. However, doesn't utf-32 permit 4-byte characters, and
wouldn't such characters in a sufficiently long filename still be
problematic?

Regards,
Chip Campbell
