[BUG] 'non-empty string' >? '' returns false on amd64 arch

ZyX

May 23, 2011, 3:33:23 PM
to vim...@googlegroups.com, vim...@googlegroups.com
Reply to message «Re: question about string expression evaluation / bug?»,
sent 22:07:10 23 May 2011, Monday
by hsitz:

> Still wondering about the issue with different values for my empty
> string comparison, though. Seems like it must be a bug in either
> design or implementation of 'ignorecase'. I wonder whether with
> 'ignorecase' set the expression: 'abc' > '' returns 0 on 64-bit vim
> and 1 on 32-bit. . .

I guess it's time to post this to vim-dev as a bug, since we managed to deduce
the conditions under which it is reproduced. I personally was able to reproduce
it on vim-7.3.198 (--with-features=huge --enable-perlinterp --enable-tclinterp
--enable-luainterp --enable-rubyinterp --enable-python3interp, revision
f0cc719cd129) and vim-7.3.189 (USE='X acl bash-completion cscope gpm nls perl
python ruby vim-pager -debug -minimal') from the Gentoo repos on amd64.

Original message:
> On May 23, 8:12 am, hsitz <hes...@gmail.com> wrote:
> > > This is a reason why I never use `==', `!=', `>', `>=', `<', `<=' for
> > > comparing strings, only `is'/`isnot' (it looks better than `==#' and
> > > `!=#') and operators with either `?' or `#' at the end.
>
> Zyx -- I didn't even realize 'is'/'isnot' were defined for strings.
> However, it seems that they are equivalent to '==' and '!=' and not
> the matchcase operators you suggest. From the docs:
> "the original |List|. When using "is" without a |List| it is
> equivalent to
> using "equal", using "isnot" equivalent to using "not equal". Except
> that a
> different type means the values are different. "4 == '4'" is true, "4
> is '4'"
> is false."
>
> E.g.,
>
> :set ignorecase
> :echo 'abc' is 'ABC' (output is 1)
> :echo 'abc' == 'ABC' (output is 1)
> :echo 'abc' ==# 'ABC' (output is 0)
>
> Your point about specifying matchcase or ignorecase expressly is a
> good one. I will be modifying my code to do that.
>
> Still wondering about the issue with different values for my empty
> string comparison, though. Seems like it must be a bug in either
> design or implementation of 'ignorecase'. I wonder whether with
> 'ignorecase' set the expression: 'abc' > '' returns 0 on 64-bit vim
> and 1 on 32-bit. . .
>
> -- Herb
>
> > Zyx -- Thanks very much, I think you're onto something.
> >
> > However on my machine the two expressions you give above both evaluate
> > to 1. What is the explanation for the difference?:
> > ------------------------------------------
> > :echo 'DONE' ># ''
> > 1
> >
> > :echo 'DONE' >? ''
> >
> > 1
> > ----------------------------------------

Ivan Krasilnikov

May 24, 2011, 10:21:00 AM
to vim...@googlegroups.com, vim...@googlegroups.com
+vim_dev@

On Tue, May 24, 2011 at 18:14, Ivan Krasilnikov <inf...@gmail.com> wrote:
> I confirm the problem. Looks like there's a bug in UTF-8 handling in
> function mb_strnicmp() in mbyte.c, specifically in the following "if"
> which was introduced by patch 7.3.040:
>
> /* Don't case-fold illegal bytes or truncated characters. */
> if (utf_ptr2len(s1 + i) < l || utf_ptr2len(s2 + i) < l)
>  return -1;
>
> The check "utf_ptr2len(s2 + i) < l" is wrong.
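
To make the failure mode concrete, here is a toy Python model of that check. It is not Vim's code: `char_len`, `buggy_icmp` and `checked_icmp` are hypothetical names, the model handles ASCII only, and it assumes the length helper reports 0 once the end of a string is reached. Under that assumption the check fires whenever the second string simply ends first, so a non-empty string compares as "less than" an empty one.

```python
def char_len(buf: bytes, pos: int) -> int:
    """Toy stand-in for a length helper: 0 at end of string, else 1 (ASCII only)."""
    return 0 if pos >= len(buf) else 1


def sign(a, b) -> int:
    """Return -1, 0 or 1, like a C comparison function."""
    return (a > b) - (a < b)


def buggy_icmp(s1: bytes, s2: bytes) -> int:
    """Case-insensitive compare that mimics the questioned check."""
    i = 0
    while i < len(s1):
        expected = 1  # length implied by s1's lead byte; always 1 for ASCII
        # Questioned check: also true when s2 has merely ended (ASSUMPTION: length 0).
        if char_len(s1, i) < expected or char_len(s2, i) < expected:
            return -1
        c1, c2 = chr(s1[i]).lower(), chr(s2[i]).lower()
        if c1 != c2:
            return sign(c1, c2)
        i += expected
    return 0 if i == len(s2) else -1


def checked_icmp(s1: bytes, s2: bytes) -> int:
    """Same loop, but the end of s2 is handled before anything else."""
    i = 0
    while i < len(s1):
        if i >= len(s2):
            return 1  # s1 is longer, so it sorts after s2
        c1, c2 = chr(s1[i]).lower(), chr(s2[i]).lower()
        if c1 != c2:
            return sign(c1, c2)
        i += 1
    return 0 if i == len(s2) else -1


print(buggy_icmp(b"abc", b""), buggy_icmp(b"", b"abc"))
print(checked_icmp(b"abc", b""), checked_icmp(b"", b"abc"))
```

In the toy model the buggy variant returns -1 for both argument orders, so neither string is ever reported as greater than the other.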

Ivan Krasilnikov

May 24, 2011, 8:56:47 PM
to vim...@googlegroups.com, vim...@googlegroups.com
Also mb_strnicmp() assumes that lowercase and uppercase characters
have the same length in UTF-8 representation. This isn't the case.
Here are a few counterexamples:

$ python -c 'print " ".join(["0x%.2X" % n for n in range(65536) if
len(unichr(n).encode("utf8")) !=
len(unichr(n).lower().encode("utf8"))])'

0x130 0x23A 0x23E 0x1E9E 0x2126 0x212A 0x212B 0x2C62 0x2C64 0x2C6D 0x2C6E 0x2C6F
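
For anyone repeating the search today, a Python 3 sketch of the same check follows. The function name `case_length_mismatches` is made up for illustration. Surrogate code points are skipped because Python 3 refuses to encode them as UTF-8, and the exact list printed depends on the Unicode tables shipped with the interpreter, so it may not match the output above.

```python
def case_length_mismatches(limit: int = 0x10000) -> list:
    """Code points whose lowercase form has a different UTF-8 byte length."""
    found = []
    for n in range(limit):
        if 0xD800 <= n <= 0xDFFF:
            continue  # surrogates cannot be encoded as UTF-8 in Python 3
        ch = chr(n)
        if len(ch.encode("utf-8")) != len(ch.lower().encode("utf-8")):
            found.append(n)
    return found


mismatches = case_length_mismatches()
print(" ".join(f"0x{n:02X}" for n in mismatches))
```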

So I think the UTF-8 part of mb_strnicmp() needs to be completely rewritten.

Tony Mechelynck

May 24, 2011, 9:49:03 PM
to vim...@googlegroups.com, Ivan Krasilnikov, vim...@googlegroups.com

Yes, and in Turkish (i.e. with ":lang ctype tr" and 'casemap' empty), I
and i (1 byte each) have as respective case-counterparts ı and İ (2
bytes each).
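
The byte counts are easy to confirm in Python 3. This sketch only measures UTF-8 lengths; it does not apply Turkish case mapping, and `utf8_len` is a helper named here for illustration.

```python
def utf8_len(ch: str) -> int:
    """Number of bytes a character takes when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


# (ASCII letter, its Turkish case-counterpart)
pairs = [("I", "\u0131"), ("i", "\u0130")]  # dotless i, capital I with dot above
for ascii_letter, turkish_letter in pairs:
    print(ascii_letter, utf8_len(ascii_letter), turkish_letter, utf8_len(turkish_letter))
```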


Best regards,
Tony.
--
hundred-and-one symptoms of being an internet addict:
94. Now admit it... How many of you have made "modem noises" into
the phone just to see if it was possible? :-)

Ivan Krasilnikov

May 25, 2011, 2:39:24 PM
to vim...@googlegroups.com, vim...@googlegroups.com
On Wed, May 25, 2011 at 14:09, Bram Moolenaar <Br...@moolenaar.net> wrote:
> Yes, this code just returns -1, no matter if the first or second string
> is bigger.
>
> Your other remark about difference in byte length of a character is
> right, but it's not so easy to fix.  Can you suggest a patch?
> Preferably with a test.

Hi, here's my patch for mbyte.c and a few testcases.

I've eliminated those return -1's by doing a bytewise comparison of
strings after the first corrupted character. This should make the
comparisons transitive at least.
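
For readers without the attachment, the idea can be sketched in Python roughly as follows. This is not the patch itself: `decode_one` and `fold_compare` are hypothetical names, the code works on Python bytes rather than C strings, and lowercasing uses Python's own Unicode tables. Each string keeps its own index, so characters whose case-counterparts differ in byte length do not throw the walk out of step, and the first undecodable sequence switches to a plain bytewise comparison of what remains.

```python
def decode_one(buf: bytes, pos: int):
    """Return (char, byte_length) for the UTF-8 sequence at pos, or None if invalid."""
    for length in range(1, 5):
        chunk = buf[pos:pos + length]
        if len(chunk) < length:
            break  # ran off the end of the buffer: truncated sequence
        try:
            return chunk.decode("utf-8"), length
        except UnicodeDecodeError:
            continue  # maybe a longer sequence; try one more byte
    return None


def sign(a, b) -> int:
    """Return -1, 0 or 1, like a C comparison function."""
    return (a > b) - (a < b)


def fold_compare(s1: bytes, s2: bytes) -> int:
    """Case-insensitive compare with a bytewise fallback on invalid UTF-8."""
    i = j = 0  # separate byte offsets into s1 and s2
    while i < len(s1) and j < len(s2):
        d1, d2 = decode_one(s1, i), decode_one(s2, j)
        if d1 is None or d2 is None:
            # Corrupted character: compare the remaining bytes as they are.
            return sign(s1[i:], s2[j:])
        c1, c2 = d1[0].lower(), d2[0].lower()
        if c1 != c2:
            return sign(c1, c2)
        i += d1[1]
        j += d2[1]
    # At least one string is exhausted; the one with bytes left is greater.
    return sign(len(s1) - i, len(s2) - j)


print(fold_compare(b"abc", b""))
print(fold_compare("\u212a".encode("utf-8"), b"k"))
```

The second call compares KELVIN SIGN (3 bytes) against ASCII "k" (1 byte), one of the counterexamples from earlier in the thread.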

mbyte.patch
strnicmp.test.vim

Bram Moolenaar

May 25, 2011, 3:18:24 PM
to Ivan Krasilnikov, vim...@googlegroups.com

Ivan Krasilnikov wrote:

> On Wed, May 25, 2011 at 14:09, Bram Moolenaar <Br...@moolenaar.net> wrote:
> > Yes, this code just returns -1, no matter if the first or second string
> > is bigger.
> >
> > Your other remark about difference in byte length of a character is
> > right, but it's not so easy to fix.  Can you suggest a patch?
> > Preferably with a test.
>
> Hi, here's my patch for mbyte.c and a few testcases.
>
> I've eliminated those return -1's by doing a bytewise comparison of
> strings after the first corrupted character. This should make the
> comparisons transitive at least.

Thanks, I'll look into it soon.

--
hundred-and-one symptoms of being an internet addict:

113. You are asked about a bus schedule, you wonder if it is 16 or 32 bits.

/// Bram Moolenaar -- Br...@Moolenaar.net -- http://www.Moolenaar.net \\\
/// sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
\\\ an exciting new programming language -- http://www.Zimbu.org ///
\\\ help me help AIDS victims -- http://ICCF-Holland.org ///

Ivan Krasilnikov

May 26, 2011, 11:10:34 PM
to vim...@googlegroups.com, vim...@googlegroups.com
On Wed, May 25, 2011 at 22:39, Ivan Krasilnikov <inf...@gmail.com> wrote:
> Hi, here's my patch for mbyte.c and a few testcases.
>
> I've eliminated those return -1's by doing a bytewise comparison of
> strings after the first corrupted character. This should make the
> comparisons transitive at least.
>

The patch had a bug: it checked for utf_ptr2char()'s failure incorrectly.
A fixed patch and more tests in Vim script, suitable for src/testdir/, are
attached.

mbyte2.patch