Re: How to match a Chinese character in VIM?

Tony Mechelynck

Aug 23, 2008, 9:03:43 PM
to Vim Developers, vim...@googlegroups.com, Bram Moolenaar
On 24/08/08 01:58, .幻想之诚 wrote:
> Is that a bug, Tony?

I think so; but I suppose Bram would know better than I what causes it.

A little trial and error seems to indicate that if the lower and upper
bounds differ by 256 or less (i.e. if there are at most 257 values in
range) the search proceeds OK, but if they differ by more than that we
get E16: Invalid range.

For instance:
/[一-伀]
(4E00 to 4F00) OK
/[一-企]
(4E00 to 4F01) error
/[丐-伐]
(4E10 to 4F10) OK
/[丐-休]
(4E10 to 4F11) error
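
The four cases above can be checked against the code points with a small Python sketch. This is only a model of the behaviour observed by trial and error, not Vim's actual code: the name range_accepted and the constant MAX_SPAN are hypothetical, and the sketch covers only the upper limit on the span.

```python
# Hypothetical model of the limit observed above; this is NOT Vim's source.
# Assumption: a range is accepted when upper - lower <= 256 code points.
MAX_SPAN = 256


def range_accepted(lo: str, hi: str, max_span: int = MAX_SPAN) -> bool:
    """Return True if the code points of hi and lo differ by at most max_span."""
    return ord(hi) - ord(lo) <= max_span


if __name__ == "__main__":
    cases = [
        ("一", "伀"),            # 4E00 to 4F00
        ("一", "企"),            # 4E00 to 4F01
        ("丐", "伐"),            # 4E10 to 4F10
        ("丐", "休"),            # 4E10 to 4F11
        ("а", "я"),              # Cyrillic, 0430 to 044F
        ("\u4e00", "\u9fa5"),    # the range from the original question
    ]
    for lo, hi in cases:
        span = ord(hi) - ord(lo)
        verdict = "OK" if range_accepted(lo, hi) else "E16"
        print(f"[{lo}-{hi}] U+{ord(lo):04X}..U+{ord(hi):04X} span={span} {verdict}")
```

Under this model the Cyrillic range spans only 31 code points, while U+4E00 to U+9FA5 spans 20901, which would be consistent with the first working and the second failing.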

I'm (trying to) redirect this thread to the vim_dev list.

Best regards,
Tony.

>
> On Sun, Aug 24, 2008 at 12:06 AM, Tony Mechelynck
> <antoine.m...@gmail.com> wrote:
>
>> On 23/08/08 11:24, Anton Sharonov wrote:
>>> I can reproduce this as well (vim 7.2 patches 1-2, Linux OpenSUSE
>>> 10.2, big version with gtk2, +multi_byte +multi_lang,
>>> fenc=enc=utf8); neither of these patterns works:
>>>
>>> /[\u4e00-\u9fa5]
>>> /[不-限]
>>>
>>> Interestingly, though, characters that belong to the Russian
>>> (Cyrillic) Unicode area seem to be OK, so the following work
>>> properly:
>>>
>>> /[\u0430-\u044f]
>>> /[а-я]
>>>
>>> A normal search for Chinese characters, singly or in a group, also
>>> seems to be OK, so the following work properly as well:
>>>
>>> /不
>>> /限
>>> /不局限
>>>
>>>
>>> Sample text:
>>> --------------------------------------------------------------------
>>> русский текст
>>> вим - это класс
>>> 算法并不局限于计算机和网络
>>> --------------------------------------------------------------------
>>> Anton.
>> I confirm the above, but what puzzles me even more is that with a
>> smaller range, such as
>>
>> /[一-丅]
>>
>> i.e. U+4E00 to U+4E05, the search works correctly, with no error.
>>
>>
>> Best regards,
>> Tony.
>> --
>> The basic idea behind malls is that they are more convenient than
>> cities. Cities contain streets, which are dangerous and crowded and
>> difficult to park in. Malls, on the other hand, have parking lots,
>> which are also dangerous and crowded and difficult to park in, but --
>> here is the big difference -- in mall parking lots, THERE ARE NO
>> RULES. You're allowed to do anything. You can drive as fast as you
>> want in any direction you want. I was once driving in a mall parking
>> lot when my car was struck by a pickup truck being driven backward by a
>> squat man with a tattoo that said "Charlie" on his forearm, who got out
>> and explained to me, in great detail, why the accident was my fault,
>> his reasoning being that he was violent and muscular, whereas I was
>> neither. This kind of reasoning is legally valid in mall parking
>> lots.
>> -- Dave Barry, "Christmas Shopping: A Survivor's Guide"
>>

Bram Moolenaar

Aug 24, 2008, 4:39:55 PM
to Vim Developers, vim...@googlegroups.com

Tony Mechelynck wrote:

> On 24/08/08 01:58, .幻想之诚 wrote:
> > Is that a bug, Tony?
>
> I think so; but I suppose Bram would know better than I what causes it.
>
> A little trial and error seems to indicate that if the lower and upper
> bounds differ by 256 or less (i.e. if there are at most 257 values in
> range) the search proceeds OK, but if they differ by more than that we
> get E16: Invalid range.
>
> For instance:
> /[一-伀]
> (4E00 to 4F00) OK
> /[一-企]
> (4E00 to 4F01) error
> /[丐-伐]
> (4E10 to 4F10) OK
> /[丐-休]
> (4E10 to 4F11) error
>
> I'm (trying to) redirect this thread to the vim_dev list.

Currently only ranges of 256 characters are supported. There probably
is a todo item somewhere to support more. It's not all that easy to
implement.

--
No engineer can take a shower without wondering if some sort of Teflon coating
would make showering unnecessary.
(Scott Adams - The Dilbert principle)

/// Bram Moolenaar -- Br...@Moolenaar.net -- http://www.Moolenaar.net \\\
/// sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
\\\ download, build and distribute -- http://www.A-A-P.org ///
\\\ help me help AIDS victims -- http://ICCF-Holland.org ///

Tony Mechelynck

Aug 24, 2008, 5:24:26 PM
to vim...@googlegroups.com, Vim Developers
On 24/08/08 22:39, Bram Moolenaar wrote:
>
> Tony Mechelynck wrote:
>
>> On 24/08/08 01:58, .幻想之诚 wrote:
>>> Is that a bug, Tony?
>> I think so; but I suppose Bram would know better than I what causes it.
>>
>> A little trial and error seems to indicate that if the lower and upper
>> bounds differ by 256 or less (i.e. if there are at most 257 values in
>> range) the search proceeds OK, but if they differ by more than that we
>> get E16: Invalid range.
>>
>> For instance:
>> /[一-伀]
>> (4E00 to 4F00) OK
>> /[一-企]
>> (4E00 to 4F01) error
>> /[丐-伐]
>> (4E10 to 4F10) OK
>> /[丐-休]
>> (4E10 to 4F11) error
>>
>> I'm (trying to) redirect this thread to the vim_dev list.
>
> Currently only ranges of 256 characters are supported. There probably
> is a todo item somewhere to support more. It's not all that easy to
> implement.
>

257 actually (from 0 to 100 is a total of 101 values). What would happen
if the if statement spotted by François were removed? Or changed to a
higher maximum value such as 1000 or 10000 ("endc > startc+999" or
"endc > startc+9999")?

Or would it be better to change the logic so that [0-9] is not
implemented as "0 or 1 or 2 or 3 or 4 or 5 or 6 or 7 or 8 or 9" but as
"not less than 0 and not greater than 9" (with, if necessary, the
appropriate "special handling" for alpha-only ranges on EBCDIC)? If it
could be done, I expect this would make the code more efficient in terms
of both speed and memory use when handling large or even moderate ranges.
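
To make the contrast concrete, here is a minimal Python sketch of the two strategies. It only illustrates the idea and says nothing about how Vim's regexp engine is really organised: expand_range, in_expanded and in_bounds are hypothetical names, and the EBCDIC special case is left out.

```python
# Illustration only; the helper names are hypothetical and this is not
# how Vim's regexp code is written.


def expand_range(lo: str, hi: str) -> frozenset:
    """Enumerate every code point in [lo, hi]: '0 or 1 or 2 or ... or 9'."""
    return frozenset(range(ord(lo), ord(hi) + 1))


def in_expanded(members: frozenset, ch: str) -> bool:
    """Membership test against the enumerated set; storage grows with the range."""
    return ord(ch) in members


def in_bounds(lo: str, hi: str, ch: str) -> bool:
    """'Not less than lo and not greater than hi': two comparisons, no storage."""
    return ord(lo) <= ord(ch) <= ord(hi)


if __name__ == "__main__":
    digits = expand_range("0", "9")
    cjk = expand_range("\u4e00", "\u9fa5")
    print(len(digits), len(cjk))  # number of stored entries in each case
    for ch in "算法并不局限в":
        # Both strategies must agree on every character.
        assert in_expanded(cjk, ch) == in_bounds("\u4e00", "\u9fa5", ch)
        print(ch, in_bounds("\u4e00", "\u9fa5", ch))
```

The enumerated form has to store one entry per character in the range, whereas the bounds form needs only the two end points, whatever the size of the range.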


Best regards,
Tony.
--
PISCES (Feb. 19 - Mar. 20)
You have a vivid imagination and often think you are being
followed by the CIA or FBI. You have minor influence over your
associates and people resent your flaunting of your power. You lack
confidence and you are generally a coward. Pisces people do terrible
things to small animals.
