Patch 8.2.2654

Bram Moolenaar

Mar 26, 2021, 8:34:50 AM

to vim...@googlegroups.com

Patch 8.2.2654
Problem: Vim9: getting a character from a string can be slow.
Solution: Avoid a function call to get the character byte size. (#8000)
Files: src/vim9execute.c

*** ../vim-8.2.2653/src/vim9execute.c 2021-03-25 21:12:09.902618807 +0100
--- src/vim9execute.c 2021-03-26 13:31:40.773137855 +0100
***************
*** 1067,1079 ****
return NULL;
slen = STRLEN(str);

! // do the same as for a list: a negative index counts from the end
if (index < 0)
{
int clen = 0;

for (nbyte = 0; nbyte < slen; ++clen)
! nbyte += mb_ptr2len(str + nbyte);
nchar = clen + index;
if (nchar < 0)
// unlike list: index out of range results in empty string
--- 1067,1088 ----
return NULL;
slen = STRLEN(str);

! // Do the same as for a list: a negative index counts from the end.
! // Optimization to check the first byte to be below 0x80 (and no composing
! // character follows) makes this a lot faster.
if (index < 0)
{
int clen = 0;

for (nbyte = 0; nbyte < slen; ++clen)
! {
! if (str[nbyte] < 0x80 && str[nbyte + 1] < 0x80)
! ++nbyte;
! else if (enc_utf8)
! nbyte += utfc_ptr2len(str + nbyte);
! else
! nbyte += mb_ptr2len(str + nbyte);
! }
nchar = clen + index;
if (nchar < 0)
// unlike list: index out of range results in empty string
***************
*** 1081,1087 ****
}

for (nbyte = 0; nchar > 0 && nbyte < slen; --nchar)
! nbyte += mb_ptr2len(str + nbyte);
if (nbyte >= slen)
return NULL;
return vim_strnsave(str + nbyte, mb_ptr2len(str + nbyte));
--- 1090,1103 ----
}

for (nbyte = 0; nchar > 0 && nbyte < slen; --nchar)
! {
! if (str[nbyte] < 0x80 && str[nbyte + 1] < 0x80)
! ++nbyte;
! else if (enc_utf8)
! nbyte += utfc_ptr2len(str + nbyte);
! else
! nbyte += mb_ptr2len(str + nbyte);
! }
if (nbyte >= slen)
return NULL;
return vim_strnsave(str + nbyte, mb_ptr2len(str + nbyte));
*** ../vim-8.2.2653/src/version.c 2021-03-25 22:22:26.490934354 +0100
--- src/version.c 2021-03-26 13:32:50.708982626 +0100
***************
*** 752,753 ****
--- 752,755 ----
{ /* Add new patch number below this line */
+ /**/
+ 2654,
/**/
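
The fast path in the patch can be tried outside Vim with a minimal sketch like the one below. It assumes UTF-8 input. The helpers utf8_seq_len(), is_composing() and char_len() are hypothetical stand-ins written for this illustration, not Vim's utfc_ptr2len(), and only U+0300..U+036F are treated as composing characters. The check on the second byte matters because a composing character always starts with a byte of 0x80 or higher: without that check, an "e" followed by a combining accent would be counted as two characters.

```c
#include <string.h>

typedef unsigned char char_u;

// Byte length of the UTF-8 sequence at p, judged from the lead byte and
// clamped so that it never runs past the terminating NUL.
static int utf8_seq_len(const char_u *p)
{
    int want = 1;
    int len = 1;

    if ((*p & 0xE0) == 0xC0)
        want = 2;
    else if ((*p & 0xF0) == 0xE0)
        want = 3;
    else if ((*p & 0xF8) == 0xF0)
        want = 4;
    while (len < want && (p[len] & 0xC0) == 0x80)
        ++len;
    return len;
}

// Simplification for this sketch: only U+0300..U+036F count as composing.
// U+0300..U+033F encode as CC 80..CC BF, U+0340..U+036F as CD 80..CD AF.
static int is_composing(const char_u *p)
{
    return (p[0] == 0xCC && p[1] >= 0x80 && p[1] <= 0xBF)
        || (p[0] == 0xCD && p[1] >= 0x80 && p[1] <= 0xAF);
}

// Length of one character including any composing characters after it.
static int char_len(const char_u *p)
{
    int len = utf8_seq_len(p);

    while (is_composing(p + len))
        len += utf8_seq_len(p + len);
    return len;
}

// Loop shape before the patch: one function call per character.
static int count_chars_plain(const char_u *str)
{
    size_t slen = strlen((const char *)str);
    size_t nbyte;
    int clen = 0;

    for (nbyte = 0; nbyte < slen; ++clen)
        nbyte += char_len(str + nbyte);
    return clen;
}

// Loop shape after the patch: skip the call when this byte and the next
// one are both below 0x80, so no composing character can follow.
static int count_chars_fast(const char_u *str)
{
    size_t slen = strlen((const char *)str);
    size_t nbyte;
    int clen = 0;

    for (nbyte = 0; nbyte < slen; ++clen)
    {
        if (str[nbyte] < 0x80 && str[nbyte + 1] < 0x80)
            ++nbyte;
        else
            nbyte += char_len(str + nbyte);
    }
    return clen;
}
```

Both functions should agree on any NUL-terminated string; the fast one only avoids the call for runs of plain ASCII.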

--
How To Keep A Healthy Level Of Insanity:
9. As often as possible, skip rather than walk.

/// Bram Moolenaar -- Br...@Moolenaar.net -- http://www.Moolenaar.net \\\
/// \\\
\\\ sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ ///
\\\ help me help AIDS victims -- http://ICCF-Holland.org ///

Dominique Pellé

Mar 26, 2021, 10:17:30 AM

to vim_dev

Bram Moolenaar wrote:

> Patch 8.2.2654
> Problem: Vim9: getting a character from a string can be slow.
> Solution: Avoid a function call to get the character byte size. (#8000)
> Files: src/vim9execute.c

...snip...

> for (nbyte = 0; nbyte < slen; ++clen)
> ! {
> ! if (str[nbyte] < 0x80 && str[nbyte + 1] < 0x80)
> ! ++nbyte;
> ! else if (enc_utf8)
> ! nbyte += utfc_ptr2len(str + nbyte);
> ! else
> ! nbyte += mb_ptr2len(str + nbyte);
> ! }

Is this correct? I would have thought that the following line is
correct for utf8 encoding but not for all other encodings:

if (str[nbyte] < 0x80 && str[nbyte + 1] < 0x80)

So I think that loop should rather be something like this:

    if (enc_utf8)
        for (nbyte = 0; nbyte < slen; ++clen)
        {
            if (str[nbyte] < 0x80 && str[nbyte + 1] < 0x80)
                ++nbyte;
            else
                nbyte += utfc_ptr2len(str + nbyte);
        }
    else
        for (nbyte = 0; nbyte < slen; ++clen)
            nbyte += mb_ptr2len(str + nbyte);

Regards
Dominique

Bram Moolenaar

Mar 26, 2021, 10:43:59 AM

to vim...@googlegroups.com, Dominique Pellé

You can run a benchmark to see what the difference is. I would
think that enc_utf8 is cached and the choice between utfc_ptr2len() and
mb_ptr2len() doesn't make a measurable difference. It is not worth
duplicating the loop.
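
A rough way to run such a comparison outside Vim is sketched below. Everything in it is an assumption made for illustration: enc_utf8 is a plain static flag, utf_len_stub() and mb_len_stub are hypothetical stand-ins for utfc_ptr2len() and mb_ptr2len() (the latter called through a function pointer here), and run_benchmark() is a made-up helper. It times the single loop from the patch against the duplicated loops proposed above and reports nothing beyond the CPU time measured by clock().

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

typedef unsigned char char_u;

// Hypothetical stand-in for Vim's cached encoding flag.
static int enc_utf8 = 1;

// Hypothetical stand-in for utfc_ptr2len(): UTF-8 sequence length from the
// lead byte, clamped at the terminating NUL. Composing characters are ignored.
static int utf_len_stub(const char_u *p)
{
    int want = (*p & 0xE0) == 0xC0 ? 2
             : (*p & 0xF0) == 0xE0 ? 3
             : (*p & 0xF8) == 0xF0 ? 4 : 1;
    int len = 1;

    while (len < want && (p[len] & 0xC0) == 0x80)
        ++len;
    return len;
}

// Hypothetical stand-in for mb_ptr2len(), reached through a function pointer.
static int (*mb_len_stub)(const char_u *p) = utf_len_stub;

// One loop, encoding checked per character (shape used in the patch).
static int count_single_loop(const char_u *str, size_t slen)
{
    size_t nbyte;
    int clen = 0;

    for (nbyte = 0; nbyte < slen; ++clen)
    {
        if (str[nbyte] < 0x80 && str[nbyte + 1] < 0x80)
            ++nbyte;
        else if (enc_utf8)
            nbyte += utf_len_stub(str + nbyte);
        else
            nbyte += mb_len_stub(str + nbyte);
    }
    return clen;
}

// Encoding checked once, loop duplicated (shape proposed in the reply).
static int count_split_loops(const char_u *str, size_t slen)
{
    size_t nbyte;
    int clen = 0;

    if (enc_utf8)
        for (nbyte = 0; nbyte < slen; ++clen)
        {
            if (str[nbyte] < 0x80 && str[nbyte + 1] < 0x80)
                ++nbyte;
            else
                nbyte += utf_len_stub(str + nbyte);
        }
    else
        for (nbyte = 0; nbyte < slen; ++clen)
            nbyte += mb_len_stub(str + nbyte);
    return clen;
}

// CPU seconds spent calling count() reps times on the same string.
static double seconds_for(int (*count)(const char_u *, size_t),
                          const char_u *str, size_t slen, int reps)
{
    volatile int sink = 0;  // keeps the calls from being optimized away
    clock_t start = clock();
    int i;

    for (i = 0; i < reps; ++i)
        sink += count(str, slen);
    (void)sink;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

// Mostly-ASCII test string with one two-byte character in the middle.
void run_benchmark(void)
{
    static char_u buf[100001];
    size_t slen = sizeof(buf) - 1;

    memset(buf, 'x', slen);
    memcpy(buf + 50000, "\xC3\xA9", 2);
    printf("single loop: %.3f s\n",
           seconds_for(count_single_loop, buf, slen, 2000));
    printf("split loops: %.3f s\n",
           seconds_for(count_split_loops, buf, slen, 2000));
}
```

Call run_benchmark() from main() and compare the two figures over several runs; compiler flags and the mix of ASCII and multi-byte text will affect the result.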

AFAIK there is no encoding where a character with a first byte below
0x80 takes more than one byte.
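
The point about other encodings can be illustrated with a double-byte encoding. The sketch below assumes Shift_JIS-style lead-byte ranges (0x81..0x9F and 0xE0..0xFC); dbcs_char_len() is a hypothetical stand-in for mb_ptr2len() and is not Vim's code. A trail byte may be below 0x80, but a lead byte never is, so when the current byte is below 0x80 advancing by one byte is correct no matter what follows.

```c
#include <string.h>

typedef unsigned char char_u;

// Hypothetical stand-in for mb_ptr2len() in a double-byte encoding.
// Assumption for illustration: lead bytes are 0x81..0x9F and 0xE0..0xFC.
static int dbcs_char_len(const char_u *p)
{
    int lead = (p[0] >= 0x81 && p[0] <= 0x9F)
            || (p[0] >= 0xE0 && p[0] <= 0xFC);

    // A lead byte right before the NUL is treated as a single byte.
    return (lead && p[1] != 0) ? 2 : 1;
}

// Reference loop: always ask the encoding-specific function.
static int count_chars_plain(const char_u *str)
{
    size_t slen = strlen((const char *)str);
    size_t nbyte;
    int clen = 0;

    for (nbyte = 0; nbyte < slen; ++clen)
        nbyte += dbcs_char_len(str + nbyte);
    return clen;
}

// Loop with the fast path from the patch applied to the same encoding.
static int count_chars_fast(const char_u *str)
{
    size_t slen = strlen((const char *)str);
    size_t nbyte;
    int clen = 0;

    for (nbyte = 0; nbyte < slen; ++clen)
    {
        if (str[nbyte] < 0x80 && str[nbyte + 1] < 0x80)
            ++nbyte;  // single byte: a lead byte is never below 0x80
        else
            nbyte += dbcs_char_len(str + nbyte);
    }
    return clen;
}
```

For the byte string 'A', 0x83, 0x41, 'B' both loops count three characters: the 0x41 trail byte is never examined as the start of a character, because the loop reaches 0x83 first and steps over both bytes.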