[vim/vim] Vim9: iterating over characters in long string is slow (#8000)

78 views
Skip to first unread message

lacygoill

unread,
Mar 23, 2021, 5:35:08 PM3/23/21
to vim/vim, Subscribed

Describe the bug

In Vim9 script, iterating over characters in a long string is slow.

To Reproduce

Run this shell command:

vim -Nu NONE -S <(cat <<'EOF'
    vim9script
    var n: number = 100'000
    var data: string = repeat('x', n)
    def Func()
        var i: number = 0
        var c: string
        while i < n
            c = data[i]
            i += 1
        endwhile
    enddef
    var time = reltime()
    Func()
    setline(1, reltime(time)->reltimestr()->matchstr('.*\..\{,3}') .. ' seconds to run Func()')
EOF
)

On my machine, this is written in the buffer:

14.956 seconds to run Func()

Now, run this other shell command:

vim -Nu NONE -S <(cat <<'EOF'
    scriptversion 4
    let s:n = 100'000
    let s:data = repeat('x', s:n)->split('\zs')
    fu Func()
        let i = 0
        while i < s:n
            let c = s:data[i]
            let i += 1
        endwhile
    endfu
    let s:time = reltime()
    call Func()
    call setline(1, reltime(s:time)->reltimestr()->matchstr('.*\..\{,3}') .. ' seconds to run Func()')
EOF
)

On my machine, this is written in the buffer:

0.512 seconds to run Func()

Expected behavior

Since both commands run the same code, one in Vim9 script, the other in legacy Vim script, I would expect the former to be faster than the latter, and not the other way around.

Environment

  • Vim version: 8.2 Included patches: 1-2646
  • OS: Ubuntu 16.04.7 LTS
  • Terminal: xterm(366)

Additional context

I know that the [i] subscript has a different meaning in legacy and in Vim9; the former interprets it as a byte index, the latter as a character index. But here, since all the characters are simple x, and only need 1 byte, that shouldn't make a difference (right?).


When iterating over a list, the results are different:

vim9script
var n: number = 100'000
var data: list<string> = repeat(['x'], n)
def Func()
    var i: number = 0
    var c: string
    while i < n
        c = data[i]
        i += 1
    endwhile
enddef
var time = reltime()
Func()
setline(1, reltime(time)->reltimestr()->matchstr('.*\..\{,3}') .. ' seconds to run Func()')
0.022 seconds to run Func()
scriptversion 4
let s:n = 100'000
let s:data = repeat(['x'], s:n)
fu Func()
    let i = 0
    while i < s:n
        let c = s:data[i]
        let i += 1
    endwhile
endfu
let s:time = reltime()
call Func()
call setline(1, reltime(s:time)->reltimestr()->matchstr('.*\..\{,3}') .. ' seconds to run Func()')
0.499 seconds to run Func()

This time, Vim9 is more than 20 times faster; I would have expected similar results when iterating over a string.


Possible relevant todo item:

https://github.com/vim/vim/blob/c54f347d63bcca97ead673d01ac6b59914bb04e5/runtime/doc/todo.txt#L159-L161

Quoted for people reading on the mailing list:

  • Make accessing varargs faster: arg[expr]
    EVAL expr
    LOADVARARG (varargs idx)

Although, why would this future optimization have a greater effect when arg is a string, than when it's a list?


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub, or unsubscribe.

lacygoill

unread,
Mar 23, 2021, 5:43:15 PM3/23/21
to vim/vim, Subscribed

I made a mistake in one of the snippets, but it doesn't change the overall observations. I updated the OP.

Gary Johnson

unread,
Mar 23, 2021, 6:03:20 PM3/23/21
to reply+ACY5DGDRY2YQVOC46B...@reply.github.com, vim...@googlegroups.com
On 2021-03-23, lacygoill wrote:
> I made a mistake in one of the snippets, but it doesn't change the overall
> observations. I updated the OP.

Do updates like that make it to the mailing list? I haven't seen
anything in this thread but two messages from you: the original
post and the message in-line above.

Regards,
Gary

vim-dev ML

unread,
Mar 23, 2021, 6:03:39 PM3/23/21
to vim/vim, vim-dev ML, Your activity

lacygoill

unread,
Mar 23, 2021, 6:08:43 PM3/23/21
to vim/vim, vim-dev ML, Comment

No, I don't think they do. That's why usually, I try to avoid updating my posts, and just provide new information in a separate post. Although, here, I don't think it's important; originally, the second snippet was iterating over a list of strings, which made it look like Vim script legacy was faster than it really was. But again, even once the snippet is fixed, Vim script legacy is still – unexpectedly – faster.


You are receiving this because you commented.

Gary Johnson

unread,
Mar 23, 2021, 6:55:55 PM3/23/21
to reply+ACY5DGHFT4KHNSUWB2...@reply.github.com, vim...@googlegroups.com
On 2021-03-23, lacygoill wrote:
> No, I don't think they do. That's why usually, I try to avoid updating my
> posts, and just provide new information in a separate post. Although, here, I
> don't think it's important; originally, the second snippet was iterating over a
> list of strings, which made it look like Vim script legacy was faster than it
> really was. But again, even once the snippet is fixed, Vim script legacy is
> still – unexpectedly – faster.

Got it. Thanks.

Regards,
Gary

vim-dev ML

unread,
Mar 23, 2021, 6:56:18 PM3/23/21
to vim/vim, vim-dev ML, Your activity

Dominique Pellé

unread,
Mar 23, 2021, 7:17:35 PM3/23/21
to vim/vim, vim-dev ML, Comment

@lacygoill wrote:

I know that the [i] subscript has a different meaning in legacy
and in Vim9; the former interprets it as a byte index, the latter
as a character index. But here, since all the characters are
simple x, and only need 1 byte, that shouldn't make a
difference (right?).

It makes a big difference. For vim9 string, accessing
data[i] has complexity O(i). There is no random access
because characters are potentially multi-bytes.
It could become O(1) if strings were stored as a vector
of uint32_t (i.e. in UTF-32) but strings would then
generally use 4 times more memory if we assume that
1 byte characters are most frequent.

So the following loop...

    while i < s:n
        let c = s:data[i]
        let i += 1
    endwhile

... is actually 2 nested loop internally:

  • outer loop with i going from 0 to n
  • inner loop from 0 to i (when doing data[i])

It has thus complexity O(n^2), hence slow for large value of n.

Here is the timing with various values of n on my machine:

n         time (sec)
------  ------
100000  14.467
 50000   3.627
 25000   0.912
 12500   0.226
14.467sec

We can see that time is multiplied by 4 when doubling n,
which confirms the quadratic complexity.

It could become much faster i.e. O(n) instead of O(n^2) if we could instead
loop on each characters with a for loop on a string as in python like this:

for c in s:data
  ...
endfor

But I see that it currently gives E714: List required (for loop is only
possible on lists, not on strings)


You are receiving this because you commented.

lacygoill

unread,
Mar 23, 2021, 7:52:41 PM3/23/21
to vim/vim, vim-dev ML, Comment

I see, thank you for the clear explanation. Accessing a character from a string is necessarily slower than accessing a byte, because you don't know in advance how many bytes each of the previous characters need. So – in the general case – there's no simple way to derive the byte offset just from i; and since [i] means the i-th character in Vim9 while the i-th byte in legacy, it can be much more costly in Vim9. That makes sense.

I don't really mind the current results. If I had to iterate over the characters, I would first split the string with ->split('\zs'), and Vim9 would then be much faster than legacy.

I still leave the issue open to let the devs decide whether there is something to improve. Either in the documentation, by warning the user that when they iterate over the characters in a string, they should first split it if efficiency is a concern. And/or maybe in the code; like you suggested:

It could become much faster i.e. O(n) instead of O(n^2) if we could instead
loop on each characters with a for loop on a string as in python like this:

for c in s:data

  ...

endfor

But I see that it currently gives E714: List required (for loop is only
possible on lists, not on strings)

... maybe Vim could compile:

for c in s:data

... into a sequence of instructions equivalent to:

for c in s:data->split('\zs')

... when s:data is a string.


You are receiving this because you commented.

LemonBoy

unread,
Mar 24, 2021, 5:27:08 AM3/24/21
to vim/vim, vim-dev ML, Comment

The code in char_from_string is quite slow, it's calling mb_ptr2len in a loop that's a function pointer depending on the chosen encoding value and this slows everything down by a considerable amount.

The trick here is to hide the non-constant complexity of the indexing operation by optimizing the common case of ASCII chars.
Eg. you can slash the script runtime from 28s to 11s with this simple patch (that's not correct as it assumes the input to be utf8 encoded):

diff --git a/src/vim9execute.c b/src/vim9execute.c
index 6b8cd075c..6b5bdaf01 100644
--- a/src/vim9execute.c
+++ b/src/vim9execute.c
@@ -1018,8 +1018,9 @@ char_from_string(char_u *str, varnumber_T index)
 	    return NULL;
     }
 
-    for (nbyte = 0; nchar > 0 && nbyte < slen; --nchar)
-	nbyte += mb_ptr2len(str + nbyte);
+    for (nbyte = 0; nchar > 0 && nbyte < slen; --nchar) {
+	nbyte += (str[nbyte] < 0x80) ? 1 : mb_ptr2len(str + nbyte);
+    }
     if (nbyte >= slen)
 	return NULL;
     return vim_strnsave(str + nbyte, mb_ptr2len(str + nbyte));

A 2x speedup can be easily obtained by avoiding the function call even though the loop body now contains a conditional branch (!), that's how heavy the function call is.


You are receiving this because you commented.

Bram Moolenaar

unread,
Mar 24, 2021, 5:21:36 PM3/24/21
to vim/vim, vim-dev ML, Comment

I can think of two solutions.

  1. Add a data type for a string with 32-bit characters. Conversion to/from a normal string would need to be added. And then we'll get a bunch of requests to add more functions to manipulate this new data type.
  2. make "for c in text" work. This is not that difficult, but only sequential access is possible

Considering that using very long strings is rare, I don't think it justifies adding a new data type and all kinds of support functions.

For small changes to make it faster: please do compile Vim with the optimizer, otherwise it may not give a good indication.


You are receiving this because you commented.

lacygoill

unread,
Mar 24, 2021, 5:27:58 PM3/24/21
to vim/vim, vim-dev ML, Comment

make "for c in text" work. This is not that difficult, but only sequential access is possible

Yes, it would be a nice syntax to use.

Considering that using very long strings is rare, I don't think it justifies adding a new data type and all kinds of support functions.

FWIW, I agree.

For small changes to make it faster: please do compile Vim with the optimizer, otherwise it may not give a good indication.

I don't understand. By default, the Makefile already enables the optimizations with something like -O2, right? I don't disable them. That's how my script always configures the Vim binary:

./configure  \
  --enable-fail-if-missing       \
  --enable-gui=gtk2              \
  --enable-python3interp=dynamic \
  --prefix=/usr/local            \
  --with-compiledby=user


You are receiving this because you commented.

lacygoill

unread,
Mar 25, 2021, 6:47:19 AM3/25/21
to vim/vim, vim-dev ML, Comment

I don't understand.

Ah, that was for the previous comment; my bad.


You are receiving this because you commented.

Marius Gedminas

unread,
Mar 25, 2021, 7:43:41 AM3/25/21
to vim/vim, vim-dev ML, Comment

For the special case of UTF-8, caching could be a solution: store the (char_index, byte_offset) pair of the last access in the string, and if the next access has a small char_index difference, start looping from the previous byte_offset.

UTF-8 is self-synchronizing: the start byte of any character has its MSB set to 0, so you can iterate backwards and forwards from any position.


You are receiving this because you commented.

Bram Moolenaar

unread,
Mar 25, 2021, 2:28:20 PM3/25/21
to vim...@googlegroups.com, Marius Gedminas

Marius Gedminas wrote:

> For the special case of UTF-8, caching could be a solution: store the
> (char_index, byte_offset) pair of the last access in the string, and
> if the next access has a small char_index difference, start looping
> from the previous byte_offset.
>
> UTF-8 is self-synchronizing: the start byte of any character has its
> MSB set to 0, so you can iterate backwards and forwards from any
> position.

Yes, this is what is done for a list, the last accessed item and its
index is remembered.

For a string it requires an extra allocation for the structure that
stores the cached values. That causes overhead when it's not used.

I wonder if we can put the cached values in a list, so that we don't
need to add a new data type. Something like:

l = [0]
c = char_forward(text, l)
" now l is [3] for a 3-byte character

Adding the character index might be useful:

l = [0, 0]
c = char_forward(text, l)
" now l is [1, 3] for a 3-byte character

Might want a function to get the character as a number and a function to
get the character as a string (which can then also include composing
characters). Although just using a string would be sufficient, there
are other functions to get the character as a number.


--
How To Keep A Healthy Level Of Insanity:
3. Every time someone asks you to do something, ask if they want fries
with that.

/// Bram Moolenaar -- Br...@Moolenaar.net -- http://www.Moolenaar.net \\\
/// \\\
\\\ sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ ///
\\\ help me help AIDS victims -- http://ICCF-Holland.org ///

LemonBoy

unread,
Mar 26, 2021, 5:01:27 AM3/26/21
to vim/vim, vim-dev ML, Comment

For small changes to make it faster: please do compile Vim with the optimizer, otherwise it may not give a good indication.

The results are obtained using gcc 8.3.0 and the default -O2 optimization level. This said the optimizer cannot do much here as the chosen ptr2len function depends on a runtime value that's also coming from a different compilation unit, the only way to speed up the loop is to either avoid the function call or to inline the counting function.

A cache for the last-accessed position wouldn't really help for anything beside sequential accesses, that's the objective I think making for c in text is a better solution.


You are receiving this because you commented.

Dominique Pellé

unread,
Mar 26, 2021, 8:03:38 AM3/26/21
to vim/vim, vim-dev ML, Comment

@LemonBoy wrote:

the optimizer cannot do much here as the chosen ptr2len function depends on
a runtime value that's also coming from a different compilation unit, the only
way to speed up the loop is to either avoid the function call or to inline the counting function.

Right. Unless we do something like this following patch and compile with -O2 -flto.
The -flto option (which must be set in CFLAGS and LDFLAGS) is for link time optimization.
The compiler can then see the entire source code of vim when compiling at link time and can
thus inline functions even when they are defined in another compilation unit. Even if it does
not inline, it can still perform some optimizations such as removing some redundant 'if', etc.

diff --git a/src/vim9execute.c b/src/vim9execute.c
index 1ae17d97b..ffad7df88 100644
--- a/src/vim9execute.c
+++ b/src/vim9execute.c
@@ -1080,8 +1080,13 @@ char_from_string(char_u *str, varnumber_T index)
            return NULL;
     }
 
-    for (nbyte = 0; nchar > 0 && nbyte < slen; --nchar)
-       nbyte += mb_ptr2len(str + nbyte);
+    if (enc_utf8)
+       // Optimize for utf8 (most frequent encoding nowadays).
+       for (nbyte = 0; nchar > 0 && nbyte < slen; --nchar)
+           nbyte += utfc_ptr2len(str + nbyte);
+    else
+       for (nbyte = 0; nchar > 0 && nbyte < slen; --nchar)
+           nbyte += mb_ptr2len(str + nbyte);
     if (nbyte >= slen)
        return NULL;
     return vim_strnsave(str + nbyte, mb_ptr2len(str + nbyte));

But we need to benchmark to confirm whether it helps or hurts.
Here is a benchmark, using gcc-10.1 and vim-8.2.2563 with this command:

$ vim -Nu NONE -S <(cat <<'EOF'
    vim9script
    var n: number = 100'000
    var data: string = repeat('x', n)
    def Func()
        var i: number = 0
        var c: string
        while i < n
            c = data[i]
            i += 1
        endwhile
    enddef
    var time = reltime()
    Func()
    setline(1, reltime(time)->reltimestr()->matchstr('.*\..\{,3}') .. ' seconds to run Func()')
EOF
)

I did 3 measurements for each configuration to make sure timing is sufficiently consistent:

Configuration          Time (sec), 3 measurements
------------------     --------------------------
pristine -O2           16.221  16.233  16.232
patched  -O2           13.038  13.083  13.080

pristine -O2 -flto     16.205  16.218  16.323
patched  -O2 -flto     14.486  14.534  15.547

So the patch speeds up when using -O2 (19.6% faster with -O2, 10.6% faster with -O2 -flto).
-flto does not help to speed up. It even hurts performance when using the patch.

The speed up of this patch is too small in my opinion to justify the uglyness of duplicating the for loop.


You are receiving this because you commented.

Bram Moolenaar

unread,
Mar 26, 2021, 8:19:55 AM3/26/21
to vim/vim, vim-dev ML, Comment

My previous reply to Marius only went to the mailing list:

Yes, this is what is done for a list, the last accessed item and its
index is remembered.

For a string it requires an extra allocation for the structure that
stores the cached values. That causes overhead when it's not used.

I wonder if we can put the cached values in a list, so that we don't
need to add a new data type. Something like:

l = [0]
c = char_forward(text, l)
" now l is [3] for a 3-byte character

Adding the character index might be useful:

l = [0, 0]
c = char_forward(text, l)
" now l is [1, 3] for a 3-byte character

Might want a function to get the character as a number and a function to
get the character as a string (which can then also include composing
characters). Although just using a string would be sufficient, there
are other functions to get the character as a number.


You are receiving this because you commented.

LemonBoy

unread,
Mar 26, 2021, 8:27:45 AM3/26/21
to vim/vim, vim-dev ML, Comment

LTO won't do much here, the only way I found to achieve a consistent speed-up is to apply a patch similar to the one you've applied:

diff --git a/src/vim9execute.c b/src/vim9execute.c
index 6b8cd075c..2bb957a30 100644
--- a/src/vim9execute.c
+++ b/src/vim9execute.c
@@ -1018,8 +1018,16 @@ char_from_string(char_u *str, varnumber_T index)
 	    return NULL;
     }
 
-    for (nbyte = 0; nchar > 0 && nbyte < slen; --nchar)
-	nbyte += mb_ptr2len(str + nbyte);
+    if (enc_utf8)
+	for (nbyte = 0; nchar > 0 && nbyte < slen; --nchar) {
+	    if (str[nbyte] < 0x80) nbyte += 1;
+	    else nbyte += utfc_ptr2len(str + nbyte);
+	}
+    else
+	for (nbyte = 0; nchar > 0 && nbyte < slen; --nchar) {
+	    nbyte += mb_ptr2len(str + nbyte);
+	}
+
     if (nbyte >= slen)
 	return NULL;
     return vim_strnsave(str + nbyte, mb_ptr2len(str + nbyte));

It's not the most elegant code out there but slashes the runtime in half.
Another possibility would be to use the encoding value set at compilation time rather than the one set at runtime, allowing the vimscript compiler to generate specialized indexing opcodes for utf8 and leave the uncommon cases fall into the slow path.


You are receiving this because you commented.

Bram Moolenaar

unread,
Mar 26, 2021, 8:29:44 AM3/26/21
to vim/vim, vim-dev ML, Comment

I was trying that, but it makes the test fail, because an ASCII character does not check for a following composing character.
That can be checked for, and I assume it's still fast, because it's still a CPU-local operation.


You are receiving this because you commented.

Bram Moolenaar

unread,
Mar 26, 2021, 8:36:15 AM3/26/21
to vim/vim, vim-dev ML, Comment

I made a patch to make it more efficient, 8.2.2654. I haven't done profiling, feel free to see if you can further improve it.


You are receiving this because you commented.

lacygoill

unread,
Apr 8, 2021, 4:19:58 AM4/8/21
to vim/vim, vim-dev ML, Comment

The patch 8.2.2654 made the code much faster:

vim9script
var n: number = 100'000
var data: string = repeat('x', n)
def Func()
    var i: number = 0
    var c: string
    while i < n
        c = data[i]
        i += 1
    endwhile
enddef
var time = reltime()
Func()
setline(1, reltime(time)->reltimestr()->matchstr('.*\..\{,3}') .. ' seconds to run Func()')
7.106 seconds to run Func()

On my machine, it went from 15 seconds down to 7 seconds.


The patch 8.2.2658 which introduces a new syntax allowing us to iterate directly over a string makes it a little faster:

vim9script
var n: number = 100'000
var data: string = repeat('x', n)
def Func
()
    for c in data
    endfor
enddef
var time = reltime()
Func()
setline(1, reltime(time)->reltimestr()->matchstr('.*\..\{,3}') .. ' seconds to run Func()')
6.343 seconds to run Func()

It goes down to almost 6 seconds.


But overall, it's still much slower than splitting the string manually first:

vim9script
var n: number = 100'000
var data: string = repeat('x', n)
def Func
()
    for c in data->split('\zs')
    endfor
enddef
var time = reltime()
Func()
setline(1, reltime(time)->reltimestr()->matchstr('.*\..\{,3}') .. ' seconds to run Func()')
0.088 seconds to run Func()

That's more than 70 faster.

Here's how Vim compiles the slow Func(), as reported by :disassemble:

vim -es -Nu NONE -S <(cat <<'EOF'
    vim9script
    var n: number = 100'000
    var data: string = repeat('x', n)
    def Func()
        for c in data
        endfor
    enddef
    pu =execute('disa Func')
    :%p
    qa!
EOF
)

0 STORE -1 in $0
1 LOADSCRIPT data-1 from /proc/5131/fd/11
2 FOR $0 -> 5
3 STORE $1
4 JUMP -> 2
5 DROP
6 RETURN 0

Here's how Vim compiles the fast Func():

vim -es -Nu NONE -S <(cat <<'EOF'
    vim9script
    var n: number = 100'000
    var data: string = repeat('x', n)
    def Func()
        for c in data->split('\zs')
        endfor
    enddef
    pu =execute('disa Func')
    :%p
    qa!
EOF
)

0 STORE -1 in $0
1 LOADSCRIPT data-1 from /proc/5416/fd/11
2 PUSHS "\zs"
3 BCALL split(argc 2)
4 FOR $0 -> 7
5 STORE $1
6 JUMP -> 4
7 DROP
8 RETURN 0

The only difference are the instructions 2 and 3:

2 PUSHS "\zs"
3 BCALL split(argc 2)

When we iterate over a string in a :def function, at compile time, could Vim automatically insert those 2 instructions, so that we get the x70 boost without having to invoke split()?


You are receiving this because you commented.

Christian Brabandt

unread,
Apr 8, 2021, 4:34:59 AM4/8/21
to vim/vim, vim-dev ML, Comment

hm, i would expect that splitting the string first into a list, should make it slower. Perhaps storing the character takes time? Perhaps some profiling would be needed to see what is the slow path... BTW: how does split('\zs') handle combining characters?


You are receiving this because you commented.

lacygoill

unread,
Apr 8, 2021, 4:53:45 AM4/8/21
to vim/vim, vim-dev ML, Comment

BTW: how does split('\zs') handle combining characters?

It seems to work as I would expect:

vim9script

def Func()

    var data: string = 'EDIT̞OR'

    for c in data->split('\zs'
)

        echo c

    endfor

enddef

Func()
Ë́

D͗̃

Iͯ̀

T̞

Ŏ̆

R̓͆

That is, the composing characters are not indexed separately; they are kept with their respective base characters.


You are receiving this because you commented.

lacygoill

unread,
Apr 8, 2021, 6:41:11 AM4/8/21
to vim/vim, vim-dev ML, Comment

Perhaps some profiling would be needed to see what is the slow path...

Here are some profiling logs obtained with gprof(1):

But I don't know how to interpret the results yet.


You are receiving this because you commented.

LemonBoy

unread,
Apr 8, 2021, 6:55:42 AM4/8/21
to vim/vim, vim-dev ML, Comment

The FOR opcode is also calling strlen every time it's executed, making it slower than necessary (but robust against modifications of the string it's iterating over)


You are receiving this because you commented.

lacygoill

unread,
Apr 8, 2021, 7:11:15 AM4/8/21
to vim/vim, vim-dev ML, Comment

Not sure it's relevant, but Vim9 does not provide any guarantee that the code will work as expected if the list which is being iterated over is modified. If that's an issue, it's recommended to make a copy first:

Legacy Vim script has some tricks to make a for loop over a list handle
deleting items at the current or previous item. In Vim9 script it just uses
the index, if items are deleted then items in the list will be skipped.
Example legacy script: >
let l = [1, 2, 3, 4]
for i in l
echo i
call remove(l, index(l, i))
endfor
Would echo:
1
2
3
4
In compiled Vim9 script you get:
1
3
Generally, you should not change the list that is iterated over. Make a copy
first if needed.

Assuming that the current implementation of for c in string does provide this guarantee, maybe we could drop it to get better performance? It would be more consistent with lists anyway.

We could document this, and recommend to make a copy first if the user intends to modify the string while it's being iterated over. I would argue that the latter use is a corner case, and that the general performance are more important.

Dominique Pellé

unread,
Apr 8, 2021, 7:26:11 AM4/8/21
to vim/vim, vim-dev ML, Comment

@lacygoill wrote:

Here are some profiling logs obtained with gprof(1):

Something is wrong with the gprof files: 100% of the time is spent in call_def_function.
That's not right. I'll find time later to run gprof myself later and hopefully I get more realistic profiling.

lacygoill

unread,
Apr 8, 2021, 7:31:24 AM4/8/21
to vim/vim, vim-dev ML, Comment

That's not right. I'll find time later to run gprof myself later and hopefully I get more realistic profiling.

Thanks. For reference, here's what I think I did:

  • in src/Makefile, uncomment line 660:

    #PROFILE_CFLAGS = -pg -g -DWE_ARE_PROFILING
    
  • in src/Makefile, uncomment line 664:

    #PROFILE_LIBS = -pg -no-pie
    
  • ./configure and make

  • ./src/vim -Nu NONE -S /tmp/t.vim

  • gprof ./src/vim ./gmon.out

Christian Brabandt

unread,
Apr 8, 2021, 7:47:20 AM4/8/21
to vim/vim, vim-dev ML, Comment

Something is wrong with the gprof files: 100% of the time is spent in call_def_function.

Isn't that expected, since we are running the vim9 function for about 7 seconds?

                6.15    0.00       1/1           call_user_func [8]
[7]    100.0    6.15    0.00       1         call_def_function [7]
                0.00    0.00  100010/100039      clear_tv <cycle 4> [19]
                0.00    0.00  100000/103596      utfc_ptr2len [16]
                0.00    0.00  100000/100682      vim_strnsave [18]
                0.00    0.00    3000/3061        line_breakcheck [22]
                0.00    0.00       3/521         ga_init2 [39]
                0.00    0.00       3/7           ui_breakcheck_force [221]
                0.00    0.00       3/7           ui_breakcheck [220]
                0.00    0.00       2/105033      vim_free [15]
                0.00    0.00       1/1           func_needs_compiling [611]
                0.00    0.00       1/1           funcdepth_increment [614]
                0.00    0.00       1/1           funcdepth_get [613]
                0.00    0.00       1/1           compile_def_function [552]
                0.00    0.00       1/532         ga_grow_inner [38]
                0.00    0.00       1/2753        ga_grow [23]
                0.00    0.00       1/1           funcdepth_restore [615]
                0.00    0.00       1/2           estack_push_ufunc [413]
                0.00    0.00       1/2           estack_pop [412]
                0.00    0.00       1/1           call_dfunc [528]
                0.00    0.00       1/4           copy_tv [271]

Bram Moolenaar

unread,
Apr 8, 2021, 7:58:17 AM4/8/21
to vim/vim, vim-dev ML, Comment


> The patch [8.2.2654](https://github.com/vim/vim/releases/tag/v8.2.2654) made the code much faster:
> ```vim

> vim9script
> var n: number = 100'000
> var data: string = repeat('x', n)
> def Func()
> var i: number = 0
> var c: string
> while i < n
> c = data[i]
> i += 1
> endwhile
> enddef
> var time = reltime()
> Func()
> setline(1, reltime(time)->reltimestr()->matchstr('.*\..\{,3}') .. ' seconds to run Func()')
> ```
> 7.106 seconds to run Func()
>
> On my machine, it went from 15 seconds down to 7 seconds.

I get about 21 seconds for using the while loop.

> The patch [8.2.2658](https://github.com/vim/vim/releases/tag/v8.2.2658) which introduces a new syntax allowing us to iterate directly over a string makes it a little faster:
> ```vim

> vim9script
> var n: number = 100'000
> var data: string = repeat('x', n)
> def Func()
> for c in data
> endfor
> enddef
> var time = reltime()
> Func()
> setline(1, reltime(time)->reltimestr()->matchstr('.*\..\{,3}') .. ' seconds to run Func()')
> ```
> 6.343 seconds to run Func()
>
> It goes down to almost 6 seconds.

Not sure what you did here, for me it goes down to about 0.1 second.
Please try this again.


> But overall, it's still much slower than splitting the string manually first:
> ```vim

> vim9script
> var n: number = 100'000
> var data: string = repeat('x', n)
> def Func()
> for c in data->split('\zs')
> endfor
> enddef
> var time = reltime()
> Func()
> setline(1, reltime(time)->reltimestr()->matchstr('.*\..\{,3}') .. ' seconds to run Func()')
> ```
> 0.088 seconds to run Func()
>
> That's more than 70 faster.

I get about 0.09 seconds here. As Christian mentioned, I'm surprised
that splitting and creating a list with each individual character is
this fast.

> Here's how Vim compiles the slow `Func()`, as reported by `:disassemble`:
> for c in data->split('\zs')
> endfor
> enddef
> pu =execute('disa Func')
> :%p
> qa!
> EOF
> )
>
> 0 STORE -1 in $0
> 1 LOADSCRIPT data-1 from /proc/5416/fd/11
> 2 PUSHS "\zs"
> 3 BCALL split(argc 2)
> 4 FOR $0 -> 7
> 5 STORE $1
> 6 JUMP -> 4
> 7 DROP
> 8 RETURN 0
>
> The only difference are the instructions 2 and 3:
>
> 2 PUSHS "\zs"
> 3 BCALL split(argc 2)
>
> When we iterate over a string in a `:def` function, at compile time,
> could Vim automatically insert those 2 instructions, so that we get
> the x70 boost without having to invoke `split()`?

I don't see much difference. It would require profiling to find out
where the time is spent.

--
hundred-and-one symptoms of being an internet addict:
88. Every single time you press the 'Get mail' button...it does get new mail.


/// Bram Moolenaar -- Br...@Moolenaar.net -- http://www.Moolenaar.net \\\
/// \\\
\\\ sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ ///
\\\ help me help AIDS victims -- http://ICCF-Holland.org ///

lacygoill

unread,
Apr 8, 2021, 9:05:07 AM4/8/21
to vim/vim, vim-dev ML, Comment

I get about 21 seconds for using the while loop.

I still get about 7 seconds on my machine.

Not sure what you did here, for me it goes down to about 0.1 second.

I just ran this code with a regular Vim compiled as usual:

./configure  \
  --enable-fail-if-missing       \
  --enable-gui=gtk2              \
  --enable-python3interp=dynamic \
  --prefix=/usr/local            \
  --with-compiledby=user

make

Please try this again.

I did several times, as soon as the patches were available. The results have always been about the same for more than 10 days now.

I get about 0.09 seconds here.

Yes, that's consistent with what I observe here too.

I don't see much difference. It would require profiling to find out
where the time is spent.

I'll try to learn how to better profile and report what I find.

Christian Brabandt

unread,
Apr 8, 2021, 9:23:27 AM4/8/21
to vim/vim, vim-dev ML, Comment

For the record I get:

9.543 seconds to run Func_while()
2.771 seconds to run Func()
0.067 seconds to run Func_split()

Bram Moolenaar

unread,
Apr 8, 2021, 12:02:51 PM4/8/21
to vim...@googlegroups.com, LemonBoy

> The `FOR` opcode is also calling `strlen` every time it's executed,
> making it slower than necessary (but robust against modifications of
> the string it's iterating over)

Good point! We can just check for the NUL that ends the string. I
don't think it is possible that the string changes while looping.

--
hundred-and-one symptoms of being an internet addict:
89. In addition to your e-mail address being on your business
cards you even have your own domain.

lacygoill

unread,
Apr 8, 2021, 12:33:39 PM4/8/21
to vim/vim, vim-dev ML, Comment

Fixed by 8.2.2736:

vim9script
var n: number = 100'000
var data: string = repeat('x', n)
def Func()
    var i: number = 0
    var c: string
    while i < n
        c = data[i]
        i += 1
    endwhile
enddef
var time = reltime()
Func()
setline(1, reltime(time)->reltimestr()->matchstr('.*\..\{,3}') .. ' seconds to run Func()')
7.059 seconds to run Func()
vim9script
var n: number = 100'000
var data: string = repeat('x', n)
def Func()
    for c in data->split('\zs')
    endfor
enddef
var time = reltime()
Func()
setline(1, reltime(time)->reltimestr()->matchstr('.*\..\{,3}') .. ' seconds to run Func()')
0.082 seconds to run Func()
vim9script
var n: number = 100'000
var data: string = repeat('x', n)
def Func()
    for c in data
    endfor
enddef
var time = reltime()
Func()
setline(1, reltime(time)->reltimestr()->matchstr('.*\..\{,3}') .. ' seconds to run Func()')
0.005 seconds to run Func()

The new for c in string syntax is now the fastest of all the syntaxes by far. It's also the easiest to use and the most readable.

lacygoill

unread,
Apr 8, 2021, 12:33:40 PM4/8/21
to vim/vim, vim-dev ML, Comment

Closed #8000.

Christian Brabandt

unread,
Apr 8, 2021, 12:34:36 PM4/8/21
to vim/vim, vim-dev ML, Comment

Thanks, I can confirm.

Dominique Pellé

unread,
Apr 8, 2021, 1:33:58 PM4/8/21
to vim/vim, vim-dev ML, Comment

Something is wrong with the gprof files: 100% of the time is spent in call_def_function.

Isn't that expected, since we are running the vim9 function for about 7 seconds?

I expected to see time spent in call_def_function
but also in its callees.

Anyway, the slowness is now fixed by vim-8.2.2736.
I profiled with vim-8.2.2737 (debug mode) after modifying the script to iterate 10 million times:

$ cat > iterate-string.vim <<EOF
vim 
vim9script
var n: number = 10'000'000
var data: string = repeat('x', n)
def Func()
    for c in data
    endfor
enddef
var time = reltime()
Func()
setline(1, reltime(time)->reltimestr()->matchstr('.*\..\{,3}') .. ' seconds to run Func()')
EOF

$ vim --clean -S iterate-string.vim

I get:

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 67.57      0.25     0.25        1   250.00   339.62  call_def_function
  8.11      0.28     0.03 10000264     0.00     0.00  clear_tv
  4.05      0.30     0.02 10179616     0.00     0.00  utfc_ptr2len
  4.05      0.31     0.02 10027765     0.00     0.00  alloc
  2.70      0.32     0.01 10039739     0.00     0.00  vim_free
  2.70      0.33     0.01 10027840     0.00     0.00  lalloc
  2.70      0.34     0.01 10004840     0.00     0.00  vim_strnsave
  2.70      0.35     0.01       96     0.10     0.10  get_attr_entry

mg979

unread,
Aug 31, 2021, 7:09:56 AM8/31/21
to vim/vim, vim-dev ML, Comment

How has this been fixed?

vim9script
var n: number = 100'000
var data: string = repeat('x', n)
def Func()
    var i: number = 0
    var c: string
    while i < n
        c = data[i]
        i += 1
    endwhile
enddef
var time = reltime()
Func()
setline(1, reltime(time)->reltimestr()->matchstr('.*\..\{,3}') .. ' seconds to run Func()')

7.059 seconds to run Func()

This doesn't seem a good result to me. If accessing strings by index is still this slow and slower than in legacy vimscript it means it's useless to access characters this way.

If splitting the string by characters with split(s, '\zs') is much faster anyway, can't vim9 internally use the same method for expressions like string[index] and split the string, then returning the index in the resulting list? Or maybe use the same logic that function uses?


You are receiving this because you commented.
Reply to this email directly, view it on GitHub, or unsubscribe.

Triage notifications on the go with GitHub Mobile for iOS or Android.

Christian Brabandt

unread,
Aug 31, 2021, 7:48:46 AM8/31/21
to vim/vim, vim-dev ML, Comment

I think the suggested improvement was to use for i in data and have it directly iterate over each character.

mg979

unread,
Aug 31, 2021, 7:53:48 AM8/31/21
to vim/vim, vim-dev ML, Comment

Yes I got that. The problem is that the form string[index] is legit and even obvious and in the current form it's a performance trap.

mg979

unread,
Aug 31, 2021, 8:01:50 AM8/31/21
to vim/vim, vim-dev ML, Comment

I guess this also affects string splicing like string[index : ]?

Bram Moolenaar

unread,
Sep 1, 2021, 4:50:08 PM9/1/21
to vim/vim, vim-dev ML, Comment

As said, if you iterate over a string use the "for c in text" loop. That is optimized for it.
If you get a character by index, it has to find the character every time, so that is obviously slow.
Splicing is even more costly, since it has to create a new string.
Compiling the commands doesn't make the work go faster.

mg979

unread,
Sep 2, 2021, 4:13:14 AM9/2/21
to vim/vim, vim-dev ML, Comment
vim9script
var n: number = 100'000
var data: string = repeat('x', n)
def Func()
    var i: number = 0
    var c: string
    while i < n
        c = data[i]
        i += 1
    endwhile
enddef
var time = reltime()
Func()
setline(1, reltime(time)->reltimestr()->matchstr('.*\..\{,3}') .. ' seconds to run Func()')

7.059 seconds to run Func()

vim9script
var n: number = 100'000
var data: string = repeat('x', n)
def Func
()
    
for c in data->split('\zs')
    endfor
enddef
var time = reltime()
Func()
setline(1, reltime(time)->reltimestr()->matchstr('.*\..\{,3}') .. ' seconds to run Func()')

0.082 seconds to run Func()

vim9script
var n: number = 100'000
var data: string = repeat('x', n)
def Func
()
    
for c in data
    endfor
enddef
var time = reltime()
Func()
setline(1, reltime(time)->reltimestr()->matchstr('.*\..\{,3}') .. ' seconds to run Func()')

0.005 seconds to run Func()

Ok so these tests results are totally misleading because they are doing different things.

Test 1 finds an index inside a string long 100.000 characters, for 100.000 times.
Test 2 and 3 iterate the same string once.

Dominique Pellé

unread,
Sep 2, 2021, 4:27:28 AM9/2/21
to vim/vim, vim-dev ML, Comment

mg979 wrote:

Test 1 finds an index inside a string long 100.000 characters, for 100.000 times.
Test 2 and 3 iterate the same string once.

So test1 took 7.059 seconds and test3 took 0.005 seconds.

That is expected considering that string[index] is O(n) in Vim9 (making test3 O(n^2)).
Being O(n) was a design tradeoff as strings are stored in utf-8, which is variable size per characters.
An alternative would be to use utf-32, which would make it O(1), but it would consume more memory in general (4 times more memory for ASCII strings).

mg979

unread,
Sep 2, 2021, 4:37:52 AM9/2/21
to vim/vim, vim-dev ML, Comment

So test1 took 7.059 seconds and test3 took 0.005 seconds.

Still these tests can't say what is fastest. Since test 1 runs 100.000 the number of times of test 3, it might be that string indexing didn't need to be optimized because it's faster. But since the test is different, how can one say?

Dominique Pellé

unread,
Sep 2, 2021, 4:52:07 AM9/2/21
to vim/vim, vim-dev ML, Comment

@mg979:

Still these tests can't say what is fastest. Since test 1 iterates the string 100.000 the number of times of test 3

No, they do the same thing i.e. each test iterates on a string of size n=100,000 characters.

Test1 and test2 are both O(n), but with as bigger constant factor for test2, so test2 is slower.
Test3 is O(n^2) as string[i] in each iteration is implemented internally with O(n) complexity, so it's much slower of course.
To me the tests illustrate well the benefit of avoiding string[i] when iterating on all char of a string.

mg979

unread,
Sep 2, 2021, 9:14:06 AM9/2/21
to vim/vim, vim-dev ML, Comment

Rewriting the tests this way I get very different results. Notice that I made n only 10000 to make them faster.

vim9script

var n: number = 10000
var data: string = repeat('x', n)
var time = reltime()

def Func1()
    
var i: number = 0
    var c: string
    while i < n
        c = data[i]
        i += 1
    endwhile
enddef
Func1()
append('$', reltime(time)->reltimestr()->matchstr('.*\..\{,3}') .. ' seconds to run Func1()')

def Func2()
    
var i: number = 0
    var j: number = 0
    var s = data->split('\zs')
    while i < n
        for c in s
            if i == j
                break
            endif
            j += 1
        endfor
        i += 1
        j = 0
    endwhile
enddef
time = reltime()
Func2()
append('$', reltime(time)->reltimestr()->matchstr('.*\..\{,3}') .. ' seconds to run Func2()')

def Func3()
    
var i: number = 0
    var j: number = 0
    while i < n
        for c in data
            if i == j
                break
            endif
            j += 1
        endfor
        i += 1
        j = 0
    endwhile
enddef
time = reltime()
Func3()
append('$', reltime(time)->reltimestr()->matchstr('.*\..\{,3}') .. ' seconds to run Func3()')

I get:

  0.063 seconds to run Func1()
  3.630 seconds to run Func2()
  3.401 seconds to run Func3()

This other way loops are simpler, they iterate the whole string each time.
That is, I use c = data[n] in Func1, and I don't break the loop in the other 2.
Func2 and Func3 result faster, but they don't do anything useful this way.

vim9script

var n: number = 10000
var data: string = repeat('x', n)
var time = reltime()

def Func1()
    
var i: number = 0
    var c: string
    while i < n
        c = data[n]
        
i += 1
    endwhile
enddef
Func1()
append('$', reltime(time)->reltimestr()->matchstr('.*\..\{,3}') .. ' seconds to run Func1()')

def Func2()
    
var i: number = 0
    var s = data->split('\zs')
    while i < n
        for c in s
        endfor
        i += 1
    endwhile
enddef
time = reltime()
Func2()
append('$', reltime(time)->reltimestr()->matchstr('.*\..\{,3}') .. ' seconds to run Func2()')

def Func3()
    
var i: number = 0
    while i < n
        for c in data
        endfor
i += 1
    endwhile
enddef
time = reltime()
Func3()
append('$', reltime(time)->reltimestr()->matchstr('.*\..\{,3}') .. ' seconds to run Func3()')

and I get:

  0.083 seconds to run Func1()
  3.471 seconds to run Func2()
  2.808 seconds to run Func3()

Direct index access seems much faster in any case. So the new form for c in data (in Func3) isn't really ideal for most use cases, imho.


You are receiving this because you commented.

Reply to this email directly, view it on GitHub.

Dominique Pellé

unread,
Sep 2, 2021, 9:44:59 AM9/2/21
to vim/vim, vim-dev ML, Comment

@mg979 wrote:

So the new form for c in data (in Func3) isn't really ideal for most use cases, imho.

But your modified tests don't make sense to me.
Func3() is not meant to have a while loop on i?! Now you're making Fun3() O(n^2) for no reasons.
The point of the 3 tests was to iterate though all chars of a long string in different ways, which is not what your modified tests do.

Test3() only meant to loop on each character using the new Vim9 idiom: for c in data which is the fastest way.

mg979

unread,
Sep 2, 2021, 10:23:59 AM9/2/21
to vim/vim, vim-dev ML, Comment

To compare speed I wanted them to do similar things. All three functions iterate the same string in different ways, for the same n number of times.

mg979

unread,
Sep 2, 2021, 10:33:31 AM9/2/21
to vim/vim, vim-dev ML, Comment

Anyway I was mislead by the tests as they were presented, to think that string[index] was abnormally slow, but this isn't the case, I'm not arguing with anything else.

Bram Moolenaar

unread,
Sep 2, 2021, 10:50:39 AM9/2/21
to vim...@googlegroups.com, Dominique Pellé

> mg979 wrote:
>
> > Test 1 finds an index inside a string long 100.000 characters, for 100.000 times.
> > Test 2 and 3 iterate the same string once.
>
> So test1 took 7.059 seconds and test3 took 0.005 seconds.
>
> That is expected considering that `string[index]` is O(n) in Vim9
> (making test3 O(n^2)).
> Being O(n) was a design tradeoff as strings are stored in utf-8, which
> is variable size per characters. An alternative would be to use
> utf-32, which would make it O(1), but it would consume more memory in
> general (4 times more memory for ASCII strings).

And it would introduce a new value type, and require all functions that
currently work on a string (which is very many) also to work with the new
type. And the standard C functions working on strings can't be used,
I'm not sure how good wide string support is, many compilers only use 16
bits for wchar_t.

Using a list to store the characters also works. It remembers the last
used item, looking up by an index close to a previously used index
is fast. Creating the list has overhead though.

--
For large projects, Team Leaders use sophisticated project management software
to keep track of who's doing what. The software collects the lies and guesses
of the project team and organizes them in to instantly outdated charts that
are too boring to look at closely. This is called "planning".
(Scott Adams - The Dilbert principle)

Bram Moolenaar

unread,
Sep 2, 2021, 1:23:45 PM9/2/21
to vim/vim, vim-dev ML, Comment

It is true that when indexing a string by character index Vim has to go over the whole string from the start, since the UTF-8 encoding has a character use any number of bytes. At the same time, most users will want to use the character index, not the byte index. It's finding a balance between speed and what users actually want to do.

mg979

unread,
Sep 2, 2021, 2:15:11 PM9/2/21
to vim/vim, vim-dev ML, Comment

It's an improvement because one can still use strpart() for bytes, on the other hand getting a splice of a string by character wasn't easy in vimscript.

Reply all
Reply to author
Forward
0 new messages