map() slows down dramatically after vim has been running a while


Ernie Rael

Mar 20, 2023, 8:33:46 PM3/20/23
to vim...@googlegroups.com
Is this expected? All the results in the tables are collected in a single
instance of vim; I also tried it with "-u NONE". I've run asan and valgrind
on test_filter_map and don't see any leaks. If these results are a surprise,
I can look further. I haven't run the linux/gnu profiler yet. I saw vim's
MEM_PROFILE; I'll check that out. Other suggestions for investigation?
It's not just map(), but that's the part I'm looking at now.

Looking to understand some profiling results.
In the following, nInput is the number of items in "input: list<number>"
and nLoop is the number of loop iterations. The code is run 5 times for each
nInputXnLoop combination; each cell is the fastest result of the 5 runs.

The first set of results is executed left to right, the second set right to
left. I had incorrectly thought that more work and a higher number of loops
caused the slower results; but it seems that the greatest contributor to the
slowdown is how long vim has been running. The most dramatic difference can
be seen
    in 3x99 (list of 3 numbers, 99 loops) for map(in, Map1):
    119 usec/op vs 5 usec/op.
Note that map(in, 'v:val') shows a similar effect, but not nearly so
dramatic; it stays closer to proportionate to the workload (number of list
items).

After the results is the code being profiled.

-ernie

=== summary results

   1x33   1x65   1x99   2x33   2x65   2x99   3x33   3x65   3x99 : nInputXnLoop (cells: usec/op)
  0.652  0.671  0.725  0.916  0.953  0.951  1.188  1.271  1.200 : map(in, 'v:val')   ###-2
  2.918 16.121 31.023 13.029 41.957 70.292 35.317 76.674    119 : map(in, Map1)   ###-6

REVERSE column order of execution
   1x33   1x65   1x99   2x33   2x65   2x99   3x33   3x65   3x99 : nInputXnLoop (cells: usec/op)
  0.859  0.768  0.656  1.091  1.007  0.887  1.299  1.281  1.061 : map(in, 'v:val')   ###-2
 40.001 28.434 12.215 69.711 41.915 13.326 89.459 52.533  4.873 : map(in, Map1)   ###-6


=== full results

$ ./vimprof_summarize | ./vimprof_create_table
   1x33   1x65   1x99   2x33   2x65   2x99   3x33   3x65   3x99 : nInputXnLoop (cells: usec/op)
  0.074  0.070  0.071  0.073  0.074  0.069  0.074  0.078  0.069 : result = []   ###-0
  0.155  0.161  0.173  0.190  0.202  0.192  0.209  0.222  0.206 : in = input->copy()   ###-1
  0.652  0.671  0.725  0.916  0.953  0.951  1.188  1.271  1.200 : map(in, 'v:val')   ###-2
  0.077  0.077  0.080  0.080  0.083  0.084  0.082  0.087  0.083 : result = []   ###-4
  0.156  0.157  0.156  0.186  0.190  0.187  0.209  0.215  0.211 : in = input->copy()   ###-5
  2.918 16.121 31.023 13.029 41.957 70.292 35.317 76.674    119 : map(in, Map1)   ###-6
  0.082  0.090  0.098  0.087  0.098  0.097  0.092  0.102  0.098 : result = []   ###-8

REVERSE

$ ./vimprof_summarize | ./vimprof_create_table
   1x33   1x65   1x99   2x33   2x65   2x99   3x33   3x65   3x99 : nInputXnLoop (cells: usec/op)
  0.092  0.076  0.070  0.092  0.076  0.070  0.082  0.081  0.066 : result = []   ###-0
  0.177  0.163  0.157  0.193  0.182  0.177  0.211  0.208  0.180 : in = input->copy()   ###-1
  0.859  0.768  0.656  1.091  1.007  0.887  1.299  1.281  1.061 : map(in, 'v:val')   ###-2
  0.082  0.079  0.078  0.084  0.081  0.080  0.080  0.083  0.077 : result = []   ###-4
  0.164  0.173  0.160  0.192  0.194  0.188  0.215  0.216  0.195 : in = input->copy()   ###-5
 40.001 28.434 12.215 69.711 41.915 13.326 89.459 52.533  4.873 : map(in, Map1)   ###-6
  0.111  0.102  0.091  0.117  0.107  0.093  0.111  0.113  0.079 : result = []   ###-8


=== code being profiled

def Map1(_, val: number): number
    return val
enddef

def Map(xxx: list<any>)
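    # NOTE: result, input, expect and Oops() are assumed to be script-level
    # items of the profiling harness (not shown here); the ###-N tags match
    # the rows in the tables above.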
    result = []                           ###-0
    var in: list<number>
    in = input->copy()                    ###-1
    map(in, 'v:val')                      ###-2
    if expect != in | Oops() | endif
    result = []                           ###-4
    in = input->copy()                    ###-5
    map(in, Map1)                         ###-6
    if expect != in | Oops() | endif
    result = []                           ###-8
enddef

Ernie Rael

Mar 21, 2023, 1:56:46 AM3/21/23
to vim...@googlegroups.com
It is the profiling itself that is causing the results to be so skewed as
things run longer. I guess the thing to do now is use the linux/gnu
profiler to profile the vim profiler.

During a single run of vim, for the current analysis, 45 profile files are
created; the profile structures are torn down and rebuilt 45 times. I've
modified my profiling framework so that all the programs are still run,
but the profiler is enabled on a per-column basis. In addition, the output
shows the run order of the various columns.

Only the 3rd and 9th columns are profiled
   1x99   3x99 : nInputXnLoop (cells: usec/op)
     #3     #9 : run order
  0.596  1.177 : map(in, 'v:val')   ###-2
  3.110 17.203 : map(in, Map1)   ###-6

Reverse order
   1x99   3x99 : nInputXnLoop (cells: usec/op)
     #7     #1 : run order
  0.644  1.068 : map(in, 'v:val')   ###-2
  6.113  4.903 : map(in, Map1)   ###-6

All 9 columns are profiled
   1x33   1x65   1x99   2x33   2x65   2x99   3x33   3x65   3x99 : nInputXnLoop (cells: usec/op)
     #1     #2     #3     #4     #5     #6     #7     #8     #9 : run order
  0.608  0.592  0.603  0.941  0.954  0.976  1.269  1.217  1.208 : map(in, 'v:val')   ###-2
  2.550  5.884 10.559 26.990 36.463 44.996 79.196     90    106 : map(in, Map1)   ###-6

All columns profiled in reverse order
   1x33   1x65   1x99   2x33   2x65   2x99   3x33   3x65   3x99 : nInputXnLoop (cells: usec/op)
     #9     #8     #7     #6     #5     #4     #3     #2     #1 : run order
  0.789  0.735  0.682  0.975  1.690  0.932  1.209  1.108  1.059 : map(in, 'v:val')   ###-2
 34.596 30.699 27.083 50.110 77.150 28.901 31.590 17.570  4.486 : map(in, Map1)   ###-6

In the previous results, except for the first and last columns, the
order of column execution was incorrectly stated. The conclusions are
the same, and the results make more sense now.

Ernie Rael

Mar 22, 2023, 1:48:44 PM3/22/23
to vim...@googlegroups.com

On 23/03/20 10:56 PM, Ernie Rael wrote:
> It is the profiling itself that is causing the results to be so skewed as
> things run longer. I guess the thing to do now is use the linux/gnu
> profiler to profile the vim profiler.


Found it.

Doing
    execute 'profile func' funcref(Func)->get('name')
    profile start xxx.profile.out
generates
    Error detected while processing command line..script /home/err/play/xxx.vim:
    line   15:
    E750: First use ":profile start {fname}"
I assumed (bad programmer) that the list of functions to profile is
cleared by `profile stop`; after all, you can't set the functions
without first doing `profile start`.

So I've been doing
    profile start xxx.profile.out
    execute 'profile func' funcref(Func)->get('name')
    Func()
    profile stop
and this gets done multiple times while collecting data, so the same
function pattern is added to the profile list again and again.

Here's some gprof output
      %   cumulative   self              self     total
     time   seconds   seconds    calls  ms/call  ms/call  name
     31.82      0.07     0.07   542508     0.00     0.00 nfa_regmatch
When a function name hasn't been added multiple times, nfa_regmatch
is at most a blip. BTW, the name being matched looks like `<SNR>47_T`.

This profiling behavior is confusing at best.

Throwing out duplicate patterns is a thought.
I wonder if `profile func {pattern}` names that don't require a regexp
could/should be handled without one; maybe a table lookup.

-ernie

Ernie Rael

Mar 22, 2023, 4:43:10 PM3/22/23
to vim...@googlegroups.com
Poking around...

In debugger.c::debuggy_find():976 it looks like there's a missing `break;`.

With my corrected code, has_profiling() still looks like ~11% of the
time (it's got those two vim_regexec...() calls); I wonder if has_profiling()
could/should cache the results of debuggy_find().

-ernie

Bram Moolenaar

Mar 22, 2023, 6:02:37 PM3/22/23
to vim...@googlegroups.com, Ernie Rael

Ernie Rael wrote:

> On 23/03/20 10:56 PM, Ernie Rael wrote:
> > It is the profiling itself that is causing the results to be so skewed as
> > things run longer. I guess the thing to do now is use the linux/gnu
> > profiler to profile the vim profiler.
>
> Found it.
>
> Doing
>     execute 'profile func' funcref(Func)->get('name')
>     profile start xxx.profile.out
> generates
>     Error detected while processing command line..script
> /home/err/play/xxx.vim:
>     line   15:
>     E750: First use ":profile start {fname}"
> I assumed (bad programmer) that the list of functions to profile is
> cleared by `profile stop`; after all you can't set the functions
> without doing `profile start`.
>
> So I've been doing
>     profile start xxx.profile.out
>     execute 'profile func' funcref(Func)->get('name')
>     Func()
>     profile stop
> and this gets done multiple times while collecting data.

You can do "profdel *" somewhere.
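
For instance (a sketch, untested, reusing the names from the snippet above),
clearing the pattern list after each measurement keeps prof_ga from growing
across runs:

    profile start xxx.profile.out
    execute 'profile func' funcref(Func)->get('name')
    Func()
    profile stop
    # drop the accumulated function patterns before the next run
    profdel *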

> Here's some gprof output
>       %   cumulative   self              self     total
>      time   seconds   seconds    calls  ms/call  ms/call  name
>      31.82      0.07     0.07   542508     0.00     0.00 nfa_regmatch
> Without a function name being set multiple times nfa_regmatch
> is at most a blip. BTW, the name looks like `<SNR>47_T`.
>
> This profiling behavior is confusing at best.
>
> Throwing out duplicate patters is a thought.

These normally don't happen. I doubt it will help much in general.

> I wonder if `function name {pattern}` that don't require regexp
> could/should be handled without it; maybe a table lookup.

Possibly, but is it worth it? As with caching, this often is tricky to
get right.

--
Yah, well, we had to carve our electrons out of driftwood we'd
find. In the winter. Uphill. Both ways.

/// Bram Moolenaar -- Br...@Moolenaar.net -- http://www.Moolenaar.net \\\
/// \\\
\\\ sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ ///
\\\ help me help AIDS victims -- http://ICCF-Holland.org ///

Bram Moolenaar

Mar 22, 2023, 6:02:37 PM3/22/23
to vim...@googlegroups.com, Ernie Rael

Ernie Rael wrote:

> Poking around...
>
> In debugger.c::debuggy_find():976 it looks like there's a missing `break;`.

Hmm, it needs to loop if "lnum" can be lower in another entry, but for
profiling that doesn't matter. I suppose we could break when
"gap == &prof_ga". Do you want to make a PR for that?

> With my corrected code, has_profiling() still looks like ~11% of the
> time (it's got those two vim_regexec...()); wonder if has_profiling()
> could/should cache the results of debuggy_find().

Caching can be tricky. In this case it can be difficult to clear cached
information when anything changes (new function defined, profiling entry
added/deleted, etc.).

--
You were lucky to have a LAKE! There were a hundred and sixty of
us living in a small shoebox in the middle of the road.

Ernie Rael

Mar 22, 2023, 9:47:09 PM3/22/23
to vim...@googlegroups.com
On 23/03/22 3:02 PM, Bram Moolenaar wrote:
> Ernie Rael wrote:
>
>> Poking around...
>>
>> In debugger.c::debuggy_find():976 it looks like there's a missing `break;`.
> Hmm, it needs to loop if "lnum" can be lower in another entry, but for
> profiling that doesn't matter. I suppose we could break when
> "gap == &prof_ga". Do you want to make a PR for that?


Will do.

>> With my corrected code, has_profiling() still looks like ~11% of the
>> time (it's got those two vim_regexec...()); wonder if has_profiling()
>> could/should cache the results of debuggy_find().
> Caching can be tricky. In this case it can be difficult to clear cached
> information when anything changes (new function defined, profiling entry
> added/deleted, etc.).
We're caching the result of running regexp on a name (I think); I don't see
how adding a function changes that. I'm thinking of something like

    has_profiling(...)
    {
        result = table->get(fname, null);
        if (result == null)
        {
            result = debuggy_find(...);
            table[fname] = result;
        }
        return result;  // handle direct return and optional forceit
    }

The table is cleared whenever prof_ga is modified. Should be easy to
detect since prof_ga is static in debugger.c.

I was thinking of two tables, one for files and one for functions, to avoid
constructing keys that distinguish the cases.

A potential problem is "forceit"; I'm not sure what this is about and
whether or not it can be cached. If it can be cached, then a value in the
table is two bits, has_profile and forceit (assuming forceit is boolean).

I'm also assuming that the main if condition is always true for
has_profile; "bp->dbg_type == DBG_EXPR" is never true.

Any simple examples of using a hashtable in "C" that I can reference?


> is it worth it?

Some uniform degradation, while undesirable, is not unexpected when
profiling. The problem here is that the degradation in the results is not
uniform and is large in some cases; it is due to the regexp engine and the
order of entries in a table used by the profiler, which can vary while
running.

In this case, `map(in, Map1)`, the map builtin invokes a user-defined
function; I'm guessing that on invocation of that function the profiler
looks it up. The lookup takes a seemingly arbitrary amount of time, which is
counted against the `map` line. I've seen results 50x slower for the `map`,
while lines that do not involve a user function are not affected.

-ernie

Bram Moolenaar

Mar 23, 2023, 11:19:52 AM3/23/23
to vim...@googlegroups.com, Ernie Rael

Ernie Rael wrote:

> >> With my corrected code, has_profiling() still looks like ~11% of the
> >> time (it's got those two vim_regexec...()); wonder if has_profiling()
> >> could/should cache the results of debuggy_find().
> > Caching can be tricky. In this case it can be difficult to clear cached
> > information when anything changes (new function defined, profiling entry
> > added/deleted, etc.).
>
> We're caching the result of running regexp on a name (I think); I don't see
> how adding a function changes that. I'm thinking of something like
>
>     has_profiling(...)
>     {
>         result = table->get(fname, null);
>         if (result == null)
>         {
>             result = debuggy_find(...);
>             table[fname] = result;
>         }
>         return result;  // handle direct return and optional forceit
>     }
>
> The table is cleared whenever prof_ga is modified. Should be easy to
> detect since prof_ga is static in debugger.c.
>
> I was thinking of two tables, for file and function, and avoid constructing
> keys that distinguish the cases.

Keep in mind that for debugging the line number also matters. It would
be simpler to only do this for profiling, since for debugging the
performance isn't all that important.

> A potential problem is "forceit"; I'm not sure what this is about and
> whether or not it can be cached. If it can be cached, then a value in the
> table is two bits, has_profile and forceit (assuming forceit is boolean).

"When the [!] is added then all functions defined in the script will
also be profiled."

This is handled in prof_def_func(). This doesn't involve any regexp,
thus it would be OK not to cache this.

> I'm also assuming that the main if condition is always true for
> has_profile; "bp->dbg_type == DBG_EXPR" is never true.

Right.

> Any simple examples of using hashtable in "C" that I can reference?

It's all in hashtab.c. E.g. hash_init(), hash_add() and hash_find().
Perhaps the use of b_keywtab can function as an example.
You might want to store the hash number with the function to avoid the
overhead of computing it. The overhead of looking up the key in the
table can't be avoided. Also, when more function names are being cached,
updating the hash table can also take a noticeable amount of time.
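
As a rough sketch of the idea (a self-contained toy in plain C, not Vim
source; inside Vim the table would presumably be a hashtab_T managed with
the hash_init()/hash_find()/hash_add() calls mentioned above, keyed by the
function name):

    #include <stdlib.h>
    #include <string.h>

    #define CACHE_SIZE 97   /* small fixed-size table, enough for a toy */

    typedef struct cache_entry_S
    {
        char    *name;          /* function name, the lookup key */
        int      has_profile;   /* cached result of the pattern scan */
        struct cache_entry_S *next;
    } cache_entry_T;

    static cache_entry_T *cache[CACHE_SIZE];

    static unsigned name_hash(const char *s)
    {
        unsigned h = 0;

        while (*s != '\0')
            h = h * 31 + (unsigned char)*s++;
        return h % CACHE_SIZE;
    }

    /* Stand-in for the expensive part: the vim_regexec...() scan over the
     * prof_ga patterns that debuggy_find() does today. */
    static int match_patterns(const char *name)
    {
        return strncmp(name, "Prof", 4) == 0;   /* dummy rule for the toy */
    }

    /* Call this whenever the pattern list (prof_ga) is modified. */
    static void cache_clear(void)
    {
        for (int i = 0; i < CACHE_SIZE; ++i)
        {
            cache_entry_T *e = cache[i];

            while (e != NULL)
            {
                cache_entry_T *next = e->next;

                free(e->name);
                free(e);
                e = next;
            }
            cache[i] = NULL;
        }
    }

    static int has_profiling_cached(const char *name)
    {
        unsigned slot = name_hash(name);
        cache_entry_T *e;

        for (e = cache[slot]; e != NULL; e = e->next)
            if (strcmp(e->name, name) == 0)
                return e->has_profile;          /* cache hit: no regexp work */

        e = malloc(sizeof(*e));
        e->name = strdup(name);
        e->has_profile = match_patterns(name);  /* one scan, then remembered */
        e->next = cache[slot];
        cache[slot] = e;
        return e->has_profile;
    }

In the real thing the cached value would also need to carry the forceit
flag, and cache_clear() would be called from whatever adds or deletes
prof_ga entries.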

> > is it worth it?
>
> Some uniform degradation, while undesirable, is not unexpected when
> profiling. The problem here is the degradation in results is not uniform
> and is large in some cases; it is due to the regex engine and the order of
> stuff in a table that's used by the profiler. Can vary while running.

If the number of items in "prof_ga" grows then the overhead indeed
rapidly grows, since it goes over the list quite a few times.

Note that with caching the overhead can still be noticeable, especially
when adding/removing items from "prof_ga".

> In this case, `map(in, Map1)`, the map builtin invokes a user defined
> function; I'm guessing on invocation of that function the profiler looks it
> up. The lookup takes a seemingly arbitrary amount of time and counts it
> against the `map`. I've seen results 50x slower for the `map`; while lines
> that do not involve a user function are not affected.

Yes, that can give misleading profiling results.

--
Not too long ago, unzipping in public was illegal...

Ernie Rael

Mar 23, 2023, 10:31:21 PM3/23/23
to vim...@googlegroups.com
Initial results are promising; one measure is how few loops are
required for the results to stabilize.
And the results show faster times when user-defined functions
are involved. It's interesting how good that feels, even though
it doesn't mean anything is actually running faster in production.

On 23/03/23 8:19 AM, Bram Moolenaar wrote:
> Ernie Rael wrote:
>
> You might want to store the hash number with the function to avoid the
> overhead of computing it.

This still needs to be done. To make sure I understand,
this is saving `hash_hash(name)` in ufunc_T and has_profiling() has an
additional argument which is the hash. And if this argument is zero
then the hash would be computed in has_profiling().

> The overhead of looking up the key in the
> table can't be avoided.  Also, when more function names are being cached
> updating the hash table can also take a noticeable amount of time.

Yes, but it also would usually indicate that the table has been very useful.

I don't know most use cases; does the number of things being profiled
change dynamically or very often?

-ernie

Ernie Rael

Mar 24, 2023, 11:50:11 AM3/24/23
to vim...@googlegroups.com
FYI, back to where the latest profiler work started (after nsec clock)

The goal: compare the performance of "map(in, 'v:val')" vs "map(in, Map1)". Map1 is

        def Map1(_, val: number): number
            return val
        enddef
Turns out, for processing a list with 3 items, it's 1.049 usec vs 1.590 usec;
at least those are the current results. (And not 1.166 vs 103.)

Current vim:

   1x33   1x65   1x99   2x33   2x65   2x99   3x33   3x65   3x99 : nInputXnLoop (cells: usec/op)
  0.608  0.625  0.641  0.999  0.957  0.954  1.237  1.179  1.166 : map(in, 'v:val')   ###-2
  0.068  0.072  0.071  0.071  0.068  0.069  0.070  0.070  0.070 : result = []   ###-4
  0.144  0.151  0.146  0.171  0.166  0.164  0.194  0.190  0.189 : in = input->copy()   ###-5
  2.784  6.023 10.140 27.808 36.471 45.324 77.819     92    103 : map(in, Map1)   ###-6

Upgraded function profiling patch
  0.599  0.574  0.568  0.902  0.795  0.807  1.121  1.066  1.049 : map(in, 'v:val')   ###-2
  0.071  0.072  0.071  0.072  0.069  0.071  0.071  0.070  0.070 : result = []   ###-4
  0.152  0.151  0.149  0.179  0.175  0.173  0.200  0.200  0.205 : in = input->copy()   ###-5
  0.807  0.771  0.723  1.179  1.182  1.165  1.663  1.598  1.590 : map(in, Map1)   ###-6


-ernie


Bram Moolenaar

Mar 24, 2023, 1:45:07 PM3/24/23
to vim...@googlegroups.com, Ernie Rael

> > You might want to store the hash number with the function to avoid the
> > overhead of computing it.
>
> This still needs to be done. To make sure I understand,
> this is saving `hash_hash(name)` in ufunc_T and has_profiling has an
> additional argument which is the hash. And if this argument is zero
> then the hash would be computed in has_profiling.

I haven't looked at the PR yet, but I would think that as soon as you
have the function pointer, you can use the field that has the hash
value. It is based on the function name, which doesn't change. Thus it
needs to be computed only once.

However, if you mean the hash value of the pattern, then it would be
stored in "struct debuggy". Obviously it depends on what key you use
for the hash table.

> > The overhead of looking up the key in the
> > table can't be avoided. Also, when more function names are being cached
> > updating the hash table can also take a noticeable amount of time.
>
> Yes, but it also would usually indicate that the table has been very
> useful.
>
> I don't know most use cases; does the number of things being profiled
> change dynamically or very often?

That completely depends on what you are doing. Assuming you first start
Vim and all plugins are loaded before you start profiling, then more
plugins would be loaded when you access files with a certain filetype.

Assuming you make changes (to scripts and/or source code) then Vim needs
to restart before you measure another profile. Just make sure you use
the same order of commands each time, otherwise results might not be
showing the effect of your changes correctly.

--
Overflow on /dev/null, please empty the bit bucket.

Ernie Rael

Mar 24, 2023, 2:14:00 PM3/24/23
to vim...@googlegroups.com
On 23/03/24 10:45 AM, Bram Moolenaar wrote:
>>> You might want to store the hash number with the function to avoid the
>>> overhead of computing it.
>> This still needs to be done. To make sure I understand,
>> this is saving `hash_hash(name)` in ufunc_T and has_profiling has an
>> additional argument which is the hash. And if this argument is zero
>> then the hash would be computed in has_profiling.
> I haven't looked at the PR yet, but I would think that as soon as you
> have the function pointer, you can use the field that has the hash
> value. It is based on the function name, which doesn't change. Thus it
> needs to be computed only once.
Right, that's what I meant. After thinking about it a bit more, I went ahead
and implemented something, and pushed it last night, as you'll see. Caching
the hash is an easily detected improvement.

-ernie