Fixed crash in garbage collector by removing recursion

233 views
Skip to first unread message

Benjamin Fritz

unread,
Jun 22, 2014, 4:29:32 PM6/22/14
to vim_dev
The attached patch seems to fix the crash reported here:

https://groups.google.com/d/topic/vim_dev/Nr8Ja4Zjghw/discussion

The fix is simple in concept: any recursive call can be replaced with
an explicit stack to "cheat" your way into an iterative algorithm. So
that is what I did. I kept the calling structure unchanged, but pass
an explicit stack around for hash tables and lists. If the stack is
non-null, setting the reference just adds an item to the stack instead
of actually dropping into the HT or list to set its children. Setting
the children of a list or HT will continue processing the stack until
the stack is empty rather than only processing one item.

Sourcing the attached "crashtest.vim" before the patch crashes Vim
every time. After the patch, Vim is busy for nearly a minute, but then
recovers, and does not crash.

Memory use seems to be handled correctly. Looking at the memory column
in Windows Vista's Task Manager, I sourced the script, then did
":unlet dict list" to free up the memory created in the script that
still has a reference. Doing this repeatedly gave me the following
memory use:

After ":unlet" After ":source"
29500 174100
16016 173832
29436 175872
20748 174076
32892 180872
24568 173656
31772 180896
27168 175796
33008 182448
27244 175732
33144 181944
28864 177812
34500 182464
28516 176716

As you can see, Vim's memory footprint seems to fluctuate a little,
possibly due to fragmentation or something, but does not continue
definitely growing. I would appreciate if someone tested with Valgrind
or something to make sure.

There are problems remaining.

First, if I don't use :unlet dict list, but instead source the script
again and allow the previous values to be garbage collected, Vim stays
busy for at least an hour. I don't know if it ever finishes because I
killed the process. I used gdb on MinGW to see that Vim DOES get
through the "set reference" calls in the garbage collector which my
patch fixed; so I expect Vim is busy with the recursive calls to
actually free the garbage-collected memory. These could probably be
fixed in the same way I fixed the reference setting recursion. I think
this deserves a separate patch.

Secondly, the explicit stacks rely on malloc'ing more memory during
garbage collection, to create the stack entries. If this malloc fails,
I have simply abandoned the current garbage collection run without
doing anything. Should this throw a user-visible error? If it happens
too often Vim may run out of memory, it would be good to give the user
an opportunity to save their work and restart Vim gracefully.

Related to this, I wasn't sure whether I needed to reset any of these
values when aborting the garbage collection:

want_garbage_collect = FALSE;
may_garbage_collect = FALSE;
garbage_collect_at_exit = FALSE;

Do any of these need to be set TRUE to allow Vim another try at
garbage collection later?

Finally, setting references in the LUA interface doesn't currently
allow aborting for failure using this patch. I could not figure out,
how to get a return value from lua_call. Can someone familiar with the
LUA interface code please help with this?
crashtest.vim
iter_GC_set_ref.patch

Ben Fritz

unread,
Jun 23, 2014, 9:27:16 AM6/23/14
to vim...@googlegroups.com
On Sunday, June 22, 2014 3:29:32 PM UTC-5, Ben Fritz wrote:
> The attached patch seems to fix the crash reported here:
>
> https://groups.google.com/d/topic/vim_dev/Nr8Ja4Zjghw/discussion
>

I missed a couple files in the first patch; updated patch attached.

I still need help with the LUA interface.

>
> First, if I don't use :unlet dict list, but instead source the script
> again and allow the previous values to be garbage collected, Vim stays
> busy for at least an hour. I don't know if it ever finishes because I
> killed the process. I used gdb on MinGW to see that Vim DOES get
> through the "set reference" calls in the garbage collector which my
> patch fixed; so I expect Vim is busy with the recursive calls to
> actually free the garbage-collected memory. These could probably be
> fixed in the same way I fixed the reference setting recursion. I think
> this deserves a separate patch.
>

Unfortunately I tried the same approach with the list_free and dict_free chains and there did not seem to be a usable impact on performance. I let it run for half an hour and Vim was still busy...but it did finally finish sometime after I went to bed, so I guess there is that.

iter_GC_set_ref.patch

Bram Moolenaar

unread,
Jun 23, 2014, 3:11:44 PM6/23/14
to Benjamin Fritz, vim_dev
Thanks for making this patch!

Did you run the tests under valgrind? That's a very good way to check
for any memory access problems and leaks. Instructions are in
src/testdir/Makefile. There are a few false positives, compare to a Vim
without your patch.

--
[clop clop]
GUARD #1: Halt! Who goes there?
ARTHUR: It is I, Arthur, son of Uther Pendragon, from the castle of
Camelot. King of the Britons, defeator of the Saxons, sovereign of
all England!
GUARD #1: Pull the other one!
The Quest for the Holy Grail (Monty Python)

/// Bram Moolenaar -- Br...@Moolenaar.net -- http://www.Moolenaar.net \\\
/// sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
\\\ an exciting new programming language -- http://www.Zimbu.org ///
\\\ help me help AIDS victims -- http://ICCF-Holland.org ///

Ben Fritz

unread,
Jun 24, 2014, 12:43:34 AM6/24/14
to vim...@googlegroups.com, fritzo...@gmail.com

On Monday, June 23, 2014 2:11:44 PM UTC-5, Bram Moolenaar wrote:
> Ben Fritz wrote:
>
> > The attached patch seems to fix the crash reported here:
> >
> > https://groups.google.com/d/topic/vim_dev/Nr8Ja4Zjghw/discussion
> >
> >
>
>
> Thanks for making this patch!
>
> Did you run the tests under valgrind? That's a very good way to check
> for any memory access problems and leaks. Instructions are in
> src/testdir/Makefile. There are a few false positives, compare to a Vim
> without your patch.
>

You're welcome! I'm seeing crashes from time to time at work when Vim
isn't really actively doing much of anything, so I figured this stood
some chance of being the cause.

I have not checked in valgrind, because I've never used it before, and I
developed the patch on Windows. I do have a Ubuntu dual-boot, I'll try
logging in and seeing how well I can follow the instructions in the
makefile later this week. Thanks for pointing them out!

Do you have input on the 3 questions I had about aborting the collection
when malloc fails?

1. Does there need to be any user notification of the aborted garbage
collection?
2. Do I need to set any of the garbage collect flags when aborting
collection due to failure in the setup?
3. How to get a return value from LUA interface, to allow aborting if it
fails?

Bram Moolenaar

unread,
Jun 24, 2014, 2:05:58 PM6/24/14
to Ben Fritz, vim...@googlegroups.com

Ben Fritz wrote:

> > > The attached patch seems to fix the crash reported here:
> > >
> > > https://groups.google.com/d/topic/vim_dev/Nr8Ja4Zjghw/discussion
> > >
> > >
> >
> >
> > Thanks for making this patch!
> >
> > Did you run the tests under valgrind? That's a very good way to check
> > for any memory access problems and leaks. Instructions are in
> > src/testdir/Makefile. There are a few false positives, compare to a Vim
> > without your patch.
> >
>
> You're welcome! I'm seeing crashes from time to time at work when Vim
> isn't really actively doing much of anything, so I figured this stood
> some chance of being the cause.
>
> I have not checked in valgrind, because I've never used it before, and I
> developed the patch on Windows. I do have a Ubuntu dual-boot, I'll try
> logging in and seeing how well I can follow the instructions in the
> makefile later this week. Thanks for pointing them out!
>
> Do you have input on the 3 questions I had about aborting the collection
> when malloc fails?
>
> 1. Does there need to be any user notification of the aborted garbage
> collection?

I think it would be useful to do this when 'verbose' is non-zero.
Or perhaps when 'debug' contains "msg". In case a user wonders if he is
suffering from aborted garbage collection he can use the setting to find
out if that is the case.

> 2. Do I need to set any of the garbage collect flags when aborting
> collection due to failure in the setup?

Not sure. We want to avoid that the GC runs over and over, so perhaps
do reset the flag that GC is needed.

> 3. How to get a return value from LUA interface, to allow aborting if it
> fails?

I hope someone else answers that.

--
GUARD #1: Where'd you get the coconut?
ARTHUR: We found them.
GUARD #1: Found them? In Mercea? The coconut's tropical!
ARTHUR: What do you mean?
GUARD #1: Well, this is a temperate zone.

Benjamin Fritz

unread,
Jul 7, 2014, 1:38:06 AM7/7/14
to vim_dev
On Mon, Jun 23, 2014 at 2:11 PM, Bram Moolenaar <Br...@moolenaar.net> wrote:
>
> Ben Fritz wrote:
>
>>
>> Finally, setting references in the LUA interface doesn't currently
>> allow aborting for failure using this patch. I could not figure out,
>> how to get a return value from lua_call. Can someone familiar with the
>> LUA interface code please help with this?
>

Still an issue. I think maybe I need to try a new thread.

>
> Thanks for making this patch!
>
> Did you run the tests under valgrind? That's a very good way to check
> for any memory access problems and leaks. Instructions are in
> src/testdir/Makefile. There are a few false positives, compare to a Vim
> without your patch.
>

I ran the tests ("make test", that is) under valgrind. A *few* false
positives?! Almost every test had at least one "definite leak" in the
output file.

I'm not sure I did it right, I did uncomment this line in the makefile:

+LEAK_CFLAGS = -DEXITFREE

and also the valgrind line from the test makefile.

Then I did "make clean", "make", "make test"

I copied off the test directory from before the patch and compared to
after the patch. For most test cases the only differences are in PID,
total heap size, and some address shifting.

For Test 16, there are plenty of differences in the "possible leak"
stuff. Before the test valgrind reports:

possibly lost: 1,210,076 bytes in 7,894 blocks

After the patch it becomes:

possibly lost: 1,224,143 bytes in 8,193 blocks

I'm really hoping that is a bunch of false positives; nothing looked
especially interesting but obviously I didn't take the time to examine
each line in detail. Test 16 is for the "secure" option which should
have nothing to do with the garbage collection changes, and most of
the supposed problems seem to have to do with the GUI startup.

I also used the makefile as a template to run valgrind on Vim while
running the crashtest.vim script that this patch fixes (after editing
counts to let it run in valgrind in faster than 2 years). The result
is exactly the same as every test except for test 16, except the heap
max size is a whole lot bigger.

Ben Fritz

unread,
Jan 22, 2015, 12:19:59 AM1/22/15
to vim...@googlegroups.com
On Sunday, June 22, 2014 at 3:29:32 PM UTC-5, Ben Fritz wrote:
> The attached patch seems to fix the crash reported here:
>
> https://groups.google.com/d/topic/vim_dev/Nr8Ja4Zjghw/discussion
>
> The fix is simple in concept: any recursive call can be replaced with
> an explicit stack to "cheat" your way into an iterative algorithm.

Updated patch attached. The updates allow aborting if the LUA set_ref
fails, and also shows a message if aborting when 'verbose' is greater than
zero.

I also updated the test file. I *think* this tests the LUA and Python
garbage collection. I grabbed the code snippet from Issue 267
(https://code.google.com/p/vim/issues/detail?id=267) to also test that
issue. Vim doesn't crash with the patch on that code; but I can't get it
to crash without my patch, either.

I think the patch is ready now, unless you need another Valgrind run with
the updates.

I did test again in Windows to make sure task manager doesn't show an
obvious increase in memory on iterations of the test.
iterative_garbagecollect_setref.patch
crashtest.vim
Reply all
Reply to author
Forward
0 new messages