Cry for help regarding with the WDM gem issue.

211 views
Skip to first unread message

Jarmo Pertman

unread,
Jan 19, 2013, 11:11:13 AM1/19/13
to theco...@googlegroups.com
Hello!

I'm writing here because i know that here is the highest concentration of developers who are using Ruby on Windows.

This means that i have my highest hopes that this is the place i could get some help regarding with an issue of Windows-specific wdm [1] gem, which is used by the popular gem guard [2] on Windows for detecting file system changes.

Again, i'm sorry that it isn't CodeShop related topic per se.

The problem is that the gem's C extension seems to leak memory and exits with an error message "[FATAL] failed to allocate memory".

I've done fair amount of investigation how and where it might happen and have found the one suspicious line, which might be the culprit.

However, that line calls ruby-c rb_thread_blocking_region method and i've ran out of ideas as to why it leaks memory.

I have created a minimal code snippet to reproduce the problem [3]. I have also written about my investigation of the problem [4].

The problem is reproducible on my Win8 64bit with Ruby 1.9.3p327 and p374.

Thank you for your possible help regarding this issue.

Jarmo Pertman
-----
IT does really matter - http://itreallymatters.net

Jon

unread,
Jan 19, 2013, 12:15:12 PM1/19/13
to theco...@googlegroups.com
Quick test and haven't looked into the C yet.

I can repro using your sample code on win7 32bit (4GB) with ruby p374 (built with mingw-32-4.6.2) and wdm-0.0.3.gem

C:\Users\Jon\Documents\RubyDev\sandbox>ruby wdm_crash.rb
1: C:/Users/Jon/Documents/RubyDev/sandbox/foobar
...
120: C:/Users/Jon/Documents/RubyDev/sandbox/foobar
[FATAL] failed to allocate memory


Strangely, when I use '-v' it consistently runs longer before failing:

C:\Users\Jon\Documents\RubyDev\sandbox>ruby -v wdm_crash.rb
ruby 1.9.3p374 (2013-01-15 revision 38833) [i386-mingw32]
1: C:/Users/Jon/Documents/RubyDev/sandbox/foobar
...
581: C:/Users/Jon/Documents/RubyDev/sandbox/foobar

Jarmo Pertman

unread,
Jan 19, 2013, 2:25:02 PM1/19/13
to theco...@googlegroups.com
Reproducilibity is always a good thing. Thank you, Jon!

Jarmo

Luis Lavena

unread,
Jan 20, 2013, 10:16:11 AM1/20/13
to theco...@googlegroups.com
On Sat, Jan 19, 2013 at 1:11 PM, Jarmo Pertman <jar...@gmail.com> wrote:
> Hello!
>
> I'm writing here because i know that here is the highest concentration of
> developers who are using Ruby on Windows.
>

Actually rubyinstaller list might have a higher number of Windows
developers, just saying ;-)

> This means that i have my highest hopes that this is the place i could get
> some help regarding with an issue of Windows-specific wdm [1] gem, which is
> used by the popular gem guard [2] on Windows for detecting file system
> changes.
>
> Again, i'm sorry that it isn't CodeShop related topic per se.
>
> The problem is that the gem's C extension seems to leak memory and exits
> with an error message "[FATAL] failed to allocate memory".
>
> I've done fair amount of investigation how and where it might happen and
> have found the one suspicious line, which might be the culprit.
>
> However, that line calls ruby-c rb_thread_blocking_region method and i've
> ran out of ideas as to why it leaks memory.
>

Have you run the code that generates this exception with GDB? Do you
have a backtrace when that exception was triggered to determine who
called rb_thread_blocking_region?

Did you compile the gem or was a pre-compiled binary?

> I have created a minimal code snippet to reproduce the problem [3]. I have
> also written about my investigation of the problem [4].
>
> The problem is reproducible on my Win8 64bit with Ruby 1.9.3p327 and p374.
>

I'm going to take an attempt and see if I can figure out, but not
making promises.

Keep you posted.
--
Luis Lavena
AREA 17
-
Perfection in design is achieved not when there is nothing more to add,
but rather when there is nothing more to take away.
Antoine de Saint-Exupéry

Jarmo Pertman

unread,
Jan 20, 2013, 10:36:27 AM1/20/13
to theco...@googlegroups.com
On Sunday, January 20, 2013 5:16:11 PM UTC+2, Luis Lavena wrote:
Have you run the code that generates this exception with GDB? Do you 
have a backtrace when that exception was triggered to determine who
called rb_thread_blocking_region?

Did you compile the gem or was a pre-compiled binary?

I have not run it with GDB - last time i tried to get it running on my machine i was not successful :(

However, as i wrote on the GitHub issue, i did localize the problem by commenting out C-code and then recompiling. In that way i was able to find out the offending code which calls rb_thread_blocking_region and causes memory to leak.
 

> I have created a minimal code snippet to reproduce the problem [3]. I have
> also written about my investigation of the problem [4].
>
> The problem is reproducible on my Win8 64bit with Ruby 1.9.3p327 and p374.
>

I'm going to take an attempt and see if I can figure out, but not
making promises.

Keep you posted.

Thank You!

Jon

unread,
Jan 20, 2013, 10:37:44 AM1/20/13
to theco...@googlegroups.com
I'll try to find time to fire up GDB late today/tommorrow when I can look at this again, but it appears your sample code is causing ruby to fail in one of the following (ruby_1_9_3 branch) locations:

gc.c:572
gc.c:584
gc.c:739
thread_pthread.c
vm.c:2224

Only the `gc.c` spots look relevant, all deal with threads, and two (lines 572 and 739) deal with not having a ruby thread.

Also, changing the `monitor.watch` callback in your sample code to

monitor.watch(".") do |change|
puts "[#{change.type.to_s.upcase}] #{counter += 1}: #{change.path}"
end

allows it to run ~3x longer on my Win7 32bit before failing.

None of this is really helpful yet, but maybe one of the others will chime in especially since it appears something in wdm's thread usage is confusing ruby.

Jon

Hiroshi Shirosaki

unread,
Jan 20, 2013, 9:04:56 PM1/20/13
to theco...@googlegroups.com
On Mon, Jan 21, 2013 at 12:37 AM, Jon <jon.f...@gmail.com> wrote:
>> [FATAL] failed to allocate memory
>
> I'll try to find time to fire up GDB late today/tommorrow when I can look at this again, but it appears your sample code is causing ruby to fail in one of the following (ruby_1_9_3 branch) locations:
>
> gc.c:572
> gc.c:584
> gc.c:739
> thread_pthread.c
> vm.c:2224
>
> Only the `gc.c` spots look relevant, all deal with threads, and two (lines 572 and 739) deal with not having a ruby thread.

Hello,

https://github.com/Maher4Ever/wdm/issues/5#issuecomment-12395887

When adding `GC.disable` to the test script, "[FATAL] failed to
allocate memory" doesn't occur on my test.
When adding `GC.stress = true`, "[FATAL] failed to allocate memory"
occurs immediately.

So this issue would be related to gc and rb_thread_blocking_region
(without gvl).

If non-ruby thread without gvl triggers gc, the above error will be raised.
https://github.com/ruby/ruby/blob/ruby_1_9_3/gc.c#L739

After some investigation, I found wdm_queue_item_data_new() uses
ALLOC(= xmalloc) instead of WDM_ALLOC.
xmalloc() is possible to trigger gc. WDM_ALLOC uses plain malloc().

wdm_queue_item_data_new() is called at the following line.
https://github.com/Maher4Ever/wdm/blob/master/ext/wdm/rb_monitor.c#L274

This patch seems to work as expected on my Windows XP SP3. Please try this.


diff --git a/ext/wdm/queue.c b/ext/wdm/queue.c
index 11fe554..d7e55a9 100644
--- a/ext/wdm/queue.c
+++ b/ext/wdm/queue.c
@@ -35,7 +35,7 @@ void wdm_queue_item_error_free(WDM_PQueueItemError error) {
WDM_PQueueItemData wdm_queue_item_data_new() {
WDM_PQueueItemData data;

- data = ALLOC(WDM_QueueItemData);
+ data = WDM_ALLOC(WDM_QueueItemData);
data->user_data = NULL;

ZeroMemory(&data->buffer, WDM_BUFFER_SIZE);


--
Hiroshi Shirosaki

Jon

unread,
Jan 21, 2013, 12:32:39 AM1/21/13
to theco...@googlegroups.com
Just got back in, fired up email and see that Hiroshi-san delivered the awesome good-fu once again! :) :) :)


Jarmo: you likely already know the ruby memory mgmt code Hiroshi described, but if not, here's a snippet walkthrough for future reference. Hiroshi, please correct me if I've mislead.

`ALLOC` is a macro defined in `include/ruby/ruby.h:1035` as

#define ALLOC(type) ((type*)xmalloc(sizeof(type)))

and `xmalloc` is aliased in `include/ruby/defines.h:64` as

#define xmalloc ruby_xmalloc

with `ruby_xmalloc` defined in `gc.c:856` as

void *
ruby_xmalloc(size_t size)
{
return vm_xmalloc(&rb_objspace, size);
}

leading to a `vm_xmalloc` definition in `gc.c:791` of

static void *
vm_xmalloc(rb_objspace_t *objspace, size_t size)
{
void *mem;

size = vm_malloc_prepare(objspace, size);
TRY_WITH_GC(mem = malloc(size));
return vm_malloc_fixup(objspace, mem, size);
}

This does some pre and post-magic around a normal `malloc`. The `vm_malloc_prepare` pre-magic in `gc.c:748` will do a GC in the following case (Hiroshi forced the immediate fail with `GC.stress = true` in your script)

if ((ruby_gc_stress && !ruby_disable_gc_stress) ||
(malloc_increase+size) > malloc_limit) {
garbage_collect_with_gvl(objspace);
}

Finally, `garbage_collect_with_gvl` at `gc.c:727`

static int
garbage_collect_with_gvl(rb_objspace_t *objspace)
{
if (dont_gc) return TRUE;
if (ruby_thread_has_gvl_p()) {
return garbage_collect(objspace);
}
else {
if (ruby_native_thread_p()) {
return (int)(VALUE)rb_thread_call_with_gvl(gc_with_gvl, (void *)objspace);
}
else {
/* no ruby thread */
fprintf(stderr, "[FATAL] failed to allocate memory\n");
exit(EXIT_FAILURE);
}
}
}

and your sample code got the last `else` clause.

However, as Hiroshi shows, if WDM had used it's internal `WDM_ALLOC` defined in `ext/wdm/memory.h:23` as

`#define WDM_ALLOC(type) ((type*)wdm_memory_malloc(sizeof(type)))

using `wdm_memory_malloc` in `ext/wdm/memory.c:6`

void *
wdm_memory_malloc (size_t size)
{
void *memory = malloc(size);

if ( memory == NULL ) {
rb_fatal("failed to allocate memory");
}

return memory;
}

instead of ruby's GC-using `ALLOC/xmalloc`, things should work since (from what I've seen so far) WDM does it's own memory mgt via `malloc/free` and shouldn't mix with ruby's memory mgmt.

I'm trying Hiroshi's patch tomorrow on Win7 32bit with your sample code. It will be interesting to hear if it solves the WDM bug.

My money's on Hiroshi :)

Jon

---
Fail fast. Fail often. Fail publicly. Learn. Adapt. Repeat.
http://thecodeshop.github.com | http://jonforums.github.com/
twitter: @jonforums

Jarmo Pertman

unread,
Jan 21, 2013, 12:28:44 PM1/21/13
to theco...@googlegroups.com
You guys are just awesome :)

I can confirm that this change fixes the leak on my Win8 64bit machine too!

Thank you Hiroshi for detecting and fixing the problem. And Jon, thank you for such a detailed write-up.

I've created a pull request [1] for wdm and mentioned you guys there too :)


Jarmo Pertman
-----
IT does really matter - http://itreallymatters.net
Reply all
Reply to author
Forward
0 new messages