help debugging a GC problem

Joel Davis

Nov 22, 2015, 3:54:36 AM
to Haxe

Hey, I need some help debugging a hxcpp GC crash. I am pretty sure it's my own fault, as I'm using a not-quite-standard build setup, but I'm kind of running out of things to try.


This is using luxe on Apple tvOS, and I'm semi-manually adding all the hxcpp runtime and generated cpp files to my Xcode project, but as far as I can tell my build flags and defines match a regular hxcpp build. If I build for desktop (using flow, not my project), everything works OK, so I don't think it's a hxcpp problem; I'm just not sure how to figure out where my build differs.


What happens: intermittently (right now, often) I get a crash (bad access) in Array.h/cpp. It usually happens in ArrayBase::Realloc, at the point where it's trying to call GetElementSize, but it happens in a few other places too (often obj->__Mark). Looking at the stack trace and the debugger, the vtable looks messed up: it's trying to call a virtual function, but the address in the vtable is 0x0101010101010101, which I guess is a scribble value that something writes to mark deleted memory (?). So it looks like some hx::Object is getting deleted or reset while it's still being used. Or something?


I'm not too familiar with the inner workings of the GC itself. I tried turning on some of the debug defines like SHOW_MEM_EVENTS, but nothing seems obviously wrong. If I put a 'return' at the top of Collect in Immix.cpp, so that it never frees any memory, it runs fine (until it eventually runs out of memory, of course).


Any ideas or tips about how to debug this?


thanks!
Joel

Joel Davis

Nov 22, 2015, 8:44:59 PM
to Haxe
Still stuck on this... So far the only clue I've been able to find is in MarkContext::pushObj, where I added a check for whether the object being pushed is corrupted:

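        // inObject is the object being pushed onto the mark stack.
        void *vtbl = ((void**)inObject)[0];   // first word of the object: its vtable pointer
        void *vt0 = ((void**)vtbl)[0];        // first entry of that vtable
        if (vt0 == (void*)0x0101010101010101) {
            // flag objects whose first vtable entry holds the 0x01 scribble value
            printf("pushObj::BAD OBJ: %p %p %p\n", inObject, vtbl, vt0 );
        }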


It never fires during the first Collect(), but it finds a "bad" object during every subsequent Collect() (even when it doesn't crash right away).

When I build the desktop target, this never prints.

But I still don't know when the object gets clobbered. I can't figure out what's writing that 0x0101 value.

sigh.
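A slightly more defensive version of the pushObj check could test the object's own first word before following it: if the vtable pointer itself has been overwritten with the fill pattern, reading through it faults instead of printing. This is a standalone sketch, assuming the first word of an hx::Object is its vtable pointer and that freed memory is filled with 0x01 bytes as seen in this thread; `LooksClobbered` and `kFillWord` are hypothetical names, not hxcpp API.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Fill pattern seen in this thread for reused GC memory. On a 32-bit build the
// cast keeps the low word, 0x01010101.
constexpr std::uintptr_t kFillWord = static_cast<std::uintptr_t>(0x0101010101010101ULL);

// Hypothetical helper (not hxcpp API). Returns true if the object at `obj`
// looks like it has been overwritten with the fill pattern.
bool LooksClobbered(const void* obj) {
    std::uintptr_t vtbl;
    std::memcpy(&vtbl, obj, sizeof vtbl);  // first word of the object: normally its vtable pointer
    if (vtbl == kFillWord) {
        return true;  // the vtable pointer itself was scribbled; do not follow it
    }
    std::uintptr_t vt0;
    std::memcpy(&vt0, reinterpret_cast<const void*>(vtbl), sizeof vt0);  // first vtable entry
    return vt0 == kFillWord;  // same condition as the original pushObj check
}
```

This only guards against the fill pattern itself; any other wild vtable pointer will still fault when followed.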

Hugh

Nov 23, 2015, 12:00:08 AM
to Haxe
Hi,
Debugging this can be tricky.
It looks like the GC system thinks it does not need some object, so it reuses the memory (overwriting it with 0x010101010101), but somewhere the object is still being used.
So the bug here is probably that the object that ultimately references the corrupted object is not being marked in the previous marking phase. It is also possible that there is an internal cross-reference error in the GC code, but that is less likely.
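To make that failure mode concrete, here is a hypothetical toy mark-and-sweep collector (not hxcpp's Immix code): anything not reachable from a registered root is freed, scribbled with 0x01 bytes, and handed out again by the next allocation. `ToyHeap`, `Cell` and the index-based references are invented for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical toy collector, not hxcpp's Immix code. Cells are addressed by
// index; each can reference at most one other cell.
struct Cell {
    bool inUse = false;
    bool marked = false;
    int child = -1;        // index of a referenced cell, or -1
    uint64_t payload = 0;  // stands in for the object's data / vtable pointer
};

struct ToyHeap {
    std::vector<Cell> cells;
    std::vector<int> roots;  // the only references the collector knows about

    explicit ToyHeap(int n) : cells(n) {}

    // Returns the first free cell, reusing swept memory like a real allocator.
    int Alloc(uint64_t value) {
        for (int i = 0; i < static_cast<int>(cells.size()); ++i) {
            if (!cells[i].inUse) {
                cells[i] = Cell();
                cells[i].inUse = true;
                cells[i].payload = value;
                return i;
            }
        }
        return -1;  // out of memory
    }

    void Collect() {
        for (Cell& c : cells) c.marked = false;
        // Mark phase: follow references from every registered root.
        for (int r : roots) {
            for (int i = r; i >= 0 && !cells[i].marked; i = cells[i].child) {
                assert(cells[i].inUse);
                cells[i].marked = true;
            }
        }
        // Sweep phase: anything unmarked is freed and scribbled with 0x01 bytes.
        for (Cell& c : cells) {
            if (c.inUse && !c.marked) {
                c.inUse = false;
                c.child = -1;
                std::memset(&c.payload, 0x01, sizeof c.payload);
            }
        }
    }
};
```

In this toy, a cell that only native code knows about reads back as 0x0101010101010101 after `Collect()`, and the next `Alloc` reuses it. The fix in the toy matches the advice above: make sure the owner is reachable from a root so it gets marked.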

If you can, attach a native debugger at the point where you detect the bad object and look at the call stack. The object that owns this object is probably the one you need to look at; it could be a MARK_MEMBER or the marking of an array element.

One possibility, from what you say about your own project files, is that maybe your startup code is different? When calling into Haxe from external code, you have to take care that you have called SetTopOfStack correctly. Also, if you have saved callbacks in your own C++ code, you need to store them in GC roots.
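For callbacks kept in native code, a common pattern is to register the address of the slot holding the reference as a root for as long as the native holder lives, and unregister it when the holder goes away. This sketch models that with a hypothetical `RootRegistry` and an RAII `SavedCallback`; hxcpp has its own root-registration functions (see hx/GC.h in your hxcpp version), which this code does not call.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical root registry: a collector would scan every slot listed here.
class RootRegistry {
public:
    void Add(void** slot) { slots_.push_back(slot); }
    void Remove(void** slot) {
        slots_.erase(std::remove(slots_.begin(), slots_.end(), slot), slots_.end());
    }
    bool Contains(void** slot) const {
        return std::find(slots_.begin(), slots_.end(), slot) != slots_.end();
    }
    std::size_t Size() const { return slots_.size(); }

private:
    std::vector<void**> slots_;
};

// Keeps a saved callback (or any managed pointer held by native code) registered
// as a root for exactly as long as this native holder is alive.
class SavedCallback {
public:
    SavedCallback(RootRegistry& reg, void* fn) : reg_(reg), fn_(fn) { reg_.Add(&fn_); }
    ~SavedCallback() { reg_.Remove(&fn_); }
    // A copy would own a slot that was never registered, so copying is disabled.
    SavedCallback(const SavedCallback&) = delete;
    SavedCallback& operator=(const SavedCallback&) = delete;

    void* Get() const { return fn_; }
    void** Slot() { return &fn_; }

private:
    RootRegistry& reg_;
    void* fn_;
};
```

Registering the slot's address rather than its current value means the collector always sees whatever the slot holds at collection time.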

When debugging the GC, you can set "MAX_MARK_THREADS = 1" at the top of Immix.cpp, which uses a single marking thread and can make things less complicated.
It will also make things more deterministic, and once you get the crash in the same spot every time, you can work backwards from it more easily.

Similarly, in MarkObjectAllocUnchecked there is a line "inPtr->__Mark(__inCtx);" that is commented out; the code uses a condition on "block" instead. If you call inPtr->__Mark(__inCtx) unconditionally (recursively) instead, you will get a better call stack showing the ownership of the dodgy object.
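The recursive call helps because each nested __Mark leaves a frame on the native stack, so when the bad object is reached the debugger shows the chain of owners leading to it; a marker that pushes objects onto a work list for later (as pushObj presumably does) loses that chain. This toy (a hypothetical `Node` graph, with `bad` standing in for a clobbered vtable) records the same chain explicitly as `badPath`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical object graph; `bad` stands in for "vtable holds the fill pattern".
struct Node {
    std::string name;
    std::vector<int> children;
    bool bad = false;
};

// Depth-first recursive mark. `path` plays the role of the native call stack you
// get from recursive __Mark calls: it lists the owners leading to the current node.
void MarkRecursive(const std::vector<Node>& nodes, int i, std::vector<bool>& marked,
                   std::vector<int>& path, std::vector<int>& badPath) {
    if (marked[i]) return;
    marked[i] = true;
    path.push_back(i);
    if (nodes[i].bad && badPath.empty()) {
        badPath = path;  // first bad object found: keep its owner chain
    }
    for (int c : nodes[i].children) MarkRecursive(nodes, c, marked, path, badPath);
    path.pop_back();
}
```

The second-to-last entry of `badPath` is the direct owner, which is the object Hugh suggests looking at first.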

Hugh

Joel Davis

Nov 26, 2015, 3:01:05 AM
to Haxe
Thanks, that was very helpful. I was setting the top of stack wrong; once I fixed that and set MAX_MARK_THREADS to 1, things got a lot better. I'm still getting the crash, but at least now it's predictable and (usually) happens to the same object.

Now that the crash is mostly repeatable, I still can't quite figure it out. It happens when accessing an Array<String>. Since I know when that object gets created, I keep a pointer to it: it does get marked during the collect, but then it gets overwritten later (before the next collect).
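One way to narrow down when the Array<String> gets overwritten is to snapshot its first word (the vtable pointer, which should not change for a live object) right after it is created, then compare at checkpoints such as the start and end of each update/render call. `ByteWatch` is a hypothetical helper, not part of hxcpp; a hardware watchpoint on the same address in Xcode's debugger does the same job without code changes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical debugging aid: remembers the first n bytes at p and reports
// whether they have changed since the snapshot was taken.
class ByteWatch {
public:
    ByteWatch(const void* p, std::size_t n)
        : ptr_(static_cast<const unsigned char*>(p)), snapshot_(ptr_, ptr_ + n) {}

    // True if the watched bytes differ from the snapshot.
    bool Changed() const { return std::memcmp(ptr_, snapshot_.data(), snapshot_.size()) != 0; }

    // Take a fresh snapshot (e.g. after a legitimate change).
    void Rearm() { std::memcpy(snapshot_.data(), ptr_, snapshot_.size()); }

private:
    const unsigned char* ptr_;
    std::vector<unsigned char> snapshot_;
};
```

Watching only sizeof(void*) bytes avoids false alarms from fields that change legitimately; the first checkpoint where Changed() turns true brackets the code that clobbers the object.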

Maybe I don't quite understand how to mix hxcpp and native code. Can hxcpp and native C++ coexist and share the same stack? What I'm trying to do is have an iOS/tvOS app that initializes Haxe at startup, creates all my windows/views through UIKit as usual, and then just calls into Haxe code from my update/render, or from specific callbacks like when a button is hit. That should work, right?
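As I understand it, sharing the stack works because the collector scans the native stack conservatively, but only between the current stack pointer and the registered top of stack, so every frame that can hold a Haxe reference has to lie inside that range. This is a hypothetical model, not hxcpp code: in `ScanStack`, `stack[0]` is the innermost frame, higher indices are older callers, and `topIndex` stands in for the registered top.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical model of a conservative stack scan.
// stack[0] is the innermost (current) frame; higher indices are older callers.
// Only words in [0, topIndex) are scanned, so topIndex plays the role of the
// registered top of stack.
std::set<std::uintptr_t> ScanStack(const std::vector<std::uintptr_t>& stack,
                                   std::size_t topIndex,
                                   const std::set<std::uintptr_t>& liveObjects) {
    std::set<std::uintptr_t> found;
    for (std::size_t i = 0; i < topIndex && i < stack.size(); ++i) {
        // Any word that equals a known object address is treated as a reference.
        if (liveObjects.count(stack[i]) != 0) found.insert(stack[i]);
    }
    return found;
}
```

If the registered top is too low, a reference sitting in an outer caller's frame is never seen, so that object can be freed while the caller still uses it.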

thanks,
joel

Hugh

Nov 26, 2015, 11:27:38 PM
to Haxe
Yes, that should work.  It is essentially what nme and openfl do.

Is it possible you have pushed foreign strings into hxcpp, using String(ptr,len)?  Or somehow otherwise mixed native and hxcpp allocated objects?
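Hugh's question is about ownership. If String(ptr,len) keeps the native pointer instead of copying the characters (my reading of the question; check String.h in your hxcpp version), the string silently changes or dangles when native code reuses or frees that buffer. This standard-C++ sketch shows the same difference between a borrowed view and an owned copy; `NativeScratch`, `Borrow` and `Own` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <string_view>

// Hypothetical stand-in for a native scratch buffer that gets reused.
struct NativeScratch {
    char buf[32];
};

// Borrowed: keeps only the pointer and length, so it sees later writes to buf
// (and dangles if buf is freed).
std::string_view Borrow(const NativeScratch& s, std::size_t len) {
    return std::string_view(s.buf, len);
}

// Owned: copies the characters into storage the std::string manages itself.
std::string Own(const NativeScratch& s, std::size_t len) {
    return std::string(s.buf, len);
}
```

Copying at the boundary costs an allocation, but the result no longer depends on the lifetime of native memory.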

If you have a repo link, I can have a quick look at the code to see if anything sticks out.

Hugh