CriticalCGError at PauseForCollect

133 views
Skip to first unread message

Margus Niitsoo

unread,
Oct 28, 2016, 5:50:05 AM10/28/16
to Haxe
After fixing the threading issues with AutoHaxe and wrapping work done on non-main threads into gc_enter_blocking/gc_exit_blocking blocks, the iOS bindings mostly work. However, we occasionally still see crash reports ending at:

0 MyApp
CriticalGCError(char const*) + 39484
1 MyApp
CriticalGCError(char const*) + 39484
2 MyApp
LocalAllocator::PauseForCollect() + 46972
3 MyApp
LocalAllocator::CallAlloc(int, unsigned int) + 3556
4 MyApp
alloc_empty_object + 2036

Could anyone with an understanding of HXCPP internals venture a guess as to what may be causing this? 

Best,
Margus

Hugh

unread,
Oct 31, 2016, 12:28:54 AM10/31/16
to Haxe
If you could see the const char *, it might give a hint.
It could be: "Bad Allocation while collecting - from finalizer?"

Could it be that you are doing some operations that either directly or indirectly make GC cffi calls from your finalizer?
Unfortunately, this is not allowed.  You can somehow postpone this operation to be done later - eg, on the next refresh.

If not, more of the stack should also give more of a hint.

Hugh

Margus Niitsoo

unread,
Oct 31, 2016, 3:18:52 AM10/31/16
to Haxe
Full stack trace for that thread is:

0  MyApp                     0x10025eca4 CriticalGCError(char const*) + 39484
1  MyApp                     0x10025eca4 CriticalGCError(char const*) + 39484
2  MyApp                     0x1002609e4 LocalAllocator::PauseForCollect() + 46972
3  MyApp                     0x100263178 LocalAllocator::CallAlloc(int, unsigned int) + 3556
4  MyApp                     0x100255a5c alloc_empty_object + 2036
5  MyApp                     0x1001bead4 nativeextension::native_loadogg(_value*, _value*, _value*) + 4296190676
6  MyApp                     0x1002726ec ExternalPrimitive::__run(Dynamic const&, Dynamic const&, Dynamic const&) + 58628
7  MyApp                     0x10023b9e8 MMSNativeExtension_obj::decodeOggAudio(hx::ObjectPtr<haxe::io::Bytes_obj>, hx::Null<double>, hx::Null<double>, hx::Null<int>, hx::Null<int>) + 4924
8  MyApp                     0x1001cc220 platforms::native::NativeAudioContext_obj::playSBWithFilter(hx::ObjectPtr<haxe::io::Bytes_obj>, double, double, Dynamic, Dynamic) + 3608
9  MyApp                     0x1001cc7f4 platforms::native::NativeAudioContext_obj::playSB(hx::ObjectPtr<haxe::io::Bytes_obj>, double, double, Dynamic) + 5100
10 MyApp                     0x1001ef2cc base::sound::Player_obj::metronomeBar(Array<Dynamic>, int, double, int) + 10140
11 MyApp                     0x1001efbb4 base::sound::Player_obj::metronome(Array<Dynamic>, double, hx::Null<double>) + 12420
12 MyApp                     0x10024c17c __BaseController_objmetronome(hx::Object*, Dynamic const&, Dynamic const&, Dynamic const&) + 9356
13 MyApp                     0x100291b6c hx::CMemberFunction3::__Run(Array<Dynamic> const&) + 2512
14 MyApp                     0x100239e88 Reflect_obj::callMethod(Dynamic, Dynamic, cpp::VirtualArray) + 1628
15 MyApp                     0x100241f28 IOSController_obj::_addExternalCallback(String)::_hx_Closure_0::_hx_run(cpp::VirtualArray) + 2996
16 MyApp                     0x100241ec0 IOSController_obj::_addExternalCallback(String)::_hx_Closure_0::__run(Dynamic const&) + 2892
17 MyApp                     0x100242250 IOSController_obj::_caller(String, String)::_hx_Closure_0::_hx_run() + 3804
18 MyApp                     0x1002421d0 IOSController_obj::_caller(String, String)::_hx_Closure_0::__run() + 3676
19 MyApp                     0x1001e4bf4 haxe::Timer_obj::delay(Dynamic, int)::_hx_Closure_0::__run() + 916
20 MyApp                     0x1001e4a48 haxe::Timer_obj::__construct(int)::_hx_Closure_0::__run() + 488
21 MyApp                     0x1001e57ac haxe::MainLoop_obj::tick() + 1588
22 MyApp                     0x1001e7074 haxe::EntryPoint_obj::processEvents() + 892
23 MyApp                     0x1001e70f8 haxe::EntryPoint_obj::run() + 1024
24 MyApp                     0x1002544c0 __hxcpp_lib_main + 856
25 libdispatch.dylib              0x1826494bc _dispatch_call_block_and_release + 24
26 libdispatch.dylib              0x18264947c _dispatch_client_callout + 16
27 libdispatch.dylib              0x182657914 _dispatch_root_queue_drain + 2140
28 libdispatch.dylib              0x1826570b0 _dispatch_worker_thread3 + 112
29 libsystem_pthread.dylib        0x182861470 _pthread_wqthread + 1092
30 libsystem_pthread.dylib        0x182861020 start_wqthread + 4

Nothing is printed to console on crash, though.

As a side note - I did use to see "Bad allocation while collecting - from finalizer" before, but that disappeared after I used gc_enter/exit_blocking on all calls running on other threads which seemed like a good idea as the error seems to indicate that it does not like GC running when allocating new memory. Was that the correct course of action or should it be handled in a different way?

Best,
Margus

On Monday, October 31, 2016 at 6:28:54 AM UTC+2, Hugh wrote

Hugh

unread,
Nov 1, 2016, 1:43:23 AM11/1/16
to Haxe
This thread looks pretty sensible, unless a routine, such as  native_loadogg , has 'entered a gc free zone' before making a gc call.
It appears a different thread is doing a collect.  That thread should have been waiting for this thread to acknowledge that he was ready for a collection so the next step would be to work out why it this thinks it does not need to wait.  Perhaps there is a race condition or for some reason the event-wait logic is failing.

Margus Niitsoo

unread,
Nov 1, 2016, 8:15:40 AM11/1/16
to Haxe
How should I go about doing this? I am pretty much out of ideas myself, and this is a mission critical bug at this point. 

Also - does it matter if alloc_empty_object is called inside the gc_enter/exit_blocking block or does it have to be outside it? My current understanding is that gc_blocking just blocks GC from doing a collect, but any and all other work can proceed as normal?

Also - supposing another thread had somehow started a GC collect? Would gc_enter_blocking then gracefully wait until it is done (while loop or smth) or will it throw an error?

Best,
Margus

Margus Niitsoo

unread,
Nov 1, 2016, 8:34:36 AM11/1/16
to Haxe
Also -  I cannot think of a way that GC might have been started on any other threads at the same time, as haxe calls from all the other threads are wrapped into gc_enter/exit_blocking blocks as mentioned above.

Am I right in assuming that GC collect can only be started by either
a) direct call, i.e. cpp.vm.Gc.collect()
b) a haxe memory allocation, i.e. alloc_int, alloc_empty_object, etc
c) by calling a haxe function via val_call()

Or are there any other conditions when it may happen?

Just in case it helps, the current code for load_ogg is:

    static value native_loadogg(value contents, value hx_beg, value hx_end)
    {
        AutoGCRoot * data_gc = new AutoGCRoot(contents);

        gc_enter_blocking();
        array_with_ptr * apt = bufferToArrayPtr(&contents);
        //fprintf(stderr, "Decoder got str with len: %li\n", apt->length);

        OggVorbis_File * vf = new OggVorbis_File();
        int retval = ov_open_callbacks(apt, vf, NULL, 0, OV_CALLBACKS_ARRAY);

        vorbis_info* pInfo = ov_info(vf, -1);
        int nsamples = ov_pcm_total(vf,-1);

        double beg = val_float(hx_beg), end = val_float(hx_end);

        // Seek to beginning
        ov_time_seek(vf,beg);
        nsamples -= ov_pcm_tell(vf);

        //fprintf(stdout,"OGG SRATE %li\n",pInfo->rate);

        oggDecoder * od = new oggDecoder;
        od->vf = vf;
        od->apt = apt;
        od->ibuffer = new int16_t[intbuf_len];
        od->nsamples = nsamples;
        od->nplayed = 0;
        od->pb_beg = od->pb_end = 0;

        long slen = long(pInfo->rate * (end-beg));
        if (slen>0 && od->nsamples > slen) od->nsamples = slen;

        gc_exit_blocking();

        value obj = alloc_empty_object();
        alloc_field(obj,val_id("nsamples"),alloc_int(nsamples));
        alloc_field(obj,val_id("nchannels"),alloc_int(pInfo->channels));
        alloc_field(obj,val_id("srate"),alloc_int(pInfo->rate));
        alloc_field(obj,val_id("dec"),alloc_abstract(0,(void*)od));
        alloc_field(obj,val_id("data_gc"),alloc_abstract(0,(void*)data_gc));

        return obj;
    }
    DEFINE_PRIM (native_loadogg, 3);

Daniel Uranga

unread,
Nov 1, 2016, 8:39:06 PM11/1/16
to haxe...@googlegroups.com
Maybe I'm missing something, but where are you freeing "od" ?

Don't know if this will be useful, but you could take a look at
https://github.com/SempaiGames/extension-harfbuzz/blob/master/project/common/openfl-harfbuzz.cpp#L218
(and the rest of that extension in general)

There an array of ints is returned from C++ to Haxe, and afaik its very stable.

Good luck,
Daniel
> --
> To post to this group haxe...@googlegroups.com
> http://groups.google.com/group/haxelang?hl=en
> ---
> You received this message because you are subscribed to the Google Groups
> "Haxe" group.
> For more options, visit https://groups.google.com/d/optout.

Hugh

unread,
Nov 1, 2016, 9:43:22 PM11/1/16
to Haxe
You are right about collections - they typically happen when some thread makes an allocation call, and there is not enough memory to satisfy the request.
val_call will often indirectly make an alloc call, and you may not know exactly what is going on inside, so it is reasonable to assume an alloc could/will happen in there.


But enter/exit blocking is not about about preventing a collect, but about telling the other threads that they can collect if they like, and to not wait for you because you might be "blocking" on some IO operation.

So it is not correct to perform any GC operation (allocation,collect,set finalizer) between the enter/exit calls.

Does this call make any make any gc calls?  bufferToArrayPtr(&contents);  Maybe it should be on the other side of "enter_blocking" call. Passing the address-of is also odd - does this mean the value might change?

The use of autogc root here is also problematic - it will never release the contents.
You either need to "delete" it a the end of your routine, but typically you would use a local variable (not a pointer) to do this automatically for you, like:
AutoGcRoot releaseWhenDone(contents);
But actually, it should not be needed because the stack scanner will find the variable on the stack and automatically preserve it.

You should be able to see what thread triggered the collection using the native debugger. MSVS has a nice "threads" box, then select the callstack, lldb has "thread backtrace all"

Hugh

Margus Niitsoo

unread,
Nov 2, 2016, 5:06:49 AM11/2/16
to Haxe
Thank you, Daniel and Hugh.

The architecture of the function I sent is such that "od" is used to decode ogg in chunks, and is deleted only when all the samples have been read. So there are two more functions: native_oggdecoder that takes the object output by loadogg and reads samples from it into the buffer, and native_destroyogg that clears the data structure when done. So loadogg is meant to leave some objects lying around in memory it can use. And I am using AutoGCRoot to mean "hold on to this memory, even when haxe part does not need it any more".

> But enter/exit blocking is not about about preventing a collect, but about telling the other threads that they can collect if they like, and to not wait for you because you might be "blocking" on some IO operation
Ok, I am confused again, now. What does enter_blocking do? Does it just mean that the values allocated on that thread will not be garbage collected for the time being? If so, I have been using it in a very inappropriate way.
However - without wrapping everything in enter-exit_blocking calls, I used to get a lot of that "finalizer" error - because one thread was trying to allocate while another was in a GC collect cycle. How should I then go about avoiding these situations?

Best,
Margus


bufferToArrayPtr() makes val_to_buffer and buffer_size calls - which I would assume do not allocate any extra memory?

Hugh

unread,
Nov 2, 2016, 9:21:40 PM11/2/16
to Haxe
The key to understanding the enter/exit blocking is to realize that the gc system is co-operative.
When one thread wants to do a collection, it must wait for all the other threads to stop and acknowledge that they are ready for this.
Usually, this happens in a allocation call where there is a built-in check for just this.

But if your thread is doing something like a network-file-read, there might be no gc calls, and it might take 10s of seconds, and this would prevent the other thread for collecting because it is waiting for your thread to "check in", which would stall the whole system.

So before you enter a a long period of potentially doing no gc-calls, you can "enter blocking mode".  This records the state of the call stack, so it can be marked even without your thread's help, and tells the other gc threads that they can do a collect without needing to wait for your thread to acknowledge.

I hope this gives a better picture of what is actually going on.

Margus Niitsoo

unread,
Nov 3, 2016, 2:50:02 PM11/3/16
to Haxe
Thank you. It really does.

However, I still need a practical answer to the question of "Finalizer" errors that crop up because GC is already running on another thread. How can this happen at all, and how can I prevent this?

Best,
Margus

Hugh

unread,
Nov 4, 2016, 12:50:29 AM11/4/16
to Haxe
For starters, to be safe, I would move the cffi calls outside the enter/exit block, ie 
  array_with_ptr * apt = bufferToArrayPtr(&contents);
double beg = val_float(hx_beg), end = val_float(hx_end);

Then, turn attention to the thread doing the collecting.  It seems that the thread shown here is the main thread, so more likely something funny is going on with the other one.
If you log out changes to the "sgCollecting" state
and your enter/exit calls and 
The key point is I think that you should not be able to exit the gc free zone while sgCollecting is true because of the mutex:

It might be also that the main thread somehow got de-registered, or one of the syn primitives is buggy, so it is not being waited on in:
Logging the count here might help.

If you can give a simple example of it breaking, that would help.

Margus Niitsoo

unread,
Nov 19, 2016, 5:24:25 AM11/19/16
to Haxe
Thank you again for all your help and for a great tool in HXCPP. After finally understanding how the pieces fit together, and getting rid of most of the gc_enter/exit_blocking blocks, everything works as intended. iOS app has hundreds of sessions each day and so far I have not seen any new GC crashes crop up, so I think it is safe to say it is indeed fixed.
 
So, a quick recap of what I have learned, in practical terms, in case others find it useful:

* GC needs to be initialized for each thread created outside the haxe world with gc_set_top_of_stack.  AutoHaxe class (from NME but duplicated in https://groups.google.com/forum/#!msg/haxelang/V-jzaEX7YD8/wyRXnnEOBQAJ) is a useful tool for that but you have to make sure that all the other CFFI calls in the native side take place in the scope of that if they are on an outside thread.

* gc_enter_blocking and gc_exit_blocking should only be used if you are doing a lengthy calculation that does not involve *any* CFFI actions, so alloc_xx, val_call, buffer_append are all off limits inside the block and it is probably also best to steer clear from even val_xx calls,

* if you need to keep a value around in the native world, wrap it in AutoGCRoot and use its get() to retrieve the pointer. Those pointers do not get invalidated by the GC. 

Thank you again for all your help,
Margus
Reply all
Reply to author
Forward
0 new messages