HXCPP CFFI and multithreading?

Margus Niitsoo

Oct 7, 2016, 8:57:31 AM

to Haxe

We have a somewhat odd use case of building a plug-in for a Cordova application in HaXe, with some native code for audio playback and recording thrown in. We recently switched to 3.3.0-rc1 and have had problems since:

Firstly, since __hxcpp_lib_main() goes into an event loop and blocks, we run it on a separate queue. On iOS the init code looks like this:

```objc
dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), ^{
    std_main();
    regexp_main();
    __hxcpp_lib_main(); // main in mms_native
});
```

On Android, the new thread is created in the Java portion, and the same commands are then called from that thread.

We wrap all our val_call calls from the native side with:

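```cpp
#include <hx/CFFI.h>

// Attaches the current thread to hxcpp for the lifetime of this object.
// Must be created as a local (stack) variable, so that &base is a stack address.
struct AutoHaxe {
    int base;
    const char *message;

    AutoHaxe(const char *inMessage) : base(0), message(inMessage) {
        // Stack variables "below" base become visible to the GC.
        gc_set_top_of_stack(&base, true);
    }

    ~AutoHaxe() {
        // Detach (drop one attach count) when leaving scope.
        gc_set_top_of_stack(0, true);
    }

    // Copying would detach twice.
    AutoHaxe(const AutoHaxe &) = delete;
    AutoHaxe &operator=(const AutoHaxe &) = delete;
};
```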

Everything works except when code called from inside Timer.delay() in HaXe leads back to native code callbacks; at that point things crash horribly on both iOS and Android.

I am fairly sure it is a threading issue, but I'd like to understand what is actually going on. Looking around for resources, I found just one relevant source, a post from this group:

https://groups.google.com/forum/#!topic/haxelang/SUUWKukd0sY

It advocates set_top_of_stack as the solution, which AutoHaxe provides, but I see no way of applying it to the calls made via Timer.delay.

I have a number of questions I would like help with:

A) Is there a way to start the HaXe main function in a non-blocking way, so that it does not need to be delegated to a separate thread?

B) How do I register a new thread with the haxe world so that it can access AutoGCRoot-ed variables and use val_call without segfaulting?

C) What does gc_set_top_of_stack do, actually?

D) What does AutoGCRoot do? My understanding is that it wraps a value and makes sure that the value is kept alive by the HaXe GC, but on Android we are now seeing agc->get() return an invalid pointer, so this does not seem to be quite the case...

E) An overview of how the GC works in a multi-threaded environment, i.e.:

E1) What is its basic algorithm? (A reference to a book or Wikipedia article would help.)

E2) Is it always run on all objects, or just those created on a given thread?

E3) Does it change "value" pointer values, and if so, what does wrapping AutoGCRoot around the value change?

In general, I feel the need for a good guide on HXCPP plugin development, as currently the best reference seems to be the source code itself. So any links to useful material on these issues would definitely be appreciated :)

A few words on how we got into this situation, and why I really love HaXe:

I am working as a lead developer in a startup called MatchMySound, where we analyse audio. Our main product runs in JavaScript, but since we started out at a time when HTML did not allow audio recording, our analysis and recording/playback code was written in HaXe to compile into Flash. When JavaScript recording became possible, we changed the native part to use it, without having to rewrite our complex analysis algorithms. And now, when porting to Android and iOS, the story is the same: only the small native part that gives us access to the audio hardware needs rewriting, while most of the complex code remains the same. So we can cover 4 different platforms with one language. Now, if only we could get this threading thing sorted.

Margus

Margus Niitsoo

Oct 7, 2016, 10:10:18 AM

to Haxe

A small update on (D): it was a problem in our own code, where we accidentally deleted the agcr object prematurely, so my original understanding of AutoGCRoot as a wrapper that tells the GC to keep the value around still holds. I would still like confirmation from someone more experienced that this is indeed the case, though.
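A toy model of the root idea, in plain C++ with no hxcpp headers. `Obj`, `RootSet` and `ToyAutoRoot` are hypothetical names; the real AutoGCRoot (declared in hx/CFFI.h) registers its value with hxcpp's own root list instead. What the model shows is the lifetime rule discussed here: the value counts as reachable exactly as long as the root object exists, so deleting the root too early un-roots the value even though other code still holds the raw pointer.

```cpp
#include <cassert>
#include <set>

// Toy stand-in for a GC-managed object (hypothetical, not an hxcpp type).
struct Obj { int payload; };

// Toy root set: a collector would treat whatever these slots point at as live.
struct RootSet {
    std::set<Obj **> roots;

    bool isRooted(Obj *o) const {
        for (Obj **slot : roots)
            if (*slot == o) return true;
        return false;
    }
};

// RAII root modelled loosely on the AutoGCRoot idea:
// register a slot on construction, unregister it on destruction.
class ToyAutoRoot {
public:
    ToyAutoRoot(RootSet &rootSet, Obj *value) : set_(rootSet), value_(value) {
        set_.roots.insert(&value_);
    }
    ~ToyAutoRoot() { set_.roots.erase(&value_); }

    ToyAutoRoot(const ToyAutoRoot &) = delete;
    ToyAutoRoot &operator=(const ToyAutoRoot &) = delete;

    Obj *get() const { return value_; }
    // The slot itself is the root, so a newly stored value is rooted too.
    void set(Obj *v) { value_ = v; }

private:
    RootSet &set_;
    Obj *value_;
};
```

For example, a `ToyAutoRoot` created with `new` keeps its value rooted until the matching `delete`, and no longer.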

Best,

Margus

Hugh

Oct 10, 2016, 12:57:34 AM

to Haxe

Hi,

Whether the haxe main call blocks depends on how you write the haxe code.

If you do "while(true) { processEvents(); }" then yes, it will block.

For its main loop on android, nme instead registers a callback handler in the main routine and then exits.

The OS/framework/app then calls the handler when it needs to (frame refresh, touch event, timer, etc.).
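A minimal sketch of that "register a handler, then return" shape, assuming a host that owns the event loop. `Host`, `Event` and `haxeStyleMain` are hypothetical names; in nme the registration goes through its own native API. Note that the host invokes the handler from its own thread, which is exactly the kind of foreign thread that has to be attached before calling into haxe.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical event kinds the host (OS/framework/app) might deliver.
enum class Event { FrameRefresh, Touch, Timer };

// Hypothetical host side: stores one handler and calls it when events arrive.
class Host {
public:
    void setHandler(std::function<void(Event)> h) { handler_ = std::move(h); }

    // Called by the platform whenever it has work; false if no handler yet.
    bool dispatch(Event e) {
        if (!handler_) return false;
        handler_(e);
        return true;
    }

private:
    std::function<void(Event)> handler_;
};

std::vector<Event> g_seen;

// "main" registers its handler and returns immediately instead of looping.
void haxeStyleMain(Host &host) {
    host.setHandler([](Event e) { g_seen.push_back(e); });
}
```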

The threading needs some care as you have seen.

Hxcpp uses "conservative marking" for threads, which means it needs to scan each thread's stack for things that look like pointers so they can be marked as being in use. This is a pretty standard technique, but I guess implementations differ in how much control the app takes over threads.

The "top of stack" calls record the memory location of a stack variable, so hxcpp can work out which memory range to scan. If you do not set this, hxcpp will not be able to see your local variables, and objects referenced only from them will get collected by the Gc.
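A toy model of conservative marking bounded by a recorded top-of-stack. It is plain C++ over a vector of words rather than real stack memory: index 0 stands for the innermost frame, and `scanEnd` plays the role of the top recorded by set_top_of_stack. `conservativeMark` is a hypothetical name, not hxcpp's marker.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Toy conservative marker: any word in [scanBegin, scanEnd) whose bit pattern
// equals the address of a live allocation is treated as a pointer to it.
std::set<std::uintptr_t> conservativeMark(const std::vector<std::uintptr_t> &stackWords,
                                          std::size_t scanBegin, std::size_t scanEnd,
                                          const std::set<std::uintptr_t> &allocations) {
    std::set<std::uintptr_t> marked;
    for (std::size_t i = scanBegin; i < scanEnd && i < stackWords.size(); ++i) {
        if (allocations.count(stackWords[i]))
            marked.insert(stackWords[i]);
    }
    return marked;
}
```

Any word that happens to equal an allocation's address keeps that allocation alive, which is what makes the scheme conservative; a reference stored above the recorded top is simply never looked at, so its object would be collected.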

Each thread also owns some resources required for allocation. By attaching with "set top of stack", the thread acquires these resources, and when you detach (with ~AutoHaxe), they are released. This way a foreign thread can "visit" hxcpp and later terminate without holding resources. This is similar to JNI's AttachCurrentThread/DetachCurrentThread.
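A toy model of per-thread allocation resources, assuming a shared pool of blocks that attached threads borrow. `BlockPool`, `attachThread` and `detachThread` are hypothetical names; hxcpp's real per-thread allocator state is internal. The point is that a visiting thread hands its block back on detach, so it can exit without leaking anything.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Shared pool handing out per-thread allocation blocks (toy model).
class BlockPool {
public:
    explicit BlockPool(int blocks) : free_(blocks) {}

    bool acquire() {
        std::lock_guard<std::mutex> lock(m_);
        if (free_ == 0) return false;
        --free_;
        return true;
    }
    void release() {
        std::lock_guard<std::mutex> lock(m_);
        ++free_;
    }
    int available() {
        std::lock_guard<std::mutex> lock(m_);
        return free_;
    }

private:
    std::mutex m_;
    int free_;
};

// Only attached threads hold a block; each thread tracks its own state.
thread_local bool t_hasBlock = false;

bool attachThread(BlockPool &pool) {
    if (!t_hasBlock) t_hasBlock = pool.acquire();
    return t_hasBlock;
}

void detachThread(BlockPool &pool) {
    if (t_hasBlock) {
        pool.release();
        t_hasBlock = false;
    }
}
```

A short-lived OS thread would call `attachThread`, do its work, and call `detachThread` before returning, leaving the pool exactly as it found it.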

All attached threads must also co-operate with the Gc. When one thread wants to do a gc, it waits for all the active threads to agree. If you leave an attached thread in a blocking routine, you may cause the other threads to block too. To avoid this, the thread can use the enter/exit gc_free_zone calls.
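A bookkeeping-only sketch of how blocking zones affect a collection, assuming a collector that must wait for every attached thread that is not inside a gc-free zone. `GcCoordinator` and `BlockingZone` are hypothetical names; the corresponding CFFI entry points are gc_enter_blocking()/gc_exit_blocking(), and the actual stop-the-world handshake is left out.

```cpp
#include <cassert>
#include <mutex>

// Counts which attached threads a collection would have to wait for.
class GcCoordinator {
public:
    void attach()        { std::lock_guard<std::mutex> l(m_); ++attached_; }
    void detach()        { std::lock_guard<std::mutex> l(m_); --attached_; }
    void enterBlocking() { std::lock_guard<std::mutex> l(m_); ++blocking_; }
    void exitBlocking()  { std::lock_guard<std::mutex> l(m_); --blocking_; }

    // Threads in a blocking zone promise not to touch GC memory,
    // so the collector does not wait for them.
    int threadsToWaitFor() {
        std::lock_guard<std::mutex> l(m_);
        return attached_ - blocking_;
    }

private:
    std::mutex m_;
    int attached_ = 0;
    int blocking_ = 0;
};

// RAII helper: the current thread is "blocking" for the guard's scope.
class BlockingZone {
public:
    explicit BlockingZone(GcCoordinator &gc) : gc_(gc) { gc_.enterBlocking(); }
    ~BlockingZone() { gc_.exitBlocking(); }
    BlockingZone(const BlockingZone &) = delete;
    BlockingZone &operator=(const BlockingZone &) = delete;

private:
    GcCoordinator &gc_;
};
```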

Depending on your code, you may run into problems with re-entrant set-top-of-stack calls if you call set_top_of_stack recursively. There should be code to detect this, but it is something to watch out for.

Typically, with a timer, you could use a native trampoline function: the native callback first wraps the call in an AutoHaxe object, and then makes the haxe callback (via a GCRoot).
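A sketch of that trampoline, runnable as plain C++ because the hxcpp parts are replaced by stand-ins: `AttachGuard` plays the role of AutoHaxe, and the `std::function` plays the role of the haxe closure you would keep in an AutoGCRoot and invoke with val_call0. `onNativeTimer` is a hypothetical callback name with the plain `void(void*)` shape many C timer APIs use. The order is what matters: attach on the stack first, call into haxe second, detach on scope exit.

```cpp
#include <cassert>
#include <functional>

// Stand-in for hxcpp's attach state (hypothetical).
int g_attachDepth = 0;

// Plays the role of AutoHaxe: attach in the constructor, detach in the destructor.
struct AttachGuard {
    AttachGuard()  { ++g_attachDepth; }
    ~AttachGuard() { --g_attachDepth; }
    AttachGuard(const AttachGuard &) = delete;
    AttachGuard &operator=(const AttachGuard &) = delete;
};

// Plays the role of the rooted haxe closure (AutoGCRoot + val_call0 in real code).
std::function<void()> g_haxeCallback;

// Native trampoline, called on whatever thread the timer API chooses.
extern "C" void onNativeTimer(void * /*userData*/) {
    AttachGuard attach;     // 1. attach this (possibly foreign) thread, on the stack
    if (g_haxeCallback)
        g_haxeCallback();   // 2. only now call into haxe
}                           // 3. guard destructor detaches
```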

Another way of skipping the issue entirely is to keep the threads separate. Your worker (native) thread pumps data into a cpp buffer, and the haxe code periodically polls this buffer from its main thread. Or, if it's the other way around, the haxe code does all the work in a loop that waits for commands, and pumps the data back into the app (android/ios) via some native cpp buffer.
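A minimal sketch of the "keep the threads separate" approach, assuming the native audio thread never touches haxe values at all. `SampleBuffer` is a hypothetical name. It uses a plain mutex for brevity; a real-time audio callback would more likely use a lock-free ring buffer, since taking a lock there can cause glitches.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

class SampleBuffer {
public:
    // Called from the native (non-haxe) audio thread.
    void push(const float *samples, std::size_t count) {
        std::lock_guard<std::mutex> lock(m_);
        data_.insert(data_.end(), samples, samples + count);
    }

    // Called periodically from the haxe main thread; takes everything queued so far.
    std::vector<float> drain() {
        std::lock_guard<std::mutex> lock(m_);
        std::vector<float> out;
        out.swap(data_);
        return out;
    }

private:
    std::mutex m_;
    std::vector<float> data_;
};
```

Only the haxe side converts drained samples into haxe values, so the audio thread never needs to be attached.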

Hugh

Margus Niitsoo

Oct 10, 2016, 4:41:34 AM

to Haxe

Thank you, Hugh, this really cleared up a lot of things for me.

I have a few more questions to make sure I understood the answers properly, though.

a) Did I understand correctly that top_of_stack needs to be called in native code whenever a haxe-world "value" is allocated there internally (i.e. a buffer that will be written to in subsequent calls and eventually released to the haxe world via a val_call), and that failing to do so may result in the GC not seeing this value? If so, why is set_top_of_stack not called automatically whenever a call is made to a native function, i.e. why is there a need to call it manually at all?

b) Similarly, do I understand correctly that the safest option GC-wise is to sandwich all my native code between enter_blocking/exit_blocking, to make sure no "values" that I use get moved around in memory while the function is running? If so, what is the reason for not doing that automatically? (Yes, there are cases where you want the GC to run, but you could always call exit_blocking followed by enter_blocking, or use safe_point. To my mind these cases are a small minority, as I would guess most native calls are short enough not to need that.)

Also, as an aside, could you explain what the parameters passed to set_top_of_stack mean, exactly?

Thank you,

Margus

Hugh

Oct 11, 2016, 1:11:49 AM

to Haxe

a) Yes, that is right. And since any call to a haxe function might allocate something, you need it before making a haxe call too.

When you call from haxe to native, the top-of-stack should already be set, since the calling thread would already have needed to set this up.

If you then call back into haxe from this native function, you do not need to re-setup the stack either, since it should still be good from when haxe set it up.

You will need to call it manually if you detach the thread, or if it is a "foreign thread", i.e. one created by the OS. Typically, these might come from an "on needs new sample" native audio callback, a touch event on android, some native timer, or an ios async block. It really depends on whether you created the thread with "Thread.create" in haxe (or are bootstrapping the main class).
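One way to act on this, sketched with stand-ins: attach only when the current thread is not already attached, so a guard used on a thread haxe already owns does not detach it on exit. `AttachIfForeign`, `t_attachCount`, `stubAttach` and `stubDetach` are hypothetical; the stubs stand in for gc_set_top_of_stack(&local, true) and gc_set_top_of_stack(0, true). The sketch assumes your own code can tell whether a thread is attached (for example by tracking it itself) and does not rely on any hxcpp query for that.

```cpp
#include <cassert>

// Per-thread attach state, tracked by our own code (hypothetical).
thread_local int t_attachCount = 0;
void stubAttach(int * /*top*/) { ++t_attachCount; } // ~ gc_set_top_of_stack(top, true)
void stubDetach()              { --t_attachCount; } // ~ gc_set_top_of_stack(0, true)

// Attach only on foreign threads (e.g. an OS audio callback thread);
// leave threads that haxe already owns untouched.
class AttachIfForeign {
public:
    AttachIfForeign() : attachedHere_(t_attachCount == 0) {
        if (attachedHere_) stubAttach(&base_);
    }
    ~AttachIfForeign() {
        if (attachedHere_) stubDetach();
    }
    AttachIfForeign(const AttachIfForeign &) = delete;
    AttachIfForeign &operator=(const AttachIfForeign &) = delete;

private:
    int base_ = 0;       // lives on the stack when the guard is a local
    bool attachedHere_;  // did this guard do the attaching?
};
```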

b) I guess you must decide if your thread is "normally attached" or "normally detached". Haxe created threads are always "normally attached".

If you are normally attached, you need to enter blocking mode before anything that could potentially block.

If you are normally detached, you would typically attach(AutoHaxe) - make some kind of callback - and then detach. The thread can now safely block, or exit, and do whatever it wants, so no need to call enter-blocking.

The enter/exit blocking calls are mildly expensive, so there is no need to bother if you are going to do, say, 1ms worth of work. The worst that can happen is that all the other haxe threads stall for 1ms while they wait for you to complete. But if you are doing something open-ended (e.g. network io that may take tens of seconds), you should enter blocking mode. This matters especially if you are waiting for a mutex that a haxe thread may hold: you could deadlock pretty easily. All the haxe std-io calls have this built in. So efficiency is the main reason for not going crazy with this call.
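A sketch of a "normally attached" thread waiting on an open-ended condition, with the wait wrapped in blocking mode. `stubEnterBlocking`/`stubExitBlocking` are hypothetical stand-ins that only log; in real code they would be the CFFI gc_enter_blocking()/gc_exit_blocking() calls, and `CommandQueue` is likewise a hypothetical name. Blocking mode is entered before taking the mutex, since another haxe thread might hold it, and left only after the mutex is released and before anything GC-managed is touched.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <string>
#include <vector>

// Stand-ins for gc_enter_blocking()/gc_exit_blocking(): they only record calls.
std::vector<std::string> g_gcLog;
void stubEnterBlocking() { g_gcLog.push_back("enter"); }
void stubExitBlocking()  { g_gcLog.push_back("exit"); }

class CommandQueue {
public:
    void push(int cmd) {
        {
            std::lock_guard<std::mutex> l(m_);
            q_.push_back(cmd);
        }
        cv_.notify_one();
    }

    // Called by the haxe-attached consumer thread; may wait indefinitely.
    int popBlocking() {
        stubEnterBlocking();                   // let the GC run without waiting for us
        std::unique_lock<std::mutex> l(m_);
        cv_.wait(l, [this] { return !q_.empty(); });
        int cmd = q_.front();
        q_.pop_front();
        l.unlock();                            // release the lock before rejoining the GC
        stubExitBlocking();                    // back to normal before touching haxe values
        return cmd;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::deque<int> q_;
};
```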

set_top_of_stack(top,force)

top = the address of a local variable, which will become the top-of-stack. All local variables and arguments will be pushed "below" this (stacks go "downwards") by the C++ compiler.

top = 0 means detach the thread from hxcpp.

force = false. This case has one very specific use: it is for the application main thread that initializes haxe and then may call into haxe from different stack positions internally. With non-force, hxcpp will move the top-of-stack up but not down, ensuring all stack locations are covered. It is really meant for a simple single-threaded app, and best not used otherwise.

force=true. The meaning of this changed a bit, but force basically means "push/pop mode" (rather than ratchet mode). This keeps track of re-entrant stack calls and adjusts the stack appropriately.

So:

(_,false) -> do not use directly

(&i,true) -> Ensure hxcpp sees stack variables "below" i, and increase thread attach count

(0,true) -> detach one count from hxcpp
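A toy model of the attach counting described above, using plain integers as addresses (a higher value means "higher" on a downward-growing stack). `ToyThreadStack` is hypothetical and only follows this description, not hxcpp's source: both modes keep the outermost top so every frame below it stays covered, and `force=true` additionally counts attaches so that `(0,true)` undoes them one at a time. The "push/pop" wording suggests hxcpp does more bookkeeping of the top itself, which the model leaves out.

```cpp
#include <cassert>
#include <cstdint>

class ToyThreadStack {
public:
    void setTop(std::uintptr_t top, bool force) {
        if (top != 0) {
            // Keep the outermost top, so frames of an earlier attach stay covered.
            if (top > currentTop_) currentTop_ = top;
            if (force) ++attachCount_;             // (&i,true): one more attach
        } else if (force && attachCount_ > 0) {
            if (--attachCount_ == 0)               // (0,true): drop one count
                currentTop_ = 0;                   // last count gone: thread detached
        }
    }

    int attachCount() const { return attachCount_; }
    std::uintptr_t top() const { return currentTop_; }
    bool attached() const { return currentTop_ != 0; }

private:
    std::uintptr_t currentTop_ = 0;
    int attachCount_ = 0;
};
```

For example, in this model a thread set up once with `(top,false)` and then wrapped in a single AutoHaxe-style `(&i,true)`/`(0,true)` pair ends up detached, because the only counted attach was the guard's own.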

Hugh

Margus Niitsoo

Oct 11, 2016, 12:54:23 PM

to Haxe

Ok, perfect. Thank you!

This helped me wrap my head around the issue we were having. On iOS, I was populating the first 3 playback buffers from the same thread the "start" command came from. That thread already had HaXe initialized, and using AutoHaxe on it screwed things up very badly, with a segfault at a fairly random place (the alloc_abstract call that creates the play function's return value). The subsequent buffer-fill calls, however, came on their own thread and did need to be wrapped in AutoHaxe. Once I disabled AutoHaxe for the initial calls, everything started working.

Thank you again, and all the best,

Margus

Hugh

Oct 12, 2016, 12:18:37 AM

to Haxe

It might be a bug that the main thread's gc attachment gets released when AutoHaxe does its thing (attach in the constructor, detach in the destructor).

It probably needs an extra reference count, or a more unified way of initializing.

Margus Niitsoo

Feb 1, 2017, 9:44:45 AM

to Haxe

With hxcpp 3.4.43 I managed to get debug symbols working so that I got reasonable stack traces. That finally helped me figure out what the problem was.

The key word I overlooked in your post was "local variable". I had been creating the guard like this:

```cpp
AutoHaxe * haxe = new AutoHaxe("ah");
```

That, needless to say, did not quite work as intended :)
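A variant of AutoHaxe that turns that mistake into a compile error, with the hxcpp call replaced by a stub so it runs standalone (`stubSetTop` and `StackOnlyAutoHaxe` are hypothetical names; the stub stands in for gc_set_top_of_stack). With `new`, `base` lives on the heap, so its address is not a meaningful top-of-stack, and the destructor that detaches the thread never runs unless someone remembers to delete the object. Deleting the class's operator new means `new StackOnlyAutoHaxe("ah")` no longer compiles.

```cpp
#include <cassert>
#include <cstddef>

// Stub standing in for gc_set_top_of_stack(&base, true) / gc_set_top_of_stack(0, true).
int g_attached = 0;
void stubSetTop(int *top) { g_attached += top ? 1 : -1; }

// Same shape as AutoHaxe, but it can only live on the stack.
struct StackOnlyAutoHaxe {
    int base = 0;
    const char *message;

    explicit StackOnlyAutoHaxe(const char *inMessage) : message(inMessage) {
        stubSetTop(&base);      // &base is a stack address only for a local guard
    }
    ~StackOnlyAutoHaxe() { stubSetTop(nullptr); }

    StackOnlyAutoHaxe(const StackOnlyAutoHaxe &) = delete;
    StackOnlyAutoHaxe &operator=(const StackOnlyAutoHaxe &) = delete;

    // Heap allocation is a compile error.
    static void *operator new(std::size_t) = delete;
    static void *operator new[](std::size_t) = delete;
};
```

Usage is unchanged from the original struct: declare `StackOnlyAutoHaxe guard("ah");` as a local at the start of the native callback.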

M.
