GC and threading


Ledion Bitincka

May 15, 2019, 1:56:56 PM
to v8-users
I'm trying to improve some serialization code in NodeJS and was wondering if this could be achieved by going down to the native code and using multiple threads to parallelize the serialization of multiple objects. For example, imagine we need to serialize 1K objects to JSON/OtherFormat - split into 4 x 250 batches and use 4 (non-main) threads to serialize the objects, then gather the results and return.  However, I'm having a bit of a hard time understanding a few concepts:

1. When does the GC kick in for an Isolate? My understanding is that any V8 code executed in an Isolate can trigger GC for that Isolate - is this correct?
2. When GC triggers, what happens to pointers that have been extracted from a Handle<Object> while the Handle is still in scope?

HandleScope scope(isolate);
Handle<Object> foo = ...
Object* fooPtr = *foo;
someCallThatTriggersGC();
// is fooPtr still valid???
 


Jakob Kummerow

May 15, 2019, 2:34:11 PM
to v8-users
> I'm trying to improve some serialization code in NodeJS and was wondering if this could be achieved by going down to the native code and using multiple threads to parallelize the serialization of multiple objects. For example, imagine we need to serialize 1K objects to JSON/OtherFormat - split into 4 x 250 batches and use 4 (non-main) threads to serialize the objects, then gather the results and return.

While I understand that this is tempting, please be aware that only one thread may be active in one Isolate at any given time. Lifting this restriction would require huge effort, and/or only apply in limited circumstances (e.g., when an object has a custom toJSON function, you would have to bail out somehow, because such a function could cause arbitrary modifications to any other object).
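To see why extra threads don't buy anything under this rule, here is a minimal toy model in plain C++ (not V8 code). The hypothetical ToyIsolate is guarded by a single mutex, similar in spirit to how v8::Locker lets only one thread enter an Isolate at a time. Four workers are started, but PeakConcurrency() reports that at most one is ever inside, so the four batches would effectively run one after another.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical toy isolate: one mutex that a thread must hold before it may
// touch the "heap" (similar in spirit to v8::Locker, but not V8 code).
struct ToyIsolate {
  std::mutex lock;
  int active = 0;      // threads currently inside; guarded by `lock`
  int max_active = 0;  // peak value of `active`; guarded by `lock`
};

// Enters the isolate `iterations` times, doing a tiny bit of "work" each time.
void RunWorker(ToyIsolate& iso, int iterations) {
  for (int i = 0; i < iterations; ++i) {
    std::lock_guard<std::mutex> guard(iso.lock);  // enter the isolate
    ++iso.active;
    iso.max_active = std::max(iso.max_active, iso.active);
    --iso.active;  // leave before the guard releases the lock
  }
}

// Runs `threads` workers and reports how many were ever inside at once.
int PeakConcurrency(int threads, int iterations) {
  ToyIsolate iso;
  std::vector<std::thread> workers;
  for (int t = 0; t < threads; ++t) {
    workers.emplace_back(RunWorker, std::ref(iso), iterations);
  }
  for (auto& w : workers) w.join();
  return iso.max_active;  // safe to read: all workers have been joined
}
```

The mutex makes the workers correct but serial; real parallelism requires work that does not need to be inside the Isolate at all.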
 
> However, I'm having a bit of a hard time understanding a few concepts:

> 1. When does the GC kick in for an Isolate? My understanding is that any V8 code executed in an Isolate can trigger GC for that Isolate - is this correct?

Any allocation on the managed heap can trigger GC. Of course, when you don't know what code you're calling (e.g., a user-provided function), then you have to assume that it might allocate. We use DisallowHeapAllocation scopes to guard sections where we are sure that no allocation (and hence no GC) can happen.
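The idea behind such a scope can be sketched with a toy heap in which allocation is the only GC trigger. ToyHeap, NoAllocationScope and SumWithoutAllocating are hypothetical names for illustration; V8's real DisallowHeapAllocation lives in its internal sources and differs in detail. The guard is an RAII counter, and Allocate() asserts that no guard is active.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy heap: allocation is the only thing that can trigger a GC.
struct ToyHeap {
  int no_alloc_depth = 0;  // > 0 while a NoAllocationScope is active
  int gc_count = 0;        // number of simulated collections
  std::size_t used = 0;    // bytes allocated since the last collection
  std::size_t gc_threshold = 64;

  void Allocate(std::size_t bytes) {
    // Allocating inside a guarded region is a bug; fail loudly in debug builds.
    assert(no_alloc_depth == 0 && "allocation inside a NoAllocationScope");
    used += bytes;
    if (used >= gc_threshold) {  // this is where objects could move
      ++gc_count;
      used = 0;
    }
  }
};

// RAII guard in the spirit of V8's internal DisallowHeapAllocation scopes.
class NoAllocationScope {
 public:
  explicit NoAllocationScope(ToyHeap& heap) : heap_(heap) { ++heap_.no_alloc_depth; }
  ~NoAllocationScope() { --heap_.no_alloc_depth; }
  NoAllocationScope(const NoAllocationScope&) = delete;
  NoAllocationScope& operator=(const NoAllocationScope&) = delete;

 private:
  ToyHeap& heap_;
};

// Code that only reads (never allocates) can run under the guard, so raw
// pointers taken inside it stay valid for the whole scope.
int SumWithoutAllocating(ToyHeap& heap, const std::vector<int>& values) {
  NoAllocationScope no_alloc(heap);
  int sum = 0;
  for (int v : values) sum += v;
  return sum;
}
```

Ten 16-byte allocations with a 64-byte threshold trigger two simulated collections; calling Allocate() while a NoAllocationScope is alive would hit the assertion instead.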

> 2. When GC triggers, what happens to pointers that have been extracted from a Handle<Object> while the Handle is still in scope?
>
> HandleScope scope(isolate);
> Handle<Object> foo = ...
> Object* fooPtr = *foo;
> someCallThatTriggersGC();
> // is fooPtr still valid???
 
Raw pointers will become stale, no matter where they came from. So in this example, fooPtr will be invalid; foo will still be valid (that's the point of having Handles). 
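A toy model of why the handle survives while the raw pointer does not, assuming a moving collector. The hypothetical ToyMovingHeap relocates every object in CollectGarbage() and rewrites its slots, and a Handle refers to a slot rather than to the object. This only mirrors the idea; real V8 handles are created and managed through HandleScope.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

struct Obj { int value; };

// A handle names a slot; the slot, not the handle, holds the object's address.
struct Handle { std::size_t slot; };

// Hypothetical toy moving heap (not V8 code). CollectGarbage() relocates every
// object and rewrites the slots, but cannot fix raw Obj* copies held elsewhere.
class ToyMovingHeap {
 public:
  Handle New(int value) {
    objects_.push_back(std::make_unique<Obj>(Obj{value}));
    slots_.push_back(objects_.back().get());
    return Handle{slots_.size() - 1};
  }

  Obj* Deref(Handle h) const { return slots_[h.slot]; }

  void CollectGarbage() {
    std::vector<std::unique_ptr<Obj>> moved;
    for (std::size_t i = 0; i < slots_.size(); ++i) {
      moved.push_back(std::make_unique<Obj>(*slots_[i]));  // copy to new address
      slots_[i] = moved.back().get();                        // update the slot
    }
    objects_ = std::move(moved);  // old copies are freed: raw pointers dangle
  }

 private:
  std::vector<std::unique_ptr<Obj>> objects_;
  std::vector<Obj*> slots_;
};
```

Mirroring the question: after `Handle foo = heap.New(42); Obj* fooPtr = heap.Deref(foo); heap.CollectGarbage();`, `heap.Deref(foo)->value` is still 42, while `fooPtr` points at freed memory and must not be dereferenced.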

Ledion Bitincka

May 15, 2019, 3:09:18 PM
to v8-users
Thanks!

> While I understand that this is tempting, please be aware that only one thread may be active in one Isolate at any given time.
I was hoping that there could be multiple threads with "read-only" access to an Isolate's heap. Looking through how ValueSerializer works, I now understand there's no such thing as "read-only" given getter/setter functions. However, I'm wondering whether this would be possible for simple (key/value) objects, bailing out for more complex ones. Any other suggestions for how this work could be parallelized? Is it even possible? (This is custom serialization, not JSON.)

Peter Marshall

May 16, 2019, 9:18:06 AM
to v8-users
On Wednesday, May 15, 2019 at 9:09:18 PM UTC+2, Ledion Bitincka wrote:
> Thanks!
>
> > While I understand that this is tempting, please be aware that only one thread may be active in one Isolate at any given time.
> I was hoping that there could be multiple threads with "read-only" access to an Isolate's heap. Looking through how ValueSerializer works, I now understand there's no such thing as "read-only" given getter/setter functions. However, I'm wondering whether this would be possible for simple (key/value) objects, bailing out for more complex ones. Any other suggestions for how this work could be parallelized? Is it even possible? (This is custom serialization, not JSON.)

You could read from the heap on a concurrent thread, but there is no synchronization when the main thread writes to the heap, so there's no guarantee that what you are reading is not being written concurrently, e.g. when it is being allocated, when it is modified by user JS code, or when the GC moves it.

If the serialization work itself was particularly expensive (e.g. the format is very complicated or the data requires a lot of processing) then you could copy the relevant parts of the heap objects off-heap and then serialize from concurrent threads.
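A minimal sketch of the copy-then-serialize approach, assuming the objects are simple key/value records that the main thread has already copied into plain C++ data. Record, SerializeRecord, SerializeInParallel and the "key=value;" format are hypothetical stand-ins for the custom format. Once the data is off-heap, the workers never touch the Isolate, and each one writes only its own slice of the results.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Off-heap snapshot of one simple key/value object. Building these is the only
// step that would touch the JS heap, and it must happen on the main thread.
using Record = std::vector<std::pair<std::string, std::string>>;

// Hypothetical custom format: "key=value;" for each property.
std::string SerializeRecord(const Record& rec) {
  std::string out;
  for (const auto& [key, value] : rec) {
    out += key;
    out += '=';
    out += value;
    out += ';';
  }
  return out;
}

// Splits the snapshots into contiguous batches (e.g. 4 x 250), serializes each
// batch on its own thread, and returns the results in the original order.
std::vector<std::string> SerializeInParallel(const std::vector<Record>& records,
                                             std::size_t num_threads) {
  if (num_threads == 0) num_threads = 1;
  std::vector<std::string> results(records.size());
  const std::size_t batch = (records.size() + num_threads - 1) / num_threads;
  std::vector<std::thread> workers;
  for (std::size_t t = 0; t < num_threads; ++t) {
    const std::size_t begin = t * batch;
    const std::size_t end = std::min(records.size(), begin + batch);
    if (begin >= end) break;  // fewer records than threads
    // Each worker writes only results[begin, end), so no lock is needed.
    workers.emplace_back([&records, &results, begin, end] {
      for (std::size_t i = begin; i < end; ++i) {
        results[i] = SerializeRecord(records[i]);
      }
    });
  }
  for (auto& w : workers) w.join();
  return results;
}
```

The copy step itself stays single-threaded, so this only pays off when serializing a record costs noticeably more than copying it plus the thread start-up.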

Ledion Bitincka

May 17, 2019, 11:43:08 AM
to v8-users
> You could read from the heap on a concurrent thread but there is no synchronization when writing to the heap from the main thread

My current thinking is something like this, i.e. I'd block the main thread until the serialization is done. In the main thread:
handles[] = getHandlesToSerialize();
results[] = [];
threads[] = spawnThreads(N, handles, results);
join(threads);

One problem, though, is that I can't use the Handles directly in other threads, as they'd require access to the Isolate and thus synchronization to enter/lock the Isolate. So the only option, as you alluded to, is to copy off-heap and then serialize, which I need to test to see whether it yields any perf benefit given the high setup cost of going multi-threaded. Maybe I'll find a way to keep pointers to the underlying data rather than copying it - I'll report back here with what I find.

Peter Marshall

May 20, 2019, 5:26:08 AM
to v8-users
If you block the main thread at a safe time (e.g. not during GC) then you can probably access heap objects from your other threads without handles as long as you do your own synchronization between the background threads.
Not sure how concurrent marking threads from the GC will feel about that though.
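A sketch of reading in place instead of copying, under the assumptions in the post above: the owning thread stays blocked in join() for the whole operation, and nothing else mutates or moves the data (which this toy cannot model for V8's concurrent marking threads). HeapItem and SerializeInPlace are hypothetical names; the workers' own synchronization is a single atomic counter that hands out each item index exactly once.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Stand-in for objects that live on the main thread's heap. The workers read
// them in place; this is only safe because the owner is blocked in join()
// and nothing else mutates or moves them meanwhile.
struct HeapItem {
  std::string key;
  int value;
};

std::vector<std::string> SerializeInPlace(const std::vector<HeapItem>& heap_items,
                                          unsigned num_threads) {
  std::vector<std::string> results(heap_items.size());
  std::atomic<std::size_t> next{0};  // the workers' only shared mutable state
  auto worker = [&] {
    for (;;) {
      // Claim the next unprocessed item; each index is handed out exactly once.
      const std::size_t i = next.fetch_add(1, std::memory_order_relaxed);
      if (i >= heap_items.size()) return;
      results[i] = heap_items[i].key + ":" + std::to_string(heap_items[i].value);
    }
  };
  std::vector<std::thread> workers;
  for (unsigned t = 0; t < num_threads; ++t) workers.emplace_back(worker);
  // The calling ("main") thread blocks here, so it cannot touch heap_items.
  for (auto& w : workers) w.join();
  return results;
}
```

Compared with fixed batches, the shared counter balances uneven per-object costs across threads; join() also publishes every worker's writes to results before the caller reads them.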