Profiling Sample Context API proposal

21 views
Skip to first unread message

Attila Szegedi

unread,
Jul 9, 2025, 7:54:12 AMJul 9
to v8-...@googlegroups.com
Hi all,

TL;DR: we have sample context functionality in Datadog's Node.js profiler built on top of v8::CpuProfiler. It works but we want to improve on it. We attempted a few things that don't work very well. Our ultimate realization is that we need V8 itself to support this.

Longer version:

We'd like to propose that V8 should have a supported API for attaching context to profiler samples. As prior art, the Go programming language supports a similar feature already[1][2].

Speculatively, it could look like this pair of methods:
  void v8::CpuProfiler::SetSampleContext(v8::Local<v8::Value>)
and
  v8::Local<v8::Value> v8::CpuProfile::GetSampleContext(int index)
(similar to e.g. CpuProfile::GetSampleTimestamp(int index).)These would work when CpuProfiler::Start is called with record_samples=true to record individual samples and not just aggregates.
When the profiler captures a sample, it’d also associate the sample with the currently set sample context. A single context can get associated with any number of samples including zero if it was replaced with another SetSampleContext call before a sample was taken. On the other hand, if the context is not replaced for some time then it can end up associated with multiple samples as well.

A hopefully obvious requirement is that these sample contexts would also have to be propagated into continuations. The isolate would thus need to track "Continuation Preserved Profiler Data" similar to how it currently tracks ContinuationPreservedEmbedderData (CPED.) It would most likely need to be a map, as there can be multiple active CPU profilers at any given time, but I don't want to get into implementation details much off the bat.

We'd also vastly prefer if the profiler could work with being passed a Local<Value> as the above method signatures suggest. The other choice would be to represent the sample context as void*, but I think it's a suboptimal choice as then ownership of the value and correct freeing of whatever was passed in would become an issue and might even require additional assistance from the CpuProfiler, complicating the API. Using GC managed values frees us from this – if a value gets associated with a sample, it remains alive; if a value is replaced with another before it gets associated with a sample, it is eventually collected (presuming it is not referenced from elsewhere, obviously.) A compromise solution would be to have some other limited context representation, e.g. a map of string key-value pairs if that would be easier to implement.

As an added bonus, the concept of sample contexts could be extended to the HeapProfiler as well – we had customers ask for context information associated with our Heap Live Size reports. It might even be simpler to implement for HeapProfiler (seeing how that one doesn't require signal handling) but even having it for CpuProfiler first would be great.

I think this is enough as a discussion starter. Let me know what you think, we'd definitely like to collaborate on this!

I’m also attaching an expose on the background of this proposal, our current state-of-the-art and things we tried recently to improve it; it’s a bit lengthy and entirely optional so I saved it for after the proposal itself:

BACKGROUND
==========

At Datadog we ship a Node.js profiler built on top of V8 profiler. A significant functional addition we have is sample contexts – we add contextual information from our tracer to the samples. This is typically the span ID so we can correlate samples to traces as well as an endpoint ID (e.g. a HTTP method and a path.) The way we achieve this has several elements, but it mostly hinges on us intercepting the PROF signal, recording TimeTicks::Now before and after we delegate to V8's signal handler, and storing the current context in a ring buffer with the timestamps. Later when we’re processing a v8::CpuProfile, we find the element in the ring buffer whose timestamps bracket the CpuProfile::GetSampleTimestamp() value and associate it with the context object in that element.

Setting the current context from the JS side is currently implemented by registering a Node.js async_hooks.createHook callback. On each async context switch – be it a V8 promise or a Node.js timeout etc. we will pull the current tracing contextual information and call a setter on the profiler.
These contexts are JS objects, and on the profiler's C++ side the profiler has basically a single std::shared_ptr<v8::Global<v8::Value>> private field that is updated to the new object.
This approach works, but it has some drawbacks. Node.js wants to deprecate async_hooks.createHook so we want to move away from using it. Under the hood it uses v8::Isolate::SetPromiseHook which – AFAICT – triggers some deoptimizations.

Recently we tried a different approach based on Isolate::SetContinuationPreservedEmbedderData(), seeing how it's also used by default for implementation of AsyncLocalStorage API in Node.js 24+. The idea here was to associate the sample context with the object Node.js stores in CPED. This way, we get async context propagation for free and can stop using async_hooks callbacks.
Unfortunately, now our signal handler code would need to call Isolate::GetContinuationPreservedEmbedderData(), which returns a v8::Local<Value> thus it needs to create a handle. We did some proofs of concept, and everything worked fine to our own astonishment – until it didn't. Calling a V8 API that returns a Local<> risks triggering allocation of a new Address array block in HandleScope::Extend, which is not signal safe. We were able to produce several fun looking crashes and even malloc deadlocks with it.

We could spend time trying to fix this. Like, use the ucontext passed to the signal handler to identify whether "known dangerous" methods are on stack – anything that has the chance to mutate isolate->handle_scope_implementer()->blocks(), but also __GI__malloc and probably few others. But it’s awfully brittle ensuring we covered all possible bases.

We don’t see an external way to implement a sample context solution that is signal safe, follows continuations, and doesn’t require a SetPromiseHook.

Hence the above proposal.

Attila.

--
[1] https://rakyll.org/profiler-labels/
[2] https://felixge.de/2022/02/11/connecting-go-profiling-with-tracing/

Leszek Swirski

unread,
Jul 9, 2025, 10:02:06 AMJul 9
to v8-...@googlegroups.com
Hi Attila,

This is certainly an interesting use case. We do have something partially like this internally already, with a concept of VMState that is sampled along with the current stack on every tick (the isolate holds the current vm state, and a VMState scope updates it). This isn't preserved across continuations, nor is it generic objects (it's just an enum value), but I could imagine exposing it on the API with embedder defined enums.

For all the extra stuff though -- it's difficult. Preserving this state across continuations with an additional field to the existing CPED means cooperation with the embedder, since some continuations (like setTimeout on the web) are controlled and connected to each other by the embedder rather than by V8 itself. AsyncContext would be the more extensible way of defining context-preserved data, but afaiu that proposal is stalled on precisely the issue of "how are continuations defined" when leaving pure JS. Also, the request to have a Local<Value> is unrealistic: the issues you observed in your prototype that attempted to use CPED would pop up again in any code that tried to interact with the GC as part of a signal handler. The reality here is that the signal handler has to be very fast (since it's interrupting the very main thread execution it's trying to measure), very simple, and very robust to being called literally whenever, so interaction with the GC is almost impossible (the profiling signal handler could get called in the middle of a GC!) -- you'd have to figure out some way to make it work with integers or raw addresses.

Aside from that, the business reality we face is that this is unlikely to be used by Chrome, which means we'd have a very limited mandate to support it. We'd review a patch, if it's not too invasive, and we might ask that it's hidden behind `ifdef`s so that it doesn't affect the core project. If there was a web API proposal for defining labels that would then be displayed by devtools (similar to the go functionality as I understand it) then maybe, but then that's a much much larger project scope and we wouldn't be able to drive it unless there was a shift in priorities.

Hope that all makes sense,
Leszek

--
--
v8-dev mailing list
v8-...@googlegroups.com
http://groups.google.com/group/v8-dev
---
You received this message because you are subscribed to the Google Groups "v8-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to v8-dev+un...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/v8-dev/CADatEgYVoXxh5Z%2BLzzTFp9Ea62EQDHDKtmSuyQFqJ9LQJZb-eg%40mail.gmail.com.

Attila Szegedi

unread,
Jul 9, 2025, 12:15:25 PMJul 9
to v8-...@googlegroups.com
Hi Leszek, 

Thanks for your response and consideration. Some replies inline.

On Wed, Jul 9, 2025 at 4:02 PM Leszek Swirski <les...@chromium.org> wrote:
Hi Attila,

This is certainly an interesting use case. We do have something partially like this internally already, with a concept of VMState that is sampled along with the current stack on every tick (the isolate holds the current vm state, and a VMState scope updates it). This isn't preserved across continuations, nor is it generic objects (it's just an enum value), but I could imagine exposing it on the API with embedder defined enums.

For all the extra stuff though -- it's difficult. Preserving this state across continuations with an additional field to the existing CPED means cooperation with the embedder, since some continuations (like setTimeout on the web) are controlled and connected to each other by the embedder rather than by V8 itself. AsyncContext would be the more extensible way of defining context-preserved data, but afaiu that proposal is stalled on precisely the issue of "how are continuations defined" when leaving pure JS.

Fortunately, I don't see this as a problem. As long as V8 manages the propagation across linked continuations it manages itself, the embedder can bridge across its own logical continuations by calling explicit SetCPED calls when the context changes in embedder-specific ways. Node.js does exactly this on setTimeout, setImmediate, IO callbacks etc. and we could have a way to listen for these as events and update our own profiling context too. I think this is all pretty much as it was meant to be designed.
 
Also, the request to have a Local<Value> is unrealistic: the issues you observed in your prototype that attempted to use CPED would pop up again in any code that tried to interact with the GC as part of a signal handler. The reality here is that the signal handler has to be very fast (since it's interrupting the very main thread execution it's trying to measure), very simple, and very robust to being called literally whenever, so interaction with the GC is almost impossible (the profiling signal handler could get called in the middle of a GC!) -- you'd have to figure out some way to make it work with integers or raw addresses.

At least for GC, we can register GC prologue/epilogue callbacks, capture the current context in a prologue call, and short-circuit using that while the GC is in progress (running under the assumption that the context won't change concurrently with GC running.)

As for using a Local<Value> on the API boundary, a strategy that can work is immediately converting it into a std::shared_ptr<Global<Value>> with std::make_shared (setting of the context happens outside the signal handler, during ordinary JS execution so we can do whatever) and then the signal handler "just" needs to execute operator= of that shared_ptr to copy it into a sample-specific struct, which is safe. It's possible that V8 already has a more efficient way to internally keep a handle on an object in a GC-safe manner, though (I haven't yet looked into how CPED itself is implemented.) The profiler would need to have that shared_ptr propagated across continuations. The signal handler thus doesn't have to deal with the value at all, just execute shared_ptr::operator=.

Of course, we can run with a raw pointer solution too, with the caveat that we'd conservatively have to keep allocated all the memory pointed to by every pointer we ever passed to CpuProfiler::SetSampleContext until we call CpuProfiler::Stop and process the resulting CpuProfile. That does increase the memory pressure but is in itself not a blocker, especially if we figure out how to make the pointer set bounded (which is the tricky part.)
 
Aside from that, the business reality we face is that this is unlikely to be used by Chrome, which means we'd have a very limited mandate to support it. We'd review a patch, if it's not too invasive, and we might ask that it's hidden behind `ifdef`s so that it doesn't affect the core project. If there was a web API proposal for defining labels that would then be displayed by devtools (similar to the go functionality as I understand it) then maybe, but then that's a much much larger project scope and we wouldn't be able to drive it unless there was a shift in priorities.

Hope that all makes sense,
Leszek

All understood. In our case, as long as we can get Node.js to build with those ifdefs enabled that should be viable. Again, thank you for your consideration – I'll see if I can come up with a not too invasive proof of concept :-)

Attila.
 
Reply all
Reply to author
Forward
0 new messages