CpuProfiler sampling and multiple threads

47 views
Skip to first unread message

Alex Kodat

unread,
Aug 2, 2020, 4:04:03 PM8/2/20
to v8-users
We're using the CpuProfiler in an Isolate that can have multiple threads running and and taking turns in the Isolate via Unlocker. Frequently, we want to profile what's happening on a thread so we do a StartProfiling for a thread, let it run for a while, and then collect the results via StopProfiling. Works great.

However, I've noticed a difference between Windows and Linux in the profiles, specifically, Windows collects samples for a thread when it's inside an Unlocker bracket whereas Linux does not. So if we have a native function sleep(), that unlocks the Isolate, in Windows I see a high hit count for stacks that have functionName "sleep" and sourceType callback in the bottom stack entry. In Linux, there are no hits where I'm inside the sleep native function. For our purposes, the Windows behavior is much nicer as we can easily see how much of the thread's time is being spent in sleep. In Linux, there is absolutely no way of knowing because the information is not captured.

Unfortunately, Windows nicer (IMO) behavior comes at a cost. Specifically, if indeed other threads do run in the Isolate while the profiled thread is sampled when unlocked, there is a tendency for core dumps while getting the stack trace. This is unsurprising because the stack trace needs to look in the Isolate's heap to get things like function names and that's not really a good idea when a thread doesn't have the Isolate lock.

The following code samples are all from src/libsampler/sampler.cc.

Under Linux the profiler protects itself in SamplerManager::DoSample (called by the SIGPROF handler) with:

  for (Sampler* sampler : samplers) {
    ...
    if (v8::Locker::IsActive() && !Locker::IsLocked(isolate)) continue;
    sampler->SampleStack(state);
  }
  
That Locker::isLocked is a bit of a misnomer and really means Locker::isLockedByCurrentThread and, because our SIGPROF handler runs on the profiled thread, if that tread is in an Unlocker bracket in a native function, isLocked returns false and we don't collect a sample. 

In Windows, the sampler does not run on the profiled thread and, instead, does a SuspendThread of the profiled thread from another thread and then uses GetThreadContext:

void Sampler::DoSample() {
  HANDLE profiled_thread = platform_data()->profiled_thread();
  if (profiled_thread == nullptr) return;

  const DWORD kSuspendFailed = static_cast<DWORD>(-1);
  if (SuspendThread(profiled_thread) == kSuspendFailed) return;

  // Context used for sampling the register state of the profiled thread.
  CONTEXT context;
  memset(&context, 0, sizeof(context));
  context.ContextFlags = CONTEXT_FULL;
  if (GetThreadContext(profiled_thread, &context) != 0) {
     ...
     SampleStack(state);
  }

There is no check if the Isolate is actually locked by the profiled thread so we can get a sample for it but also, unfortunately, seg faults and the like if another thread is running.

So it seems that there really should be a check in Sampler::DoSample for Windows to check if the Isolate is locked by the profiled thread. This would require maybe adding another method to Locker like isLockedByThread(ThreadId id) or something like that and then checking that in Sampler::DoSample. When PlatformData is instantiated for Windows it could stash the ThreadId for the current thread so it could be retrieved in Sampler::DoSample. Doesn't seem all that daunting and am willing to put together a PR for this if this seems reasonable and unless someone more conversant with V8 internals doesn't pick it up.

However, while this fix would eliminate Windows seg faults when doing profiling in a multi-threaded environment, it would also make me sad because now we would have no way of getting time spent unlocked in my native function. To provide this functionality, it seems like it would be nice to be able to create a provisional TickSample before the Isolate is unlocked in a native function and use that in DoSample if the Isolate is not locked by the profiled thread. Obviously, this would/should only be done if the thread is being profiled.

There are a lot of ways this could work but a straw man is that there could be CpuProfiler methods called SaveProvisionalThreadSample and ClearProvisionalThreadSample. The former, at least, would have to be called while one holds the Isolate lock so one would presumably do such a call right before Unlocker instantiation and then do a ClearProvisionalThreadSample call right after the end of the Unlocker context. Then, if DoSample decides that Isolate is not locked by the sampled thread, it could look for a saved provisional sample for the thread and, if available, use that for its sample. Note that a provisional TickSample could be used many times as a thread might be unlocked over many samples.

Again, assuming no one else picks this up, happy to take a swing at a PR, myself. While more complicated than the Windows fix, it doesn't seem that daunting.

Opinions?
 
Sorry if this would have been more appropriately posted on v8-dev.

Thanks
 

kosit la-orngsri

unread,
Aug 2, 2020, 6:00:36 PM8/2/20
to v8-users

สวัสดี
ในวันที่ วันจันทร์ที่ 3 สิงหาคม ค.ศ. 2020 เวลา 3 นาฬิกา 04 นาที 03 วินาที UTC+7 alex...@gmail.com เขียนว่า:

Alex Kodat

unread,
Aug 3, 2020, 12:17:34 AM8/3/20
to v8-users
OK, uncool to reply to my own post but perhaps a more efficient way to handle failed sample attempts because a thread does not have the Isolate locked is that in that scenario a Sampler could queue itself off the ThreadState or PerIsolateThreadData or some thread-specific block. When the Unlocker is destroyed, it could see Samplers on the thread-specific object and call them to collect their sample.The Sampler could maintain a count of deferred samples because, of course, more can one occur while a thread has the Isolate unlocked.This capability could easily be controlled as a CpuProfiler setting or on the StartProfiling call.With this approach there would be no extra overhead in getting samples for unlocked threads. 

kosit la-orngsri

unread,
Aug 3, 2020, 12:48:00 AM8/3/20
to v8-users
สวัสดี

ในวันที่ วันจันทร์ที่ 3 สิงหาคม ค.ศ. 2020 เวลา 11 นาฬิกา 17 นาที 34 วินาที UTC+7 alex...@gmail.com เขียนว่า:

Alex Kodat

unread,
Aug 3, 2020, 7:58:25 AM8/3/20
to v8-users
OK, now I'm replying to myself a second time. Really sorry. I realized we don't need any enhancement to sampling other than getting the non-check for Isolate locked by debugged (suspended) thread fixed in Windows. For sampling when Isolate is unlocked I can simply get time before and after an unlock on a profiled thread, do a StackTrace::CurrentStackTrace() and use that information to merge with the CpuProfiler::StopProfiling results, converting timer units to ticks. Not exactly rocket science.

So the simple question then is should I do a PR to fix the sampling issue on Windows?

Thanks and sorry about all the noise.


On Sunday, August 2, 2020 at 3:04:03 PM UTC-5, Alex Kodat wrote:

Jakob Kummerow

unread,
Aug 3, 2020, 8:14:33 AM8/3/20
to v8-users
Yes, a (small?) CL to get sampling on Windows to parity with how it's already working on Linux sounds good. 

(I'm not familiar with the specifics of how sampling works or should work.)


--
--
v8-users mailing list
v8-u...@googlegroups.com
http://groups.google.com/group/v8-users
---
You received this message because you are subscribed to the Google Groups "v8-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to v8-users+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/v8-users/1bd503d6-ddec-4dca-9c76-55877b84e283o%40googlegroups.com.

Camillo Bruni

unread,
Aug 3, 2020, 8:16:28 AM8/3/20
to v8-users
No worries,
Rubberducking is quite efficient sometimes! ;)

Feel free to send a pull request and we can have a look at it!

cheers, 
Camillo

--
Reply all
Reply to author
Forward
0 new messages