Thanks Jan,
As I said, this is a load test we're running, so it does make sense that we'd see contention on such a component, given the high rate of send/receive operations.
The problem is that even at a low transaction rate, we still see CPU spikes due to this contention.
I agree that Dictionary Add/Remove operations should be fast enough, and therefore the time spent inside the SpinLock is expected to be very short; still, the profiler is pretty accurate and consistent about this. It repeatedly shows that most of the CPU is spent on clr!ThreadNative::SpinWait alone.
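Just to make sure we're talking about the same thing, here's the pattern I mean - a rough sketch only, not the actual CompletionRegistry code, with Action<bool> standing in for the real completion delegate type:

using System;
using System.Collections.Generic;
using System.Threading;

// Rough sketch of a SpinLock-guarded registry, NOT the actual Grpc.Core code.
class SketchCompletionRegistry
{
    readonly Dictionary<IntPtr, Action<bool>> dict = new Dictionary<IntPtr, Action<bool>>();
    SpinLock spinLock = new SpinLock(false);   // owner tracking off

    public void Register(IntPtr key, Action<bool> callback)
    {
        bool lockTaken = false;
        try
        {
            spinLock.Enter(ref lockTaken);     // the 24 blocked threads sit here, in ContinueTryEnter
            dict.Add(key, callback);
        }
        finally
        {
            if (lockTaken) spinLock.Exit();
        }
    }

    public Action<bool> Extract(IntPtr key)
    {
        bool lockTaken = false;
        try
        {
            spinLock.Enter(ref lockTaken);
            var callback = dict[key];          // Dictionary.FindEntry from the call-stack below
            dict.Remove(key);
            return callback;
        }
        finally
        {
            if (lockTaken) spinLock.Exit();
        }
    }
}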
There are only rare cases where a SpinLock will perform better than taking a lock, and there are several warnings on MSDN about using this struct.
In general, it should only be used in scenarios where the wait for the lock is going to be extremely short - in other words, where spinning is cheaper than a context switch because there is a good chance the thread will get the lock very soon.
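For comparison, here's roughly what the plain-lock variant of the same sketch would look like (again, made-up names). Monitor already spins briefly before blocking, so the "very short wait" case stays cheap, while heavily contended waiters get parked instead of burning CPU:

using System;
using System.Collections.Generic;

// Same shape as the sketch above, but using a plain lock instead of a SpinLock.
class LockBasedRegistrySketch
{
    readonly object mutex = new object();
    readonly Dictionary<IntPtr, Action<bool>> dict = new Dictionary<IntPtr, Action<bool>>();

    public void Register(IntPtr key, Action<bool> callback)
    {
        lock (mutex)   // Monitor: brief spin, then a real wait if contended
        {
            dict.Add(key, callback);
        }
    }

    public Action<bool> Extract(IntPtr key)
    {
        lock (mutex)
        {
            var callback = dict[key];
            dict.Remove(key);
            return callback;
        }
    }
}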
I just ran the load test again with 50 concurrent users, and I see an 11-second window where 98% of the CPU frames were spent in this SpinWait. PerfView estimates the overall CPU during that time at 34%, which means roughly 33% of the machine's CPU (98% of 34%) was spent just on that, blocking other operations in the meantime.
Maybe this lock was seen as a bottleneck in the past and was replaced by a SpinLock?
I've now attached a debugger at a random moment during one of the noticeable pauses. Maybe this additional information will be helpful:
There are 24 threads blocked in CompletionRegistry.Register(), spending their time in SpinLock.ContinueTryEnter (spinning/sleeping).
I see another thread busy in CompletionRegistry.Extract, looking up an item in the Dictionary. The number of items in the dictionary at that moment is 150.
I repeated the experiment a few times, and in every case I see the same picture: several threads trying to add an item to the dictionary are waiting for a single thread that is currently removing an item from it.
In all those cases, the Dictionary's count is 150.
Since every time I look there are several (sometimes dozens of) threads waiting, spinning and eventually yielding, it seems that, at least during these spikes, a SpinLock is not the right tool.
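If it helps, the same pattern can be shown outside of gRPC with a tiny standalone repro along these lines (purely hypothetical code; the thread and iteration counts are arbitrary, 50 just mirrors the 50 concurrent users):

using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical standalone repro, not product code: many writers vs. one SpinLock-guarded Dictionary.
class SpinLockContentionRepro
{
    static readonly Dictionary<long, Action<bool>> dict = new Dictionary<long, Action<bool>>();
    static SpinLock spinLock = new SpinLock(false);

    static void Main()
    {
        const int threads = 50;          // roughly the 50 concurrent users from the load test
        const int opsPerThread = 200000; // arbitrary

        var tasks = new Task[threads];
        for (int t = 0; t < threads; t++)
        {
            long baseKey = (long)t << 32;   // give each thread its own key range
            tasks[t] = Task.Run(() =>
            {
                for (long i = 0; i < opsPerThread; i++)
                {
                    long key = baseKey + i;
                    Locked(() => dict.Add(key, success => { }));   // "Register"
                    Locked(() => dict.Remove(key));                // "Extract"
                }
            });
        }
        Task.WaitAll(tasks);
        Console.WriteLine("done");
    }

    // Mimics the SpinLock-guarded critical section; profiling this under load shows the same SpinWait frames.
    static void Locked(Action body)
    {
        bool lockTaken = false;
        try
        {
            spinLock.Enter(ref lockTaken);
            body();
        }
        finally
        {
            if (lockTaken) spinLock.Exit();
        }
    }
}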
Additional info from one of the runs:
environment.DebugStats.PendingBatchCompletions = 104
GrpcThreadPool.queuedContinuationCounter = 17
Maybe the problem lies in RunHandlerLoop? Maybe it's calling CompletionRegistry.Extract too many times in a short period?
Here's a rough sketch of what that loop does (paraphrased rather than the exact source, so the details may differ slightly):
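// Paraphrased from memory, not the exact Grpc.Core source; the names below are approximate,
// except for the ones that also appear in the call-stack further down.
private void RunHandlerLoop(CompletionQueueSafeHandle cq, IProfiler optionalProfiler)
{
    CompletionQueueEvent ev;
    do
    {
        ev = cq.Next();   // blocks until the native completion queue produces an event
        if (ev.type == CompletionQueueEvent.CompletionType.OpComplete)
        {
            bool success = (ev.success != 0);
            IntPtr tag = ev.tag;
            // The hot path from the stack: Extract() takes the SpinLock
            // and does the Dictionary lookup + Remove for the completed tag.
            var callback = cq.CompletionRegistry.Extract(tag);
            callback(success);
        }
    } while (ev.type != CompletionQueueEvent.CompletionType.Shutdown);
}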
Here's the call-stack of the thread that's calling CompletionRegistry.Extract (it is always captured in this state, as if the call to Dictionary.FindEntry takes a non-negligible amount of time):
Child-SP RetAddr Call Site
000000f0`003fdc28 00007ffe`3aca85c3 ntdll!NtWaitForSingleObject+0x14
000000f0`003fdc30 00007ffe`2717a247 KERNELBASE!WaitForSingleObjectEx+0x93
000000f0`003fdcd0 00007ffe`2717a1ff clr!CLREventWaitHelper2+0x3c
000000f0`003fdd10 00007ffe`2717a180 clr!CLREventWaitHelper+0x1f
000000f0`003fdd70 00007ffe`272e0b8e clr!CLREventBase::WaitEx+0x7c
000000f0`003fde00 00007ffe`2717b6a1 clr!WKS::GCHeap::WaitUntilGCComplete+0x2b
000000f0`003fde30 00007ffe`272daa2c clr!Thread::RareDisablePreemptiveGC+0x180
000000f0`003fdec0 00007ffe`271310c6 clr!Thread::RedirectedHandledJITCase+0x1bf
000000f0`003fdf70 00007ffe`25f8115f clr!RedirectedHandledJITCaseForGCThreadControl_Stub+0x26
000000f0`003fdfb0 00007ffe`26aff0d0 mscorlib_ni!System.Collections.Generic.Dictionary`2[[System.__Canon, mscorlib],[System.__Canon, mscorlib]].FindEntry(System.__Canon)+0xffffffff`fffe851f
000000f0`003fe000 00007ffd`c7de6923 mscorlib_ni!System.Collections.Generic.Dictionary`2[[System.__Canon, mscorlib],[System.__Canon, mscorlib]].get_Item(System.__Canon)+0xb69110
000000f0`003fe030 00007ffd`c7b5e952 Grpc_Core!Grpc.Core.Internal.CompletionRegistry.Extract(IntPtr)+0x93
000000f0`003fe090 00007ffe`25f95ab3 Grpc_Core!Grpc.Core.Internal.GrpcThreadPool.RunHandlerLoop(Grpc.Core.Internal.CompletionQueueSafeHandle, Grpc.Core.Profiling.IProfiler)+0x72
000000f0`003fe0f0 00007ffe`25f95944 mscorlib_ni!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)+0x163
000000f0`003fe1c0 00007ffe`25f95912 mscorlib_ni!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)+0x14
000000f0`003fe1f0 00007ffe`25f8f0c2 mscorlib_ni!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)+0x52
000000f0`003fe240 00007ffe`271368d3 mscorlib_ni!System.Threading.ThreadHelper.ThreadStart()+0x52
000000f0`003fe280 00007ffe`27136768 clr!CallDescrWorkerInternal+0x83
000000f0`003fe2c0 00007ffe`27136e5d clr!CallDescrWorkerWithHandler+0x4e
000000f0`003fe300 00007ffe`27212cd9 clr!MethodDescCallSite::CallTargetWorker+0xf8
000000f0`003fe400 00007ffe`27137621 clr!ThreadNative::KickOffThread_Worker+0x109
000000f0`003fe660 00007ffe`2713759c clr!ManagedThreadBase_DispatchInner+0x39
000000f0`003fe6a0 00007ffe`271374cd clr!ManagedThreadBase_DispatchMiddle+0x6c
000000f0`003fe7a0 00007ffe`2713765b clr!ManagedThreadBase_DispatchOuter+0x75
000000f0`003fe830 00007ffe`27212bbb clr!ManagedThreadBase_FullTransitionWithAD+0x2f
000000f0`003fe890 00007ffe`271742bf clr!ThreadNative::KickOffThread+0xdb
000000f0`003fe960 00007ffe`3b847974 clr!Thread::intermediateThreadProc+0x86
000000f0`003ff9a0 00007ffe`3e59a2f1 KERNEL32!BaseThreadInitThunk+0x14
000000f0`003ff9d0 00000000`00000000 ntdll!RtlUserThreadStart+0x21
Thanks, and sorry for the long email :)
Tomer.
On Monday, February 7, 2022 at 13:00:49 UTC+2, Jan Tattermusch wrote: