Hi Bill,
No problem for the late response. As you can tell, my response is quite delayed too.
I found some time to take an xperf trace using my nullfs + iobench2 combination. The same slowdown was observed. Unfortunately it seems that most of the time is spend in ExAllocatePoolWithTag and ExFreePoolWithTag calls. They add a significant overhead,
Looking at nullfs.exe in particular, the entire run totalled 509ms. Of that, winfsp-x64.dll!FspFileSystemDispatcherThread consumes 507ms (inclusive total). As we drill down further into the call stack, we see:
kernel32!DeviceIoControlImplementation 500ms
kernelbase!DeviceIoControl 499ms
ntdll!NtFsControlFile 494ms
ntoskrnl!KiSystemServiceCopyEnd 473ms
ntoskrnl!NtFsControlFile 473ms
ntoskrnl!IopXxxControlFile 473ms
ntoskrnl!IopSynchronousServiceTail 398ms
ntoskrnl!ofCallDriver 304ms
... (elided for brevity)
ntoskrnl!opCompleteRequest 92ms
ntoskrnl!ExFreePoolWithTag 48ms
ntoskrnl!ovFreeIrpPrivate 38ms
ntoskrnl!ExFreePoolWithTag 35ms
...
ntoskrnl!ovAllocateIrp 40ms
ntoskrnl!VeAllocatePoolWithTag 22ms
...
So even before we get to nullfs' or WinFSP's code, we've already spent ~100ms allocating and freeing memory. The story doesn't get better as we drill in, though. Lots of time is spent in various freepool, allocatepool calls all over the stack trace.
Not sure how this can easily be fixed...
If you want, I can share the ETL file I have.
Stephen