Performance question on Graphite Dawn backend with D3D12


Yijie Chen

Mar 28, 2025, 7:04:43 PM
to skia-discuss

Hello Skia Community!

I am using the Graphite Dawn backend in my non-Web native C++ app. I benchmarked the four steps in Skia shape drawing. I used std::chrono::steady_clock to measure the CPU time, and the currentBudgetedBytes() API in Context and Recorder to measure GPU memory usage.

For this benchmark, I let Skia draw a 2000-point and a 4000-point polygon, each with half line segments and half cubic Bezier segments, onto an 8000x8000 canvas with RGBA_8888 format, averaging over 100 runs. The first table is from an M3 Max Mac, the second from a Windows machine with an Nvidia 4070 Super.
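A minimal sketch of the kind of CPU-side timing described above, assuming each benchmark step can be wrapped in a callable. The helper name `averageMillis` is hypothetical and is not taken from the original benchmark. Note that this only measures CPU wall-clock time; GPU work submitted by the step may still be in flight when the callable returns.

```cpp
#include <chrono>
#include <cstddef>

// Runs `work` `runs` times and returns the mean wall-clock time per run
// in milliseconds, measured with std::chrono::steady_clock.
// `work` stands in for one benchmark step (hypothetical placeholder).
template <typename Fn>
double averageMillis(Fn&& work, std::size_t runs) {
    using Clock = std::chrono::steady_clock;
    if (runs == 0) {
        return 0.0;
    }
    std::chrono::duration<double, std::milli> total{0.0};
    for (std::size_t i = 0; i < runs; ++i) {
        const auto start = Clock::now();
        work();
        total += Clock::now() - start;
    }
    return total.count() / static_cast<double>(runs);
}
```

For example, `averageMillis([&] { /* one step */ }, 100)` would average one step over 100 runs.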

I have noticed 2 performance issues with D3D12:

  • The insertRecording call takes much longer on D3D12 than on Metal. My understanding is that insertRecording translates the Graphite command buffer into a DirectX command buffer, so it should ideally be quick.
  • The GPU memory usage on D3D12 is very high. For an 8000x8000 surface (8000x8000x4 bytes = 256 MB), it allocates 1.7 GB of GPU memory as reported by the context's budgeted bytes. This seems very high compared with Metal. Is it expected?
Screenshot 2025-03-28 at 3.44.40 PM.png

Thanks,
Jay

Michael Ludwig

Mar 28, 2025, 9:58:03 PM
to skia-discuss
The increased memory usage is expected. When rendering paths in particular, Graphite relies on MSAA. In addition to the 256 MB target you've allocated (not included in the reported budgeted memory), the Context will create an MSAA color attachment and a depth/stencil attachment. On Metal with ARM chips, these will be memoryless attachments. On any discrete GPU, or with D3D11 and D3D12, they are not memoryless. Since they are MSAA textures, they will be larger than the resolve target by a factor of N, where N is the sample count. We are investigating ways to bring this down, but an 8k x 8k surface is not the typical scenario we've been focusing on.
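As a rough worked example of the "factor of N" arithmetic, the sketch below estimates attachment sizes for the 8000x8000 RGBA_8888 surface. The sample count of 4 and the 4 bytes per depth/stencil sample are assumptions for illustration; the thread does not state either value, so this does not attempt to reproduce the reported 1.7 GB exactly.

```cpp
#include <cstdint>

// Bytes needed for a width x height texture, given bytes per sample and
// samples per pixel (1 for a non-MSAA texture).
constexpr std::uint64_t textureBytes(std::uint64_t width, std::uint64_t height,
                                     std::uint64_t bytesPerSample,
                                     std::uint64_t samplesPerPixel) {
    return width * height * bytesPerSample * samplesPerPixel;
}

// Assumed values for illustration only; neither is stated in the thread.
constexpr std::uint64_t kAssumedSampleCount = 4;
constexpr std::uint64_t kAssumedDepthStencilBytes = 4;

// Resolve target: 8000 x 8000 RGBA_8888 (4 bytes per pixel).
constexpr std::uint64_t kResolveBytes = textureBytes(8000, 8000, 4, 1);

// MSAA color attachment: N times the size of the resolve target.
constexpr std::uint64_t kMsaaColorBytes =
        textureBytes(8000, 8000, 4, kAssumedSampleCount);

// MSAA depth/stencil attachment under the assumed format size.
constexpr std::uint64_t kMsaaDepthStencilBytes =
        textureBytes(8000, 8000, kAssumedDepthStencilBytes, kAssumedSampleCount);
```

Under these assumptions the resolve target is 256,000,000 bytes and each MSAA attachment is 1,024,000,000 bytes, so non-memoryless attachments alone can reach the gigabyte range.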

The time spent in insertRecording is not expected. It could be something to do with your build (debug should be false), and enabling certain Dawn-level validations, or not disabling them, could also lead to unintended overhead. One that comes to mind is that by default Dawn will zero out all texture data, so for such a large set of MSAA textures it could be taking a while (but that should only happen during init, if at all). The D3D12 backend of Dawn is also less tested and used than its D3D11 backend, so there could be implementation hotspots that we haven't uncovered. If you are able to capture a profile of this to dive into what's taking so long, that would help greatly.

Jay Chen

Apr 2, 2025, 8:51:44 AM
to skia-discuss
Hi Michael, 

Thank you for the prompt reply. It is good to know the memory issue is caused by MSAA. I plan to collect more memory usage data when turning off MSAA in context options.
 
For the insertRecording issue on D3D12, I did some profiling with the Visual Studio Performance Profiler. It turns out that during insertRecording, the ResourceProvider::findOrCreateTexture function could not find a resource in the cache, so it always created a new texture. However, this pattern only occurs when the texture size is relatively big. I can see this bottleneck on an 8k x 8k and a 4k x 4k surface, but not on a 3k x 3k surface. Is this a known limitation?

I've attached the call stacks from the profiler. The first screenshot is with an 8k x 8k surface, the second is with 3k x 3k. The RunBenchmark function runs 99 iterations of drawing. I excluded the first run from the profiling result due to its high overhead. I realized a simple path can reproduce this issue, so I used a square shape (4 lines connecting 4 points).

image-2025-4-1_17-48-52.png

image-2025-4-1_18-27-25.png

Please let me know if this helps. 

Thanks,
Jay

Michael Ludwig

Apr 2, 2025, 11:51:16 AM
to skia-discuss
What is likely happening is that without transient attachments, but with a surface of a large enough size, you're exceeding the default budget limit (I believe it's just 256MB). So every time we're done with a frame, we will delete some resources to try and get back under budget, and we can end up re-allocating these 8k textures over and over again. For a texture of that size, it's entirely possible that the overhead is all within D3D. If you expand the `createTexture&DawnResourceProvider` entry, does it have more stack information showing how much time is spent inside Dawn code creating the wgpu::Texture (and maybe initializing its contents to zero), or is that part lightweight and it's all D3D allocation overhead?
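The re-allocation pattern described above can be illustrated with a toy model. This is a sketch under stated assumptions, not Graphite's actual cache logic: the function `countAllocations` is hypothetical, every frame needs the same set of textures, and the cache is trimmed in index order after each frame until it fits the budget.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model of a budgeted texture cache (NOT Graphite's implementation).
// Each frame needs one texture per entry in `sizes`. A cached texture is
// reused; a missing one is allocated. After each frame, textures are purged
// in index order until the cached total is within `budget`.
// Returns the total number of allocations over `frames` frames.
std::size_t countAllocations(const std::vector<std::uint64_t>& sizes,
                             std::uint64_t budget, std::size_t frames) {
    std::vector<bool> cached(sizes.size(), false);
    std::size_t allocations = 0;
    for (std::size_t f = 0; f < frames; ++f) {
        std::uint64_t cachedBytes = 0;
        // Acquire every texture the frame needs.
        for (std::size_t i = 0; i < sizes.size(); ++i) {
            if (!cached[i]) {
                cached[i] = true;
                ++allocations;
            }
            cachedBytes += sizes[i];
        }
        // End of frame: purge until the cache is back under budget.
        for (std::size_t i = 0; i < sizes.size() && cachedBytes > budget; ++i) {
            cached[i] = false;
            cachedBytes -= sizes[i];
        }
    }
    return allocations;
}
```

In this model, two 100 MB textures under a 256 MB budget are allocated once and reused, while two 1024 MB textures are purged and re-created on every frame.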

Jay Chen

Apr 11, 2025, 8:47:23 AM
to skia-discuss

Hi Michael,


Most of the time in createTexture() is spent on D3D allocation. Thank you for pointing out the issue. I was able to do some tiling to reduce the GPU memory usage on such extreme canvas sizes.
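The post does not say how the tiling was done. One possible approach, sketched here with a hypothetical `makeTiles` helper, is to split the canvas into a grid of tiles no larger than a chosen maximum size and render each tile to its own smaller surface.

```cpp
#include <algorithm>
#include <vector>

// One rectangular tile of the full canvas, in canvas pixel coordinates.
struct Tile {
    int x;
    int y;
    int width;
    int height;
};

// Splits a canvasWidth x canvasHeight canvas into row-major tiles whose
// sides are at most maxTileSize. Edge tiles are clipped to the canvas.
// Returns an empty list for non-positive inputs.
std::vector<Tile> makeTiles(int canvasWidth, int canvasHeight, int maxTileSize) {
    std::vector<Tile> tiles;
    if (canvasWidth <= 0 || canvasHeight <= 0 || maxTileSize <= 0) {
        return tiles;
    }
    for (int y = 0; y < canvasHeight; y += maxTileSize) {
        for (int x = 0; x < canvasWidth; x += maxTileSize) {
            tiles.push_back({x, y,
                             std::min(maxTileSize, canvasWidth - x),
                             std::min(maxTileSize, canvasHeight - y)});
        }
    }
    return tiles;
}
```

With a maximum tile size of 3000, an 8000x8000 canvas becomes a 3x3 grid whose last row and column are 2000 pixels wide. Each tile could then be drawn with the canvas translated by the tile's negated origin, though that step depends on the application.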

I have a separate question: I am investigating whether our application can use a Skia that builds and links Dawn. If in the future I want to use Dawn elsewhere in my code, I am concerned about potential conflicts when I build another Dawn that may not be the same version as the one Skia is using. I understand the option to enable Dawn is skia_use_dawn. Is it possible for Skia to provide an option to link to a prebuilt version of Dawn, say skia_use_system_dawn?


Thanks,

Jay

Michael Ludwig

Apr 11, 2025, 9:02:13 AM
to skia-discuss
I think that should be possible in practice. When Chrome builds Graphite, it uses a version of Dawn that is shared between its WebGPU component and Graphite. Chrome, however, manages its own build rules for Skia, but it would be a reference to start from. We have an ongoing effort to incorporate Graphite into Skia's Bazel build, which would also allow linking with an external version of Dawn.