Vkd3d-proton


Christian Swindler

Aug 3, 2024, 1:33:14 PM
to swipriomiclips

VKD3D_CONFIG=dxr is default now, and no longer needed.
There are some special cases where DXR is not enabled by default. The only current example is
"Hellblade: Senua's Sacrifice" on Steam Deck, because the game force-enables DXR whenever it is supported.
New semantics are:

This feature was the last feature required for FL 12.2 and is implemented through emulation.
As demonstrated in the implementation docs, all
native implementations of this feature are fundamentally broken in some way.
There's also no known game that ships requiring this feature, so we just consider this a checkbox feature.

With NV_device_generated_commands_compute we can efficiently implement
Starfield's use of ExecuteIndirect which hammers multi-dispatch COMPUTE + root parameter changes.
Previously, we would rely on a very slow workaround.

Some games started assuming that the DLLs were laid out similar to AgilitySDK, where
d3d12.dll is just a loader, and d3d12core.dll contains the real implementation.
vkd3d-proton now implements this split as well. It is possible that various scripts must be updated
to accommodate both DLLs now. Once d3d12.dll is installed in a prefix,
only d3d12core.dll needs to be updated, as d3d12.dll is just a trivial shim either way.
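
An install script could handle the split roughly as follows. This is only a sketch: the function name, the directory arguments, and the "install the loader once, always refresh the core" policy are assumptions for illustration, not part of any vkd3d-proton tooling.

```python
import shutil
from pathlib import Path

LOADER = "d3d12.dll"    # trivial shim that loads the core DLL
CORE = "d3d12core.dll"  # contains the real implementation

def install_d3d12(src_dir: Path, dst_dir: Path) -> list[str]:
    """Copy the DLLs from src_dir into dst_dir; return the names copied.

    Hypothetical helper: dst_dir stands in for the prefix's system directory.
    """
    copied = []
    dst_dir.mkdir(parents=True, exist_ok=True)
    # The loader is just a shim, so install it only if it is missing.
    if not (dst_dir / LOADER).exists():
        shutil.copy2(src_dir / LOADER, dst_dir / LOADER)
        copied.append(LOADER)
    # The core DLL carries the implementation, so always refresh it.
    shutil.copy2(src_dir / CORE, dst_dir / CORE)
    copied.append(CORE)
    return copied
```

On a fresh prefix this copies both files; on later updates only d3d12core.dll is replaced.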

Various bugfixes for games as usual. Listing individual games is becoming impractical at this point,
and it's best to refer to other sources for compatibility information with specific games.
As usual, a bunch of fixes in dxil-spirv to fix shader bugs.

This extension is significant in that it removes a ton of CPU overhead.
We already had most of this in place on RADV and Steam Deck,
but this will allow NVIDIA, Intel, Turnip, and other AMD driver implementations to hit the same optimal code paths.
GPU-bound performance also increases slightly, since we can remove some shader code that was required to work around the lack of descriptor buffers.

To support descriptor buffers in the code base, these features are now required instead of optional.
Note that these features are already widely supported, so this is not expected to cause any problems.
If an implementation could support v2.7, it will support v2.8.

The entire API feature was rewritten from scratch to support more implementations and edge cases without
a lot of per-application hacks and workarounds.
As the most extreme example of weird API usage, Guardians of the Galaxy should (finally) run well on NVIDIA.

NOTE: The old swapchain implementation is still in the repository, and is expected to be removed in the next release.
For now, VKD3D_CONFIG=swapchain_legacy can be used to triage any potential issues with the new one.
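
For triage from a launcher script, the variable can be set on the child process environment. The sketch below assumes that VKD3D_CONFIG takes a comma-separated list of options; the helper name and the launch command in the comment are hypothetical.

```python
import os
from typing import Mapping

def with_vkd3d_config(env: Mapping[str, str], *options: str) -> dict[str, str]:
    """Return a copy of env with the given options added to VKD3D_CONFIG.

    Assumption: options are comma separated. Existing options are kept
    and duplicates are not added twice.
    """
    new_env = dict(env)
    current = [o for o in new_env.get("VKD3D_CONFIG", "").split(",") if o]
    for opt in options:
        if opt not in current:
            current.append(opt)
    new_env["VKD3D_CONFIG"] = ",".join(current)
    return new_env

env = with_vkd3d_config(os.environ, "swapchain_legacy")
# A launcher would then pass env to the game process, for example:
# subprocess.run(["wine", "game.exe"], env=env)  # hypothetical command
```

Building a copy instead of mutating os.environ keeps the setting scoped to the one process being triaged.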

v2.6 introduced support for pipeline libraries, but only for games which made correct use of the D3D12 API.
To improve the situation across the board,
vkd3d-proton now implements an internal "magic" disk cache to enable SPIR-V caching for all games.
It is possible to disable the magic cache and let applications manage the ID3D12PipelineLibrary themselves if desired.

To further reduce on-disk footprint of the magic cache, we also make use of VK_EXT_shader_module_identifier
to reduce the vkd3d-proton cache by >95%, since there is no need to store actual SPIR-V data on-disk.

Support D3D12 pipeline libraries better where we can now also cache
generated SPIR-V from DXBC/DXIL.
Massively reduces subsequent load times in Monster Hunter: Rise,
and helps other titles like Guardians of the Galaxy and Elden Ring.
Also lays the groundwork for internal driver caches down the line for games which do not use this API.
Also, deduplicates binary blobs for reduced disk size requirements.

VK_KHR_dynamic_rendering in particular requires up-to-date drivers and the legacy render pass path
will be abandoned in favor of it. Supporting both paths at the same time is not practical.
Moving to VK_KHR_dynamic_rendering allows us to fix some critical flaws with the legacy API
which caused potential shader compilation stutters and extra CPU overhead.

Some new DXR games are starting to come alive, especially with DXR 1.1 enabled,
but there are significant bugs as well that we currently cannot easily debug.
Some experimental results on NVIDIA:

By default, vkd3d-proton will now take advantage of PCI-e BAR memory types through heuristics
as D3D12 does not expose direct support for resizable BAR, and native D3D12 drivers are known to use heuristics as well.
Without resizable BAR enabled in BIOS/vBIOS, we only get 256 MiB which can help performance,
but many games will improve performance even more
when we are allowed to use more than that.
There is an upper limit for how much VRAM is dedicated to this purpose.
We also added VKD3D_CONFIG=no_upload_hvv to disable all uses of PCI-e BAR memory.

I'm new to the Linux ecosystem, so I could be missing something really obvious. I decided to switch 100% to Linux a couple of days ago, for a better and more efficient workspace. Now I want to bring my games back too, so when I start Lutris and try to configure the Wine settings, I get this error for both "Enable DXVK" and "Enable VKD3D":

When I first saw those errors, I was missing dxvk-bin and vkd3d-proton-bin; all of the NVIDIA packages were already present. So I cloned both of them from the AUR, ran makepkg, made sure the installation was successful, and rebooted my machine. I tried to rerun Lutris and still got the same error. I tried to follow this thread, but I don't get the same output as the OP (see code below), and I'm not on AMD.

The end goal of this blog is to demonstrate the magical UMR tool on Linux, which I would argue is the only reasonable post-mortem debugging method currently available on PC, but before we go that deep, we need to look at the current state of crash debugging on PC and the bespoke tooling we have in vkd3d-proton to deal with crashes.

Buffer markers are the simplest possible solution for implementing breadcrumbs. The basic idea is that a value is written to memory either before the GPU processes a command, or after work is done. On device loss, the counters can be inspected. The user will have to instrument the code somehow, either through a layer or directly. In vkd3d-proton, we can enable debug code which automatically does this for all D3D12 commands with VKD3D_CONFIG=breadcrumbs (not available in release builds).

From there, start looking for TOP_OF_PIPE and BOTTOM_OF_PIPE pipeline stages to get a potential range of commands. BOTTOM_OF_PIPE means we know for sure that all commands before that point completed execution, and TOP_OF_PIPE means the command processor might have started executing all commands up to that point.
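
The narrowing step can be modeled in a few lines. This is a simplified sketch, not vkd3d-proton's breadcrumb format: it assumes each command bumps one counter when the command processor reaches it (top) and another when it retires (bottom), and that both counters are read back after the device is lost.

```python
def suspect_range(commands, top_marker, bottom_marker):
    """Return the commands that may have been in flight at device loss.

    Assumed model: bottom_marker is the number of commands known to have
    completed, top_marker is the number of commands the command processor
    may have started. Everything in between is a suspect.
    """
    if not 0 <= bottom_marker <= top_marker <= len(commands):
        raise ValueError("inconsistent marker values")
    return commands[bottom_marker:top_marker]

# Hypothetical command stream for illustration.
cmds = ["copy", "dispatch A", "barrier", "dispatch B", "draw"]
print(suspect_range(cmds, top_marker=4, bottom_marker=1))
```

With one command known complete and four possibly started, the suspects are the three commands in between, which shows why the range can be wide when the GPU runs far ahead.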

The main flaw with this extension is that there is no easy way to narrow down the range of commands. With RADV we can enforce sync with syncshaders as a (very useful) hack, but there is no such method on NV unless we do it ourselves?

Now this is the real deal. RADV can invoke UMR on crashes and dump out a bunch of useful information. The UMR tool is standalone and should work with AMDVLK or amdgpu-pro as well. Nothing stops you from invoking the same CLI commands that RADV does while the device is hung.

Currently, RADV only knows how to dump the GFX ring, so we need to ensure only that queue is used if crashes happen in async compute. In vkd3d-proton, we have VKD3D_CONFIG=single_queue for that purpose.

Sometimes RADV_DEBUG=hang masks bugs as well due to extra sync, but fortunately we got a wave dump eventually. The failure was in a scalar load from a raw pointer. Normally, this means an out-of-bounds root CBV descriptor access.

The descriptor index was computed as root table offset + dynamic offset. Studying the ISA I realized that it was not actually the dynamic offset that was the culprit, but rather the root table offset. Figuring this out would have taken an eternity without SGPR dumps.
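
The reasoning can be illustrated with a toy check that looks at both terms of the sum rather than only the final index. The heap size, the values, and the function itself are made up for illustration; the real analysis was done on ISA and SGPR dumps.

```python
def check_descriptor_access(root_table_offset, dynamic_offset, heap_size):
    """Return (index, culprits) for index = root table offset + dynamic offset.

    Hypothetical helper: culprits names each term that is out of range on
    its own. An empty list means the access is in bounds.
    """
    index = root_table_offset + dynamic_offset
    if 0 <= index < heap_size:
        return index, []
    culprits = []
    if not 0 <= root_table_offset < heap_size:
        culprits.append("root_table_offset")
    if not 0 <= dynamic_offset < heap_size:
        culprits.append("dynamic_offset")
    if not culprits:
        # Each term is plausible on its own; only the sum overflows.
        culprits.append("sum")
    return index, culprits

# A small dynamic offset on top of a garbage root table offset.
print(check_descriptor_access(1_000_000, 3, heap_size=4096))
```

Here the dynamic offset looks innocent, and only inspecting the root table offset separately reveals the bad term, which is the kind of information the SGPR dump provides.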
