PCI passthrough appears to be regressing


Eric Shelton

Jan 30, 2016, 11:20:45 AM
to qubes-devel
I'm not sure that I have anything concrete enough to open an issue yet (aside from https://github.com/QubesOS/qubes-issues/issues/1659), but I think it is worth initiating a discussion about how PCI passthrough support has regressed since Qubes 2.0 (where pretty much everything seemed to work, including passthrough to HVM domains), and how it appears to only be getting worse over time.  For example, see these recent discussions:


The state of things seems to be something along these lines:
- everything worked pretty well under Qubes 2.0
- starting with Qubes 3.0, PCI passthrough to HVM domains broke (I think it is a libvirt-related issue, as passthrough still works when the domain is started with 'xl' directly; a minimal xl sketch follows this list)
- people seem to be having less success doing passthrough under Xen 4.6 than under Xen 4.4, although there is no hard evidence of this yet
- Linux kernel 4.4 appears to break PCI passthrough for network devices entirely (not so much a Qubes problem today, but evidence that Xen's PCI passthrough support is getting worse, not better).  Although the Qubes team has opted to move forward despite the above issues, this represents the point at which Qubes would no longer be able to do the one passthrough thing it relies on: isolating network adapters.
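
As a concrete baseline, a minimal xl config along the following lines should be enough to test HVM passthrough outside of libvirt.  This is only a sketch: the VM name, disk path, bridge, and device BDF are placeholders, and builder = "hvm" is the xl syntax of the Xen 4.4-4.6 era.

# pt-test.cfg - minimal test config for 'xl create', bypassing libvirt entirely
name    = "pt-test"
builder = "hvm"                        # HVM guest (Xen 4.4-4.6 xl syntax)
memory  = 2048
vcpus   = 2
pci     = [ '02:00.0,permissive=1' ]   # placeholder BDF of the passed-through device
disk    = [ 'file:/var/lib/xen/images/pt-test.img,hda,w' ]
vif     = [ 'bridge=xenbr0' ]
vnc     = 1

If the device works when the domain is created this way but fails when the same domain is started through libvirt, the regression is sitting above Xen.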

The effect of this today is that PCI passthrough questions are popping up more frequently on qubes-users, and the fixes that used to reliably address passthrough problems (for example, setting passthrough to permissive) seem to be less helpful, either because problems are lurking deeper within Xen or because changes to Xen mean that different fixes should now be used instead.
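
For reference, the permissive workaround mentioned above is, as far as I know, just the xen-pciback toggle in dom0.  A sketch, with the BDF as a placeholder and assuming the device is already bound to pciback:

# dom0, as root: relax pciback's config-space filtering for one device
echo 0000:02:00.0 > /sys/bus/pci/drivers/pciback/permissive

(The same thing can be set per device in an xl config with 'permissive=1', as in the sketch above.)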

The effect of this in the future is that it grows less and less likely that the vision of a separate GUI domain via GPU passthrough can be successfully executed.  It is hard enough to get a GPU passed through to begin with (thanks to weird abuses of PCI features by GPU makers to facilitate DRM, I'm uncertain it will ever work for anything other than Intel GPUs, and that only because of specific efforts on Intel's part to make it work in Xen).  The issues above make it much worse.

Eric

Joanna Rutkowska

Jan 31, 2016, 11:27:40 AM
to Eric Shelton, qubes-devel
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256
The longer-term plan is for us to move to HVMs for everything: AppVMs,
ServiceVMs, and, of course, for the GUI domain also. This should not only
resolve the passthrough problems (assuming IOMMU works well), but also reduce
complexity in the hypervisor required to handle such VMs (see e.g. XSA 148). We
plan on starting this work once an early 4.0 is out.

Are the DRM-related problems with passthrough for some GPUs you mentioned above
also encountered on HVMs, or are they limited to PVs only?

Thanks,
joanna.
-----BEGIN PGP SIGNATURE-----

iQIcBAEBCAAGBQJWrjW7AAoJEDOT2L8N3GcYzokQAMaollANSZxTmfPqi6hHeOqk
HWMDKQbHZnaUSfVTBIpfhWv8HDVw7r0Kunud+hDf9wGDTQnFwbpJYEeXBRE006kj
XT6193i0DGxqEazf8HjdVfVRdU2lkls9yvPx5gZpNw/KXooUs+nzhebHihaqJqR6
CrgJKUIX7o1MSR0mePJswvWK0UAb88KGBs5XZjstM2opFXyCn1BPxE4KWC+Etk0H
FZosDBJekNQGvfF08N4Fgsneu55d82CVnVBcVWKV8Tcslb3oO8NYMQC9JQhE4UsX
8JNaZLqpVEN1kwbJOY/qEduDtV/hmBh9oI6+0Jb5Olo0qKpiUebnkUnEXaCxd9Gj
aYcAdUS/uYj0sIVF7Ywe66NHPrNUcyuwm91l+EoCtA0d6gGB23KKfHymueOJ/X4p
RHznEcNZF02JJ8kVUtuc0uEkw32Nt7GAxoZ3H4PIsURoXvUJ1OKQo1LO6ovBHING
t2pK876vSJu+V0m1PuI98laHItpV7RG618AfZVyLuk5EKTchbfOmrMbu6K83Th+s
wIoKlNYIPTuBPmyMSwLOvWKVHYk+vYNh1QX8lP32gwUxk+YrLwWq2eUZUnslZfTI
dkdb7A1pTjboXTTUmWnPwgEMylSJQcSUf1SQjXJ+adaHI+4ePAq9pBUXXDq4Iv/w
xFx3KpJESTrIvx0RiPQA
=LY/z
-----END PGP SIGNATURE-----

Vít Šesták

Jan 31, 2016, 12:41:29 PM
to qubes-devel, knock...@gmail.com, joa...@invisiblethingslab.com
Do you mean HVM, or PVH? I believe you mean PVH, as HVMs with PV stubdomains are a much smaller mitigating factor.

In either case, does this imply dropping support for non-IOMMU systems?
- If no, it would require maintaining support and templates for both PV and HV. (HV includes both HVM and PVH. Is there an established term for this?) I am not sure how much work that would be.
- If yes, do you have an estimate for the end of security updates of the last Qubes release with support for non-IOMMU systems? I believe my laptop will reach EOL sooner, but I'd like to have some idea about that (for multiple reasons).

Regards,
Vít Šesták 'v6ak'

Eric Shelton

Jan 31, 2016, 5:07:02 PM
to qubes-devel, knock...@gmail.com, joa...@invisiblethingslab.com
I don't see how this resolves things, but it's a little hard to judge at this time, since HVM PCI passthrough support is currently broken.  There is no way for me to compare the bugginess of PCI passthrough on HVM versus PV.  I have not experimented with Linux HVMs using PCI passthrough - do pcifront/pciback drop out of the picture, so that the device just looks and behaves like normal (I would expect the hypervisor to still impose some restrictions on use/abuse of PCI devices)?  Is there some other significant difference?
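
Once HVM passthrough works at all, one way to answer that empirically would be something like the following.  This is a sketch only; the domain name and BDF are placeholders:

# dom0: what does Xen think is assigned to the guest?
xl pci-list hvm-guest

# inside the HVM guest: which driver actually bound to the device?
# If pcifront really drops out of the picture, lspci should show the
# native driver here rather than xen-pcifront.
lspci -k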
 
Are the DRM-related problems with passthrough for some GPUs you mentioned above
also encountered on HVMs, or are they limited to PVs only?

All of the efforts I have seen involving GPU passthrough have involved HVM, maybe since most people pursuing it are seeking to run 3D Windows applications (games or otherwise).  http://wiki.xen.org/wiki/XenVGAPassthrough#Why_can.27t_I_do_VGA_passthrough_to_Xen_PV_.28paravirtual.29_guest.3F discusses why passthrough cannot be done to a PV domain.

The bizarre things GPU makers have done with PCI registers and BARs generally defy how one expects well-mannered PCI devices to behave.  MS Vista not only introduced, but required, this nonsense (http://www.cypherpunks.to/~peter/vista.pdf).  As noted on page 22 of that PDF file, each GPU type/stepping is/was required to have a different mechanism in place.  As a result, no two cards, even from the same vendor, have distorted PCI behavior in quite the same way.  The patch at http://old-list-archives.xenproject.org/archives/html/xen-devel/2010-10/txtfLpL6CdMGC.txt describes one example:

* ATI VBIOS Working Mechanism
*
* Generally there are three memory resources (two MMIO and one PIO)
* associated with modern ATI gfx. VBIOS uses special tricks to figure out
* BARs, instead of using regular PCI config space read.
*
*  (1) VBIOS relies on I/O port 0x3C3 to retrieve PIO BAR
*  (2) VBIOS maintains a shadow copy of PCI configure space. It retries the
*      MMIO BARs from this shadow copy via sending I/O requests to first two
*      registers of PIO (MMINDEX and MMDATA). The workflow is like this:
*      MMINDEX (register 0) is written with an index value, specifying the
*      register VBIOS wanting to access. Then the shadowed data can be
*      read/written from MMDATA (register 1). For two MMIO BARs, the index
*      values are 0x4010 and 0x4014 respectively.

Not how your typical PCI device behaves.

Often a 1:1 mapping of BARs has been required to get video drivers to work with passthrough GPUs.  Sometimes booting the card's VBIOS through QEMU's emulated BIOS becomes an issue (with the solution being to copy the VBIOS using some hacked-together mechanism and roll it into the SeaBIOS image).
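
(For what it's worth, the "hacked together mechanism" is usually just the sysfs ROM interface in dom0, something along these lines.  A sketch, with the GPU's BDF as a placeholder; the resulting image is then rolled into the SeaBIOS blob or handed to the device model, and it does not always work for the primary GPU, which is part of the hackery.)

# dom0, as root: dump the GPU's VBIOS so it can be supplied to the guest
echo 1 > /sys/bus/pci/devices/0000:01:00.0/rom    # enable ROM reads
cat /sys/bus/pci/devices/0000:01:00.0/rom > vbios.rom
echo 0 > /sys/bus/pci/devices/0000:01:00.0/rom    # disable them again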

Between NVIDIA and AMD, users have generally had a worse time getting NVIDIA devices working via passthrough (although AMD has been plenty difficult).  The most reliable technique, in my experience, is to use an NVIDIA Quadro that is compatible with NVIDIA GRID (http://www.nvidia.com/object/dedicated-gpus.html).  It's the only thing I've used for GPU passthrough that "just works" - the NVIDIA drivers are specifically written to accept that the GPU is running in a virtualized environment.

On top of all of this, apparently NVIDIA is _actively_ doing things to frustrate attempts at running their hardware within a VM: https://www.reddit.com/r/linux/comments/2twq7q/nvidia_apparently_pulls_some_shady_shit_to_do/  with one developer describing the situation as "VM developers have been having a little arms race with NVIDIA."

About 1-2 years ago, the KVM + QEMU developers put a lot of effort into getting GPU passthrough working more smoothly.  However, I do not know what the state of things is today, and Xen has not engaged in similar efforts to anywhere near the same degree.  Looking at the last 6 months or so of xen-devel, most of the subject lines mentioning PCI passthrough are efforts towards getting Intel's IGD working via passthrough (relatedly, Intel is also still actively developing XenGT).  Given the number of posts on the topic, I'm guessing it was nontrivial - and this is with an actively cooperating GPU vendor.

As noted above, most attempts have involved running GPUs in a virtualized Windows session.  It is possible that things will go more smoothly when running Linux drivers.  On the other hand, I have not heard particularly good things about running with nouveau, and using NVIDIA's own Linux drivers may pull in the same headaches seen on Windows.

So, expect getting GPU passthrough working across the wide range of notebook and desktop GPUs deployed out there to be a substantial challenge, and make sure you have lots of different GPUs on hand for testing purposes.  You may discover, without digging too deeply into it, that it is not worth the headaches and the risk of rendering a lot of hardware Qubes-incompatible.  I understand (at least some of) the reasons for wanting to move the GUI out of dom0, but it may not be practical, given what the GPU vendors have done to support DRM.

Eric

Joanna Rutkowska

Feb 1, 2016, 3:59:01 AM
to Eric Shelton, qubes-devel
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

We would like to primarily target laptops, and this means we're primarily
interested in getting the Intel integrated GPUs to work. Fortunately, Intel
seems to be putting lots of work into making its GPUs not only VT-d-assignable,
but also virtualizable (as in: multiplexable) -- as evidenced by their GVT work
(formerly XenGT):

https://01.org/igvt-g

Thanks,
joanna.
-----BEGIN PGP SIGNATURE-----

iQIcBAEBCAAGBQJWrx4VAAoJEDOT2L8N3GcYZOsP/RLTfDl2ws3HUfhfPGv3axCD
WlcaOn2taH5T5w5pVWXtK7uk84Cvc4ZA4DfZUJMVTyCzdGE5jel64VMgV/IasVx6
Lt05jYFyoi2T64UpqvFaoKDXPesVMhm22x/mGMuZIwsQe8hd5gLItWEbkKouvXJK
QQVFzO+1kxFENgOql6T/v9JNHr1G5ue/8cX3izhuoHa0IQykJaoKuTXXwWbkMgk0
aTVd6+Bp++WL4CUeyujQhNTmwIEl93QMiHMbhtnPpxcW30qHdrqj4IKn6AfL2hLf
zG0h7DLnrn7B3KJrIEfT0MMHAymPvbLpEQosxDrCdbkkgCt6x3ihlmXKOzfTB5Bz
tiieP5gHN9iUo/ToZVa4tRLNVE7VLSUdYxvBA24bN/KHm5ad43SfgcNdNTML3Lrn
gj6OJ44tHyJo7PgjIH80bVF3f0mM2jS3HDriO0at+lWaWAY+00dN8XMb69EahrKb
Q5vyG+ovjq8sd4oSLEPmIpdpTwq+2RRlENRbjb4VvQ9jqG18sagxs9wmGuXq1iO0
KIWZ/UporVbWm9nP6rbchlzPnIwCCe4gHOfNzECRnYZjtmeMkSQSeTvmg2X+qx+w
jXyFE97CpcRqpQ/qQL+IRm1CGT97siHx8p4NaJLbMKmG2uUiE9l0Cif0EdcVQx1c
ii5TV5hIDJYQRMkqLRRk
=1Llj
-----END PGP SIGNATURE-----

Outback Dingo

Feb 1, 2016, 5:21:29 AM
to Joanna Rutkowska, Eric Shelton, qubes-devel
Good to know you have a focus on GPUs; however, my PCI passthrough issue on my laptop was specifically NIC-oriented, on a shared PCI bus:
02:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. Device 5287 (rev 01)
02:00.1 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 12)

I did find it quite odd that Qubes itself couldn't handle the PCI passthrough, but Xen 4.5 with Fedora 23 running on top of the same hardware worked perfectly fine.
That being said, I can appreciate that people want GPUs to work; personally, however, if a simple NIC can't be passed through, Qubes itself is pretty unusable.  I guess it's about priorities to me.
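
(A sketch of the dom0-side checks worth comparing between the Qubes setup and the plain Xen 4.5 + Fedora 23 setup; the BDF matches the lspci output above, everything else is generic:)

# dom0: is the NIC even assignable, and did the IOMMU come up?
xl pci-assignable-list            # should list 0000:02:00.1
xl dmesg | grep -iE 'vt-d|iommu'  # IOMMU / interrupt remapping status
dmesg | grep -i pciback           # permissive / BAR / MSI warnings from pciback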


 

Eric Shelton

Feb 1, 2016, 11:46:09 AM
to qubes-devel, knock...@gmail.com, joa...@invisiblethingslab.com
I am still interested in this part of your earlier comment.  Is there some evidence that we can expect PCI passthrough to go more smoothly with PVH domains?
So, it sounds like starting with Qubes 4.1, systems with Intel IGPs become first-class citizens, and things will become more difficult for those with AMD GPUs, NVIDIA GPUs, and AMD CPUs.  Perhaps in practice this reflects the general state of the notebook market (since most notebooks with AMD or NVIDIA graphics still have an Intel GPU), and NVIDIA support for Linux has always been pretty rough, but I suppose I am still a little surprised.

If things are headed in this direction, I think Qubes should be pretty explicit about what it considers to be recommended/supported hardware.  For example, that it is intended for use with Intel-based notebooks including an IGP, and that other configurations (such as many desktops) may work but are not actively supported by the Qubes dev team (essentially leaving the effort of getting them to work to community members).

If that sounds too severe, perhaps it makes sense to expand the information in the Qubes HCL to identify which PCI devices do and do not work with PCI passthrough, so people can make informed decisions about hardware.  There seem to be a number of devices that are not doing well with PCI passthrough.

Eric

Eric Shelton

Feb 1, 2016, 11:57:54 AM
to qubes-devel, joa...@invisiblethingslab.com, knock...@gmail.com
I think it is fair to say the GPU discussion does take this thread off track.  Whatever plans are in the works for Qubes 4.0+ may or may not make things better for PCI passthrough, but what Qubes is using today, and what is on the horizon for the pre-4.0 releases, seems to be getting worse and worse at it.  The issue goes beyond not being able to do passthrough to Windows HVMs; users are now starting to run into problems with passthrough of network and USB adapters too.

As noted above, Outback Dingo took the time to replicate passthrough with vanilla Xen on hardware of his that was not working under Qubes.  For whatever reason, it worked on Xen but not on Qubes.  Plus, there seem to be a number of users posting that passthrough worked in Qubes 2.0 or 3.0, but no longer works on the 3.1 RCs.  Why?  Did Xen have a regression between 4.4 and 4.6?  Did the XSA-155 patches break something (see https://groups.google.com/d/msg/qubes-devel/9OybIE8hTrw/L0WejtOqAAAJ)?  Is it something else introduced by Qubes?  This is what I was hoping to sort out in this thread.
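
To make that comparison systematic, the same device could be attached through both code paths on the same machine and the resulting errors compared.  A sketch only: the VM name and BDF are placeholders, and qvm-pci here is the R3.x command syntax:

# dom0: try the same attach via the Qubes/libvirt path and via plain xl
qvm-pci -a test-netvm 02:00.1            # Qubes R3.x path (via libvirt; may need a VM restart)
qvm-pci -d test-netvm 02:00.1            # remove it again before the next test
xl pci-attach test-netvm 0000:02:00.1    # plain Xen path, bypassing libvirt
xl dmesg | tail -n 50                    # compare hypervisor-side messages for each attempt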

Eric