Qubes/Xen doesn't comply with IOMMU grouping rules for PCI passthru


Claudia

Dec 22, 2019, 2:32:23 PM
to qubes...@googlegroups.com
I'm very new to all this iommu stuff, but as I understand it, devices in the same iommu group are
supposed to be treated as a single unit, meaning if any of them are assigned to a VM then they all
must be assigned to the same VM. This is because those devices cannot be isolated from each other
-- they can communicate directly without going through the IOMMU at all, for example.

This not only creates obvious security holes, but can also cause compatibility problems. For
example, devices in different VMs will have different perspectives of system memory. [3] Or
something like that. Like I said, I'm still trying to wrap my head around it.

Point is, the entire group is supposed to be treated as a single unit. Linux/KVM enforces this; Xen
does not. I'm not sure about any other platforms. [1]
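
To make the "single unit" rule concrete, here is a minimal Python sketch of the kind of check that is described above as enforced on Linux/KVM and not on Xen. The function name `find_split_groups` and the VM assignment are hypothetical and only for illustration; the device addresses come from the group listing at the end of this message.

```python
from collections import defaultdict


def find_split_groups(groups, assignment, default_owner="dom0"):
    """Return {group_id: {owner: [devices]}} for groups spread over several owners.

    groups:     maps an IOMMU group id to the PCI addresses in that group.
    assignment: maps a PCI address to the VM it is passed through to;
                devices that are not listed stay with default_owner.
    """
    split = {}
    for group_id, devices in groups.items():
        owners = defaultdict(list)
        for dev in devices:
            owners[assignment.get(dev, default_owner)].append(dev)
        if len(owners) > 1:
            split[group_id] = dict(owners)
    return split


# Part of group 3 and all of group 7 from this machine (bridges omitted for brevity).
groups = {
    3: ["0000:03:00.0", "0000:03:00.1", "0000:03:00.3", "0000:03:00.4", "0000:04:00.0"],
    7: ["0000:02:00.0"],
}
# Hypothetical assignment: both USB controllers to sys-usb, Wi-Fi to sys-net.
assignment = {
    "0000:03:00.3": "sys-usb",
    "0000:03:00.4": "sys-usb",
    "0000:02:00.0": "sys-net",
}
print(find_split_groups(groups, assignment))
```

With this input only group 3 is reported, because the GPU, HDMI audio and SATA controller stay in dom0 while the USB controllers move to sys-usb. Group 7 moves as a whole, so it is not flagged.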

This caused a very sneaky problem on my machine. My USB controllers are in the same group as my
GPU, sound card, and SATA controller. So when sys-usb (or rd.qubes.hide_all_usb) takes over those
two USB controllers, everything stops working. [4] It was quite difficult to trace. It would have
been much easier to diagnose if grouping was enforced somewhere. I would much rather have an error
in my logs about being unable to assign USB controllers, than have my whole screen freeze up with
no indication why. (I got lucky that it just crashed; if something interferes with your SATA
controller's address space it can cause disk corruption. [5])

I don't really know who's at fault here. Qubes? Xen? AMD? Dell?

Unfortunately, Qubes has no way of knowing anything about iommu grouping because Xen takes over the
IOMMU (and therefore grouping is not visible in dom0). [2] So probably the only way Qubes could
enforce grouping is by some kind of heuristic. For example, assume all functions of a device are
grouped. Or, assume all devices on a hub are grouped. Or just disable the USB Qube option on AMD
systems entirely, or warn the user that it may cause serious problems that are hard to diagnose.
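
As a rough illustration of the first heuristic ("assume all functions of a device are grouped"), the sketch below buckets PCI addresses by domain, bus and device number and ignores the function number. The name `guess_groups_by_slot` is made up for this example, and this is only a guess at what such a heuristic could look like, not anything Qubes implements.

```python
import re
from collections import defaultdict

# Matches "03:00.3" as well as "0000:03:00.3"; the domain part is optional.
BDF_RE = re.compile(
    r"^(?:([0-9a-fA-F]{4}):)?([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])$"
)


def guess_groups_by_slot(addresses):
    """Group PCI addresses by (domain, bus, device), ignoring the function number."""
    slots = defaultdict(list)
    for addr in addresses:
        match = BDF_RE.match(addr)
        if match is None:
            raise ValueError(f"not a PCI address: {addr!r}")
        domain, bus, dev, _func = match.groups()
        key = f"{domain or '0000'}:{bus}:{dev}".lower()
        slots[key].append(addr)
    return dict(slots)


addresses = ["03:00.0", "03:00.1", "03:00.3", "03:00.4", "04:00.0", "02:00.0"]
for slot, members in guess_groups_by_slot(addresses).items():
    print(slot, members)
```

On this machine the heuristic would tie the USB controllers to the GPU, but it would still miss the SATA controller at 04:00.0, which the group listing below puts in the same group. So it can only be a conservative lower bound on the real grouping.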

As for fixing the actual problem, that is, grouping them in a more sensible way so that the GPU and
USB controllers can be isolated for example, can only be done in a firmware (or microcode?) update
by the vendor, if at all. There are some hacks for KVM to spoof the grouping restrictions (which
Xen doesn't enforce in the first place), but they don't solve the underlying problem. VFIO seems
like it could work (by emulating some IOMMU functionality in software), but I don't know if it's
supported by Xen.

I'm guessing part of the reason this problem doesn't usually come up on Intel systems is because of
the Xen option iommu=no-igfx. This means that the integrated GPU is always exempt from IOMMU
control altogether, but this option is Intel-specific and has no AMD equivalent. However, that
doesn't do anything about other devices such as sound cards or SATA controllers. Intel systems
seem to just have better grouping usually (or, are less likely to crash when grouping rules are
violated). [6]

At least that's my understanding so far.

Thoughts? Is there anything Qubes can do to avoid splitting up IOMMU groups? Is there anything
Qubes *should* do? Should Qubes attempt to guess the IOMMU groups before taking over devices?
Should the USB Qube option be disabled on AMD systems (you can still manually set up sys-usb of
course)? Should we just blame Xen for not enforcing IOMMU groups in the first place?

[1] https://lists.gt.net/xen/devel/345279#345279
[2] http://xen.1045712.n5.nabble.com/IOMMU-group-dissapear-in-XEN-td5737357.html
[3] https://vfio.blogspot.com/2014/08/iommu-groups-inside-and-out.html
[4] https://www.mail-archive.com/qubes...@googlegroups.com/msg31494.html
[5] http://xen.1045712.n5.nabble.com/VGA-passthrough-with-USB-passthrough-td5738340.html
[6] https://hardforum.com/threads/ryzen-and-iommu-groups-is-this-ever-going-to-get-fixed.1944064

---

Dell Inspiron 5575, AMD Ryzen 5 2500U, Qubes R4.1 booted without Xen:

# lspci
00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Root Complex
00:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 IOMMU
00:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
00:01.6 PCI bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 PCIe GPP Bridge [6:0]
00:01.7 PCI bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 PCIe GPP Bridge [6:0]
00:08.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
00:08.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Internal PCIe GPP Bridge 0 to Bus A
00:08.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Internal PCIe GPP Bridge 0 to Bus B
00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller (rev 61)
00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge (rev 51)
00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 0
00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 1
00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 2
00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 3
00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 4
00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 5
00:18.6 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 6
00:18.7 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 7
01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL810xE PCI Express Fast Ethernet controller (rev 07)
02:00.0 Network controller: Qualcomm Atheros QCA9377 802.11ac Wireless Network Adapter (rev 31)
03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Raven Ridge [Radeon Vega Series / Radeon Vega Mobile Series] (rev c4)
03:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Raven/Raven2/Fenghuang HDMI/DP Audio Controller
03:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 10h-1fh) Platform Security Processor
03:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Raven USB 3.1
03:00.4 USB controller: Advanced Micro Devices, Inc. [AMD] Raven USB 3.1
03:00.6 Audio device: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 10h-1fh) HD Audio Controller
04:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 61)

# lspci -t
-[0000:00]-+-00.0
           +-00.2
           +-01.0
           +-01.6-[01]----00.0
           +-01.7-[02]----00.0
           +-08.0
           +-08.1-[03]--+-00.0
           |            +-00.1
           |            +-00.2
           |            +-00.3
           |            +-00.4
           |            \-00.6
           +-08.2-[04]----00.0
           +-14.0
           +-14.3
           +-18.0
           +-18.1
           +-18.2
           +-18.3
           +-18.4
           +-18.5
           +-18.6
           \-18.7

# tree /sys/kernel/iommu_groups/
├── 0
│   ├── devices
│   │   └── 0000:00:01.0 -> ../../../../devices/pci0000:00/0000:00:01.0
│   ├── reserved_regions
│   └── type
├── 1
│   ├── devices
│   │   └── 0000:00:01.6 -> ../../../../devices/pci0000:00/0000:00:01.6
│   ├── reserved_regions
│   └── type
├── 2
│   ├── devices
│   │   └── 0000:00:01.7 -> ../../../../devices/pci0000:00/0000:00:01.7
│   ├── reserved_regions
│   └── type
├── 3
│   ├── devices
│   │   ├── 0000:00:08.0 -> ../../../../devices/pci0000:00/0000:00:08.0
│   │   ├── 0000:00:08.1 -> ../../../../devices/pci0000:00/0000:00:08.1
│   │   ├── 0000:00:08.2 -> ../../../../devices/pci0000:00/0000:00:08.2
│   │   ├── 0000:03:00.0 -> ../../../../devices/pci0000:00/0000:00:08.1/0000:03:00.0
│   │   ├── 0000:03:00.1 -> ../../../../devices/pci0000:00/0000:00:08.1/0000:03:00.1
│   │   ├── 0000:03:00.2 -> ../../../../devices/pci0000:00/0000:00:08.1/0000:03:00.2
│   │   ├── 0000:03:00.3 -> ../../../../devices/pci0000:00/0000:00:08.1/0000:03:00.3
│   │   ├── 0000:03:00.4 -> ../../../../devices/pci0000:00/0000:00:08.1/0000:03:00.4
│   │   ├── 0000:03:00.6 -> ../../../../devices/pci0000:00/0000:00:08.1/0000:03:00.6
│   │   └── 0000:04:00.0 -> ../../../../devices/pci0000:00/0000:00:08.2/0000:04:00.0
│   ├── reserved_regions
│   └── type
├── 4
│   ├── devices
│   │   ├── 0000:00:14.0 -> ../../../../devices/pci0000:00/0000:00:14.0
│   │   └── 0000:00:14.3 -> ../../../../devices/pci0000:00/0000:00:14.3
│   ├── reserved_regions
│   └── type
├── 5
│   ├── devices
│   │   ├── 0000:00:18.0 -> ../../../../devices/pci0000:00/0000:00:18.0
│   │   ├── 0000:00:18.1 -> ../../../../devices/pci0000:00/0000:00:18.1
│   │   ├── 0000:00:18.2 -> ../../../../devices/pci0000:00/0000:00:18.2
│   │   ├── 0000:00:18.3 -> ../../../../devices/pci0000:00/0000:00:18.3
│   │   ├── 0000:00:18.4 -> ../../../../devices/pci0000:00/0000:00:18.4
│   │   ├── 0000:00:18.5 -> ../../../../devices/pci0000:00/0000:00:18.5
│   │   ├── 0000:00:18.6 -> ../../../../devices/pci0000:00/0000:00:18.6
│   │   └── 0000:00:18.7 -> ../../../../devices/pci0000:00/0000:00:18.7
│   ├── reserved_regions
│   └── type
├── 6
│   ├── devices
│   │   └── 0000:01:00.0 -> ../../../../devices/pci0000:00/0000:00:01.6/0000:01:00.0
│   ├── reserved_regions
│   └── type
└── 7
    ├── devices
    │   └── 0000:02:00.0 -> ../../../../devices/pci0000:00/0000:00:01.7/0000:02:00.0
    ├── reserved_regions
    └── type
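
For anyone who wants the same listing in a form that is easier to diff between boots, the directory above can be read with a few lines of Python. This is only a sketch: `read_iommu_groups` is a name made up here, and it assumes the sysfs layout shown in the tree output. When the directory is missing or empty (as reported for dom0 under Xen in [2]), it simply returns an empty dict.

```python
import os


def read_iommu_groups(root="/sys/kernel/iommu_groups"):
    """Return {group_number: sorted PCI addresses}; {} if the directory is absent or empty."""
    groups = {}
    if not os.path.isdir(root):
        return groups
    for name in os.listdir(root):
        devices_dir = os.path.join(root, name, "devices")
        # Each numbered group directory holds a "devices" directory of symlinks.
        if name.isdigit() and os.path.isdir(devices_dir):
            groups[int(name)] = sorted(os.listdir(devices_dir))
    return groups


if __name__ == "__main__":
    for number, devices in sorted(read_iommu_groups().items()):
        print(number, " ".join(devices))
```

Printing one group per line makes it easy to compare the grouping before and after a firmware update or a change of boot parameters.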

awokd

Dec 26, 2019, 7:59:39 AM
to qubes...@googlegroups.com
Claudia:

TLDR; check bottom of https://community.amd.com/thread/241650, looks
like there was a recently released related update. Not sure if
applicable to your situation.

> This caused a very sneaky problem on my machine. My USB controllers are in the same group as my
> GPU, sound card, and SATA controller. So when sys-usb (or rd.qubes.hide_all_usb) takes over those
> two USB controllers, everything stops working. [4] It was quite difficult to trace. It would have
> been much easier to diagnose if grouping was enforced somewhere. I would much rather have an error
> in my logs about being unable to assign USB controllers, than have my whole screen freeze up with
> no indication why. (I got lucky that it just crashed; if something interferes with your SATA
> controller's address space it can cause disk corruption. [5])
>
> I don't really know who's at fault here. Qubes? Xen? AMD? Dell?

The improper grouping is probably somewhere in AGESA, which is provided
to the manufacturers by AMD. It could be because of hardware related
limitations, which again are supplied by AMD. Sometimes vendors take
liberties (cost cutting measures) with both and break functionality, as
their primary/sole concern is that Windows boots. This can especially be
the case with consumer class machines such as Ryzen. Agree it would be
nice if Xen handled this failure mode more gracefully. Not sure there is
much Qubes can do here, though. On the other hand, my older AMD
(pre-Ryzen) consumer laptop running Coreboot has correct groupings.

> Intel systems
> seem to just have better grouping usually (or, are less likely to crash when grouping rules are
> violated). [6]

I think that is overbroad. There are plenty of Intel systems with broken
passthrough. iommu=no-igfx itself is a workaround for broken passthrough
of Intel graphics. There are also plenty of AMD systems with properly
implemented passthrough.

> Thoughts? Is there anything Qubes can do to avoid splitting up IOMMU groups? Is there anything
> Qubes *should* do? Should Qubes attempt to guess the IOMMU groups before taking over devices?
> Should the USB Qube option be disabled on AMD systems (you can still manually set up sys-usb of
> course)? Should we just blame Xen for not enforcing IOMMU groups in the first place?

Ultimately, it's a hardware/firmware issue. Threadripper and Epyc based
AMD systems ought to be more thoroughly vetted to support passthrough.
My suggestions are to disable automatic IOMMU grouping in your UEFI
configuration, if possible. Otherwise, try a newer firmware version with
updated AGESA code and see if it helps, or possibly add a card with
additional USB controllers as that should appear in its own group.

John Mitchell

Dec 27, 2019, 6:37:27 AM
to qubes-users
"This can especially be the case with consumer class machines such as Ryzen."

I had to lol about this since I have had many Intel systems with one problem or another and this is certainly not isolated to Ryzen.  I am also running a Ryzen system running three VMs with PCI-passthrough on one of the VMs with no problems.  I do not know if my motherboard is consumer grade (ASrock) however it does rock.  ;)

Claudia

Dec 27, 2019, 7:06:01 AM
to John Mitchell, qubes-users
December 27, 2019 11:37 AM, "John Mitchell" <sonw...@gmail.com> wrote:

> "This can especially be the case with consumer class machines such as Ryzen."
> I had to lol about this since I have had many Intel systems with one problem or another and this is
> certainly not isolated to Ryzen. I am also running a Ryzen system running three VMs with
> PCI-passthrough on one of the VMs with no problems. I do not know if my motherboard is consumer
> grade (ASrock) however it does rock. ;)
>

It's not passthru in general, it's passthru of specific devices, in this case USB controllers. My network devices passthru just fine because they're in their own groups. Since you said "PCI-passthrough on one of the VMs" I'm assuming you're talking about sys-net, and that you're not using a USB Qube (which would make it two VMs). And just because passthru doesn't work on one (Intel) system or another doesn't necessarily mean it's because of IOMMU grouping. There are plenty of different reasons it could break.

In any case, yeah, I'm sure Intel systems have their problems too. I just happened to come across a lot of complaints about IOMMU grouping on AMD, especially Ryzen. But my main point was about Xen and Qubes with regards to IOMMU grouping. The AMD-vs-Intel matter is the least of my concerns.

Claudia

Dec 27, 2019, 7:30:18 AM
to awokd, qubes...@googlegroups.com
December 26, 2019 12:59 PM, "awokd' via qubes-users" <qubes...@googlegroups.com> wrote:

> Claudia:
>
> TLDR; check bottom of https://community.amd.com/thread/241650, looks
> like there was a recently released related update. Not sure if
> applicable to your situation.

Thanks for the link! I'm not sure if it affects me or not. I did install a Dell BIOS update dated March 2019, so it sounds like that could have contained this Agesa update. So downgrading might fix the grouping issue, but this update also contained an "urgent" security update which I'd have to look into before downgrading.

>> This caused a very sneaky problem on my machine. My USB controllers are in the same group as my
>> GPU, sound card, and SATA controller. So when sys-usb (or rd.qubes.hide_all_usb) takes over those
>> two USB controllers, everything stops working. [4] It was quite difficult to trace. It would have
>> been much easier to diagnose if grouping was enforced somewhere. I would much rather have an error
>> in my logs about being unable to assign USB controllers, than have my whole screen freeze up with
>> no indication why. (I got lucky that it just crashed; if something interferes with your SATA
>> controller's address space it can cause disk corruption. [5])
>>
>> I don't really know who's at fault here. Qubes? Xen? AMD? Dell?
>
> The improper grouping is probably somewhere in AGESA, which is provided
> to the manufacturers by AMD. It could be because of hardware related
> limitations, which again are supplied by AMD. Sometimes vendors take
> liberties (cost cutting measures) with both and break functionality, as
> their primary/sole concern is that Windows boots. This can especially be
> the case with consumer class machines such as Ryzen. Agree it would be
> nice if Xen handled this failure mode more gracefully. Not sure there is
> much Qubes can do here, though. On the other hand, my older AMD
> (pre-Ryzen) consumer laptop running Coreboot has correct groupings.

Yeah, my impression is the firmware can influence IOMMU grouping to an extent, within the bounds of
the physical hardware. If this problem was indeed caused by an update then I assume it's (at least partly) firmware-related. According to that thread, a fix has been released for some boards/CPUs, "ComboPI", but the only feedback I can find on it is for Ryzen 3000-series which doesn't help me. Also I don't even know if or when my machine will receive a BIOS update with this Agesa fix.

I sort of blame Xen for not enforcing IOMMU grouping, especially considering that it hides that
info from the OS. KVM does enforce IOMMU grouping rules, so I don't see why Xen wouldn't. Xen
leaves it up to the user software to be careful what it passes where, but that's kind of hard when
you don't have /sys/kernel/iommu_groups for a hint.

>> Intel systems
>> seem to just have better grouping usually (or, are less likely to crash when grouping rules are
>> violated). [6]
>
> I think that is overbroad. There are plenty of Intel systems with broken
> passthrough. iommu=no-igfx itself is a workaround for broken passthrough
> of Intel graphics. There are also plenty of AMD systems with properly
> implemented passthrough.

Very possible. I don't have experience with a lot of other hardware, so I'm just going by what I've
heard. It definitely seems to be a Ryzen problem at least, maybe not AMD in general. I just seemed
to come across a lot more complaints about AMD than Intel, though. It would be nice if the HCL
contained more detailed information about the IOMMU such as grouping, so we could get a better
idea. At any rate, that's the least of my worries.

TBH I don't really understand what no-igfx does, so I don't know if an AMD-equivalent option would help in this case or not. It's just worth noting that it's an Intel-specific fix which could improve Intel compatibility compared to AMD generally.

>> Thoughts? Is there anything Qubes can do to avoid splitting up IOMMU groups? Is there anything
>> Qubes *should* do? Should Qubes attempt to guess the IOMMU groups before taking over devices?
>> Should the USB Qube option be disabled on AMD systems (you can still manually set up sys-usb of
>> course)? Should we just blame Xen for not enforcing IOMMU groups in the first place?
>
> Ultimately, it's a hardware/firmware issue. Threadripper and Epyc based
> AMD systems ought to be more thoroughly vetted to support passthrough.
> My suggestions are to disable automatic IOMMU grouping in your UEFI
> configuration, if possible. Otherwise, try a newer firmware version with
> updated AGESA code and see if it helps, or possibly add a card with
> additional USB controllers as that should appear in its own group.

There is no way to enable or disable automatic IOMMU grouping in my bios. The only options are IOMMU
enabled or disabled, as far as I can tell. There is no newer firmware for this machine at this
time. Not sure about microcode, though. This is a laptop, so I can't add any cards.

awokd

Dec 29, 2019, 9:19:47 AM
to qubes...@googlegroups.com
Claudia:
> December 26, 2019 12:59 PM, "awokd' via qubes-users" <qubes...@googlegroups.com> wrote:
>
>> Claudia:
>>
>> TLDR; check bottom of https://community.amd.com/thread/241650, looks
>> like there was a recently released related update. Not sure if
>> applicable to your situation.
>
> Thanks for the link! I'm not sure if it affects me or not. I did install a Dell BIOS update dated March 2019, so it sounds like that could have contained this Agesa update. So downgrading might fix the grouping issue, but this update also contained an "urgent" security update which I'd have to look into before downgrading.

I'd assumed AGESA version numbers were from a common code base, but
apparently not. The one mentioned in that thread was released around
Oct. 2019, but may not be applicable to your hardware. They also don't
specifically reference USB controller grouping in that thread, so it
might do nothing for you even if it is applicable.

> I sort of blame Xen for not enforcing IOMMU grouping, especially considering that it hides that
> info from the OS. KVM does enforce IOMMU grouping rules, so I don't see why Xen wouldn't. Xen
> leaves it up to the user software to be careful what it passes where, but that's kind of hard when
> you don't have /sys/kernel/iommu_groups for a hint.

I am a bit fuzzy here too. It seems like if ACS is working correctly,
you can get better granularity within IOMMU groups. It would be
disappointing if it does not on recently released hardware. In your
case, the USB controller appears as a different function of the same PCI
device, which could be the case from a hardware perspective. This is
even worse for a passthrough scenario than IOMMU grouping. There is a
Realtek controller that often comes up on the list that makes people
passthrough the SD card controller to their sys-net along with WIFI for
the same reason.

> This is a laptop, so I can't add any cards.

This didn't used to be mutually exclusive. Thanks, Apple.

--
Mailing list etiquette:
- don't top post
- trim quoted reply to only relevant portions
- when possible, copy and paste text instead of screenshots

Ilpo Järvinen

Dec 29, 2019, 12:57:35 PM
to Qubes user
I got the impression from somewhere that the AMD platform itself should
support really good IOMMU grouping, but that there's a BIOS option
to enable it (like "IOMMU: auto/enabled", where "auto" gets you the
default conflicting groups). I read two lspci outputs from the same hardware
somewhere with very different PCI device layouts, one of them taken with
the "enabled" setting, but I guess it was a desktop motherboard. I suspect,
however, that laptop vendors may not be putting much effort into
including such options.


--
i.

Claudia

Dec 29, 2019, 7:25:49 PM
to awokd, qubes...@googlegroups.com
December 29, 2019 2:19 PM, "awokd' via qubes-users" <qubes...@googlegroups.com> wrote:

> Claudia:
>
>> December 26, 2019 12:59 PM, "awokd' via qubes-users" <qubes...@googlegroups.com> wrote:
>>
>>> Claudia:
>>>
>>> TLDR; check bottom of https://community.amd.com/thread/241650, looks
>>> like there was a recently released related update. Not sure if
>>> applicable to your situation.
>>
>> Thanks for the link! I'm not sure if it affects me or not. I did install a Dell BIOS update dated
>> March 2019, so it sounds like that could have contained this Agesa update. So downgrading might fix
>> the grouping issue, but this update also contained an "urgent" security update which I'd have to
>> look into before downgrading.
>
> I'd assumed AGESA version numbers were from a common code base, but
> apparently not. The one mentioned in that thread was released around
> Oct. 2019, but may not be applicable to your hardware. They also don't
> specifically reference USB controller grouping in that thread, so it
> might do nothing for you even if it is applicable.

The fixed version appears to be for 3000-series processors. At least, when I was googling around I didn't see any 2000's. I have the 2500U. And besides that, I don't think there's any way for me to install it without Dell releasing a firmware update, is there? The fix was from October, but the original/broken AGESA update was from July or earlier. So I thought maybe the March firmware update broke it, but the first thing I did was update the firmware, so I don't know if grouping was any different before.



>> I sort of blame Xen for not enforcing IOMMU grouping, especially considering that it hides that
>> info from the OS. KVM does enforce IOMMU grouping rules, so I don't see why Xen wouldn't. Xen
>> leaves it up to the user software to be careful what it passes where, but that's kind of hard when
>> you don't have /sys/kernel/iommu_groups for a hint.
>
> I am a bit fuzzy here too. It seems like if ACS is working correctly,
> you can get better granularity within IOMMU groups. It would be
> disappointing if it does not on recently released hardware. In your

(TL;DR - I don't think ACS matters in Xen)

I do recall seeing some info about ACS. I don't know how to check if it's supported/working. But I don't think it matters. When I say IOMMU grouping I'm actually talking about two different things. One is the grouping "policy" (so to speak), which shows up in /sys/kernel/iommu_groups; the kernel works out that structure from, among other things, what the bridges and devices report via ACS (PCIe Access Control Services). This provides an interface so that software like KVM can prevent you from accidentally separating interdependent devices into different VMs, which can cause memory corruption or security holes or whatever. If ACS is not supported or not working, the kernel has to assume that basically all devices on the same bus(?) are interdependent, and then you end up with crappy grouping. However, unlike KVM, Xen does not, I repeat, does not enforce this policy. Xen leaves it up to the user to know what they're doing.

Hence this leads us to the second sense of IOMMU grouping: the "de facto" grouping (so to speak), which means some set of devices really actually truly are interdependent, by virtue of directly sharing untranslated memory addresses for example, and will cause a crash if separated. Case in point: KVM users sometimes install an unofficial "ACS override patch" that lies about the "policy" part, in order to separate devices that normally belong to the same group, and sometimes it will work mostly fine as long as the devices in question are not "de facto" interdependent. (Patches are also added to the official kernel for specific devices when the vendor certifies that they can safely be separated.) There is no such thing for Xen, because Xen doesn't attempt to enforce the grouping policy in the first place. So ACS should be a non-issue in Xen.

So in my case, I'm pretty sure that most of my devices are de facto interdependent, because separating the USB controllers from the rest of the group causes an instant crash. The de facto groups probably can be influenced by firmware/microcode in addition to the hardware.

That's my understanding anyway. I could be wrong.
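
On the question of how to check whether ACS is present at all: one rough approach is to look for the capability in `lspci -vvv` output. The sketch below assumes that lspci (pciutils) labels the capability with the text "Access Control Services" and that the command is run as root, so extended capabilities are visible; `devices_with_acs` is a hypothetical helper name. Finding the capability only shows that a bridge or device advertises ACS, not that isolation is actually enabled or effective.

```python
def devices_with_acs(lspci_vvv_text):
    """Return the PCI addresses whose `lspci -vvv` block mentions the ACS capability."""
    found = []
    current = None
    for line in lspci_vvv_text.splitlines():
        if line and not line[0].isspace():
            # An unindented line starts a new device block: "<address> <class>: <name>"
            current = line.split()[0]
        elif current and "Access Control Services" in line:
            found.append(current)
            current = None  # record each device at most once
    return found
```

A possible use is to save the output of `lspci -vvv` (run as root) to a file and pass the file's contents to `devices_with_acs`. An empty result would be consistent with the coarse grouping shown earlier in the thread, although it would not prove the cause.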

> case, the USB controller appears as a different function of the same PCI
> device, which could be the case from a hardware perspective. This is
> even worse for a passthrough scenario than IOMMU grouping. There is a
> Realtek controller that often comes up on the list that makes people
> passthrough the SD card controller to their sys-net along with WIFI for
> the same reason.

That's something I haven't been able to figure out: are functions of the same device always inherently in the same de facto group? Or does the BDF structure have little to do with grouping? It seems likely that functions of the same device would communicate directly instead of via the bus/IOMMU. But it's also conceivable that some devices would intentionally send data through the IOMMU instead of directly between functions. Especially the integrated devices on a CPU that supports IOMMU, since there's not much point in having an IOMMU if nearly every device is interdependent (as is the case with Ryzen apparently). I don't know if such devices actually exist or not though.

Further evidence, the ACS override patch can be configured to "multifunction" mode, which I think means that it allows functions of the same device to be separated. Even if there are no existing devices that officially (via ACS) allow their functions to be separated, in practice it must work sometimes, or else multifunction mode would be completely pointless.

So I don't know that the structure of the devices and functions is necessarily more important than IOMMU grouping, per se. I think they may be independent of one another, at least in theory. In any case, the important thing is the de facto grouping.

>> This is a laptop, so I can't add any cards.
>
> This didn't used to be mutually exclusive. Thanks, Apple.

Ha. Now that you mention it, I do remember laptops used to have PCIe slots. But I think those days are pretty much over.

On a side note, I remembered I saw some error about the IOMMU in the kernel logs at some point. I just ignored it at the time because I was dealing with bigger problems. I'm going to start a new thread for that.

brenda...@gmail.com

Dec 29, 2019, 8:27:20 PM
to qubes-users
On Sunday, December 29, 2019 at 7:25:49 PM UTC-5, Claudia wrote:

> Ha. Now that you mention it, I do remember laptops used to have PCIe slots. But I think those days are pretty much over.
>
> On a side note, I remembered I saw some error about the IOMMU in the kernel logs at some point. I just ignored it at the time because I was dealing with bigger problems. I'm going to start a new thread for that.


Yup, many early-to-mid-2010s Lenovo ThinkPads have an external ExpressCard slot: X230, T520, W520, T530, W530, T540, W540...

1 entire lane of PCIe 2.0 (3.2 Gbit/s ... ~300MB/s) bliss!

But more seriously, people actually used to use these for external gaming GPUs way back when.

For Qubes, on some of these models, the slot is very helpful: you can add an additional USB 3.0 root hub for external devices that can be mapped independently, even if you can only get about half-throughput from it.

B

PS - Also, some internal laptop slots for wifi/etc. are mPCIe...but using them for other purposes generally means leaving the laptop disassembled, which means...well...why use a laptop?

qubes123

Dec 30, 2019, 2:12:19 PM
to qubes-users
> The improper grouping is probably somewhere in AGESA, which is provided
> to the manufacturers by AMD. It could be because of hardware related
> limitations, which again are supplied by AMD. Sometimes vendors take
> liberties (cost cutting measures) with both and break functionality, as
> their primary/sole concern is that Windows boots. This can especially be
> the case with consumer class machines such as Ryzen. Agree it would be
> nice if Xen handled this failure mode more gracefully. Not sure there is
> much Qubes can do here, though. On the other hand, my older AMD
> (pre-Ryzen) consumer laptop running Coreboot has correct groupings.


I could be wrong, but aren't these PCI assignments and hierarchies coded within the ACPI DSDT table in the BIOS?
I seem to remember that in UEFI the ACPI tables could be overridden somehow...
Or, since kernel 5.3.x(?), you can supply certain ACPI tables (as files stored in the initrd) to the kernel using command-line parameters* (some additional ACPI manipulation is needed to extract the current DSDT, see what is in there, and make changes in AML...)

Or, before all that, you can simply try booting the kernel with acpi=nocrs (or off) on the command line and let the kernel "enroll" the PCI devices. Maybe worth a try; it's just one reboot...


Claudia

Jan 6, 2020, 2:41:30 PM
to qubes123, qubes-users
December 30, 2019 7:12 PM, "qubes123" <dm1.l...@gmail.com> wrote:

>> The improper grouping is probably somewhere in AGESA, which is provided
>> to the manufacturers by AMD. It could be because of hardware related
>> limitations, which again are supplied by AMD. Sometimes vendors take
>> liberties (cost cutting measures) with both and break functionality, as
>> their primary/sole concern is that Windows boots. This can especially be
>> the case with consumer class machines such as Ryzen. Agree it would be
>> nice if Xen handled this failure mode more gracefully. Not sure there is
>> much Qubes can do here, though. On the other hand, my older AMD
>> (pre-Ryzen) consumer laptop running Coreboot has correct groupings.
>
> I could be wrong, but aren't these PCI assignments and hierarchies coded within the ACPI DSDT table
> in BIOS?

I guess in some cases they are, and in other cases they're in hardware. For example, if you have two devices behind the same physical PCI bridge, communication between those two devices might be sent across the bridge without ever making it to the IOMMU. I don't think any software approach could do anything about that kind of situation.

In my case, the USB controllers and most of the other devices are functions of the same PCI device, 03:00.{0,1,2,3,4,6}. Therefore most likely any communication between them is happening within the device and not going to the IOMMU (00:00.2). However, I don't know if this is because of the physical structure, or if it could be changed by modifying ACPI tables. I guess the only way to know would be to try it.

> I seem to remember that in UEFI the ACPI tables could be overridden somehow...
>
> Or - since kernel 5.3.x(?) you can supply certain ACPI tables (as files, stored in initrd) to the
> kernel using commandline parameters* (some additional acpi manipulations are needed to extract the
> current dsdt to see what is in there and make changes in aml...)

I understand the part about uploading the ACPI tables via initrd, but I would have no idea how to extract them, what they mean, or what changes to make to them.
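
For the extraction part, a starting point (a sketch under assumptions, not a recipe) is that Linux normally exposes the raw tables under /sys/firmware/acpi/tables/, readable by root, and that every ACPI table begins with the same 36-byte header. The hypothetical helper below decodes that header and verifies the checksum, which is a cheap sanity check before handing a dumped DSDT to a disassembler such as iasl from ACPICA. It says nothing about what to change inside the table.

```python
import struct

# Fixed layout of the common ACPI table header (36 bytes, little-endian):
# signature, length, revision, checksum, OEM ID, OEM table ID,
# OEM revision, creator ID, creator revision.
HEADER_FORMAT = "<4sIBB6s8sI4sI"


def parse_acpi_header(blob):
    """Decode the header of a raw ACPI table and verify its checksum."""
    if len(blob) < 36:
        raise ValueError("an ACPI table header is 36 bytes long")
    (sig, length, revision, _checksum, oem_id,
     oem_table_id, oem_rev, creator_id, creator_rev) = struct.unpack_from(HEADER_FORMAT, blob, 0)
    return {
        "signature": sig.decode("ascii", "replace"),
        "length": length,
        "revision": revision,
        "oem_id": oem_id.decode("ascii", "replace").rstrip("\x00 "),
        "oem_table_id": oem_table_id.decode("ascii", "replace").rstrip("\x00 "),
        "oem_revision": oem_rev,
        "creator_id": creator_id.decode("ascii", "replace"),
        "creator_revision": creator_rev,
        # All bytes of a valid table sum to zero modulo 256.
        "checksum_ok": len(blob) >= length and sum(blob[:length]) % 256 == 0,
    }
```

Typical use would be to read /sys/firmware/acpi/tables/DSDT in binary mode as root and pass the bytes to `parse_acpi_header`. If `checksum_ok` is false for an unmodified dump, the dump is probably truncated.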

Also, I haven't figured out if ACPI override actually changes the behavior of PCI devices, or if it just spoofs the information provided to the kernel/hypervisor (which would make it unnecessary/ineffective on Xen). According to the OSDev wiki: "AML interpreter can build up a database of all devices within a system and the properties and functions they support (in reference to configuration and power management)."

> Or - before all - you can simply try to boot the kernel with cmdline: acpi=nocrs (or off) and let
> the kernel "enroll" the PCI devices. Maybe worth a try; it's just one reboot...

I did some tests by playing sound in a VM and then binding pciback to the USB controllers to simulate passthru. None of them were successful. At the time of the bind command, audio stopped, and the screen would freeze unless nomodeset was on. I did the testing in the 4.1 pre-release.
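
The exact bind commands are not shown above. For readers who want to repeat the test, the sketch below prints the conventional sysfs sequence for handing a device to pciback (unbind from the current driver, register the slot, bind). This sequence is an assumption about what "binding pciback" involved, and `pciback_bind_plan` is a made-up helper name. The code deliberately only prints the writes instead of performing them, since performing them is what froze this machine.

```python
def pciback_bind_plan(bdf, driver="pciback", sysfs="/sys/bus/pci"):
    """Return the (path, value) sysfs writes that would hand `bdf` to pciback.

    Nothing is written; the caller decides whether to act on the plan.
    """
    return [
        (f"{sysfs}/devices/{bdf}/driver/unbind", bdf),  # detach from the current driver
        (f"{sysfs}/drivers/{driver}/new_slot", bdf),    # tell pciback about the slot
        (f"{sysfs}/drivers/{driver}/bind", bdf),        # let pciback seize the device
    ]


for bdf in ("0000:03:00.3", "0000:03:00.4"):
    for path, value in pciback_bind_plan(bdf):
        print(f"echo {value} > {path}")
```

Capturing kernel logs and lspci output both before and after running such a sequence, ideally over SSH or a serial console, would give the before/after comparison that is missing from the tests described here.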

I tested four combinations of parameters: (none), acpi=off, acpi=nocrs, and acpi=nocrs pci=nocrs, each with and without Xen. In the non-Xen tests, iommu_groups was the same every time. In the Xen tests, xl dmesg and xl info were identical every time. In all tests, lspci and lspci -t were identical. Kernel logs and lspci -kvvnn had some differences each time, but nothing that looked important. If I should look for anything specific please let me know. Note, the data was collected right after I logged in, before I performed the passthru. Not one of my better decisions.

However the only thing I recall seeing in the logs at the time of the passthru was this, with acpi=nocrs pci=nocrs:
xhci_hcd 0000:03:00.4: Host halt failed, -110
xhci_hcd 0000:03:00.4: Host controller not halted, aborting reset.
xhci_hcd 0000:03:00.4: USB bus 3 deregistered
pciback 0000:03:00.3: seizing device
xen: registering gsi 55 triggering 0 polarity 1
Already setup the GSI :55
pciback 0000:03:00.4: seizing device
xen: registering gsi 52 triggering 0 polarity 1
Already setup the GSI :52

Could it be a PCI reset related problem?
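
One quick, low-risk check on the reset question: Linux exposes a `reset` file in a PCI function's sysfs directory only when the kernel knows some way to reset that function. The sketch below just tests for that file; `supports_reset` is a hypothetical helper, and a missing file would be a hint rather than proof that reset is the culprit. The -110 in the first log line is most likely -ETIMEDOUT, i.e. the controller did not halt in time.

```python
import os


def supports_reset(bdf, sysfs="/sys/bus/pci/devices"):
    """Return True if the kernel exposes a `reset` attribute for this PCI function."""
    return os.path.exists(os.path.join(sysfs, bdf, "reset"))


for bdf in ("0000:03:00.3", "0000:03:00.4"):
    print(bdf, "reset attribute present:", supports_reset(bdf))
```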

Finally, a possible workaround I thought of is putting sys-usb into PV mode, since PV passthru doesn't use the IOMMU. It wouldn't be quite as secure as HVM, as it wouldn't prevent a DMA attack, but it would still be better than having USB in dom0. However it looks like Qubes 4.1 isn't going to support any kind of passthru for PVs, so I'll ultimately end up back where I started. I don't currently have sys-usb installed, but I might try it when I have some time.
