How different from bare metal linux is Qubes NVME interaction?


Guerlan

Jan 17, 2020, 8:57:20 PM
to qubes-users
I have a problem where any Linux distro on my Razer Blade Stealth suffers NVMe corruption (the NVMe drive drops into read-only mode) many times a day. I tried the most recent kernel, older ones, etc. I opened a bug on Canonical's Launchpad and got some help and some kernel patches to try, but none worked. I made two independent tests that show it's not a problem with my notebook or SSD: (1) I bought a brand-new SSD and got the same problem; (2) neither Windows nor Qubes exhibits the error.
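
(Editor's note: failures like this usually leave traces in the kernel log, both the NVMe controller event and the filesystem's remount to read-only. A minimal sketch of filtering for those symptoms; the log lines below are a fabricated sample for illustration, and the exact message wording varies by kernel version.)

```shell
# Illustrative sample of the kind of messages seen when an NVMe drive
# misbehaves and the filesystem gets remounted read-only (wording varies):
log='[  123.4] nvme nvme0: I/O 12 QID 3 timeout, reset controller
[  124.0] EXT4-fs (nvme0n1p2): Remounting filesystem read-only'

# On a live system, pipe `dmesg` instead of this sample:
printf '%s\n' "$log" | grep -E 'nvme.*(timeout|reset)|read-only'
```

On a live system, whether a controller reset precedes the remount helps distinguish a drive/controller problem from a filesystem one.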

Actually, I installed Qubes just to see what would happen, because I wanted to run something non-Linux on my machine as a test. It turns out that something in Qubes lets dom0 access the NVMe drive without any problems.

The way I think the Xen/Qubes NVMe interaction works is this: Xen has PCI front and back drivers; the front is in Xen, the back in dom0. dom0 then accesses the NVMe device through PCI. The nvme driver in dom0 is the same as the one in my older bare-metal Linux distributions that had the bug, so I can only think the bug happens at the PCI level, because that's the only thing different here.

Can someone with a better understanding of Xen and Qubes give me a clearer picture, and maybe guess what's happening and why I don't hit the bug at all in Qubes?

Claudia

Jan 18, 2020, 6:27:05 AM
to Guerlan, qubes-users

Qubes doesn't use a storage domain, so the SATA controller (or in your case, the NVMe controller) doesn't use pciback as far as I know. If it did, I would say it's probably because pciback blocks writes to the read-only config space of PCI devices. But storage devices in Qubes just use the normal driver in dom0, like any other Linux. You can check with `lspci -k`. Unfortunately I don't know enough about NVMe to be much help.
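
(Editor's note: to illustrate the check, here is a sketch that pulls the driver line out of `lspci -k` output; it parses a captured two-line sample rather than running `lspci` live.)

```shell
# Sample of the relevant `lspci -k` lines for an NVMe controller:
sample='02:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM961/PM961
    Kernel driver in use: nvme'

# Print the driver bound to the NVMe controller; on a live system you
# would pipe `lspci -k` into the same awk program:
printf '%s\n' "$sample" |
    awk '/Non-Volatile/ {found=1} found && /Kernel driver in use/ {print $NF; exit}'
# → nvme
```

A result of `pciback` would mean the controller had been assigned for passthrough; `nvme` means dom0 drives it directly.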

Have you tried installing Xen on any other distro, to see if the problem occurs there? You can also test suspend/resume while you're at it. If you find that it fixes the NVMe problem and suspend works, you can continue using that distro under Xen, even if you don't actually use any domUs.
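
(Editor's note: if you do try another distro under Xen, one way to confirm the kernel actually booted under Xen rather than bare metal is `/sys/hypervisor/type`, which reads `xen` under Xen. A minimal sketch; the temp file here just simulates the sysfs entry so the example is self-contained.)

```shell
# Print the hypervisor type, or "none" when the sysfs file is absent
# (i.e. bare metal). The path defaults to the real sysfs location.
hv_type() {
    local f="${1:-/sys/hypervisor/type}"
    if [ -r "$f" ]; then cat "$f"; else echo none; fi
}

# Simulated sysfs entry so the sketch runs anywhere:
tmp=$(mktemp); printf 'xen\n' > "$tmp"
hv_type "$tmp"    # → xen
```

On a real machine, just run `hv_type` with no argument (or `cat /sys/hypervisor/type` directly).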

Guerlan

Jan 18, 2020, 1:47:39 PM
to qubes-users
Thanks. Here's my `lspci -k` output:

[lz@dom0 ~]$ lspci -k
00:00.0 Host bridge: Intel Corporation Xeon E3-1200 v6/7th Gen Core Processor Host Bridge/DRAM Registers (rev 02)
    Subsystem: Razer USA Ltd. Device 6752
00:02.0 VGA compatible controller: Intel Corporation HD Graphics 620 (rev 02)
    Subsystem: Razer USA Ltd. Device 6752
    Kernel driver in use: i915
    Kernel modules: i915
00:14.0 USB controller: Intel Corporation Sunrise Point-LP USB 3.0 xHCI Controller (rev 21)
    Subsystem: Razer USA Ltd. Device 6752
    Kernel driver in use: pciback
    Kernel modules: xhci_pci
00:14.2 Signal processing controller: Intel Corporation Sunrise Point-LP Thermal subsystem (rev 21)
    Subsystem: Razer USA Ltd. Device 6752
    Kernel driver in use: intel_pch_thermal
    Kernel modules: intel_pch_thermal
00:15.0 Signal processing controller: Intel Corporation Sunrise Point-LP Serial IO I2C Controller #0 (rev 21)
    Subsystem: Razer USA Ltd. Device 6752
    Kernel driver in use: intel-lpss
    Kernel modules: intel_lpss_pci
00:15.1 Signal processing controller: Intel Corporation Sunrise Point-LP Serial IO I2C Controller #1 (rev 21)
    Subsystem: Razer USA Ltd. Device 6752
    Kernel driver in use: intel-lpss
    Kernel modules: intel_lpss_pci
00:16.0 Communication controller: Intel Corporation Sunrise Point-LP CSME HECI #1 (rev 21)
    Subsystem: Razer USA Ltd. Device 6752
    Kernel driver in use: mei_me
    Kernel modules: mei_me
00:1c.0 PCI bridge: Intel Corporation Sunrise Point-LP PCI Express Root Port #3 (rev f1)
    Kernel driver in use: pcieport
00:1d.0 PCI bridge: Intel Corporation Sunrise Point-LP PCI Express Root Port #9 (rev f1)
    Kernel driver in use: pcieport
00:1e.0 Signal processing controller: Intel Corporation Sunrise Point-LP Serial IO UART Controller #0 (rev 21)
    Subsystem: Razer USA Ltd. Device 6752
    Kernel driver in use: intel-lpss
    Kernel modules: intel_lpss_pci
00:1f.0 ISA bridge: Intel Corporation Sunrise Point-LP LPC Controller (rev 21)
    Subsystem: Razer USA Ltd. Device 6752
00:1f.2 Memory controller: Intel Corporation Sunrise Point-LP PMC (rev 21)
    Subsystem: Razer USA Ltd. Device 6752
00:1f.3 Audio device: Intel Corporation Sunrise Point-LP HD Audio (rev 21)
    Subsystem: Razer USA Ltd. Device 6752
    Kernel driver in use: snd_hda_intel
    Kernel modules: snd_hda_intel, snd_soc_skl
00:1f.4 SMBus: Intel Corporation Sunrise Point-LP SMBus (rev 21)
    Subsystem: Razer USA Ltd. Device 6752
    Kernel driver in use: i801_smbus
    Kernel modules: i2c_i801
01:00.0 Network controller: Qualcomm Atheros QCA6174 802.11ac Wireless Network Adapter (rev 32)
    Subsystem: Bigfoot Networks, Inc. Device 1535
    Kernel driver in use: pciback
    Kernel modules: ath10k_pci
02:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM961/PM961
    Subsystem: Samsung Electronics Co Ltd Device a801
    Kernel driver in use: nvme
    Kernel modules: nvme

I don't understand: how can dom0 access the NVMe drive without using the nvme driver? And how can the nvme driver communicate with my SSD if not through PCI? And shouldn't any PCI connection be made through the xenpci driver? I thought dom0 couldn't access raw PCI.

Claudia

Jan 18, 2020, 10:03:21 PM
to Guerlan, qubes-users

No, pciback is like a proxy server that communicates via Xen mechanisms with domUs that want to use the hardware. Dom0 itself talks directly to the hardware just like a regular OS (for the most part, though there are exceptions).

"Domain 0 has responsibility for all devices on the system. Normally, as it discovers PCI devices, it passes those to drivers within the Linux kernel. In order for a device to be accessed by a guest, the device must instead be assigned to a special domain 0 driver. This driver is called xen-pciback in pvops kernels, and called pciback in classic kernels. PV guests access the device via a kernel driver in the guest called xen-pcifront (pcifront in classic xen kernels), which connects to pciback. HVM guests see the device on the emulated PCI bus presented by QEMU. "

https://wiki.xenproject.org/wiki/Xen_PCI_Passthrough
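
(Editor's note: for concreteness, the usual way a device ends up under pciback is the sysfs unbind/bind sequence described on that wiki page. Below is a dry-run sketch using the wireless card's address from the `lspci -k` listing above; it only prints the commands it would run, since the real thing needs root and detaches the device from its current driver.)

```shell
BDF=0000:01:00.0    # the QCA6174 wireless card (01:00.0 in short form)

# Print, rather than execute, the sysfs writes that hand a device to pciback:
for f in "/sys/bus/pci/devices/$BDF/driver/unbind" \
         "/sys/bus/pci/drivers/pciback/new_slot" \
         "/sys/bus/pci/drivers/pciback/bind"; do
    echo "echo $BDF > $f"
done
```

Qubes performs this assignment automatically for devices it passes through, which is why the USB and wireless controllers show `Kernel driver in use: pciback` in the listing while the NVMe controller shows `nvme`.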

As you can see, your NVMe controller is using the "nvme" driver, not pciback. Xen actually does support passthrough of SATA/NVMe/SCSI/whatever controllers to domUs, referred to as a storage domain, but Qubes doesn't use this feature.

Anyway, getting back to the point: as I said, there are certain exceptions where Xen intervenes between dom0 and the hardware. An example is the IOMMU, which Xen takes over early in the boot process, hiding much of its functionality from the dom0 OS. I don't know whether there are exceptions like this for PCI devices or NVMe controllers in particular; you'd have to ask someone more familiar with the inner workings of Xen. Usually Xen breaks things rather than fixing them, but I suppose it's possible that Xen is protecting your NVMe controller from the OS in some way, whether intentionally or not.
