Atheros AR928X & Q4.0rc3 Passthrough


awokd

Dec 16, 2017, 9:21:47 AM
to qubes...@googlegroups.com
I'm getting crashes on domU boot with an assigned Atheros wireless PCIe card
under Qubes 4.0rc3, in both PV and HVM modes. Any suggestions on how to get
passthrough working? Some of the posts/threads I find go back to 2010, but
I'm still stumped.

*****

dom0 shows:
Dec 16 05:12:14 dom0 kernel: pci 0000:02:00.0: [168c:002a] type 00 class
0x028000
Dec 16 05:12:14 dom0 kernel: pci 0000:02:00.0: reg 0x10: [mem
0xf0100000-0xf010ffff 64bit]
Dec 16 05:12:14 dom0 kernel: pci 0000:02:00.0: supports D1
Dec 16 05:12:14 dom0 kernel: pci 0000:02:00.0: PME# supported from D0 D1
D3hot
Dec 16 05:12:14 dom0 kernel: pci 0000:02:00.0: disabling ASPM on pre-1.1
PCIe device. You can enable it with 'pcie_aspm=force'
...
Dec 16 05:12:14 dom0 kernel: pci 0000:02:00.0: Signaling PME through PCIe
PME interrupt
...
Dec 16 05:12:16 dom0 kernel: pciback 0000:02:00.0: seizing device
Dec 16 05:12:16 dom0 kernel: xen: registering gsi 17 triggering 0 polarity 1
Dec 16 05:12:16 dom0 kernel: Already setup the GSI :17

*****

With qvm-prefs personal virt_mode pv (first run in guest-personal.log), I see

[ 0.000000] p2m virtual area at ffffc90000000000, size is 1000000
[ 0.000000] Remapped 0 page(s)
...
[ 3.971434] ath9k 0000:00:00.0: Xen PCI mapped GSI17 to IRQ15
[ 3.971674] ath: phy0: Enable WAR for ASPM D3/L1
[ 4.397628] BUG: unable to handle kernel paging request at
ffffc90001cd0040
[ 4.397651] IP: [<ffffffff81405f4e>] iowrite32+0x2e/0x40
[ 4.397667] PGD 18831067 [ 4.397671] PUD 18830067
PMD 11ef3067 [ 4.397683] PTE 80100000f0100075
[ 4.397690]
[ 4.397696] Oops: 0003 [#1] SMP
...
[ 4.398003] RIP [<ffffffff81405f4e>] iowrite32+0x2e/0x40

and in dom0

Dec 16 05:14:52 dom0 kernel: xen_pciback: vpci: 0000:02:00.0: assign to
virtual slot 0
Dec 16 05:14:52 dom0 kernel: pciback 0000:02:00.0: registering for 9
...
Dec 16 05:14:56 dom0 kernel: pciback 0000:02:00.0: enabling device (0000
-> 0002)
Dec 16 05:14:56 dom0 kernel: xen: registering gsi 17 triggering 0 polarity 1
Dec 16 05:14:56 dom0 kernel: Already setup the GSI :17
Dec 16 05:14:56 dom0 kernel: xen: registering gsi 17 triggering 0 polarity 1
Dec 16 05:14:56 dom0 kernel: Already setup the GSI :17
Dec 16 05:14:56 dom0 kernel: xen: registering gsi 17 triggering 0 polarity 1
Dec 16 05:14:56 dom0 kernel: Already setup the GSI :17
Dec 16 05:14:56 dom0 kernel: xen: registering gsi 17 triggering 0 polarity 1
Dec 16 05:14:56 dom0 kernel: Already setup the GSI :17
Dec 16 05:14:56 dom0 kernel: xen: registering gsi 17 triggering 0 polarity 1
Dec 16 05:14:56 dom0 kernel: Already setup the GSI :17
Dec 16 05:14:56 dom0 kernel: xen: registering gsi 17 triggering 0 polarity 1
Dec 16 05:14:56 dom0 kernel: Already setup the GSI :17
Dec 16 05:14:56 dom0 kernel: pciback 0000:02:00.0: Driver tried to write
to a read-only configuration space field at offset 0x92, size 2. This may
be harmless, but if you have problems with your device:
1) see permissive attribute in sysfs
2) report problems to the xen-devel mailing
list along with details of your device
obtained from lspci.

I think that is related to the following code in
https://github.com/torvalds/linux/blob/master/drivers/net/wireless/ath/ath9k/init.c

if (sc->driver_data & ATH9K_PCI_D3_L1_WAR) {
        ah->config.pcie_waen = 0x0040473b;
        ath_info(common, "Enable WAR for ASPM D3/L1\n");
}

which I'm guessing is what leads to the configuration space write and crash.
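
A rough cross-check of that guess, offered as a sketch rather than a diagnosis. The capability offsets are copied from the lspci -vv output further down, and the bit meanings are the standard x86 page-fault error code and x86-64 PTE flag layout. The helper names are made up for illustration.

```python
# Capability offsets copied from the `lspci -vv` output in this post.
CAPS = {0x40: "Power Management", 0x50: "MSI", 0x60: "Express", 0x90: "MSI-X"}

def owning_capability(offset):
    """Return (name, offset inside the capability) for a config-space offset."""
    # The owner is the capability with the highest start that is <= offset.
    starts = [s for s in CAPS if s <= offset]
    if not starts:
        return ("standard header", offset)
    start = max(starts)
    return (CAPS[start], offset - start)

def decode_fault_error(code):
    """Decode the low bits of an x86 page-fault error code."""
    return {
        "page_present": bool(code & 0x1),  # set: protection fault, clear: not mapped
        "write": bool(code & 0x2),
        "user_mode": bool(code & 0x4),
    }

def decode_pte(pte):
    """Pick out the x86-64 PTE bits that matter for this oops."""
    return {
        "present": bool(pte & (1 << 0)),
        "writable": bool(pte & (1 << 1)),
        "cache_disable": bool(pte & (1 << 4)),
        "no_execute": bool(pte & (1 << 63)),
        # Bits 12..51 hold the physical frame address.
        "frame": pte & 0x000FFFFFFFFFF000,
    }

print(owning_capability(0x92))
print(decode_fault_error(0x0003))
pte = decode_pte(0x80100000F0100075)
print({k: hex(v) if k == "frame" else v for k, v in pte.items()})
```

If the sketch is right, offset 0x92 is the 16-bit Message Control word of the MSI-X capability (capability start 0x90, plus 2) rather than anything ASPM-specific, and the oops is a write to a page that is present but mapped read-only, backed by frame 0xf0100000, which is BAR 0. That would suggest the config-space warning and the crash are separate events, but that is inference from the logs and not confirmed.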

*****

Setting qvm-prefs personal virt_mode hvm (second run in
guest-personal.log), I see

Dec 16 05:15:48 dom0 kernel: xen_pciback: vpci: 0000:02:00.0: assign to
virtual slot 0
Dec 16 05:15:48 dom0 kernel: pciback 0000:02:00.0: registering for 11
Dec 16 05:15:49 dom0 kernel: pciback 0000:02:00.0: enabling device (0000
-> 0002)
Dec 16 05:15:49 dom0 kernel: xen: registering gsi 17 triggering 0 polarity 1
Dec 16 05:15:49 dom0 kernel: Already setup the GSI :17
Dec 16 05:15:49 dom0 kernel: xen: registering gsi 17 triggering 0 polarity 1
Dec 16 05:15:49 dom0 kernel: Already setup the GSI :17
Dec 16 05:15:49 dom0 kernel: xen: registering gsi 17 triggering 0 polarity 1
Dec 16 05:15:49 dom0 kernel: Already setup the GSI :17
Dec 16 05:15:49 dom0 kernel: xen: registering gsi 17 triggering 0 polarity 1
Dec 16 05:15:49 dom0 kernel: Already setup the GSI :17
Dec 16 05:15:49 dom0 kernel: xen: registering gsi 17 triggering 0 polarity 1
Dec 16 05:15:49 dom0 kernel: Already setup the GSI :17
Dec 16 05:15:49 dom0 kernel: xen: registering gsi 17 triggering 0 polarity 1
Dec 16 05:15:49 dom0 kernel: Already setup the GSI :17

and in hypervisor.log

(XEN) domain_crash called from svm.c:1541
(XEN) Domain 10 (vcpu#0) crashed on cpu#2:
(XEN) ------[ Xen-4.8.2 x86_64 debug=n Not tainted ]------
(XEN) CPU: 2
(XEN) RIP: 0010:[<ffffffff97405f4e>]
(XEN) RFLAGS: 0000000000000296 CONTEXT: hvm guest (d10v0)
(XEN) rax: 0000000000000000 rbx: ffff97c946eb95c0 rcx: 0000000000000005
(XEN) rdx: 0000000000000040 rsi: ffffaf5580700040 rdi: 0000000000000000
(XEN) rbp: ffffaf55806cb8f8 rsp: ffffaf55806cb8c8 r8: 0000000000000000
(XEN) r9: 00000000ffffff90 r10: 000000000000003f r11: 0000000000000000
(XEN) r12: 0000000000000000 r13: ffffffffc04fb7d0 r14: 0000000000000100
(XEN) r15: ffff97c946eb4028 cr0: 0000000080050033 cr4: 00000000000406f0
(XEN) cr3: 0000000012e30000 cr2: 0000000000000000
(XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0018 cs: 0010
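
The hypervisor dump carries no symbol names, but its RIP can be compared with the PV oops. A small sketch, assuming both runs boot the same kernel build and that the HVM guest has KASLR enabled (neither is stated in the logs). On x86-64 the kernel text is relocated in steps of at least 2 MiB, so the same instruction should differ from the PV address by a multiple of 2 MiB.

```python
PV_IOWRITE32_RIP = 0xFFFFFFFF81405F4E  # iowrite32+0x2e from the PV oops
HVM_RIP = 0xFFFFFFFF97405F4E           # RIP from the hypervisor.log dump
MIN_KASLR_STEP = 0x200000              # 2 MiB, minimum x86-64 kernel alignment

def kaslr_delta(reference, observed, step=MIN_KASLR_STEP):
    """Return the slide between two addresses, or None if it is not step-aligned."""
    delta = observed - reference
    return delta if delta % step == 0 else None

delta = kaslr_delta(PV_IOWRITE32_RIP, HVM_RIP)
if delta is None:
    print("no aligned slide; probably a different instruction")
else:
    print("consistent with the same instruction, slide =", hex(delta))

# rsi is the second integer argument in the x86-64 calling convention,
# which for iowrite32(val, addr) would be the target address.
RSI = 0xFFFFAF5580700040
print("low 12 bits of rsi:", hex(RSI & 0xFFF))
```

The slide works out to 0x16000000, which is aligned, and rsi ends in 0x040 just like the PV fault address ffffc90001cd0040. So the HVM crash looks like the same iowrite32 to the same register offset, under the assumptions above.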

*****

I've tried several things, such as adding the permissive and no-strict-reset
flags when attaching the device, a bunch of ath9k kernel options, etc. The
only thing that made any difference at all was blacklisting the ath9k module
entirely; then the domU could boot.
I'm not sure where to go next. Figure out how to edit Xen quirks? Comment out
everything that looks like a write and recompile the driver? Throw it away
and buy something else? (I'd prefer to get this working somehow.)

*****

dom0 lspci -vv

02:00.0 Network controller: Qualcomm Atheros AR928X Wireless Network
Adapter (PCI-Express) (rev 01)
Subsystem: Fujitsu Limited. Device 147c
Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort- >SERR- <PERR- INTx-
Interrupt: pin A routed to IRQ 17
Region 0: Memory at f0100000 (64-bit, non-prefetchable) [disabled]
[size=64K]
Capabilities: [40] Power Management version 2
Flags: PMEClk- DSI- D1+ D2- AuxCurrent=375mA
PME(D0+,D1+,D2-,D3hot+,D3cold-)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit-
Address: 00000000 Data: 0000
Capabilities: [60] Express (v1) Legacy Endpoint, MSI 00
DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset-
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 128 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s
<512ns, L1 <64us
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
LnkCtl: ASPM Disabled; RCB 128 bytes Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive-
BWMgmt- ABWMgmt-
Capabilities: [90] MSI-X: Enable- Count=1 Masked-
Vector table: BAR=0 offset=00000000
PBA: BAR=0 offset=00000000
Capabilities: [100 v1] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP-
ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP-
ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+
MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+
ECRCChkEn-
MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
HeaderLog: 00000000 00000000 00000000 00000000
Capabilities: [140 v1] Virtual Channel
Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
Arb: Fixed- WRR32- WRR64- WRR128-
Ctrl: ArbSelect=Fixed
Status: InProgress-
VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
Status: NegoPending- InProgress-
Capabilities: [160 v1] Device Serial Number 00-00-00-00-00-00-00-00
Kernel driver in use: pciback
Kernel modules: ath9k



dmesg.log
guest-personal.log

Holger Levsen

Dec 17, 2017, 6:44:09 AM
to aw...@danwin1210.me, qubes...@googlegroups.com
On Sat, Dec 16, 2017 at 02:21:30PM -0000, 'awokd' via qubes-users wrote:
> Getting crashes on domU boot with an assigned Atheros wireless PCIe card
> under Qubes 4.0rc3 with both PV and HVM. Any suggestions how to accomplish
> it? Some of the posts/threads I find go back to 2010 but I'm still
> stumped.
[...]
> I've tried several things such as adding permissive and no-strict-reset
> flags when attaching the device, bunch of ath9k kernel options, etc. Only
> thing that resulted in any change whatsoever was when I blacklisted the
> ath9k module entirely, then I could boot.
> Not sure where to go next. Figure out how to edit Xen quirks? Comment out
> everything that looks like a write and recompile the driver? Throw it away
> and buy something else? (I'd prefer to get this working somehow.)

I cannot really help you, but for me it's good to see someone else has
this problem with an Atheros AR928X card as well. I was testing it on
Qubes 3.2 with coreboot and wasn't 100% sure this was due to Qubes/Xen,
or coreboot or hardware… still need to try that hw with pure Debian to
rule out that it's a hw problem.

--
cheers,
Holger

awokd

Dec 17, 2017, 7:00:19 AM
to qubes...@googlegroups.com
On Sun, December 17, 2017 11:44 am, Holger Levsen wrote:
> On Sat, Dec 16, 2017 at 02:21:30PM -0000, 'awokd' via qubes-users wrote:
>
>> Getting crashes on domU boot with an assigned Atheros wireless PCIe
>> card under Qubes 4.0rc3 with both PV and HVM. Any suggestions how to
>> accomplish it? Some of the posts/threads I find go back to 2010 but I'm
>> still stumped.
> [...]
>

> I cannot really help you, but for me it's good to see someone else has
> this problem with an Atheros AR928X card as well. I was testing it on Qubes
> 3.2 with coreboot and wasn't 100% sure this was due to Qubes/Xen,
> or coreboot or hardware… still need to try that hw with pure Debian to rule
> out that it's a hw problem.

Thanks for taking a look! It works with no problems under pure Debian on
the same machine. If I swap drives I can also test it on a plain Xen
4.8.2/Fedora 26 setup but since Qubes tweaks Xen I'm not sure a success or
failure there would provide any useful information...

awokd

Dec 29, 2017, 3:08:05 PM
to aw...@danwin1210.me, qubes...@googlegroups.com
So I got around to testing under Xen 4.8.2/Fedora 26 on the same machine
and pass-through to a Stretch HVM worked! Hit it with a bunch of iw
commands and couldn't make it crash.

The main difference I could find between Fedora's Xen and Qubes' was that
Fedora's had CONFIG_SHADOW_PAGING=y. I know it's off in Qubes
intentionally, but that shouldn't matter since I'm using HAP HVM, should it?

An AR9565 does work in pass-through under Q4.0rc3 on here.

Attached debug log for the Personal HVM. Xen/apic.c has multiple traces,
and why does it seem to be randomly assigning registers?

{"execute": "device_add", "arguments": {"driver": "xen-pci-passthrough",
"id": "xen-pci-pt_0000-02-00.0", "hostaddr": "0000:00:00.00",
"machine_addr": "0000:02:00.0", "permissive": false}}
[00:05.0] xen_pt_realize: Assigning real physical device 00:00.0 to devfn
0x28
[00:05.0] xen_pt_register_regions: IO region 0 registered (size=0x00010000
base_addr=0xf0100000 type: 0x4)
[00:05.0] xen_pt_config_reg_init: Offset 0x000e mismatch! Emulated=0x0080,
host=0x0000, syncing to 0x0000.
[00:05.0] xen_pt_config_reg_init: Offset 0x0010 mismatch! Emulated=0x0000,
host=0xf0100004, syncing to 0xf0100004.
[00:05.0] xen_pt_config_reg_init: Offset 0x0042 mismatch! Emulated=0x0000,
host=0x03c2, syncing to 0x0202.
[00:05.0] xen_pt_config_reg_init: Offset 0x0064 mismatch! Emulated=0x0000,
host=0x0cc0, syncing to 0x0cc0.
[00:05.0] xen_pt_config_reg_init: Offset 0x0072 mismatch! Emulated=0x0000,
host=0x1011, syncing to 0x1011.
[00:05.0] xen_pt_pci_intx: intx=1
[00:05.0] xen_pt_realize: Real physical device 00:00.0 registered
successfully
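
On the "randomly assigning registers" question, the offsets in the xen_pt_config_reg_init lines may be less random than they look. Here is a sketch that maps them onto the standard PCI header plus the capability offsets from the lspci -vv output earlier in the thread. The register names follow the PCI and PCIe specifications; the lookup table and function name are only illustrative.

```python
# (capability start, name) pairs taken from the lspci -vv output earlier in the thread.
CAP_STARTS = [(0x40, "Power Management"), (0x50, "MSI"),
              (0x60, "PCI Express"), (0x90, "MSI-X")]

# Register names at fixed offsets within each structure.
HEADER_REGS = {0x0E: "Header Type", 0x10: "BAR 0"}
CAP_REGS = {
    ("Power Management", 0x02): "Power Management Capabilities",
    ("PCI Express", 0x04): "Device Capabilities",
    ("PCI Express", 0x12): "Link Status",
}

def name_register(offset):
    """Best-effort name for a config-space offset on this particular card."""
    if offset < 0x40:
        return HEADER_REGS.get(offset, "header register (not in table)")
    # Find the last capability that starts at or before the offset.
    name, start = None, None
    for cap_start, cap_name in CAP_STARTS:
        if cap_start <= offset:
            name, start = cap_name, cap_start
    rel = offset - start
    reg = CAP_REGS.get((name, rel), "register +0x%x" % rel)
    return "%s: %s" % (name, reg)

# Offsets reported as "mismatch" in the device-model log.
for off in (0x0E, 0x10, 0x42, 0x64, 0x72):
    print("0x%04x -> %s" % (off, name_register(off)))
```

Read this way, the mismatch lines look like the device model reconciling its emulated defaults with host values for a handful of ordinary registers (header type, BAR 0, PM capabilities, PCIe device capabilities, link status), which is probably start-up noise and not a symptom in itself.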


guest-personal-dm.log

awokd

Jan 7, 2018, 2:12:45 PM
to aw...@danwin1210.me, qubes...@googlegroups.com
Got Xen debugging enabled (thank you, Marek!) and am now seeing the following
in the log. If I understand the code right, which is a big assumption
on my part, it's crashing because it's attempting a type 5 access to
a page set up in mode 3. So my questions to anyone in general, or to myself,
are:

1. Why does it try to do that?
2. Why does it work in regular Fedora 26's Xen 4.8.2-7.fc26 but not Qubes'
Xen 4.8.2-11.fc25?
3. How can I fix it?


(XEN) AMD-Vi: Setup I/O page table: device id = 0x200, type = 0x1, root
table = 0x22f674000, domain = 0, paging mode = 3

(XEN) AMD-Vi: Disable: device id = 0x200, domain = 0, paging mode = 3
(XEN) AMD-Vi: Setup I/O page table: device id = 0x200, type = 0x1, root
table = 0x264921000, domain = 9, paging mode = 3
(XEN) AMD-Vi: Re-assign 0000:02:00.0 from dom0 to dom9

(d9) pci dev 05:0 bar 10 size 000010000: 0f2020004
(XEN) memory_map:add: dom9 gfn=f2020 mfn=f0100 nr=10

(XEN) memory_map:add: dom9 gfn=f2020 mfn=f0100 nr=10
(XEN) irq.c:275: Dom9 PCI link 0 changed 5 -> 0
(XEN) irq.c:275: Dom9 PCI link 1 changed 10 -> 0
(XEN) irq.c:275: Dom9 PCI link 2 changed 11 -> 0
(XEN) irq.c:275: Dom9 PCI link 3 changed 5 -> 0
(XEN) svm.c:1540:d9v0 SVM violation gpa 0x000000f2020040, mfn 0xf0100, type 5
(XEN) domain_crash called from svm.c:1541
(XEN) Domain 9 (vcpu#0) crashed on cpu#1:
(XEN) ----[ Xen-4.8.2 x86_64 debug=y Not tainted ]----
(XEN) CPU: 1
(XEN) RIP: 0010:[<ffffffffab405f4e>]
(XEN) RFLAGS: 0000000000000296 CONTEXT: hvm guest (d9v0)
(XEN) rax: 0000000000000000 rbx: ffff8fb606b115c0 rcx: 0000000000000005
(XEN) rdx: 0000000000000040 rsi: ffffb0eac0d00040 rdi: 0000000000000000
(XEN) rbp: ffffb0eac0c938f8 rsp: ffffb0eac0c938c8 r8: 0000000000000000
(XEN) r9: 00000000ffffff90 r10: 000000000000003f r11: 0000000000000000
(XEN) r12: 0000000000000000 r13: ffffffffc04d97d0 r14: 0000000000000100
(XEN) r15: ffff8fb606b18028 cr0: 0000000080050033 cr4: 00000000000406f0
(XEN) cr3: 000000010688d000 cr2: 0000000000000000
(XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0018 cs: 0010
(XEN) grant_table.c:3388:d0v3 Grant release (0) ref:(151) flags:(2) dom:(0)
(XEN) grant_table.c:3388:d0v3 Grant release (1) ref:(52) flags:(2) dom:(0)
(XEN) grant_table.c:3388:d0v3 Grant release (2) ref:(120) flags:(6) dom:(0)
(XEN) AMD-Vi: Disable: device id = 0x200, domain = 9, paging mode = 3
(XEN) AMD-Vi: Setup I/O page table: device id = 0x200, type = 0x1, root
table = 0x22f674000, domain = 0, paging mode = 3
(XEN) AMD-Vi: Re-assign 0000:02:00.0 from dom9 to dom0
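
One way to narrow down question 1 is to work out where inside the BAR the faulting access lands. This sketch uses the numbers from the log above and the MSI-X lines from the earlier lspci -vv output (vector table at BAR 0 offset 0, Count=1). The 16-byte MSI-X entry size is from the PCI specification. That the page holding the MSI-X table gets treated differently from the rest of the BAR is an assumption, not something the log states.

```python
PAGE_SHIFT = 12
PAGE_SIZE = 1 << PAGE_SHIFT

# From "memory_map:add: dom9 gfn=f2020 mfn=f0100 nr=10" (all values are hex).
GFN_BASE, MFN_BASE, NR_PAGES = 0xF2020, 0xF0100, 0x10
FAULT_GPA = 0xF2020040  # from the "SVM violation gpa" line

# From lspci: "MSI-X: Enable- Count=1" and "Vector table: BAR=0 offset=00000000".
MSIX_TABLE_OFFSET, MSIX_ENTRIES, MSIX_ENTRY_SIZE = 0x0, 1, 16

def bar_offset(gpa):
    """Translate a guest physical address into an offset inside BAR 0."""
    off = gpa - (GFN_BASE << PAGE_SHIFT)
    if not 0 <= off < NR_PAGES * PAGE_SIZE:
        raise ValueError("address is outside the mapped BAR")
    return off

def machine_address(gpa):
    """Machine address behind the guest address, per the memory_map line."""
    return (MFN_BASE << PAGE_SHIFT) + bar_offset(gpa)

off = bar_offset(FAULT_GPA)
table_end = MSIX_TABLE_OFFSET + MSIX_ENTRIES * MSIX_ENTRY_SIZE
in_table = MSIX_TABLE_OFFSET <= off < table_end
same_page = off // PAGE_SIZE == MSIX_TABLE_OFFSET // PAGE_SIZE

print("BAR offset:", hex(off))
print("machine address:", hex(machine_address(FAULT_GPA)))
print("inside MSI-X table:", in_table)
print("same 4K page as MSI-X table:", same_page)
```

The offset comes out as 0x40, matching the low bits of the PV fault address ffffc90001cd0040, so both modes seem to trip on the same register write. That offset is outside the single 16-byte MSI-X entry but within the same 4K page, which may be a useful detail to include when asking about this on xen-devel.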



