msi pci pass-through error with new Qualcomm AX500

Rama McIntosh

Jan 10, 2021, 1:37:18 AM
to qubes-devel

Hi,

Dell soldered the new AX500 PCI card to my motherboard (HCL https://groups.google.com/g/qubes-users/c/Fa65-e8vqdM).

Please let me know if a qubes-issue is a better place to track this.

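So when the driver tries to initialize msi:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/net/wireless/ath/ath11k/pci.c?h=v5.10.5#n640

It gets -28 in sys-net: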

[    5.706045] ath11k_pci 0000:00:06.0: WARNING: ath11k PCI support is experimental!
[    5.706448] ath11k_pci 0000:00:06.0: BAR 0: assigned [mem 0xf2000000-0xf20fffff 64bit]
[    5.734248] ath11k_pci 0000:00:06.0: failed to get 32 MSI vectors, only -28 available
[    5.734289] ath11k_pci 0000:00:06.0: failed to enable msi: -28
[    5.736589] ath11k_pci: probe of 0000:00:06.0 failed with error -28

The driver works fine on bare-metal Fedora 33. I tried the permissive and no-strict-reset options. I've been looking at Xen and suspect the bug is there. I did notice a patch that fixes an Intel Wi-Fi driver to work with Xen.

Any help with what needs fixing would be appreciated. Thanks.

Rama McIntosh

Jan 10, 2021, 2:25:58 AM
to qubes-devel
more info:

From xen/console/guest-sys-net-dm.log:
[2021-01-09 18:51:45] [00:06.0] xen_pt_realize: Assigning real physical device 05:00.0 to devfn 0x30
[2021-01-09 18:51:45] [00:06.0] xen_pt_register_regions: IO region 0 registered (size=0x00100000 base_addr=0xd2100000 type: 0x4)
[2021-01-09 18:51:45] [00:06.0] xen_pt_config_reg_init: Offset 0x000e mismatch! Emulated=0x0080, host=0x0000, syncing to 0x0000.
[2021-01-09 18:51:45] [00:06.0] xen_pt_config_reg_init: Offset 0x0010 mismatch! Emulated=0x0000, host=0xd2100004, syncing to 0xd2100004.
[2021-01-09 18:51:45] [00:06.0] xen_pt_config_reg_init: Offset 0x0042 mismatch! Emulated=0x0000, host=0x0003, syncing to 0x0003.
[2021-01-09 18:51:45] [00:06.0] xen_pt_config_reg_init: Offset 0x0074 mismatch! Emulated=0x0000, host=0x5908fc0, syncing to 0x5908fc0.
[2021-01-09 18:51:45] [00:06.0] xen_pt_config_reg_init: Offset 0x007a mismatch! Emulated=0x0000, host=0x0010, syncing to 0x0010.
[2021-01-09 18:51:45] [00:06.0] xen_pt_config_reg_init: Offset 0x0082 mismatch! Emulated=0x0000, host=0x1012, syncing to 0x1012.
[2021-01-09 18:51:45] [00:06.0] xen_pt_realize: no pin interrupt
[2021-01-09 18:51:45] [00:06.0] xen_pt_realize: Real physical device 05:00.0 registered successfully



lspci from dom0:

05:00.0 Network controller: Qualcomm Device 1101 (rev 01)
    Subsystem: Rivet Networks Device a501
    Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Region 0: Memory at d2100000 (64-bit, non-prefetchable) [size=1M]
    Capabilities: [40] Power Management version 3
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
        Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [50] MSI: Enable- Count=1/32 Maskable+ 64bit-
        Address: 00000000  Data: 0000
        Masking: 00000000  Pending: 00000000
    Capabilities: [70] Express (v2) Endpoint, MSI 00
        DevCap:    MaxPayload 128 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
            ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 10.000W
        DevCtl:    CorrErr- NonFatalErr- FatalErr- UnsupReq-
            RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
            MaxPayload 128 bytes, MaxReadReq 512 bytes
        DevSta:    CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
        LnkCap:    Port #0, Speed 8GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <1us, L1 <64us
            ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
        LnkCtl:    ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta:    Speed 5GT/s (downgraded), Width x1 (ok)
            TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, NROPrPrP-, LTR+
             10BitTagComp-, 10BitTagReq-, OBFF Not Supported, ExtFmt-, EETLPPrefix-
             EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
             FRS-, TPHComp+, ExtTPHComp-
             AtomicOpsCap: 32bit- 64bit- 128bitCAS-
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled
             AtomicOpsCtl: ReqEn-
        LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
             Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
             Compliance De-emphasis: -6dB
        LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
             EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
    Capabilities: [100 v2] Advanced Error Reporting
        UESta:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UEMsk:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UESvrt:    DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
        CESta:    RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
        CEMsk:    RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
        AERCap:    First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
            MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
        HeaderLog: 00000000 00000000 00000000 00000000
    Capabilities: [148 v1] Secondary PCI Express
        LnkCtl3: LnkEquIntrruptEn-, PerformEqu-
        LaneErrStat: 0
    Capabilities: [158 v1] Transaction Processing Hints
        No steering table available
    Capabilities: [1e4 v1] Latency Tolerance Reporting
        Max snoop latency: 0ns
        Max no snoop latency: 0ns
    Capabilities: [1ec v1] L1 PM Substates
        L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
              PortCommonModeRestoreTime=70us PortTPowerOnTime=0us
        L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
               T_CommonMode=0us LTR1.2_Threshold=0ns
        L1SubCtl2: T_PwrOn=10us
    Kernel driver in use: pciback
    Kernel modules: ath11k_pci

lspci on sys-net:

00:06.0 Network controller: Qualcomm Device 1101 (rev 01)
    Subsystem: Rivet Networks Device a501
    Physical Slot: 6
    Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Region 0: Memory at f2000000 (64-bit, non-prefetchable) [size=1M]
    Capabilities: [40] Power Management version 3
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
        Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit-
        Address: 00000000  Data: 0000
    Capabilities: [70] Express (v2) Endpoint, MSI 00
        DevCap:    MaxPayload 128 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
            ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 10.000W
        DevCtl:    CorrErr- NonFatalErr- FatalErr- UnsupReq-
            RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
            MaxPayload 128 bytes, MaxReadReq 512 bytes
        DevSta:    CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
        LnkCap:    Port #0, Speed 8GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <1us, L1 <64us
            ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
        LnkCtl:    ASPM Disabled; RCB 64 bytes, Disabled- CommClk-
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta:    Speed 5GT/s (downgraded), Width x1 (ok)
            TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        DevCap2: Completion Timeout: Range ABCD, TimeoutDis+ NROPrPrP- LTR+
             10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
             EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
             FRS- TPHComp+ ExtTPHComp-
             AtomicOpsCap: 32bit- 64bit- 128bitCAS-
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- OBFF Disabled,
             AtomicOpsCtl: ReqEn-
        LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
             EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
             Retimer- 2Retimers- CrosslinkRes: unsupported
    Kernel modules: ath11k_pci


Marek Marczykowski-Górecki

Jan 10, 2021, 11:06:06 AM
to Rama McIntosh, qubes-devel
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

On Sat, Jan 09, 2021 at 11:25:57PM -0800, Rama McIntosh wrote:
> On Saturday, January 9, 2021 at 8:37:18 PM UTC-10 Rama McIntosh wrote:
> > Dell soldered the new AX500 pci card to my motherboard (HCL
> > https://groups.google.com/g/qubes-users/c/Fa65-e8vqdM).
> >
> > Please let me know if a qubes-issue is a better place to track this.
> >
> > So when the driver tries to initialize msi:
> > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/net/wireless/ath/ath11k/pci.c?h=v5.10.5#n640
> >
> > It gets -28 in sys-net:
> >
> > [ 5.706045] ath11k_pci 0000:00:06.0: WARNING: ath11k PCI support is
> > experimental!
> > [ 5.706448] ath11k_pci 0000:00:06.0: BAR 0: assigned [mem
> > 0xf2000000-0xf20fffff 64bit]
> > [ 5.734248] ath11k_pci 0000:00:06.0: failed to get 32 MSI vectors, only
> > -28 available
> > [ 5.734289] ath11k_pci 0000:00:06.0: failed to enable msi: -28
> > [ 5.736589] ath11k_pci: probe of 0000:00:06.0 failed with error -28

- -28 is ENOSPC (No space left on device). Likely too many MSI vectors
were requested. Let's see below.
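
The errno lookup can be double-checked with Python's standard library. This is a minimal sketch, assuming a Linux system (errno numbers are platform-specific); `describe_errno` is a hypothetical helper name, not part of any existing tool.

```python
import errno
import os


def describe_errno(ret: int) -> str:
    """Turn a kernel-style negative return value such as -28 into 'NAME (message)'."""
    code = abs(ret)
    # errno.errorcode maps numeric values to symbolic names on this platform.
    name = errno.errorcode.get(code, "unknown")
    return f"{name} ({os.strerror(code)})"


# The probe failure in the dmesg output above reports error -28.
print(describe_errno(-28))
```

The same helper works for any other negative return code that shows up in dmesg.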

> > The driver works fine in bare metal fedora 33. I tried permissive and
> > no-strict-reset. I've been looking at xen and suspect the bug is there.
> > I did notice a patch to fix a intel wifi driver to work with xen.
> >
> > Any help of what needs fixing would be appreciated. Thanks.
> more info:
>
> From xen/console/guest-sys-net-dm.log:
> [2021-01-09 18:51:45] [00:06.0] xen_pt_realize: Assigning real physical
> device 05:00.0 to devfn 0x30
> [2021-01-09 18:51:45] [00:06.0] xen_pt_register_regions: IO region 0
> registered (size=0x00100000 base_addr=0xd2100000 type: 0x4)
> [2021-01-09 18:51:45] [00:06.0] xen_pt_config_reg_init: Offset 0x000e
> mismatch! Emulated=0x0080, host=0x0000, syncing to 0x0000.
> [2021-01-09 18:51:45] [00:06.0] xen_pt_config_reg_init: Offset 0x0010
> mismatch! Emulated=0x0000, host=0xd2100004, syncing to 0xd2100004.
> [2021-01-09 18:51:45] [00:06.0] xen_pt_config_reg_init: Offset 0x0042
> mismatch! Emulated=0x0000, host=0x0003, syncing to 0x0003.
> [2021-01-09 18:51:45] [00:06.0] xen_pt_config_reg_init: Offset 0x0074
> mismatch! Emulated=0x0000, host=0x5908fc0, syncing to 0x5908fc0.
> [2021-01-09 18:51:45] [00:06.0] xen_pt_config_reg_init: Offset 0x007a
> mismatch! Emulated=0x0000, host=0x0010, syncing to 0x0010.
> [2021-01-09 18:51:45] [00:06.0] xen_pt_config_reg_init: Offset 0x0082
> mismatch! Emulated=0x0000, host=0x1012, syncing to 0x1012.
> [2021-01-09 18:51:45] [00:06.0] xen_pt_realize: no pin interrupt
> [2021-01-09 18:51:45] [00:06.0] xen_pt_realize: Real physical device
> 05:00.0 registered successfully

There may be some hint above, but I haven't tried to decode it yet
(specifically - match offsets to specific capabilities you list below).
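
One rough way to do that matching is sketched below. It assumes each capability runs from its start offset ([40], [50], [70] in the dom0 lspci listing) up to the start of the next one, which is a simplification of the real capability sizes; `owning_capability` is a hypothetical helper.

```python
# Offsets reported as "mismatch" in guest-sys-net-dm.log.
MISMATCH_OFFSETS = [0x0E, 0x10, 0x42, 0x74, 0x7A, 0x82]

# Capability start offsets as printed by lspci in dom0.
# Anything below 0x40 belongs to the standard PCI configuration header.
CAPABILITIES = {
    0x00: "Standard header",
    0x40: "Power Management",
    0x50: "MSI",
    0x70: "PCI Express",
}


def owning_capability(offset):
    """Return (capability name, offset relative to its start) for a config-space offset.

    Assumption: a capability extends until the next capability begins.
    """
    start = max(s for s in CAPABILITIES if s <= offset)
    return CAPABILITIES[start], offset - start


for off in MISMATCH_OFFSETS:
    name, rel = owning_capability(off)
    print(f"0x{off:04x}: {name} +0x{rel:02x}")
```

Under that assumption, none of the logged offsets falls inside the MSI capability at 0x50, so the mismatch messages alone probably do not explain the MSI failure.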

> lspci from dom0:
>
> 05:00.0 Network controller: Qualcomm Device 1101 (rev 01)
> Subsystem: Rivet Networks Device a501
> Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr-
> Stepping- SERR- FastB2B- DisINTx+
> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
> <MAbort- >SERR- <PERR- INTx-
> Region 0: Memory at d2100000 (64-bit, non-prefetchable) [size=1M]

(...)

> Capabilities: [50] MSI: Enable- Count=1/32 Maskable+ 64bit-
> Address: 00000000 Data: 0000
> Masking: 00000000 Pending: 00000000

This looks interesting, specifically "Count=1/32". I haven't seen many
other devices with MSI (but not MSI-X) that have more than one vector.

> lspci on sys-net:
>
> 00:06.0 Network controller: Qualcomm Device 1101 (rev 01)
> Subsystem: Rivet Networks Device a501
> Physical Slot: 6
> Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr-
> Stepping- SERR- FastB2B- DisINTx-
> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
> <MAbort- >SERR- <PERR- INTx-
> Region 0: Memory at f2000000 (64-bit, non-prefetchable) [size=1M]

(...)

> Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit-
> Address: 00000000 Data: 0000

And here it is presented as "Count=1/1". This looks very much related.
It may be caused by some of the masking listed in the qemu output
(guest-sys-net-dm.log file), but it could also be somewhere else. Either
way, the most likely place responsible is qemu.
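
The two lspci lines can also be compared mechanically. This is a small sketch that assumes the `lspci -vv` line format shown in this thread; `msi_counts` is a hypothetical helper, and the value 32 comes from the "failed to get 32 MSI vectors" log line.

```python
import re

# Matches e.g. "MSI: Enable- Count=1/32 Maskable+ 64bit-"
MSI_LINE = re.compile(r"MSI: Enable[+-] Count=(?P<enabled>\d+)/(?P<capable>\d+)")


def msi_counts(lspci_line):
    """Return (vectors currently enabled, vectors the device advertises)."""
    m = MSI_LINE.search(lspci_line)
    if m is None:
        raise ValueError("not an MSI capability line")
    return int(m["enabled"]), int(m["capable"])


dom0 = "Capabilities: [50] MSI: Enable- Count=1/32 Maskable+ 64bit-"
sys_net = "Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit-"

wanted = 32  # what ath11k_pci asks for, per the dmesg output
host_capable = msi_counts(dom0)[1]
guest_capable = msi_counts(sys_net)[1]

print(f"host advertises {host_capable}, guest sees {guest_capable}, driver wants {wanted}")
if guest_capable < wanted <= host_capable:
    print("the hardware could satisfy the request, but the guest view cannot")
```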

So, I've looked into qemu sources and indeed, I've found this comment[1]:

    /* Currently no support for multi-vector */
    if (*val & PCI_MSI_FLAGS_QSIZE) {
        XEN_PT_WARN(&s->dev, "Tries to set more than 1 vector ctrl %x\n", *val);
    }

Later in the code you can find also relevant register definition:

    /* Message Control reg */
    {
        .offset     = PCI_MSI_FLAGS,
        .size       = 2,
        .init_val   = 0x0000,
        .res_mask   = 0xFE00,
        .ro_mask    = 0x018E,
        .emu_mask   = 0x017E,
        .init       = xen_pt_msgctrl_reg_init,
        .u.w.read   = xen_pt_word_reg_read,
        .u.w.write  = xen_pt_msgctrl_reg_write,
    },

Multi-vector is covered by bits 4-6 (mask 0x70)[2], and you can see above
that it's emulated (emu_mask), which means qemu provides its own values
instead of passing them through from the hardware; the values for those
3 bits are 0 (init_val).
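
The mask arithmetic can be spelled out as follows. This is a sketch, not qemu's actual code path. The field masks follow the MSI Message Control layout, with names borrowed from Linux's pci_regs.h. Bits 4-6 (0x70) hold the enabled vector count, and bits 1-3 (0x0E) hold the advertised count that lspci prints after the slash. The `guest_view` merge rule is a simplified assumption, and the host value 0x010A is reconstructed from the dom0 lspci line rather than read from hardware.

```python
# MSI Message Control fields (names as in Linux's PCI_MSI_FLAGS_* constants).
MSI_FIELDS = {
    "ENABLE": 0x0001,   # bit 0: MSI enable
    "QMASK": 0x000E,    # bits 1-3: log2 of vectors the device advertises
    "QSIZE": 0x0070,    # bits 4-6: log2 of vectors the OS enabled
    "64BIT": 0x0080,    # bit 7: 64-bit address capable
    "MASKBIT": 0x0100,  # bit 8: per-vector masking capable
}

EMU_MASK = 0x017E  # from the register definition quoted above
INIT_VAL = 0x0000


def emulated_fields(emu_mask):
    """Names of the fields that lie entirely inside emu_mask."""
    return [name for name, mask in MSI_FIELDS.items() if emu_mask & mask == mask]


def field(value, mask):
    """Extract a field by shifting it down by the position of the mask's lowest bit."""
    shift = (mask & -mask).bit_length() - 1
    return (value & mask) >> shift


def guest_view(host_value, emu_mask=EMU_MASK, init_val=INIT_VAL):
    """Assumed model: emulated bits come from init_val, the rest from the host."""
    return (host_value & ~emu_mask) | (init_val & emu_mask)


# Reconstructed from dom0 lspci: Enable-, Count=1/32, Maskable+, 64bit-.
HOST_MSGCTRL = 0x010A

guest = guest_view(HOST_MSGCTRL)
print("emulated fields:", emulated_fields(EMU_MASK))
print("host advertises:", 1 << field(HOST_MSGCTRL, MSI_FIELDS["QMASK"]), "vectors")
print("guest sees:", 1 << field(guest, MSI_FIELDS["QMASK"]), "vector(s)")
```

Under this model both the advertised-count and enabled-count fields are emulated to zero, which is consistent with sys-net showing "Count=1/1 Maskable-".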

I'm not sure how hard it would be to implement multi-vector support here,
but it's clearly not there.

[1] https://github.com/qemu/qemu/blob/master/hw/xen/xen_pt_config_init.c#L1104
[2] https://wiki.osdev.org/PCI#Enabling_MSI

- --
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
-----BEGIN PGP SIGNATURE-----

iQEzBAEBCAAdFiEEhrpukzGPukRmQqkK24/THMrX1ywFAl/7JeMACgkQ24/THMrX
1yxE4wf/bsj5hMkzKUatMHjUf+StfA0MObQFtrkStdG3XEynbJQgC/yNXkJiXokD
r7HGG6pxob+/2mNFc+OmMAvOIO7yLhew2EEiLGDiDkyPGIZsc92IdKE7Wijzh5Kf
6R/jrRf/lQMBD3CBrY8FaRpsqTBGp1uhxpeAmsf6Qjdh7bO1kjPzY08xmuoWf7gu
szSu/iStRuBo0irtFZL5W7DmcMrbtzq/sfnJzqJD5bxz4lqrzl5NQbgeJhP+nGWq
Ht0kSVvPxXMa9e/VMMQWLS+9ZElEGeDWhJw4f1fI3+1W533G+Y7z1ONzfYE+Jz9B
NnNzXTMuuDQkb2A8vaW1S8f1xjhtnQ==
=XcGW
-----END PGP SIGNATURE-----

Rama McIntosh

Jan 11, 2021, 4:16:57 AM
to Marek Marczykowski-Górecki, qubes-devel
On Sun, Jan 10, 2021 at 6:06 AM Marek Marczykowski-Górecki <marm...@invisiblethingslab.com> wrote:

(...)

>     Capabilities: [50] MSI: Enable- Count=1/32 Maskable+ 64bit-
>         Address: 00000000  Data: 0000
>         Masking: 00000000  Pending: 00000000

> This looks interesting, specifically "Count=1/32". I haven't seen many
> other devices with MSI (but not MSI-X) that has more than one vector.

Thanks, yes, it looks like you nailed the issue. The driver divides the 32 vectors among MHI, CE, WAKE, and DP, so it looks like I'll have to try to fix qemu, since I can't simply make the driver use one vector.

(...)



> So, I've looked into qemu sources and indeed, I've found this comment[1]:
>
>     /* Currently no support for multi-vector */
>     if (*val & PCI_MSI_FLAGS_QSIZE) {
>         XEN_PT_WARN(&s->dev, "Tries to set more than 1 vector ctrl %x\n", *val);
>     }
>
> Later in the code you can find also relevant register definition:
>
>     /* Message Control reg */
>     {
>         .offset     = PCI_MSI_FLAGS,
>         .size       = 2,
>         .init_val   = 0x0000,
>         .res_mask   = 0xFE00,
>         .ro_mask    = 0x018E,
>         .emu_mask   = 0x017E,
>         .init       = xen_pt_msgctrl_reg_init,
>         .u.w.read   = xen_pt_word_reg_read,
>         .u.w.write  = xen_pt_msgctrl_reg_write,
>     },
>
> multi-vector is covered by bits 4-6 (mask 0x70)[2] and you can see above
> that it's emulated (emu_mask) which means qemu provides own values
> instead of passing them from the hardware and the values for those 3 bits
> are 0 (init_val).
>
> I'm not sure how hard would be implementing multi-vector support here,
> but it's clearly not there.
>
> [1] https://github.com/qemu/qemu/blob/master/hw/xen/xen_pt_config_init.c#L1104
> [2] https://wiki.osdev.org/PCI#Enabling_MSI

(...)

Thanks Marek. Yes, the mismatch errors were from the same qemu code. When I have some downtime from work I'll study the qemu code and figure out how to add multi-vector MSI support. The comment above in the qemu code about no support for multi-vector is from 9 years ago, so I guess it is only a high priority for me :)

-Rama