Cannot reproduce documented Nitefury II-based FireSim environment due to XDMA errors

Wenzel Pünter

Aug 29, 2023, 8:18:04 AM
to FireSim
Hello everyone,

I have been fighting the toolchain for a few days now, trying to set up a single-machine on-premises FireSim 1.17.1 environment for academic research, based on a Nitefury II M2 FPGA as documented here, and I am now seeking help to get it running.
Has anyone recently built such an environment successfully? If so, with which setup?

The FPGA is attached to the internal M2 slot of a Lenovo M720T with an i5-9400, 32GiB RAM, and a 4TB SATA SSD, running a clean install of Ubuntu (Desktop) 20.04.6 LTS with all updates installed, Secure Boot disabled, and stock kernel 5.12.0-82-generic. The GRUB default cmdline has been extended to include "module.sig_enforce=0". The FPGA is connected via a Xilinx HS2 cable to a USB2 port of the motherboard.

After preparing the environment, I installed Vivado Lab Edition 2023.1_0507_1903 with cable drivers, built the XDMA driver at git commit 0e8d321 with the modified `XDMA_ENGINE_XFER_MAX_DESC = 16` value in `libxdma.h`, built the XVSEC driver, and programmed the FPGA in Vivado with the pre-built firesim.mcs memory image downloaded from here (as linked here). The FPGA then exposes a PCI device with two memory regions:

  wenzel@fpgabox:~$ lspci -vvv -d 10ee:903f
  01:00.0 Processing accelerators: Xilinx Corporation Device 903f
      Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
      Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
      Interrupt: pin A routed to IRQ 16
      Region 0: Memory at b2000000 (32-bit, non-prefetchable) [virtual] [size=32M]
      Region 2: Memory at b0000000 (64-bit, prefetchable) [virtual] [size=64K]
      Capabilities: <access denied>
      Kernel modules: xdma

The XDMA driver, however, cannot identify the config BAR:

  [ 42.103771] xdma: loading out-of-tree module taints kernel.
  [ 42.148764] xdma: module verification failed: signature and/or required key missing - tainting kernel
  [ 42.154696] xdma:xdma_mod_init: Xilinx XDMA Reference Driver xdma v2020.2.2
  [ 42.154699] xdma:xdma_mod_init: desc_blen_max: 0xfffffff/268435455, timeout: h2c 10 c2h 10 sec.
  [ 42.154724] xdma:xdma_device_open: xdma device 0000:01:00.0, 0x00000000deb1791b.
  [ 42.154732] xdma 0000:01:00.0: enabling device (0000 -> 0002)
  [ 42.154843] xdma:map_single_bar: BAR0 at 0xb2000000 mapped at 0x000000008d104a9b, length=33554432(/33554432)
  [ 42.154855] xdma:map_single_bar: BAR2 at 0xb0000000 mapped at 0x00000000420b7988, length=65536(/65536)
  [ 42.154858] xdma:map_bars: Failed to detect XDMA config BAR
  [ 42.155625] pcieport 0000:00:1b.0: DPC: containment event, status:0x1f01 source:0x0000
  [ 42.155628] pcieport 0000:00:1b.0: DPC: unmasked uncorrectable error detected
  [ 42.160232] usbcore: registered new interface driver ftdi_sio
  [ 42.160441] usbserial: USB Serial support registered for FTDI USB Serial Device
  [ 42.160816] ftdi_sio 1-7:1.0: FTDI USB Serial Device converter detected
  [ 42.163750] usb 1-7: Detected FT232H
  [ 42.176688] xdma:probe_one: pdev 0x00000000deb1791b, err -22.
  [ 42.176693] xdma:xpdev_free: xpdev 0x000000004a4d682d, destroy_interfaces, xdev 0x0000000000000000.
  [ 42.176695] xdma:xpdev_free: xpdev 0x000000004a4d682d, xdev 0x0000000000000000 xdma_device_close.
  [ 42.176697] xdma: probe of 0000:01:00.0 failed with error -22
  [ 42.177240] pci 0000:01:00.0: AER: can't recover (no error_detected callback)
  [ 42.183157] usb 1-7: FTDI USB Serial Device converter now attached to ttyUSB0
  [ 42.316683] pcieport 0000:00:1b.0: AER: device recovery failed
  [ 118.432521] ftdi_sio ttyUSB0: FTDI USB Serial Device converter now disconnected from ttyUSB0
  [ 118.432534] ftdi_sio 1-7:1.0: device disconnected
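
For anyone comparing their own kernel log against the one above, here is a minimal Python sketch that pulls out the two lines that matter: the "Failed to detect XDMA config BAR" message and the final probe failure with its error code. The function name `summarize_xdma_log` and the returned dictionary layout are made up for illustration; only the two message strings are taken from the log above.

```python
import errno
import re

# Message formats copied from the dmesg excerpt in this thread.
PROBE_FAIL = re.compile(r"xdma: probe of (\S+) failed with error (-?\d+)")
CONFIG_BAR_FAIL = "Failed to detect XDMA config BAR"


def summarize_xdma_log(lines):
    """Collect XDMA probe failures from an iterable of dmesg lines.

    Hypothetical helper: returns whether the config BAR message was seen,
    plus a list of (BDF, error code, errno name) tuples.
    """
    summary = {"config_bar_missing": False, "probe_failures": []}
    for line in lines:
        if CONFIG_BAR_FAIL in line:
            summary["config_bar_missing"] = True
        match = PROBE_FAIL.search(line)
        if match:
            code = int(match.group(2))
            # Kernel drivers report errors as negative errno values.
            name = errno.errorcode.get(abs(code), "unknown")
            summary["probe_failures"].append((match.group(1), code, name))
    return summary
```

Feeding it the output of `dmesg` (for example via `subprocess.run(["dmesg"], capture_output=True, text=True).stdout.splitlines()`) should show error -22 as `EINVAL`, which is simply how the driver reports that it found no usable config BAR.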

If I ignore this error and follow the guide further, the FireSim toolchain cannot identify the FPGA in `firesim enumeratefpgas` because of the XDMA failure. These are the places where I deviated from the guide:
- initializing the conda .bashrc modifications manually via `conda init bash` because it never asks
- setting `default_hw_config` to `nitefury_firesim_rocket_singlecore_no_nic` because it is not generated this way by `firesim managerinit --platform rhsresearch_nitefury_ii`

A snippet of the `firesim enumeratefpgas` output (the full output is here):
2023-08-29 14:02:56,191 [flush       ] [DEBUG]  [localhost] out:
2023-08-29 14:02:56,191 [flush       ] [DEBUG]  [localhost] out: stderr:
2023-08-29 14:02:56,191 [flush       ] [DEBUG]  [localhost] out:
2023-08-29 14:02:56,191 [flush       ] [DEBUG]  [localhost] out: :INFO: Disconnecting BDF: 01:00.0
2023-08-29 14:02:56,227 [flush       ] [DEBUG]  [localhost] out: :INFO: Programming Digilent/210249B8751C with /home/wenzel/FIRESIM_RUNS_DIR/enumerate_fpgas_staging/rhsresearch_nitefury_ii/firesim.bit
2023-08-29 14:03:03,665 [flush       ] [DEBUG]  [localhost] out: :INFO: Reconnecting BDF: 01:00.0
2023-08-29 14:03:03,794 [flush       ] [DEBUG]  [localhost] out: :INFO: Running check fingerprint driver call with 01:00.0
2023-08-29 14:03:03,906 [flush       ] [DEBUG]  [localhost] out: :WARNING: Running the driver failed...
2023-08-29 14:03:03,906 [flush       ] [DEBUG]  [localhost] out: :DEBUG: bdf: 01:00.0 bus_id: 01
2023-08-29 14:03:03,906 [flush       ] [DEBUG]  [localhost] out: stdout:
2023-08-29 14:03:03,906 [flush       ] [DEBUG]  [localhost] out: Domain ID not specified. Assuming Domain ID 0
2023-08-29 14:03:03,906 [flush       ] [DEBUG]  [localhost] out: Device ID not specified. Assuming Device ID 0
2023-08-29 14:03:03,906 [flush       ] [DEBUG]  [localhost] out: Function ID not specified. Assuming Function ID 0
2023-08-29 14:03:03,906 [flush       ] [DEBUG]  [localhost] out: BAR ID not specified. Assuming BAR ID 0
2023-08-29 14:03:03,906 [flush       ] [DEBUG]  [localhost] out: PCI Vendor ID not specified. Assuming PCI Vendor ID 0x10ee
2023-08-29 14:03:03,906 [flush       ] [DEBUG]  [localhost] out: PCI Device ID not specified. Assuming PCI Device ID 0x903f
2023-08-29 14:03:03,906 [flush       ] [DEBUG]  [localhost] out: FireSim-rhsresearch_nitefury_ii: /home/wenzel/firesim/sim/midas/src/main/cc/simif_rhsresearch_nitefury_ii.cc:268: void simif_xilinx_alveo_u250_t::fpga_setup(uint16_t, uint8_t, uint8_t, uint8_t, uint8_t, uint16_t, uint16_t): Assertion `xdma_id != -1' failed.
2023-08-29 14:03:03,907 [flush       ] [DEBUG]  [localhost] out:
2023-08-29 14:03:03,907 [flush       ] [DEBUG]  [localhost] out: stderr:
2023-08-29 14:03:03,907 [flush       ] [DEBUG]  [localhost] out:
2023-08-29 14:03:03,907 [flush       ] [DEBUG]  [localhost] out: :ERROR: Unable to determine BDF for Digilent/210249B8751C FPGA. Something went wrong
2023-08-29 14:03:03,911 [flush       ] [DEBUG]  [localhost] out:
2023-08-29 14:03:03,913 [flush       ] [INFO ]  Fatal error: run() received nonzero return code 1 while executing!
2023-08-29 14:03:03,913 [flush       ] [INFO ]  Requested: ./scripts/generate-fpga-db.py --bitstream /home/wenzel/FIRESIM_RUNS_DIR/enumerate_fpgas_staging/rhsresearch_nitefury_ii/firesim.bit --driver /home/wenzel/FIRESIM_RUNS_DIR/enumerate_fpgas_staging/FireSim-rhsresearch_nitefury_ii --out-db-json /opt/firesim-db.json
2023-08-29 14:03:03,913 [flush       ] [INFO ]  Executed: /bin/bash -l -c "cd /home/wenzel/FIRESIM_RUNS_DIR/enumerate_fpgas_staging >/dev/null && ./scripts/generate-fpga-db.py --bitstream /home/wenzel/FIRESIM_RUNS_DIR/enumerate_fpgas_staging/rhsresearch_nitefury_ii/firesim.bit --driver /home/wenzel/FIRESIM_RUNS_DIR/enumerate_fpgas_staging/FireSim-rhsresearch_nitefury_ii --out-db-json /opt/firesim-db.json"
2023-08-29 14:03:03,913 [flush       ] [INFO ]  Aborting.
2023-08-29 14:03:03,932 [flush       ] [INFO ]  Fatal error: One or more hosts failed while executing task 'enumerate_fpgas_node_wrapper'
2023-08-29 14:03:03,932 [flush       ] [INFO ]  Aborting.
2023-08-29 14:03:03,936 [<module>    ] [ERROR]  Fatal error.


If I understand the issue correctly, it seems to be a bug in the XDMA driver or in the XDMA IP used by the precompiled bitstream, as also discussed here. With the debug statements in the XDMA driver enabled, the register values read from both BARs are all 0xF, indicating that the IP block was not found on the AXI lite bus.

Does anyone have an idea how to fix this, or under which conditions the guide works?
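
To make the "all 0xF" observation concrete, here is a rough Python sketch of the kind of check the driver performs when it decides whether a BAR is the config BAR. The identifier constants and the mask are quoted from memory of the Xilinx reference driver's `libxdma` sources and are assumptions here, so please verify them against your own checkout; the function names are hypothetical.

```python
# Assumed values, recalled from the XDMA reference driver; verify before relying on them.
IRQ_BLOCK_ID = 0x1FC20000     # expected upper bits of the interrupt block identifier
CONFIG_BLOCK_ID = 0x1FC30000  # expected upper bits of the config block identifier
ID_MASK = 0xFFFF0000          # only the upper 16 bits identify the block


def looks_like_config_bar(irq_id: int, cfg_id: int) -> bool:
    """Return True if both identifier registers carry the expected block IDs."""
    return (irq_id & ID_MASK) == IRQ_BLOCK_ID and (cfg_id & ID_MASK) == CONFIG_BLOCK_ID


def reads_as_unmapped(value: int) -> bool:
    """All-ones is what a failed 32-bit PCIe read typically returns."""
    return value == 0xFFFFFFFF
```

With both registers reading 0xffffffff, the masked comparison can never succeed, so the driver gives up on every BAR. An all-ones value usually means the read itself did not complete, which points at the link or the design behind the PCIe block rather than at the BAR numbering.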

Thanks,
Wenzel

--
Wenzel Pünter
Doctoral Student
CTI Lab, HPI

Wenzel Pünter

Sep 1, 2023, 8:42:18 AM
to FireSim
If anyone has the same issue: it turned out that the M2 slot used in this hardware configuration was served by only one PCIe lane. While the Xilinx PCIe IP block itself can communicate over just one lane, it cannot forward the 128bit AXI lite bus in this configuration. Switching to different hardware (in my case a Supermicro server board) that provides four PCIe lanes to the M2 card resolved the issue.
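
A quick way to check for this condition is to compare the negotiated link width with the maximum the card advertises. The sketch below reads the standard Linux sysfs attributes `current_link_width` and `max_link_width`; the function name `read_link_width` and the `sysfs_root` parameter are hypothetical and only there so the code can be pointed at a different directory.

```python
from pathlib import Path


def read_link_width(bdf, sysfs_root="/sys/bus/pci/devices"):
    """Return (current, maximum) PCIe link width for a device such as '0000:01:00.0'."""
    device_dir = Path(sysfs_root) / bdf
    current = int((device_dir / "current_link_width").read_text().strip())
    maximum = int((device_dir / "max_link_width").read_text().strip())
    return current, maximum


def link_is_narrower_than_card(bdf, sysfs_root="/sys/bus/pci/devices"):
    """True when fewer lanes were negotiated than the card itself supports."""
    current, maximum = read_link_width(bdf, sysfs_root)
    return current < maximum
```

On the original machine I would expect `read_link_width("0000:01:00.0")` to report a current width of 1 against a maximum of 4, although I have not re-run it there. A full-width link does not by itself rule out other causes.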

Gon Solo

Apr 15, 2024, 7:02:50 AM
to FireSim
Wenzel Pünter wrote on Friday, September 1, 2023 at 14:42:18 UTC+2:
If anyone has the same issue: it turned out that the M2 slot used in this hardware configuration was served by only one PCIe lane. While the Xilinx PCIe IP block itself can communicate over just one lane, it cannot forward the 128bit AXI lite bus in this configuration. Switching to different hardware (in my case a Supermicro server board) that provides four PCIe lanes to the M2 card resolved the issue.

I have the same issue. However, my M2 slot seems to have four lanes:
Output of "lspci -vvv -d 10ee:903f|grep Width":

LnkCap: Port #0, Speed 5GT/s, Width x4, ASPM L0s, Exit Latency L0s unlimited
LnkSta: Speed 5GT/s, Width x4
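
For reference, the two lines above can also be compared programmatically. This is a small Python sketch that parses `LnkCap` and `LnkSta` lines from `lspci -vvv` output; the function names are hypothetical, and the regular expression is only written against the line format shown in this thread.

```python
import re

# Matches lines such as "LnkSta: Speed 5GT/s, Width x4" (LnkCap2/LnkSta2 are skipped).
LINK_LINE = re.compile(r"^\s*(LnkCap|LnkSta):.*?Speed\s+([\d.]+)GT/s.*?Width\s+x(\d+)")


def parse_link_lines(text):
    """Return {'LnkCap': {...}, 'LnkSta': {...}} for the link lines found in text."""
    result = {}
    for line in text.splitlines():
        match = LINK_LINE.match(line)
        if match:
            result[match.group(1)] = {
                "speed_gts": float(match.group(2)),
                "width": int(match.group(3)),
            }
    return result


def link_is_degraded(text):
    """True if the negotiated width or speed is below what the device is capable of."""
    links = parse_link_lines(text)
    cap, sta = links["LnkCap"], links["LnkSta"]
    return sta["width"] < cap["width"] or sta["speed_gts"] < cap["speed_gts"]
```

For the output above, both capability and status are 5GT/s at x4, so the link is not degraded and the single-lane explanation does not apply here.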

Is there anything I can do to debug this further?

Gon Solo

Apr 23, 2024, 6:47:03 AM
to FireSim
When I flash the FPGA with the project that ships with the Nitefury II, the "config BAR" is found and fury_test.sh works as expected.

I also tried building a new FireSim bitstream and using that. The result is always the same:

[  +0.000004] xdma:is_config_bar: BAR 2 is NOT the XDMA config BAR: 0xffffffff, 0xffffffff.
[  +0.000003] xdma:map_bars: Failed to detect XDMA config BAR
[  +0.032838] xdma:probe_one: pdev 0x0000000071e847b7, err -22.
[  +0.000003] xdma:xpdev_free: xpdev 0x00000000d1c759ee, destroy_interfaces, xdev 0x0000000000000000.
[  +0.000002] xdma:xpdev_free: xpdev 0x00000000d1c759ee, xdev 0x0000000000000000 xdma_device_close.
[  +0.000002] xdma:xdma_device_close: pdev 0x0000000071e847b7, xdev 0x0000000000000000.
[  +0.000002] xdma: probe of 0000:08:00.0 failed with error -22

Any advice?