Hello everyone,
I have been fighting the toolchain for a few days now, trying to set up a single-machine, on-premises FireSim 1.17.1 environment for academic research based on a NiteFury II M.2 FPGA as documented
here, and I am now seeking help to get it running.
Has anyone recently built such an environment successfully? With which setup?
The FPGA is attached to the internal M.2 slot of a Lenovo M720T with an i5-9400, 32 GiB RAM, and a 4 TB SATA SSD, running a clean install of Ubuntu (Desktop) 20.04.6 LTS with all updates installed, Secure Boot disabled, and stock kernel 5.12.0-82-generic. The GRUB default cmdline has been extended to include "module.sig_enforce=0". The FPGA is connected via a Xilinx HS2 cable to a USB 2.0 port of the motherboard.
After preparing the environment, I installed Vivado Lab Edition 2023.1_0507_1903 with cable drivers, built the XDMA driver from git commit 0e8d321 with the modified `XDMA_ENGINE_XFER_MAX_DESC = 16` value in `libxdma.h`, built the XVSEC driver, and programmed the FPGA in Vivado with the prebuilt firesim.mcs memory image downloaded from
here as linked
here. The FPGA then exposes a PCI device with two memory regions:
wenzel@fpgabox:~$ lspci -vvv -d 10ee:903f
01:00.0 Processing accelerators: Xilinx Corporation Device 903f
Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Interrupt: pin A routed to IRQ 16
Region 0: Memory at b2000000 (32-bit, non-prefetchable) [virtual] [size=32M]
Region 2: Memory at b0000000 (64-bit, prefetchable) [virtual] [size=64K]
Capabilities: <access denied>
Kernel modules: xdma
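For anyone who wants to compare their own board against the output above, here is a minimal Python sketch that pulls the command-register flags (`I/O`, `Mem`, `BusMaster`, ...) out of the `Control:` line. The function name `parse_control_flags` is my own, and it only assumes the `lspci -vvv` text layout shown above.

```python
def parse_control_flags(lspci_text):
    """Return {flag_name: enabled} from the 'Control:' line of lspci -vvv output."""
    for line in lspci_text.splitlines():
        line = line.strip()
        if line.startswith("Control:"):
            flags = {}
            for token in line[len("Control:"):].split():
                # Each token looks like 'Mem-' or 'Cap+'; the last char is the state.
                if token[-1] in "+-":
                    flags[token[:-1]] = token.endswith("+")
            return flags
    return {}  # no Control: line found


if __name__ == "__main__":
    sample = ("Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- "
              "ParErr- Stepping- SERR- FastB2B- DisINTx-")
    flags = parse_control_flags(sample)
    print("Mem enabled:", flags["Mem"], "BusMaster enabled:", flags["BusMaster"])
```

In the output above, both `Mem` and `BusMaster` are reported as disabled at the time `lspci` was run, which was after the failed driver probe.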
The XDMA driver, however, cannot identify the config BAR:
[ 42.103771] xdma: loading out-of-tree module taints kernel.
[ 42.148764] xdma: module verification failed: signature and/or required key missing - tainting kernel
[ 42.154696] xdma:xdma_mod_init: Xilinx XDMA Reference Driver xdma v2020.2.2
[ 42.154699] xdma:xdma_mod_init: desc_blen_max: 0xfffffff/268435455, timeout: h2c 10 c2h 10 sec.
[ 42.154724] xdma:xdma_device_open: xdma device 0000:01:00.0, 0x00000000deb1791b.
[ 42.154732] xdma 0000:01:00.0: enabling device (0000 -> 0002)
[ 42.154843] xdma:map_single_bar: BAR0 at 0xb2000000 mapped at 0x000000008d104a9b, length=33554432(/33554432)
[ 42.154855] xdma:map_single_bar: BAR2 at 0xb0000000 mapped at 0x00000000420b7988, length=65536(/65536)
[ 42.154858] xdma:map_bars: Failed to detect XDMA config BAR
[ 42.155625] pcieport 0000:00:1b.0: DPC: containment event, status:0x1f01 source:0x0000
[ 42.155628] pcieport 0000:00:1b.0: DPC: unmasked uncorrectable error detected
[ 42.160232] usbcore: registered new interface driver ftdi_sio
[ 42.160441] usbserial: USB Serial support registered for FTDI USB Serial Device
[ 42.160816] ftdi_sio 1-7:1.0: FTDI USB Serial Device converter detected
[ 42.163750] usb 1-7: Detected FT232H
[ 42.176688] xdma:probe_one: pdev 0x00000000deb1791b, err -22.
[ 42.176693] xdma:xpdev_free: xpdev 0x000000004a4d682d, destroy_interfaces, xdev 0x0000000000000000.
[ 42.176695] xdma:xpdev_free: xpdev 0x000000004a4d682d, xdev 0x0000000000000000 xdma_device_close.
[ 42.176697] xdma: probe of 0000:01:00.0 failed with error -22
[ 42.177240] pci 0000:01:00.0: AER: can't recover (no error_detected callback)
[ 42.183157] usb 1-7: FTDI USB Serial Device converter now attached to ttyUSB0
[ 42.316683] pcieport 0000:00:1b.0: AER: device recovery failed
[ 118.432521] ftdi_sio ttyUSB0: FTDI USB Serial Device converter now disconnected from ttyUSB0
[ 118.432534] ftdi_sio 1-7:1.0: device disconnected
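To cross-check the BAR addresses and sizes the driver mapped against the regions reported by `lspci`, the `map_single_bar` lines can be extracted from the kernel log. This is only a sketch: the helper name `parse_mapped_bars` is hypothetical, and the regular expression assumes the exact message format shown in the log above.

```python
import re

# Matches e.g. "map_single_bar: BAR0 at 0xb2000000 mapped at 0x..., length=33554432(/33554432)"
BAR_LINE = re.compile(
    r"map_single_bar: BAR(\d+) at (0x[0-9a-fA-F]+) mapped at \S+ length=(\d+)"
)


def parse_mapped_bars(dmesg_text):
    """Return {bar_index: (bus_address, length_in_bytes)} from xdma dmesg lines."""
    bars = {}
    for match in BAR_LINE.finditer(dmesg_text):
        index, address, length = match.groups()
        bars[int(index)] = (int(address, 16), int(length))
    return bars


if __name__ == "__main__":
    log = (
        "[ 42.154843] xdma:map_single_bar: BAR0 at 0xb2000000 mapped at "
        "0x000000008d104a9b, length=33554432(/33554432)\n"
        "[ 42.154855] xdma:map_single_bar: BAR2 at 0xb0000000 mapped at "
        "0x00000000420b7988, length=65536(/65536)\n"
    )
    for index, (address, length) in sorted(parse_mapped_bars(log).items()):
        print(f"BAR{index}: address=0x{address:08x} size={length // 1024} KiB")
```

For the log above, this yields BAR0 at 0xb2000000 (32 MiB) and BAR2 at 0xb0000000 (64 KiB), matching Region 0 and Region 2 in the `lspci` output.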
If I ignore this error and continue with the guide, the FireSim toolchain cannot identify the FPGA in `firesim enumeratefpgas` because of the XDMA failure. I deviated from the guide in the following ways:
- initializing the conda .bashrc modifications manually via `conda init bash`, because the setup never prompts for it
- setting `default_hw_config` to `nitefury_firesim_rocket_singlecore_no_nic`, because `firesim managerinit --platform rhsresearch_nitefury_ii` does not generate it this way
A snippet of the `firesim enumeratefpgas` output (the full log is
here):
2023-08-29 14:02:56,191 [flush ] [DEBUG] [localhost] out:
2023-08-29 14:02:56,191 [flush ] [DEBUG] [localhost] out: stderr:
2023-08-29 14:02:56,191 [flush ] [DEBUG] [localhost] out:
2023-08-29 14:02:56,191 [flush ] [DEBUG] [localhost] out: :INFO: Disconnecting BDF: 01:00.0
2023-08-29 14:02:56,227 [flush ] [DEBUG] [localhost] out: :INFO: Programming Digilent/210249B8751C with /home/wenzel/FIRESIM_RUNS_DIR/enumerate_fpgas_staging/rhsresearch_nitefury_ii/firesim.bit
2023-08-29 14:03:03,665 [flush ] [DEBUG] [localhost] out: :INFO: Reconnecting BDF: 01:00.0
2023-08-29 14:03:03,794 [flush ] [DEBUG] [localhost] out: :INFO: Running check fingerprint driver call with 01:00.0
2023-08-29 14:03:03,906 [flush ] [DEBUG] [localhost] out: :WARNING: Running the driver failed...
2023-08-29 14:03:03,906 [flush ] [DEBUG] [localhost] out: :DEBUG: bdf: 01:00.0 bus_id: 01
2023-08-29 14:03:03,906 [flush ] [DEBUG] [localhost] out: stdout:
2023-08-29 14:03:03,906 [flush ] [DEBUG] [localhost] out: Domain ID not specified. Assuming Domain ID 0
2023-08-29 14:03:03,906 [flush ] [DEBUG] [localhost] out: Device ID not specified. Assuming Device ID 0
2023-08-29 14:03:03,906 [flush ] [DEBUG] [localhost] out: Function ID not specified. Assuming Function ID 0
2023-08-29 14:03:03,906 [flush ] [DEBUG] [localhost] out: BAR ID not specified. Assuming BAR ID 0
2023-08-29 14:03:03,906 [flush ] [DEBUG] [localhost] out: PCI Vendor ID not specified. Assuming PCI Vendor ID 0x10ee
2023-08-29 14:03:03,906 [flush ] [DEBUG] [localhost] out: PCI Device ID not specified. Assuming PCI Device ID 0x903f
2023-08-29 14:03:03,906 [flush ] [DEBUG] [localhost] out: FireSim-rhsresearch_nitefury_ii: /home/wenzel/firesim/sim/midas/src/main/cc/simif_rhsresearch_nitefury_ii.cc:268: void simif_xilinx_alveo_u250_t::fpga_setup(uint16_t, uint8_t, uint8_t, uint8_t, uint8_t, uint16_t, uint16_t): Assertion `xdma_id != -1' failed.
2023-08-29 14:03:03,907 [flush ] [DEBUG] [localhost] out:
2023-08-29 14:03:03,907 [flush ] [DEBUG] [localhost] out: stderr:
2023-08-29 14:03:03,907 [flush ] [DEBUG] [localhost] out:
2023-08-29 14:03:03,907 [flush ] [DEBUG] [localhost] out: :ERROR: Unable to determine BDF for Digilent/210249B8751C FPGA. Something went wrong
2023-08-29 14:03:03,911 [flush ] [DEBUG] [localhost] out:
2023-08-29 14:03:03,913 [flush ] [INFO ] Fatal error: run() received nonzero return code 1 while executing!
2023-08-29 14:03:03,913 [flush ] [INFO ] Requested: ./scripts/generate-fpga-db.py --bitstream /home/wenzel/FIRESIM_RUNS_DIR/enumerate_fpgas_staging/rhsresearch_nitefury_ii/firesim.bit --driver /home/wenzel/FIRESIM_RUNS_DIR/enumerate_fpgas_staging/FireSim-rhsresearch_nitefury_ii --out-db-json /opt/firesim-db.json
2023-08-29 14:03:03,913 [flush ] [INFO ] Executed: /bin/bash -l -c "cd /home/wenzel/FIRESIM_RUNS_DIR/enumerate_fpgas_staging >/dev/null && ./scripts/generate-fpga-db.py --bitstream /home/wenzel/FIRESIM_RUNS_DIR/enumerate_fpgas_staging/rhsresearch_nitefury_ii/firesim.bit --driver /home/wenzel/FIRESIM_RUNS_DIR/enumerate_fpgas_staging/FireSim-rhsresearch_nitefury_ii --out-db-json /opt/firesim-db.json"
2023-08-29 14:03:03,913 [flush ] [INFO ] Aborting.
2023-08-29 14:03:03,932 [flush ] [INFO ] Fatal error: One or more hosts failed while executing task 'enumerate_fpgas_node_wrapper'
2023-08-29 14:03:03,932 [flush ] [INFO ] Aborting.
2023-08-29 14:03:03,936 [<module> ] [ERROR] Fatal error.
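Since the full manager log is long, here is a small Python sketch for pulling out just the warnings, errors, and failed assertions. The helper name `interesting_lines` is made up, and the marker patterns are assumptions based only on the lines shown in the snippet above.

```python
import re

# Patterns taken from the snippet above: ":WARNING:", ":ERROR:", and glibc-style
# "Assertion ... failed" messages printed by the simulation driver.
MARKERS = re.compile(r":(WARNING|ERROR):|Assertion .* failed")


def interesting_lines(log_text):
    """Return the payload of every log line that carries a warning, error, or assertion."""
    found = []
    for line in log_text.splitlines():
        # Drop the manager prefix ("<timestamp> [flush ] [DEBUG] [localhost] out:") if present.
        _, separator, payload = line.partition("out:")
        message = payload.strip() if separator else line.strip()
        if MARKERS.search(message):
            found.append(message)
    return found


if __name__ == "__main__":
    log = (
        "2023-08-29 14:03:03,906 [flush ] [DEBUG] [localhost] out: "
        ":WARNING: Running the driver failed...\n"
        "2023-08-29 14:03:03,906 [flush ] [DEBUG] [localhost] out: stdout:\n"
    )
    for message in interesting_lines(log):
        print(message)
```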
If I understand the issue correctly, it seems to be a bug in the XDMA driver or in the XDMA IP as used by the precompiled bitstream, as also discussed
here. With the debug statements in the XDMA driver enabled, the register values read from both BARs are all 0xF, indicating that the IP block was not found on the AXI-Lite bus.
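To make the config BAR check concrete, here is a rough Python rendering of the comparison as I understand the driver to perform it. The offsets, block IDs, and mask are what I recall from `libxdma.h` in the Xilinx dma_ip_drivers repository, so please treat them as assumptions and verify them against the checked-out commit. The function name `classify_bar` and its return labels are my own.

```python
# Assumed values (from memory of libxdma.h; verify against your driver sources).
XDMA_OFS_INT_CTRL = 0x2000   # offset of the IRQ block identifier register
XDMA_OFS_CONFIG = 0x3000     # offset of the config block identifier register
IRQ_BLOCK_ID = 0x1FC20000
CONFIG_BLOCK_ID = 0x1FC30000
ID_MASK = 0xFFFF0000         # compare the block ID only, not the version bits


def classify_bar(irq_id, cfg_id):
    """Classify a BAR from the two 32-bit identifier registers read from it."""
    if irq_id == 0xFFFFFFFF and cfg_id == 0xFFFFFFFF:
        # All ones: the reads did not return register contents at all.
        return "no_response"
    if (irq_id & ID_MASK) == IRQ_BLOCK_ID and (cfg_id & ID_MASK) == CONFIG_BLOCK_ID:
        return "config_bar"
    return "other_bar"


if __name__ == "__main__":
    # The situation described above: both registers read back as all ones.
    print(classify_bar(0xFFFFFFFF, 0xFFFFFFFF))
```

With all-ones readback on both BARs, neither can match the expected identifiers, which is consistent with the "Failed to detect XDMA config BAR" message in the kernel log.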
Does anyone have an idea how to fix this, or know under which conditions the guide works?
Thanks,
Wenzel
--
Wenzel Pünter
Doctoral Student
CTI Lab, HPI