Hi all,
I tried to run FireSim on an Alveo U250, but firesim enumeratefpgas fails.
The lspci output looks like this:
sudo lspci -vvv -d 10ee:903f
01:00.0 Serial controller: Xilinx Corporation Device 903f (prog-if 01 [16450])
Subsystem: Xilinx Corporation Device 0007
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 127
Region 0: Memory at 42000000 (32-bit, non-prefetchable) [size=32M]
Region 1: Memory at 44000000 (32-bit, non-prefetchable) [size=64K]
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [48] MSI: Enable+ Count=1/1 Maskable- 64bit+
Address: 00000000fee40000 Data: 0021
Capabilities: [70] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 1024 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 75.000W
DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
MaxPayload 256 bytes, MaxReadReq 512 bytes
DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM not supported
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 8GT/s (ok), Width x16 (ok)
TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range BC, TimeoutDis+, NROPrPrP-, LTR-
10BitTagComp-, 10BitTagReq-, OBFF Not Supported, ExtFmt-, EETLPPrefix-
EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
FRS-, TPHComp-, ExtTPHComp-
AtomicOpsCap: 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
AtomicOpsCtl: ReqEn-
LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+, EqualizationPhase1+
EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
Capabilities: [100 v1] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
AERCap: First Error Pointer: 00, ECRCGenCap- ECRCGenEn- ECRCChkCap- ECRCChkEn-
MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
HeaderLog: 00000000 00000000 00000000 00000000
Capabilities: [1c0 v1] Secondary PCI Express
LnkCtl3: LnkEquIntrruptEn-, PerformEqu-
LaneErrStat: 0
Kernel driver in use: xdma
Kernel modules: xdma
The config_hwdb.yaml and config_build_recipes generated by firesim managerinit --platform xilinx_alveo_u250
don't contain the settings for alveo_u250_firesim_rocket_singlecore_no_nic, so I replaced those files with the configs from firesim_staging. When I then ran firesim enumeratefpgas, I got:
2024-10-31 11:52:18,377 [flush ] [DEBUG] [yu@] run: /usr/local/bin/firesim-generate-fpga-db.py --bitstream /home/yu/FIRESIM_RUNS_DIR/enumerate_fpgas_staging/xilinx_alveo_u250/firesim.bit --driver /home/yu/FIRESIM_RUNS_DIR/enumerate_fpgas_staging/FireSim-xilinx_alveo_u250 --out-db-json /opt/firesim-db.json
2024-10-31 11:52:18,473 [flush ] [DEBUG] [yu@] out: :INFO: Running: ['/usr/bin/sudo', '/usr/local/bin/firesim-generate-fpga-db.py', '--bitstream', '/home/yu/FIRESIM_RUNS_DIR/enumerate_fpgas_staging/xilinx_alveo_u250/firesim.bit', '--driver', '/home/yu/FIRESIM_RUNS_DIR/enumerate_fpgas_staging/FireSim-xilinx_alveo_u250', '--out-db-json', '/opt/firesim-db.json', '--vivado-bin', '/tools/Xilinx/Vivado_Lab/2023.1/bin/vivado_lab', '--hw-server-bin', '/tools/Xilinx/Vivado_Lab/2023.1/bin/hw_server']
2024-10-31 11:52:18,503 [flush ] [DEBUG] [yu@] out: :INFO: This script expects that all Xilinx XDMA-enabled FPGAs are programmed with the same --bitstream arg. by default (through an MCS file for bistream file)
2024-10-31 11:53:19,747 [flush ] [DEBUG] [yu@] out: :INFO: Disconnecting BDF: 01:00.0
2024-10-31 11:53:22,852 [flush ] [DEBUG] [yu@] out: :INFO: Programming Xilinx/21330836802NA with /home/yu/FIRESIM_RUNS_DIR/enumerate_fpgas_staging/xilinx_alveo_u250/firesim.bit
2024-10-31 11:53:54,819 [flush ] [DEBUG] [yu@] out: :INFO: Reconnecting BDF: 01:00.0
2024-10-31 11:53:56,834 [flush ] [DEBUG] [yu@] out: Traceback (most recent call last):
2024-10-31 11:53:56,834 [flush ] [DEBUG] [yu@] out: File "/usr/local/bin/firesim-fpga-util.py", line 206, in <module>
2024-10-31 11:53:56,840 [flush ] [DEBUG] [yu@] out: sys.exit(main(sys.argv[1:]))
2024-10-31 11:53:56,840 [flush ] [DEBUG] [yu@] out: File "/usr/local/bin/firesim-fpga-util.py", line 199, in main
2024-10-31 11:53:56,840 [flush ] [DEBUG] [yu@] out: reconnect_bus_id(bus_id)
2024-10-31 11:53:56,840 [flush ] [DEBUG] [yu@] out: File "/usr/local/bin/firesim-fpga-util.py", line 144, in reconnect_bus_id
2024-10-31 11:53:56,840 [flush ] [DEBUG] [yu@] out: assert pcielib.any_device_exists(bus_id), f"{bus_id} not visible. Check for proper rescan."
2024-10-31 11:53:56,840 [flush ] [DEBUG] [yu@] out: AssertionError: 01 not visible. Check for proper rescan.
2024-10-31 11:53:56,863 [flush ] [DEBUG] [yu@] out: :ERROR: It failed with stdout: :INFO: Writing to /sys/bus/pci/rescan: 1
2024-10-31 11:53:56,863 [flush ] [DEBUG] [yu@] out: stderr:
2024-10-31 11:53:56,871 [flush ] [DEBUG] [yu@] out:
2024-10-31 11:53:56,877 [flush ] [INFO ] Fatal error: run() received nonzero return code 1 while executing!
2024-10-31 11:53:56,878 [flush ] [INFO ] Requested: /usr/local/bin/firesim-generate-fpga-db.py --bitstream /home/yu/FIRESIM_RUNS_DIR/enumerate_fpgas_staging/xilinx_alveo_u250/firesim.bit --driver /home/yu/FIRESIM_RUNS_DIR/enumerate_fpgas_staging/FireSim-xilinx_alveo_u250 --out-db-json /opt/firesim-db.json
2024-10-31 11:53:56,878 [flush ] [INFO ] Executed: /bin/bash -l -c "cd /home/yu/FIRESIM_RUNS_DIR/enumerate_fpgas_staging >/dev/null && /usr/local/bin/firesim-generate-fpga-db.py --bitstream /home/yu/FIRESIM_RUNS_DIR/enumerate_fpgas_staging/xilinx_alveo_u250/firesim.bit --driver /home/yu/FIRESIM_RUNS_DIR/enumerate_fpgas_staging/FireSim-xilinx_alveo_u250 --out-db-json /opt/firesim-db.json"
2024-10-31 11:53:56,878 [flush ] [INFO ] Aborting.
2024-10-31 11:53:56,894 [flush ] [INFO ] Fatal error: One or more hosts failed while executing task 'enumerate_fpgas_node_wrapper'
2024-10-31 11:53:56,894 [flush ] [INFO ] Aborting.
2024-10-31 11:53:56,911 [<module> ] [ERROR] Fatal error.
Traceback (most recent call last):
File "/data/yu/chipyard/sims/firesim/deploy/firesim", line 530, in <module>
main(args)
File "/data/yu/chipyard/sims/firesim/deploy/firesim", line 469, in main
t['task'](t['config'](args))
File "/data/yu/chipyard/sims/firesim/deploy/firesim", line 324, in enumeratefpgas
runtime_conf.enumerate_fpgas()
File "/data/yu/chipyard/sims/firesim/deploy/runtools/runtime_config.py", line 1230, in enumerate_fpgas
self.firesim_topology_with_passes.enumerate_fpgas_passes(
File "/data/yu/chipyard/sims/firesim/deploy/runtools/firesim_topology_with_passes.py", line 675, in enumerate_fpgas_passes
execute(
File "/data/yu/chipyard/.conda-env/lib/python3.10/site-packages/fabric/tasks.py", line 392, in execute
error(err)
File "/data/yu/chipyard/.conda-env/lib/python3.10/site-packages/fabric/utils.py", line 357, in error
return func(message)
File "/data/yu/chipyard/.conda-env/lib/python3.10/site-packages/fabric/utils.py", line 65, in abort
raise e
SystemExit: 1
After this failure, sudo lspci -vvv -d 10ee:903f produces no output. After a warm reboot, sudo lspci -vvv -d 10ee:903f shows the device again. I tried other configs, but they always end the same way. The programming seems to be fine, but the reconnection fails, maybe because the bitstream was generated incorrectly. I'm not sure whether it's OK to replace the config files with the ones from firesim-staging.
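For anyone trying to reproduce this, here is a minimal sketch of a check for whether the card is visible on the PCI bus after the rescan, without going through lspci. It assumes the standard Linux sysfs layout (/sys/bus/pci/devices/<BDF>/vendor and device); find_pci_devices is a hypothetical helper name and is not part of FireSim or its scripts.

```python
from pathlib import Path


def find_pci_devices(vendor_id, device_id, sysfs_root="/sys"):
    """Return the BDF names of PCI devices matching the vendor/device IDs.

    sysfs_root is a parameter only so the function can be pointed at a
    fake directory tree; on a real host the default is what you want.
    """
    devices_dir = Path(sysfs_root) / "bus" / "pci" / "devices"
    matches = []
    if not devices_dir.is_dir():
        return matches
    for dev in sorted(devices_dir.iterdir()):
        try:
            # sysfs stores the IDs as hex strings such as "0x10ee".
            vendor = int((dev / "vendor").read_text().strip(), 16)
            device = int((dev / "device").read_text().strip(), 16)
        except (OSError, ValueError):
            # Skip entries that vanish mid-scan or have unreadable IDs.
            continue
        if vendor == vendor_id and device == device_id:
            matches.append(dev.name)
    return matches


if __name__ == "__main__":
    # 10ee:903f is the ID pair used in the lspci command above.
    found = find_pci_devices(0x10EE, 0x903F)
    print(found if found else "no 10ee:903f device visible")
```

Running this before enumeratefpgas and again after the failed reconnect should show whether the device has dropped off the bus entirely, which is what the empty lspci output suggests.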
cheers,
yu