U250: firesim enumeratefpgas fails

yu cui
Oct 31, 2024, 12:18:39 AM
to FireSim
Hi all,
I am trying to run FireSim on a U250, but firesim enumeratefpgas fails.
The lspci output for the card looks like this:
sudo lspci -vvv -d 10ee:903f

01:00.0 Serial controller: Xilinx Corporation Device 903f (prog-if 01 [16450])
        Subsystem: Xilinx Corporation Device 0007
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 127
        Region 0: Memory at 42000000 (32-bit, non-prefetchable) [size=32M]
        Region 1: Memory at 44000000 (32-bit, non-prefetchable) [size=64K]
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [48] MSI: Enable+ Count=1/1 Maskable- 64bit+
                Address: 00000000fee40000  Data: 0021
        Capabilities: [70] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 1024 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 75.000W
                DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
                        RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 256 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
                LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM not supported
                        ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 8GT/s (ok), Width x16 (ok)
                        TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range BC, TimeoutDis+, NROPrPrP-, LTR-
                         10BitTagComp-, 10BitTagReq-, OBFF Not Supported, ExtFmt-, EETLPPrefix-
                         EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
                         FRS-, TPHComp-, ExtTPHComp-
                         AtomicOpsCap: 32bit- 64bit- 128bitCAS-
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
                         AtomicOpsCtl: ReqEn-
                LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+, EqualizationPhase1+
                         EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
        Capabilities: [100 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
                AERCap: First Error Pointer: 00, ECRCGenCap- ECRCGenEn- ECRCChkCap- ECRCChkEn-
                        MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
                HeaderLog: 00000000 00000000 00000000 00000000
        Capabilities: [1c0 v1] Secondary PCI Express
                LnkCtl3: LnkEquIntrruptEn-, PerformEqu-
                LaneErrStat: 0
        Kernel driver in use: xdma
        Kernel modules: xdma

The config_hwdb.yaml and config_build_recipes generated by firesim managerinit --platform xilinx_alveo_u250 don't contain the settings for alveo_u250_firesim_rocket_singlecore_no_nic, so I replaced both files with the configs from firesim_staging. When I then run firesim enumeratefpgas, I get:

2024-10-31 11:52:18,377 [flush       ] [DEBUG]  [yu@] run: /usr/local/bin/firesim-generate-fpga-db.py --bitstream /home/yu/FIRESIM_RUNS_DIR/enumerate_fpgas_staging/xilinx_alveo_u250/firesim.bit --driver /home/yu/FIRESIM_RUNS_DIR/enumerate_fpgas_staging/FireSim-xilinx_alveo_u250 --out-db-json /opt/firesim-db.json
2024-10-31 11:52:18,473 [flush       ] [DEBUG]  [yu@] out: :INFO: Running: ['/usr/bin/sudo', '/usr/local/bin/firesim-generate-fpga-db.py', '--bitstream', '/home/yu/FIRESIM_RUNS_DIR/enumerate_fpgas_staging/xilinx_alveo_u250/firesim.bit', '--driver', '/home/yu/FIRESIM_RUNS_DIR/enumerate_fpgas_staging/FireSim-xilinx_alveo_u250', '--out-db-json', '/opt/firesim-db.json', '--vivado-bin', '/tools/Xilinx/Vivado_Lab/2023.1/bin/vivado_lab', '--hw-server-bin', '/tools/Xilinx/Vivado_Lab/2023.1/bin/hw_server']
2024-10-31 11:52:18,503 [flush       ] [DEBUG]  [yu@] out: :INFO: This script expects that all Xilinx XDMA-enabled FPGAs are programmed with the same --bitstream arg. by default (through an MCS file for bistream file)
2024-10-31 11:53:19,747 [flush       ] [DEBUG]  [yu@] out: :INFO: Disconnecting BDF: 01:00.0
2024-10-31 11:53:22,852 [flush       ] [DEBUG]  [yu@] out: :INFO: Programming Xilinx/21330836802NA with /home/yu/FIRESIM_RUNS_DIR/enumerate_fpgas_staging/xilinx_alveo_u250/firesim.bit
2024-10-31 11:53:54,819 [flush       ] [DEBUG]  [yu@] out: :INFO: Reconnecting BDF: 01:00.0
2024-10-31 11:53:56,834 [flush       ] [DEBUG]  [yu@] out: Traceback (most recent call last):
2024-10-31 11:53:56,834 [flush       ] [DEBUG]  [yu@] out:   File "/usr/local/bin/firesim-fpga-util.py", line 206, in <module>
2024-10-31 11:53:56,840 [flush       ] [DEBUG]  [yu@] out:     sys.exit(main(sys.argv[1:]))
2024-10-31 11:53:56,840 [flush       ] [DEBUG]  [yu@] out:   File "/usr/local/bin/firesim-fpga-util.py", line 199, in main
2024-10-31 11:53:56,840 [flush       ] [DEBUG]  [yu@] out:     reconnect_bus_id(bus_id)
2024-10-31 11:53:56,840 [flush       ] [DEBUG]  [yu@] out:   File "/usr/local/bin/firesim-fpga-util.py", line 144, in reconnect_bus_id
2024-10-31 11:53:56,840 [flush       ] [DEBUG]  [yu@] out:     assert pcielib.any_device_exists(bus_id), f"{bus_id} not visible. Check for proper rescan."
2024-10-31 11:53:56,840 [flush       ] [DEBUG]  [yu@] out: AssertionError: 01 not visible. Check for proper rescan.
2024-10-31 11:53:56,863 [flush       ] [DEBUG]  [yu@] out: :ERROR: It failed with stdout: :INFO: Writing to /sys/bus/pci/rescan: 1
2024-10-31 11:53:56,863 [flush       ] [DEBUG]  [yu@] out:  stderr:
2024-10-31 11:53:56,871 [flush       ] [DEBUG]  [yu@] out:
2024-10-31 11:53:56,877 [flush       ] [INFO ]  Fatal error: run() received nonzero return code 1 while executing!
2024-10-31 11:53:56,878 [flush       ] [INFO ]  Requested: /usr/local/bin/firesim-generate-fpga-db.py --bitstream /home/yu/FIRESIM_RUNS_DIR/enumerate_fpgas_staging/xilinx_alveo_u250/firesim.bit --driver /home/yu/FIRESIM_RUNS_DIR/enumerate_fpgas_staging/FireSim-xilinx_alveo_u250 --out-db-json /opt/firesim-db.json
2024-10-31 11:53:56,878 [flush       ] [INFO ]  Executed: /bin/bash -l -c "cd /home/yu/FIRESIM_RUNS_DIR/enumerate_fpgas_staging >/dev/null && /usr/local/bin/firesim-generate-fpga-db.py --bitstream /home/yu/FIRESIM_RUNS_DIR/enumerate_fpgas_staging/xilinx_alveo_u250/firesim.bit --driver /home/yu/FIRESIM_RUNS_DIR/enumerate_fpgas_staging/FireSim-xilinx_alveo_u250 --out-db-json /opt/firesim-db.json"
2024-10-31 11:53:56,878 [flush       ] [INFO ]  Aborting.
2024-10-31 11:53:56,894 [flush       ] [INFO ]  Fatal error: One or more hosts failed while executing task 'enumerate_fpgas_node_wrapper'
2024-10-31 11:53:56,894 [flush       ] [INFO ]  Aborting.
2024-10-31 11:53:56,911 [<module>    ] [ERROR]  Fatal error.
Traceback (most recent call last):
  File "/data/yu/chipyard/sims/firesim/deploy/firesim", line 530, in <module>
    main(args)
  File "/data/yu/chipyard/sims/firesim/deploy/firesim", line 469, in main
    t['task'](t['config'](args))
  File "/data/yu/chipyard/sims/firesim/deploy/firesim", line 324, in enumeratefpgas
    runtime_conf.enumerate_fpgas()
  File "/data/yu/chipyard/sims/firesim/deploy/runtools/runtime_config.py", line 1230, in enumerate_fpgas
    self.firesim_topology_with_passes.enumerate_fpgas_passes(
  File "/data/yu/chipyard/sims/firesim/deploy/runtools/firesim_topology_with_passes.py", line 675, in enumerate_fpgas_passes
    execute(
  File "/data/yu/chipyard/.conda-env/lib/python3.10/site-packages/fabric/tasks.py", line 392, in execute
    error(err)
  File "/data/yu/chipyard/.conda-env/lib/python3.10/site-packages/fabric/utils.py", line 357, in error
    return func(message)
  File "/data/yu/chipyard/.conda-env/lib/python3.10/site-packages/fabric/utils.py", line 65, in abort
    raise e
SystemExit: 1

After this, sudo lspci -vvv -d 10ee:903f produces no output at all. After a warm reboot, the same command lists the device normally again. I tried other configs, but they always end the same way. Programming itself seems to succeed, but the reconnection fails, maybe because the bitstream was generated incorrectly. I'm also not sure whether it's OK to replace the config files with the ones from firesim-staging.
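To narrow down where the card disappears, the failing step in the log (writing 1 to /sys/bus/pci/rescan, then asserting that something is visible on bus 01) can be repeated by hand. Below is a minimal Python sketch that reads the standard Linux sysfs PCI files directly. The function names are hypothetical, it is not FireSim's actual pcielib code, it assumes PCI domain 0000, and the rescan needs root.

```python
import os
from typing import List


def read_id(path: str) -> str:
    """Read a sysfs ID file such as .../vendor, which holds text like '0x10ee'."""
    with open(path) as f:
        return f.read().strip().lower()


def find_devices(vendor: str, device: str, sysfs_root: str = "/sys") -> List[str]:
    """Return BDFs (e.g. '0000:01:00.0') whose vendor/device IDs match,
    roughly what `lspci -d vendor:device` lists."""
    devices_dir = os.path.join(sysfs_root, "bus", "pci", "devices")
    matches = []
    for bdf in sorted(os.listdir(devices_dir)):
        base = os.path.join(devices_dir, bdf)
        try:
            if (read_id(os.path.join(base, "vendor")) == "0x" + vendor.lower()
                    and read_id(os.path.join(base, "device")) == "0x" + device.lower()):
                matches.append(bdf)
        except OSError:
            # Entry vanished or is unreadable; skip it.
            continue
    return matches


def any_device_on_bus(bus: str, domain: str = "0000", sysfs_root: str = "/sys") -> bool:
    """Hypothetical stand-in for the failing check: is any function present on `bus`?"""
    devices_dir = os.path.join(sysfs_root, "bus", "pci", "devices")
    prefix = f"{domain}:{bus.lower()}:"
    return any(name.startswith(prefix) for name in os.listdir(devices_dir))


def rescan(sysfs_root: str = "/sys") -> None:
    """Ask the kernel to rescan all PCI buses (same write the log reports; needs root)."""
    with open(os.path.join(sysfs_root, "bus", "pci", "rescan"), "w") as f:
        f.write("1")
```

For example, from a root Python shell after a failed run: call rescan(), then any_device_on_bus("01") and find_devices("10ee", "903f"). If both still come back empty, the card is not re-enumerating at all, which would be consistent with only a reboot bringing it back.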
cheers,
yu