FireSim Support for Local Xilinx FPGA Boards with XDMA PCIe?


Steve Haynal

Feb 1, 2022, 8:40:08 PM
to FireSim
Hi,

I've read several threads from people interested in using FireSim on local VCU118, XC706, or various Alveo cards, but never saw that such support had been implemented. What is the latest status of FireSim on local Xilinx FPGA boards with XDMA PCIe support? Are there any examples to study?

Best Regards,

Steve Haynal

David Metz

Feb 2, 2022, 3:55:33 AM
to fir...@googlegroups.com
Hello Steve,

I have FireSim running on local Alveo U250 cards at NTNU.
The branch we use can be found at https://github.com/EECS-NTNU/firesim/tree/u250 .
We use Xilinx XDMA and MIG IPs for PCIe and memory support.
The biggest issue we encountered was difficulty reconfiguring the PCIe IP while the server is running.
AWS gets around this for the F1 FPGAs by using a static shell and doing partial reconfiguration of the design inside it via PCIe.
The Xilinx XRT shell that comes with Alveo cards is heavily targeted at the Vitis HLS flow and, as far as I can tell, does not support the DMA approach FireSim uses.
The ideal solution would be to create a shell similar to the one used by AWS, but this requires some engineering effort.

What we did instead is take inspiration from the Xilinx open-nic-shell: https://github.com/Xilinx/open-nic-shell#programming-fpga .
While it is called a shell, the whole FPGA is reflashed when the bitstream changes. Because this causes the PCIe interface to go down, a script provided by Xilinx is used to temporarily disable errors on the PCIe interface.
Due to changes to the PCIe BAR configuration, this approach still requires rebooting the server once when switching from XRT to FireSim, but no reboot is needed when switching between different FireSim images.
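
For the error-masking step, a minimal sketch of the idea (run as root; BRIDGE_BDF is a placeholder for the FPGA's upstream bridge, and the actual Xilinx script does more, e.g. AER masking, so treat this only as an illustration):

import struct

BRIDGE_BDF = "0000:3a:00.0"  # placeholder; find the upstream bridge with lspci -t
CFG_PATH = f"/sys/bus/pci/devices/{BRIDGE_BDF}/config"

with open(CFG_PATH, "r+b") as cfg:
    cfg.seek(0x04)                               # PCI Command register
    (cmd,) = struct.unpack("<H", cfg.read(2))
    cfg.seek(0x04)
    cfg.write(struct.pack("<H", cmd & ~0x0100))  # clear SERR# Enable (bit 8)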

On the host side only slight changes are needed, since the AWS F1 instances also use an XDMA-based DMA interface.

This approach is a bit hacky but seems to work well for us so far.
The biggest issue is that root access is required to reflash the whole FPGA and to disable PCIe errors, which partial reconfiguration would avoid.
We have some scripts that simplify flashing the FPGAs and allow it to be done by non-root users, but those are currently not on public GitHub.
If you have any questions, feel free to ask.

Best regards,

David Metz


David Biancolin

Feb 2, 2022, 12:53:13 PM
to FireSim
Hey Steve and David, 

For 1.14, we'll have Vitis support worked out, plus modifications to the FireSim manager to support deploying to hosts that aren't AWS. As David points out, it's non-trivial to implement some of the features that need a high-bandwidth link to the host CPU ("DMA" in bridge parlance). Anyway, part of this effort revolves around defining a host-abstraction layer so that people like David would have an easier time porting FireSim to other hosts in the future. I'm hoping we can build a reasonable C library interface such that the user can just link against the correct host implementation, plus Chisel shells / Vivado templates that would be intuitive to port to a different FPGA.

I think David's approach is a highly sensible one (and likely higher performance). I started working on a similar shell for the U250, but switched to working on Vitis with Abraham Gonzalez, because:
- we wanted access to the whole Alveo suite of FPGAs
- *most* U250 cloud vendors won't let you flash your own shell
- and truthfully, I wasn't aware of the script David referred to for masking PCIe enumeration problems on reprogramming, and so figured I needed to get partial reconfiguration working

Ultimately, for non-Alveo boards, rolling your own shell is unavoidable. I, for instance, will need to build something for VU19P-based devices. David, given that you have something working, I think it would be great if you could upstream your shell once our local-FPGA feature branch is more stable. I think it would really help motivate the design of the host-abstraction layer we're trying to build. No pressure.

- David Biancolin




Steve Haynal

Feb 3, 2022, 1:53:52 PM
to FireSim
For some reason my post of this last night was deleted, perhaps automatically by Google Groups, so I am reposting.

Hi David and David,

Thanks for the responses. I was able to generate all the Verilog for Vivado with the NTNU branch. At first I was seeing some FIRRTL errors and suspected a mismatch in submodule versions, so I created six patches covering all the changes in the NTNU branch and applied them to a fresh master checkout. They applied cleanly, and this build worked.

It took some effort to get the machine environment right. I ended up using a CentOS 7.7 Docker container and installed the requirements from machine-launch-script.sh, which I thought should be close to the AWS AMI. What OS and configuration are recommended for a local install?

I'm actually interested in targeting a different card, so I have been studying the Vivado project for the U250. It looks like you have an XDMA and only connect the AXI4-MM DMA master to a slave on the FireSim side, using 64 address bits. What is the actual address map and size used by FireSim? The AXI4-Lite port for control appears to be tied off; is it not used by FireSim? Finally, although there are 4 DRAM masters from FireSim, only one is connected to an 8 GB DRAM? Is this (1 DMA AXI4-MM master, 1 AXI4-MM DRAM slave) all that is required for FireSim?

I am familiar with and use a variation of the PCIe management you describe. Instead of a script to disable errors, I remove the PCIe endpoint, make any updates, then rescan for the endpoint. This works as long as the BARs don't change and the host was initially booted with the PCIe endpoint live for enumeration. There is a nice collection of PCIe scripts here:
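
In sysfs terms the remove/rescan sequence is roughly the following (a minimal sketch, run as root; the BDF is a placeholder and the actual reprogramming happens separately over JTAG):

from pathlib import Path

FPGA_BDF = "0000:3b:00.0"  # placeholder; substitute your card's bus/device/function

Path(f"/sys/bus/pci/devices/{FPGA_BDF}/remove").write_text("1")
input("Endpoint removed -- reprogram the FPGA now, then press Enter... ")
Path("/sys/bus/pci/rescan").write_text("1")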

Best Regards,

Steve Haynal




David Metz

Feb 4, 2022, 6:11:35 AM
to fir...@googlegroups.com
My message was also deleted when posting it through the Groups web interface, so I'm retyping it and trying again by mail.

Hi Steve and David,

It's cool to see you are planning to support Vitis. I briefly looked into using Vitis but encountered some issues and saw no easy way to interface with DMA.

I'm open to upstreaming my changes, but I'm not sure how much time I will have for this over the coming months.
The build and bitstream-generation process is fairly straightforward, but the flashing and running is a bit more involved and might include some platform-specific assumptions.

Nice to hear you could get my branch to build.

We are using CentOS 8, since the machines containing the FPGAs are part of a bigger cluster.
The setup, build and run process we used is described at https://github.com/EECS-NTNU/chipyard/wiki/u250_firesim .
We use USB JTAG to flash.

To target a different card you should hopefully only have to adjust the block design.
I configured the XDMA IP to provide the interfaces used by FireSim on the AWS F1 platform, described here: https://github.com/aws/aws-fpga/blob/master/hdk/docs/AWS_Shell_Interface_Specification.md
The XDMA connects to FireSim through an AXI master interface for DMA and an AXI-Lite master interface for control.
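
On the host, the stock Xilinx XDMA kernel driver exposes the AXI-Lite master as a character device, so a control-register access is just a short read or write at an offset. A minimal sketch (the device name follows the XDMA driver's convention; the register offset is invented for illustration and is not FireSim's real MMIO map):

import os
import struct

CTRL_DEV = "/dev/xdma0_user"  # AXI-Lite master, per the XDMA driver's naming
REG_OFFSET = 0x0000           # hypothetical register offset

fd = os.open(CTRL_DEV, os.O_RDWR)
try:
    (value,) = struct.unpack("<I", os.pread(fd, 4, REG_OFFSET))  # 32-bit MMIO read
    os.pwrite(fd, struct.pack("<I", 0x1), REG_OFFSET)            # 32-bit MMIO write
    print(f"register 0x{REG_OFFSET:x} = 0x{value:08x}")
finally:
    os.close(fd)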

As you observed, we are currently using only one memory channel.
So far the 16 GB it provides have been enough, and it was easier to adjust the design this way when prototyping.
Another advantage of using only one memory channel is that it is easier to meet timing, since the other channels are attached to different SLRs.

The tied-off control interface on the MIG can be used to inspect ECC errors, something I am not interested in at the moment.
As far as I can tell there is no way to disable this interface when ECC RAM is used, other than modifying the board files to trick Vivado into thinking non-ECC RAM is used, at the risk of messing up memory timing.

The approach you describe for PCIe is the same one we use, with the addition of disabling errors.
After flashing a FireSim bitstream for the first time we have to reboot due to the changed BAR configuration, but afterwards we can switch between different FireSim bitstreams.

Best regards,
David Metz



David Metz

Feb 11, 2022, 12:20:51 PM
to fir...@googlegroups.com
Hi Steve,
Most of the traffic (in terms of variety) goes over the AXI-Lite interface. Only bridges with high bandwidth requirements (AFAIK TracerV, Dromajo, Print, and SimpleNIC) use DMA.
It's entirely possible to run FireSim without DMA when not using those bridges. I have done that before for another proprietary FPGA platform that does not offer a convenient high-bandwidth interface.
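
To make the contrast concrete: with the XDMA driver the high-bandwidth path is a separate pair of host-to-card/card-to-host character devices, and a transfer is one large read or write instead of register pokes. A minimal sketch (device name per the driver's convention; the address and length are illustrative):

import os

DMA_C2H = "/dev/xdma0_c2h_0"  # card-to-host DMA channel 0, per the XDMA driver's naming
AXI_ADDR = 0x0000             # hypothetical AXI address on the card side
LENGTH = 4096                 # bytes per transfer

fd = os.open(DMA_C2H, os.O_RDONLY)
try:
    data = os.pread(fd, LENGTH, AXI_ADDR)  # one DMA transfer of LENGTH bytes
    print(f"transferred {len(data)} bytes")
finally:
    os.close(fd)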
Best regards,
David

On Tue, Feb 8, 2022 at 4:43 PM Steve Haynal <softerh...@gmail.com> wrote:
Hi David,

Thanks for the helpful pointer to the u250_firesim wiki. Also, thanks for clarifying that you are using 16 GB of memory and that it is the AXI4-Lite to the MIG which is tied off. Does most traffic go over the AXI4-Lite link to FireSim? What is the AXI4-MM DMA path used for?

Best Regards,

Steve Haynal

David Metz

Feb 11, 2022, 12:20:56 PM
to fir...@googlegroups.com
Hi David,
I just tried booting into Linux, logging in and then powering off with the default image. This is what I got:

Simulation complete.
*** PASSED *** after 8716718222 cycles

Emulation Performance Summary
------------------------------
Wallclock Time Elapsed: 106.8 s
Host Frequency: 89.999 MHz
Target Cycles Emulated: 8716718222
Effective Target Frequency: 81.585 MHz
FMR: 1.10
Note: The latter three figures are based on the fastest target clock.
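
For anyone reading along, the figures relate as follows: FMR (FPGA-cycle-to-Model-cycle Ratio) is host FPGA cycles divided by emulated target cycles, and the effective target frequency is target cycles over wallclock time. A quick check in Python reproduces the summary above up to rounding:

wallclock_s   = 106.8
host_mhz      = 89.999
target_cycles = 8_716_718_222

host_cycles    = wallclock_s * host_mhz * 1e6       # ~9.61e9 FPGA cycles
fmr            = host_cycles / target_cycles        # ~1.10, as reported
eff_target_mhz = target_cycles / wallclock_s / 1e6  # ~81.6 MHz

print(f"FMR = {fmr:.2f}, effective target frequency = {eff_target_mhz:.3f} MHz")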

I can try out the MMIO latency of the system if you send me a patch. I assume one thing that might affect it is which NUMA node the host application is running on, so I might have to pin it to a specific core.
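
Something along these lines would capture the measurement being discussed (a minimal sketch, not the actual driver patch; the device path and register offset are assumptions, and a C version would have less per-call overhead):

import os
import statistics
import time

os.sched_setaffinity(0, {0})  # pin to core 0; pick a core on the FPGA's NUMA node

fd = os.open("/dev/xdma0_user", os.O_RDONLY)
samples = []
for _ in range(10_000):
    t0 = time.perf_counter_ns()
    os.pread(fd, 4, 0x0000)   # hypothetical readable register
    samples.append(time.perf_counter_ns() - t0)
os.close(fd)

print(f"median MMIO read latency: {statistics.median(samples)} ns")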
Best regards,
David Metz


On Tue, Feb 8, 2022 at 4:43 PM David Biancolin <david.b...@gmail.com> wrote:
Hey David, 

What kind of FMRs do you get with your U250 booting Linux with the default Rocket config? If we sent you a patch to the driver to measure MMIO read/write latency, would you be willing to run it for us?

- David 


Steve Haynal

Nov 21, 2022, 1:20:13 AM
to FireSim
Hi David Metz,
Have you ported your Alveo U250 work to any recent version of Chipyard or FireSim? I have been using Chipyard from about mid-January 2022, but a new install fails to set up all submodules properly. Are you still able to set up a new install by following the instructions on the wiki page you shared?
Best Regards,
Steve Haynal



David Metz

Nov 21, 2022, 4:24:58 AM
to fir...@googlegroups.com
Hi Steve,

I haven't been using this setup recently, but my colleague Björn Gottschall has updated the FireSim part to work on 1.14.0.
You can find the branch here: https://github.com/EECS-NTNU/firesim/tree/1.14.0_u250
There is also a branch for 1.15.1, but that is mostly untested and some things might not work.
As I haven't done a new setup recently, I can't comment on the rest of the setup instructions.
Hope that helps.

Best regards,
David Metz

Steve Haynal

Nov 23, 2022, 12:45:50 PM
to FireSim
Hi David,

Thanks for your pointers and help. I was able to use your 1.14.0_u250 branch for my purposes.

Best Regards,

Steve Haynal

Steve Haynal

Nov 28, 2022, 1:27:26 PM
to FireSim
Hi David Metz and Bjorn,

With our build based on your Alveo U250 work but targeting a different FPGA, we are seeing nondeterminism from run to run, which David Biancolin says should not occur. I would like to know whether you see the same nondeterminism. We are running the Linux poweroff test on the basic Rocket Chip design:

./FireSim-u250 +permissive +mm_relaxFunctionalModel_0=0 +mm_writeMaxReqs_0=10 +mm_readMaxReqs_0=10 +mm_writeLatency_0=30 +mm_readLatency_0=30 +zero-out-dram +slotid=0 +blkdev0=./poweroffnode.ext2 +permissive-off ./br-base-bin

Near the end of the test, we see the text below in the output. The "Cycles elapsed" line should have the same value from run to run, but we see variation.

Best Regards,

Steve Haynal




Running sysctl: OK
Starting mdev: OK
Initializing random number generator... [    1.527119] random: dd: uninitialized urandom read (512 bytes read)
done.
Starting network: OK
[    1.669630] random: httpd: uninitialized urandom read (8 bytes read)
[    1.669748] random: httpd: uninitialized urandom read (8 bytes read)
AH00558: httpd: Could not reliably determine the server's fully qualified domain name, using 127.0.1.1. Set the 'ServerName' directive globally to suppress this message
Starting dropbear sshd: OK
Cycles elapsed: 3232551491
Time elapsed: 1.897110000 seconds
Powering off immediately.
[    1.924636] reboot: Power down

Simulation complete.
*** PASSED *** after 3291824132 cycles

Emulation Performance Summary
------------------------------
Wallclock Time Elapsed: 49.5 s
Host Frequency: 89.999 MHz
Target Cycles Emulated: 3291824132
Effective Target Frequency: 66.468 MHz
FMR: 1.35


Note: The latter three figures are based on the fastest target clock.

Björn Gottschall

Nov 28, 2022, 1:41:10 PM
to FireSim
Hi Steve,

Without testing this on our setup, I already see one source of nondeterminism in your simulation run: the ext2 disk image, which gets mounted and therefore modified between runs. Can you test with a copy of the image that is replaced before each run?
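
Concretely, something like this before every run (a minimal sketch; the "golden" filename is invented, and the plusargs are abbreviated from your invocation above):

import shutil
import subprocess

GOLDEN = "poweroffnode.ext2.golden"  # pristine image, never handed to FireSim
WORK   = "poweroffnode.ext2"         # the copy FireSim mounts and mutates

shutil.copyfile(GOLDEN, WORK)        # restore a known-good image before each run
subprocess.run(
    ["./FireSim-u250", "+permissive", "+zero-out-dram", "+slotid=0",
     f"+blkdev0=./{WORK}", "+permissive-off", "./br-base-bin"],
    check=True,
)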

Cheers,
Björn

Steve Haynal

Nov 28, 2022, 2:24:38 PM
to FireSim
Hi Björn,

Thanks for catching this! That was the reason for our nondeterminism. If we start with the same ext2 disk image, we always see the same test behavior.

Best Regards,

Steve Haynal

Björn Gottschall

Dec 6, 2022, 3:08:43 AM
to FireSim
Just in case you want to use the newest FireSim version: our 1.15.1_u250 branch is now tested and working.

Cheers,
Björn

Steve Haynal

Dec 6, 2022, 10:32:17 PM
to FireSim
Thanks!

Steve

Yang Wang

Dec 19, 2022, 1:40:34 AM
to FireSim
Hi Björn,
Thanks for sharing your implementation on the U250. I followed your branches of FireSim and Chipyard.

I'm trying to port FireSim to my local customized XCVU13P boards based on your 1.15.1_u250 branch.

I have tried to adapt BaseF1Config1Mem to my board and connect all 4 DDR4 channels (4 GiB each). It seems that all channels work well when I enable the "+zero-out-dram" test.

However, I cannot get any UART0 output. I have tried your FireMarshal (https://github.com/EECS-NTNU/FireMarshal) images, base images, and a bare-metal membench.

[attached screenshot: uart0.png]

I found that the host can read clock cycles from the PCIe/FireSim shell, which suggests the wrapper works?

[attached screenshot: clock.jpg]

Do you have any advice for me to find the root cause?

Thanks!

Yang

Björn Gottschall

Dec 19, 2022, 3:18:46 AM
to FireSim
I'm not really sure how to tackle this problem. It should also work without zeroing the DRAM, and our FireMarshal images shouldn't be necessary; they are just the base images for our own experiments. Are you sure your bitstream is correctly implemented and flashed to the FPGA? In our setup, at least, the bitstream needs to be flashed for every run.

Cheers,
Björn

Yang Wang

Dec 21, 2022, 7:17:59 AM
to FireSim
Hi Björn,
Thanks a lot! Following your advice, I flashed the bitstream for each run and collected TracerV output for each clock cycle.
[attached screenshot: trace.jpg]
I found that the RISC-V CPU is caught in a loop in crt.S, and the core reboots repeatedly.
[attached screenshot: asm.jpg]
I'm still debugging this error.  

Thanks a lot!

David Biancolin

Dec 21, 2022, 2:25:00 PM
to fir...@googlegroups.com
Hey Yang, 

Your issues occur early enough in simulation that you should be able to use metasimulation and get full waveforms from Verilator or VCS. I'd encourage you to start with that plus some of the simple bare-metal binaries.

There is some documentation for this here: https://docs.fires.im/en/stable/Advanced-Usage/Debugging-in-Software/RTL-Simulation.html?highlight=metasimulation . Note that this applies to the latest stable upstream revision.

- David 




Yang Wang

Dec 22, 2022, 12:07:40 AM
to FireSim
Hi David,
Thanks for your kind advice; I am trying to set up metasimulation.

Yang

Yang Wang

Dec 22, 2022, 10:14:45 PM
to FireSim
Hi David and Björn,

Here are some updates: I ran RISC-V assembly tests in metasimulation and found that they work on the 1.15.1_u250 branch. I logged the PC and instructions in Verilator.

I compared the TracerV log and the metasimulation log, and found that the core crashed when it executed the instruction at address 0x0000000080000040.

So I checked the DRAM reads and writes on DDR4 channel 0 and found that the core read wrong data from address 0x40.

I guess that is why the core crashed, and the problem is more likely in our own block design. I will check it later.

Thanks for all of your help and work on U250!

Yang

Yang Wang

Dec 24, 2022, 4:14:20 AM
to FireSim
Hi all, I fixed the bugs in my block-design (bd) scripts, and everything now works on my FPGA. Thanks for your help!

Arash S

Nov 4, 2024, 2:40:29 PM
to FireSim
Hello all,

I have a beginner's question. I wrote a program using Vitis HLS, and I can run it on an XDMA-enabled U280. What do I need to do to run this program on FireSim? Thank you for any help.

la lala

Mar 5, 2025, 9:48:15 AM
to FireSim
Hi all, when I performed the step "FPGA Setup -> 12. Right-click on your FPGA and click 'Boot from Configuration Memory Device'" from https://docs.fires.im/en/latest/Getting-Started-Guides/On-Premises-FPGA-Getting-Started/Initial-Setup/Xilinx-VCU118.html , I encountered the following problem: ERROR: [Labtools 27-2254] Booting from configuration memory device unsuccessful.
What do I need to do to fix this problem? Thank you for any help.

Performing Program Operation...
Program Operation successful.
Performing operation on qspi device 1
Mfg ID : 20   Memory Type : bb   Memory Capacity : 21   Device ID 1 : 0   Device ID 2 : 0
Performing Erase Operation...
Erase Operation successful.
Performing Blank Check Operation...
Blank Check Operation successful. The part is blank.
Performing Program Operation...
Program Operation successful.
INFO: [Labtoolstcl 44-377] Flash programming completed successfully
program_hw_cfgmem: Time (s): cpu = 00:00:19 ; elapsed = 00:12:11 . Memory (MB): peak = 7052.000 ; gain = 47.836 ; free physical = 11395 ; free virtual = 16286
endgroup
boot_hw_device  [lindex [get_hw_devices xcvu9p_0] 0]
INFO: [Labtoolstcl 44-664] Will wait up to 180 seconds for booting to complete.
ERROR: [Labtools 27-2254] Booting from configuration memory device unsuccessful.
boot_hw_device: Time (s): cpu = 00:00:05 ; elapsed = 00:03:01 . Memory (MB): peak = 7052.000 ; gain = 0.000 ; free physical = 11372 ; free virtual = 16274
ERROR: [Common 17-39] 'boot_hw_device' failed due to earlier errors.


Muhammad Ali Akhtar

Mar 6, 2025, 7:50:05 AM
to fir...@googlegroups.com
After programming the flash, you should cold boot your machine.
A cold boot means: shut down and power off (for a few seconds).
Just to be extra safe, I would disconnect the power cable. In my experience, some Dell workstations retain some charge for a few seconds after the power cord is disconnected.

Reconnect the power cord and turn on the machine. It should work fine.



Muhammad Ali Akhtar
