VCU118 sim hang


Connor Sullivan

Oct 5, 2023, 1:54:39 PM
to FireSim
Looking for some suggestions as to where I might start solving a hanging simulation.

For reference, I am running a quad-core configuration with small BOOM cores and a 2-bank, 1 MB, 8-way LLC.

I am running a workload based on the base-fedora workload. The sim boots successfully and seems to run fine, but once I start running processes it hangs after a certain point. The processes simply allocate memory and then write to those memory locations. I assign each process to a specific core. The system hangs when I run two of these traffic-generating processes and never recovers.

The heartbeat.csv is still updating during the hangs. Any suggestions?

Best,
Connor Sullivan

jin yuan

Oct 5, 2023, 3:53:30 PM
to FireSim
Hi Connor,

I am sorry, I have no idea about your question. I am posting here with my own question because I am trying to run the VCU118 XDMA flow. I want to know how the code under firesim/platforms/xilinx_vcu118 is used. Is it used when we run the command "firesim buildbitstream"? Since I am trying to port the XDMA flow from the VCU118 to the VCK190, I need to check the VCU118 code. However, inside cl_firesim.sv an F1shim module is instantiated, but I couldn't find this module. Some xdc files are also missing. So I suspect these files are generated while running the buildbitstream command. Another question is whether running "firesim buildbitstream" needs root privileges.

Best,
Jin

Connor Sullivan

Oct 6, 2023, 11:57:14 AM
to FireSim
Adding some more info. Maybe someone could reproduce my issue using this.

This is the specific config I am simulating. I'm not sure if anyone has any insight based on this, since I am only using default fragments.
config.png

I've attached the code that I am running that is causing the simulation to hang. When running two of these processes (on different cores) the output from the serial console stops, but the heartbeat continues to update. I am specifically running the below sequence. Once the second process starts, the simulation hangs.
./BkPLL -c 0 -m 64 -a write -i 99999999 -l 7 &
./BkPLL -c 1 -m 64 -a write -i 99999999 -l 7 &
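
For readers without the attachment, the kind of process described above (pin to a core, allocate a buffer, write to it repeatedly) can be sketched roughly as follows. This is a hypothetical stand-in, not the attached BkPLL.cpp: the function names `pin_to_core` and `write_sweep` are made up for illustration, a 64-byte cache line is assumed, and the meaning of the command-line flags is not stated in this thread (`-c` is only assumed to select the core).

```cpp
// Hypothetical stand-in for the attached BkPLL.cpp, NOT the actual source.
#include <sched.h>   // sched_setaffinity, CPU_SET (Linux-specific)
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Pin the calling process to a single core. Returns true on success.
bool pin_to_core(int core) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    return sched_setaffinity(0, sizeof(set), &set) == 0;
}

// Write one byte per cache line across the whole buffer, `iterations` times.
// ASSUMPTION: 64-byte cache lines unless the caller says otherwise.
// Returns the number of stores issued.
std::uint64_t write_sweep(std::vector<std::uint8_t>& buf,
                          std::uint64_t iterations,
                          std::size_t line_bytes = 64) {
    assert(line_bytes > 0);
    // volatile keeps the compiler from optimizing the stores away.
    volatile std::uint8_t* p = buf.data();
    std::uint64_t stores = 0;
    for (std::uint64_t it = 0; it < iterations; ++it) {
        for (std::size_t off = 0; off < buf.size(); off += line_bytes) {
            p[off] = static_cast<std::uint8_t>(it);
            ++stores;
        }
    }
    return stores;
}
```

A caller would invoke `pin_to_core(0)` once, allocate a `std::vector<std::uint8_t>` of the desired size, and then call `write_sweep` on it; a second instance would do the same on core 1.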

Still working on debugging this. Unfortunately, I am having issues with metasimulation not booting Linux correctly, so I haven't been able to do much debugging.
BkPLL.cpp

Connor Sullivan

Oct 9, 2023, 11:58:21 AM
to FireSim
Another small update. I tried using Rocket cores instead of BOOM cores. With Rocket cores and the same cache configuration, we don't see the hang. So it seems to have something to do with the BOOM cores.

Connor Sullivan

Oct 10, 2023, 4:25:45 PM
to FireSim
Here is some more information about what the workload that causes the sim hang is doing. BkPLL generates memory requests that target a specific bank of the cache, with the goal of creating memory contention and slowing down processes on other cores. To generate enough traffic, multiple BkPLL instances need to run, and the cores need to be out-of-order, since we rely on memory-level parallelism to generate enough requests to cause contention (this is why we need to test on BOOM cores even though we don't see the hang on Rocket). We've tested this on real silicon and are now trying to test a hardware solution in FireSim.
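
To make the bank-targeting idea concrete, here is a minimal sketch of how such a generator might pick its addresses. It is not taken from BkPLL.cpp. It assumes the banks are interleaved at cache-line granularity, so that the bank index is the line address modulo the number of banks; the actual mapping in a given design may differ. The names `bank_of`, `offsets_for_bank`, and `hammer_bank` are hypothetical. With 64-byte lines and 2 banks, the selecting bit under this assumption is address bit 6, which lies inside the 4 KiB page offset and is therefore the same in the virtual and physical address.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// ASSUMPTION: banks are interleaved per cache line, so the bank index is
// (line address) mod (number of banks). Real designs may hash differently.
std::size_t bank_of(std::uintptr_t addr, std::size_t line_bytes,
                    std::size_t num_banks) {
    return (addr / line_bytes) % num_banks;
}

// Collect the offsets of every full cache line in [base, base + len)
// that maps to `target_bank` under the assumption above.
std::vector<std::size_t> offsets_for_bank(std::uintptr_t base, std::size_t len,
                                          std::size_t target_bank,
                                          std::size_t line_bytes = 64,
                                          std::size_t num_banks = 2) {
    assert(line_bytes > 0 && num_banks > 0);
    std::vector<std::size_t> offs;
    // Skip ahead to the first line boundary at or after `base`.
    std::size_t first = (line_bytes - base % line_bytes) % line_bytes;
    for (std::size_t off = first; off + line_bytes <= len; off += line_bytes) {
        if (bank_of(base + off, line_bytes, num_banks) == target_bank) {
            offs.push_back(off);
        }
    }
    return offs;
}

// Issue one store per selected line. The stores have no data dependence on
// each other, so an out-of-order core is free to overlap the resulting misses.
void hammer_bank(std::uint8_t* buf, const std::vector<std::size_t>& offs,
                 std::uint64_t iterations) {
    volatile std::uint8_t* p = buf;  // keep the stores from being elided
    for (std::uint64_t it = 0; it < iterations; ++it) {
        for (std::size_t off : offs) {
            p[off] = static_cast<std::uint8_t>(it);
        }
    }
}
```

In use, the buffer would presumably need to be larger than the 1 MB LLC so that most stores miss, with one such loop pinned to each attacking core.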

I've confirmed that the heartbeat is still updating, but the stack trace halts.