Running Linux workload on custom AFI


Varun Gandhi

Jan 14, 2021, 1:04:51 PM
to chip...@googlegroups.com
Hi,

I’m trying to run the Linux workload on a quad-core medium BOOM config.

Here’s the entry in config_build_recipes.ini, where FireSimLoopbackNICQuadMediumBoom1KBTLBConfig is the config defined in TargetConfigs.scala in firechip.

[firesim-boom-quadcore-nic-1MB-tlb-l2-llc4mb-ddr3]
DESIGN=FireSim
TARGET_CONFIG=FireSimLoopbackNICQuadMediumBoom1KBTLBConfig
PLATFORM_CONFIG=F65MHz_BaseF1Config
instancetype=z1d.2xlarge
deploytriplet=None

But I’m getting the following error at the runworkload step:

Script started, file is uartlog

AFI PCI  Vendor ID: 0x1d0f, Device ID 0xf000
Using xdma write queue: /dev/xdma1_h2c_0
Using xdma read queue: /dev/xdma1_c2h_0
UART0 is here (stdin/stdout).
command line for program 0. argc=29:
+permissive +mm_relaxFunctionalModel_0=0 +mm_writeMaxReqs_0=10 +mm_readMaxReqs_0=10 +mm_writeLatency_0=30 +mm_readLatency_0=30 +slotid=1 +profile-interval=-1 +macaddr0=00:12:6D:00:00:03 +blkdev0=linux-uniform1-br-base.img +niclog0=niclog0 +blkdev-log0=blkdev-log0 +trace-select=1 +trace-start=0 +trace-end=-1 +trace-output-format=0 +dwarf-file-name=linux-uniform1-br-base-bin-dwarf +autocounter-readrate=0 +autocounter-filename=AUTOCOUNTERFILE +drj_dtb=linux-uniform1-br-base-bin.dtb +drj_bin=linux-uniform1-br-base-bin +drj_rom=linux-uniform1-br-base-bin.rom +print-start=0 +print-end=-1 +linklatency0=6405 +netbw0=200 +shmemportname0=0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001 +permissive-off linux-uniform1-br-base-bin
using link latency: 6405 cycles
using netbw: 200
using netburst: 8
Using non-slot-id associated shmemportname:
opening/creating shmem region
/port_nts0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001_0
Using non-slot-id associated shmemportname:
opening/creating shmem region
/port_stn0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001_0
Using non-slot-id associated shmemportname:
opening/creating shmem region
/port_nts0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001_1
Using non-slot-id associated shmemportname:
opening/creating shmem region
/port_stn0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001_1
TraceRV 0: Tracing disabled, since +tracefile was not provided.
TraceRV 1: Tracing disabled, since +tracefile was not provided.
TraceRV 2: Tracing disabled, since +tracefile was not provided.
TraceRV 3: Tracing disabled, since +tracefile was not provided.
random min: 0x0, random max: 0xffffffffffffffff
On init, 915 token slots available on input.
Commencing simulation.
[    0.000000] OF: fdt: No chosen node found, continuing without
[    0.000000] OF: fdt: Ignoring memory range 0x80000000 - 0x80200000
[    0.000000] Forcing kernel command line to: console=hvc0 earlycon=sbi
[    0.000000] Linux version 5.7.0-rc3-58539-g5f5fd87b36e2 (cen...@ip-192-168-0-220.ec2.internal) (gcc version 9.2.0 (GCC), GNU ld (GNU Binutils) 2.32) #1 SMP Mon Jan 11 03:15:22 UTC 2021
[    0.000000] earlycon: sbi0 at I/O port 0x0 (options '')
[    0.000000] printk: bootconsole [sbi0] enabled
[    0.000000] initrd not found or empty - disabling initrd

nat...@berkeley.edu

Jan 14, 2021, 6:36:01 PM
to Chipyard
Which error are you seeing? The initrd message is normal (you can confirm this by running the workload in QEMU with FireMarshal first).

Is the output freezing at that point? I have seen issues where the console baud rates don't line up and Linux can't print. I know there was some work a while ago to ensure that the device tree and the Rocket UART were compatible (I can't remember exactly when that happened). It does mean that FireMarshal and Chipyard versions can become incompatible if they get out of sync (part of why FireMarshal is a submodule).

Was Rocket Chip updated independently of FireMarshal? You might double-check that your versions line up with the official Chipyard release.
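Running `git submodule status` from the Chipyard root already shows whether a submodule has drifted: each line lists the checked-out commit and the path, prefixed with `+` if that commit differs from the one recorded by the Chipyard checkout, `-` if the submodule was never initialized, and `U` on a merge conflict. The Scala sketch below only automates flagging those lines. It assumes it is run from the Chipyard root and that FireMarshal and Rocket Chip are direct submodules there (we believe at software/firemarshal and generators/rocket-chip, but check your tree); `SubmoduleCheck` is a hypothetical helper, not part of Chipyard.

```scala
import scala.sys.process._

// Hypothetical helper: reports submodules whose checked-out commit does not
// match the commit recorded by the superproject (e.g. a Chipyard checkout).
object SubmoduleCheck {
  // state is the prefix git prints: ' ' in sync, '+' different commit,
  // '-' not initialized, 'U' merge conflict.
  final case class Entry(state: Char, sha: String, path: String)

  // Parses one line of `git submodule status` output, e.g.
  // "+1a2b3c4 software/firemarshal (1.10.0-3-g1a2b3c4)".
  def parseLine(line: String): Option[Entry] =
    if (line.isEmpty) None
    else {
      val fields = line.drop(1).trim.split("\\s+")
      if (fields.length < 2) None
      else Some(Entry(line.head, fields(0), fields(1)))
    }

  // Keeps every entry that is not in sync with the recorded commit.
  def outOfSync(statusOutput: String): Seq[Entry] =
    statusOutput.split("\n").toSeq
      .flatMap(line => parseLine(line))
      .filter(_.state != ' ')

  def main(args: Array[String]): Unit = {
    // Must be run from the root of the superproject (the Chipyard checkout).
    val output = Seq("git", "submodule", "status").!!
    val drifted = outOfSync(output)
    if (drifted.isEmpty) println("All submodules match the recorded commits.")
    else drifted.foreach(e => println(s"${e.state} ${e.path} is at ${e.sha}"))
  }
}
```

A `+` entry under the FireMarshal or Rocket Chip path means that submodule no longer matches what the Chipyard checkout expects; `git submodule update --init <path>` checks out the recorded commit again. Some `-` entries may be intentional, since not every submodule is needed for every flow.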

Varun Gandhi

Jan 15, 2021, 12:19:55 AM
to chip...@googlegroups.com
The output freezes at this point. 


> It does mean that there are incompatible versions of FireMarshal and Chipyard if they get out of sync (part of why marshal is a submodule). Was Rocketchip updated independently of FireMarshal? You might double check that your version lines up with the official Chipyard release's version.

Unlikely! I cloned the repo a few days ago and ran the init scripts for both Chipyard and FireMarshal as described in the docs.

Also, the Linux and Fedora workloads work fine on the pre-built AFIs.

I wonder if it has something to do with the quad-core BOOM config?

1. Here’s my custom config in BoomConfigs.scala

class LoopbackNICQuadMediumBoom1MBTLBConfig extends Config(
  new chipyard.iobinders.WithUARTAdapter ++
  new chipyard.iobinders.WithTieOffInterrupts ++
  new chipyard.iobinders.WithBlackBoxSimMem ++
  new chipyard.iobinders.WithTiedOffDebug ++
  new chipyard.iobinders.WithSimSerial ++
  new chipyard.iobinders.WithLoopbackNIC ++                        // drive NIC IOs with loopback
  new testchipip.WithTSI ++
  new icenet.WithIceNIC ++
  new chipyard.config.WithBootROM ++
  new chipyard.config.WithUART ++
  new chipyard.config.WithL2TLBs(1024) ++
  new freechips.rocketchip.subsystem.WithNoMMIOPort ++
  new freechips.rocketchip.subsystem.WithNoSlavePort ++
  new freechips.rocketchip.subsystem.WithInclusiveCache ++
  new freechips.rocketchip.subsystem.WithNExtTopInterrupts(0) ++
  new boom.common.WithMediumBooms ++
  new boom.common.WithNBoomCores(4) ++
  new freechips.rocketchip.subsystem.WithCoherentBusTopology ++
  new freechips.rocketchip.system.BaseConfig)

2. TargetConfigs.scala
class FireSimLoopbackNICQuadMediumBoom1MBTLBConfig extends Config(
  new WithDefaultFireSimBridges ++
  new WithDefaultMemModel ++
  new WithFireSimConfigTweaks ++
  new chipyard.LoopbackNICQuadMediumBoom1MBTLBConfig)

3. config_build_recipes.ini

[firesim-boom-quadcore-nic-1MB-tlb-l2-llc4mb-ddr3]
DESIGN=FireSim
TARGET_CONFIG=FireSimLoopbackNICQuadMediumBoom1MBTLBConfig
PLATFORM_CONFIG=F65MHz_BaseF1Config
instancetype=z1d.2xlarge
deploytriplet=None



Best,
Varun



Varun Gandhi

Feb 18, 2021, 2:55:56 PM
to chip...@googlegroups.com
Hi Nathan,

Could you please help me understand how to check for incompatibility between Chipyard and FireMarshal? I’m still getting this issue with the new release. Fedora runs just fine on the pre-built configs; I only get this issue when I try to run it on a custom AFI.

Best,
Varun


Alon Amid

Feb 18, 2021, 5:32:52 PM
to chip...@googlegroups.com

Hello Varun


If you rebuild the default configs (the same ones used for the pre-built AFIs), does Fedora still run on them?

If so, that would likely indicate a bug in your custom implementation. This can have a variety of causes: differences in the periphery, device drivers, or device tree, changes to the functionality of the core, etc.

If the Fedora image runs on the default configs after they have been rebuilt from scratch (i.e., new AGFIs that you built using your own manager instance), that is a strong hint that the problem is not a FireMarshal version mismatch but rather the modified implementation.


Alon

Varun Gandhi

Feb 19, 2021, 11:48:14 AM
to chip...@googlegroups.com
Hi Alon,

Thanks for the response! I appreciate it. 

I rebuilt the single-core large BOOM config and was able to run Fedora on it.

> If that’s the case, that would likely indicate that there is a bug in your custom implementation. This can be for a variety of reasons: differences in periphery / device drivers / device tree, changes to the functionality of the core, etc.


I’m building a simple quad-core medium BOOM config, and there is very little code that I’m adding or modifying to build it. How do I check for the possible causes you’ve listed above?

I just added the following code snippets for my custom config:

1. BoomConfigs.scala

class QuadMediumBoomConfig extends Config(                             // quad medium boom config
  new boom.common.WithNMediumBooms(4) ++
  new chipyard.config.AbstractConfig)


2. config_build_recipes.ini

[firesim-boom-quadcore-nic-l2-llc4mb-ddr3]
DESIGN=FireSim
TARGET_CONFIG=WithNIC_DDR3FRFCFSLLC4MB_WithDefaultFireSimBridges_WithFireSimConfigTweaks_chipyard.QuadMediumBoomConfig
PLATFORM_CONFIG=F65MHz_BaseF1Config
instancetype=z1d.2xlarge
deploytriplet=None

Is Fedora not compatible with medium BOOM configs?

I tried br-base as well but ran into the exact same issue, i.e., the boot sequence freezes at: 

[    0.000000] initrd not found or empty - disabling initrd


Best,
Varun

nat...@berkeley.edu

Feb 19, 2021, 12:33:17 PM
to Chipyard
That's pretty early in the boot process; Linux is just setting up basic memory at that point (the message about initrd is normal). Since it works on the default config and in QEMU, I doubt it's a software issue.

Varun Gandhi

Feb 19, 2021, 1:39:08 PM
to chip...@googlegroups.com
Hi Nathan,

How would you recommend I try to debug this issue? At this point, I have a limited understanding of the build system, so I’d appreciate your guidance on some basic debugging practices.

Best,
Varun

Varun Gandhi

Feb 21, 2021, 3:03:42 PM
to chip...@googlegroups.com
Hi Alon and Nathan,

I’ve narrowed down the problem: it seems to be an issue with the default medium BOOM implementation, since the small and large BOOM configs work just fine. Is there a way I can file a bug report so it can be looked into? Thanks!
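One way to reproduce this kind of narrowing is to keep the quad-core config fixed and vary only the BOOM core size. A minimal sketch, assuming the same Chipyard release as the QuadMediumBoomConfig above (early 2021, where `Config` still lives in `freechips.rocketchip.config`) and a new file placed next to BoomConfigs.scala; the class names QuadSmallBoomConfig and QuadLargeBoomConfig are hypothetical:

```scala
package chipyard

import freechips.rocketchip.config.Config

// Hypothetical comparison configs: identical to QuadMediumBoomConfig except
// for the BOOM core size, so only the core parameters differ between builds.
class QuadSmallBoomConfig extends Config(
  new boom.common.WithNSmallBooms(4) ++   // four small BOOM cores
  new chipyard.config.AbstractConfig)

class QuadLargeBoomConfig extends Config(
  new boom.common.WithNLargeBooms(4) ++   // four large BOOM cores
  new chipyard.config.AbstractConfig)
```

Each config then needs its own recipe in config_build_recipes.ini, built the same way as the QuadMediumBoomConfig recipe but with TARGET_CONFIG ending in _chipyard.QuadSmallBoomConfig or _chipyard.QuadLargeBoomConfig. If only the medium build hangs at the same early boot message, the difference lies in the medium core parameters rather than in the shared SoC setup.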

Best,
Varun

Alon Amid

Feb 22, 2021, 2:03:43 AM
to chip...@googlegroups.com

Hi Varun


If you believe the problem is with the medium BOOM implementation, please open an issue in the BOOM GitHub repo: https://github.com/riscv-boom/riscv-boom/issues

Jerry Zhao

Feb 22, 2021, 2:14:01 AM
to chip...@googlegroups.com
You likely need this Linux kernel patch: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/arch/riscv/mm/init.c?id=21190b74bcf3a36ebab9a715088c29f59877e1f3
We have been procrastinating on bumping the Linux kernel in the default FireMarshal workloads, as such an endeavor may have unintended consequences for existing workloads.
