clock_uncore_clock and axi4_mem_0_clock in MegaBoom


Øyvind Harboe

Jan 8, 2024, 12:23:26 AM
to Chipyard
I'm trying to run the MegaBoom configuration through OpenROAD and I have run into a snag with CTS.

OpenROAD CTS decides not to generate a clock tree for clock_uncore_clock, since it is routed through MegaBoom and out on axi4_mem_0_clock. The reason is that, at the CTS point in the flow, buffers already exist on the clock_uncore_clock net, so CTS concludes that a clock tree exists or has been created manually.

So something has to change here... but what exactly is it that MegaBoom wants to happen when it routes clock_uncore_clock out on axi4_mem_0_clock and how should this requirement be expressed to OpenROAD?

It would be great to have some pointers as to the motivation for routing clock_uncore_clock out on axi4_mem_0_clock....

My guess, for now, is that for a single-clock CPU design like MegaBoom, clock insertion latency isn't a primary concern. Clock skew matters a bit more, but it is more complicated than "less skew is better".

MegaBoom does need to communicate with the outside world through its axi4_mem_0 interface (DRAM and memory-mapped I/O), and the registers/flip-flops that drive and read the axi4_mem_0 interface are at the leaves of the clock_uncore_clock clock tree.

Hence axi4_mem_0_clock is exported so the axi4_mem pins can have clock insertion latency relative to the leaf of the clock_uncore_clock instead of the root, thus avoiding all sorts of hold and setup issues.

Full discussion on getting OpenROAD to build MegaBoom:

https://github.com/The-OpenROAD-Project/OpenROAD/discussions/4490

Great lecture series on CTS:


Unfortunately this PDF seems to be offline now, but very good high level read:



Jerry Zhao

Jan 8, 2024, 1:08:50 AM
to chip...@googlegroups.com
>  what exactly is it that MegaBoom wants to happen when it routes clock_uncore_clock out on axi4_mem_0_clock

The axi4_mem_0_clock is passed out to synchronize the data signals for simulations of multi-clock systems. In multi-clock systems, the axi4_mem_0_clock would not be tied to clock_uncore.
Normally, you wouldn't run an AXI interface like this through VLSI. This interface only makes sense to realize on FPGA platforms or in RTL simulations.

To build a more realistic chip config, I would look at the ChipLikeRocketConfig in the latest chipyard version. This builds a system with interfaces that are sensible for VLSI... a "tapeout ready" design.
You can simply replace the `WithNBigCores(1)` with `WithNMegaBooms(1)` in this config to make a "ChipLikeMegaBoomConfig".


>  buffers exist in the clock_uncore_clock net and CTS concludes that a clock tree exists/has been created manually.

Are these buffers related to the axi4 simulation port? 

-Jerry

--
You received this message because you are subscribed to the Google Groups "Chipyard" group.
To unsubscribe from this group and stop receiving emails from it, send an email to chipyard+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/chipyard/0d0a9efd-448f-4f5a-9df1-ea17d9017aadn%40googlegroups.com.

Øyvind Harboe

Jan 8, 2024, 1:29:39 AM
to Chipyard
Thanks!

See some comments inline below...

On Monday, January 8, 2024 at 7:08:50 AM UTC+1 jerr...@berkeley.edu wrote:
> > what exactly is it that MegaBoom wants to happen when it routes clock_uncore_clock out on axi4_mem_0_clock
>
> The axi4_mem_0_clock is passed out to synchronize the data signals for simulations of multi-clock systems. In multi-clock systems, the axi4_mem_0_clock would not be tied to clock_uncore.
> Normally, you wouldn't run an AXI interface like this through VLSI. This interface only makes sense to realize on FPGA platforms or in RTL simulations.
>
> To build a more realistic chip config, I would look at the ChipLikeRocketConfig in the latest chipyard version. This builds a system with interfaces that are sensible for VLSI... a "tapeout ready" design.
> You can simply replace the `WithNBigCores(1)` with `WithNMegaBooms(1)` in this config to make a "ChipLikeMegaBoomConfig".

I will give it a go...
 
> > buffers exist in the clock_uncore_clock net and CTS concludes that a clock tree exists/has been created manually.
>
> Are these buffers related to the axi4 simulation port?

Not entirely sure what you are asking. OpenROAD inserts these buffers, prior to CTS, to deal with the long wire across the chip. The net itself appears to be routed straight through.

[Attachment: Screenshot from 2024-01-08 07-26-28.png]

Jerry Zhao

Jan 8, 2024, 1:41:17 AM
to chip...@googlegroups.com
> Not entirely sure what you are asking. OpenROAD inserts these buffers, prior to CTS, to deal with the long wire across the chip. The net itself appears to be routed straight through.

Ah ok, I see now. It wasn't clear to me where the buffers were in your original email, but this makes it clear.
Yeah, the best thing to do would be to PAR the design without the fake AXI-4 interface.

-Jerry


Øyvind Harboe

Jan 8, 2024, 4:58:49 AM
to Chipyard
Yep...

For now, I want to create the Verilog with synflops and deal with the conversion to SRAM macros on the OpenROAD end. My short-term goal is to sort out issues with building something of the scale of MegaBoom with OpenROAD-flow-scripts and Bazel: https://github.com/The-OpenROAD-Project/megaboom


I want to side-step Chipyard's sram_generator (barstools) and get synflops .v files directly. I'm unsure exactly what the procedure is to make that happen, so I've tried to hack the makefiles. See below.

With those hacks, it does get past firtool, so I have Verilog files, but the axi4_mem_0_clock is still there.

module ChipTop(
input reset_io, // @[generators/chipyard/src/main/scala/clocking/ClockBinders.scala:105:24]
clock_uncore, // @[generators/chipyard/src/main/scala/clocking/ClockBinders.scala:113:26]
output axi4_mem_0_clock, // @[generators/chipyard/src/main/scala/iobinders/IOBinders.scala:377:22]
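
A quick way to confirm whether a port such as axi4_mem_0_clock is still present in the generated Verilog is to list the ports in the ChipTop header. The following is a minimal Python sketch and not part of Chipyard: the helper name chiptop_ports, the SAMPLE text (a shortened, closed-off version of the header above) and the regex-based parsing are all assumptions, and it only handles a simple ANSI-style port list like this one.

```python
import re


def chiptop_ports(verilog_text, module="ChipTop"):
    """Return the port names declared in the header of `module` (hypothetical helper)."""
    # Drop // comments first so the source-locator annotations cannot confuse the match.
    text = re.sub(r"//[^\n]*", "", verilog_text)
    # Capture everything between "module <name>(" and the closing ");".
    m = re.search(r"\bmodule\s+" + re.escape(module) + r"\s*\((.*?)\)\s*;", text, re.S)
    if m is None:
        return []
    ports = []
    for decl in m.group(1).split(","):
        # The last identifier in each declaration is the port name;
        # direction keywords and [msb:lsb] ranges come before it.
        tokens = re.findall(r"[A-Za-z_][A-Za-z0-9_$]*", decl)
        if tokens:
            ports.append(tokens[-1])
    return ports


# Shortened illustration of the header; the real port list is longer.
SAMPLE = """
module ChipTop(
  input  reset_io, // source locator
         clock_uncore, // source locator
  output axi4_mem_0_clock // source locator
);
endmodule
"""

ports = chiptop_ports(SAMPLE)
print("axi4_mem_0_clock" in ports)
```

In practice the text would be read from the generated ChipTop Verilog file instead of SAMPLE.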

I'm getting errors... but since I already have the Verilog at this point, that isn't a showstopper right now.

$ make tutorial=ChipLikeMegaBoomConfig buildfile
Running with RISCV=/home/oyvind/chipyard/.conda-env/riscv-tools
cat /home/oyvind/chipyard/vlsi/generated-src/chipyard.harness.TestHarness.RocketConfig/chipyard.harness.TestHarness.RocketConfig.top.f | sort -u > /home/oyvind/chipyard/vlsi/generated-src/chipyard.harness.TestHarness.RocketConfig/syn.f
echo /home/oyvind/chipyard/vlsi/generated-src/chipyard.harness.TestHarness.RocketConfig/gen-collateral/chipyard.harness.TestHarness.RocketConfig.top.mems.v >> /home/oyvind/chipyard/vlsi/generated-src/chipyard.harness.TestHarness.RocketConfig/syn.f
mkdir -p /home/oyvind/chipyard/vlsi/build/chipyard.harness.TestHarness.RocketConfig-ChipTop/
echo "sim.inputs:" > /home/oyvind/chipyard/vlsi/build/chipyard.harness.TestHarness.RocketConfig-ChipTop/inputs.yml
echo "  input_files:" >> /home/oyvind/chipyard/vlsi/build/chipyard.harness.TestHarness.RocketConfig-ChipTop/inputs.yml
for x in $(cat /home/oyvind/chipyard/vlsi/generated-src/chipyard.harness.TestHarness.RocketConfig/syn.f); do \
echo '    - "'$x'"' >> /home/oyvind/chipyard/vlsi/build/chipyard.harness.TestHarness.RocketConfig-ChipTop/inputs.yml; \
done
echo "  input_files_meta: 'append'" >> /home/oyvind/chipyard/vlsi/build/chipyard.harness.TestHarness.RocketConfig-ChipTop/inputs.yml
echo "synthesis.inputs:" >> /home/oyvind/chipyard/vlsi/build/chipyard.harness.TestHarness.RocketConfig-ChipTop/inputs.yml
echo "  top_module: ChipTop" >> /home/oyvind/chipyard/vlsi/build/chipyard.harness.TestHarness.RocketConfig-ChipTop/inputs.yml
echo "  input_files:" >> /home/oyvind/chipyard/vlsi/build/chipyard.harness.TestHarness.RocketConfig-ChipTop/inputs.yml
for x in $(cat /home/oyvind/chipyard/vlsi/generated-src/chipyard.harness.TestHarness.RocketConfig/syn.f); do \
echo '    - "'$x'"' >> /home/oyvind/chipyard/vlsi/build/chipyard.harness.TestHarness.RocketConfig-ChipTop/inputs.yml; \
done
mkdir -p /home/oyvind/chipyard/vlsi/build/chipyard.harness.TestHarness.RocketConfig-ChipTop/
echo "vlsi.inputs.sram_parameters: '/home/oyvind/chipyard/vlsi/generated-src/chipyard.harness.TestHarness.RocketConfig/chipyard.harness.TestHarness.RocketConfig.mems.hammer.json'" >> /home/oyvind/chipyard/vlsi/build/chipyard.harness.TestHarness.RocketConfig-ChipTop/sram_generator-input.yml
echo "vlsi.inputs.sram_parameters_meta: [\"transclude\", \"json2list\"]">> /home/oyvind/chipyard/vlsi/build/chipyard.harness.TestHarness.RocketConfig-ChipTop/sram_generator-input.yml
cd /home/oyvind/chipyard/vlsi &&  ./example-vlsi -e /home/oyvind/chipyard/vlsi/env.yml  -p example-tools.yml  -p example-asap7.yml  -p /home/oyvind/chipyard/vlsi/build/chipyard.harness.TestHarness.RocketConfig-ChipTop/sram_generator-input.yml --obj_dir /home/oyvind/chipyard/vlsi/generated-src/chipyard.harness.TestHarness.RocketConfig sram_generator
[<global>] Loading hammer-vlsi libraries and reading settings
Traceback (most recent call last):
  File "/home/oyvind/chipyard/vlsi/./example-vlsi", line 61, in <module>
    ExampleDriver().main()
  File "/home/oyvind/chipyard/.conda-env/lib/python3.10/site-packages/hammer/vlsi/cli_driver.py", line 1725, in main
    sys.exit(self.run_main_parsed(vars(parser.parse_args(args))))
  File "/home/oyvind/chipyard/.conda-env/lib/python3.10/site-packages/hammer/vlsi/cli_driver.py", line 1617, in run_main_parsed
    driver, errors = self.args_to_driver(args)
  File "/home/oyvind/chipyard/.conda-env/lib/python3.10/site-packages/hammer/vlsi/cli_driver.py", line 1376, in args_to_driver
    driver = HammerDriver(options, config)
  File "/home/oyvind/chipyard/.conda-env/lib/python3.10/site-packages/hammer/vlsi/driver.py", line 104, in __init__
    self.load_technology()
  File "/home/oyvind/chipyard/.conda-env/lib/python3.10/site-packages/hammer/vlsi/driver.py", line 148, in load_technology
    tech_module: str = self.database.get_setting("vlsi.core.technology")
  File "/home/oyvind/chipyard/.conda-env/lib/python3.10/site-packages/hammer/config/config_src.py", line 846, in get_setting
    if key not in self.get_config():
  File "/home/oyvind/chipyard/.conda-env/lib/python3.10/site-packages/hammer/config/config_src.py", line 802, in get_config
    self.__config_cache = combine_configs(
  File "/home/oyvind/chipyard/.conda-env/lib/python3.10/site-packages/hammer/config/config_src.py", line 1125, in combine_configs
    expanded_config_reduce = reduce(update_and_expand_meta, configs, {})  # type: dict
  File "/home/oyvind/chipyard/.conda-env/lib/python3.10/site-packages/hammer/config/config_src.py", line 738, in update_and_expand_meta
    meta_func(newdict, setting, meta_dict[setting])
  File "/home/oyvind/chipyard/.conda-env/lib/python3.10/site-packages/hammer/config/config_src.py", line 435, in transclude_action
    with open(value, "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: '/home/oyvind/chipyard/vlsi/generated-src/chipyard.harness.TestHarness.RocketConfig/chipyard.harness.TestHarness.RocketConfig.mems.hammer.json'
make: *** No rule to make target '/home/oyvind/chipyard/vlsi/build/chipyard.harness.TestHarness.RocketConfig-ChipTop/sram_generator-output.json', needed by '/home/oyvind/chipyard/vlsi/build/chipyard.harness.TestHarness.RocketConfig-ChipTop/hammer.d'.  Stop.

My changes:


diff --git a/common.mk b/common.mk
index 3763584f..eef616e9 100644
--- a/common.mk
+++ b/common.mk
@@ -208,13 +208,8 @@ MFC_BASE_LOWERING_OPTIONS ?= emittedLineLength=2048,noAlwaysComb,disallowLocalVa
# hence we remove them manually by using jq before passing them to firtool
$(SFC_LEVEL) $(EXTRA_FIRRTL_OPTIONS) &: $(FIRRTL_FILE)
-ifeq (,$(ENABLE_CUSTOM_FIRRTL_PASS))
- echo $(if $(shell grep "Fixed<" $(FIRRTL_FILE)), low, none) > $(SFC_LEVEL)
+ echo none > $(SFC_LEVEL)
echo "$(EXTRA_BASE_FIRRTL_OPTIONS)" $(if $(shell grep "Fixed<" $(FIRRTL_FILE)), "$(SFC_REPL_SEQ_MEM)",) > $(EXTRA_FIRRTL_OPTIONS)
-else
- echo low > $(SFC_LEVEL)
- echo "$(EXTRA_BASE_FIRRTL_OPTIONS)" "$(SFC_REPL_SEQ_MEM)" > $(EXTRA_FIRRTL_OPTIONS)
-endif
$(MFC_LOWERING_OPTIONS):
mkdir -p $(dir $@)
diff --git a/generators/chipyard/src/main/scala/config/ChipConfigs.scala b/generators/chipyard/src/main/scala/config/ChipConfigs.scala
index ffcb3f77..6c2cfebd 100644
--- a/generators/chipyard/src/main/scala/config/ChipConfigs.scala
+++ b/generators/chipyard/src/main/scala/config/ChipConfigs.scala
@@ -43,6 +43,46 @@ class ChipLikeRocketConfig extends Config(
new chipyard.config.AbstractConfig)
+// A simple config demonstrating how to set up a basic chip in Chipyard
+class ChipLikeMegaBoomConfig extends Config(
+ //==================================
+ // Set up TestHarness
+ //==================================
+ new chipyard.harness.WithAbsoluteFreqHarnessClockInstantiator ++ // use absolute frequencies for simulations in the harness
+ // NOTE: This only simulates properly in VCS
+
+ //==================================
+ // Set up tiles
+ //==================================
+ new freechips.rocketchip.subsystem.WithAsynchronousRocketTiles(depth=8, sync=3) ++ // Add async crossings between RocketTile and uncore
+ new boom.common.WithNMegaBooms(1) ++ // 1 MegaBoom
+
+ //==================================
+ // Set up I/O
+ //==================================
+ new testchipip.serdes.WithSerialTLWidth(4) ++ // 4bit wide Serialized TL interface to minimize IO
+ new testchipip.serdes.WithSerialTLMem(size = (1 << 30) * 4L) ++ // Configure the off-chip memory accessible over serial-tl as backing memory
+ new freechips.rocketchip.subsystem.WithNoMemPort ++ // Remove axi4 mem port
+ new freechips.rocketchip.subsystem.WithNMemoryChannels(1) ++ // 1 memory channel
+
+ //==================================
+ // Set up buses
+ //==================================
+ new testchipip.soc.WithOffchipBusClient(MBUS) ++ // offchip bus connects to MBUS, since the serial-tl needs to provide backing memory
+ new testchipip.soc.WithOffchipBus ++ // attach a offchip bus, since the serial-tl will master some external tilelink memory
+
+ //==================================
+ // Set up clock./reset
+ //==================================
+ new chipyard.clocking.WithPLLSelectorDividerClockGenerator ++ // Use a PLL-based clock selector/divider generator structure
+
+ // Create the uncore clock group
+ new chipyard.clocking.WithClockGroupsCombinedByName(("uncore", Seq("implicit", "sbus", "mbus", "cbus", "system_bus", "fbus", "pbus"), Nil)) ++
+
+ new chipyard.config.AbstractConfig)
+
+
+
class FlatChipTopChipLikeRocketConfig extends Config(
new chipyard.example.WithFlatChipTop ++
new chipyard.ChipLikeRocketConfig)
diff --git a/vlsi/Makefile b/vlsi/Makefile
index 074ec66a..14f73909 100644
--- a/vlsi/Makefile
+++ b/vlsi/Makefile
@@ -29,9 +29,9 @@ SMEMS_CACHE ?= $(tech_dir)/sram-cache.json
SMEMS_HAMMER ?= $(build_dir)/$(long_name).mems.hammer.json
ifdef USE_SRAM_COMPILER
- TOP_MACROCOMPILER_MODE ?= -l $(SMEMS_COMP) --use-compiler -hir $(SMEMS_HAMMER) --mode strict
+ TOP_MACROCOMPILER_MODE ?= -l $(SMEMS_COMP) --use-compiler -hir $(SMEMS_HAMMER) --mode synflops
else
- TOP_MACROCOMPILER_MODE ?= -l $(SMEMS_CACHE) -hir $(SMEMS_HAMMER) --mode strict
+ TOP_MACROCOMPILER_MODE ?= -l $(SMEMS_CACHE) -hir $(SMEMS_HAMMER) --mode synflops
endif
ENV_YML ?= $(vlsi_dir)/env.yml

Øyvind Harboe

Jan 8, 2024, 6:11:00 AM
to Chipyard
Ooops... Wrong command...

 make tutorial=sky130-openroad CONFIG=ChipLikeMegaBoomConfig buildfile

Now the top level doesn't have a memory interface... It has a TileLink SERDES (?) interface with a 4-bit datapath, which presumably handles all memory and peripheral access.

module ChipTop(
input clock, // @[tools/barstools/iocell/src/main/scala/barstools/iocell/chisel/IOCell.scala:195:23]
reset, // @[tools/barstools/iocell/src/main/scala/barstools/iocell/chisel/IOCell.scala:195:23]
serial_tl_0_clock, // @[tools/barstools/iocell/src/main/scala/barstools/iocell/chisel/IOCell.scala:195:23]
output serial_tl_0_bits_in_ready, // @[tools/barstools/iocell/src/main/scala/barstools/iocell/chisel/IOCell.scala:195:23]
input serial_tl_0_bits_in_valid, // @[tools/barstools/iocell/src/main/scala/barstools/iocell/chisel/IOCell.scala:195:23]
input [3:0] serial_tl_0_bits_in_bits, // @[tools/barstools/iocell/src/main/scala/barstools/iocell/chisel/IOCell.scala:195:23]
input serial_tl_0_bits_out_ready, // @[tools/barstools/iocell/src/main/scala/barstools/iocell/chisel/IOCell.scala:195:23]
output serial_tl_0_bits_out_valid, // @[tools/barstools/iocell/src/main/scala/barstools/iocell/chisel/IOCell.scala:195:23]
output [3:0] serial_tl_0_bits_out_bits, // @[tools/barstools/iocell/src/main/scala/barstools/iocell/chisel/IOCell.scala:195:23]
input custom_boot, // @[tools/barstools/iocell/src/main/scala/barstools/iocell/chisel/IOCell.scala:195:23]
jtag_TCK, // @[tools/barstools/iocell/src/main/scala/barstools/iocell/chisel/IOCell.scala:195:23]
jtag_TMS, // @[tools/barstools/iocell/src/main/scala/barstools/iocell/chisel/IOCell.scala:195:23]
jtag_TDI, // @[tools/barstools/iocell/src/main/scala/barstools/iocell/chisel/IOCell.scala:195:23]
output jtag_TDO, // @[tools/barstools/iocell/src/main/scala/barstools/iocell/chisel/IOCell.scala:195:23]
uart_0_txd, // @[tools/barstools/iocell/src/main/scala/barstools/iocell/chisel/IOCell.scala:195:23]
input uart_0_rxd // @[tools/barstools/iocell/src/main/scala/barstools/iocell/chisel/IOCell.scala:195:23]
);


I'm trying to massage the generated Verilog a bit, but ran into a crash with firtool, issue filed:

Øyvind Harboe

Jan 8, 2024, 8:07:19 AM
to Chipyard
Question regarding serial_tl TileLink SERDES interface: I want to know what to put in the .sdc file...

What is the clock period of serial_tl_0_clock?

What bandwidth can one expect from this TileLink memory interface?

Near as I can tell, it connects to L2.

Harrison Liew

Jan 31, 2024, 2:18:15 PM
to Chipyard
Hi,

The serial TileLink is asynchronous to the rest of the system, so it can be any clock period. The way you wrote your SDC in your other post is sufficient for adding this clock.
But, eventually, if you tape out the chip, this clock would be coming through an IO cell, so the clock frequency will be upper-bounded by the maximum frequency of the IO cells (tens of MHz).
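
To make this concrete, here is a minimal Python sketch that (a) prints generic SDC lines declaring serial_tl_0_clock as its own clock, asynchronous to the main clock, and (b) estimates the raw link bandwidth for a given width and frequency. It is not the SDC from the other post, which isn't reproduced in this thread. The assumptions: the port names clock and serial_tl_0_clock come from the ChipTop port list earlier in the thread, a clock named clock is created elsewhere in the SDC, the 20 ns period (50 MHz) is a hypothetical value in the "tens of MHz" range, and the link moves one 4-bit beat per cycle in each direction with no protocol overhead counted.

```python
def serial_tl_sdc(period_ns, core_clock="clock", serial_clock="serial_tl_0_clock"):
    """Return SDC text for an asynchronous serial TileLink clock (illustrative only)."""
    lines = [
        # Define the serial clock on its input port with the chosen period.
        f"create_clock -name {serial_clock} -period {period_ns} [get_ports {serial_clock}]",
        # Tell the timer not to time paths between the two clock domains.
        f"set_clock_groups -asynchronous -group {{{core_clock}}} -group {{{serial_clock}}}",
    ]
    return "\n".join(lines)


def raw_bandwidth_mbytes_per_s(width_bits, freq_mhz):
    """Upper bound on one-direction bandwidth: width * frequency, ignoring overhead."""
    return width_bits * freq_mhz / 8.0


# Hypothetical numbers: 4-bit link (WithSerialTLWidth(4)) clocked at 50 MHz.
print(serial_tl_sdc(20))
print(raw_bandwidth_mbytes_per_s(4, 50), "MB/s raw, per direction")
```

Under those assumptions the raw figure is 25 MB/s per direction. Real throughput would be lower, since TileLink message headers and flow control also travel over the same 4 bits.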

I will respond to the MacroCompiler error in your other post to keep this thread on-topic.

-Harrison

Øyvind Harboe

Jan 31, 2024, 2:32:19 PM
to chip...@googlegroups.com
Thank you.

