Adding Memory Blocks Generated By Foundry Compiler


Giang Nguyễn

Feb 24, 2023, 3:52:33 PM
to Chipyard
Hi everyone,

I want to tape out a chip using the RTL generated by Chipyard, targeting the 22FDSOI technology.

I'm trying to incorporate the memory blocks (generated by the foundry compiler) into the Chipyard framework. I've also made a custom sram-cache.json to instruct the tool to use the custom memory blocks.

I have two questions that I would like to ask:

1. I want to verify whether our generated RTL is compatible with the custom memory blocks. From what I understand, we need to write a simple C program and simulate it on the chip. We also need to manually include the behavioral models (Verilog files) in the generated-src folder. However, I'm not sure about the actual details. Could you kindly explain the actual steps to verify the RTL? Is it as simple as typing make sim-rtl?

2. The blocks generated by the memory compiler have quite a few ports, and some of them are unused in my design. I have declared them to be tied to either "1" or "0" in the extra-ports section of sram-cache.json. However, when the RTL is generated, these ports are simply absent when the memory modules are instantiated. How can I tell Chipyard not to omit these extra ports? Or should I just make a custom memory wrapper where all the ports are explicitly declared?
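For reference, here is a minimal sketch of what one sram-cache.json entry with tied-off pins might look like, built in Python so the tie-offs are easy to generate. The functional pin names (AW, CLK, RDWEN, CEN, Q, D) come from the debug print later in this thread; the macro name and the TEST_EN/RET_N pins are hypothetical, and the JSON key names are an assumption modeled on Hammer's sky130 sram-cache.json, so check them against plsi-mdf's macro_format.json before relying on them.

```python
import json

def make_sram_entry(name, depth, width, tie_offs):
    """Build one sram-cache.json entry for a single-port (1rw) macro.

    tie_offs maps each unused pin name to (width, constant_value).
    Key names are assumptions modeled on Hammer's sky130 sram-cache.json.
    """
    return {
        "type": "sram",
        "name": name,
        "depth": depth,   # scalar here; a compiler (SRAMGroup) entry uses ranges
        "width": width,
        "family": "1rw",
        "ports": [{
            "address port name": "AW",
            "address port polarity": "active high",
            "clock port name": "CLK",
            "clock port polarity": "positive edge",
            "write enable port name": "RDWEN",
            "write enable port polarity": "active low",
            "chip enable port name": "CEN",
            "chip enable port polarity": "active low",
            "output port name": "Q",
            "output port polarity": "active high",
            "input port name": "D",
            "input port polarity": "active high",
        }],
        # Each pin listed here should show up tied to its constant in .mems.v.
        "extra ports": [
            {"name": pin, "width": w, "type": "constant", "value": v}
            for pin, (w, v) in tie_offs.items()
        ],
    }

# Hypothetical macro and test-pin names, for illustration only.
entry = make_sram_entry("hypothetical_sram_4096x32", 4096, 32,
                        {"TEST_EN": (1, 0), "RET_N": (1, 1)})
print(json.dumps([entry], indent=2))
```

If a pin listed this way is still missing from the generated .mems.v, the first thing to check is a typo in a key or pin name, which turned out to be the cause here.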

Thank you very much. I look forward to your support.

Giang, Nguyen

Harrison Liew

Feb 25, 2023, 12:49:12 PM
to Chipyard
Giang,

Great to hear that you're trying to tape out!

You are correct that you need to make a sram-cache.json file with the memory blocks that you have. It also sounds like you are using (or trying to use) Hammer? I'll give you instructions for a few scenarios:

Using Hammer
  • Assuming you already have a 22FDSOI plugin, you will need to implement the sram_generator tool for your memory library (see the ASAP7 one for an example). This is how you gather all the views (LEFs, LIBs, Verilog models, etc.) for your memories for Hammer.
  • Within Chipyard, set up your Hammer flow to use your tech plugin and that tech's sram_generator tool.
  • When you run make srams, the sram_generator tool you wrote generates a sram_generator-output.json file.
  • This json file gets read in by Hammer so that it knows the memories' Verilog models. You can then run simulation as per the tutorials.
No Hammer
  • You won't be able to use the vlsi Makefile to run your flow. Take the generated RTL out of Chipyard and combine it with your memories' Verilog models as inputs to your simulation flow.
Using Hammer (advanced)
  • Since you have access to a foundry compiler, you can actually provide a sram-compiler.json file instead and turn on the USE_SRAM_COMPILER flag in the vlsi Makefile. This will save you from having to pre-compile a library.
  • sram-compiler.json is different from sram-cache.json because it encodes the supported range of each memory family. This is what the difference in schema looks like: https://github.com/ucb-bar/plsi-mdf/blob/master/macro_format.json. This source code shows the finer difference between an SRAMMacro and SRAMGroup. Notice how fields like depth & width are Ranges in SRAMGroup.
  • The sram_generator tool you write will then invoke the foundry compiler directly and process its outputs, rather than merely pointing to files already generated by the compiler.
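To make the SRAMMacro vs. SRAMGroup distinction concrete, here is a simplified Python model: a cached library is a fixed set of already-generated sizes, while a compiler family accepts any depth and width inside its ranges. The class, its fields, and the range limits are hypothetical stand-ins for illustration, not the mdf schema itself.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SramFamily:
    """Simplified stand-in for an SRAMGroup: one compiler family that can
    produce any depth/width inside its ranges (illustrative fields only)."""
    family: str
    depths: range
    widths: range

    def supports(self, depth: int, width: int) -> bool:
        return depth in self.depths and width in self.widths

# A cached library (sram-cache.json) is a fixed list of generated (depth, width) macros.
CACHED = {(4096, 32), (1024, 64)}

# Hypothetical compiler limits: 64..8192 words in steps of 64, 8..144 bits in steps of 4.
COMPILER = SramFamily("1rw", range(64, 8193, 64), range(8, 145, 4))

for req in [(4096, 32), (2048, 40), (100, 32)]:
    print(req, "cached:", req in CACHED, "compiler:", COMPILER.supports(*req))
```

A size like (2048, 40) illustrates the benefit: it is not in the pre-compiled list, but a compiler family with those ranges could generate it on demand.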
Regarding the extra ports being omitted, we haven't seen that happen. Are you specifying them just like this: https://github.com/ucb-bar/hammer/blob/92e98c1a2c46f207cdbf6aeb134c3c6d03c79fd9/hammer/technology/sky130/sram-cache.json#L30-L36? We see that the tie-lows are properly assigned in the generated .mems.v files.
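For the No Hammer route above, one way to combine the generated RTL with the memory models is to write a single file list for your simulator. This is a minimal sketch: the directory arguments are placeholders you would point at your generated-src config directory and your compiler's model directory, and it assumes your simulator accepts a list of source files via -f (VCS and Verilator both do).

```python
from pathlib import Path

def build_filelist(rtl_dir, model_dirs, out_file):
    """Write a simulator file list: generated RTL, then memory models.

    The generated .mems.v only instantiates the foundry macros, so the
    compiler's behavioral Verilog models must be on the list too.
    """
    sources = sorted(p for p in Path(rtl_dir).rglob("*")
                     if p.is_file() and p.suffix in (".v", ".sv"))
    for d in model_dirs:
        sources += sorted(p for p in Path(d).glob("*.v") if p.is_file())
    Path(out_file).write_text("".join(f"{p}\n" for p in sources))
    return sources
```

A missing model usually shows up as an "unresolved module" error at elaboration naming one of the macros from .mems.v.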

Giang Nguyễn

Feb 27, 2023, 4:41:18 AM
to Chipyard
Hi Harrison,

Thanks a lot for your help.

Please correct me if I'm missing something. From what I understand, we need to write an __init__.py for our SRAM compiler, and we have to modify the script (with the given SRAM parameters) to invoke the foundry compiler to generate the block. After that, it also needs to collect all the information (LEF, GDS, model paths) into a sram_generator-output.json, right?
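As a rough illustration of the collection step, here is a standalone sketch that does not import Hammer. The compiler command line, the per-macro file naming, and the field names (modeled loosely on Hammer's library entries) are all assumptions. In a real plugin, these entries would be returned from the generator method as Hammer library objects (see the ASAP7 example), and Hammer writes sram_generator-output.json itself; the writer below only shows the shape of the data.

```python
import json
from pathlib import Path

# Hypothetical mapping from library-entry field to the file the compiler emits.
VIEW_SUFFIXES = {
    "lef_file": ".lef",
    "gds_file": ".gds",
    "nldm_liberty_file": ".lib",
    "verilog_sim": ".v",
}

def compiler_command(family, depth, width):
    """Hypothetical CLI; replace with your foundry compiler's real syntax."""
    return ["sram_compiler", "-family", family,
            "-words", str(depth), "-bits", str(width)]

def collect_views(macro, out_dir):
    """Gather one macro's generated views, failing loudly if any is missing."""
    lib = {"name": macro}
    for field, suffix in VIEW_SUFFIXES.items():
        path = Path(out_dir) / f"{macro}{suffix}"
        if not path.is_file():
            raise FileNotFoundError(f"{macro}: expected {path}")
        lib[field] = str(path)
    return lib

def write_generator_output(libs, path):
    """Dump the collected entries to JSON to inspect their shape."""
    Path(path).write_text(json.dumps(libs, indent=2))
```

Failing on a missing view at this step is cheaper than discovering it later in synthesis or P&R.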

But at this point I have one problem. The flow needs a MEM_CONFIG file (I think it is called chipyard.TestHarness.{chipConfig}.mems.hammer.json in generated-src), and this file is auto-generated by the VLSI flow. I'm not sure how to specify this file before running the flow. Is it handled by sram-compiler.json or something else?

Regarding the missing ports, I've cross-checked with the json file and found some small errors. So it's not a problem now. Thanks a lot!

Best,
Giang

Harrison Liew

Feb 27, 2023, 8:36:27 PM
to chip...@googlegroups.com
Giang,

Correct on all fronts regarding the __init__.py.

The .mems.hammer.json file (actually called SMEMS_HAMMER in the vlsi/Makefile) is generated by MacroCompiler when you run it with your sram-compiler.json file. When you run make srams ... this file gets passed by Make to the sram_generator tool you wrote above via an sram_generator-input.json file that gets written (see the SRAM_GENERATOR_CONF variable).
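When debugging make srams, it can help to check what MacroCompiler actually requested before blaming the generator. This sketch assumes the SMEMS_HAMMER file is a JSON list of objects with "name", "family", "depth" and "width" keys; that is an assumption, so open your own .mems.hammer.json first and adjust the keys.

```python
import json
from collections import defaultdict

def summarize_requests(smems_hammer_text):
    """Group the SRAMs MacroCompiler requested by family, deduplicated.

    Assumes a JSON list of objects with "name", "family", "depth", "width".
    """
    by_family = defaultdict(set)
    for sram in json.loads(smems_hammer_text):
        key = (int(sram["depth"]), int(sram["width"]), sram["name"])
        by_family[sram["family"]].add(key)
    return {fam: sorted(reqs) for fam, reqs in by_family.items()}
```

Every (depth, width) listed should fall inside one of your compiler families; anything outside is a request your generator cannot satisfy.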

If you still have issues, wipe the OBJ_DIR, run make srams ..., and paste the terminal output so we can help debug.

-Harrison


Giang Nguyễn

Feb 28, 2023, 8:01:07 PM
to Chipyard
Hi Harrison,

Thanks for pointing that out. It helps a lot!

I'm having some problems implementing the sram_compiler flow with Chipyard. For some reason, Chipyard cannot compile the given memory blocks.

Here's the error:

barstools.macros.MacroCompilerException: Target memory data_arrays_0_ext could not be compiled and strict mode is activated - aborting

Since the message isn't very descriptive, I added some println()s to the function groupMatchesMask to print memMask and libMask. Here is the output, respectively:

[info] List(MacroPort(PolarizedPort(RW0_addr,ActiveHigh),Some(PolarizedPort(RW0_clk,PositiveEdge)),Some(PolarizedPort(RW0_wmode,ActiveHigh)),None,Some(PolarizedPort(RW0_en,ActiveHigh)),Some(PolarizedPort(RW0_rdata,ActiveHigh)),Some(PolarizedPort(RW0_wdata,ActiveHigh)),Some(PolarizedPort(RW0_wmask,ActiveHigh)),Some(8),Some(32),Some(4096)))
[info] List(MacroPort(PolarizedPort(AW,ActiveHigh),Some(PolarizedPort(CLK,PositiveEdge)),Some(PolarizedPort(RDWEN,ActiveLow)),None,Some(PolarizedPort(CEN,ActiveLow)),Some(PolarizedPort(Q,ActiveHigh)),Some(PolarizedPort(D,ActiveHigh)),None,None,None,None))

I'm not sure why the last four fields for my SRAM compiler entry are None.

And when I used sram-cache.json instead, the flow runs perfectly fine. The only differences between the two .json files are the data types of the width and depth attributes.

I've included the full terminal output, sram-compiler.json and the generated top.mem.conf. I hope you could have a look into this.


Best,
Giang
chipyardLog.zip

Harrison Liew

Mar 1, 2023, 6:42:50 PM
to chip...@googlegroups.com
Giang,

I believe you've actually identified the issue in the debug prints you made. The memories you specified in sram-compiler.json have no mask port, so the match case in groupMatchesMask returns false and your memory never matches the compiler family.
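The failing case can be reduced to a small model. In the prints above, the memory side has RW0_wmask with granularity Some(8), while the library side has no mask port (None). The function below is a simplified, illustrative check, not barstools' actual groupMatchesMask; the divisibility rule for two masked ports and the tie-off of an unused library mask are assumptions added only to round out the example.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Port:
    mask_gran: Optional[int]  # None: the port has no write-mask pin

def mask_compatible(mem: Port, lib: Port) -> bool:
    """Simplified strict mask check (illustrative, not barstools' code)."""
    if mem.mask_gran is None:
        return True   # assumed: nothing to mask, any lib mask can be tied off
    if lib.mask_gran is None:
        return False  # the failing case in this thread: masked mem, unmasked lib
    # Assumed rule: a finer library mask can be ganged to a coarser one.
    return mem.mask_gran % lib.mask_gran == 0

print(mask_compatible(Port(8), Port(None)))
```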

I guess the logic for compiler memories is stricter about the presence of mask ports than the logic for cached memory instances, which is why the ones in your sram-cache.json compiled properly. For the cached libraries, an effectiveMaskGran field is used that isn't used for group matching. I have made an issue: https://github.com/ucb-bar/barstools/issues/128.

To be honest, I'm not sure why the width and depth fields in the libMask you printed are None; from what I can tell, you've specified them correctly. But since you can't add a mask port to your foundry compiler's macros, you'll need to map to cached SRAMs going forward.

-Harrison



Giang Nguyễn

Mar 24, 2023, 10:55:33 AM
to Chipyard
Hi Harrison,

I have successfully implemented the memory cells as you suggested, and everything is going smoothly. I really appreciate your help!

Since then, a question about Chipyard has come up that I'm not sure about. Could you kindly help me with it?

My question is as follows:
  • How can we communicate/test the chip once it is taped out? Or in other words, how can we upload a simple program/fill up the memory of the chip?
I've been reading this section and found that we can achieve this in two ways: SerialTL and JTAG. Since this is our first tape-out, we're trying to keep everything as simple as possible, so we decided to use TinyRocket. I've checked that it has both JTAG and Serial TL. Assuming the tape-out is successful and there are no errors on our part, we should be able to test the chip with the bringup protocol (VCU118 designs). Is my understanding correct?

Thank you very much, and I look forward to your response.

Best,
Giang

Harrison Liew

Mar 24, 2023, 3:19:48 PM
to chip...@googlegroups.com
Giang,

That's great news!

You are reading the correct documentation for chip bringup as we have been doing it so far. However, there is a cool new update in v1.9.0: https://github.com/ucb-bar/chipyard/pull/1345 and associated documentation, which would allow you to bypass building a soft core on a VCU118 altogether and simply use an Arty 100T as a shim between UART and TSI. We encourage you to go down this route because it will save you much of the headache of building the Linux image, etc., as long as you don't need the FPGA to also manage other things like I2C peripherals.

To answer your other questions:
  • In both cases, the DRAM on the FPGA boards serves as main memory for the chip; if you have scratchpad memory on your chip, you will need to write a program that fills that up yourself.
  • Loading the program into the chip's memory is different between the VCU118 and the simplified Arty 100T methods. For the Arty 100T method, it's quite simple: you push the binary directly to the chip over UART. However, for VCU118, you need to write/compile the programs onto an SD card, then run a program in the Linux running on the FPGA to dump the program into DRAM and command the chip to read from it. This program must be specially compiled with a FESVR customized to your chip and soft core's configuration, so there are no public sources or documentation for this part of the bringup. Let us know if you need this instead of using Arty.
  • We have not gotten TinyRocket chips to talk successfully with the existing bringup platform, due to some mismatch between the 32-bit core and the 64-bit TL widgets. Instead, we have had to manually bit-bang the serial port. We can put you in touch with the person who built that code, if you really do not want to use a 64-bit core.
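For anyone who ends up bit-banging the serial port, here is a sketch of how a TSI request could be laid out as 32-bit words. The layout is based on my reading of the TSI host code in FESVR (tsi_t) and testchipip's SerialAdapter: a command word (0 = read, 1 = write), a 64-bit address sent low word first, a 64-bit length encoded as the number of 32-bit words minus one, then the data. Treat it as an assumption and verify it against the sources for your Chipyard version. Note that the address and length fields are 64-bit regardless of the core's XLEN.

```python
import struct

SAI_CMD_READ = 0   # names mirror the FESVR TSI constants (assumed values)
SAI_CMD_WRITE = 1

def split64(value):
    """Split a 64-bit value into [low, high] 32-bit words."""
    return [value & 0xFFFFFFFF, (value >> 32) & 0xFFFFFFFF]

def tsi_write_words(addr, data: bytes):
    """Words for a TSI write of `data` (little-endian 32-bit words) to `addr`."""
    if len(data) == 0 or len(data) % 4:
        raise ValueError("payload must be a non-empty multiple of 4 bytes")
    nwords = len(data) // 4
    words = [SAI_CMD_WRITE] + split64(addr) + split64(nwords - 1)
    return words + list(struct.unpack(f"<{nwords}I", data))

def tsi_read_words(addr, nwords):
    """Words for a TSI read request of `nwords` 32-bit words from `addr`."""
    return [SAI_CMD_READ] + split64(addr) + split64(nwords - 1)

# Write one instruction word to 0x8000_0000, the usual Rocket Chip DRAM base.
print([hex(w) for w in tsi_write_words(0x80000000, bytes([0x13, 0, 0, 0]))])
```

How these words are then serialized onto the physical serial link depends on your link width and is not shown here.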
Best,
Harrison


Jerry Zhao

Mar 24, 2023, 3:52:17 PM
to chip...@googlegroups.com
To follow up on Harrison's notes...


> However, for VCU118, you need to write/compile the programs onto an SD card, then run a program in the Linux running on the FPGA to dump the program into DRAM and command the chip to read from it

The intention is to eventually build a lightweight VCU118 bringup platform similar to the current Arty100T setup. The Arty100T setup is really a proof of concept. While you can connect a test chip to the Arty via jumper wires or a custom PMOD interface, the FMC connector on the VCU118 is much better suited for this.

Either way, the bringup platform choice --- whether a soft-core-on-VCU118, a lightweight-VCU118, or a lightweight-Arty platform --- shouldn't really affect the design of the test chip too much. It will influence how you design the package and PCB, but package and PCB design are outside the scope of Chipyard at the moment.

-Jerry


Giang Nguyễn

Mar 24, 2023, 4:41:08 PM
to Chipyard
Hi Harrison, Jerry,

Thanks a lot for your detailed information. It really clears up the confusion on my part!

@Harrison,
Regarding the 64-bit core, if that's the case, I think it's best to switch to a 64-bit core to avoid potential problems. What do you think of the small Rocket core? Have you been able to successfully connect it to the bringup platform?

Best,
Giang

Harrison Liew

Mar 24, 2023, 6:18:58 PM
to chip...@googlegroups.com
Matt & Giang,

I don't think any of us have taped out with anything smaller than the default BigCore with shrunken cache sizes. If you're considering the SmallCore (https://github.com/chipsalliance/rocket-chip/blob/e1cdecba0a608f4ed16206ee1f7c187302365419/src/main/scala/subsystem/Configs.scala#L150), the major differences I can see between small and big are: no floating-point unit, no virtual memory, and smaller, non-set-associative caches. Maybe someone else can chime in about whether VM is needed for bringup, but if your core is low performance and doesn't need to do FP calculations, you should theoretically be fine using a small core, as it is still 64-bit.

In any case, you should run simulations with your programs to determine if your ChipTop/TestHarness work together.

-Harrison

Giang Nguyễn

Mar 24, 2023, 6:51:09 PM
to Chipyard
Hi Harrison,

Thanks for your response!

As for now, performance is not really an issue for us at all; we're fine with low performance as our main objectives are to explore the framework and push through our first tape-out successfully. Hence, it's ideal for us to stick with the simplest design at least in our first attempt.

That being said, we'll definitely test out the BigCore with shrunken cache sizes in the near future.

Best,
Giang