Some confusion about taping out a Chipyard-based design

Jerry Ho

Jul 11, 2024, 2:07:03 AM

to Chipyard

Hello community,

Our team plans to tape out a Chipyard-based design (with our own accelerators and some modifications to the Rocket core) in the coming months. We have already finished the RTL design, simulated it with VCS, and prototyped it on an FPGA. I know there is a Hammer flow in Chipyard, but a separate team is doing the backend work for us (synthesis, place-and-route, LVS, DRC, etc.); they have their own established toolchains and are hesitant to look into Hammer. This is our team's first tapeout, so we feel a little anxious about what will happen along the way. I have some questions; can anyone on the Chipyard team shed some light?

1, I know we need to substitute the cache SRAMs and separate the harness and top-level Verilog files, and I am able to do this by following the Chipyard documentation. But what about the IOCells? There have to be technology-node-specific IOCells, right? We are using Chipyard 1.8.1, and I know there are IOBinders and HarnessBinders; most of the IOBinders in Chipyard are just simple GPIOs. How can I substitute these with the foundry-specific IOCells? Are there any examples? The Chipyard documentation says there is an ongoing plan to develop FIRRTL transforms for this; are there any updates?

2, The sims flow has its own TestHarness, and FPGA prototyping also has its own TestHarness. For a VLSI flow without Hammer, is there a specific test harness to follow? Or are the generated chipyard.TestHarness.Top.v and the substituted SRAM Verilog files the only files that need to be passed to the backend? I checked the vlsi folder, and there isn't any TestHarness.scala, just a bunch of makefiles and YAML files. There is a generated IOCell.v; is this only for simulation?

3, Are there any special considerations for reset and clock when taping out a design? My thought is that I only have to insert a foundry PLL into the design, make it a diplomatic clock node, and pass it to the holistic clockGroup. Are there any special considerations for reset?

4, Our team has decided to use a 28nm technology node; we have not decided on a foundry yet because of the MPW dates. Is it possible for our Rocket-based design to reach an 800 MHz clock rate in this technology node? Most of our modifications to Rocket are in the FPU (from one bit at a time to an iterative method) and the L2 cache. Is the original RocketCore able to run at 800 MHz in 28nm?

5, It seems that there are VLSI-related FIRRTL transforms, according to the following representation:

I know how to invoke Top and Harness Split and Replace Memory; they are already documented in the Barstools section. But what are Module Promotion, Module Group, and I/O Cell Technology Mapping? How do I invoke these transforms?

Thanks. This is a long post, and I would really appreciate any reply. It is a little difficult for me to persuade other experts in my company to use Chisel and Chipyard, so I hope the tapeout will go well. Thanks again.

Jerry Zhao

Jul 11, 2024, 1:35:44 PM

to chip...@googlegroups.com

There are two approaches to taping out with Chipyard without using Hammer. You can either
a) pass the ChipTop to your VLSI tools as the top of the design, or
b) write a custom Verilog wrapper around ChipTop, and pass that to your VLSI tools as the top of the design.

The advantage of a) is that you can reuse Chipyard's generated TestHarness infrastructure as-is to simulate your design.
The disadvantage of a) is that, depending on how your top level is expected to be set up and what your PD flow expects, the ChipTop system may not work for you, or may not provide sufficient flexibility compared to b).

1, I know we need to substitute the cache SRAMs and separate the harness and top-level Verilog files, and I am able to do this by following the Chipyard documentation. But what about the IOCells? There have to be technology-node-specific IOCells, right? We are using Chipyard 1.8.1, and I know there are IOBinders and HarnessBinders; most of the IOBinders in Chipyard are just simple GPIOs. How can I substitute these with the foundry-specific IOCells? Are there any examples? The Chipyard documentation says there is an ongoing plan to develop FIRRTL transforms for this; are there any updates?

If doing a) and you have single-bit foundry-provided IOCells, you can use the IOCellKey to replace the GenericIOCells with your custom ones. See how `WithCustomIOCells` works. If your foundry IOCells require special connections, extending ChipTop (see example.CustomChipTop) gives you a place to make custom top-level connections.

If doing b), you would just instantiate the IOCells in your custom top-level wrapper.
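
For approach b), the wrapper can be generated rather than written by hand. The following is only a minimal sketch: the port list, the `FOUNDRY_PAD_IN`/`FOUNDRY_PAD_OUT` cell names, and their `PAD`/`C`/`I` pins are all hypothetical placeholders, not Chipyard output or any real foundry library. Real port names come from the generated ChipTop Verilog, and real pad cells will likely have additional pins to connect.

```python
# Hypothetical ChipTop ports as (name, direction); take the real list from
# the generated ChipTop Verilog.
CHIPTOP_PORTS = [
    ("clock", "input"),
    ("reset", "input"),
    ("uart_rxd", "input"),
    ("uart_txd", "output"),
]

# Hypothetical pad-cell instantiation templates. Foundry cells use their own
# module names and pins, and usually need extra control and power pins.
PAD_TEMPLATES = {
    "input": "  FOUNDRY_PAD_IN  pad_{n} (.PAD({n}_pad), .C({n}_core));",
    "output": "  FOUNDRY_PAD_OUT pad_{n} (.PAD({n}_pad), .I({n}_core));",
}


def make_wrapper(wrapper_name, ports):
    """Return Verilog text for a wrapper with one pad cell per ChipTop port."""
    port_decls = ",\n".join(f"  {d} wire {n}_pad" for n, d in ports)
    core_wires = "\n".join(f"  wire {n}_core;" for n, _ in ports)
    pads = "\n".join(PAD_TEMPLATES[d].format(n=n) for n, d in ports)
    conns = ",\n".join(f"    .{n}({n}_core)" for n, _ in ports)
    return (
        f"module {wrapper_name} (\n{port_decls}\n);\n"
        f"{core_wires}\n"
        f"{pads}\n"
        f"  ChipTop chiptop (\n{conns}\n  );\n"
        "endmodule\n"
    )


if __name__ == "__main__":
    print(make_wrapper("ChipWrapper", CHIPTOP_PORTS))
```

Keeping the port list as data makes it easier to regenerate the wrapper whenever the ChipTop interface changes.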

2, The sims flow has its own TestHarness, and FPGA prototyping also has its own TestHarness. For a VLSI flow without Hammer, is there a specific test harness to follow? Or are the generated chipyard.TestHarness.Top.v and the substituted SRAM Verilog files the only files that need to be passed to the backend? I checked the vlsi folder, and there isn't any TestHarness.scala, just a bunch of makefiles and YAML files. There is a generated IOCell.v; is this only for simulation?

The TestHarness for simulation with VLSI flows is the same as in RTL sim.

If doing a), you can pass all the TestHarness stuff, along with your synthesized RTL to a post-synthesis simulation tool.
If doing b), you have to adapt the generated TestHarness to match your custom top-level Verilog wrapper.

3, Are there any special considerations for reset and clock when taping out a design? My thought is that I only have to insert a foundry PLL into the design, make it a diplomatic clock node, and pass it to the holistic clockGroup. Are there any special considerations for reset?

There are no special considerations for reset; it is internally synchronized (see ResetSynchronizer in the clock graph).
See PLLSelectorDividerClockGenerator for an example of how a foundry PLL (FakePLL in the example) is integrated.
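
To illustrate what "internally synchronized" means for reset, here is a small behavioral model of a generic reset synchronizer (asynchronous assertion, synchronous deassertion). It is a sketch of the general technique only, not Chipyard's ResetSynchronizer implementation; the class name, the two-stage depth, and the active-high polarity are assumptions made for the example.

```python
class ResetSyncModel:
    """Behavioral model: reset asserts immediately, deasserts after N clock edges."""

    def __init__(self, stages=2):
        # Assumption: all stages start in the reset state (active-high).
        self.regs = [1] * stages
        self.async_reset = 1

    def set_async_reset(self, value):
        """Drive the raw asynchronous reset input."""
        self.async_reset = value
        if value:
            # Asynchronous assertion: every stage is forced to 1 without a clock.
            self.regs = [1] * len(self.regs)

    def tick(self):
        """Model one rising clock edge and return the synchronized reset."""
        if not self.async_reset:
            # Shift a 0 in; the output only drops once it reaches the last stage.
            self.regs = [0] + self.regs[:-1]
        return self.regs[-1]


if __name__ == "__main__":
    sync = ResetSyncModel(stages=2)
    sync.set_async_reset(0)  # release the raw reset
    print([sync.tick() for _ in range(3)])
```

In this model the synchronized reset stays high until the released input has been clocked through every stage, so downstream logic only sees reset deassert in step with the clock.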

4, Our team has decided to use a 28nm technology node; we have not decided on a foundry yet because of the MPW dates. Is it possible for our Rocket-based design to reach an 800 MHz clock rate in this technology node? Most of our modifications to Rocket are in the FPU (from one bit at a time to an iterative method) and the L2 cache. Is the original RocketCore able to run at 800 MHz in 28nm?

It should be able to meet 800 MHz, but the answer really depends on the timing characteristics of the SRAMs you are using, as well as the timing characteristics of your custom FPU.
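
As a rough illustration of why the SRAM timing matters, 800 MHz corresponds to a 1.25 ns clock period, and a path that starts at an SRAM output has to fit the macro's clock-to-output delay, the downstream logic, the capturing flop's setup time, and clock uncertainty into that period. The delay values below are made-up placeholders, not data for any real 28nm SRAM or standard-cell library; substitute the numbers from your own macro datasheets and timing reports.

```python
def clock_period_ps(freq_mhz):
    """Clock period in picoseconds for a frequency given in MHz."""
    return 1_000_000 / freq_mhz


def setup_slack_ps(freq_mhz, clk_to_q_ps, logic_ps, setup_ps, uncertainty_ps):
    """Setup slack of a single SRAM-to-register (or register-to-register) path."""
    required = clk_to_q_ps + logic_ps + setup_ps + uncertainty_ps
    return clock_period_ps(freq_mhz) - required


if __name__ == "__main__":
    # Placeholder delays in picoseconds (assumptions, not real library data).
    slack = setup_slack_ps(
        freq_mhz=800,
        clk_to_q_ps=450,     # hypothetical SRAM clock-to-output delay
        logic_ps=500,        # hypothetical combinational logic after the SRAM
        setup_ps=80,         # hypothetical setup time of the capturing flop
        uncertainty_ps=100,  # hypothetical clock skew and jitter margin
    )
    print(f"period = {clock_period_ps(800):.0f} ps, slack = {slack:.0f} ps")
```

A negative slack for any such path means the design would not close timing at 800 MHz without a faster macro, less logic on the path, or an added pipeline stage.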

5, It seems that there are VLSI-related FIRRTL transforms, according to the following representation:

Module promotion and module grouping were transforms to manipulate the module hierarchy towards a more PD-friendly hierarchy in some cases. However, these transforms were never upstreamed, as they tended to be ad-hoc passes written for specific use cases.

I/O-cell tech mapping was once an idea to use FIRRTL to replace generic IOCells with foundry cells. I think we just decided to do this in Chisel instead.

-Jerry

Jerry Ho

Jul 16, 2024, 5:10:42 AM

to Chipyard

I just replied, but my message disappeared, so I am replying again here.

Thanks for your reply, and sorry for my delayed response; our team has been busy negotiating with the physical design team recently.

I have a few follow-up questions:

1, As you said: "Module promotion and module grouping were transforms to manipulate the module hierarchy towards a more PD-friendly hierarchy in some cases. However, these transforms were never upstreamed, as they tended to be ad-hoc passes written for specific use cases." Are module promotion and module grouping frequently needed during PD? If so, are there some examples I can follow? It seems very difficult for me to write FIRRTL transforms.

2, The Chipyard documentation claims that "Much of the design effort that goes into building a chip involves developing optimal floorplans for the instance of the design that is being manufactured. Often this is a highly manual and iterative process which consumes much of the physical designer's time. This cost becomes increasingly apparent as the parameterization space grows rapidly when using tools like Chisel- cycle times are hampered by the human labor that is required to floorplan each instance of the design. The Hammer team is actively developing methods of improving the agility of floorplanning for generator-based designs, like those that use Chisel. The libraries we are developing will emit Hammer IR that can be passed directly to the Hammer tool without the need for human intervention. Stay tuned for more information." So it seems that much more work needs to be done when conducting PD for a generator-based design compared with a handwritten Verilog one. Is this true? If so, how can I make this procedure easier? Will using Hammer greatly reduce this complexity?

3, You said that the original RocketCore should reach 800 MHz at 28nm, and that this depends greatly on the modifications we made to hardfloat and the SRAMs we choose. But I am more concerned about the PD process. Will PD have a large effect on the clock rate if I choose appropriate SRAM macros and the modifications I made are minor?