Gemmini design doesn't fit on instance target

Maria Jose Belda Beneyto

Jun 27, 2022, 7:52:29 AM

to Chipyard

Hello,

I am trying to generate a design with the Rocket core and the Gemmini accelerator.

I have modified the accelerator config to use FP and a 16x16 systolic array. It also has a 256KB scratchpad and a 128KB accumulator.

I have seen that in the Berkeley paper about Gemmini they use a similar configuration at 1GHz, but I can't get my config generated.

When I try to build the design with FireSim, it fails with an error about LUT overuse.

In my config the target design is z1d.3xlarge, and the frequency is only 40MHz.

Has anyone faced the same problem? How should I proceed to generate a big design?

I attach the log in case you need it.

Thank you very much in advance.

Best regards,

MªJosé.

2022-06-27--08-40-15-buildafi-ZY5D0G6Y7LMQK037.log

Hasan Genc

Jul 1, 2022, 2:08:28 AM

to chip...@googlegroups.com

Hi,

If you're using floating-point, then the LUT usage will get much higher. It doesn't surprise me that a 16-by-16 config with FP failed to build on FireSim.

In your situation, there are a few things you can try:

- You can save a little bit of area by copying options over from the "leanConfig" and setting "has_training_convs" to false.
- You can also pipeline the systolic array more aggressively by increasing the "tileLatency" parameter, which may help placement. The "mesh_output_delay" parameter can also be increased for further pipelining. You can play around with "acc_scale_args", which is responsible for performing matrix-scalar multiplications. For example, you can increase its "latency" argument (to make placement easier).
- You can try reducing the FPGA frequency even further by adding your own frequencies to this file.
- You can try using the "dummyConfig", which gets exactly the same performance as Gemmini's default configuration, but which doesn't instantiate MAC units or SRAMs in order to save area. The dummy config won't return functionally correct matmul/conv results, but the performance numbers will still be correct.

The dummy config assumes 8-bit inputs and 32-bit partial sums. If you want to simulate an FP32 accelerator, then you can change that to assume 32-bit inputs.
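As a rough illustration of what changing the assumed input width means for the memories, the Python sketch below compares 8-bit and 32-bit elements for a 16-wide array and a 256KB scratchpad (the sizes mentioned earlier in this thread). The function names are made up for this example and are not part of Gemmini; this is plain arithmetic, not a model of the generated hardware.

```python
# Back-of-the-envelope arithmetic only; these helpers are hypothetical
# and are not part of Gemmini or Chipyard.

def row_width_bits(array_dim: int, elem_bits: int) -> int:
    """Bits in one row of array_dim elements, each elem_bits wide."""
    return array_dim * elem_bits

def elements_in_capacity(capacity_kib: int, elem_bits: int) -> int:
    """Number of elem_bits-wide elements that fit in capacity_kib KiB."""
    return (capacity_kib * 1024 * 8) // elem_bits

for bits in (8, 32):
    print(bits, row_width_bits(16, bits), elements_in_capacity(256, bits))
```

With 32-bit elements, each 16-element row is four times as wide as with 8-bit elements, and the same scratchpad capacity holds a quarter as many elements.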

Typically, when we built FP configs in the past, we didn't build them with 16-by-16 spatial arrays. I think it might be possible to optimize our FP arithmetic implementations in the future (right now, we just use the HardFloat library in a somewhat naive way).
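To give a feel for why the array size matters so much here, this small Python sketch counts the processing elements (one multiply-accumulate unit each) in a square array. It is only a count: it says nothing about how many LUTs each FP unit actually takes. The function name and the smaller sizes (8 and 4) are just illustrative assumptions.

```python
# Illustrative count only; num_pes is a hypothetical helper, not a Gemmini API.

def num_pes(dim: int) -> int:
    """Processing elements in a dim-by-dim spatial array."""
    return dim * dim

baseline = num_pes(16)
for dim in (16, 8, 4):
    share = num_pes(dim) / baseline
    print(f"{dim}x{dim}: {num_pes(dim)} PEs ({share:.4f} of 16x16)")
```

Halving each dimension cuts the number of FP MAC units to a quarter, which is why a smaller array is much easier to fit.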

Regards,

Hasan

--
You received this message because you are subscribed to the Google Groups "Chipyard" group.
To unsubscribe from this group and stop receiving emails from it, send an email to chipyard+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/chipyard/e2b5831b-6d61-440e-9cbd-e496e3575732n%40googlegroups.com.

Majo Belda

Jul 4, 2022, 4:48:26 AM

to Chipyard

Oh, ok. I thought you did, because in the Gemmini paper (https://people.eecs.berkeley.edu/~ysshao/assets/papers/genc2021-dac.pdf) the designs were 16-by-16 and you ran neural networks such as MobileNet, which is the one I am using.

I need the real functionality, so I am going to try to deploy it on a bigger FPGA, maybe on a Virtex VC707.

Thank you very much anyway!

Regards,

MªJosé.

Hasan Genc

Jul 12, 2022, 5:44:27 AM

to chip...@googlegroups.com

For the paper, we ran the MobileNet program in `software/gemmini-rocc-tests/imagenet/mobilenet.c`, which is quantized to int8. Perhaps that would be sufficient for your purposes?

> Thank you very much anyway!

No problem; thanks for using Gemmini!

Regards,

Hasan


Maria Jose Belda Beneyto

Jul 12, 2022, 6:39:57 AM

to chip...@googlegroups.com

Oh, I thought you used float.

I am trying to compare my performance to an NVIDIA Jetson Nano, which only uses floating-point precision.

I am trying to generate the bitstream of a 16-by-16 float Gemmini + Rocket design to map onto the Virtex VC707.

Let's see if I'm lucky and get to deploy it!

Regards,

MªJosé.

