Gate count of RISC-V implementation

Kristoff

Nov 14, 2016, 12:26:14 PM

to RISC-V HW Dev

Hi all,

I am very new to chip hardware design, so please excuse what is perhaps a
stupid question.

I was reading about riscv in this post on hackaday:
https://hackaday.com/2016/10/10/the-journey-toward-a-completely-open-microcontroller/

In the comments-section, there was an interesting remark:
"one of the major reasons why Cortex M0(+) is competitive is because of
the low gate count, with the CPU only having 12K NAND2 equivalent gates
(roughly 24K transistors)."

I read that the use of "gate count" to quantify a processor (especially
as the development of RISC-V has just started) is "not undisputed" (hum),
but does anybody have any idea how many gates (say) a
microcontroller-oriented implementation of an RV32I CPU would need?
(Just to have an idea.)

BTW. I was really surprised how few transistors are needed to implement
a complete CPU.

Kristoff

Michael Gautschi

Nov 14, 2016, 12:35:19 PM

to hw-...@groups.riscv.org

Hi Kristoff

Gate equivalents (GE) are usually used in order to have a
technology-independent measure; e.g., they allow comparing 65nm
implementations with 90nm implementations. However, it is easy to make
mistakes in these conversions (some people don't even use NAND2 gates,
for example). I'd advise everyone to state the area consumption of a
design in kGE and mm^2 (plus information about the technology).
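
To make the kGE bookkeeping concrete, here is a minimal Python sketch. It assumes that GE is computed as the design's cell area divided by the area of one NAND2 cell from the same library. The function names and all numbers are made up for illustration; they are not measurements of any real library or core.

```python
def gate_equivalents(area_um2: float, nand2_area_um2: float) -> float:
    """Express a cell area as a number of NAND2 gate equivalents (GE)."""
    if nand2_area_um2 <= 0:
        raise ValueError("NAND2 cell area must be positive")
    return area_um2 / nand2_area_um2


def area_report(area_um2: float, nand2_area_um2: float, technology: str) -> str:
    """Format the area in kGE and mm^2, together with the technology."""
    kge = gate_equivalents(area_um2, nand2_area_um2) / 1000.0
    mm2 = area_um2 / 1e6  # 1 mm^2 = 1e6 um^2
    return f"{kge:.1f} kGE, {mm2:.4f} mm^2 ({technology})"


# Hypothetical numbers, for illustration only.
print(area_report(30000.0, 1.5, "hypothetical 65nm library"))
```

Reporting all three pieces together (kGE, mm^2, technology) lets a reader redo the conversion if they suspect a different reference gate was used.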

The size of an RV32I core also depends on several things, such as the
number of pipeline stages and branch prediction. I'd say you can expect
an area of 10-30kGE for such a configuration, so if you go for a flat
pipeline as in the M0 you will require more or less the same area.

best

michael

--
-----------------------------------------------------------------------
Michael Gautschi phone: +41 44 632 99 58
Integrated Systems Laboratory fax: +41 44 632 11 94
ETZ J69.2 e-mail: gaut...@iis.ee.ethz.ch
Gloriastr. 35
ETH Zentrum CH-8092 Zurich
-----------------------------------------------------------------------

Ghada Dessouky

Nov 14, 2016, 1:57:15 PM

to Michael Gautschi, hw-...@groups.riscv.org

I am synthesizing some additional hardware units and want to compare their size with PULPino; these slides say the total is 500,000 GE for PULPino. If I synthesize with Synopsys using their standard GTECH library, can I directly compare the number of cells I get with this figure? What should I synthesize the hardware to for a fair comparison with PULPino's 500 kGE?

Regards,

Ghada Dessouky

--
You received this message because you are subscribed to the Google Groups "RISC-V HW Dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to hw-dev+un...@groups.riscv.org.
To post to this group, send email to hw-...@groups.riscv.org.
Visit this group at https://groups.google.com/a/groups.riscv.org/group/hw-dev/.
To view this discussion on the web visit https://groups.google.com/a/groups.riscv.org/d/msgid/hw-dev/5829F5D1.9060709%40iis.ee.ethz.ch.

Michael Gautschi

Nov 14, 2016, 5:38:24 PM

to Ghada Dessouky, hw-...@groups.riscv.org

The 500kGE figure is for the whole chip (PULPino with memories). We obviously cannot release the memory macros, so if you want a fair comparison you should replace the generic memory models for the instruction and data memories with macro cells.

If you only want to compare the core, that would be around 50kGE in the newest release.

best,
michael

Kristoff

Nov 15, 2016, 2:12:52 AM

to hw-...@groups.riscv.org

Hello Michael,

On 14-11-16 18:35, Michael Gautschi wrote:

> Hi Kristoff
>
> GE are usually used in order to have a technology-independent measure;
> e.g., they allow comparing 65nm implementations with 90nm implementations.
> However, it is easy to make mistakes in these conversions (some people
> don't even use NAND2 gates, for example). I'd advise everyone to
> state the area consumption of a design in kGE and mm^2 (plus information
> about the technology).
>

I have absolutely no knowledge of chip design (I am just starting to learn
VHDL now), so thank you for replying to my "newbie" question.

Does this mean that 65nm technology isn't just a scaled-down version of
90 nm? Is the number of actual transistors (or components) to implement
a NAND2 gate different from one technology to another?

If you can point me to an introductory document about that, that
would be great! :-)

How does a full ASIC design then compare to an FPGA? I guess an ASIC can
implement a design much more efficiently. How much less
space/GEs/transistors does an ASIC take compared to an implementation on
a general-purpose commercial-grade FPGA?

(I do understand that an FPGA is internally completely different, so there
are probably many different factors in this, but it would be nice if
there is a "rule of thumb" about this, just to have an idea.)

> The size of an RV32I core also depends on several things, such as the
> number of pipeline stages and branch prediction. I'd say you can expect
> an area of 10-30kGE for such a configuration, so if you go for a flat
> pipeline as in the M0 you will require more or less the same area.

OK, makes sense.

Thanks!

> best
> michael
Cheerio!
Kr. Bonne.

Michael Gautschi

Nov 15, 2016, 2:53:44 AM

to Ghada Dessouky, hw-...@groups.riscv.org

From my co-worker Frank (an expert on these things):

Hello,

Please note that the GTECH library does not really have comparative 'area' info. If you do not have access to a technology library, I suggest using the NANGATE library, which is freely available for academic use:
http://projects.si2.org/openeda.si2.org/projects/nangatelib

As in most cases, the best comparison would be by running the original PULPino core against your additions/modifications directly in the same technology.

We use GE only for a rough comparison (e.g. 10kGE vs. 100kGE); it is not possible to reliably compare two different designs in different technologies using GE. Also note that synthesis results will vary (due to heuristics and the constraints used). In the end, what really counts is the area of the circuit. If you have the full code (as you do with PULPino), it is really easy to run back-to-back comparisons; this is also one of our main motivations for making the code available.

Cheers,
KGF

--
This message was sent from my Android mobile phone with K-9 Mail.

Maik Merten

Nov 15, 2016, 3:35:55 AM

to hw-...@groups.riscv.org

Hi there!

On 15.11.2016 at 08:12, Kristoff wrote:
> Does this mean that 65nm technology isn't just a scaled-down version of
> 90 nm? Is the number of actual transistors (or components) to implement
> a NAND2 gate different from one technology to another?

Disclaimer: I never had the pleasure of working on a chip design, only
some hobbyist interest. My view on things might be wildly inaccurate
(corrections welcome!).

Given that you'd most likely synthesize many parts of a chip design from
a high-level description there's a high chance you will end up with
different gate counts when synthesizing the very same design on
different manufacturing nodes (e.g., 90nm and 65nm). This is because the
basic circuit elements will have different electrical characteristics
across nodes, so the synthesis process may end up producing different
circuits to satisfy the given constraints.

> How does a full ASIC design then compare to an FPGA? I guess an ASIC can
> implement a design much more efficiently. How much less
> space/GEs/transistors does an ASIC take compared to an implementation on
> a general-purpose commercial-grade FPGA?

There has been some research on this:

http://www.eecg.toronto.edu/~jayar/pubs/kuon/kuontcad06.pdf

"The area required to implement these circuits in
FPGAs compared to standard-cell ASICs is on average a factor
of 35 times larger, with the different designs covering a range
from 17 to 54 times."

"Table IV shows that, for circuits with logic only, the average
FPGA circuit is 3.4 times slower than the ASIC
implementation."

"The results indicate that on average FPGAs consume
14 times more dynamic power than ASICs when the circuits
contain only logic."

Of course, these gaps can be narrowed when making good use of dedicated
hardware blocks on FPGA. For instance, it's very costly to synthesize
RAM from reconfigurable FPGA logic elements, which is why FPGAs usually
include "Block RAM", which is "proper" SRAM and performs accordingly.
Other dedicated blocks may include, e.g., DSP functions - using those
instead of synthesizing DSP logic from reconfigurable resources is much
more effective. Tables II to VI in the paper provide numbers for
different usages of dedicated hardware blocks.
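
As a rough back-of-the-envelope aid, the averages quoted above can be turned into a small Python sketch. This assumes the paper's logic-only averages (35x area, 3.4x delay, 14x dynamic power) apply uniformly, which a specific design may well violate; the names and the ASIC input numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FpgaAsicGap:
    """Average FPGA/ASIC ratios for logic-only circuits, as quoted above."""
    area: float = 35.0           # FPGA area / ASIC area (range 17 to 54)
    delay: float = 3.4           # FPGA critical path / ASIC critical path
    dynamic_power: float = 14.0  # FPGA dynamic power / ASIC dynamic power


def fpga_estimate(asic_area_mm2: float, asic_fmax_mhz: float,
                  asic_dyn_power_mw: float,
                  gap: FpgaAsicGap = FpgaAsicGap()) -> dict:
    """Scale ASIC figures by the average gap to get a crude FPGA estimate."""
    return {
        "area_mm2": asic_area_mm2 * gap.area,
        # A path that is N times slower supports a clock N times lower.
        "fmax_mhz": asic_fmax_mhz / gap.delay,
        "dyn_power_mw": asic_dyn_power_mw * gap.dynamic_power,
    }


# Hypothetical ASIC figures, for illustration only.
print(fpga_estimate(asic_area_mm2=1.0, asic_fmax_mhz=340.0,
                    asic_dyn_power_mw=10.0))
```

Designs that map well onto Block RAM or DSP blocks would need smaller factors, so the defaults are best treated as an upper-ish bound for plain logic.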

Best regards,

Maik

Ghada Dessouky

Nov 18, 2016, 8:34:42 AM

to Maik Merten, hw-...@groups.riscv.org

Thanks Maik, Michael and your expert co-worker :) That NANGATE library is a lot of help!

Regards,

Ghada Dessouky

Stefan O'Rear

Nov 18, 2016, 11:22:57 AM

to Kristoff, RISC-V HW Dev

On Mon, Nov 14, 2016 at 9:26 AM, Kristoff <kris...@skypro.be> wrote:
> BTW. I was really surprised how few transistors are needed to implement a
> complete CPU.

Even RV32I goes quite far beyond "how few are needed to implement".
6502 had 3,510 active transistors (confirmed by the visual6502 reverse
engineering project) (but the transistors/GE is not really comparable
between 1975 NMOS and modern CMOS).

-s

Samuel Falvo II

Nov 18, 2016, 11:24:56 AM

to Stefan O'Rear, Kristoff, RISC-V HW Dev

On Fri, Nov 18, 2016 at 8:22 AM, Stefan O'Rear <sor...@gmail.com> wrote:
> Even RV32I goes quite far beyond "how few are needed to implement".
> 6502 had 3,510 active transistors (confirmed by the visual6502 reverse
> engineering project) (but the transistors/GE is not really comparable
> between 1975 NMOS and modern CMOS).

I might be mistaken, but I believe a modern CMOS 65C02 processor has
somewhere in the vicinity of 4200 transistors. Still exceedingly
tiny. :)

--
Samuel A. Falvo II

Kristoff

Nov 18, 2016, 12:03:02 PM

to Samuel Falvo II, Stefan O'Rear, RISC-V HW Dev

Hi all,

First of all, thanks to everybody who replied.
This is all very interesting reading!

(inline comments)

On 18-11-16 17:24, Samuel Falvo II wrote:
>> Even RV32I goes quite far beyond "how few are needed to implement".
>> 6502 had 3,510 active transistors (confirmed by the visual6502 reverse
>> engineering project) (but the transistors/GE is not really comparable
>> between 1975 NMOS and modern CMOS).
> I might be mistaken, but I believe a modern CMOS 65C02 processor has
> somewhere in the vicinity of 4200 transistors. Still exceedingly
> tiny. :)

Well, I have been playing with nRF24LE1 devices this week. It's a
Nordic Semiconductor 2.4 GHz radio chip (nRF24L01+) plus an 8051 clone.
I don't know what the GE count of an MCS-51 is, but I guess it is in the
region of a 6502.

Just wondering: what would be needed to make RISC-V interesting enough as
an alternative for an on-board CPU in (say) a radio device?
(Or are there already other open-source 8-bit CPUs that fill that gap?)

I don't know what the licensing cost of an MCS-51 is these days and how
that would compare to the actual price of the silicon.

On the other hand, the newer 2.4 GHz radio chips from Nordic Semiconductor
do use a 32-bit CPU (an M0+), so there seems to be a migration from 8 bits
to 32 bits in that area.
Perhaps RISC-V is a better match for these chips!

Cheerio! Kr. Bonne

kr...@berkeley.edu

Nov 19, 2016, 11:44:15 AM

to Kristoff, Samuel Falvo II, Stefan O'Rear, RISC-V HW Dev

Very low gate count implementations of almost any ISA are possible.
See the IBM 360 Model 30 for how to build a mainframe with an 8-bit
datapath.

Krste

adaptive...@gmail.com

Nov 19, 2016, 12:03:17 PM

to kr...@berkeley.edu, Kristoff, Samuel Falvo II, Stefan O'Rear, RISC-V HW Dev

Hi,

I remember the GRVI Phalanx project, in which 400 GRVI RISC-V soft processors were placed on one FPGA, presented at the 3rd RISC-V Workshop (you can get the slides and watch the video on the RISC-V site). From it you can calculate how many LUT resources each core uses, and then consider what the ASIC gate count for RISC-V might be.

Best Regards,
S.Takano

Sent from my iPad

Aneesh Raveendran

Nov 19, 2016, 1:07:15 PM

to 高野茂幸, kr...@berkeley.edu, Kristoff, Samuel Falvo II, Stefan O'Rear, RISC-V HW Dev

Hi,

We have done a small implementation of RV32-IMFD; the gate count information is published in an IEEE research paper:

http://ieeexplore.ieee.org/document/7593047/

Thank you,

Aneesh Raveendran

--

ANEESH R
9995604009

Jan Gray

Nov 19, 2016, 1:57:07 PM

to adaptive...@gmail.com, kr...@berkeley.edu, Kristoff, Samuel Falvo II, Stefan O'Rear, RISC-V HW Dev

Thanks, S.Takano, for the reference. More information on the GRVI Phalanx accelerator framework and its 400-core KU040 example may be found here: http://fpga.org/grvi-phalanx/ .

GRVI is an FPGA-optimized soft processor. It is configurable and typically implements RV32I minus CSRs, plus multiply instructions from M, plus LR/SC from A. It is hand technology mapped and floor planned for Xilinx 6-LUT FPGAs. The configuration described in the 3rd RISC-V workshop talk uses about 320 LUTs per core and implements a 32b-wide 2- or 3-stage scalar pipeline. (A classic 5-stage pipeline induces more, and larger, multiplexers and such muxes are relatively expensive in LUTs.)

GRVI is optimized to be a parallel processing element, not a standalone MCU, so resources like byte-load-store muxes, 32x32=>64 multipliers, and message-send / NOC interface may be factored out of the core and shared amongst cores in a cluster. For example, in an 8-core/cluster configuration there may be up to four simultaneous loads/stores, up to four multiplies, and up to one 32B message send/cycle.

It is problematic/bogus to map FPGA LUTs to ASIC gates or even to compare one FPGA's LUTs to another FPGA's LEs. But (thanks to RISC-V's relatively clean and simple ISA) surely an ASIC core designed using similar frugal principles would be "tiny" and enable high memory parallelism across a fabric of such cores.

As Krste noted, narrow microarchitectures are possible. An 8b or 16b datapath RV32I is easy to build but some structures like immediate-format-mux are no smaller. I believe a flat 32b datapath achieves best MIPS/LUT in modern FPGAs.

I look forward to someone demonstrating a bit-serial RV32I, perhaps targeting a gate-constrained platform such as Minecraft, photonics, or superconducting logic.

Cheers,
Jan Gray
Gray Research LLC

高野茂幸

Nov 19, 2016, 6:32:38 PM

to Jan Gray, kr...@berkeley.edu, Kristoff, Samuel Falvo II, Stefan O'Rear, RISC-V HW Dev

Hi,

I remember that old FPGA data sheets gave an "equivalent gate count", meaning how many NAND2 gates' worth of logic could be implemented on those FPGAs. So, if possible, synthesize for an older FPGA such as the XC4000, get the resource utilization report, and estimate the gate count from the device's equivalent gate count.

I also remember that Professor Jonathan Rose (University of Toronto) has researched the implementation gap between FPGAs and ASICs in terms of clock speed, area, and power. The paper is available on his home page. His work was based on a small set of benchmark circuits, but if you are considering a tiny RISC-V core, the error might be acceptable.

Gray-san,
Thank you for your supplemental comments.
I enjoyed the Phalanx "extreme" project slides at the time (I am in Japan, so it is difficult to attend).

Best Regards,
S.Takano

Sent from my iPhone

高野茂幸

Nov 19, 2016, 10:55:17 PM

to Jan Gray, kr...@berkeley.edu, Kristoff, Samuel Falvo II, Stefan O'Rear, RISC-V HW Dev

Hi,

I made a graph covering devices from the XC2000 to the Virtex-II (not all Virtex-II devices are included).

On the X-axis, the leftmost point is the XC2000 and the rightmost point is the Virtex-II (note that this is not a timeline); the Y-axis shows the equivalent number of gates. Once you have the resource utilization report for your RISC-V synthesis, you can get a very "raw number" by multiplying the utilization percentage by the equivalent gate count of the device you synthesized for. You can also make the same graph yourself by collecting the data sheets from the internet.
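
The "raw number" calculation described above might be sketched in Python as follows. This is only an illustration of the utilization-times-rated-gates arithmetic; the function name and the device figures are hypothetical and are not taken from any data sheet.

```python
def raw_gate_estimate(used_blocks: int, available_blocks: int,
                      device_equivalent_gates: int) -> float:
    """Utilization fraction times the device's rated equivalent gate count."""
    if available_blocks <= 0:
        raise ValueError("device must have at least one logic block")
    if not 0 <= used_blocks <= available_blocks:
        raise ValueError("used_blocks must be between 0 and available_blocks")
    # Multiply first, then divide, to keep the arithmetic exact for
    # typical integer inputs.
    return used_blocks * device_equivalent_gates / available_blocks


# Hypothetical device: 1000 logic blocks, rated at 25,000 equivalent gates.
# Hypothetical synthesis report: the design occupies 400 of those blocks.
print(raw_gate_estimate(400, 1000, 25000))
```

The result inherits whatever marketing assumptions went into the data sheet's equivalent gate figure, so it is only an order-of-magnitude indication.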

Best Regards,

S.Takano
