Hi,
Being active in 100 Gbps telecom stuff and with an FPGA/ASIC background,
the ZPU is a refreshing acquaintance – I do not understand some mechanisms
that maybe you guys can explain.
1/ Most of the ZPU cores can be modified with generics. But how does the
compiler know if e.g. a multiplier is present? I’ve looked through the
zillion switches of zpu-elf-gcc without enlightenment. Can it be some
linking step I do not understand? Can it be related to the “Code points 33 to 63
may be emulated by code in vectors 2 through 32” mechanism?
2/ When running e.g. the Dhrystone test the time is measured. But how does the
compiler know what clock frequency the core is running at? As far as I can
understand, the “-phi” switch informs the compiler about an address map where
to find a timer. I suspect it is related to the crt_io.c file? If so, when is this used?
3/ The phi memory map specifies timers and interrupt registers, which fit
well with the straightforward code of the IO module. But are the “phi”
registers enough for running a simple RTOS? I do not really need perfect timing,
but would like to run multi-threaded to simplify my (multi-channel) programs.
4/ I’m looking for a cache solution in front of the RAM to push the top speed
towards 1 GHz :-) Any pointers to work being done in this area? Also any work
on stalling / halting the processor? Maybe someone can clarify the
mem_busy input that seems to have somewhat that intent…
Any comments are greatly appreciated!
/ Björn “MrBear” Berglöf
PS. I have synthesized the Zealot core into 28nm. It’s really
small! 0.006 mm2 excluding the RAM. About 500 MHz, but it’s
my memory with ECC logic that limits the speed.
Øyvind wrote:
> So make that 100000 ZPU's where you can fit a single Intel CPU? :-)
> 600mm2/0.006mm2 = 100000
Øyvind,
unfortunately the RAM is not small: 64kB takes 0.19mm2, so a 100 000
core chip would have a 144mm side :-) On the serious side - there are
architectures for putting hundreds of processors on the same chip. The processor
linked below starts 300 000 000 new programs (yes programs, not ops) each second
(and yes, I'm one of the designers).
http://www.marvell.com/network-processors/technology/data-flow-architecture/
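As a quick check of the area estimate above, here is a minimal sketch. It assumes a square die holding nothing but the cores and one 64kB RAM each (no routing, pads or interconnect). The function names are made up for illustration, and the square root is a hand-rolled Newton iteration only to avoid depending on libm.

```c
#include <assert.h>

/* Newton's method square root, used here so the sketch needs no libm. */
static double sqrt_newton(double x)
{
    double r;
    int i;

    assert(x >= 0.0);
    if (x == 0.0)
        return 0.0;
    r = (x > 1.0) ? x : 1.0;           /* starting guess at or above the root */
    for (i = 0; i < 100; i++)
        r = 0.5 * (r + x / r);         /* converges long before 100 rounds */
    return r;
}

/* Side length in mm of a square die holding `cores` core + RAM pairs. */
static double die_side_mm(unsigned cores, double core_mm2, double ram_mm2)
{
    return sqrt_newton((double)cores * (core_mm2 + ram_mm2));
}
```

With the figures from this thread, die_side_mm(100000, 0.006, 0.19) comes out at roughly 140 mm, in the same ballpark as the side length quoted above.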
----------------------------------------------------------------------------------
Rick Collins wrote:
> Why does the compiler care what clock speed the
> CPU is running at? It would be up to you to do
> the math of converting your timer values to actual time values.
Rick,
I agree. When I write my own code, I simply read the timer and do the trivial
math. But I noticed that several users seem to run the Dhrystone test (which
includes various time.h stuff) without any reference to compensating for clock speed.
So how it is supposed to work is still a mystery to me.
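For reference, the "trivial math" could look like the sketch below. It assumes a free-running 32-bit cycle counter and a clock frequency known to the programmer. CPU_CLOCK_HZ, elapsed_ticks and ticks_to_us are hypothetical names for illustration; they are not taken from crt_io.c or any ZPU toolchain file.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed core clock; must be set by hand to match the real hardware. */
#define CPU_CLOCK_HZ 500000000u

/* Ticks between two reads of a 32-bit counter that wrapped at most once. */
static uint32_t elapsed_ticks(uint32_t start, uint32_t end)
{
    return end - start;   /* unsigned subtraction absorbs a single wrap */
}

/* Convert a tick count to microseconds, rounding down. */
static uint64_t ticks_to_us(uint64_t ticks, uint32_t clock_hz)
{
    uint64_t whole_seconds;
    uint64_t remainder;

    assert(clock_hz > 0);
    /* Split the count so ticks * 1000000 cannot overflow 64 bits. */
    whole_seconds = ticks / clock_hz;
    remainder     = ticks % clock_hz;
    return whole_seconds * 1000000u + (remainder * 1000000u) / clock_hz;
}
```

A benchmark would read the counter before and after the run and call ticks_to_us(elapsed_ticks(t0, t1), CPU_CLOCK_HZ). The point is that the frequency enters as a constant supplied by the programmer, not something the compiler can discover.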
----------------------------------------------------------------------------------
Rick Collins wrote:
> If you are implementing this design in an ASIC you may just achieve 1 GHz if you pipeline it
> more. Is there a reason why you don't implement the RAM on the same chip which could speed it up
> tremendously? I am not familiar with the Zealot core, but the original ZPU cores were designed
> for small size and flexibility. They used a lot of clock cycles to accomplish what many designs
> do in fewer cycles. So don't equate a high clock speed with a fast CPU. I expect the machine
> architecture can be optimized to reduce the number of clock cycles used at the expense of
> clock rate and possibly more logic.
Rick,
I am implementing the RAM within the same chip, and I am not really trying to push the frequency
for performance reasons. It is just very convenient if the microcontroller runs at the same
speed as the rest of the chip, to avoid any asynchronous clock-domain crossings.
I do understand pipelining of the processor, but I believe it easily runs at 1 GHz. My problem
is that the RAM access time coupled with the ECC logic (necessary at 28nm) needs to be pipelined.
Keeping the top of the stack + a simple instruction cache in a write-through cache is one way
to handle this. But I need to halt the ZPU whenever I get a cache miss. Hence the mem_busy
question.
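To make the idea concrete, here is a small behavioural model in C of a direct-mapped, write-through cache that counts the cycles the core would have to be held on a miss. It is only a sketch under stated assumptions: the sizes, the miss penalty, writes that never stall, and all names (cache_model, cache_read, cache_write) are made up, and it says nothing about how mem_busy actually behaves in any ZPU core.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define CACHE_LINES  16u    /* assumed: 16 one-word lines */
#define MISS_PENALTY 3u     /* assumed RAM + ECC latency in cycles */
#define RAM_WORDS    1024u  /* assumed backing RAM size in words */

typedef struct {
    bool     valid[CACHE_LINES];
    uint32_t tag[CACHE_LINES];
    uint32_t data[CACHE_LINES];
    uint32_t ram[RAM_WORDS];
    uint64_t stall_cycles;  /* cycles a busy signal would hold the core */
} cache_model;

static void cache_init(cache_model *c)
{
    memset(c, 0, sizeof *c);
}

/* Read one word; a miss adds MISS_PENALTY stall cycles and fills the line. */
static uint32_t cache_read(cache_model *c, uint32_t word_addr)
{
    uint32_t idx, tag;

    assert(word_addr < RAM_WORDS);
    idx = word_addr % CACHE_LINES;
    tag = word_addr / CACHE_LINES;
    if (!(c->valid[idx] && c->tag[idx] == tag)) {
        c->stall_cycles += MISS_PENALTY;
        c->data[idx]  = c->ram[word_addr];
        c->tag[idx]   = tag;
        c->valid[idx] = true;
    }
    return c->data[idx];
}

/* Write-through: RAM is always updated; a cached copy is kept coherent. */
static void cache_write(cache_model *c, uint32_t word_addr, uint32_t value)
{
    uint32_t idx, tag;

    assert(word_addr < RAM_WORDS);
    idx = word_addr % CACHE_LINES;
    tag = word_addr / CACHE_LINES;
    c->ram[word_addr] = value;
    if (c->valid[idx] && c->tag[idx] == tag)
        c->data[idx] = value;
}
```

Running a recorded address trace through cache_read and cache_write gives a rough idea of how often the core would stall for a given cache size, before committing to an RTL implementation.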
----------------------------------------------------------------------------------
Rick Collins wrote:
> But if the clock rate is limited by the memory speed rather than the logic delays you might
> do better with a more complex architecture. At 28 nm you certainly can afford to use a
> few more transistors.
Rick,
I completely agree! I am really not looking for small size or high performance. And I'm not
really in love with the ZPU either. But it fulfills 3 Musts and 1 NiceToHave:
- Must be a real LGPL, no code-compatible clone, for legal reasons.
- Must have an open (or commercial) tool chain for C.
- Must have, or be able to add, GDB or similar debug.
- 32b data path to speed up data copying (a substantial part of the job).
This said, I am VERY open to suggestions of other cores that fulfill the above.
----------------------------------------------------------------------------------
Thanks for the comments! And again, any input on my original
questions (below) is greatly appreciated.
/ Björn “MrBear” Berglöf
----------------------------------------------------------------------------------
1/ Most of the ZPU cores can be modified with generics. But how does the
compiler know if e.g. a multiplier is present? I’ve looked through the
zillion switches of zpu-elf-gcc without enlightenment. Can it be some
linking step I do not understand?
2/ When running e.g. the Dhrystone test the time is measured. But how does the
compiler know what clock frequency the core is running at? As far as I can
understand, the “-phi” switch informs the compiler about an address map where
to find a timer. I suspect it is related to the crt_io.c file? If so, when is this used?
3/ The phi memory map specifies timers and interrupt registers, which fit
well with the straightforward code of the IO module. But are the “phi”
registers enough for running a simple RTOS? I do not really need perfect timing,
but would like to run multi-threaded to simplify my (multi-channel) programs.
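On question 3: when perfect timing is not required, one alternative that needs no interrupt or timer support at all is cooperative round-robin scheduling, where each channel is a function that does one small step and returns. The sketch below is only an illustration of that pattern. All names (scheduler, sched_add, sched_run_once, channel_step) are hypothetical, and it does not answer whether the phi registers are sufficient for a preemptive RTOS.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TASKS 8

typedef void (*task_fn)(void *arg);

typedef struct {
    task_fn fn;
    void   *arg;
} task;

typedef struct {
    task   tasks[MAX_TASKS];
    size_t count;
    size_t next;    /* index of the task to run on the next call */
} scheduler;

static void sched_init(scheduler *s)
{
    s->count = 0;
    s->next  = 0;
}

/* Register a task; returns 0 on success, -1 if the table is full. */
static int sched_add(scheduler *s, task_fn fn, void *arg)
{
    assert(fn != NULL);
    if (s->count == MAX_TASKS)
        return -1;
    s->tasks[s->count].fn  = fn;
    s->tasks[s->count].arg = arg;
    s->count++;
    return 0;
}

/* Run one step of the next task in round-robin order. */
static void sched_run_once(scheduler *s)
{
    task *t;

    if (s->count == 0)
        return;
    t = &s->tasks[s->next];
    s->next = (s->next + 1) % s->count;
    t->fn(t->arg);  /* the task must return promptly: nothing preempts it */
}

/* Example per-channel task: each call handles one unit of work. */
static void channel_step(void *arg)
{
    int *processed = arg;
    (*processed)++;
}
```

A main loop would simply call sched_run_once(&s) forever. The trade-off is that a task which blocks or runs long delays every other channel, which is the case a preemptive RTOS with a timer interrupt is meant to solve.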
----- Original Message -----
From: Rick Collins <gnuar...@arius.com>
Sent: Tue 12 May 2015 18:54:00 CEST
To: zyli...@googlegroups.com
Subject: Re: [zylin-zpu] Sw/Hw inter-op ?
Ok, I understand better now. To double the clock speed you may need to perform an extensive redesign of the architecture. Like I said, I don't recall having looked at the Zealot core, but I did look at the original ZPU. It was designed for minimum size and made heavy reuse of the various components, which adds multiplexers that slow the clock. If you consider the micro-ops that need to be implemented and lay out the logic paths for optimum speed, it is likely you can achieve a much more streamlined architecture.
I seem to recall that the ZPU is the only soft core that meets your second requirement as well as the first. There are tons of original open cores, but last time I looked no one had a C compiler for them. There are also a few cores that duplicate existing commercial ISAs, but you indicate you don't want to deal with the potential issues. But... as I look it up, I see there are several larger cores that *are* GPL'd and have C compilers.
http://en.wikipedia.org/wiki/LEON
http://en.wikipedia.org/wiki/OpenSPARC
http://en.wikipedia.org/wiki/OpenRISC
http://en.wikipedia.org/wiki/LatticeMico32
http://en.wikipedia.org/wiki/AEMB
http://en.wikipedia.org/wiki/RISC-V
There is more than one "clean" implementation of the MicroBlaze and other CPUs which should be ok from a license viewpoint (including the PacoBlaze). The RISC-V is all about open source but may be a bigger bite to chew; I think it may be available only as 64 bit. The LatticeMico32 is fully open source. I don't know how large it is or how fast you might run it.
Thanks guys for the last couple of days of feedback, it's been very helpful!
My current plan is to go with MinSoC (based on the or1200)
http://opencores.org/project,minsoc and adapt it to my ASIC.
Pretty much a perfect fit. :-)
/ MrBear
From: zyli...@googlegroups.com [mailto:zyli...@googlegroups.com] On Behalf Of Hieronymus vanWontz
Sent: Friday, May 15, 2015 12:38
To: zyli...@googlegroups.com
Subject: Re: [zylin-zpu] Sw/Hw inter-op ?
Hi,