DMIPS of Rocket core


재민김

Oct 5, 2016, 4:35:44 AM
to sw-...@groups.riscv.org

Dear all.
In presentations from previous RISC-V workshops, I've seen that the Rocket core achieves about 1.72 DMIPS/MHz.
However, when I run the dhrystone benchmark included in the basic test suite on a Rocket core with the default configuration, the output shows that each loop iteration takes around 459 cycles. Plugging that cycle count into the equation for DMIPS/MHz gives around 1.23, which is about 30% lower than the official value.
Is this a problem with the dhrystone program included in the test suite, or is it due to changes in the rocket-chip repository?
Does anybody have any idea?
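
For reference, the arithmetic behind these figures can be sketched as follows. This assumes the usual Dhrystone convention that 1 DMIPS equals 1757 Dhrystone iterations per second (the VAX 11/780 score); the post does not spell out the equation it used, and the function name is illustrative only.

```python
# Dhrystones per second scored by the VAX 11/780, which defines 1 DMIPS.
VAX_DHRYSTONES_PER_SECOND = 1757


def dmips_per_mhz(cycles_per_iteration):
    """DMIPS/MHz for a core that needs `cycles_per_iteration` cycles per Dhrystone loop."""
    # At 1 MHz the core has 1,000,000 cycles per second to spend on iterations.
    iterations_per_second_at_1mhz = 1_000_000 / cycles_per_iteration
    return iterations_per_second_at_1mhz / VAX_DHRYSTONES_PER_SECOND


measured = dmips_per_mhz(459)      # cycle count reported in the post
shortfall = 1 - measured / 1.72    # relative to the quoted workshop figure
print(f"{measured:.2f} DMIPS/MHz, {shortfall:.0%} below 1.72")
```

With 459 cycles per iteration this comes out at roughly 1.24 DMIPS/MHz, a little under 30% below 1.72, which is consistent with the numbers in the post.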

Thanks.
Jamie Kim.

Christopher Celio

Oct 5, 2016, 3:56:54 PM
to 재민김, sw-...@groups.riscv.org
The dhrystone value reported by the rocket-chip emulator testing is now invalid, for a few reasons.

1) there is a special set of cflags that must be used for a valid dhrystone run. 

2) the riscv-tests are run bare metal. However, changes in the past few months include moving the physical, cacheable memory region to the upper half of the available memory region. This means that inefficient code sequences are emitted for some static memory accesses. 

3) rocket-chip is no longer directly tethered to the front-end server (fesvr) for host-target communications. Instead, for running the riscv-test simulations, the Debug Module is used to load binaries and communicate with fesvr. This requires the Debug Module to poll the core with constant interrupts to check if the core has anything to say. These constant interrupts are disruptive to any benchmarking attempts. 
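
To see why periodic polling distorts a cycle count, here is a rough back-of-the-envelope model. Everything in it is an assumption made for illustration: the function name, the idea that each poll costs a fixed number of cycles, and all of the numbers. None of it is measured from rocket-chip.

```python
def measured_cycles_per_iteration(true_cycles, poll_period, poll_cost):
    """Cycles per iteration seen by the benchmark when polling steals cycles.

    true_cycles: cycles one iteration needs with no polling (hypothetical input)
    poll_period: cycles between the starts of consecutive polls
    poll_cost:   cycles each poll takes away from the benchmark
    """
    if not 0 <= poll_cost < poll_period:
        raise ValueError("poll_cost must be non-negative and smaller than poll_period")
    # Only this fraction of all elapsed cycles does benchmark work.
    useful_fraction = 1 - poll_cost / poll_period
    return true_cycles / useful_fraction


# Hypothetical numbers, chosen only to show the shape of the effect.
inflated = measured_cycles_per_iteration(true_cycles=400, poll_period=1000, poll_cost=100)
print(f"{inflated:.1f} measured cycles per iteration")
```

In this toy model, losing 10% of the cycles to polling inflates the measured cycle count by about 11%, and the reported DMIPS/MHz drops by the same proportion.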


The current solution that I use for benchmarking is the following:

1) compile dhrystone yourself. By default this will compile dhrystone against a C library and put the binary in the lower address space. Make sure to use the proper cflags as specified by dhrystone. 

2) run dhrystone on top of pk or Linux to virtualize the addresses. 

3) run on an FPGA, where the Debug Module polling will be less invasive. 

There are other solutions I'm sure, but this is what I do. 


-Chris

Tim Newsome

Oct 6, 2016, 2:22:37 PM
to Christopher Celio, sw-...@groups.riscv.org
On Wed, Oct 5, 2016 at 12:56 PM, Christopher Celio <ce...@berkeley.edu> wrote:
> 3) rocket-chip is no longer directly tethered to the front-end server (fesvr) for host-target communications. Instead, for running the riscv-test simulations, the Debug Module is used to load binaries and communicate with fesvr. This requires the Debug Module to poll the core with constant interrupts to check if the core has anything to say. These constant interrupts are disruptive to any benchmarking attempts.

Why is the Debug Module so intrusive?
I assume it's being used for printf() or other I/O. I would have thought you'd set a breakpoint at the IO function, which is only intrusive when IO does happen. Of course if the test is short compared to the IO speed it will still impact benchmark numbers.

Tim

Andrew Waterman

Oct 6, 2016, 3:07:39 PM
to Tim Newsome, Christopher Celio, sw-...@groups.riscv.org
For the tethered systems, it's being used in a much more naive fashion
(using the core to poll a memory location). Using a breakpoint would
get rid of the problem.


Christopher Celio

Oct 6, 2016, 6:56:17 PM
to Andrew Waterman, Tim Newsome, sw-...@groups.riscv.org
> For the tethered systems, it's being used in a much more naive fashion
> (using the core to poll a memory location).

0:-)

> Why is the Debug Module so intrusive?
> I assume it's being used for printf() or other I/O. I would have thought
> you'd set a breakpoint at the IO function, which is only intrusive when IO
> does happen.

Sorry Tim, I didn't mean to imply the Debug Module was bad! Thanks guys for helping to clarify. 


-Chris

steven

May 16, 2017, 9:23:06 AM
to RISC-V SW Dev, joi...@gmail.com
Hi Celio and all,

1) Why does moving the physical, cacheable memory region to the upper half of the available memory region cause inefficient code sequences?
2) How do I run dhrystone on top of pk? Like this?
    ./emulator-roketchip-DefaultConfig pk $RISCV/riscv64-unknown-elf/share/riscv-tests/benchmarks/dhrystone.riscv
    
Thanks,
Steven



Andrew Waterman

May 16, 2017, 12:03:06 PM
to steven, RISC-V SW Dev, 재민김
On Tue, May 16, 2017 at 8:23 AM, steven <etche...@gmail.com> wrote:
> Hi Celio and all,
>
> 1) why moving the physical, cacheable memory region to the higher half
> available memory region will cause inefficient code sequence ?

In RV64, linking at 0x80000000 means you can't use LUI for
global-variable addressing, thereby preventing use of the global
pointer. We are working on a fix, but it's probably several weeks
out.
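
A small emulation of what LUI produces on RV64 may make the problem concrete. This is a sketch based on the ISA definition (LUI builds a 32-bit value from its 20-bit immediate and sign-extends it to 64 bits); the helper names are made up for illustration.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def lui_rv64(imm20):
    """Value written by `lui rd, imm20` on RV64, shown as an unsigned 64-bit integer."""
    result32 = (imm20 & 0xFFFFF) << 12          # immediate goes into bits 31:12
    return sign_extend(result32, 32) & 0xFFFFFFFFFFFFFFFF


print(hex(lui_rv64(0x7FFFF)))  # highest value LUI can form below 2 GiB
print(hex(lui_rv64(0x80000)))  # the upper bits a symbol at 0x80000000 would need
```

The second call yields 0xffffffff80000000 rather than 0x80000000, so a LUI-based sequence cannot form an address at or just above 0x80000000 on RV64.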

steven

May 17, 2017, 11:52:43 AM
to RISC-V SW Dev, etche...@gmail.com, joi...@gmail.com
Andrew, 

Thanks for your reply. Could you be more specific?
Why does LUI cause the problem?



Palmer Dabbelt

May 17, 2017, 1:10:55 PM
to etche...@gmail.com, sw-...@groups.riscv.org, etche...@gmail.com, joi...@gmail.com
There are two code models supported by RISC-V's GCC port: "-mcmodel=medlow" and
"-mcmodel=medany". The default is "-mcmodel=medlow". When you compile some
code that looks like

    int glob;
    int func(void) { return glob; }

you either get

    # -mcmodel=medlow
    func:
        lui a0, %hi(glob)          # upper 20 bits of glob's absolute address
        lw a0, %lo(glob)(a0)       # lw, since glob is a 32-bit int
        ret

or

    # -mcmodel=medany -mexplicit-relocs
    func:
    1:  auipc a0, %pcrel_hi(glob)  # upper bits of glob's offset from this pc
        lw a0, %pcrel_lo(1b)(a0)   # %pcrel_lo refers to the label on the auipc
        ret

[As an aside, without explicit relocs (which is the default in medany) you'll
actually get a third instruction:

    func:
    1:  auipc a0, %pcrel_hi(glob)
        addi a0, a0, %pcrel_lo(1b)
        lw a0, 0(a0)

so you should turn on explicit relocations when building Dhrystone. This is
something we'll eventually fix, but it's a way off.]

Thus, on RV64 the medlow code (lui-based addressing) can only reach addresses
between -2GB and +2GB, because lui's result is sign-extended to 64 bits. That
range doesn't contain our default RAM location at 0x80000000.
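
The reachable range can be checked with a short sketch of the %hi/%lo split. The helper names are hypothetical, addresses are treated as signed 64-bit values, and the example addresses are arbitrary apart from 0x80000000.

```python
def split_hi_lo(addr):
    """Split addr the way %hi/%lo do: addr == (hi << 12) + lo with lo in [-2048, 2047]."""
    lo = ((addr & 0xFFF) ^ 0x800) - 0x800   # sign-extend the low 12 bits
    hi = (addr - lo) >> 12                  # what remains goes into lui's immediate
    return hi, lo


def medlow_reachable(addr):
    """True if lui plus a signed 12-bit offset can form `addr` on RV64."""
    hi, _ = split_hi_lo(addr)
    return -(1 << 19) <= hi < (1 << 19)     # hi must fit lui's signed 20-bit field


for addr in (0x50000000, 0x7FFFF7FF, 0x7FFFF800, 0x80000000):
    print(hex(addr), medlow_reachable(addr))
```

Under these assumptions the last address lui-based code can reach is 0x7FFFF7FF, so anything linked at 0x80000000 or above falls outside the window.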

If you look at the generated code for Dhrystone, you'll find that it spends a
lot of cycles touching global variables. The vast majority of these accesses
are within a 12-bit displacement from __global_pointer$, which means the
lui-based sequences can be relaxed to something like

    func:
        lw a0, %lo(glob)(gp)       # single gp-relative load, no lui needed
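
The "within a 12-bit displacement" condition is just a range check on the distance from gp. In the sketch below, the addresses are hypothetical, and placing gp 0x800 bytes past the start of the small-data area is an assumption about the linker script rather than something stated in this thread.

```python
def gp_relaxable(symbol_addr, gp):
    """True if symbol_addr fits a signed 12-bit displacement from gp."""
    return -2048 <= symbol_addr - gp <= 2047


SDATA_START = 0x10000        # hypothetical start of the small-data area
GP = SDATA_START + 0x800     # assumed placement of __global_pointer$

print(gp_relaxable(SDATA_START, GP))            # first byte of the window
print(gp_relaxable(SDATA_START + 0xFFF, GP))    # last byte of the window
print(gp_relaxable(SDATA_START + 0x1000, GP))   # one byte past the window
```

With gp placed this way, the window covers a full 4 KiB starting at the beginning of the small-data area, which is why most of Dhrystone's globals qualify.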

We don't currently have a linker relaxation that optimizes the corresponding
auipc-based sequences, which means you get worse performance in medany mode.
There is an outstanding binutils patch that relaxes auipc-based sequences, but
it still fails one of the GCC regression tests, so we're not ready to merge it yet:

https://github.com/riscv/riscv-binutils-gdb/pull/68
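
To summarize the sequences in this message, here is a small tally of instructions per global-variable load. It only encodes what is described above (the toolchain as of this message, before the auipc relaxation patch is merged); the function name and the access count in the example are hypothetical.

```python
def instructions_per_global_load(cmodel, explicit_relocs=False, gp_relaxed=False):
    """Instructions needed to load one global, per the sequences in this message."""
    if cmodel == "medlow":
        # lui + load, or a single gp-relative load after linker relaxation.
        return 1 if gp_relaxed else 2
    if cmodel == "medany":
        # auipc + load with explicit relocs, auipc + addi + load without.
        # No gp relaxation exists for these sequences yet.
        return 2 if explicit_relocs else 3
    raise ValueError(f"unknown code model: {cmodel}")


accesses = 1000  # hypothetical number of global loads in a hot loop
best = instructions_per_global_load("medlow", gp_relaxed=True)
for label, count in [
    ("medany, default", instructions_per_global_load("medany")),
    ("medany, -mexplicit-relocs", instructions_per_global_load("medany", explicit_relocs=True)),
]:
    print(f"{label}: {(count - best) * accesses} extra instructions")
```

The gap between one instruction and two or three per access is the cost that shows up in the Dhrystone score when the binary cannot be built as medlow.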


steven

May 18, 2017, 10:46:07 AM
to RISC-V SW Dev, etche...@gmail.com, joi...@gmail.com, pal...@sifive.com
Thanks, Palmer, for the detailed information. (I will need some time to absorb it.)

I recompiled Dhrystone with both -mcmodel=medany and -mexplicit-relocs.
The DMIPS/MHz improved a bit, from 1.18 to 1.26.
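
To put that change in terms of cycles, the two scores can be converted back to cycles per Dhrystone iteration. This assumes the usual 1757 Dhrystones-per-second definition of 1 DMIPS, which the thread does not state explicitly; the function name is illustrative.

```python
VAX_DHRYSTONES_PER_SECOND = 1757  # defines 1 DMIPS


def cycles_per_iteration(dmips_per_mhz):
    """Cycles per Dhrystone loop implied by a DMIPS/MHz score."""
    return 1_000_000 / (VAX_DHRYSTONES_PER_SECOND * dmips_per_mhz)


before, after = 1.18, 1.26
cycles_before = cycles_per_iteration(before)
cycles_after = cycles_per_iteration(after)
speedup = after / before - 1
print(f"{cycles_before:.0f} -> {cycles_after:.0f} cycles per iteration, {speedup:.1%} faster")
```

Under that assumption the improvement is a bit under 7%, or roughly 30 cycles saved per iteration.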


Regarding your comment that "RV64 the medlow code (lui-based addressing) can only load addresses between 2GB and -2GB, which doesn't contain our default RAM location":
if I relink Dhrystone at somewhere like 0x50000000 and move the default RAM location there, would you expect to see better performance (> 1.26)?
 
Thanks,
Steven  
