GCC optimization

ubspsc

Sep 9, 2018, 9:35:36 AM

to RISC-V SW Dev

Hello,

We are working on a very simple RISC-V core based on Berkeley Rocket. The core has no branch predictor, so every taken jump wastes many cycles.

Initially, we based our changes on a very old (2016) version of Berkeley Rocket. After moving to a recent version, we found that IPC had degraded by 10% with our changes applied. After debugging, we found that the main source of the degradation is that more jumps are taken during execution, which hurts us a lot, especially without BTB functionality.

Below is the assembly code produced by compiling the median benchmark with the old and the recent toolchains. I am not familiar with GCC optimization, but I think that in the recent version one of the jumps at 800010da and 800010de is taken in every loop iteration. The old toolchain, on the other hand, arranged the code so that the jump at 80001108 is taken in only 50% of the loop iterations, which reduces the number of taken branches.

Can anyone help me figure out why we see this behaviour with the recent toolchain? Is there a way to instruct GCC to reduce the number of jumps?

Thank you,
Armia

*RECENT:***********************************************************************
0000000080001088 <median>:
    80001088:    00251793              slli    a5,a0,0x2
    8000108c:    97b2                    add    a5,a5,a2
    8000108e:    00062023              sw    zero,0(a2)
    80001092:    fe07ae23              sw    zero,-4(a5)
    80001096:    4789                    li    a5,2
    80001098:    02a7da63              ble    a0,a5,800010cc <median+0x44>
    8000109c:    3575                    addiw    a0,a0,-3
    8000109e:    1502                    slli    a0,a0,0x20
    800010a0:    8179                    srli    a0,a0,0x1e
    800010a2:    00458793              addi    a5,a1,4
    800010a6:    0611                    addi    a2,a2,4
    800010a8:    953e                    add    a0,a0,a5
    800010aa:    0001                    nop
    800010ac:    4198                    lw    a4,0(a1)
    800010ae:    41dc                    lw    a5,4(a1)
    800010b0:    4594                    lw    a3,8(a1)
    800010b2:    00f75f63              ble    a5,a4,800010d0 <median+0x48>
    800010b6:    02d7c363              blt    a5,a3,800010dc <median+0x54>
    800010ba:    00e6df63              ble    a4,a3,800010d8 <median+0x50>
    800010be:    0001                    nop
    800010c0:    c218                    sw    a4,0(a2)
    800010c2:    0001                    nop
    800010c4:    0591                    addi    a1,a1,4
    800010c6:    0611                    addi    a2,a2,4
    800010c8:    feb512e3              bne    a0,a1,800010ac <median+0x24>
    800010cc:    8082                    ret
    800010ce:    0001                    nop
    800010d0:    fed748e3              blt    a4,a3,800010c0 <median+0x38>
    800010d4:    00f6c463              blt    a3,a5,800010dc <median+0x54>
    800010d8:    c214                    sw    a3,0(a2)
    800010da:    b7ed                    j    800010c4 <median+0x3c>
    800010dc:    c21c                    sw    a5,0(a2)
    800010de:    b7dd                    j    800010c4 <median+0x3c>
    800010e0:    0000                    unimp
    800010e2:    0000                    unimp
*******************************************************************************

*OLD:**************************************************************************
0000000080001088 <median>:
    80001088:    00251793              slli    a5,a0,0x2
    8000108c:    00f607b3              add    a5,a2,a5
    80001090:    00062023              sw    zero,0(a2)
    80001094:    fe07ae23              sw    zero,-4(a5)
    80001098:    00200793              li    a5,2
    8000109c:    06a7d263              ble    a0,a5,80001100 <median+0x78>
    800010a0:    ffd5071b              addiw    a4,a0,-3
    800010a4:    02071713              slli    a4,a4,0x20
    800010a8:    02075713              srli    a4,a4,0x20
    800010ac:    00270713              addi    a4,a4,2
    800010b0:    00271713              slli    a4,a4,0x2
    800010b4:    00460793              addi    a5,a2,4
    800010b8:    00e60633              add    a2,a2,a4
    800010bc:    01c0006f              j    800010d8 <median+0x50>
    800010c0:    02a74863              blt    a4,a0,800010f0 <median+0x68>
    800010c4:    04d54063              blt    a0,a3,80001104 <median+0x7c>
    800010c8:    00a7a023              sw    a0,0(a5)
    800010cc:    00478793              addi    a5,a5,4
    800010d0:    00458593              addi    a1,a1,4
    800010d4:    02f60663              beq    a2,a5,80001100 <median+0x78>
    800010d8:    0005a683              lw    a3,0(a1)
    800010dc:    0045a703              lw    a4,4(a1)
    800010e0:    0085a503              lw    a0,8(a1)
    800010e4:    fce6cee3              blt    a3,a4,800010c0 <median+0x38>
    800010e8:    00a6ce63              blt    a3,a0,80001104 <median+0x7c>
    800010ec:    fce55ee3              ble    a4,a0,800010c8 <median+0x40>
    800010f0:    00e7a023              sw    a4,0(a5)
    800010f4:    00478793              addi    a5,a5,4
    800010f8:    00458593              addi    a1,a1,4
    800010fc:    fcf61ee3              bne    a2,a5,800010d8 <median+0x50>
    80001100:    00008067              ret
    80001104:    00d7a023              sw    a3,0(a5)
    80001108:    fc5ff06f              j    800010cc <median+0x44>
*******************************************************************************
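For reference, the loop in both listings matches a three-point median filter: three adjacent loads, a nest of comparisons, and one store per iteration. The following is a minimal C sketch reconstructed from the assembly above, not copied from riscv-tests; the names `median3` and `median_filter` are made up here.

```c
#include <assert.h>

/* Median of three values, written as nested comparisons. Each of the
   three possible results gets its own store in the compiled loop, which
   is where the extra branch targets in the listings come from. */
static int median3(int a, int b, int c)
{
    if (a < b) {
        if (b < c) return b;   /* a < b < c   */
        if (c < a) return a;   /* c < a < b   */
        return c;              /* a <= c <= b */
    }
    if (a < c) return a;       /* b <= a < c  */
    if (c < b) return b;       /* c < b <= a  */
    return c;                  /* b <= c <= a */
}

/* Three-point median filter: both end elements are zeroed and every
   interior element becomes the median of itself and its two neighbours. */
void median_filter(int n, const int *input, int *results)
{
    assert(n >= 1);
    results[0] = 0;
    results[n - 1] = 0;
    for (int i = 1; i < n - 1; i++)
        results[i] = median3(input[i - 1], input[i], input[i + 1]);
}
```

Because the three outcomes depend on the data, the compiler has to guess which store is the common case when it lays out the loop, and the two listings reflect two different guesses.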

Jim Wilson

Sep 9, 2018, 12:18:08 PM

to ubspsc, RISC-V SW Dev

On Sun, Sep 9, 2018 at 2:35 PM, ubspsc <eng....@gmail.com> wrote:
> Initially, we used a very old version (2016) for Berkley ROCKET to do our
> changes, however after moving to a recent version, we found that the IPC
> performance has been degraded by 10% after using our changes.
>

> You may find below the assembly code resulted from compiling the median
> benchmark using the old and the recent tool chain. I am not familiar with
> GCC optimization, but I think that in the recent version, one of the jumps
> in 800010da and 800010de will happen in every loop iteration. On the other
> hand, the old tool chain was able to arrange the code such that the jump in
> 80001108 will happen only 50% of the loop iterations, which leads to
> reducing the number of taken branches.

You haven't specified the old compiler, and you haven't provided a
testcase. A google search for median benchmark source code doesn't
turn up anything relevant. Since I have access to neither the old
compiler nor the testcase, there isn't much I can do. GCC
optimization can vary greatly depending on the testcase, the gcc
version, and the compiler options being used, which also weren't
specified. There are many different optimization passes that could be
causing the problem, which I can't identify unless I have enough info
to reproduce.

One thing I can point out is that we have a -mbranch-cost=X option.
It defaults to 3. It gets set to 2 when compiling for size, or if the
branch is predictable. And it gets set to 1 if you use -mtune=size.
If this is not a predictable branch, then increasing the value of
branch-cost might help. If you are using -Os, then using -O2 instead
might help. If it is a predictable branch, then you would have to
modify gcc sources to try a different branch cost. See the definition
of BRANCH_COST in gcc/config/riscv/riscv.h. There is no guarantee
that changing BRANCH_COST will help. It is just the only obvious
thing I can mention based on the info you gave.
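The selection described above can be modelled in a few lines of C. This is only a sketch of the behaviour as stated in this message, not the macro from riscv.h; the function name `branch_cost` and the parameter `tuned_cost` (standing in for whatever -mbranch-cost or the tuning table supplies) are assumptions.

```c
#include <assert.h>
#include <stdbool.h>

/* Model of the branch-cost selection: the cost is fixed at 2 when
   optimizing for size or when the branch is considered predictable;
   only otherwise does the tuned value (default 3) apply. */
int branch_cost(bool speed_p, bool predictable_p, int tuned_cost)
{
    assert(tuned_cost > 0);
    return (!speed_p || predictable_p) ? 2 : tuned_cost;
}
```

With this model, `branch_cost(true, true, 8)` is still 2, which is why raising -mbranch-cost has no effect on a branch the compiler treats as predictable.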

Jim

Armia Mrassy

Sep 9, 2018, 1:57:07 PM

to ji...@sifive.com, sw-...@groups.riscv.org

Hello Jim,

Thank you for this quick response. I am sorry that I did not include all the details needed to reproduce this behaviour. Here is the old toolchain that I used: https://github.com/riscv/riscv-tools/commit/5e9763fe7306e1214fc6babf05b3f15728932738, and the new one is https://github.com/riscv/riscv-tools/commit/98682995dc4a1ab8777ff45ba673cf2658e54ae2/ . You can find the benchmark I am referring to under those versions in riscv-tests/benchmarks/. The build scripts are there as well. Both versions use the -O2 optimization flag. The only change I made was adding the "-falign-labels=4" flag to the Makefile of the recent version of the toolchain.

I tried the '-mbranch-cost' option; however, I do not see any change in the assembly output for the median function, although I do see changes outside it.

Best regards,

Armia

Jim Wilson

Sep 10, 2018, 4:56:37 PM

to Armia Mrassy, RISC-V SW Dev

On Sun, Sep 9, 2018 at 6:56 PM, Armia Mrassy <eng....@gmail.com> wrote:
> Thank you for this quick response. I am sorry that I did not include all the
> details needed to reproduce this behaviour. Here is the old tool-chain that
> I used:
> https://github.com/riscv/riscv-tools/commit/5e9763fe7306e1214fc6babf05b3f15728932738,
> and the new one is
> https://github.com/riscv/riscv-tools/commit/98682995dc4a1ab8777ff45ba673cf2658e54ae2/
> . You can find the benchmark I am referring under those version in
> riscv-tests/benchmarks/. The building scripts are there also. Both of the
> versions use O2 optimization flags. I only introduced "-falign-labels=4"
> flag to the Makefile of the recent version of the tool-chain.

I see you are using riscv-tools. This is poorly maintained, difficult
to build, and in the process of being obsoleted. This contains the
precise set of tools necessary for rocket-chip, and should not be used
for any other purpose. It is known to contain an old obsolete and
broken gcc version, gcc-7.2. So you aren't actually using an
up-to-date gcc. That said, gcc-7.2 handles this the same as current
gcc-8.

I tracked the problem down to the basic-block reorder pass (bbro)
which is making different decisions in gcc-6 and gcc-7. Not
immediately obvious why. This will take time to track down. I
suspect a target independent optimizer fix, since I don't think that
target dependent costs are used in this pass.

Jim

Michael Clark

Sep 11, 2018, 3:26:22 AM

to Jim Wilson, Armia Mrassy, RISC-V SW Dev

On Tue, 11 Sep 2018 at 8:56 AM, Jim Wilson <ji...@sifive.com> wrote:

> On Sun, Sep 9, 2018 at 6:56 PM, Armia Mrassy <eng....@gmail.com> wrote:
> > Thank you for this quick response. I am sorry that I did not include all the
> > details needed to reproduce this behaviour. Here is the old tool-chain that
> > I used:
> > https://github.com/riscv/riscv-tools/commit/5e9763fe7306e1214fc6babf05b3f15728932738,
> > and the new one is
> > https://github.com/riscv/riscv-tools/commit/98682995dc4a1ab8777ff45ba673cf2658e54ae2/
> > . You can find the benchmark I am referring under those version in
> > riscv-tests/benchmarks/. The building scripts are there also. Both of the
> > versions use O2 optimization flags. I only introduced "-falign-labels=4"
> > flag to the Makefile of the recent version of the tool-chain.
>
> I see you are using riscv-tools. This is poorly maintained, difficult
> to build, and in the process of being obsoleted. This contains the
> precise set of tools necessary for rocket-chip, and should not be used
> for any other purpose. It is known to contain an old obsolete and
> broken gcc version, gcc-7.2. So you aren't actually using an
> up-to-date gcc. But gcc-7.2 handles this the same as current gcc-8
> though.

It’s sad that riscv-tools is not well maintained. I have been using riscv-tools since gcc-6.3 and I have fond memories of riscv-tools. There have been many changes over time and I have even gone to the extent of using combinations of multiple predefined macros intersecting multiple compiler versions to keep code building with the vintage RISC-V compilers that I have been collecting on my machine, including a gcc-6.3 that doesn’t have nearly as many redundant sign extensions as more recent compilers.

I don’t think replacing riscv-tools is the answer. That just shifts the problem somewhere else. The problem goes deeper and we need better version management for integrated tools releases as there are so many interdependencies. This is a policy problem, not a technical problem.

The system of using good dates to search for submodule commits to find out the magic commit ids and then reverse these to find active branches is not that great. I periodically do this so that I can reconstruct the current set of stable backport branches in a list I maintain (which is not currently open source).

Of course we rely on upstream versions for riscv-fesvr, riscv-pk, riscv-isa-sim and riscv-openocd because they don’t have versioned releases... yet.

In my opinion, I think we should adopt a versioning scheme and have tested integrated releases made from stable backport branches in the development trees, and to use git tags more effectively so we have archives and release binaries such as .deb, .rpm and .zip on GitHub Releases for popular operating systems such as Debian, Ubuntu, Fedora, CentOS, Windows and macOS. Possibly git-mirror so we can guarantee reproducible builds.

We don’t want to have to check out old dependencies to restore a vintage compiler just to perform tests. Remember there are those risk averse (not RISC averse) folk who will stick to a particular compiler version unless there is some really compelling new feature like extra good compression on PC-relative code, i.e. a 2-pass relaxation solver in binutils that jiggles the code into a local minimum with only a few gaps.

It’s a shared problem for many in the RISC-V community who build riscv-tools. We also should add some of the more recent tools to riscv-qemu and build the multilib version. If we provide packages then the cost of multilib compiles is amortised vs the current submodule global warming situation we have.

riscv-tools needs better version and release management. This has been stated on the list several times. It’s not rocket science. In fact I responded to a request to help with CI for riscv-tools, and ended up working on riscv-QEMU at SiFive. We have so many deadlines and other tasks that CI falls by the wayside. Notwithstanding my own contribution such as working on QEMU TCG at the first RISC-V Hackathon instead of CI.

However, I’m a little suspect about basing it on Gentoo. I had a go at installing Gentoo today and it reminded me of installing Slackware Linux back in 1993. It’s totally manual. Nothing is automated. I had to manually partition my disk and write my own /etc/fstab. Things have changed. Now we have devops, the cloud and containers. Obsolete? What does obsolete mean?

I’d personally offer to support .deb and .rpm for Fedora, Ubuntu, CentOS, Debian, Windows 10 and macOS.

No Gentoo. Well maybe tbz2 for Gentoo. We’ll think about it. The rationale here is that Gentoo users are already used to compiling all of their tools from scratch so it’s situation normal. The other true fact I like to tease Gentoo users with is their contribution to Global Warming. It’s much more than the mainstream distributions that primarily use binary packages. It’s ironic. You spend ages recompiling your system with -O3 so you can do faster compiles. 😀

In any case I did volunteer but my Docker/Kubernetes container build, with native build options for Linux, macOS and Windows didn’t fly. It needs a concerted effort to automate all of the testing. We could run benchmarks and generate historical data to track code sizes across compiler versions like arm does.

These were my prototype charts:

https://rv8.io/bench

Note: code sizes currently hidden due to long double (f128) penalising RISC-V

Anyway I’m probably not allowed to spend time on riscv-tools. I likely have to focus on porting PLIC, CLIC and PMP to riscv-isa-sim and possibly adding some tests to riscv-tests. We’ll see...

I’d love it as a primary responsibility as I’d completely automate the hell out of it and gate all commits that failed a full tool suite integration test, uploading tested binaries for major OSes. If I was able to automate my job I could spend my spare time on simulation. Ideal.

I’m using gcc-7.3 circa 2018.1-1 using my own custom build scripts, but they need a little more work before they could act as a riscv-tools replacement. The verification and hardware folk have pretty stringent criteria on backwards compatibility.

> I tracked the problem down to the basic-block reorder pass (bbro)
> which is making different decisions in gcc-6 and gcc-7. Not
> immediately obvious why. This will take time to track down. I
> suspect a target independent optimizer fix, since I don't think that
> target dependent costs are used in this pass.
>
> Jim


Michael Clark

Sep 11, 2018, 3:27:53 AM

to Jim Wilson, Armia Mrassy, RISC-V SW Dev

I hit send by mistake 🤔

Tommy Murphy

Sep 11, 2018, 5:28:30 AM

to RISC-V SW Dev, ji...@sifive.com, eng....@gmail.com

On Tuesday, 11 September 2018 08:26:22 UTC+1, mjc wrote:

> I don’t think replacing riscv-tools is the answer.

Well isn't it too late now that much of the RISC-V support has been upstreamed to the relevant "master" projects (gcc, gdb, binutils, newlib, glibc etc.) and more will be upstreamed over time?

Isn't this a "good thing" that the master projects provide the support and not some poorly maintained fork?

There are other existing threads on the mailing list about this issue of upstream support for RISC-V that may be relevant here.

Michael Clark

Sep 11, 2018, 6:08:36 AM

to Tommy Murphy, RISC-V SW Dev, ji...@sifive.com, eng....@gmail.com

I don’t know why folk talk about poorly maintained forks. riscv-tools is a meta repo. Many of the riscv- projects have more bug fixes and advanced features than upstream. riscv-qemu and riscv-openocd case in point; as well as the projects that only exist in the RISC-V organisation such as riscv-opcodes, riscv-fesvr, riscv-isa-sim, riscv-tests.

I know for a fact that several of the chip repos depend on riscv-tools and they are usually wary of some of the tool changes that may break things. This is typically not anybody’s fault; rather, the integration test infrastructure doesn’t exist for the open source projects.

Development is carried out in the RISC-V repos. Which poorly maintained fork are you referring to? I would like to know.

riscv-qemu for example has more bug fixes and features than upstream and the same is the case for other folk who choose to host their maintainer branches in the RISC-V organisation.

Again; the problem is a CI problem and a matter of extensive integration tests because there are many cross dependencies. I have data on these but it is not open source... yet... I actually made a proposal to SiFive, however it is up to them whether they would donate time for me to work on this...

Again, curious which project you are referring to as a poorly maintained fork? Please elaborate.

The problem is integration testing of almost a dozen repos containing multiple simulators, FPGA tests (freedom or rocket-chip), compiler changes, two different C libraries, Linux kernel, bbl, the open source test suite, etc.

There are no poorly maintained forks, but it is a relatively complex problem to integrate and test all of these moving parts, given they get resynced with upstream changes for target independent code. The riscv repos are the source of truth.

Curious which repo you are talking about. I somewhat agree that riscv-tools has a lot of room for improvement, but if we were to change it we would need to be very careful to make the changes backwards compatible i.e. emulate submodules for local workspace builds or fetch binary dependencies.

I have written about this problem in more detail as a proposal to SiFive but I’m not at liberty to share much more other than things like mainstream OS support that are typically used. I probably shouldn’t have mentioned Gentoo. I really did accidentally press send (I have lots of drafts). I’ve probably said too much, however it was a response to a request for help with CI that led me to working on QEMU for SiFive. I enjoy the work, however I’m sad about riscv-tools as I’ve been a long time user.

Michael

Tommy Murphy

Sep 11, 2018, 6:39:03 AM

to RISC-V SW Dev

You said yourself that riscv-tools was not well maintained...

Jim Wilson

Sep 11, 2018, 7:30:48 AM

to Michael Clark, Tommy Murphy, RISC-V SW Dev, ubspsc

On Tue, Sep 11, 2018 at 11:08 AM, Michael Clark <michae...@mac.com> wrote:
> I don’t know why folk talk about poorly maintained forks. riscv-tools is a
> meta repo. Many of the riscv- projects have more bug fixes and advanced
> features than upstream. riscv-qemu and riscv-openocd case in point; as well
> as the projects that only exist in the RISC-V organisation such as
> riscv-opcodes, riscv-fesvr, riscv-isa-sim, riscv-tests.

My specific complaints:

1) The only people that add changes to riscv-tools are doing
rocket-chip work, and they only add changes required to keep
rocket-chip working. Since rocket-chip includes it directly, we can't
add anything that might break rocket-chip. Either the direct tie in
to rocket-chip needs to be removed, or it needs to be renamed to
rocket-chip-tools.
2) The README.md file is horribly out-of-date, and confuses people
that read it as it talks about building linux distros from scratch in
ways that may not work anymore.
3) The riscv.org page pointing at riscv-tools is even more horribly
out-of-date and even more horribly confusing w.r.t. building linux
distros.
4) riscv-tools includes an old and known broken riscv-gnu-toolchain.
I can't upgrade it because the direct tie in to rocket-chip means I
would have to build and test rocket-chip. I don't know how to do
that, and I'm not interested in learning, so I'm stuck not being able
to update the broken riscv-gnu-toolchain. Either someone on the other
side needs to update it, or we need to remove riscv-gnu-toolchain from
riscv-tools, or alternatively remove riscv-tools from rocket-chip as
mentioned above. Meanwhile, people are being confused into using an
old obsolete and known broken compiler.
5) riscv-tools used to include LLVM, but no longer does because they
stopped maintaining riscv-llvm and it was accidentally left in a
broken state. I think we are doing a disservice to the RISC-V
community when we promote a tool set that is missing LLVM (and has a
broken gcc).

As a person who is trying to do a lot of toolchain support work,
riscv-tools just causes nothing but trouble for me, as I have to spend
a lot of time dealing with problems that wouldn't have happened if
riscv-tools didn't exist, which is why I spend a lot of time telling
people not to use riscv-tools.

I don't have a problem with riscv-tools being used to collect together
various RISC-V specific packages. It is the direct tie in to
rocket-chip on one side and riscv-gnu-toolchain on the other side that
is causing the trouble. It should be renamed though. Calling it
riscv-tools confuses people into thinking that it is the right place
for RISC-V software tools, and it isn't. Broken GCC. No LLVM. If it
was called something like riscv-misc-tools it would be OK.

Jim

Jim Wilson

Sep 11, 2018, 7:34:47 AM

to Tommy Murphy, RISC-V SW Dev

On Tue, Sep 11, 2018 at 11:39 AM, Tommy Murphy <tommy_...@hotmail.com> wrote:
> You said yourself that riscv-tools was not well maintained...
>
>> It’s sad that riscv-tools is not well maintained

I assumed that was a typo. "It's said that riscv-tools is not well
maintained", which is presumably a reference to me, as I'm the one
that keeps complaining about it.

Jim

Jim Wilson

Sep 11, 2018, 7:41:14 AM

to Michael Clark, Armia Mrassy, RISC-V SW Dev

On Tue, Sep 11, 2018 at 8:26 AM, Michael Clark <m...@sifive.com> wrote:

> I don’t think replacing riscv-tools is the answer. That just shifts the problem somewhere else. The problem goes deeper and we need better version management for integrated tools releases as there are so many interdependencies. This is a policy problem, not a technical problem.

And another thing, your bad habit of hijacking threads drives me nuts. I'm trying to respond to a gcc optimization question here, and now this riscv-tools debate is getting in the way. I'd be happy to debate riscv-tools with you, but it should have been a different thread. You should always start a new thread for a new topic.

Jim

Michael Clark

Sep 11, 2018, 9:02:30 AM

to Tommy Murphy, RISC-V SW Dev

> On 11/09/2018, at 10:39 PM, Tommy Murphy <tommy_...@hotmail.com> wrote:
>
> You said yourself that riscv-tools was not well maintained...

I also said it was a meta repo. riscv-tools is a set of submodules and some shell scripts

>> It’s sad that riscv-tools is not well maintained

I don’t think the underlying problem is completely technical and we can’t blame individuals other than their organisation as a group of volunteers; as it’s an integration test problem, and more repos are affected by changes than just the riscv repos. Folk are depending on these tools. Sometimes there are accidental breakages. Sometimes there are valid reasons to deliberately break features.

The issue comes when all of the distinct tools are integrated together and changes in one module affect another, and how a CI handles inter-module dependencies.

Companies of course have their own well run internal CI systems, and there are Linux distros that pick up the open source tools, however, they are not necessarily the ideal source to consume for embedded tools. I know folk consume riscv-tools and if it was more complete, better tested and faster to install, its use would become even more widespread. Of course being careful with regard to backwards compatibility, e.g. ${RISCV}

On-boarding someone with riscv-tools could take minutes instead of hours and it could be deterministic, not hit or miss depending on the state of a repo at any given point in time.

The RISC-V foundation certainly doesn’t want to build a Linux distro, but there is a role somewhat similar to other groups of licensees that have formed around other architectures to support their tools. It of course needs members donating time or resources. I don’t have the authority to speak on anyone’s behalf so I won’t. It is just my personal opinion that it makes sense to have a cross-vendor set of tools for a cross-vendor ISA with multiple licensees.

We can’t do CI unless we better express dependencies and resolve some dependencies from last known good version (from a stable or beta branch/channel) i.e. move away from the “make world” model to something more compatible with CI. If we build spike or QEMU and want to run tests, we could perhaps build riscv-tests, but building the toolchain is not scalable, and the toolchain is required to build the tests. The dependent modules need to resolve a recent binary dependency and we need a test matrix of branches to build and test from the active development in the RISC-V repos.

Palmer had a very good email calling for volunteers with a description of this around September last year.

Anyway we’ll see what happens...

Jim Wilson

Sep 14, 2018, 8:48:37 PM

to ubspsc, RISC-V SW Dev

On Sun, Sep 9, 2018 at 10:57 AM Armia Mrassy <eng....@gmail.com> wrote:
> Thank you for this quick response. I am sorry that I did not include all the details needed to reproduce this behaviour. Here is the old tool-chain that I used: https://github.com/riscv/riscv-tools/commit/5e9763fe7306e1214fc6babf05b3f15728932738, and the new one is https://github.com/riscv/riscv-tools/commit/98682995dc4a1ab8777ff45ba673cf2658e54ae2/ . You can find the benchmark I am referring under those version in riscv-tests/benchmarks/. The building scripts are there also. Both of the versions use O2 optimization flags. I only introduced "-falign-labels=4" flag to the Makefile of the recent version of the tool-chain.
>
> I tried to use 'mbranch-cost' option, however, I do not see any change in the median function output assembly, I can notice changes outside it.

This looked like an interesting problem, so I spent a little more time
looking at it. I tracked the problem down to a bug fix in the
gcc/config/riscv/riscv.c file, in the riscv_load_store_insns function,
where the new code sets might_split_p to false for 32-bit
loads/stores, as we never have to split them into 2 instructions. The
old code does not. This causes the old code to overestimate the size
of load/store instructions, and decide that basic blocks are bigger
than they actually are. The basic block reorder pass uses heuristics
to decide when to copy a basic block depending on its size. So the
miscalculation of the instruction size in the old compiler is changing
optimization choices made by the bb-reorder pass, which is then
accidentally giving you the code you want. I don't see an obvious way
to get the code you want, other than putting the bug back in, but that
may hurt optimization for other testcases. There is no user option or
heuristic parameter you can set to change the behavior here. The new
code is 2 instructions shorter than the old code, but has 1 more
branch instruction. The bb-reorder pass is estimating branch
probabilities. You might get a different result if you use profile
guided optimization to provide actual branch probabilities instead of
using the estimates.
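To make the mechanism concrete, here is a toy C model of a size-limited duplication decision. It is not the bb-reorder code from GCC; the function names, the 16-byte budget and the 8-byte store estimate used in the note below are all invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Sum the per-instruction length estimates (in bytes) of one basic block. */
int block_size(const int *insn_lengths, size_t count)
{
    int total = 0;
    for (size_t i = 0; i < count; i++) {
        assert(insn_lengths[i] > 0);
        total += insn_lengths[i];
    }
    return total;
}

/* A block may be copied only if its estimated size fits the budget, so
   the answer depends on the estimates, not on the real encoding. */
bool may_duplicate(const int *insn_lengths, size_t count, int max_bytes)
{
    return block_size(insn_lengths, count) <= max_bytes;
}
```

A block of four 4-byte instructions fits a 16-byte budget, but if one of them is a store estimated at 8 bytes the same block is rejected. The model only shows that a wrong length estimate can flip such a decision; it says nothing about which final layout results.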

Jim

Armia Mrassy

Sep 21, 2018, 5:47:04 PM

to ji...@sifive.com, sw-...@groups.riscv.org

Hello Jim,

Thank you very much for that :) . I am sure that it took a long time to isolate the change that causes this behaviour.

Our system does not have branch prediction, which makes it behave badly in case of a taken jump. I tried the suggested '-mbranch-cost' option, but it did not change the performance. I also changed the definition of "BRANCH_COST" to:

#define BRANCH_COST(speed_p, predictable_p) \
((!(speed_p)) ? 2 : riscv_branch_cost)

to make it independent from predictable_p, but I did not notice change in the performance.

After some trials, I was able to customize GCC to give extra penalty for the jumps and branches by making the following changes to the defined "length" attribute in riscv.md

;; Length of instruction in bytes.
(define_attr "length" ""
   (cond [
    ;; Branches further than +/- 4 KiB require two instructions.
    (eq_attr "type" "branch")
    (if_then_else (and (le (minus (match_dup 0) (pc)) (const_int 4088))
                (le (minus (pc) (match_dup 0)) (const_int 4092)))
    (const_int 16)
    (const_int 32))

    ;; For the jump instructions.
    (eq_attr "type" "jump") (const_int 32)

These changes make GCC think that branch instructions are 4x their actual size. With this change, GCC is forced to minimize the number of branch and jump instructions. Note that there was no attribute for the "jump" instructions before. This gives a consistent speed improvement for the benchmarks I am using. Of course, it may slightly increase the code size ;)

What is your opinion about these changes? Is there a cleaner way to make GCC minimize branches and jumps? I think some other processor architectures provide a "delay" attribute on instructions to optimize for speed, but I could not find this attribute in the RISC-V port.

Best regards,

Armia

Jim Wilson

Sep 24, 2018, 8:55:26 PM

to ubspsc, RISC-V SW Dev

On Fri, Sep 21, 2018 at 2:47 PM Armia Mrassy <eng....@gmail.com> wrote:
> After some trials, I was able to customize GCC to give extra penalty for the jumps and branches by making the following changes to the defined "length" attribute in riscv.md
>

> Those changes makes GCC thinks that the branches instructions take 4x its actual instruction size. With this change, the GCC is forced to minimize the branch and jump instructions. Note that there was no attribute for the "jump" instructions. That gives a consistent speed improvment for the benchmarks I am using. Of course that may little increase the code size ;)
>
> What is your opinion about those changes? Is there another clean way to optimize GCC to minimize the branches and jumps? I think, some other processor architectures provide "delay" attribute to the instructions to optimize for speed, but I cauld not find this attribute in RISCV architecture.

Most optimization passes will look at the cost of an instruction, not
its size. There are a few optimization passes that can increase code
size, and have heuristics to try to limit the code size increase. The
basic block reorder pass is one of them. By increasing the size of
branches, you are hitting the code size increase limit earlier, and
hence preventing it from duplicating code in some cases. So you are
only indirectly preventing the optimization you don't want. But since
the main purpose of the bb-reorder pass is to try to eliminate
branches, by preventing it from duplicating code, you may be
preventing it from reducing the number of branches in other cases. I
think the particular testcase you are looking at,
riscv-tests/benchmark/median.c just happens to trigger worst case
behavior from this optimization pass, and accidentally increases the
number of branches, while decreasing the total number of instructions
in the loop. It isn't clear that this trick of increasing branch size
will also work more generally. You would need to test this on many
more examples to get a better idea of how well this works.

The delay attribute is for targets like the MIPS that have
architectural delayed branches, where the instruction after the branch
is always executed. Without delayed branch optimization, this gets
filled with a nop and is wasted. With delayed branch optimization, we
try to find an instruction from before the branch that can be moved
after the branch, or if there isn't one, then maybe an instruction
from the target path and fallthrough path that can be moved forward
into the branch delay slot. Delayed branches are no longer considered a
good idea, and RISC-V does not have them.

Instruction lengths can be used for compile-time relaxation. For
instance, deciding whether to emit a direct branch (4KB range) or an
instruction sequence that loads the target address into a register and
then uses a jump register instruction. The RISC-V port currently does
not do this, but lying to the compiler about instruction lengths would
reduce the effectiveness of this optimization. We would like to
modify the compiler to emit compressed instructions directly someday.
The smaller offsets in compressed instructions mean that we may need
accurate instruction length info to make good use of the compressed
instructions. Lying to the compiler about instruction lengths would
reduce the effectiveness of this, which in turn would reduce the
number of compressed instructions generated by the compiler.
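The range checks behind this kind of relaxation can be sketched in C. The function names are hypothetical; the ranges are the architectural ones (a 13-bit signed, even byte offset for conditional branches, and a 9-bit one for the compressed c.beqz/c.bnez forms).

```c
#include <assert.h>
#include <stdbool.h>

/* True if a byte offset fits a signed immediate of `bits` bits whose
   lowest bit is implicitly zero (targets are 2-byte aligned). */
static bool fits_signed_even(long offset, int bits)
{
    assert(bits > 1 && bits < 32);
    long limit = 1L << (bits - 1);
    return (offset % 2 == 0) && offset >= -limit && offset < limit;
}

/* Conditional branch (beq, bne, blt, ...): -4096 .. +4094 bytes. */
bool fits_branch(long offset)
{
    return fits_signed_even(offset, 13);
}

/* Compressed branch (c.beqz, c.bnez): -256 .. +254 bytes. */
bool fits_compressed_branch(long offset)
{
    return fits_signed_even(offset, 9);
}
```

If the compiler's length estimates are inflated 4x, a target that is really 1200 bytes away looks 4800 bytes away: `fits_branch(1200)` holds but `fits_branch(4800)` does not, so a longer sequence would be chosen needlessly.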

So I think your change is mainly working by accident for your
testcase, and is probably safe today, but may cause problems in the
future.

Jim

Armia Mrassy

Sep 25, 2018, 4:02:09 PM

to ji...@sifive.com, sw-...@groups.riscv.org

Thank you, Jim, for this detailed explanation.
