On Fri, Sep 21, 2018 at 2:47 PM Armia Mrassy <eng....@gmail.com> wrote:
> After some trials, I was able to customize GCC to add an extra penalty for jumps and branches by making the following changes to the "length" attribute defined in riscv.md.
>
> Those changes make GCC think that branch instructions take 4x their actual size. With this change, GCC is forced to minimize the number of branch and jump instructions. Note that there was no attribute for the "jump" instructions. That gives a consistent speed improvement for the benchmarks I am using. Of course, that may slightly increase the code size ;)
>
> What is your opinion of those changes? Is there another clean way to tune GCC to minimize branches and jumps? I think some other processor architectures provide a "delay" attribute on instructions to optimize for speed, but I could not find this attribute for the RISC-V architecture.

Most optimization passes will look at the cost of an instruction, not
its size. There are a few optimization passes that can increase code
size, and have heuristics to try to limit the code size increase. The
basic block reorder pass is one of them. By increasing the size of
branches, you are hitting the code size increase limit earlier, and
hence preventing it from duplicating code in some cases. So you are
only indirectly preventing the optimization you don't want. But since
the main purpose of the bb-reorder pass is to try to eliminate
branches, by preventing it from duplicating code, you may be
preventing it from reducing the number of branches in other cases. I
think the particular testcase you are looking at,
riscv-tests/benchmark/median.c, just happens to trigger worst-case
behavior from this optimization pass, and accidentally increases the
number of branches, while decreasing the total number of instructions
in the loop. It isn't clear that this trick of increasing branch size
will also work more generally. You would need to test this on many
more examples to get a better idea of how well this works.
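
To make the indirect effect concrete, here is a toy model of a pass that copies a block to remove a jump only when the copy fits within a code-growth budget. This is not GCC's bb-reorder code; the function name, the budget, and the instruction mix are all made up for illustration.

```python
# Toy model only: the names, the budget and the lengths are hypothetical
# and are not taken from GCC.

def can_duplicate(block_lengths, size_budget):
    """Allow copying a block only if its estimated size fits the budget."""
    return sum(block_lengths) <= size_budget

# A small block: three 4-byte instructions followed by one branch.
true_lengths = [4, 4, 4, 4]        # branch reported at its real 4 bytes
inflated_lengths = [4, 4, 4, 16]   # branch reported at 4x its real size

SIZE_BUDGET = 20  # hypothetical code-growth limit, in bytes

print(can_duplicate(true_lengths, SIZE_BUDGET))      # estimated size: 16 bytes
print(can_duplicate(inflated_lengths, SIZE_BUDGET))  # estimated size: 28 bytes
```

With the real length the copy fits the budget; with the inflated length the same block is rejected, even though nothing about its actual cost has changed.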

The delay attribute is for targets like MIPS that have
architectural delayed branches, where the instruction after the branch
is always executed. Without delayed branch optimization, this gets
filled with a nop and is wasted. With delayed branch optimization, we
try to find an instruction from before the branch that can be moved
after the branch, or if there isn't one, then maybe an instruction
from the target path and fallthrough path that can be moved forward
into the branch delay slot. Delayed branches are no longer considered a
good idea, and RISC-V does not have them.
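
As a rough illustration of the simplest case described above (taking an instruction from before the branch), here is a toy sketch. It is not GCC's delayed branch pass: the instruction tuples, the helper name, and the single safety check are simplifying assumptions, and a real pass must check much more.

```python
# Toy model of delay slot filling; hypothetical and heavily simplified.
# Each instruction is (text, dest_register, source_registers).

NOP = ("nop", None, ())

def fill_delay_slot(body, branch):
    """Return (body, branch, slot), filling the slot when it looks safe.

    Only the instruction directly before the branch is considered. It may
    move into the slot when the branch does not read the register it writes.
    """
    if body:
        candidate = body[-1]
        _, dest, _ = candidate
        _, _, branch_sources = branch
        if dest not in branch_sources:
            return body[:-1], branch, candidate
    return body, branch, NOP

branch = ("bne $t0, $t2, loop", None, ("$t0", "$t2"))

# The addu does not feed the branch, so it can move into the delay slot.
movable = [
    ("addiu $t0, $t0, 1", "$t0", ("$t0",)),
    ("addu $v0, $v0, $t1", "$v0", ("$v0", "$t1")),
]
rest, _, slot = fill_delay_slot(movable, branch)

# Here the only candidate writes $t0, which the branch reads: the slot is a nop.
blocked = [("addiu $t0, $t0, 1", "$t0", ("$t0",))]
_, _, wasted_slot = fill_delay_slot(blocked, branch)
```

In the first case the slot does useful work; in the second it is filled with a nop and wasted, which is the cost the optimization tries to avoid.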

Instruction lengths can be used for compile-time relaxation, for
instance, deciding whether to emit a direct branch (4KB range) or an
instruction sequence that loads the target address into a register and
then uses a jump register instruction. The RISC-V port currently does
not do this, but lying to the compiler about instruction lengths would
reduce the effectiveness of this optimization. We would like to
modify the compiler to emit compressed instructions directly someday.
The smaller offsets in compressed instructions mean that we may need
accurate instruction length info to make good use of the compressed
instructions. Lying to the compiler about instruction lengths would
reduce the effectiveness of this, which in turn would reduce the
number of compressed instructions generated by the compiler.
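
A small sketch of why accurate lengths matter for this kind of relaxation. It is a hypothetical model, not code from the RISC-V port: the function name and the instruction mix are invented, and the 4KB figure above is treated as a simple forward limit.

```python
# Toy model of compile-time branch relaxation; hypothetical, not GCC code.

BRANCH_RANGE = 4096  # forward reach in bytes, from the 4KB figure above

def needs_long_sequence(lengths_to_target, branch_range=BRANCH_RANGE):
    """True if the estimated distance to the target exceeds the direct range."""
    distance = sum(lengths_to_target)
    return distance >= branch_range

# 900 instructions between a branch and its target, 100 of them branches.
true_lengths = [4] * 800 + [4] * 100       # 3600 bytes in reality
inflated_lengths = [4] * 800 + [16] * 100  # 4800 bytes as claimed to the compiler

print(needs_long_sequence(true_lengths))      # real distance is within range
print(needs_long_sequence(inflated_lengths))  # claimed distance is out of range
```

With inflated lengths, a compiler doing this optimization would fall back to the longer register-based sequence for a branch that would actually have reached its target directly.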

So I think your change is mainly working by accident for your
testcase, and is probably safe today, but may cause problems in the
future.

Jim