I'm using the tricore-gcc v2.8.1 to generate code for my Infineon
TriCore DSP. Analyzing the generated code (compiled with -O0), I
noticed a strange behavior of the compiler. (My questions do not
concern a specific processor but more the theory behind the assembly
language.)
Here is the code for a simple C function "loopfunc" that is
called from the main function (not included):
.file "for.c"
# Generated by gcc 2.8.1 for TRICORE (generic)
gcc2_compiled.:
__gnu_compiled_c:
.section .text
.align 1
.global loopfunc
.type loopfunc,@function
loopfunc:
# arg_overflow_area = 0, pretend = 0
# frame_size = 12, outgoing_args_size = 0
# f_size = 16, frame_pointer_needed = 0
# pool_size = 0
sub.a %SP, 16 # allocate space for locals
st.w [%a10] 12, %d4
mov %d1, 0
st.w [%a10] 8, %d1
.L2:
ld.w %d15, [%a10] 8
ld.w %d0, [%a10] 12
jlt %d15, %d0, .L5
j .L3
.L5:
ld.w %d15, [%a10] 4
add %d0, %d15, 1
st.w [%a10] 4, %d0
.L4:
ld.w %d15, [%a10] 8
add %d0, %d15, 1
st.w [%a10] 8, %d0
j .L2
.L3:
ld.w %d15, [%a10] 4
mov %d2, %d15
j .L1
.L1:
ret
.align 1
.global main
.type main,@function
I don't understand why the compiler
1) is not spliting basic block .L2 into two blocks where
the second block just contains the instruction "j .L3".
I though that each re-direction of the control flow imposes the
creation of a new basic block (like .L4 and .L3). The instuction
"jlt %d15, %d0, .L5" can possibly direct the program flow to .L5 and
thus, in my opinion, the basic block should end here.
2) splits block .L5 and .L4? There are no
calls to label .L4, so, according to my understanding, all instructions
of blocks .L5 and .L4 should be held in one block.
Might these two points be bugs or do the authors of the tricore-gcc
have a different definition of basic blocks?
I would be very glad if you could shed some light on my questions.
Thank you.
Chris
The lack of label does not imply lack of a basic block. The only way
to enter the basic block, consisting of the single insn ``j .L3'' is a
fallthrough edge, hence no label necessary.
> 2) splits block .L5 and .L4? There are no
> calls to label .L4, so, according to my understanding, all instructions
> of blocks .L5 and .L4 should be held in one block.
And the presense of a label does not imply presense of a basic block.
I suspect the .L4 label is output during code generation for the loop
as a label where ``continue'' statements to jump.
~velco
>Here is the code for a simple C function "loopfunc" that is
>called from the main function (not included):
>
<snip>
>.L2:
> ld.w %d15, [%a10] 8
> ld.w %d0, [%a10] 12
> jlt %d15, %d0, .L5
> j .L3
>.L5:
> ld.w %d15, [%a10] 4
> add %d0, %d15, 1
> st.w [%a10] 4, %d0
>.L4:
> ld.w %d15, [%a10] 8
> add %d0, %d15, 1
> st.w [%a10] 8, %d0
> j .L2
>.L3:
> ld.w %d15, [%a10] 4
> mov %d2, %d15
> j .L1
>.L1:
> ret
> .align 1
> .global main
> .type main,@function
First, it would be a lot easier to understand the assembler if the HLL
that produced it was included.
>I don't understand why the compiler
>1) is not spliting basic block .L2 into two blocks where
>the second block just contains the instruction "j .L3".
The code sequence
jlt %d15, %d0, .L5
j .L3
is a standard encoding of an exiting IF.
Notice also that .L3 jumps to .L1 instead of just falling through.
The splitting algorithm needs to guarantee that each block ends with
an *unconditional* control transfer (a jump or return) so that the
blocks may be rearranged without messing up the program logic. A
block which ends in IF is coded as a conditional branch to the IF part
followed by and unconditional branch to the ELSE part (or to whatever
follows the IF).
Using a higher optimization level should result in better merging of
the blocks and eliminate (at least some of) the redundant jumps.
>I though that each re-direction of the control flow imposes the
>creation of a new basic block (like .L4 and .L3). The instuction
>"jlt %d15, %d0, .L5" can possibly direct the program flow to .L5 and
>thus, in my opinion, the basic block should end here.
There is no prohibition against a block having multiple exits so long
as no non-exit processing is performed between them.
On machines that have a separate compare instruction and non-computing
jumps you may see 3-way exits that look like
cmp %d1, %d2
jlt .L4
jgt .L5
j .L3
You can also see such code using subtract instead of compare.
>2) splits block .L5 and .L4? There are no
>calls to label .L4, so, according to my understanding, all instructions
>of blocks .L5 and .L4 should be held in one block.
It looks like the compiler created a new block label for each HLL
statement (just in case). After splitting it turned out that the
label .L4 did not really name a block entry point - but the label was
retained anyway.
This is, again, a normal side effect. It's extra work to go back and
relabel after merging blocks ... I've never seen a (non research)
compiler bother to do it.
George
I'm not familiar with the tricore's instruction set, but this looks like
it really wanted to generate "jge" and perhaps it doesn't exist, or,
more likely, the conditional branch has a very short range and so the
compiler needs to use the longer-range unconditional branch to reach the
target. In either of these cases, the code would be the result of
instruction selection, which takes place after gcc is done with any
thoughts of blocks.
> 2) splits block .L5 and .L4? There are no
> calls to label .L4, so, according to my understanding, all instructions
> of blocks .L5 and .L4 should be held in one block.
Yeah, that does seem to be an unneeded label. There was probably some
reason for it, but it's very hard to guess, particularly without the
source code.
> Might these two points be bugs or do the authors of the tricore-gcc
> have a different definition of basic blocks?
Probably neither. Extra labels do no harm, and the two jumps are
probably caused by one of the mechanisms I mentioned.
- ken
> Might these two points be bugs or do the authors of the tricore-gcc
> have a different definition of basic blocks?
An assembler label does not necessarily mark the begin of an basic
block.
In your example the compiler has inserted additional jumps and labels,
perhaps due to (lack of?) optimization...
DoDi
That's my opinion.
G.R.
GCC versions up to 2.8.x did not have any notion of a CFG at -O0. They
worked all the time with jumps directly encoded in their intermediate
representation.
Yes, this is bad. Luckily more and more passes in GCC have been
converted to look directly at CFG edges, but not all of GCC does even
now, after 10+ years.
Paolo
> I don't understand why the compiler
> 1) is not spliting basic block .L2 into two blocks where
> the second block just contains the instruction "j .L3".
> I though that each re-direction of the control flow imposes the
> creation of a new basic block (like .L4 and .L3). The instuction
> "jlt %d15, %d0, .L5" can possibly direct the program flow to .L5 and
> thus, in my opinion, the basic block should end here.
> 2) splits block .L5 and .L4? There are no
> calls to label .L4, so, according to my understanding, all instructions
> of blocks .L5 and .L4 should be held in one block.
>
> Might these two points be bugs or do the authors of the tricore-gcc
> have a different definition of basic blocks?
It would have been helpful to see the C code in comparison, but I can
guess what it looks like.
Short answer: there are no 'basic blocks' in assembly as such, only
Zuul^Wjump targets. If the compiler doesn't see a (possible) need to
jump to a certain location, why should it generate a label for that
location?
The .L2 'block' for example is the implementation of an if() itself,
including jump the 'then' part (jit ... .L5) and jump to the 'else' part
(j .L3), so having a label for the 'j .L3' doesn't make sense.
You are right that label .L4 is not targeted at all, and I can't explain
why - it might be just an artifact of GCC's code generation - but keep
in mind that with -O0 gcc does no optimization at all, as evidenced by
the 'j .L1' immediately followed by the .L1 label. Run gcc with -O1 or
above, and you should see improvements in the generated assembly.
--
Lars Duening; lars at bearnip dot com
You are assuming that each basic block begins with a label,
but that's not necessary: for a BB that is only
reached by fall-through from the preceding block, there is no need to
give it a label at all. So even though it's a distinct BB,
it will not have a label in the final assembly output.
To see the real partitioning of the the code into BBs that
gcc has used, ou need to dump the RTL.
> 2) splits block .L5 and .L4? There are no
> calls to label .L4, so, according to my understanding, all instructions
> of blocks .L5 and .L4 should be held in one block.
BBs do not necessarily have to be maximal.
Compiling at -O0, I would not assume that gcc form
maximal basic blocks.
Also, as the previous point, you should not assume
that the labels in the assembly output actually show the BB
partioning that the compiler has used. To see that you should dump
the RTL.
Steve.