Could you explain ARM Branch with Link (BL) instruction considering prefetch?

Robert Willy

Jul 30, 2015, 12:28:10 PM

Hi,

When I read the text below the dotted line, I don't understand why "R14 is adjusted
to allow for the prefetch".

Could you explain it to me?

Thanks,

.......
Branch with Link (BL) writes the old PC into the link register (R14) of the
current bank. The PC value written into R14 is adjusted to allow for the
prefetch, and contains the address of the instruction following the branch
and link instruction. Note that the CPSR is not saved with the PC and
R14[1:0] are always cleared.

Theo Markettos

Jul 30, 2015, 12:57:13 PM

Robert Willy <rxj...@gmail.com> wrote:
> Hi,
>
> When I read the words below dot line, I don't understand why "R14 is
> adjusted to allow for the prefetch"
>
> Could you explain it to me?

Any time you move the PC into another register, for instance the link
register R14, what you actually get is the address of the current
instruction plus 8. The reason for this dates back to the ARM1, which had a
3 stage pipeline, fetch-decode-execute. When you executed the move, the
instruction fetch stage was already two instructions further on.

A bit like the branch delay slot on MIPS, exposure of this
microarchitectural artifact to the ISA has meant that all 32 bit ARMs use
this current+8, even though they don't have 3 stage pipelines any more.
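
As a rough illustration of where the +8 comes from, here is a toy model in Python (not ARM code). It assumes 4-byte instructions and a fetch stage that advances by one instruction per cycle; the function name and the `stages` parameter are made up for this sketch, and it is not cycle-accurate.

```python
INSTR_SIZE = 4  # bytes per instruction in ARM state (assumption of this sketch)

def pc_seen_at_execute(instr_addr, stages=3):
    """Return the address being fetched when the instruction at
    instr_addr reaches the last (execute) stage of the pipeline."""
    fetch_addr = instr_addr
    # The instruction needs (stages - 1) more cycles to get from fetch
    # to execute, and the fetch address moves on by one instruction
    # in each of those cycles.
    for _ in range(stages - 1):
        fetch_addr += INSTR_SIZE
    return fetch_addr

addr = 0x8000
print(hex(pc_seen_at_execute(addr)))  # 0x8008, i.e. current + 8
```

With `stages=1` (no prefetch at all) the same function returns the instruction's own address, which is the case where no adjustment would be needed.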

Theo

Mel Wilson

Jul 30, 2015, 1:00:23 PM

On Thu, 30 Jul 2015 09:28:06 -0700, Robert Willy wrote:

> When I read the words below dot line, I don't understand why "R14 is
> adjusted to allow for the prefetch"
>
> Could you explain it to me?

> .......
> Branch with Link (BL) writes the old PC into the link register (R14) of
> the current bank. The PC value written into R14 is adjusted to allow for
> the prefetch, and contains the address of the instruction following the
> branch and link instruction. Note that the CPSR is not saved with the
> PC and R14[1:0] are always cleared.

Most likely the processor is fetching instructions in anticipation of
executing them later, so if you store the PC, the value you get is the
address of the instruction two places down from the one you're executing
(i.e. a BL). That's not the instruction you expect to return to, so it's
decremented before storing it in R14. Embedded IBM S/360?

glen herrmannsfeldt

Jul 30, 2015, 1:21:44 PM

Theo Markettos <theom...@chiark.greenend.org.uk> wrote:

(snip regarding ARM and PC values)

> Any time you move the PC into another register, for instance the link
> register R14, what you actually get is the address of the current
> instruction plus 8. The reason for this dates back to the ARM1, which had a
> 3 stage pipeline, fetch-decode-execute. When you executed the move, the
> instruction fetch stage was already two instructions further on.

The JSR instruction on the 6502 pushes one less than the address
of the next instruction. RTS pops the address and adds one.
Again, it seems related to the value of the register at the time.
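
A minimal sketch of that 6502 behaviour, in Python. The class `Mini6502` and its methods are invented for illustration; it models only the return-address handling (JSR is three bytes long and pushes the address of its own last byte, high byte first), with a plain list standing in for the hardware stack.

```python
class Mini6502:
    """Models only JSR/RTS return-address handling, nothing else."""

    def __init__(self):
        self.pc = 0
        self.stack = []  # stand-in for the hardware stack

    def jsr(self, instr_addr, target):
        # Next instruction is at instr_addr + 3; JSR pushes one less.
        ret = (instr_addr + 3 - 1) & 0xFFFF
        self.stack.append(ret >> 8)    # high byte first
        self.stack.append(ret & 0xFF)  # then low byte
        self.pc = target

    def rts(self):
        lo = self.stack.pop()
        hi = self.stack.pop()
        # RTS adds one to the popped address before resuming.
        self.pc = (((hi << 8) | lo) + 1) & 0xFFFF

cpu = Mini6502()
cpu.jsr(0x0600, 0x0700)   # JSR at $0600 calling $0700
cpu.rts()
print(hex(cpu.pc))        # 0x603, the instruction after the JSR
```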

> A bit like the branch delay slot on MIPS, exposure of this
> microarchitectural artifact to the ISA has meant that all 32 bit ARMs use
> this current+8, even though they don't have 3 stage pipelines any more.

Is there a compensating branch instruction?

-- glen

rickman

Jul 30, 2015, 2:11:23 PM

So they are not saying they are adjusting the address to allow for a
prefetch using that address. Rather they are adjusting the prefetch
address to get the correct next instruction address to return to?

--

Rick

Richard Damon

Jul 30, 2015, 2:28:42 PM

If the processor didn't do prefetching, but fetched an instruction
(incrementing the PC), decoded it, then executed it, and only when done
fetched the next (and so on), no adjustment would be needed.

Since the ARM has actually fetched more instructions (and incremented the
PC) by the time the instruction is executed, simply copying the now-current
value of the PC would point past the place you really want to return to.
That would require either empty slots after every call or code to adjust
the value of R14 before returning. So the processor automatically corrects
the value: R14 gets the value the PC had just after the BL itself was
fetched, which undoes the effect of the prefetch that happened.

Dimiter_Popoff

Jul 30, 2015, 3:44:45 PM

They clearly say the address is the correct one to return to. Why they
go on to talk about prefetch I don't know; perhaps someone who put
work into designing the core wrote the manual too, and got carried
away with details not needed in this context.

Dimiter

------------------------------------------------------
Dimiter Popoff, TGI http://www.tgi-sci.com
------------------------------------------------------
http://www.flickr.com/photos/didi_tgi/

Robert Wessel

Jul 30, 2015, 4:31:44 PM

On Thu, 30 Jul 2015 22:44:47 +0300, Dimiter_Popoff <d...@tgi-sci.com>
wrote:

It's the way the ISA was designed: because of prefetching, a read of the
PC always yields an address 8 bytes ahead of the instruction being
executed. So if you're using a PC-relative instruction, you have to
compensate for that. In the case of a subroutine call, they're just
telling you that the +8 has already been compensated for in the saved
return address.
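
To make the compensation concrete, here is a small Python sketch of the arithmetic an assembler has to do for a classic 32-bit ARM BL (always-executed condition, signed 24-bit word offset), plus the link value BL saves. The function names `encode_bl` and `bl_link_value` are hypothetical helpers for this illustration, not part of any real tool.

```python
PC_AHEAD = 8  # in ARM state the PC reads as instruction address + 8

def encode_bl(instr_addr, target):
    """Encode 'BL target' placed at instr_addr (condition = always)."""
    # The offset is relative to the PC as the instruction sees it,
    # so the assembler subtracts the extra 8 up front.
    offset = target - (instr_addr + PC_AHEAD)
    if offset % 4 != 0:
        raise ValueError("target must be word aligned")
    imm24 = offset >> 2
    if not -(1 << 23) <= imm24 < (1 << 23):
        raise ValueError("target out of range for BL")
    return 0xEB000000 | (imm24 & 0xFFFFFF)

def bl_link_value(instr_addr):
    """Value BL writes to R14: the visible PC minus 4."""
    pc_read = instr_addr + PC_AHEAD
    return pc_read - 4  # address of the instruction after the BL

print(hex(encode_bl(0x8000, 0x8000)))  # 0xebfffffe, BL to itself
print(hex(bl_link_value(0x8000)))      # 0x8004
```

Note that a BL whose target is two instructions ahead encodes an offset of zero, which is the +8 showing through in the instruction encoding.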

Dimiter_Popoff

Jul 30, 2015, 5:22:25 PM

I see, that +8 is quite common and they are just reminding you that it
is taken into account in this case.
One usually has to check how PC-relative offsets (for branches etc.) are
to be calculated, as it differs between processors.
I can see the point of the guy who designed the Power architecture: no
PC register is available, so you have to do a linked call and use
the value in the LR... :-). Not an issue, as one can do it once
and keep the absolute address in a GPR; doing so would be more
painful on ARM with its fewer GPRs.

Theo Markettos

Jul 31, 2015, 8:15:27 AM

AFAIR, it's 'MOV rN, pc' that's affected.

If you do 'MOV pc, rN', you start executing at the instruction pointed to by
rN, not one or two after.

If you do 'BL label', r14 points to the instruction after the BL (the BL's
address + 4), not two after (the BL's address + 8).

label:
        <code>
        MOV   pc, r14        ; return: copy the link register into pc

and

label:
        STMFD r13!, {r14}    ; push r14
        <code>
        LDMFD r13!, {pc}     ; pop old r14 and write to pc

are common subroutine paradigms, and they do what you'd expect.
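
The two paradigms can be traced with a toy interpreter in Python. This is only a sketch: `run`, `PROGRAM`, the opcode names and the addresses are all invented for illustration, every instruction is assumed to be 4 bytes, and a Python list stands in for the stack addressed by r13. The outer routine has to push r14 because its own BL overwrites it; the leaf routine can return with a plain move.

```python
def run(program, entry, max_steps=100):
    """Execute the toy program and return the list of visited addresses."""
    regs = {"pc": entry, "r14": 0}
    stack = []   # stand-in for the stack addressed by r13
    trace = []
    for _ in range(max_steps):
        addr = regs["pc"]
        op, arg = program[addr]
        trace.append(addr)
        if op == "BL":
            regs["r14"] = addr + 4      # instruction after the BL
            regs["pc"] = arg
        elif op == "MOV_PC_R14":        # MOV pc, r14
            regs["pc"] = regs["r14"]
        elif op == "PUSH_R14":          # STMFD r13!, {r14}
            stack.append(regs["r14"])
            regs["pc"] = addr + 4
        elif op == "POP_PC":            # LDMFD r13!, {pc}
            regs["pc"] = stack.pop()
        elif op == "NOP":               # placeholder for <code>
            regs["pc"] = addr + 4
        elif op == "HALT":
            return trace
    raise RuntimeError("program did not halt")

PROGRAM = {
    0x00: ("BL", 0x20),          # call outer
    0x04: ("HALT", None),
    0x20: ("PUSH_R14", None),    # outer: save return address
    0x24: ("BL", 0x40),          # call leaf (overwrites r14)
    0x28: ("POP_PC", None),      # return from outer
    0x40: ("NOP", None),         # leaf body
    0x44: ("MOV_PC_R14", None),  # return from leaf
}

print([hex(a) for a in run(PROGRAM, 0x00)])
```

Execution comes back to 0x28 after the leaf call and to 0x04 after the outer call, in both cases the instruction directly after the BL.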

Theo

James Harris

Sep 6, 2015, 6:52:25 AM

"Theo Markettos" <theom...@chiark.greenend.org.uk> wrote in message
news:szd*2M...@news.chiark.greenend.org.uk...

> glen herrmannsfeldt <g...@ugcs.caltech.edu> wrote:
>> Theo Markettos <theom...@chiark.greenend.org.uk> wrote:

...

>> > Any time you move the PC into another register, for instance the link
>> > register R14, what you actually get is the address of the current
>> > instruction plus 8. The reason for this dates back to the ARM1, which
>> > had a 3 stage pipeline, fetch-decode-execute. When you executed the
>> > move, the instruction fetch stage was already two instructions further
>> > on.

...

>> > A bit like the branch delay slot on MIPS, exposure of this
>> > microarchitectural artifact to the ISA has meant that all 32 bit ARMs
>> > use this current+8, even though they don't have 3 stage pipelines any
>> > more.
>>
>> Is there a compensating branch instruction?
>
> AFAIR, it's 'MOV rN, pc' that's affected.

Yes, and AIUI that can be a benefit. It allows the program to use
various types of branch instruction while still setting the return
address properly. For example,

    mov r14, pc
    bx  rN              ; rN holds the target address (bx takes a register)

or

    mov r14, pc
    ldr pc, =target

In each case R14 will hold the address *after* the branch, not the
address of the branch. When I first saw that I thought it was weird. It
still looks odd to someone like me who is used to PC pointing to the
next instruction.

James