Need RISCV linker to handle JAL offsets


Phil Wright

Nov 15, 2016, 11:04:17 PM
to RISC-V SW Dev
The following RISC-V assembly code (RV32) is used to show the problem...

    start:    jal end
    end:      jal start


I invoke the assembler using the following simple command...

    riscv32-unknown-elf-as -m32 example.s -o example.o


To check the generated code I disassemble it...

    riscv32-unknown-elf-objdump -D example.o


Which gives the following output...

    00000000 <start>:
        0:  004000ef    jal 4 <end>
    00000004 <end>:
        4:  ffdff0ef    jal 0 <start>


The first 'jal' correctly indicates that it needs to add 4 to the PC in order to jump to the following line, which is at address 4. (Note: because of the odd layout of the immediate value in the jal instruction, the instruction actually encodes 2, which the CPU multiplies by 2 to get the actual offset of 4.) The second 'jal' encodes -2 as the offset. Again, with the CPU multiplying by 2, we get -4 as the offset.
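As a cross-check of the note above, here is a small Python sketch (not from the thread) that decodes the J-type immediate of the two words from the disassembly. It assumes the standard JAL layout from the RISC-V base ISA: bit 31 holds imm[20], bits 30:21 hold imm[10:1], bit 20 holds imm[11], bits 19:12 hold imm[19:12], and imm[0] is always zero. `decode_jal` is a hypothetical helper name.

```python
from __future__ import annotations


def decode_jal(word: int) -> tuple[int, int]:
    """Return (rd, byte_offset) for a 32-bit RISC-V JAL instruction word."""
    if (word & 0x7F) != 0x6F:  # JAL major opcode is 0b1101111
        raise ValueError(f"not a JAL instruction: {word:#010x}")
    rd = (word >> 7) & 0x1F
    # Reassemble the scrambled immediate. imm[0] is implicitly zero, which is
    # why the stored field is effectively the byte offset divided by 2.
    imm = ((((word >> 31) & 0x1) << 20)
           | (((word >> 21) & 0x3FF) << 1)
           | (((word >> 20) & 0x1) << 11)
           | (((word >> 12) & 0xFF) << 12))
    if imm & (1 << 20):  # sign-extend the 21-bit two's-complement value
        imm -= 1 << 21
    return rd, imm


# The two words shown by `objdump -D example.o`, with their addresses.
for addr, word in ((0x0, 0x004000EF), (0x4, 0xFFDFF0EF)):
    rd, offset = decode_jal(word)
    print(f"{addr:#x}: jal x{rd}, {offset:+d} -> target {addr + offset:#x}")
```

Both words decode with rd = x1 (the return-address register), and the offsets come out as +4 and -4, i.e. targets 0x4 and 0x0, matching the disassembly.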

I actually want to generate a raw output file that contains no ELF information and is made up of just 8 bytes, the 8 bytes that make up the two instructions. I am running directly against a microcontroller and that microcontroller is not running an operating system. I want the output to be flashed to non-volatile RAM which then executes at reset.

So I use the following linker command to generate the binary output...

    riscv32-unknown-elf-ld --oformat=binary example.o -o example


But this seems to have lost the relative addressing values, because using the following command to look at the generated output bytes...

    xxd example


...gives the following result...

    00000000:  ef00 0000 eff0 ffff


Taking into account that it is little-endian, each set of four bytes is in reverse order compared to the disassembly seen earlier. We can see that the first jump, 000000ef, has lost its jump offset. The second jump, fffff0ef, is also different from before, and after applying two's complement we get -1, which is definitely wrong!
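The byte-order reasoning above can be checked mechanically. This Python sketch (not from the thread; the helper names are hypothetical) reads the eight `xxd` bytes as little-endian 32-bit words and decodes the JAL offsets, assuming the standard J-type immediate layout. The stored field of the second word is indeed -1, which after the implicit multiplication by 2 is a byte offset of -2, so that jump would land on address 2 rather than 0.

```python
from __future__ import annotations

import struct


def decode_jal_offset(word: int) -> int:
    """Return the signed byte offset encoded in a RISC-V JAL instruction word."""
    imm = ((((word >> 31) & 0x1) << 20)
           | (((word >> 21) & 0x3FF) << 1)
           | (((word >> 20) & 0x1) << 11)
           | (((word >> 12) & 0xFF) << 12))
    return imm - (1 << 21) if imm & (1 << 20) else imm


def words_from_image(image: bytes) -> list[int]:
    """Split a raw image into little-endian 32-bit instruction words."""
    count = len(image) // 4
    return list(struct.unpack(f"<{count}I", image[: count * 4]))


# The eight bytes shown by `xxd example` for the ld --oformat=binary output.
ld_image = bytes.fromhex("ef000000eff0ffff")
words = words_from_image(ld_image)
offsets = [decode_jal_offset(w) for w in words]
for index, (word, offset) in enumerate(zip(words, offsets)):
    addr = index * 4
    print(f"{addr:#x}: {word:08x} offset {offset:+d} -> target {addr + offset:#x}")
```

The first jump decodes to offset 0 (a jump to itself) and the second to offset -2 (target 0x2), rather than the intended +4 and -4.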

Any ideas how the jump offsets have been corrupted? Is there some extra option I need to specify to the linker to handle offsets correctly? I cannot find anything obvious and as a beginner to Linux and GNU I am stuck.

Thanks.

Arthur Jones

Nov 15, 2016, 11:46:52 PM
to Phil Wright, RISC-V SW Dev
Hi Phil, I don't know why this happens, but it happened to me too (for a similar reason). Here is how I work around it:

$ riscv32-unknown-elf-objcopy -O binary example.o example.bin
$ xxd example.bin
00000000: ef00 4000 eff0 dfff                      ..@.....

Arthur


--
You received this message because you are subscribed to the Google Groups "RISC-V SW Dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sw-dev+unsubscribe@groups.riscv.org.
To post to this group, send email to sw-...@groups.riscv.org.
Visit this group at https://groups.google.com/a/groups.riscv.org/group/sw-dev/.
To view this discussion on the web visit https://groups.google.com/a/groups.riscv.org/d/msgid/sw-dev/26713317-209c-4c05-ac63-523ddb7d3a2e%40groups.riscv.org.

Stefan O'Rear

Nov 15, 2016, 11:46:57 PM
to Phil Wright, RISC-V SW Dev
On Tue, Nov 15, 2016 at 8:04 PM, Phil Wright
<phil....@componentfactory.com> wrote:
> Which gives the following output...
>
> 00000000 <start>:
> 0: 004000ef jal 4 <end>
> 00000004 <end>
> 4: ffdff0ef jal 0 <start>

Two non-obvious things here. First, the bytes are being displayed in
"reverse order" because RISC-V uses little-endian for instruction
words. Second, RISC-V is a "RELA" target, which means that the 20-bit
immediate fields are *not used* by the linker; the object file
contains a separate "relocation" section which indicates which JAL
points to which label. "objdump -d -r" is what you should use, and it
outputs this:

example.o: file format elf32-littleriscv


Disassembly of section .text:

00000000 <start>:
0: 004000ef jal 4 <end>
0: R_RISCV_JAL end

00000004 <end>:
4: ffdff0ef jal 0 <start>
4: R_RISCV_JAL start
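To illustrate what "RELA" means here, the following Python sketch (not binutils code; the function names are hypothetical) applies an R_RISCV_JAL relocation the way the RISC-V ELF psABI describes it: the linker computes S + A - P (symbol address plus addend, minus the address of the instruction being patched) and writes that into the J-type immediate, overwriting whatever the assembler left in those bits. Only the opcode and rd bits (the low 12 bits) of the input word are kept. The addend of 0 is an assumption based on the plain `R_RISCV_JAL end` / `R_RISCV_JAL start` entries above.

```python
def encode_jal_imm(offset: int) -> int:
    """Scatter a signed byte offset into the J-type immediate bit positions."""
    if offset % 2 or not -(1 << 20) <= offset < (1 << 20):
        raise ValueError(f"offset {offset} is odd or outside the +/-1 MiB JAL range")
    imm = offset & 0x1FFFFF  # 21-bit two's-complement representation
    return ((((imm >> 20) & 0x1) << 31)
            | (((imm >> 1) & 0x3FF) << 21)
            | (((imm >> 11) & 0x1) << 20)
            | (((imm >> 12) & 0xFF) << 12))


def apply_r_riscv_jal(insn: int, s: int, a: int, p: int) -> int:
    """Patch a JAL word with relocation value S + A - P, ignoring its old immediate."""
    return (insn & 0xFFF) | encode_jal_imm(s + a - p)


# .text of example.o: `start` at 0x0, `end` at 0x4, addend 0 for both relocations.
# Start from words whose immediate bits are zero (0x000000ef): with RELA the
# result is the same, because those bits are overwritten anyway.
print(f"{apply_r_riscv_jal(0x000000EF, s=0x4, a=0, p=0x0):08x}")  # jal end at 0x0
print(f"{apply_r_riscv_jal(0x000000EF, s=0x0, a=0, p=0x4):08x}")  # jal start at 0x4
```

This reproduces 004000ef and ffdff0ef, the same words the assembler happened to pre-fill, which is why a correctly applied relocation makes those pre-filled immediates irrelevant.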

> The first 'jal' has correctly indicated that it needs to add 4 to the PC in
> order to jump to the following line, which is address 4. (Note: the odd
> layout of the immediate value for the jal instruction means the actual
> instruction encodes 2 which is then multiplied at the CPU by 2 to get the
> actual offset of 4). The second 'jal' has -2 as the offset. Again, with the
> CPU multiplying by 2 we can -4 as the offset.

(This is all meaningless because of the RELA bit)

> I actually want to generate a raw output file that contains no ELF
> information and is made up of just 8 bytes, the 8 bytes that make up the two
> instructions. I am running directly against a microcontroller and that
> microcontroller is not running an operating system. I want the output to be
> flashed to non-volatile RAM which then executes at reset.

Yep, that's a valid use case

> So I use the following linker command to generate the binary output...
>
> riscv32-unknown-elf-ld --oformat=binary example.o -o example


> But this seems to have lost the relative addressing values because using the
> following command to look at the generate output bytes...
>
> xxd example
>
>
> ...gives the following result...
>
> 00000000: ef00 0000 eff0 ffff
>
>
> Taking into account it is little endian it means that each set of four bytes
> is in reverse order compared to the disassembly seen earlier. We can see the

There's an easier way:

# objdump -bbinary -mriscv:rv32 -D example

example: file format binary


Disassembly of section .data:

00000000 <.data>:
0: ef000000 jal 0x0
4: eff0ffff jal 0x2

(weird that when dumping a binary file objdump uses "big-endian" for
the opcodes.)
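To make the display difference concrete, here is a Python sketch (not from the thread) that reads the same eight bytes both ways. The `ef000000` / `eff0ffff` columns printed by `objdump -bbinary` match a big-endian reading of the file bytes, while the instruction words the RISC-V core actually decodes are the little-endian reading. Whether objdump intends this is the open question in the thread; the code only shows the two interpretations side by side.

```python
import struct

# The raw image produced by `ld --oformat=binary` (the xxd dump quoted above).
image = bytes.fromhex("ef000000eff0ffff")

as_big_endian = struct.unpack(">2I", image)     # matches the -bbinary column
as_little_endian = struct.unpack("<2I", image)  # what the processor actually fetches
byte_view = " ".join(f"{b:02x}" for b in image[:4])  # file order, space-separated

print([f"{w:08x}" for w in as_big_endian])
print([f"{w:08x}" for w in as_little_endian])
print(byte_view)
```

Printing individual bytes with spaces between them, or whole words most-significant-nibble first, are both unambiguous; a run of eight hex digits in file byte order is what makes the `-bbinary` output look "big-endian".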

> first jump 000000ef has lost its jump offset. The second jump fffff0ef is
> also different from before and after applying twos compliment we get -1,
> which is definitely wrong!

> Any ideas how the jump offsets have been corrupted? Is there some extra

It doesn't work for me either, so you probably found a bug. I suggest
filing it at https://github.com/riscv/riscv-binutils-gdb/issues/new;
be sure to specify that you are trying to create a binary file
directly from ld.

> option I need to specify to the linker to handle offsets correctly? I cannot

It looks like UCB/SiFive has been using ld with an ELF output followed
by objcopy, which works:

[root@sorear6 tmp]# cat example.lds
OUTPUT_ARCH( "riscv" )

SECTIONS
{
. = 0;
.text : { *(.text) }
}
[root@sorear6 tmp]# ld -m elf32lriscv -T example.lds example.o -o example
[root@sorear6 tmp]# objcopy -S -O binary example example.bin
[root@sorear6 tmp]# objdump -bbinary -mriscv:rv32 -D example.bin

example.bin: file format binary


Disassembly of section .data:

00000000 <.data>:
0: ef004000 jal 0x4
4: eff0dfff jal 0x0

> find anything obvious and as a beginner to Linux and GNU I am stuck.

What you're new to right now is the ELF toolchain and advanced linker
usage, which is fairly arcane even to most GNU/Linux developers. I've
*done this before* and I actually forgot the linker script syntax so I
copied the lds file from riscv-pk and deleted almost everything.

-s

Michael Clark

Nov 16, 2016, 12:02:30 AM
to Stefan O'Rear, Phil Wright, RISC-V SW Dev

On 16 Nov 2016, at 5:46 PM, Stefan O'Rear <sor...@gmail.com> wrote:

> Disassembly of section .data:
>
> 00000000 <.data>:
>   0:   ef000000                jal     0x0
>   4:   eff0ffff                jal     0x2
>
> (weird that when dumping a binary file objdump uses "big-endian" for
> the opcodes.)

objdump is showing “words” here so it is self-consistent.

You’ll notice that on little-endian targets with byte-level opcodes, there are spaces between the “bytes”.

Yes, it must be a binutils bug. I typically use the compiler driver to invoke the linker, so I have yet another alternative, but it appears to suffer from the same issue:


$ cat example.S 
.section .text
.globl _start
_start:

1:
    jal 2f
2:
    jal 1b
$ riscv64-unknown-elf-gcc -c example.S -o example.o
$ riscv64-unknown-elf-gcc -nostartfiles -Wl,--oformat=binary example.o -o example
$ xxd example
00000000: ef00 0000 eff0 ffff                      ........


Another approach may be to link the binary normally and use objcopy to copy out the .text section:

$ riscv64-unknown-elf-gcc -nostartfiles example.o -o example
$ riscv64-unknown-elf-objcopy -O binary -j .text example  example.bin 

Stefan O'Rear

Nov 16, 2016, 12:05:39 AM
to Michael Clark, Phil Wright, RISC-V SW Dev
On Tue, Nov 15, 2016 at 9:02 PM, Michael Clark <michae...@mac.com> wrote:
> objdump is showing “words” here so it is self-consistent.
>
> You’ll notice that on little-endian targets with byte level opcodes that
> there are spaces between the “bytes”.

The part I'm pointing out is that objdumping an ELF file uses
"little-endian words" and objdumping a binary file uses "big-endian
words". I don't know if that's an expected difference; naively I
think there's just one disassembler?

-s

Michael Clark

Nov 16, 2016, 12:22:51 AM
to Stefan O'Rear, Phil Wright, RISC-V SW Dev
OK I see that now. Yes, that is inconsistent.

It is apparent that “words” don’t have endianness in hexadecimal whereas bytes, so the bug case seems to be with -bbinary -mriscv:rv32 (although it is subjective). hexidecimal words tend to have the MSN (Most Significant Nibble) on the Left. i.e. if it wants to should it in byte order, it should have spaces between the bytes.


$ riscv64-unknown-elf-objdump -d example

example:     file format elf64-littleriscv


Disassembly of section .text:

0000000000010000 <_ftext>:
   10000: 004000ef           jal 10004 <_ftext+0x4>
   10004: ffdff0ef           jal 10000 <_ftext>

$ riscv64-unknown-elf-objdump -bbinary -mriscv:rv32 -D example.bin 

example.bin:     file format binary


Disassembly of section .data:

00000000 <.data>:
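The ELF disassembly above shows the same two words (004000ef, ffdff0ef) at 0x10000 that Stefan's linker-script build produced at address 0 (displayed there by `objdump -bbinary` as ef004000 / eff0dfff). JAL offsets are PC-relative, so the encoding depends only on the distance between the instruction and its target, not on where the image is placed. A Python sketch with a hypothetical `encode_jal` helper (rd fixed to x1, as `jal label` uses by default) makes that explicit:

```python
def encode_jal(pc: int, target: int, rd: int = 1) -> int:
    """Encode `jal rd, target` for an instruction located at address `pc`."""
    offset = target - pc
    if offset % 2 or not -(1 << 20) <= offset < (1 << 20):
        raise ValueError(f"JAL at {pc:#x} cannot reach {target:#x}")
    imm = offset & 0x1FFFFF  # 21-bit two's-complement offset
    return ((((imm >> 20) & 0x1) << 31)
            | (((imm >> 1) & 0x3FF) << 21)
            | (((imm >> 11) & 0x1) << 20)
            | (((imm >> 12) & 0xFF) << 12)
            | ((rd & 0x1F) << 7)
            | 0x6F)  # JAL major opcode


# The same two-instruction program placed at 0x0 (linker script) and at
# 0x10000 (as in the gcc-linked ELF above).
for base in (0x0, 0x10000):
    words = [encode_jal(base, base + 4), encode_jal(base + 4, base)]
    print(f"{base:#07x}: " + " ".join(f"{w:08x}" for w in words))
```

Both lines print the same pair of words, so stripping the ELF wrapper with objcopy does not require relinking at a different address for code like this, as long as the instructions and their targets move together.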

Michael Clark

Nov 16, 2016, 12:33:38 AM
to Stefan O'Rear, Phil Wright, RISC-V SW Dev

On 16 Nov 2016, at 6:22 PM, Michael Clark <michae...@mac.com> wrote:

> It is apparent that “words” don’t have endianness in hexadecimal whereas bytes, so the bug case seems to be with -bbinary -mriscv:rv32 (although it is subjective). hexidecimal words tend to have the MSN (Most Significant Nibble) on the Left. i.e. if it wants to should it in byte order, it should have spaces between the bytes.


It is apparent that “words” don’t have endianness in hexadecimal whereas bytes do, so the bug case seems to be with -bbinary -mriscv:rv32 (although it is subjective). Hexadecimal words tend to have the MSN (Most Significant Nibble) on the left, i.e. to be consistent, bytes in natural byte order should have spaces between them, and words should be shown in the natural order for words.

apologies for my dyslexia. it’s absolutely terrible. only half of a hemisphere of my brain actually works (right hemisphere). i am blind in one eye.

Phil Wright

Nov 16, 2016, 12:44:21 AM
to RISC-V SW Dev, sor...@gmail.com, phil....@componentfactory.com
Thanks for the feedback.

As suggested, using objcopy with the output as binary gives the correct output.
I would never have worked that out myself.
I will file a bug report for the linker.