[RFC] Proposal: ELF attributes for RISC-V


Kito Cheng

Mar 26, 2018, 11:11:16 PM
to RISC-V SW Dev (sw-dev@groups.riscv.org)
# What’s an ELF attributes section?

An ELF attributes section is a special section in an ELF object file that records the information a linker or runtime loader needs in order to check compatibility. It was originally proposed by ARM [1] and has since been widely used by many other targets (e.g. MIPS, PPC, SPARC, s390, TI-C6x and MSP430). Each attribute in this section can be an integer, a string, or a combination of both, encoded as follows:

  - Integer values are encoded in the uleb128 [2] format. Although the format can in theory represent unsigned integers of arbitrary length, in ELF object files it usually holds 32-bit integers because of implementation limitations in binutils.
  - String values are encoded as NUL-terminated byte strings.
  - There is no limit on the number of attributes.
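
[1] http://infocenter.arm.com/help/topic/com.arm.doc.ihi0045e/IHI0045E_ABI_addenda.pdf
[2] https://en.wikipedia.org/wiki/LEB128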

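To make the two value encodings concrete, here is a minimal Python sketch of ULEB128 integers and NUL-terminated strings. The helper names are illustrative only; they are not taken from binutils or LLVM.

```python
from __future__ import annotations

# Illustrative helpers (names are not from binutils or LLVM).


def uleb128_encode(value: int) -> bytes:
    """Encode an unsigned integer as ULEB128: 7 payload bits per byte,
    least-significant group first, high bit set on every byte but the last."""
    if value < 0:
        raise ValueError("ULEB128 only encodes unsigned integers")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def uleb128_decode(data: bytes, offset: int = 0) -> tuple[int, int]:
    """Decode a ULEB128 value starting at data[offset]; return (value, next_offset)."""
    result = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return result, offset


def encode_ntbs(text: str) -> bytes:
    """String attribute values are NUL-terminated byte strings."""
    return text.encode("ascii") + b"\0"


def decode_ntbs(data: bytes, offset: int = 0) -> tuple[str, int]:
    """Read a NUL-terminated string; return (text, offset just past the NUL)."""
    end = data.index(b"\0", offset)
    return data[offset:end].decode("ascii"), end + 1


# 624485 is the worked example from the LEB128 article cited as [2].
assert uleb128_encode(624485) == b"\xe5\x8e\x26"
assert uleb128_decode(b"\xe5\x8e\x26") == (624485, 3)
```

Because the encoding is variable-length, small tags and values such as 16 fit in a single byte; the 32-bit ceiling mentioned above comes from the binutils implementation, not from the format itself.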
# Why do we need an ELF attributes section?

Currently, programmers cannot tell which RISC-V instructions an object or executable file uses until one of them raises an exception during execution. We need a way to check the compatibility of object files beforehand.

The existing e_flags field in the ELF header, though it holds processor-specific flags, is only 32 bits wide and cannot record all the information a linker requires. Moreover, e_flags describes ABI-level information rather than ISA-level information. For example, it cannot tell whether an object that uses the soft-float ABI nevertheless contains hardware floating-point instructions.
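A short Python sketch makes this limitation visible. It decodes the e_flags bits defined by the RISC-V ELF psABI (RVC, float ABI, RVE); the helper names are made up for illustration. None of these bits says whether F or D instructions actually appear in the code.

```python
import struct

# e_flags bits from the RISC-V ELF psABI; helper names below are hypothetical.
EF_RISCV_RVC = 0x0001
EF_RISCV_FLOAT_ABI = 0x0006  # 2-bit field selecting the floating-point calling convention
EF_RISCV_RVE = 0x0008

FLOAT_ABI_NAMES = {0x0000: "soft", 0x0002: "single", 0x0004: "double", 0x0006: "quad"}
E_FLAGS_OFFSET = {1: 0x24, 2: 0x30}  # ELFCLASS32, ELFCLASS64


def read_e_flags(header: bytes) -> int:
    """Extract e_flags from the first bytes of an ELF file."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    elf_class, data_encoding = header[4], header[5]
    if elf_class not in E_FLAGS_OFFSET:
        raise ValueError("unknown ELF class")
    endian = "<" if data_encoding == 1 else ">"  # 1 = ELFDATA2LSB
    return struct.unpack_from(endian + "I", header, E_FLAGS_OFFSET[elf_class])[0]


def describe_e_flags(e_flags: int) -> dict:
    """Everything e_flags can tell us: ABI-level facts only, no ISA usage."""
    return {
        "rvc": bool(e_flags & EF_RISCV_RVC),
        "float_abi": FLOAT_ABI_NAMES[e_flags & EF_RISCV_FLOAT_ABI],
        "rve": bool(e_flags & EF_RISCV_RVE),
    }
```

For example, an object built with -march=rv64gc -mabi=lp64 reports a "soft" float ABI here even though the compiler may still emit F/D instructions inside functions; that gap is what the proposed attributes are meant to fill.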

Therefore, we propose to use ELF attributes to encode the information a linker or runtime loader needs. ELF attributes can record a wide range of information, from the ISA version to the subset of the ISA in use. As for implementation, only a RISC-V port is needed, since mainstream tools, including LLVM and binutils, already support ELF attributes.


| Tag Name | Type | Description |
|---|---|---|
| Tag_arch | String | Denotes the target architecture specified by the compiler option “-march”. The default value is RV32G or RV64G. |
| Tag_priv_spec | Integer | Denotes the version of the privileged specification. |
| Tag_strict_align | Integer | Denotes whether strict alignment is required for memory accesses in code generation. |
| Tag_stack_align | Integer | Denotes the stack alignment requirement (in bytes) specified by the option “-mpreferred-stack-boundary”. The default is 16 bytes for the RV32I/RV64I ABIs and 4 bytes for RV32E. |
| Tag_A_ext | Integer + String | Denotes the version of the standard extension A. |
| Tag_C_ext | Integer + String | Denotes the version of the standard extension C. |
| Tag_D_ext | Integer + String | Denotes the version of the standard extension D. |
| Tag_E_ext | Integer + String | Denotes the version of the standard extension E. |
| Tag_F_ext | Integer + String | Denotes the version of the standard extension F. |
| Tag_I_ext | Integer + String | Denotes the version of the standard extension I. |
| Tag_M_ext | Integer + String | Denotes the version of the standard extension M. |
| Tag_Q_ext | Integer + String | Denotes the version of the standard extension Q. |
| Tag_X_ext | String | Denotes information about non-standard extensions, if there are any. |
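For a sense of what such a table might look like on disk, here is a minimal Python sketch that serialises a few attributes using the generic layout from the Arm build-attributes specification [1]: a format-version byte 'A', then a vendor subsection holding a Tag_File sub-subsection. The proposal does not assign tag numbers, pick a vendor name, or define the Integer + String ordering, so the TAG_* numbers, the vendor name "riscv", little-endian length fields, and "integer first, then string" are all placeholders for illustration.

```python
from __future__ import annotations
import struct

TAG_FILE = 1  # scope tag from the generic attributes format: applies to the whole file

# Hypothetical tag numbers: the proposal names these tags but does not number them.
TAG_ARCH = 4
TAG_PRIV_SPEC = 5
TAG_STRICT_ALIGN = 6
TAG_STACK_ALIGN = 7
TAG_A_EXT = 8


def uleb128(value: int) -> bytes:
    out = bytearray()
    while True:
        byte, value = value & 0x7F, value >> 7
        out.append(byte | 0x80 if value else byte)  # continuation bit if more follows
        if not value:
            return bytes(out)


def encode_attr(tag: int, value: int | str | tuple[int, str]) -> bytes:
    """Encode one attribute: ULEB128 tag followed by its value."""
    out = uleb128(tag)
    if isinstance(value, int):  # "Integer" tags
        return out + uleb128(value)
    if isinstance(value, str):  # "String" tags
        return out + value.encode("ascii") + b"\0"
    number, text = value  # "Integer + String" tags (assumed order: integer, then string)
    return out + uleb128(number) + text.encode("ascii") + b"\0"


def build_attributes_section(vendor: str, attrs: list[tuple[int, object]]) -> bytes:
    body = b"".join(encode_attr(tag, value) for tag, value in attrs)
    # Tag_File sub-subsection: its size counts the tag byte and the size field itself.
    file_scope = bytes([TAG_FILE]) + struct.pack("<I", 1 + 4 + len(body)) + body
    # Vendor subsection: its length counts the length field itself.
    vendor_part = vendor.encode("ascii") + b"\0" + file_scope
    return b"A" + struct.pack("<I", 4 + len(vendor_part)) + vendor_part


# Example: an RV32IMAC object with the default 16-byte stack alignment.
example = build_attributes_section("riscv", [(TAG_ARCH, "rv32imac"), (TAG_STACK_ALIGN, 16)])
```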


More details on GitHub:



Richard W.M. Jones

Mar 27, 2018, 9:15:47 AM
to Kito Cheng, RISC-V SW Dev (sw-dev@groups.riscv.org)
On Tue, Mar 27, 2018 at 11:10:53AM +0800, Kito Cheng wrote:
> *Tag Name*
>
> *Type*
>
> *Description*
>
> Tag_arch
>
> String
>
> Denotes the target architecture specified by the compiler option “-march”.
> The default values are RV32G or RV64G.

This duplicates information in the ELF header, but to what end? I'm
guessing that other strings might be included here (eg. "RV32IMC"),
but what could those be, and wouldn't those conflict with (eg) the
Tag_C_ext flag? Or are the Tag_*_ext flags only meant to indicate
version number and not whether the extension is needed?

> Tag_priv_spec
>
> Integer
>
> Denotes the version of the privileged specification.

Should a userspace binary care?

> Tag_strict_align
>
> Integer
>
> Denotes if strict alignment is required for memory accesses in code
> generation.

I'm not sure I understand this one. What is "strict alignment" in the
sense used here?

> Tag_stack_align
>
> Integer
>
> Denotes the stack alignment requirement (in bytes) that is specified by the
> option “-mpreferred-stack-boundary”. The default values are 16-byte aligned
> for RV32I/RV64I ABI and 4-byte aligned for RV32E.

AIUI those are part of the ABI, so all binaries will require this
alignment. It doesn't seem to be worth noting since any binary with
different alignment could never be run.

Rich.

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW

Richard W.M. Jones

Mar 27, 2018, 9:41:06 AM
to Kito Cheng, RISC-V SW Dev (sw-dev@groups.riscv.org)
I had a chat with some other developers here, and I think the general
consensus is that this is wrong on several levels, although what I'm
going to say only applies to general purpose 64 bit Linux for servers
and desktops, not to very specialized embedded uses.

* Try to avoid the mistakes of ARM of having multiple incompatible
sub-variants of the architecture.

* For Fedora we will adopt a baseline (RV64GC most likely) and
everything else will be handled at runtime via multi-versioning,
ifuncs and similar mechanisms.

* As time passes we may move the baseline requirement forward,
deprecating and eventually dropping support for hardware below the
baseline.

We think that there is only a superficial relationship between the
instructions emitted in a translation unit (TU) and instructions that
might be used at runtime. For example a TU might provide multiple
versions of functions, optimized for different extensions, or
instruction scheduling / microarchitecture, and then could choose the
right one to use at run time (eg. using the ifunc mechanism or
something else). As these ELF attributes are not rich enough to
express this and probably can never be, they are not helpful.

Rich.


Alex Bradbury

Mar 27, 2018, 10:31:06 AM
to Richard W.M. Jones, Kito Cheng, RISC-V SW Dev (sw-dev@groups.riscv.org)
On 27 March 2018 at 14:41, Richard W.M. Jones <rjo...@redhat.com> wrote:
> I had a chat with some other developers here, and I think the general
> consensus is that this is wrong on several levels, although what I'm
> going to say only applies to general purpose 64 bit Linux for servers
> and desktops, not to very specialized embedded uses.
>
> * Try to avoid the mistakes of ARM of having multiple incompatible
> sub-variants of the architecture.
>
> * For Fedora we will adopt a baseline (RV64GC most likely) and
> everything else will be handled at runtime via multi-versioning,
> ifuncs and similar mechanisms.
>
> * As time passes we may move the baseline requirement forward,
> deprecating and eventually dropping support for hardware below the
> baseline.
>
> We think that there is only a superficial relationship between the
> instructions emitted in a translation unit (TU) and instructions that
> might be used at runtime. For example a TU might provide multiple
> versions of functions, optimized for different extensions, or
> instruction scheduling / microarchitecture, and then could choose the
> right one to use at run time (eg. using the ifunc mechanism or
> something else). As these ELF attributes are not rich enough to
> express this and probably can never be, they are not helpful.

I fully agree that your last paragraph identifies the biggest
unaddressed issue in this proposal.

I take your point that ELF attributes may have very limited use for
Linux-capable targets where the baseline target ISA can be safely
guessed, but RISC-V of course addresses a much wider market than that.

Kito: describing ELF attributes as being used to "check the
compatibility" of different objects might be misleading. Although I
might want a 'strict mode' that errors out if linking objects with
different metadata (I know developers of a popular RTOS were
prototyping such a mechanism for other architectures as an early
warning sign for build system problems that might lead to obscure
runtime failures), in general it's not true that e.g. a -march=rv32im
ELF is not compatible with a -march=rv32i ELF.

The problem that this proposal aims to solve, as I see it, is making
RISC-V ELF objects self-describing. This is useful for something as
trivial as objdump. The current GNU objdump will attempt to
disassemble any instruction that matches one described in the current
ISA specs, with no actual knowledge of whether the object was compiled
targeting that ISA extension or not. Non-standard extensions are of
course free to conflict with encoding space used by standard
extensions, meaning this behaviour would have to be overridden. It
would be useful for the disassembler to 'do the right thing' without
additional intervention.

It's worth noting that the Arm BuildAttributes spec supports
attributes having narrower scope than the whole file.

Best,

Alex

Alex Bradbury

Mar 27, 2018, 10:47:41 AM
to Kito Cheng, RISC-V SW Dev (sw-dev@groups.riscv.org)
On 27 March 2018 at 04:10, Kito Cheng <kito....@gmail.com> wrote:
>
> #
> What’s an ELF attributes section?
>
> An ELF attributes section is a special section in an ELF object file to record
> information that a linker or runtime loader needs to check the compatibility. It was originally proposed by ARM[1], and has been widely used in many other targets (e.g. MIPS, PPC, Sparc, s390, TI-C6x and MSP430). Each attribute in this section can be an integer, a string, or a combination of both, according to the specifications below:
>
> - Integer values are encoded in the uleb128[2] format. Although in theory the format can represent unsigned integers of arbitrary length, it usually denotes 32-bit integers in an ELF object file due to implementation limitation in binutils.
> - String values are encoded using NUL-terminated byte strings.
> - There is no limitation on the number of attributes.
>
> [1] http://infocenter.arm.com/help/topic/com.arm.doc.ihi0045e/IHI0045E_ABI_addenda.pdf
> [2] https://en.wikipedia.org/wiki/LEB128
>
> # Why do we need an ELF attributes section?
>
> Currently, the RISC-V instructions used in an object/executable file are unknown to programmers until they incur an exception during execution. There is a need to check the compatibility of object files beforehand.
>
> The existing e_flags in ELF objects, though holding processor-specific flags, consists of only 32 bits and is insufficient to record all information required by a linker. Besides, e_flags defines ABI-level information instead of ISA-level. That is, it doesn’t help to determine an object that contains only soft floating point ABI and has hardware floating point instructions.
>
> Therefore, we propose to use ELF attributes to encode the information a linker or runtime loader needs. ELF attributes are capable of recording lots of information from ISA version to the subset in use. As for implementation, the porting to RISC-V is only needed as all mainstream tools, including LLVM and binutils, have had supported the use of ELF attributes.

For describing the target ISA variant, an alternative to defining all
these tag types could just be to emit a canonical form of the -march
string with version information. The unprivileged spec does describe
how to give version information (section 22.4
https://content.riscv.org/wp-content/uploads/2017/05/riscv-spec-v2.2.pdf),
but neither GCC nor Clang currently supports parsing ISA strings given
in that form. e.g. RV64I2p0M2p0A2p0F2p0D2p0.
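As a rough illustration that the versioned form is easy for a tool to consume, here is a Python sketch that splits such a string into per-extension versions. It assumes uppercase single-letter standard extensions with a lowercase `p` between major and minor version, as in the example above (keeping the cases distinct avoids confusing the separator with an extension letter P), and it ignores G and non-standard (X) extensions. A major version given without a minor version is treated as `<major>.0`. The function name is hypothetical.

```python
from __future__ import annotations
import re

# Sketch only: one standard extension is a letter, optionally followed by <major>[p<minor>].
_EXTENSION = re.compile(r"([A-Z])(?:(\d+)(?:p(\d+))?)?")


def parse_isa_string(isa: str) -> tuple[int, dict[str, tuple[int, int] | None]]:
    """Split e.g. 'RV64I2p0M2p0' into (64, {'I': (2, 0), 'M': (2, 0)}).

    Unversioned extensions map to None; a major version without a minor
    version is treated as <major>.0.
    """
    head = re.fullmatch(r"RV(32|64|128)(.*)", isa)
    if head is None:
        raise ValueError(f"not an RV32/RV64/RV128 ISA string: {isa!r}")
    xlen, rest = int(head.group(1)), head.group(2)
    extensions: dict[str, tuple[int, int] | None] = {}
    pos = 0
    while pos < len(rest):
        m = _EXTENSION.match(rest, pos)
        if m is None:
            raise ValueError(f"cannot parse {rest[pos:]!r}")
        letter, major, minor = m.groups()
        extensions[letter] = None if major is None else (int(major), int(minor or 0))
        pos = m.end()
    return xlen, extensions
```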

Best,

Alex

Alex Bradbury

Mar 27, 2018, 11:02:01 AM
to Richard W.M. Jones, Kito Cheng, RISC-V SW Dev (sw-dev@groups.riscv.org)
I re-read it; really it was my fault for jumping from "compatibility" to
"link compatibility". Still, it might be worth being more explicit
e.g. "compatibility with the target execution environment or software
development tools".

To give another practical use case for embedding this sort of extra
metadata: suppose the vector working group publishes a draft '0.6'
version of the specification and support gets added to Spike. 0.7
might change in backwards-incompatible ways, and it would be very handy
for Spike to complain when it sees a binary targeting a V ISA version
it doesn't recognise.
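The check described here is small once versions are recorded. A Python sketch, assuming the attributes have already been decoded into a mapping from extension letter to (major, minor) version; the function name and the simulator support table are hypothetical and not part of Spike.

```python
from __future__ import annotations


def check_extension_versions(
    required: dict[str, tuple[int, int] | None],
    supported: dict[str, set[tuple[int, int]]],
) -> list[str]:
    """Hypothetical helper: return problems; an empty list means the binary looks runnable.

    `required` is what the binary's attributes say it targets; `supported` is what
    the simulator (or loader) implements. An unversioned requirement accepts any
    implemented version.
    """
    problems = []
    for ext, version in sorted(required.items()):
        if ext not in supported:
            problems.append(f"{ext}: extension not implemented")
        elif version is not None and version not in supported[ext]:
            known = ", ".join(f"{a}.{b}" for a, b in sorted(supported[ext]))
            problems.append(f"{ext} {version[0]}.{version[1]}: unsupported version (known: {known})")
    return problems


# Hypothetical simulator build that only understands the draft V 0.6 spec.
SIMULATOR_SUPPORT = {"I": {(2, 0)}, "M": {(2, 0)}, "V": {(0, 6)}}
```

With this table, a binary whose attributes request V 0.7 yields one diagnostic naming the unsupported version, instead of failing later on an unrecognised instruction.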

Alex

Kito Cheng

Mar 28, 2018, 12:39:51 AM
to Richard W.M. Jones, RISC-V SW Dev (sw-dev@groups.riscv.org)
Hi Richard:

Thanks for your comment; ifunc is something I hadn't considered before.

The goal of ELF attributes is to let an ELF object self-describe the
minimal requirements for its target execution environment. My thought
is that ifunc or any other runtime multi-versioning code is not part
of that minimal requirement, so it can be ignored when building the
attributes.

However, I think this might not be useful for Fedora (or any other Linux
distribution for RISC-V desktops/servers), but it could be useful
for embedded systems, because multiple incompatible sub-variants of
the architecture are to be expected in that area.

> This duplicates information in the ELF header, but to what end?  I'm
> guessing that other strings might be included here (eg. "RV32IMC"),
> but what could those be, and wouldn't those conflict with (eg) the
> Tag_C_ext flag?  Or are the Tag_*_ext flags only meant to indicate
> version number and not whether the extension is needed?

No, it's not duplicate information: from e_flags we can't determine
that an object uses no F/D instructions, even when e_flags indicates
the soft-float ABI, but an attribute can.

>> Tag_priv_spec
> Should a userspace binary care?

The privileged specification defines the CSR numbers, and they may
change between versions.

It might not be useful for a Linux user-space binary, but it would be
useful on embedded systems.

>> Tag_strict_align
> I'm not sure I understand this one.  What is "strict alignment"
> in the sense used here?

The name comes from the corresponding option in RISC-V GCC,
-m[no-]strict-align, which determines whether the compiler may
generate misaligned loads/stores.

>> Tag_stack_align
>
> AIUI those are part of the ABI, so all binaries will require this
> alignment.  It doesn't seem to be worth noting since any binary with
> different alignment could never be run.

It is defined in the ABI, but it can be changed with the
-mpreferred-stack-boundary option in RISC-V GCC. If we hit a strange
misalignment issue, it would be useful when diagnosing the binary.



Kito Cheng

Mar 28, 2018, 12:40:08 AM
to Alex Bradbury, Richard W.M. Jones, RISC-V SW Dev (sw-dev@groups.riscv.org)
Hi Alex:

Yeah, target execution environment and toolchain issues are the main
issues I want to handle with ELF attributes.

> For describing the target ISA variant, an alternative to defining all
> these tag types could just be to emit a canonical form of the -march
> string with version information. The unprivileged spec does describe
> how to give version information (section 22.4
> https://content.riscv.org/wp-content/uploads/2017/05/riscv-spec-v2.2.pdf),
> but neither GCC or Clang currently support parsing ISA strings given
> in that form. e.g. RV64I2p0M2p0A2p0F2p0D2p0.

I had considered that, but it seems hard to read, especially when we
consider the proposal "Expanding RISC-V instruction-set namespace" [1].

[1] https://groups.google.com/a/groups.riscv.org/forum/#!msg/isa-dev/DgZ1sS_OP-U/uQKNXgWZAgAJ


> in general it's not true that e.g. a -march=rv32im
> ELF is not compatible with a -march=rv32i ELF.

I think we can define several merge policies for merging/linking
objects built with different -march values. The default would be a
compatible mode that merges the inputs into a superset of all of
them [2].
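A Python sketch of such merge policies: the default "compatible" mode computes the superset of all inputs, and a "strict" mode errors out on any difference, in the spirit of the strict mode Alex mentioned. Representing each object's attributes as a mapping from extension letter to (major, minor), and rejecting two different versions of the same extension, are assumptions; the thread does not say how version conflicts should be handled. The function name is hypothetical.

```python
from __future__ import annotations


def merge_arch(
    inputs: list[dict[str, tuple[int, int] | None]],
    policy: str = "compatible",
) -> dict[str, tuple[int, int] | None]:
    """Hypothetical link-time merge of per-object ISA attributes.

    "compatible": the result is the superset (union) of all inputs' extensions.
    "strict":     every input must carry identical attributes.
    In both modes, two different versions of the same extension are an error
    (an assumption; the proposal leaves this open).
    """
    if policy not in ("compatible", "strict"):
        raise ValueError(f"unknown merge policy: {policy!r}")
    if policy == "strict" and any(exts != inputs[0] for exts in inputs[1:]):
        raise ValueError("strict policy: input objects have different ISA attributes")
    merged: dict[str, tuple[int, int] | None] = {}
    for exts in inputs:
        for ext, version in exts.items():
            current = merged.get(ext)
            if current is not None and version is not None and current != version:
                raise ValueError(f"conflicting versions of {ext}: {current} vs {version}")
            if current is None:
                merged[ext] = version  # first versioned sighting wins
    return merged
```

Merging an rv32im object with an rv32ic object under the default policy yields I, M and C, which matches the point that such objects are not inherently incompatible.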


> I re-read, really it was my faulty for jumping from "compatibility" to
> "link compatibility". Still, it might be worth being more explicit
> e.g. "compatibility with the target execution environment or software
> development tools".

I agree `compatibility` is not clear enough; thanks for your input :)

> To give another practical use case for embedding this sort of extra
> metadata. Suppose the vector working group publish a draft '0.6'
> version of the specification and support gets added to Spike. 0.7
> might change in backwards incompatible ways and it would be very handy
> for spike to complain when it sees a binary targeting a V ISA version
> it doesn't recognise.

Thanks for giving such an example; that's the problem I want to solve.

Alex Bradbury

Mar 28, 2018, 5:06:39 AM
to Kito Cheng, Richard W.M. Jones, RISC-V SW Dev (sw-dev@groups.riscv.org)
On 28 March 2018 at 05:39, Kito Cheng <kito....@gmail.com> wrote:
> Hi Alex:
>
> Yeah, target execution environment and toolchain issue is the main issue
> I want to handle by ELF attribute.
>
>> For describing the target ISA variant, an alternative to defining all
>> these tag types could just be to emit a canonical form of the -march
>> string with version information. The unprivileged spec does describe
>> how to give version information (section 22.4
>> https://content.riscv.org/wp-content/uploads/2017/05/riscv-spec-v2.2.pdf),
>> but neither GCC or Clang currently support parsing ISA strings given
>> in that form. e.g. RV64I2p0M2p0A2p0F2p0D2p0.
>
> I had consider that, but it seems hard to read, especially when we consider
> the proposal of "Expanding RISC-V instruction-set namespace"[1].
>
> [1]
> https://groups.google.com/a/groups.riscv.org/forum/#!msg/isa-dev/DgZ1sS_OP-U/uQKNXgWZAgAJ

Thanks for reminding me of that proposal. I agree that
RV64I2p0M2p0A2p0F2p0D2p0 isn't pleasant to read. But it is
standardised and trivially extensible. I have just proposed that _
should be allowed as a separator for standard extensions
(https://github.com/riscv/riscv-isa-manual/issues/151) which would
allow RV64_I2p0_M2p0_A2p0_F2p0_D2p0. A tool could always pretty print
this in a nicer way. Given that a tool would need to parse the ISA
string format for non-standard extensions anyway, I'm not really sure
I see the benefit of separating out the standard extensions.
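A rough Python sketch of that pretty-printing step, assuming the underscore-separated form suggested above and requiring an explicit major and minor version for every standard extension; non-standard (X) extensions are not handled. The function is hypothetical, not part of any existing tool.

```python
import re


def pretty_print_isa(isa: str) -> str:
    """Hypothetical helper: render 'RV64_I2p0_M2p0' as 'RV64IM (I 2.0, M 2.0)'."""
    base, *components = isa.split("_")
    letters, versions = [], []
    for comp in components:
        # Assumes each component is one uppercase letter plus <major>p<minor>.
        m = re.fullmatch(r"([A-Z])(\d+)p(\d+)", comp)
        if m is None:
            raise ValueError(f"unrecognised extension component: {comp!r}")
        letter, major, minor = m.groups()
        letters.append(letter)
        versions.append(f"{letter} {int(major)}.{int(minor)}")
    if not versions:
        return base
    return f"{base}{''.join(letters)} ({', '.join(versions)})"
```

Keeping the stored string in the verbose, standardised form and prettifying only on output is the trade-off suggested here: the attribute stays trivially extensible, while tools present something readable.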

Additionally, how do you envision the subarch string for each standard
extension being used?

Best,

Alex

Richard W.M. Jones

Mar 28, 2018, 7:29:39 AM
to Kito Cheng, RISC-V SW Dev (sw-dev@groups.riscv.org)
On Wed, Mar 28, 2018 at 12:39:28PM +0800, Kito Cheng wrote:
> Privileged specification defined the CSR numbers , and it may change
> between different version.

I really hope not, else RISC-V will never be able to make
it in the server space.

> >> Tag_strict_align
> > I'm not sure I understand this one. What is "strict alignment"
> > in the sense used here?
>
> The naming is because here is a corresponding options in risc-v gcc,
> -m[no-]strict-align to determine compiler can generate mis-aligned
> load/store or not.
>
> >> Tag_stack_align
> >
> > AIUI those are part of the ABI, so all binaries will require this
> > alignment. It doesn't seem to be worth noting since any binary with
> > different alignment could never be run.
>
> It defined in ABI, but it could change by -mpreferred-stack-boundary
> option in risc-v gcc, in case we hit an strange mis-align issue, then it
> would be useful during diagnose the binary .

These don't seem useful for the Linux server/desktop use, perhaps for
embedded. I think in general it's a good idea to define an ABI and
stick to it, rather than allowing weird ABI variations that will
doubtless cause endless trouble.

Rich.
