[RFC] Proposal for new ELF extension - "Symbol meta-information"

Jozef Lawrynowicz

Aug 27, 2020, 7:48:12 AM
to Generic System V Application Binary Interface
Hi,

I'd like to propose an ELF extension named "Symbol Meta-Information", which is a new mechanism used to describe additional information about symbols.

I've attached a nicely formatted PDF version of the full proposal. The specific implementation details which would augment the existing ELF spec are also attached.

Alternatively, HTML versions are available here:

Below is also a plain-text version of the documents, although it might not be as pleasant to read as the other versions. 

I look forward to hearing your thoughts,
Jozef Lawrynowicz

-----------------------------

ELF Symbol Meta-Information

Developed by Todd Snider (Texas Instruments)
in consultation with Jozef Lawrynowicz

Written by Jozef Lawrynowicz

August 2020

Table of Contents

1 Introduction
2 Background
  2.1 Motivation
  2.2 Alternative vehicles for symbol meta-information implementation
3 Design
  Abbreviations
  3.1 Symbol Meta-Information Table
  3.2 Symbol Meta-Information Table Entries
  3.3 Symbol Meta-Information Values
    3.3.1 Restrictions on applying symbol meta-information types to symbols
    3.3.2 SMT_NOINIT use case
    3.3.3 SMT_PRINTF_FMT use case
    3.3.4 Considerations for placement of SMT_LOCATION meta-information symbols (locsyms)
    3.3.5 Initialization of locsyms at program startup
4 Using Symbol Meta-Information
  4.1 Usage example
5 Conclusion
  5.1 Symbol meta-information benefits
  5.2 Symbol meta-information as an extension to the ELF gABI


1 Introduction

Here we propose a new mechanism for describing additional 
information about ELF symbols, called Symbol Meta-Information.

Symbol Meta-Information is intended to solve the problem of how 
the compiler or assembler can communicate information about 
symbols, not supported by existing ELF constructs, to downstream 
tools such as the linker and other consumers of ELF files. These 
consumers can then change how they handle the symbols, based on 
the supplementary information.

A new ELF special section named .symtab_meta enumerates which 
symbols have meta-information, the type of meta-information, and 
the associated value of that meta-information.

The use of attributes set on symbol declarations in the source 
code provides the programmer with a simple interface to the new 
functionality.

Symbol meta-information is designed to be extensible, with plenty 
of room for new types of meta-information to be added, and 
flexible, as the value of the meta-information can take on any 
format.

2 Background

2.1 Motivation

The modular nature of toolchain components means that 
communicating information from the source code through the build 
process to downstream tools is not always straightforward. Of 
course, this is partly why formats like ELF exist, but when those 
formats reach the limit of what they can precisely describe, 
programmers search for alternative solutions.

Placing code and data into special named sections is the most 
common method used to make the linker handle specific symbols in 
some non-standard way. A modified linker script with knowledge of 
these special sections can then be used to apply specific 
properties to the sections, such as saving them from garbage 
collection or placing them at specific memory addresses.

However, it can be inconvenient for programmers to modify linker 
scripts:

• Entire applications can be written without consideration for 
  the linker script, its existence perhaps acknowledged by the 
  programmer but otherwise being an opaque part of the build 
  process. The programmer may therefore lack knowledge of the 
  syntax of the linker script, or the ability to leverage the 
  full breadth of functionality available to achieve what they 
  want.

• In the context of embedded microcontrollers, linker scripts 
  provided by semiconductor manufacturers are usually specific to 
  a particular device, describing a unique combination of the 
  memory map, peripheral register addresses, vector table etc.

  – Modifying linker scripts can therefore be bothersome when an 
    application targets different devices, each with a unique 
    linker script, or when linker script updates from 
    semiconductor manufacturers require merging of downstream and 
    upstream changes.

• Linker scripts can have a large amount of boilerplate code, and 
  modifications to this boilerplate, as a side-effect to the 
  handling of any new special sections, can be error-prone.

Another way to supply additional information about a symbol is to 
give the symbol itself a special name. This requires the ELF file 
consumer program to have knowledge of the special name, and may 
not be desirable if it interferes with the way the symbol would 
be handled if it had its original name. Furthermore, since there 
is no opportunity in the gABI for the standardization of special 
names for code and data symbols to have some unique meaning, 
there are likely to be inconsistencies between processor and 
vendor support for any toolchains trying to make use of this 
mechanism.

2.2 Alternative vehicles for symbol meta-information 
  implementation

We acknowledge some existing constructs which could be used to 
supply additional information about ELF symbols, and describe why 
they are unsuitable vehicles for the proposed symbol 
meta-information functionality.

New symbol types or bindings

• If a type of symbol meta-information implied only one existing 
  symbol type or binding attribute, then the meta-information 
  type could be implemented as a new type or binding. However, 
  since the proposed symbol meta-information types support 
  symbols with different types and different bindings, this 
  approach would not work.

• There are only 3 remaining “slots” for generic symbol types and 
  it is desirable to have more than 3 new types of symbol 
  meta-information. There are further reserved ranges for 
  operating system-specific and processor-specific types, but it 
  would not be appropriate to use these for new types which have 
  generic use.

• Fundamentally, symbol meta-information supplies additional 
  information about symbols, and does not change the intrinsic 
  type or binding of a symbol.

st_other member of symbol table entry

• st_other is only 8 bits in size and is used as a bit-mask. Bits 
  0 and 1 are reserved, with an additional proposal currently 
  pending to reserve bit 2 as well. The remaining bits 3-7 have 
  not been officially reserved but are all in use by a variety of 
  targets. Therefore, there are no remaining bits which can be 
  used without creating a conflict with some target or operating 
  system.

• There is no standard way to provide supplemental information 
  which gives a non-boolean value for the st_other field. Further 
  modifications, such as the creation of a special section, would 
  be required to provide non-boolean values to accompany the 
  st_other value.

Solaris SymInfo

• Solaris SymInfo specifically targets dynamic symbols, and the 
  proposed functionality should be available to targets which do 
  not support the concept of dynamic linking. SymInfo “types” are 
  flags that can be augmented by extracting a value from the 
  .dynamic section.

  – The .dynamic section is identified by the sh_info field of 
    the section header, and could arguably be repurposed to point 
    to some other section in cases when there are no dynamic 
    symbols with SymInfo entries. However, this behavior would 
    not be well defined when there is also a .dynamic section in 
    the file.

• The si_flags field, which describes the properties of the 
  associated symbol, is the size of a half-word. On a target 
  implementing 32-bit ELF, this would be 16 bits. Since the flags 
  are implemented as a bit-mask with 10 types already 
  implemented, there only remains space for 6 further types. This 
  is unlikely to be enough room for all current and future 
  meta-information types, especially once factoring in any 
  additional vendor or processor-specific extensions.

New ABI-mandated “Special Sections”

• A new type of ELF “special section” could be created for each 
  of the proposed new types of symbol meta-information. ELF file 
  consumers such as the linker would then handle these sections 
  in a specific way, without assistance from the linker script. 
  However, this has some downsides:

  – The user may not want to put a symbol in its own section 
    just to make use of the desired functionality.

  – A special section for the symbol obscures the fact that the 
    meta-information is for a symbol, not a section.

  – If the sh_info member is used to provide an accompanying 
    value for the meta-information type, then only one value can 
    be specified per section, meaning symbols with the same type 
    might not be able to be grouped together in a section.

  – An application making use of a large amount of new special 
    sections to describe symbol meta-information could pollute 
    the section header table.

3 Design

Abbreviations
metasym
  Any type of meta-information symbol
locsym
  A meta-information symbol with type SMT_LOCATION

3.1 Symbol Meta-Information Table

ELF relocatable and executable files may contain a new section 
named .symtab_meta. This section can be omitted from ELF files if 
there is no meta-information for any symbols, but if present, 
there can only be one section with this name and type.

Table 1: 
Section types, sh_type
+------------------+-------+
|      Name        | Value |
+------------------+-------+
+------------------+-------+
| SHT_SYMTAB_META  |  19   |
+------------------+-------+

Table 2: 
sh_link and sh_info interpretation
+------------------+-------------------------+--------------------------------------------------------------------+
|      Name        |         sh_link         |                              sh_info                               |
+------------------+-------------------------+--------------------------------------------------------------------+
+------------------+-------------------------+--------------------------------------------------------------------+
| SHT_SYMTAB_META  | The section header      | The format version number of the symbol meta-information table     |
|                  | index of the associated | (ELFxx_SMH_VER), and the section header index of the .strtab_meta  |
|                  | symbol table.           | string table used by entries in this section (ELFxx_SMH_STR).      |
+------------------+-------------------------+--------------------------------------------------------------------+

Sub-Table a: 
Accessors for the sh_info field
---
#define ELF32_SMH_STR(i)    ((i)>>8)
#define ELF32_SMH_VER(i)    ((unsigned char)(i))
#define ELF32_SMH_INFO(s,v) (((s)<<8)+(unsigned char)(v))

#define ELF64_SMH_STR(i)    ((i)>>32)
#define ELF64_SMH_VER(i)    ((i)&0xffffffffL)
#define ELF64_SMH_INFO(s,v) (((s)<<32)+((v)&0xffffffffL))


Sub-Table b: 
.symtab_meta versions
+--------+-------------------------------------------------------------------------------+
| Value  |                                    Meaning                                    |
+--------+-------------------------------------------------------------------------------+
+--------+-------------------------------------------------------------------------------+
|   0    |                                Invalid Version                                |
+--------+-------------------------------------------------------------------------------+
|   1    |             There is no header at the beginning of .symtab_meta.              |
+--------+-------------------------------------------------------------------------------+
|   2    | A header containing the hash of .symtab is at the beginning of                |
|        | .symtab_meta.                                                                 |
+--------+-------------------------------------------------------------------------------+
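
As a minimal sketch of how a producer and a consumer might use the 
ELF32 accessors above, the following packs a string table section 
index and a format version into sh_info and checks the version on 
the way back out. The helper names smh_make_info and 
smh_version_is_supported are hypothetical and are not part of the 
proposal.

```c
#include <assert.h>
#include <stdint.h>

/* Accessors copied from Sub-Table a (ELF32 variants). */
#define ELF32_SMH_STR(i)    ((i)>>8)
#define ELF32_SMH_VER(i)    ((unsigned char)(i))
#define ELF32_SMH_INFO(s,v) (((s)<<8)+(unsigned char)(v))

/* Hypothetical helper: build the sh_info value for .symtab_meta from
   the section header index of .strtab_meta and the format version. */
static uint32_t smh_make_info(uint32_t strtab_shndx, unsigned version)
{
    assert(version <= 0xffu);   /* the ELF32 version field is 8 bits wide */
    return ELF32_SMH_INFO(strtab_shndx, version);
}

/* Hypothetical helper: versions 1 and 2 are the only valid ones listed
   in Sub-Table b; 0 is explicitly invalid. */
static int smh_version_is_supported(uint32_t sh_info)
{
    unsigned version = ELF32_SMH_VER(sh_info);
    return version == 1 || version == 2;
}
```

With .strtab_meta at section index 5 and format version 2, 
smh_make_info(5, 2) yields 0x502.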


Table 3: 
Special Sections
+---------------+------------------+------------+
|     Name      |      Type        | Attributes |
+---------------+------------------+------------+
+---------------+------------------+------------+
| .symtab_meta  | SHT_SYMTAB_META  |    None    |
+---------------+------------------+------------+
| .strtab_meta  |   SHT_STRTAB     |    None    |
+---------------+------------------+------------+

Version 2 of the table has a short header, and a list of symbol 
meta-information entries follows.

(
typedef struct {
  unsigned char symtab_hash[20];
} Elf32_SMhdr;

typedef struct {
  unsigned char symtab_hash[20];
} Elf64_SMhdr;
)


symtab_hash
  For version >= 2, a 20-byte SHA-1 hash of the 
  entire contents of .symtab (taken once the symbol table indices 
  have been finalized) is used to verify .symtab has not been 
  modified by tools which do not recognize .symtab_meta. These 
  tools would not update the symbol index stored in the symbol 
  meta-information table entry when making changes to the 
  program, possibly corrupting the state of .symtab_meta.
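
A consumer could use the header roughly as follows. This is only a 
sketch: it assumes the caller has already computed the SHA-1 digest 
of .symtab with whatever hashing routine it has available, and the 
function names smeta_entries_offset and smeta_symtab_unmodified are 
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
  unsigned char symtab_hash[20];
} Elf32_SMhdr;

/* Hypothetical helper: byte offset of the first entry in .symtab_meta.
   Version 1 has no header, version 2 starts with Elf32_SMhdr.
   Returns (size_t)-1 for versions this sketch does not know about. */
static size_t smeta_entries_offset(unsigned version)
{
    switch (version) {
    case 1:  return 0;
    case 2:  return sizeof(Elf32_SMhdr);
    default: return (size_t)-1;
    }
}

/* Hypothetical helper: compare the stored hash against a digest of
   .symtab computed by the caller. A mismatch suggests .symtab was
   changed by a tool that did not update .symtab_meta. */
static int smeta_symtab_unmodified(const Elf32_SMhdr *hdr,
                                   const unsigned char digest[20])
{
    assert(hdr != NULL && digest != NULL);
    return memcmp(hdr->symtab_hash, digest, sizeof hdr->symtab_hash) == 0;
}
```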

----
3.2 Symbol Meta-Information Table Entries
----

Symbol meta-information table entries describe the symbol that 
the meta-information applies to, the type of meta-information, 
and the associated value of the meta-information.

The format of symbol meta-information table entries is physically 
identical to ELF Rel relocation entries. The smi_info field 
encodes the symbol table index of the corresponding symbol and 
the type of meta-information in the same way that the symbol 
table index and type of a relocation are encoded in the r_info 
field of relocation entries.

Figure 1: 
Structure of a .symtab_meta entry
(
typedef struct {
  Elf32_Addr smi_info;
  Elf32_Word smi_value;
} Elf32_SymMetaInfo;

typedef struct {
  Elf64_Addr  smi_info;
  Elf64_Xword smi_value;
} Elf64_SymMetaInfo;
)

smi_info
  This field describes both the symbol table index of 
  the ELF symbol this meta-information applies to, 
  and the type of meta-information entry this is. A number of 
  generic types are pre-defined. There are also reserved ranges 
  for processor-specific and application-specific (i.e. 
  vendor-specific) types.

Figure 2: 
Accessors for the smi_info field
(
#define ELF32_SMI_SYM(i)    ((i)>>8)
#define ELF32_SMI_TYPE(i)   ((unsigned char)(i))
#define ELF32_SMI_INFO(s,t) (((s)<<8)+(unsigned char)(t))

#define ELF64_SMI_SYM(i)    ((i)>>32)
#define ELF64_SMI_TYPE(i)   ((i)&0xffffffffL)
#define ELF64_SMI_INFO(s,t) (((s)<<32)+((t)&0xffffffffL))
)

smi_value
  The interpretation depends on the associated type. 
  The value could be interpreted as a boolean, symbol table 
  index, address, string table index etc.
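
To illustrate the encoding, here is a small sketch that builds an 
ELF32 entry and decodes it again. The typedefs stand in for the usual 
ELF32 base types, the numeric type values are taken from the 
"Symbol Meta-Information Types" table, and smi_make is a hypothetical 
helper name.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t Elf32_Addr;   /* stand-ins for the usual ELF32 types */
typedef uint32_t Elf32_Word;

typedef struct {
  Elf32_Addr smi_info;
  Elf32_Word smi_value;
} Elf32_SymMetaInfo;

#define ELF32_SMI_SYM(i)    ((i)>>8)
#define ELF32_SMI_TYPE(i)   ((unsigned char)(i))
#define ELF32_SMI_INFO(s,t) (((s)<<8)+(unsigned char)(t))

#define SMT_RETAIN   1
#define SMT_LOCATION 2

/* Hypothetical helper: build one entry for the symbol at index symidx. */
static Elf32_SymMetaInfo smi_make(uint32_t symidx, unsigned type,
                                  uint32_t value)
{
    Elf32_SymMetaInfo entry;
    /* In ELF32 the index gets 24 bits and the type gets 8 bits. */
    assert(symidx <= 0xffffffu && type <= 0xffu);
    entry.smi_info = ELF32_SMI_INFO(symidx, type);
    entry.smi_value = value;
    return entry;
}
```

For the symbol at index 7, smi_make(7, SMT_LOCATION, 0x1000) produces 
an smi_info of 0x702 and an smi_value of 0x1000.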


Figure 3: 
Symbol Meta-Information Types
+--------+-----------------+--------------------+
| Value  |      Type       |  Format of Value   |
+--------+-----------------+--------------------+
+--------+-----------------+--------------------+
|   0    |    SMT_NONE     |        None        |
+--------+-----------------+--------------------+
|   1    |   SMT_RETAIN    |      Boolean       |
+--------+-----------------+--------------------+
|   2    |  SMT_LOCATION   |      Address       |
+--------+-----------------+--------------------+
|   3    |   SMT_NOINIT    |      Boolean       |
+--------+-----------------+--------------------+
|   4    | SMT_PRINTF_FMT  |      Integer       |
+--------+-----------------+--------------------+
| 0xC0   |   SMT_LOPROC    |                    |
+--------+-----------------+ Processor-specific |
| 0xDF   |   SMT_HIPROC    |                    |
+--------+-----------------+--------------------+
| 0xE0   |   SMT_LOUSER    |                    |
+--------+-----------------+ Vendor-specific    |
| 0xFF   |   SMT_HIUSER    |                    |
+--------+-----------------+--------------------+


SMT_NONE
  This indicates an invalid or incomplete entry.

SMT_RETAIN
  A value of 1 indicates the associated symbol should 
  be retained in the output executable file, even if it appears 
  unused and so the linker would normally garbage collect it. 
  Other values result in the type being ignored.

SMT_LOCATION
  The VMA of the associated symbol in the output 
  executable file should be set to the specified value.

SMT_NOINIT
  A value of 1 indicates the associated data symbol 
  should not be initialized by the runtime support code at 
  program startup. Other values result in the type being ignored.

SMT_PRINTF_FMT
  The value indicates a byte offset into the 
  .strtab_meta section. The section header table index of 
  .strtab_meta is extracted from the sh_info value of 
  .symtab_meta, using the ELFxx_SMH_STR accessor.
  The null-terminated string extracted from the string table is a 
  de-duplicated list of format specifiers used by calls to 
  printf-like functions, in the function whose symbol is pointed 
  to by this entry.
  For example, the following C code:
    printf ("%d / %d = %f\n", ...);
  would generate the following string in .strtab_meta:
    "%d%f".

SMT_LOPROC..SMT_HIPROC
  Values in this range are reserved for 
  processor-specific semantics.

SMT_LOUSER..SMT_HIUSER
  Values in this range are reserved for 
  vendor-specific semantics.
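
The SMT_PRINTF_FMT lookup described above amounts to an ordinary 
string table access. A defensive version might look like the sketch 
below, where strtab_meta_get is a hypothetical name and the caller is 
assumed to have already located the contents and size of .strtab_meta 
through ELFxx_SMH_STR.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the NUL-terminated string stored at byte
   offset `offset` of a string table of `size` bytes, or NULL if the
   offset is out of range or the string is not terminated inside the
   table. */
static const char *strtab_meta_get(const char *strtab, size_t size,
                                   size_t offset)
{
    assert(strtab != NULL);
    if (offset >= size)
        return NULL;
    if (memchr(strtab + offset, '\0', size - offset) == NULL)
        return NULL;
    return strtab + offset;
}
```

The bounds checks matter because smi_value comes from the input file 
and cannot be trusted to point inside the section.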

----
3.3 Symbol Meta-Information Values
----

3.3.1 Restrictions on applying symbol meta-information types to 
  symbols

Symbol meta-information entries are always tied to a symbol in 
the symbol table, so there are no special rules regarding 
different symbols with the same name; the standard symbol binding 
rules apply.

No two entries in .symtab_meta can have the same smi_info value - 
each symbol must only have one value for a given meta-information 
type.
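
A consumer could enforce the uniqueness rule with a check along these 
lines. It is a sketch only: smeta_find_duplicate is a hypothetical 
name, and the quadratic scan is chosen for brevity rather than speed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
  uint32_t smi_info;    /* Elf32_Addr in the proposal */
  uint32_t smi_value;   /* Elf32_Word in the proposal */
} Elf32_SymMetaInfo;

/* Hypothetical helper: return the index of the first entry whose
   smi_info repeats that of an earlier entry, or n if every smi_info
   value is unique. */
static size_t smeta_find_duplicate(const Elf32_SymMetaInfo *entries,
                                   size_t n)
{
    assert(entries != NULL || n == 0);
    for (size_t i = 1; i < n; i++)
        for (size_t j = 0; j < i; j++)
            if (entries[i].smi_info == entries[j].smi_info)
                return i;
    return n;
}
```

Two entries for the same symbol with different types (for example 
0x701 and 0x702) are accepted, because their smi_info values differ.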

Figure 4: 
Symbol bindings and types permitted for metasyms
+-------------------------------+---------------------------+--------------------------------------+
| Symbol Meta-Information Type  | Permitted Symbol Binding  |        Permitted Symbol Type         |
+-------------------------------+---------------------------+--------------------------------------+
+-------------------------------+---------------------------+--------------------------------------+
|          SMT_RETAIN           |      Any < STB_LOOS       |      STT_FUNC                        |
|                               |                           |      or STT_OBJECT                   |
|                               |                           |      or STT_COMMON                   |
+-------------------------------+---------------------------+--------------------------------------+
|         SMT_LOCATION          |      Any < STB_LOOS       |      STT_FUNC                        |
|                               |                           |      or STT_OBJECT                   |
|                               |                           |      or STT_COMMON                   |
+-------------------------------+---------------------------+--------------------------------------+
|          SMT_NOINIT           |      Any < STB_LOOS       |       STT_OBJECT                     |
|                               |                           |      or STT_COMMON                   |
+-------------------------------+---------------------------+--------------------------------------+
|        SMT_PRINTF_FMT         |      Any < STB_LOOS       |       STT_FUNC                       |
+-------------------------------+---------------------------+--------------------------------------+
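
The table above can be expressed as a small validation routine. The 
STB_ and STT_ values and the ELF32_ST_ accessors are the existing 
gABI definitions; smt_permitted is a hypothetical name, and rejecting 
types that the table does not list is an assumption of this sketch, 
not something the proposal specifies.

```c
#include <assert.h>

/* Existing gABI symbol table definitions. */
#define STB_LOOS    10
#define STT_OBJECT  1
#define STT_FUNC    2
#define STT_COMMON  5
#define ELF32_ST_BIND(i) ((i)>>4)
#define ELF32_ST_TYPE(i) ((i)&0xf)

/* Values from the "Symbol Meta-Information Types" table. */
#define SMT_RETAIN     1
#define SMT_LOCATION   2
#define SMT_NOINIT     3
#define SMT_PRINTF_FMT 4

/* Hypothetical helper: return 1 if meta-information type `smt` may be
   applied to a symbol with the given st_info, 0 otherwise. */
static int smt_permitted(unsigned smt, unsigned char st_info)
{
    unsigned bind = ELF32_ST_BIND(st_info);
    unsigned type = ELF32_ST_TYPE(st_info);

    assert(smt <= 0xffu);        /* types are 8 bits wide in ELF32 */
    if (bind >= STB_LOOS)
        return 0;
    switch (smt) {
    case SMT_RETAIN:
    case SMT_LOCATION:
        return type == STT_FUNC || type == STT_OBJECT || type == STT_COMMON;
    case SMT_NOINIT:
        return type == STT_OBJECT || type == STT_COMMON;
    case SMT_PRINTF_FMT:
        return type == STT_FUNC;
    default:
        return 0;  /* assumption: types not listed in the table are rejected */
    }
}
```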

3.3.2 SMT_NOINIT use case

When a piece of data is not initialized to a constant value, but 
does not need to be zero-initialized, SMT_NOINIT indicates that 
it can be skipped by runtime startup code that would normally 
initialize it, to save time when starting the program.

Alternatively, when a piece of data is initialized to a constant 
value when the program is loaded, but should not be 
re-initialized when the processor resets, SMT_NOINIT can also be 
applied.

3.3.3 SMT_PRINTF_FMT use case

When the size of an application is a concern to the programmer, 
limiting the format specifiers supported by printf-like functions 
can reduce the code and data usage of these functions in the 
application.

By storing the required format specifiers in the symbol 
meta-information table, the linker can examine each of the 
SMT_PRINTF_FMT entries for functions that will be used in the 
final linked executable, and link in the minimal implementation 
of the printf function required to support all the format 
specifiers used by the application.
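
One way the specifier list could be built and merged is sketched 
below. The proposal does not say how flags, widths, or length 
modifiers are recorded, so this sketch assumes that only the 
conversion character is kept; fmt_collect and spec_present are 
hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if conversion `conv` is already recorded in `out`, which
   holds a sequence of two-character "%c" pairs. */
static int spec_present(const char *out, char conv)
{
    for (; out[0] != '\0'; out += 2)
        if (out[1] == conv)
            return 1;
    return 0;
}

/* Hypothetical helper: append every conversion used by `fmt` that is
   not yet in `out` (a NUL-terminated buffer of `cap` bytes).
   Returns 0 on success, -1 if `out` is too small. */
static int fmt_collect(const char *fmt, char *out, size_t cap)
{
    assert(fmt != NULL && out != NULL && cap > 0);
    for (const char *p = fmt; *p != '\0'; p++) {
        if (*p != '%')
            continue;
        p++;
        p += strspn(p, "-+ #0");              /* flags */
        p += strspn(p, "0123456789*");        /* field width */
        if (*p == '.') {                      /* precision */
            p++;
            p += strspn(p, "0123456789*");
        }
        p += strspn(p, "hlLjzt");             /* length modifiers */
        if (*p == '\0')
            break;                            /* truncated specifier */
        if (*p == '%')
            continue;                         /* "%%" is a literal percent */
        if (!spec_present(out, *p)) {
            size_t len = strlen(out);
            if (len + 3 > cap)
                return -1;
            out[len] = '%';
            out[len + 1] = *p;
            out[len + 2] = '\0';
        }
    }
    return 0;
}
```

Because a stored list such as "%d%f" is itself a valid format string, 
a linker could call the same function once per SMT_PRINTF_FMT entry 
with a shared output buffer to obtain the union of all specifiers 
used by the application.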

3.3.4 Considerations for placement of SMT_LOCATION 
  meta-information symbols (locsyms)

Locsyms are intended to augment a well-defined linker script. The 
linker validates the address provided for the locsym by examining 
the permissions of the segment (p_flags) which contains the 
specified VMA. For example, the linker must ensure that a locsym 
for a read/write symbol with type STT_OBJECT is not placed in a 
segment without write (PF_W) permissions, and emit an error if 
the segment containing the address is invalid.
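
The permission check could be sketched as follows. The PF_ values are 
the existing gABI program header flags; the Segment structure is a 
simplified stand-in for a program header, and the function names are 
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Existing gABI segment permission flags. */
#define PF_X 0x1
#define PF_W 0x2
#define PF_R 0x4

/* Simplified stand-in for a program header: only the fields needed. */
typedef struct {
    uint32_t p_vaddr;
    uint32_t p_memsz;
    uint32_t p_flags;
} Segment;

/* Return the segment that fully contains [addr, addr + size), or NULL. */
static const Segment *segment_containing(const Segment *segs, size_t n,
                                         uint32_t addr, uint32_t size)
{
    assert(segs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        uint64_t start = segs[i].p_vaddr;
        uint64_t end = start + segs[i].p_memsz;
        if (addr >= start && (uint64_t)addr + size <= end)
            return &segs[i];
    }
    return NULL;
}

/* Hypothetical helper: a locsym address is acceptable if some segment
   contains the whole symbol and grants every permission in need_flags
   (for example PF_W for a read/write STT_OBJECT symbol). */
static int locsym_address_ok(const Segment *segs, size_t n, uint32_t addr,
                             uint32_t size, uint32_t need_flags)
{
    const Segment *seg = segment_containing(segs, n, addr, size);
    return seg != NULL && (seg->p_flags & need_flags) == need_flags;
}
```

A linker would emit an error when this check fails rather than 
silently placing the symbol.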

The linker may need to place the input section of a locsym within 
an output section, within which it would not normally be placed. 
For example, consider an application with a large .text output 
section, which spans most of ROM. If a locsym corresponding to a 
piece of read-only data has an address within range of that .text 
section, and there is no way to offset the .text section within 
ROM such that the read-only data can be placed directly at the 
location, that read-only data can be placed amongst the .text 
input sections at the requested address. As long as the output 
section flags are not changed by adding the new input section, 
there should not be any problems mixing sections in this way.

3.3.5 Initialization of locsyms at program startup

Data which requires initialization at program startup (e.g. 
copying data from their LMA to VMA) has long been handled by the 
associated runtime library. When all data requiring 
initialization is within a range of addresses defined by known 
__*_start and __*_end symbols, only a fixed number of 
target-dependent initialization functions need to be run. 
However, when code and data can reside alone at disparate 
locations in memory, there must be a mechanism to initialize each 
of these as required. The procedure for initializing this data is 
not enforced by this ABI. It is expected that an entry in 
.init_array is created for a function which will run through 
entries in a table describing how to copy data or initialize 
variables as required.

Note that this functionality can be leveraged to easily allow 
functions to be executed from a memory region without persistent 
storage e.g. RAM. When the linker sees that the segment 
containing the VMA of the function has a different LMA and VMA, a 
copy table entry is created, and the runtime startup code will 
copy the contents of this function from the LMA to VMA, in the 
same way it would with a piece of data.
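
Since the procedure is not enforced by this ABI, the following is 
only one possible shape for such a copy table. The CopyEntry layout 
and the run_copy_table name are assumptions made for illustration; a 
real runtime would obtain the table bounds from linker-provided 
symbols.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical copy table entry: one per locsym whose LMA and VMA
   differ. */
typedef struct {
    const void *lma;   /* where the initial contents are stored */
    void       *vma;   /* where the code or data must live at run time */
    size_t      size;  /* number of bytes to copy */
} CopyEntry;

/* Hypothetical startup routine, intended to be registered in
   .init_array: copy each described object from its LMA to its VMA. */
static void run_copy_table(const CopyEntry *table, size_t n)
{
    assert(table != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        memcpy(table[i].vma, table[i].lma, table[i].size);
}
```

The same loop handles functions and data, which matches the 
observation that a function placed in RAM is copied in the same way 
as a piece of data.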

4 Using Symbol Meta-Information

4.1 Usage example

The programmer does not need to be aware of the symbol 
meta-information mechanism itself to make use of the different 
types and apply special handling to symbols. An attribute set in 
the source code causes the compiler to emit an assembler 
directive describing the meta-information. The assembler then 
creates the .symtab_meta section. The linker reads this section, 
performs any required actions, and outputs a new .symtab_meta 
section containing the accumulated metasyms from all input object 
files.

Figure 5: 
Example
Compiler source code:
[
uint16_t __attribute__((retain,location(0x1000)))
  core0_key = 0x1234;
]

Compiled assembly code:
[
        .global core0_key
        .type   core0_key, @object
        .sym_meta_info  core0_key, SMT_RETAIN, 1
        .sym_meta_info  core0_key, SMT_LOCATION, 0x1000
]

.symtab_meta dump from assembled object file:
[
SYMBOL META-INFORMATION TABLE:
Idx     Kind            Value       Sym idx Name
0:      SMT_RETAIN      0x1         7       core0_key
1:      SMT_LOCATION    0x1000      7       core0_key
]


5 Conclusion

5.1 Symbol meta-information benefits

Ease of use
  The application of an attribute to a symbol 
  declaration in the source code is now enough to achieve what 
  previously required both source code and linker script 
  modifications. For programmers without strong knowledge of 
  linker script functionality, there is an even clearer benefit 
  as functionality which may have previously seemed overwhelming 
  to implement is now possible without leaving the source code. 
  Many toolchains supporting ELF are very powerful, and in the 
  hands of an experienced user, behavior supported by symbol 
  meta-information can already be achieved. In this case, symbol 
  meta-information will at least reduce the number of steps the 
  programmer must take to implement the desired behavior.

Record of operations
  In relocatable files, the symbol 
  meta-information table serves as a list of transformations to 
  be made later in the build process. In executable files, the 
  table shows which transformations have been made. With the 
  assistance of a dump program which has understanding of the 
  format of .symtab_meta, a formatted dump of the table makes it 
  clear which symbols have supplemental information.
  When linker script modifications are used to alter the handling 
  of certain symbols, that file has to be studied by the 
  programmer, possibly in conjunction with the source code, to 
  understand what special handling is going to be applied. The 
  standard boilerplate linker script code required for regular 
  operation is likely to further obscure which symbols have 
  supplemental information.

Clear, defined purpose
  Each symbol meta-information type has a 
  specific purpose. When putting symbols into sections with the 
  aim of having them later be treated in some special way by the 
  linker script, it may not always be clear what is trying to be 
  achieved without examining the relationship between the section 
  and symbol at different stages of the build process.

No limitations
  A type of symbol meta-information can be 
  implemented such that its value describes an offset into the 
  string table, or the section number of a section containing 
  additional information. Therefore, since the true value is not 
  limited to the size of the value in the symbol meta-information 
  table itself, there are many possibilities for what can be 
  accomplished using the meta-information.

5.2 Symbol meta-information as an extension to the ELF gABI

As for why this functionality should be added to the generic ABI, 
and not a processor-specific or vendor-specific ABI, we see this 
functionality helping other targets and vendors solve problems 
previously requiring non-standard and inventive solutions.

Initial versions of this functionality are already implemented 
for the MSP430 target within the MSP430-GCC fork, and for TI ARM 
targets in Texas Instruments’ Clang/LLVM fork. By making this 
available in the gABI and introducing the changes to the upstream 
mainline branches, other targets and vendors can leverage the 
generic functionality immediately. The overall meta-information 
mechanism can then be extended in generic, processor-specific, or 
vendor-specific ways, as required, to further improve the 
toolchain's feature-set.


==================================================================


ELF Symbol Meta-Information Implementation Details

August 2020

This document describes the precise changes to be made to the ELF 
gABI to implement Symbol Meta-Information.



4 Object Files

====
Sections
====


-------------------------------------------


Table 1: 
Section types, sh_type
+------------------+-------+
|      Name        | Value |
+------------------+-------+
+------------------+-------+
| SHT_SYMTAB_META  |  19   |
+------------------+-------+

-------------------------------------------

SHT_SYMTAB_META
  This section contains the symbol 
  meta-information entries for the file. The section might begin 
  with a header, which contains some supplemental information.


Figure 1: 
.symtab_meta Header
(
typedef struct {
  unsigned char symtab_hash[20];
} Elf32_SMhdr;

typedef struct {
  unsigned char symtab_hash[20];
} Elf64_SMhdr;
)


symtab_hash
  For .symtab_meta format version >= 2, a 20-byte 
  SHA-1 hash of the entire contents of .symtab.

-------------------------------------------

Table 2: 
sh_link and sh_info interpretation
+------------------+-------------------------+--------------------------------------------------------------------+
|      Name        |         sh_link         |                              sh_info                               |
+------------------+-------------------------+--------------------------------------------------------------------+
+------------------+-------------------------+--------------------------------------------------------------------+
| SHT_SYMTAB_META  | The section header      | The format version number of the symbol meta-information table     |
|                  | index of the associated | (ELFxx_SMH_VER), and the section header index of the .strtab_meta  |
|                  | symbol table.           | string table used by entries in this section (ELFxx_SMH_STR).      |
+------------------+-------------------------+--------------------------------------------------------------------+


Sub-Table a: 
Accessors for the sh_info field
---
#define ELF32_SMH_STR(i)    ((i)>>8)
#define ELF32_SMH_VER(i)    ((unsigned char)(i))
#define ELF32_SMH_INFO(s,v) (((s)<<8)+(unsigned char)(v))

#define ELF64_SMH_STR(i)    ((i)>>32)
#define ELF64_SMH_VER(i)    ((i)&0xffffffffL)
#define ELF64_SMH_INFO(s,v) (((s)<<32)+((v)&0xffffffffL))


Sub-Table b: 
.symtab_meta versions
+--------+-------------------------------------------------------------------------------+
| Value  |                                    Meaning                                    |
+--------+-------------------------------------------------------------------------------+
+--------+-------------------------------------------------------------------------------+
|   0    |                                Invalid Version                                |
+--------+-------------------------------------------------------------------------------+
|   1    |             There is no header at the beginning of .symtab_meta.              |
+--------+-------------------------------------------------------------------------------+
|   2    | A header containing the hash of .symtab is at the beginning of                |
|        | .symtab_meta.                                                                 |
+--------+-------------------------------------------------------------------------------+

-------------------------------------------


====
Special Sections
====

-------------------------------------------


Table 3: 
Special Sections
+---------------+------------------+------------+
|     Name      |      Type        | Attributes |
+---------------+------------------+------------+
+---------------+------------------+------------+
| .symtab_meta  | SHT_SYMTAB_META  |    None    |
+---------------+------------------+------------+
| .strtab_meta  |   SHT_STRTAB     |    None    |
+---------------+------------------+------------+

-------------------------------------------


.symtab_meta
  This section holds additional “meta-information” 
  about symbols in .symtab. The different types of 
  meta-information are described in “Symbol Meta-Information”.

.strtab_meta
  If required, this section holds strings used as a 
  value to certain types of symbol meta-information. It can be 
  omitted if no symbol meta-information types require it.


-------------

  Symbol Meta-Information

[ Note: This is a new subsection, intended to be placed at the end 
of the “Symbol Table” section, after the “Symbol Values” 
subsection. ]

ELF relocatable and executable files may contain a new section 
named .symtab_meta. This section describes additional information 
about symbols in .symtab. The section can be omitted from ELF 
files if there is no meta-information for any symbols, but if 
present, there can only be one section with this name and type.

  Symbol Meta-Information Table Entries

If .symtab_meta begins with a header (version 2), the array of 
symbol meta-information entries follows it; otherwise the array 
starts at the beginning of the section.


-------------------------------------------

(
typedef struct {
  Elf32_Addr smi_info;
  Elf32_Word smi_value;
} Elf32_SymMetaInfo;

typedef struct {
  Elf64_Addr  smi_info;
  Elf64_Xword smi_value;
} Elf64_SymMetaInfo;
)

-------------------------------------------

smi_info
  This field describes both the symbol table index of 
  the ELF symbol that this meta-information entry applies to, 
  and the type of meta-information entry this is. A number of 
  generic types are pre-defined. There are also reserved ranges 
  for processor-specific and application-specific (i.e. 
  vendor-specific) types.

-------------------------------------------

(
#define ELF32_SMI_SYM(i)    ((i)>>8)
#define ELF32_SMI_TYPE(i)   ((unsigned char)(i))
#define ELF32_SMI_INFO(s,t) (((s)<<8)+(unsigned char)(t))

#define ELF64_SMI_SYM(i)    ((i)>>32)
#define ELF64_SMI_TYPE(i)   ((i)&0xffffffffL)
#define ELF64_SMI_INFO(s,t) (((s)<<32)+((t)&0xffffffffL))
)

-------------------------------------------

smi_value
  The interpretation depends on the associated type. 
  The value could be interpreted as a boolean, symbol table 
  index, address, string table index, etc.
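A minimal sketch of how the ELF32 accessors might be used to build and decode an entry. The helper make_location_entry and the local typedefs are assumptions for illustration; a real toolchain would take these definitions from its ELF headers.

```c
#include <stdint.h>

/* Accessors copied from the proposal (32-bit ELF flavour). */
#define ELF32_SMI_SYM(i)    ((i)>>8)
#define ELF32_SMI_TYPE(i)   ((unsigned char)(i))
#define ELF32_SMI_INFO(s,t) (((s)<<8)+(unsigned char)(t))

/* Local stand-ins for the ELF typedefs so the sketch compiles
   without <elf.h>. */
typedef uint32_t Elf32_Addr;
typedef uint32_t Elf32_Word;

typedef struct {
  Elf32_Addr smi_info;
  Elf32_Word smi_value;
} Elf32_SymMetaInfo;

#define SMT_LOCATION 2  /* value 2 in the meta-information types table */

/* Hypothetical helper: build an SMT_LOCATION entry for the symbol at
   symbol table index symidx, requesting address vma. */
static Elf32_SymMetaInfo make_location_entry(uint32_t symidx, uint32_t vma)
{
  Elf32_SymMetaInfo e;
  e.smi_info  = ELF32_SMI_INFO(symidx, SMT_LOCATION);
  e.smi_value = vma;
  return e;
}
```

For symbol index 7 and address 0x1000, smi_info is 0x702: the index sits in the upper 24 bits and the type in the low byte.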

-------------------------------------------

Figure 5: 
Symbol Meta-Information Types
+--------+-----------------+--------------------+
| Value  |      Type       |  Format of Value   |
+--------+-----------------+--------------------+
+--------+-----------------+--------------------+
|   0    |    SMT_NONE     |        None        |
+--------+-----------------+--------------------+
|   1    |   SMT_RETAIN    |      Boolean       |
+--------+-----------------+--------------------+
|   2    |  SMT_LOCATION   |      Address       |
+--------+-----------------+--------------------+
|   3    |   SMT_NOINIT    |      Boolean       |
+--------+-----------------+--------------------+
|   4    | SMT_PRINTF_FMT  |      Integer       |
+--------+-----------------+--------------------+
| 0xC0   |   SMT_LOPROC    |                    |
+--------+-----------------+ Processor-specific |
| 0xDF   |   SMT_HIPROC    |                    |
+--------+-----------------+--------------------+
| 0xE0   |   SMT_LOUSER    |                    |
+--------+-----------------+ Vendor-specific    |
| 0xFF   |   SMT_HIUSER    |                    |
+--------+-----------------+--------------------+

-------------------------------------------

SMT_NONE
  This indicates an invalid or incomplete entry.

SMT_RETAIN
  A value of 1 indicates the associated symbol should 
  be retained in the output executable file, even if it appears 
  unused and the linker would normally garbage collect it. 
  Other values result in the type being ignored.

SMT_LOCATION
  The VMA of the associated symbol in the output 
  executable file should be set to the specified value.

SMT_NOINIT
  A value of 1 indicates the associated data symbol 
  should not be initialized by the runtime support code at 
  program startup. Other values result in the type being ignored.

SMT_PRINTF_FMT
  The value indicates a byte offset into the 
  .strtab_meta section. The section header table index of 
  .strtab_meta is extracted from the sh_info value of 
  .symtab_meta, using the ELFxx_SMH_STR accessor.
  The null-terminated string extracted from the string table is a 
  de-duplicated list of format specifiers used by calls to 
  printf-like functions, in the function whose symbol is pointed 
  to by this entry.
  For example, the following C code:
    printf ("%d / %d = %f\n", ...);
  would generate the following string in .strtab_meta:
    "%d%f".

SMT_LOPROC..SMT_HIPROC
  Values in this range are reserved for 
  processor-specific semantics.

SMT_LOUSER..SMT_HIUSER
  Values in this range are reserved for 
  vendor-specific semantics.
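The type ranges and the "value of 1" rule for the boolean types can be sketched as follows. The SMT_* values come from the table above. The smt_class enumeration and both function names are hypothetical, as is the choice to treat values outside the listed ranges as unassigned.

```c
/* Type values from the Symbol Meta-Information Types table. */
enum {
  SMT_NONE = 0, SMT_RETAIN = 1, SMT_LOCATION = 2,
  SMT_NOINIT = 3, SMT_PRINTF_FMT = 4,
  SMT_LOPROC = 0xC0, SMT_HIPROC = 0xDF,
  SMT_LOUSER = 0xE0, SMT_HIUSER = 0xFF
};

/* Hypothetical classification of a type value. */
typedef enum {
  SMT_CLASS_INVALID,    /* SMT_NONE */
  SMT_CLASS_GENERIC,    /* SMT_RETAIN .. SMT_PRINTF_FMT */
  SMT_CLASS_PROC,       /* SMT_LOPROC .. SMT_HIPROC */
  SMT_CLASS_USER,       /* SMT_LOUSER .. SMT_HIUSER */
  SMT_CLASS_UNASSIGNED  /* not given a meaning by the table */
} smt_class;

static smt_class smt_classify(unsigned type)
{
  if (type == SMT_NONE)
    return SMT_CLASS_INVALID;
  if (type <= SMT_PRINTF_FMT)
    return SMT_CLASS_GENERIC;
  if (type >= SMT_LOPROC && type <= SMT_HIPROC)
    return SMT_CLASS_PROC;
  if (type >= SMT_LOUSER && type <= SMT_HIUSER)
    return SMT_CLASS_USER;
  return SMT_CLASS_UNASSIGNED;
}

/* SMT_RETAIN and SMT_NOINIT only take effect when the value is
   exactly 1; any other value means the entry is ignored. */
static int smt_boolean_applies(unsigned type, unsigned long value)
{
  return (type == SMT_RETAIN || type == SMT_NOINIT) && value == 1;
}
```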

====
Restrictions on applying symbol meta-information types to 
  symbols
====

Symbol meta-information entries are always tied to a symbol in 
the symbol table, so there are no special rules regarding 
different symbols with the same name; the standard symbol binding 
rules apply.

No two entries in .symtab_meta can have the same smi_info value - 
each symbol must only have one value for a given meta-information 
type.
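A consumer could check the uniqueness rule with a scan like the one below. This is only a sketch: the struct is a local stand-in for Elf32_SymMetaInfo, the function name is hypothetical, and a real linker would likely sort or hash the entries instead of using a quadratic loop.

```c
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for Elf32_SymMetaInfo so the sketch compiles
   without <elf.h>. */
typedef struct {
  uint32_t smi_info;
  uint32_t smi_value;
} SymMetaInfo32;

/* Return the index of the first entry whose smi_info repeats an
   earlier entry, or -1 if every smi_info value is unique. */
static long find_duplicate_smi_info(const SymMetaInfo32 *tab, size_t n)
{
  for (size_t i = 1; i < n; i++)
    for (size_t j = 0; j < i; j++)
      if (tab[i].smi_info == tab[j].smi_info)
        return (long)i;
  return -1;
}
```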

Figure 6: 
Symbol bindings and types permitted for metasyms
+-------------------------------+---------------------------+--------------------------------------+
| Symbol Meta-Information Type  | Permitted Symbol Binding  |        Permitted Symbol Type         |
+-------------------------------+---------------------------+--------------------------------------+
+-------------------------------+---------------------------+--------------------------------------+
|          SMT_RETAIN           |      Any < STB_LOOS       |      STT_FUNC                        |
|                               |                           |      | STT_OBJECT                    |
|                               |                           |      | STT_COMMON                    |
+-------------------------------+---------------------------+--------------------------------------+
|         SMT_LOCATION          |      Any < STB_LOOS       |      STT_FUNC                        |
|                               |                           |      | STT_OBJECT                    |
|                               |                           |      | STT_COMMON                    |
+-------------------------------+---------------------------+--------------------------------------+
|          SMT_NOINIT           |      Any < STB_LOOS       |       STT_OBJECT                     |
|                               |                           |      | STT_COMMON                    |
+-------------------------------+---------------------------+--------------------------------------+
|        SMT_PRINTF_FMT         |      Any < STB_LOOS       |       STT_FUNC                       |
+-------------------------------+---------------------------+--------------------------------------+
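
The table above could be enforced with a check like this one. It is a sketch, not the proposal's wording: the function name smt_permitted is hypothetical, the STB_/STT_ constants are the standard gABI values defined locally, and rejecting types absent from the table is an assumption.

```c
/* Standard gABI symbol constants, defined locally to avoid <elf.h>. */
#define STB_LOOS    10
#define STT_OBJECT  1
#define STT_FUNC    2
#define STT_COMMON  5
#define ELF32_ST_BIND(i) ((i) >> 4)
#define ELF32_ST_TYPE(i) ((i) & 0xf)

/* Type values from the Symbol Meta-Information Types table. */
enum { SMT_RETAIN = 1, SMT_LOCATION = 2, SMT_NOINIT = 3, SMT_PRINTF_FMT = 4 };

/* Return 1 if a symbol with the given st_info may carry
   meta-information of type smt, 0 otherwise. */
static int smt_permitted(unsigned smt, unsigned char st_info)
{
  unsigned bind = ELF32_ST_BIND(st_info);
  unsigned type = ELF32_ST_TYPE(st_info);

  if (bind >= STB_LOOS)         /* only bindings below STB_LOOS */
    return 0;

  switch (smt) {
  case SMT_RETAIN:
  case SMT_LOCATION:
    return type == STT_FUNC || type == STT_OBJECT || type == STT_COMMON;
  case SMT_NOINIT:
    return type == STT_OBJECT || type == STT_COMMON;
  case SMT_PRINTF_FMT:
    return type == STT_FUNC;
  default:
    return 0;                   /* type not listed in the table */
  }
}
```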


elf-symbol-meta-information-implementation.pdf
elf-symbol-meta-information-proposal.pdf

Suprateeka R Hegde

Aug 27, 2020, 9:33:15 AM
to gener...@googlegroups.com, Jozef Lawrynowicz
Symbol meta information, such as the one proposed, is not generic to a
platform compliant with either System V base or extended specs. It's a
new-age requirement. I have solved this in a simple way for our AI
Accelerator ELF image.

Symbols, in the context of a compiler/linker toolchain for an AI
Accelerator, or a new age ASIC, differ from the symbol and attributes we
have known for ages.

What I did:

1. Add a new symbol type under the processor/os specific extension
2. The new type of symbol is a pointer to a location where I keep a
versioned (for compatibility) data structure containing all the gory
meta information I want to associate with the symbol. The compiler,
linker, and all the tools understand the structure. All traditional
concepts, like dynamic relocation, still hold good.
3. That's all!

Actually, what I did is not phenomenally different from the way
we store debug information associated with symbols.

That said, your write-up is quite detailed. My opinion is that it's not
generic. Stay tuned for a consensus from the committee. Meanwhile, I
suggest you look at the GNU gABI, which has seen and accepted such
new-age requirements.

--
Supra

On 27-Aug-2020 04:34 pm, Jozef Lawrynowicz wrote:
> Hi,
>
> I'd like to propose an ELF extension named "Symbol Meta-Information",
> which is a new mechanism used to describe additional information about
> symbols.
>
> I've attached a nicely formatted PDF version of the full proposal. The
> specific implementation details which would augment the existing ELF
> spec is also attached.
>
> Alternatively, HTML versions are available here:
> http://www.mittosystems.com/metainfo/elf-symbol-meta-information-proposal.html
> http://www.mittosystems.com/metainfo/elf-symbol-meta-information-implementation.html
>
> Below is also a plain-text version of the documents, although it might
> not be as pleasant to read as the other versions. 
>
> I look forward to hearing your thoughts,
> Jozef Lawrynowicz

Jozef Lawrynowicz

Aug 27, 2020, 12:30:29 PM
to Suprateeka R Hegde, gener...@googlegroups.com
On Thu, Aug 27, 2020 at 07:03:06PM +0530, Suprateeka R Hegde wrote:
> Symbol meta information, such as the one proposed, is not generic to a
> platform compliant with either System V base or extended specs. It's a
> new-age requirement. I have solved this in a simple way for our AI
> Accelerator ELF image.

Can you clarify whether it is the proposed symbol meta-information types
which are not generic to a System V compliant platform, or the mechanism
itself?

I understand that the proposed types are geared towards embedded
microcontrollers, and perhaps don't have much use on a Linux system, for
example, but they could have a generic use amongst many different
platforms or vendors.

We wanted to standardize this behaviour since the proposed
meta-information types are useful for both MSP430 and ARM targets (these
targets are our specific interest, but I'm sure they will be useful for
others), and have been implemented in downstream GNU and LLVM
toolchains.
So adding it to the ELF spec seemed like a good way to have the
functionality formalized in a generic way.

>
> Symbols, in the context of a compiler/linker toolchain for an AI
> Accelerator, or a new age ASIC, differ from the symbol and attributes we
> have known for ages.
>
> What I did:
>
> 1. Add a new symbol type under the processor/os specific extension
> 2. The new type of symbol is a pointer to a location where I keep a
> versioned (for compatibility) data structure containing all the gory
> meta information I want to associate with the symbol. The compiler,
> linker, and all the tools understand the structure. All traditional
> concepts, like dynamic relocation, still hold good.
> 3. That's all!
>
> Actually, what I did is not phenomenally different from the way
> we store debug information associated with symbols.

Thanks for the info. There are indeed many ways to technically achieve
what we want (I enumerated some in the proposal), but some of those are
a bit hacky and things wouldn't be done that way if the spec wasn't so
rigid. It also seemed like ELF is running out of space
to define new information about symbols, and a new mechanism for
augmenting symbols could be beneficial.

>
> That said, your write-up is quite detailed. My opinion is that it's not
> generic. Stay tuned for a consensus from the committee. Meanwhile, I
> suggest you look at the GNU gABI, which has seen and accepted such
> new-age requirements.

I suppose if it is rejected here, I will look at the GNU gABI. Perhaps
LLVM will be able to base its implementation on that.

Thanks,
Jozef

Joseph Myers

Aug 27, 2020, 4:44:04 PM
to Generic System V Application Binary Interface
On Thu, 27 Aug 2020, Jozef Lawrynowicz wrote:

> SMT_PRINTF_FMT
> The value indicates a byte offset into the
> .strtab_meta section. The section header table index of
> .strtab_meta is extracted from the sh_info value of
> .symtab_meta, using the ELFxx_SMH_STR accessor.
> The null-terminated string extracted from the string table is a
> de-duplicated list of format specifiers used by calls to
> printf-like functions, in the function whose symbol is pointed
> to by this entry.
> For example, the following C code:
> printf ("%d / %d = %f\n", ...);
> would generate the following string in .strtab_meta:
> "%d%f".

I think this is rather under-specified. A format conversion specification
has six parts in ISO C: the initial '%', any flags, any field width, any
precision, any length modifier, the conversion specifier character.
POSIX extends this by allowing the initial '%' to be of the form '%n$'
instead to specify the position of the argument converted, and also allows
'*m$' as a form of width and precision, similarly.

Do you intend the .strtab_meta entry to contain all those parts of the
conversion specification, or only some of them? Giving examples that
include more than just the initial '%' and final conversion specifier
character in the printf format string would help illustrate what should go
in .strtab_meta in such a case. (Various of those optional parts of the
conversion specification can significantly affect the amount of code
needed by printf - for example, the POSIX "'" flag for locale-specific
numeric grouping characters, or the H, D and DD length modifiers for
decimal floating point. Though I'm guessing the sort of implementations
this feature is aimed at don't support those printf features at all,
rather than supporting them but wanting to omit the code at link time when
not needed by a given application.)

How should it be indicated that an object file uses such a function with a
runtime-writable format string and so the specifiers used cannot be known
at link time? Is that supposed to be assumed by default for any object
referencing such a function without a .strtab_meta entry for it?

--
Joseph S. Myers
jos...@codesourcery.com

Jozef Lawrynowicz

Aug 28, 2020, 8:27:16 AM8/28/20
to gener...@googlegroups.com
The intention would be to omit any digits used by the width and
precision specifiers, and the '.' used by the precision specifier,
but otherwise store all parts of the format string, de-duplicating
parts of it as necessary.

The exact behavior of how to encode the format string certainly does need
clearer specification. '%' doesn't actually need to be stored in the condensed
string, so the behavior should instead be like the below examples:

printf ("%-*.*hhd %#hx", ...);
yields the following NUL terminated string
-*hhd#hx

printf ("%+ld % 8.8lld %-6.6lld", ...);
yields the following NUL terminated string
+ld lld-

printf ("%*2$.*3$lld %4$*5$.*6$ld", ...);
yields the following NUL terminated string
*$lldld

It's assumed that the number of the positional argument used with "%n$"
or "*m$" isn't important, but the presence of '$' is required as it
indicates an additional feature which might need to be supported by
printf.

There also needs to be a clarification on how to group the parts of the
format string when performing the de-duplication. For example,
"%ld %lld %lf" should *not* be condensed into "ldllf", as the linker may
want to know which length modifier is used by a given conversion format
specifier. Instead it should be condensed to "ldlldlf".

The length modifier and format specifier are considered an atomic part
of the format string for the purposes of de-duplication.
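
As an illustration only, the condensing rule described above could be sketched as follows. The function name, the regular expression, and the choice to keep the '0' flag while dropping width and precision digits are assumptions made for this sketch; none of them comes from the proposal.

```python
import re

# One printf conversion specification: the ISO C parts plus the POSIX
# "%n$" and "*m$" forms. "%%" is matched so that it can be skipped.
_SPEC = re.compile(
    r"%(?:%|"
    r"(?P<pos>\d+\$)?"
    r"(?P<flags>[-+ #0']*)"
    r"(?P<width>\*(?:\d+\$)?|\d+)?"
    r"(?:\.(?P<prec>\*(?:\d+\$)?|\d*))?"
    r"(?P<length>hh|h|ll|l|j|z|t|L|H|DD|D)?"
    r"(?P<conv>[A-Za-z]))"
)


def condense_formats(formats):
    """Condense the constant format strings used by one function."""
    tokens = []  # de-duplicated, in order of first appearance
    for fmt in formats:
        for m in _SPEC.finditer(fmt):
            if m.group("conv") is None:
                continue  # literal "%%"
            parts = []
            if m.group("pos"):
                parts.append("$")
            parts.extend(m.group("flags"))  # one token per flag character
            for field in (m.group("width"), m.group("prec")):
                # Digits and '.' are dropped; only '*' and '$' are kept.
                if field and field.startswith("*"):
                    parts.append("*")
                    if field.endswith("$"):
                        parts.append("$")
            # Length modifier and conversion specifier form one atomic token.
            parts.append((m.group("length") or "") + m.group("conv"))
            for part in parts:
                if part not in tokens:
                    tokens.append(part)
    return "".join(tokens)
```

Treating the length modifier and the conversion specifier as a single token is what keeps "%ld %lld %lf" from collapsing into "ldllf".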

You are correct that the implementations we had in mind for this feature
aren't concerned with those extra POSIX features and instead are choosing
between printf versions supporting some selection of:
- a minimal set of integer/string/char format specifiers,
- the full set of non-float format specifiers,
- all format specifiers.

>
> How should it be indicated that an object file uses such a function with a
> runtime-writable format string and so the specifiers used cannot be known
> at link time? Is that supposed to be assumed by default for any object
> referencing such a function without a .strtab_meta entry for it?

TI have a vendor-specific type to handle that, which tries to see if the
variable format string can be traced back to some constant value. But the
behavior appears too specialized to suggest implementing generically.

Again, this should have been more carefully considered in the original proposal,
but we would store a '?' in .strtab_meta, in place of the condensed format
string.
The linker would then have to just assume that all printf features
allowed by the standard are in use.
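
A rough sketch of how a linker might pick between the three printf variants listed earlier, given a condensed string or the '?' marker. The variant names and the contents of the "minimal" set are hypothetical; the thread does not define them.

```python
# Conversion specifiers that need floating-point support in printf.
FLOAT_CONVERSIONS = set("aAeEfFgG")
# Assumed for this sketch: the "minimal" printf handles only these
# conversions, with no flags, '*', '$' or length modifiers.
MINIMAL_CHARS = set("dicsux")


def select_printf_variant(condensed):
    """Return 'minimal', 'no-float' or 'full' for a condensed format string."""
    if condensed == "?":
        return "full"  # format string not known at link time
    chars = set(condensed)
    if chars & FLOAT_CONVERSIONS:
        return "full"
    if chars <= MINIMAL_CHARS:
        return "minimal"
    return "no-float"
```

The float check is a plain character test because none of the float conversion letters doubles as a flag or a length modifier.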

Something else that I need to consider is how scanf-like functions should be
treated - with the same SMT_PRINTF_FMT meta-information type, or a new one?

Thanks for the feedback, I can certainly improve this part of the spec.

Jozef
>
> --
> Joseph S. Myers
> jos...@codesourcery.com
>
> --
> You received this message because you are subscribed to a topic in the Google Groups "Generic System V Application Binary Interface" group.
> To unsubscribe from this topic, visit https://groups.google.com/d/topic/generic-abi/QPgYf3-_Iyw/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to generic-abi...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/generic-abi/alpine.DEB.2.21.2008272034470.20906%40digraph.polyomino.org.uk.

Ali Bahrami

Aug 28, 2020, 3:24:31 PM8/28/20
to gener...@googlegroups.com
On 8/27/20 5:04 AM, Jozef Lawrynowicz wrote:
> I'd like to propose an ELF extension named "Symbol Meta-Information", which is a new mechanism used to describe additional information about symbols.
>
> I've attached a nicely formatted PDF version of the full proposal. The specific implementation details which would augment the existing ELF spec is also attached.
>
> Alternatively, HTML versions are available here:
> http://www.mittosystems.com/metainfo/elf-symbol-meta-information-proposal.html
> http://www.mittosystems.com/metainfo/elf-symbol-meta-information-implementation.html
>


Hi Jozef,

I want to say that this is a very thorough proposal. It's
evident that you carefully studied a lot of different ELF platforms
(Solaris syminfo!) and the document is very much in the style of
existing ELF ABI documentation. I appreciate it.

I also want to say that I believe that the problems you're trying
to solve are real, and that I understand the desire to standardize
at the lowest possible level. The question isn't about that, but
rather, whether they are general enough to serve all platforms, or
if they're more specialized.

In my time in the ELF cave, I've added a couple of generic features,
and I've added several OSABI_SOLARIS features, so I'm not immune to
the temptation to improve at those layers. However, in the last 15
years, I've also had to deal with the cost of failed extensions (not
mine, yet, but there's still time!), removing or minimizing their
harm, and this has been a hard lesson about the value of being very
careful about new things, and about over-engineering in general.
Nowadays, I always ask myself, "How is this going to look 5, 10, 20,
and 40 years from now?". That's a long time, and the world has
a way of changing. ELF is ~30 years old now, and it's probably fair
to say that this is true because it does what is needed, but not
a lot more.

I'm sorry to say that I don't think these are generic features.
Supra suggested taking it to the GNU OSABI, and that is indeed
one way forward. However, I think some of my concerns would apply
equally to OSABI_GNU, so I'll provide my notes about that below,
in the hope that it might help you tune the ideas, or possibly find
other ways to solve these problems. Note that these are all just
my opinions, and I certainly don't expect you to agree in all cases.

Thanks.

- Ali

-----

- In discussing symbol metadata, we need to distinguish between
information that is essential to basic linking, and those that
support more esoteric ends specific to some project or other.
Given a mechanism for different groups to add flags to support
their specific projects, it is inevitable that they will seek to add
more such things. That will leave linkers in a bad position, where
they have to support an ever-growing set of things that were not
necessarily designed to work together, many of which might not
be needed by that platform. Even worse, if you implement something
you don't need, it will probably end up having bugs if and when
anyone actually tries to use it.

- I really think that to the extent that a linker script can solve
the problems you've described, that's better than putting the
information in ELF. ELF flags should only be considered for
those remaining few things that seem to be really general to
multiple platforms/communities, and certain to live forever.

The proposal states that linker scripts are opaque to the programmer,
and so, putting this sort of information in the ELF object is
better for them. Most programmers I know have very little understanding
of ELF, its abilities, or content, so I don't think this is true.
For most purposes, ELF objects can only be changed by rebuilding them,
and require the use of special dumping tools to examine their content.
Compare that to linker scripts, which are written in text, and which
can be modified in the field to cause old objects to link in new
ways.

Linker scripts are certainly not easy to read. I really can't, except
by staring hard, lots of googling, and some guesswork. And the
boilerplate issue is certainly true. I think all of this suggests
that linker scripts need improvement, as opposed to bypassing them.

- The SHA-1 hash doesn't seem future proof, and I also wonder if
hashes for other sections wouldn't be wanted. In terms of using
it to detect corruption by tools that don't understand .symtab_meta,
how can they modify a section they don't understand, and why wouldn't
they just regenerate the hash if they did? If this is really just to
catch such cases, then we're burdening ELF with a feature long term,
to solve a momentary problem.

In my opinion, you shouldn't worry about tools that don't understand
.symtab_meta, because you can't win that way. Solaris has that problem
sometimes (objcopy has at times broken our extensions). The answer
is always to reach out and get that tool fixed. In fact, this illustrates
a big cost to extending ELF, and is a reason to prefer higher-level
approaches.

- I agree that using special symbol names to achieve a desired
effect is a bad idea for the ELF ABI, but it can be effective
in a scenario with linker-scripts. A well chosen prefix provides
a strong clue as to what created these symbols, and as such, for what.

- 2.2: In addition to your other reasons, adding new symbol types or
bindings creates a big burden for linkers, and needs to be extremely
rare. Such things are forever, whether they end up being useful,
or not.

One of the reasons that compilers like gcc and llvm work on
so many platforms, and have thrived, is that their output can be
fed to linkers created by others outside that community. New types
and bindings break that, so it's a very big deal. And of course, as
the proposal says, there's not much room left there anyway.

- 2.2: Thanks for considering Solaris Syminfo. In addition to
your reasons, it's just not a good fit, unless direct bindings
become part of the gABI. Much of syminfo exists to support filter
objects and direct bindings. As things stand, it's too Solaris
specific to be generally useful. Other linkers wouldn't know what
to do with it.

- 3.1, table 2: This says that sh_info combines the
format version number of the symbol meta-information table,
and the section header index of the .strtab_meta string table.
The 32-bit encoding for the string table index is 24 bits, which
is smaller than the 32-bits allowed for section indexes when
extended section indexes are in play.

I'm skeptical of putting version numbers in the sh_info
anyway. Perhaps the slot 0 of your section would be a better
place to stash that, since symbol 0 is special anyway.

- SMT_NONE: This indicates an invalid or incomplete entry.

No one knowingly creates invalid or incomplete data. Don't you
mean "No meta data is specified"?

- SMT_RETAIN: This has a shelf life problem, where the object
might specify that something should have been retained because
that seemed right at the time, but then circumstances changed.
Certainly the details of garbage collection could change in the
decades to come. Linker scripts seem better positioned to
evolve in the field.

- SMT_LOCATION: Aren't things like this usually managed by
controlling segment mapping, or with SHN_ABS symbols?

- SMT_NOINIT: This seems like an arguable micro-optimization.
Zeroed bss is cheap, unless we're talking about massive
data, in which case it should probably be allocated at
runtime.

- SMT_PRINTF_FMT: I don't think that the gABI should become
dependent on a high level feature like printf(). This seems
semantically well above the level at which ELF operates.

Jozef Lawrynowicz

Aug 28, 2020, 5:35:43 PM8/28/20
to gener...@googlegroups.com
Hi Ali,

Thanks for providing your insight and the detailed review.

It's now clear from the feedback that there isn't enough support for the
metainfo mechanism, or the proposed new types, to continue with
trying to get this added to the ELF gABI.

I certainly understand the concern about the bit-rot, whether in the
sense of the specification itself, or the implementations of it, for a
mechanism as flexible as this.

The fact that most of the proposed types are probably only
useful for embedded microcontrollers is clearly another red flag.

I thought that when I saw posts such as "mark function symbols following
different call abi"
(https://groups.google.com/forum/#!topic/generic-abi/Bfb2CwX-u4M),
and the issues with undocumented processor-specific takeover of the
bits in the st_other field of a symbol, that there could be a generic
need for a mechanism for specifying additional information about
symbols.

I guess it's probably worth seeing if there's any interest in this from
the GNU OSABI, but I imagine the eventual way of getting this upstreamed
is just to define it in a psABI, like some Texas Instruments supplement
for their ARM devices. Other targets could then just use that as a
reference.

TI are well on their way to making use of this mechanism in some
additional vendor-specific ways, so the eventual desire is to maintain
some sort of compatibility between upstream ARM GNU, upstream ARM
LLVM/Clang, and their downstream ARM LLVM/Clang toolchain.

I've added some further comments below.

Thanks,
Jozef
The idea was that the programmer interfaces with the functionality using
attributes in the source code, so they don't have to know anything about
ELF to make use of the functionality implemented by the metainfo types.

>
> Linker scripts are certainly not easy to read. I really can't, except
> by staring hard, lots of googling, and some guesswork. And the
> boilerplate issue is certainly true. I think all of this suggests
> that linker scripts need improvement, as opposed to bypassing them.

That's an interesting way of looking at it.
I'm only familiar with GNU linker scripts, but the linear parsing of the
script, and the internal linked list structure used to store all the
directives and sections certainly made it tricky to implement the
SMT_LOCATION type, for placement at a specific VMA!

I guess the general complication of linker scripts comes from the fact
that they need to support many different output formats. One could
probably write a cleaner linker script format if only ELF was
supported, for example.

>
> - The SHA-1 hash doesn't seem future proof, and I also wonder if
> hashes for other sections wouldn't be wanted. In terms of using
> it to detect corruption by tools that don't understand .symtab_meta,
> how can they modify a section they don't understand, and why wouldn't
> they just regenerate the hash if they did? If this is really just to
> catch such cases, then we're burdening ELF with a feature long term,
> to solve a momentary problem,
>
> In my opinion, you shouldn't worry about tools that don't understand
> .symtab_meta, because you can't win that way. Solaris has that problem
> sometimes (objcopy has at times broken our extensions), The answer
> is always to reach out and get that tool fixed. In fact, this illustrates
> a big cost to extending ELF, and is a reason to prefer a higher
> level approaches.

The SHA-1 hash is there to catch modifications to .symtab, by programs
which won't update the corresponding symbol indices in .symtab_meta.

For example, if a user runs an old version of "strip" (perhaps installed
by default by their OS's package manager) on a .symtab_meta-containing
object file, which is being used within a new toolchain supporting
.symtab_meta, the linker is going to read in garbage from .symtab_meta,
as the symbol indices stored in .symtab_meta have been corrupted by the
old "strip" and will no longer point to the proper symbol.

But yes it might only be needed for a few years whilst the versions of
the tools supporting .symtab_meta are propagated out to those OSes
which are slow to update.
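
A minimal sketch of that check, assuming the producer records a SHA-1 digest of the raw .symtab contents alongside .symtab_meta and the linker recomputes it before trusting any symbol index. Where the digest is stored is not shown here, and the function names are hypothetical.

```python
import hashlib


def symtab_digest(symtab_bytes):
    """SHA-1 digest of the raw .symtab section contents."""
    return hashlib.sha1(symtab_bytes).digest()


def metainfo_matches_symtab(recorded_digest, symtab_bytes):
    """True if .symtab is byte-for-byte what the producer hashed.

    A tool that rewrites .symtab without knowing about .symtab_meta
    (for example an old "strip") changes the bytes, so the digest
    no longer matches and the stale indices can be rejected.
    """
    return recorded_digest == symtab_digest(symtab_bytes)
```

For example, dropping one 24-byte Elf64_Sym entry from the table is enough to make the comparison fail.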

>
> - I agree that using special symbol names to achieve a desired
> effect is a bad idea for the ELF ABI, but it can be effective
> in a scenario with linker-scripts. A well chosen prefix provides
> a strong clue as to what created these symbols, and as such, for what.
>
> - 2.2: In addition to your other reasons, adding new symbol types or
> bindings creates a big burden for linkers, and needs to be extremely
> rare. Such things are forever, whether they end up being useful,
> or not.
>
> One of the reason that compilers like gcc and llvm that work on
> so many platforms, and have thrived, is that their output can be
> fed to linkers created by others outside that community. New types
> and bindings break that, so it's very big deal. And of course, as
> the proposal says, there's not much room left there anyway.
>
> - 2.2: Thanks for considering Solaris Syminfo. In addition to
> your reasons, it's just not a good fit, unless direct bindings
> become part of the gABI. Much of syminfo exists to support filter
> objects and direct bindings. As things stand, it's too Solaris
> specific to be generally useful. Other linkers wouldn't know what
> to do with it.

Syminfo and symbol metainfo - similar in name only!

>
> - 3.1, table 2: This says that sh_info combines the
> format version number of the symbol meta-information table,
> and the section header index of the .strtab_meta string table.
> The 32-bit encoding for the string table index is 24 bits, which
> is smaller than the 32-bits allowed for section indexes when
> extended section indexes are in play.
>
> I'm skeptical of putting version numbers in the sh_info
> anyway. Perhaps the slot 0 of your section would be a better
> place to stash that, since symbol 0 is special anyway.

Thanks for raising this, I'll review it in our spec.

>
> - SMT_NONE: This indicates an invalid or incomplete entry.
>
> No one knowingly creates invalid or incomplete data. Don't you
> mean "No meta data is specified".
>
> - SMT_RETAIN: This has a shelf life problem, where the object
> might specify that something should have been retained because
> that seemed right at the time, but then circumstances changed.
> Certainly the details of garbage collection could change in the
> decades to come. Linker scripts seem better positioned to
> evolve in the field.
>
> - SMT_LOCATION; Aren't things like this usually managed by
> controlling segment mapping, or with an SHN_ABS symbols?

In general, if you want to place a section at a specific address in GNU
linker scripts you have to create a new output section for it, since that
is the only way to specify an absolute address. So it appears
unnecessarily complicated to the user, when they may just want to place
one variable at some specific address.

Perhaps extending the linker script language so that you can provide an
address for an input section would also be beneficial...

I'm not aware of how SHN_ABS can be leveraged by the programmer, I'd
have to look into that.

>
> - SMT_NOINIT: This seems like an arguable micro-optimization.
> Zeroed bss is cheap, unless we're talking about massive
> data, in which case it should probably be allocated at
> runtime,

Cycles are precious on MSP430, and allocating heap space adds a slowdown
that many programmers don't want.

This feature has actually been implemented for a while on MSP430 and
recently on ARM, using a ".noinit" section.

>
> - SMT_PRINTF_FMT: I don't think that the gABI should become
> dependent on a high level feature like printf(). This seems
> semantically well above the level at which ELF operates.
>

Yes, for this reason it should have really been kept as a
vendor-specific type, although I thought this might actually have the
most generic use out of the available types, but the gABI is clearly not
the place for it.

Ali Bahrami

Aug 31, 2020, 12:03:54 PM8/31/20
to gener...@googlegroups.com
Hi Jozef,

On 8/28/20 3:35 PM, Jozef Lawrynowicz wrote:

> I guess the general complication of linker scripts comes from the fact
> that they need to support many different output formats. One could
> probably write a cleaner linker script format if only ELF was
> supported, for example.

Undoubtedly. I find the basic linker script syntax itself
pretty difficult to read anyway, but I think that's largely
a matter of not enough familiarity.


>>
>> - 3.1, table 2: This says that sh_info combines the
>> format version number of the symbol meta-information table,
>> and the section header index of the .strtab_meta string table.
>> The 32-bit encoding for the string table index is 24 bits, which
>> is smaller than the 32-bits allowed for section indexes when
>> extended section indexes are in play.
>>
>> I'm skeptical of putting version numbers in the sh_info
>> anyway. Perhaps the slot 0 of your section would be a better
>> place to stash that, since symbol 0 is special anyway.
>
> Thanks for raising this, I'll review it in our spec.

I failed to point out the biggest problem with putting
the version into the section header. At runtime, the runtime
linker finds data like this via .dynamic section entries that
point at the data. The section headers aren't used by the runtime
linker, so any data there is not visible.

The Solaris .SUNW_syminfo section stashes the version in index 0:

https://docs.oracle.com/cd/E37838_01/html/E36783/chapter7-17.html

Index 0 is used to store the current version of the Syminfo table,
which is SYMINFO_CURRENT. Since symbol table entry 0 is always
reserved for the UNDEF symbol table entry, this usage does not
pose any conflicts.

And in <sys/link.h>

/*
 * Syminfo version values.
 */
#define SYMINFO_NONE    0   /* Syminfo version */
#define SYMINFO_CURRENT 1
#define SYMINFO_NUM     2

As you see, we haven't needed to revise the original layout yet,
but if we did, we would give version 1 a new name, and rename
CURRENT to be 2, and then the runtime linker would gain some
new code to distinguish between them.
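
To make the sh_info size concern concrete, here is a small sketch of a combined encoding. The bit layout (version in the top 8 bits, string table section index in the low 24 bits) is an assumption for illustration; the thread only states that 24 bits are available for the index.

```python
VERSION_BITS = 8   # assumed width of the version field
INDEX_BITS = 24    # width of the string table index, per the review comment
INDEX_MASK = (1 << INDEX_BITS) - 1


def pack_sh_info(version, strtab_index):
    """Combine a format version and a section index into one 32-bit sh_info."""
    if not 0 <= version < (1 << VERSION_BITS):
        raise ValueError("version does not fit in the assumed 8 bits")
    if not 0 <= strtab_index <= INDEX_MASK:
        # With extended section indexes an index may need up to 32 bits.
        raise ValueError("section index does not fit in 24 bits")
    return (version << INDEX_BITS) | strtab_index


def unpack_sh_info(sh_info):
    """Return (version, strtab_index) from a packed sh_info value."""
    return sh_info >> INDEX_BITS, sh_info & INDEX_MASK
```

Keeping the version in entry 0 of the table, as Syminfo does, would leave all 32 bits of sh_info for the index and would also keep the version visible to a consumer that never reads section headers.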



>> - SMT_LOCATION; Aren't things like this usually managed by
>> controlling segment mapping, or with an SHN_ABS symbols?
>
> In general, if you want to place a section at a specific address in GNU
> linker scripts you have to create a new output section for it, since that
> is the only way to specify an absolute address. So it appears
> unnecessarily complicated to the user, when they may just want to place
> one variable at some specific address.
>
> Perhaps extending the linker script language so that you can provide an
> address for an input section would also be beneficial...
>
> I'm not aware of how SHN_ABS can be leveraged by the programmer, I'd
> have to look into that.

I think I was wrong in suggesting ABS symbols would help you here.
ABS symbols allow giving an arbitrary value a name, but that's
not really what you're after.

In terms of how ELF linkers work, the way that one gets a variable at
a fixed address is to establish a loadable segment (PT_LOAD) that
covers that address, and then to assign the datum of interest to the
offset in that segment that would correspond to the target address.
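
The arithmetic behind that placement is simple. This sketch uses the p_vaddr and p_memsz names from the ELF program header; the function name is hypothetical.

```python
def offset_in_segment(p_vaddr, p_memsz, target_addr, size):
    """Offset within a PT_LOAD segment that lands a datum at target_addr.

    Raises ValueError if the segment does not cover the whole datum.
    """
    offset = target_addr - p_vaddr
    if offset < 0 or offset + size > p_memsz:
        raise ValueError("segment does not cover the target address range")
    return offset
```

A 4-byte variable wanted at 0x1010 in a segment loaded at 0x1000 would be assigned offset 0x10 in that segment.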

Although I could imagine a compiler and link-editor conspiring to
create that segment automatically, rather than having the programmer
specify these things with a linker script (or in Solaris, a mapfile),
there are advantages to having the programmer specify it. Often, there's
more than one such datum, laid out next to each other in some manner,
and a single segment would work for all of them. Of course, I suppose
a sufficiently clever compiler could recognize that and combine them.


>
>>
>> - SMT_NOINIT: This seems like an arguable micro-optimization.
>> Zeroed bss is cheap, unless we're talking about massive
>> data, in which case it should probably be allocated at
>> runtime,
>
> Cycles are precious on MSP430, and allocating heap space adds slow down
> that many programmers don't want.
>
> This feature has actually been implemented for a while on MSP430 and
> recently on ARM, using a ".noinit" section.

Certainly embedded programming is different. On a multi-user OS, such
memory has to be zeroed for security reasons --- we don't want data
left over from other processes to somehow become visible for an
unrelated process. I wouldn't oppose a non-initialized version of
bss, but I think many (most?) systems would still zero it.

I presume that you can't just assign such data to the data segment
because you don't want the image to be larger? If you did, the
initialization would happen at link time rather than runtime, so
the cycles would not be precious. (I'm sure this question is a dead
giveaway that I'm not an embedded programmer, but I would think
that for small/moderate size data, it would work?).

Thanks for an interesting discussion.

- Ali

Jozef Lawrynowicz

Sep 1, 2020, 6:47:51 AM9/1/20
to gener...@googlegroups.com
Ah, I see. Clearly there are some additional factors to consider for
symbol metainfo's use with a dynamic linker, thanks.
The issue there is that "noinit" actually corresponds to two different
"special sections". There is a BSS type "noinit"
(currently implemented using ".noinit"), which the loader and startup code
ignore, and a DATA/RODATA hybrid type "noinit" (".persistent"), which the
loader initializes, but the startup code ignores.

"persistent" requires some type of writeable, non-volatile memory - MSP430 has
FRAM to make use of this.

The linker can work out which is appropriate from the meta-information
by just looking at whether the data object is in .data, .rodata or .bss.

So I instead considered proposing a new ELF symbol type such as STT_NOINIT,
however technically symbols with either STT_OBJECT or STT_COMMON could
require "noinit". Although with STT_COMMON not being used by many targets
anymore, perhaps STT_NOINIT could always imply an STT_OBJECT.

There's also the (pedantic) conceptual issue of implementing it as a
type - the symbol isn't a "noinit" type of symbol, it is a "data object"
type of symbol which should not be initialized.

>
> I presume that you can't just assign such data to the data segment
> because you don't want the image to be larger? If you did, the
> initialization would happen at link time rather than runtime, so
> the cycles would not be precious. (I'm sure this question is a dead
> giveaway that I'm not an embedded programmer, but I would think
> that for small/moderate size data, it would work?).

The size of the program image isn't a concern, but for initialization of
writeable data when the program is loaded, you would need
some special type of memory, like the FRAM mentioned above.

But presumably the "noinit" functionality is mainly there so that you have some
pre-allocated space you can write to with values which are calculated at
runtime. When the value of the data is known you are technically not saving any
cycles whether the startup code initializes the data or your custom code does,
so "noinit" would not be required.

>
> Thanks for an interesting discussion.
>

Likewise!

Jozef
> - Ali

Michael Matz

Sep 1, 2020, 11:54:03 AM9/1/20
to gener...@googlegroups.com
Hello,

On Tue, 1 Sep 2020, Jozef Lawrynowicz wrote:

> > > > - SMT_NOINIT: This seems like an arguable micro-optimization.
...
> > Certainly embedded programming is different. On a multi-user OS, such
> > memory has to be zeroed for security reasons --- we don't want data
> > left over from other processes to somehow become visible for an
> > unrelated process. I wouldn't oppose a non-initialized version of
> > bss, but I think many (most?) systems would still zero it.
>
> The issue there is that "noinit" actually corresponds to two different
> "special sections". There is a BSS type "noinit" (currently implemented
> using ".noinit"), which the loader and startup code ignores, and a
> DATA/RODATA hybrid type "noinit" (".persistent"), which the loader
> initializes, but the startup code ignores.
>
> "persistent" requires some type of writeable, non-volatile memory -
> MSP430 has FRAM to make use of this.
>
> The linker can work out which is appropriate from the meta-information
> by just looking at whether the data object is in .data, .rodata or .bss.
>
> So I instead considered proposing a new ELF symbol type such as
> STT_NOINIT, however technically symbols with either STT_OBJECT or
> STT_COMMON could require "noinit". Although with STT_COMMON not being
> used by many targets anymore, perhaps STT_NOINIT could always imply an
> STT_OBJECT.
>
> There's also the (pedantic) conceptual issue of implementing it as a
> type - the symbol isn't a "noinit" type of symbol, it is a "data object"
> type of symbol which should not be initialized.

This all points towards a symbol type being the wrong level at which to
model this. You already use sections to convey the "don't init this at
runtime" meaning. That would naturally lead to specifying this properly in
ELF: you need a new section type/flag, and a new program header type (or a
flag for PT_LOAD that expresses the fact that the bss-like difference
between file and memory length isn't to be zeroed but can be left alone).
Then you assign all these symbols to such a section (as you're doing
already), and you're done.

I.e. I don't see why symbol meta information would be necessary for this
(or symbol types).
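
A toy model of the loader behaviour described above: the first p_filesz bytes of a segment come from the file, and the remaining p_memsz - p_filesz bytes are zeroed unless a segment flag says to leave them alone. PF_NOINIT and its value are hypothetical; no such flag is defined by the gABI.

```python
PF_NOINIT = 0x00100000  # hypothetical flag, placed in the OS-specific PF_MASKOS range


def load_segment(memory, file_image, p_offset, p_vaddr, p_filesz, p_memsz, p_flags=0):
    """Copy one PT_LOAD-style segment into a bytearray standing in for memory."""
    # File-backed part: copied verbatim.
    memory[p_vaddr:p_vaddr + p_filesz] = file_image[p_offset:p_offset + p_filesz]
    # bss-like tail: zero-filled, unless the segment asks to be left untouched.
    if not (p_flags & PF_NOINIT):
        memory[p_vaddr + p_filesz:p_vaddr + p_memsz] = bytes(p_memsz - p_filesz)
```

With the flag set, whatever was in memory before the load survives in the tail, which is the ".noinit" behaviour discussed in this thread.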


Ciao,
Michael.

Florian Weimer

Sep 1, 2020, 1:41:50 PM9/1/20
to gener...@googlegroups.com
* Michael Matz:

> This all points towards this being modelled as symbol type to be at the
> wrong level. You already use sections for convey the dont-init-this at
> runtime meaning. That would naturally lead to specifying this properly in
> ELF: you need a new section type/flag, and a new program header type (or a
> flag for PT_LOAD that expresses the fact that the bss-like difference
> between file and memory length isn't to be zeroed but can be left alone).
> Then you assign all these symbols to such section (as you're doing
> already), and you're done.

Yes, I agree, SMT_NOINIT as a per-symbol attribute does not give the
linker the necessary flexibility. A special section is required
anyway, so that the non-initialized symbols can be grouped together.

Fangrui Song

Sep 1, 2020, 2:56:54 PM9/1/20
to gener...@googlegroups.com
Linker scripts are good at describing sections but bad at describing
symbol metadata information. How would you encode Elf64_SymMetaInfo
transformation in a linker script?

(I took a quick glance at the Solaris Linkers and Libraries Guide.
SYMBOL_SCOPE / SYMBOL_VERSION Directives (enhanced version scripts?) may
be appealing.)
If you just aim for compatibility with old GNU tools:

GNU objcopy and GNU ld -r reset sh_link to 0 for an unrecognized section
type. SHT_LLVM_ADDRSIG actually relies on this artifact to interoperate
with potentially involved GNU tools in the build system. So you don't
need a hash to achieve the goal.
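
As a rough sketch of what this looks like on the consumer side, a tool reading a symbol-referencing section could treat sh_link == 0 as "the symbol indices are stale". SHT_SYMMETA_SKETCH and its value, the reduced "struct section_header", and symmeta_is_usable() are hypothetical names used only for illustration, not assigned numbers or existing APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical section type for symbol meta-information; the value is a
   placeholder in the OS-specific range, not an assigned number. */
#define SHT_SYMMETA_SKETCH 0x6fff0001u

/* Reduced stand-in for the section header fields that matter here. */
struct section_header {
    uint32_t sh_type;
    uint32_t sh_link;   /* index of the associated symbol table */
};

/* True if the section's symbol indices can still be trusted. */
bool symmeta_is_usable(const struct section_header *sh)
{
    assert(sh != NULL);
    if (sh->sh_type != SHT_SYMMETA_SKETCH)
        return false;
    /* sh_link == 0 means a tool that did not recognize the section type
       rewrote the file, so the contents must be ignored. */
    return sh->sh_link != 0;
}
```

The producer only has to set sh_link to the symbol table's section index; the invalidation itself comes for free from the behaviour of the older tools.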

Jozef Lawrynowicz

Sep 2, 2020, 5:19:17 AM
to gener...@googlegroups.com
Hi,
Thanks for the feedback. It does seem that formalizing the ".noinit" and
".persistent" sections, and relying on the compiler to place a symbol
with the "noinit" attribute in the correct section, could be the way
forward.

The GNU linker, at least, does not need to do any additional handling of
"noinit" symbols if they have already been placed in ".noinit" or
".persistent" sections.
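
For illustration, the source-level side of that placement might look like the sketch below. It uses the generic GCC/Clang "section" attribute rather than any dedicated "noinit" attribute, and BOOT_MAGIC, "struct boot_state" and record_boot() are made-up names. On a target whose startup code leaves ".noinit" alone the counter would survive a reset; on a hosted system the memory will typically start out zeroed, so only the logic can be exercised there.

```c
#include <assert.h>
#include <stdint.h>

/* Arbitrary marker used to tell valid retained data from garbage. */
#define BOOT_MAGIC 0xB007C0DEu

struct boot_state {
    uint32_t magic;       /* equals BOOT_MAGIC once initialized */
    uint32_t boot_count;  /* number of boots seen so far */
};

static_assert(sizeof(struct boot_state) == 2 * sizeof(uint32_t),
              "no padding expected in boot_state");

/* No initializer: the object is placed in ".noinit" by name, and the
   startup code is assumed not to touch that section. */
static struct boot_state state __attribute__((section(".noinit")));

/* Count this boot, initializing the state if it holds no valid data. */
uint32_t record_boot(void)
{
    if (state.magic != BOOT_MAGIC) {
        state.magic = BOOT_MAGIC;
        state.boot_count = 0;
    }
    return ++state.boot_count;
}
```

The magic value is needed because memory that is never initialized holds arbitrary contents after a cold power-up.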

Jozef

Jozef Lawrynowicz

Sep 2, 2020, 5:38:13 AM
to 'Fangrui Song' via Generic System V Application Binary Interface
We had not yet considered how to apply symbol metadata from within the
linker script. "SMT_RETAIN" metadata is easily applied using existing
linker script directives (e.g. KEEP() in the GNU linker). "SMT_LOCATION"
metadata could certainly benefit from having a simple interface in the
linker script, thanks for the pointer.

Yes, that is very useful if we are only concerned with GNU tools, but
there could be other tools used for modifying the object, e.g. from
proprietary toolchains.

Does LLVM/Clang do anything similar with unrecognized section types?

Thanks,
Jozef

Ali Bahrami

Sep 2, 2020, 12:24:17 PM
to gener...@googlegroups.com
On 9/2/20 3:38 AM, Jozef Lawrynowicz wrote:
> (I took a quick glance at the Solaris Linkers and Libraries Guide.
> SYMBOL_SCOPE / SYMBOL_VERSION Directives (enhanced version scripts?) may
> be appealing.)

Not very important to this discussion, but what GNU calls
version scripts are basically the symbol versioning part of
the original Solaris mapfile language, which we now call
"version 1", with GNU additions.

The v1 mapfiles came with SysvR4. The symbol versioning part
of that was a Sun addition that had little connection to the
basic mapfile language from AT&T.

http://www.linker-aliens.org/blogs/ali/entry/the_problem_s_with_solaris/

Meanwhile, GNU implemented linker scripts, which are based (I think)
on SysvR3. As hard as it is to understand linker scripts, it's
hard to argue that GNU didn't dodge a bullet by ignoring SysvR4
mapfiles. Later, GNU adopted symbol versioning, based on the
Sun implementation.

https://lists.debian.org/lsb-spec/1999/12/msg00017.html

Even later, we blew up the old v1 mapfile language and designed
a new one, because the old one was just too horrible to extend,
and the inability to do that was blocking a large number of
things we really needed.

http://www.linker-aliens.org/blogs/ali/entry/a_new_mapfile_syntax_for/

There have been some more recent additions to support
name globbing (like GNU version scripts), regular
expressions, and symbol renaming:

http://www.linker-aliens.org/blogs/ali/entry/regex_and_glob_for_mapfiles/

That opened the door to being able to read some GNU version
scripts, which is handy for building FOSS.

Overall, the one part of the original v1 mapfiles that we were
relatively happy with was the symbol versioning part, and so,
SYMBOL_SCOPE and SYMBOL_VERSION are essentially the same thing,
but with a more uniform syntax. Thinking of them as enhanced
version scripts is pretty fair.

- Ali