New arch-independent OS-independent PT_* segment type for memory tags


Luis Machado

Oct 25, 2021, 8:32:45 AM
to Generic System V Application Binary Interface
Hi,

AArch64/CHERI needs to eventually dump memory tags/capability tags to a core file. After some discussion, it seems the best way to do it is through a new PT_* segment type.

It seems a generic arch-independent and OS-independent PT_MEMTAG constant is the best option for this use case. Then each arch/OS will define a particular subtype that suits their needs (by using the p_flags field).

For example, AArch64 will have a PT_MEMTAG + PT_ARM_MEMTAG_MTE pair, SPARC may have a PT_MEMTAG + PT_SPARC_MEMTAG_ADI pair and CHERI may have a PT_MEMTAG + PT_CHERI_MEMTAG_CAPABILITY type.
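
As a sketch of the pairing (every value below is a hypothetical placeholder; nothing here has been allocated):

```c
#include <stdint.h>

/* All values are hypothetical placeholders -- nothing has been
   allocated by the gABI; PT_MEMTAG itself is only being requested. */
#define PT_MEMTAG                   0x6fffff01u
#define PT_ARM_MEMTAG_MTE           1u  /* stored in p_flags, not p_type */
#define PT_SPARC_MEMTAG_ADI         2u
#define PT_CHERI_MEMTAG_CAPABILITY  3u

/* A tool that knows only the generic type can still name the segment,
   leaving the p_flags subtype to arch-aware consumers. */
static const char *segment_type_name(uint32_t p_type)
{
    return p_type == PT_MEMTAG ? "PT_MEMTAG" : "UNKNOWN";
}
```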

Given the above, could we please have one such constant allocated that will be named PT_MEMTAG?

Regards,
Luis

Florian Weimer

Oct 25, 2021, 9:08:03 AM
to Luis Machado, Generic System V Application Binary Interface
* Luis Machado:

> It seems a generic arch-independent and OS-independent PT_MEMTAG
> constant is the best option for this use case.

But the segment contents would still be architecture-specific?

Thanks,
Florian

Luis Machado

Oct 25, 2021, 10:46:29 AM
to gener...@googlegroups.com, Florian Weimer
Yes. The contents would be arch-specific/os-specific dumps of memory tags.

The goal of having a generic PT_MEMTAG segment type is to allow tools to
handle/display the segment without having to teach such tools about all
the possible subtypes and arch-specific/os-specific knowledge.

One could dump raw hex and be able to use the information, for example.
The entries would always be PT_MEMTAG, as opposed to PT_ARM_MEMTAG_MTE,
PT_SPARC_MEMTAG_ADI and so on.

Peter Collingbourne

Oct 25, 2021, 8:32:47 PM
to gener...@googlegroups.com
Hi Luis,

I'm not sure I understand this proposal. p_flags is a set of flags prefixed PF_*, so I don't think you can OR PT_* constants into it. Or do you intend for architectures to do e.g.

#define PF_ARM_MEMTAG_MTE 8

In that case I think we will need to specify an arch-specific carveout to the flag space in p_flags.

Also what would be the relationship between the p_*sz fields and the memory being described? MTE for example has (depending on representation) a 32-to-1 or 16-to-1 relationship between tags and data, so for example we could say that p_filesz contains the size of the metadata representation and p_memsz contains the size of the memory being described (e.g. for MTE, p_memsz could be 16 * p_filesz) but I think this would need to be specified in order for this new segment type to be useful for generic tools.
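
The size relationship Peter describes could be sketched as follows (a hypothetical helper, not specified ABI; the granule size is MTE's architectural 16 bytes):

```c
#include <stdint.h>

/* MTE associates one 4-bit tag with each 16-byte granule of memory.
   With one tag stored per metadata byte the data-to-metadata ratio is
   16-to-1; with two tags packed per byte it is 32-to-1.  A generic
   tool could then recover the described memory size from p_filesz,
   if the spec fixed the representation. */
#define MTE_GRANULE_SIZE 16u

static uint64_t described_memsz(uint64_t p_filesz, unsigned tags_per_byte)
{
    return p_filesz * tags_per_byte * MTE_GRANULE_SIZE;
}
```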

Peter

Luis Machado

Oct 25, 2021, 9:11:39 PM
to gener...@googlegroups.com
Hi Peter,

On 10/25/21 9:32 PM, 'Peter Collingbourne' via Generic System V
Application Binary Interface wrote:
> On Mon, Oct 25, 2021 at 5:32 AM Luis Machado <luis.m...@linaro.org
> <mailto:luis.m...@linaro.org>> wrote:
>
> Hi,
>
> AArch64/CHERI needs to eventually dump memory tags/capability tags
> to a core file. After some discussion, it seems the best way to do
> it is through a new PT_* segment type.
>
> It seems a generic arch-independent and OS-independent PT_MEMTAG
> constant is the best option for this use case. Then each arch/OS
> will define a particular subtype that suits their needs (by using
> the p_flags field).
>
> For example, AArch64 will have a PT_MEMTAG + PT_ARM_MEMTAG_MTE pair,
> Sparc may have a PT_MEMTAG + PT_SPARC_MEMTAG_ADI pair and CHERI may
> have a PT_MEMTAG + PT_CHERI_MEMTAG_CAPABILITY type.
>
> Given the above, could we please have one such constant allocated
> that will be named PT_MEMTAG?
>
>
> Hi Luis,
>
> I'm not sure I understand this proposal. p_flags is a set of flags
> prefixed PF_*, so I don't think you can OR PT_* constants into it. Or do
> you intend for architectures to do e.g.

Let me try to clarify things. I originally envisioned dumping memory
tags (for ARM MTE) as an additional note type (NT_MEMTAG). But, although
this is convenient to do from the debugger's perspective, the kernel
developers consider the use of a new PT_* segment more appropriate. Plus
we have some type size limitations with NT_* headers when handling
64-bit memory spaces.

The initial idea was to have arch-specific PT_* segments. For example,
PT_ARM_MEMTAG_MTE for ARM MTE and PT_SPARC_MEMTAG_ADI for SPARC's ADI.

For the CHERI architecture though, the new PT_* segment isn't
arch-specific and maybe not even OS-specific. But it does have
differences between 32-bit and 64-bit architectures, where 32-bit
architectures have a 64-bit stride for each CHERI capability tag and
64-bit architectures have a 128-bit stride.

Even though ARM MTE and SPARC ADI should be fine with an arch-specific
p_type and then deriving the required information from this p_type,
CHERI wouldn't work so well. I mean, we could define
PT_CHERI_MEMTAG_TAG_64 and PT_CHERI_MEMTAG_TAG_128, but having multiple
such p_type constants seems less than ideal. It is a bit difficult to
manage these constants and avoid overlaps.

Then came the idea of having a single new p_type (PT_MEMTAG) and then
attempting to encode the metadata by (ab)using some of the unused
fields, like p_flags and potentially p_offset.

p_flags would hold a subtype that tells us more information about a
particular memory tagging mechanism. For CHERI, we could use that to
store the stride length, for example.

So...

>
> #define PF_ARM_MEMTAG_MTE 8

Yes. We'd use p_flags as a way to store additional metadata for a
particular PT_MEMTAG segment. We wouldn't use the p_flags field for
flags at all.

>
> In that case I think we will need to specify an arch-specific carveout
> to the flag space in p_flags.

I suppose. We already define various such carveouts for the register
sets. Maybe we could use that?

>
> Also what would be the relationship between the p_*sz fields and the
> memory being described? MTE for example has (depending on

p_filesz holds the number of tag bytes. For MTE we're packing the tags
2 per byte, so the relationship is 32-to-1. For CHERI 128-bit
architectures we'd have a relationship of 128-to-1.

p_memsz holds the original length of the memory range that contained the
tags. We need this so debuggers can navigate through the various memory
ranges to extract tags.
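
A sketch of how a debugger might look up one tag in such a dump, assuming the packed two-tags-per-byte layout described above (the low-nibble-first ordering is an illustration assumption, not something specified in this thread):

```c
#include <stdint.h>

/* One 4-bit MTE tag per 16-byte granule, two tags packed per metadata
   byte, low nibble holding the even-numbered granule (assumed order). */
static uint8_t mte_tag_at(const uint8_t *tag_data, uint64_t seg_vaddr,
                          uint64_t addr)
{
    uint64_t granule = (addr - seg_vaddr) / 16;
    uint8_t packed = tag_data[granule / 2];
    return (granule & 1) ? (packed >> 4) : (packed & 0x0f);
}
```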

> representation) a 32-to-1 or 16-to-1 relationship between tags and data,
> so for example we could say that p_filesz contains the size of the
> metadata representation and p_memsz contains the size of the memory
> being described (e.g. for MTE, p_memsz could be 16 * p_filesz) but I
> think this would need to be specified in order for this new segment type
> to be useful for generic tools.
If we want to get to that level of usability, yes. Honestly I just
wanted tools like readelf to not dump some mysterious "PT_LOPROC +
<constant>" information. If we see a PT_MEMTAG output, that tells us
pretty clearly what the contents are.

Also, if we decide to establish a relationship between p_filesz and
p_memsz, that takes away some of the freedom for implementors to
compress tag data with whatever mechanism they think is appropriate.

While use cases for MTE usually don't require a lot of tagged regions,
CHERI architectures will have lots and lots of capability tags
throughout memory. So compression may be useful there.

Given this is still under discussion, feedback would be greatly
appreciated. I feel it would be best if we could have a single mechanism
to describe all or most of the memory tagging mechanisms available so
far, as opposed to having each system define its own constants.

Hopefully I managed to answer your questions.

Michael Matz

Oct 27, 2021, 11:05:52 AM
to 'Peter Collingbourne' via Generic System V Application Binary Interface
Hello,

On Mon, 25 Oct 2021, 'Peter Collingbourne' via Generic System V
Application Binary Interface wrote:

> On Mon, Oct 25, 2021 at 5:32 AM Luis Machado <luis.m...@linaro.org>
> wrote:
>
> > Hi,
> >
> > AArch64/CHERI needs to eventually dump memory tags/capability tags to a
> > core file. After some discussion, it seems the best way to do it is through
> > a new PT_* segment type.
> >
> > It seems a generic arch-independent and OS-independent PT_MEMTAG constant
> > is the best option for this use case. Then each arch/OS will define a
> > particular subtype that suits their needs (by using the p_flags field).
> >
> > For example, AArch64 will have a PT_MEMTAG + PT_ARM_MEMTAG_MTE pair, Sparc
> > may have a PT_MEMTAG + PT_SPARC_MEMTAG_ADI pair and CHERI may have a
> > PT_MEMTAG + PT_CHERI_MEMTAG_CAPABILITY type.
> >
> > Given the above, could we please have one such constant allocated that
> > will be named PT_MEMTAG?
> >
>
> Hi Luis,
>
> I'm not sure I understand this proposal. p_flags is a set of flags prefixed
> PF_*, so I don't think you can OR PT_* constants into it. Or do you intend
> for architectures to do e.g.
>
> #define PF_ARM_MEMTAG_MTE 8
>
> In that case I think we will need to specify an arch-specific carveout to
> the flag space in p_flags.

That's how I understood Luis' proposal (with the "PT_MEMTAG + type" merely
being the syntax used by readelf for printing; i.e. it could also be
"PT_MEMTAG(type)" or suchlike). Note that there are already CPU and OS
carveouts for p_flags:

#define PF_MASKOS 0x0ff00000 /* OS-specific */
#define PF_MASKPROC 0xf0000000 /* Processor-specific */

It would seem to me that with the proposal 8/4 bits are enough to specify
the real type.
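
A reader using those carveouts might recover the subtype like this (the encoding is purely illustrative, not an agreed convention):

```c
#include <stdint.h>

/* The existing p_flags carveouts from the gABI. */
#define PF_MASKOS   0x0ff00000u  /* OS-specific */
#define PF_MASKPROC 0xf0000000u  /* processor-specific */

/* Sketch: if a psABI stashed a memtag subtype in its 4 PF_MASKPROC
   bits, a generic reader would extract it with the mask and shift. */
static uint32_t memtag_subtype(uint32_t p_flags)
{
    return (p_flags & PF_MASKPROC) >> 28;
}
```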

> Also what would be the relationship between the p_*sz fields and the
> memory being described? MTE for example has (depending on
> representation) a 32-to-1 or 16-to-1 relationship between tags and data,
> so for example we could say that p_filesz contains the size of the
> metadata representation and p_memsz contains the size of the memory
> being described (e.g. for MTE, p_memsz could be 16 * p_filesz) but I
> think this would need to be specified in order for this new segment type
> to be useful for generic tools.

I would assume most segment header fields should describe ELF container
properties. About the only one that might be open to interpretation would
be p_memsz. That interpretation would be either per
(say) p_flags.pf_tag_type or just be the obvious choice: same as p_filesz.
I guess some of these tag byte blobs would also be self-describing (i.e.
contain a header), so that the info from the ELF segment header isn't
too important to specify per arch/os.

I imagine that it would be enough for the proposal to simply say that
p_memsz is implementation defined or the same as p_filesz.


Ciao,
Michael.

Cary Coutant

Oct 28, 2021, 6:07:07 PM
to Generic System V Application Binary Interface
I'm unconvinced that PT_MEMTAG makes sense as a standard gABI feature,
but even as an extension, I'm worried about the direction you're
going.

Using a single type code to identify a segment that has similar
purpose but not a common interpretation -- i.e., if the best a dumper
can do is hex dump the contents -- doesn't make sense to me. Further,
there really isn't an established mechanism for using a hierarchical
type/subtype, and repurposing the p_flags field as a subtype is (I
think) a bad idea: now, you have to teach the dumpers that, for this
segment type, the p_flags field isn't actually a flags field. Using
any of the other fields isn't any better, unless you can infer what
you need to know from the natural uses of those fields -- for example,
from the relationship between p_filesz and p_memsz.

Based on what I understand from this thread so far, I'd suggest
reserving a value in the LOPROC/HIPROC space that can be shared across
all processors (it's a big space). In the processor-specific space,
readers can then check the e_machine value to further disambiguate the
format (essentially serving as the subtype), and you'll have the
common PT_ value you want. From what you said earlier, it sounds like
the 64- vs. 128-bit aspect can be inferred from the ELF file container
size (ELFCLASS32 vs. ELFCLASS64).
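
The ELFCLASS inference could be sketched as follows (the e_ident constants match the gABI layout; the stride mapping follows the 64-bit/128-bit description earlier in the thread):

```c
/* Standard e_ident indices and class values from the gABI. */
#define EI_CLASS   4
#define ELFCLASS32 1
#define ELFCLASS64 2

/* Sketch: derive the CHERI capability-tag stride from the ELF
   container class alone, per Cary's suggestion. */
static unsigned cheri_tag_stride_bits(const unsigned char *e_ident)
{
    return e_ident[EI_CLASS] == ELFCLASS64 ? 128u : 64u;
}
```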

Even if you do need more than one PT_ value, I really don't see why
it's such a big inconvenience.

However, if you do pick a single value in the processor-specific range
to be shared across all processor architectures, I think I'd be
willing to list that in the ELF spec. I think it's a reasonable place
to document inter-architecture commonalities, even if they aren't
gABI-blessed.

-cary

Luis Machado

Oct 28, 2021, 8:06:18 PM
to gener...@googlegroups.com, Cary Coutant
Hi Cary,

On 10/28/21 7:06 PM, Cary Coutant wrote:
> I'm unconvinced that PT_MEMTAG makes sense as a standard gABI feature,
> but even as an extension, I'm worried about the direction you're
> going.

The number of architectures supporting some form of memory tagging is
admittedly still small. There are two concrete examples (ARM's MTE and
SPARC's ADI) and a few research examples, such as CUCL's CHERI
architecture and ARM's Morello, where tags are used as part of capabilities.

I have discussed this with developers from ARM, Oracle (to some extent)
and CHERI. There seems to be agreement on using a new segment type for
dumping the tags.

However, there is less consensus on whether we should use a new segment
type for each individual variant of memory tagging system or if we
should go with a single common segment type like PT_MEMTAG.

Personally I like the use of a common PT_MEMTAG segment, but I'm open to
changing my mind about it.

>
> Using a single type code to identify a segment that has similar
> purpose but not a common interpretation -- i.e., if the best a dumper
> can do is hex dump the contents -- doesn't make sense to me. Further,

I suppose that is a bit subjective. Having segments with a similar
purpose and semantics can be enough to get them grouped together under a
common type. It also simplifies the implementation, as the code that
handles the common type will be the same. Arch-specific code can then
dump a blob of data to the PT_MEMTAG segment, formatted/compressed in an
appropriate way.

But I see where this may not be generic enough to go into the gABI. The
contents and potentially some of the fields would be opaque with
arch-specific meaning.

> there really isn't an established mechanism for using a hierarchical
> type/subtype, and repurposing the p_flags field as a subtype is (I
> think) a bad idea: now, you have to teach the dumpers that, for this
> segment type, the p_flags field isn't actually a flags field. Using

I agree, and that's not ideal. The structure lacks flexibility to be
able to hold metadata properly, and that's why I'd like to get some
level of consensus. If (ab)using these fields is not an option, then I'd
like to discuss other ways to accommodate the needs for architectures
for dumping memory tags.

> any of the other fields isn't any better, unless you can infer what
> you need to know from the natural uses of those fields -- for example,
> from the relationship between p_filesz and p_memsz.

When I mentioned tools that can list/dump these PT_MEMTAG segments, I
had objdump/readelf in mind, tools that provide a quick way to see the
contents of a particular ELF file.

I don't think there is value in interpreting the tags contained in a
PT_MEMTAG segment, in which case these tools wouldn't need to go
reinterpreting p_flags/p_offset etc. But developers might find it useful
if the tools show a dump and at least a readable name for the segment.
(PT_LOPROC + 0x00000001) certainly isn't a desirable name.

It is also not desirable to have to build a cross tool so it can show
the name of a particular segment properly.

On inferring the data from p_filesz/p_memsz and other fields, that might
work. But dictating those relationships removes some of the flexibility
for architectures to compress the tag dump as they see fit. Then again,
it might be a bit of overengineering.

If we can make it so all the fields have well-defined interpretations,
like inferring the tag stride from p_memsz/p_filesz, do you think it
would be acceptable?

>
> Based on what I understand from this thread so far, I'd suggest
> reserving a value in the LOPROC/HIPROC space that can be shared across
> all processors (it's a big space). In the processor-specific space,
> readers can then check the e_machine value to further disambiguate the
> format (essentially serving as the subtype), and you'll have the
> common PT_ value you want. From what you said earlier, it sounds like
> the 64- vs. 128-bit aspect can be inferred from the ELF file container
> size (ELFCLASS32 vs. ELFCLASS64).

While that might work for most architectures, it wouldn't work for an
architecture that supports multiple types of memory tags. You could have
ELFCLASS64 / PT_MEMTAG (say, PT_LOPROC + 1) / EM_AARCH64 and still have
to disambiguate between, say, two different types of memory tags (MTE
and capability tags, for example). It gets more complex to determine the
final memory tag dump type, since you need to check more fields. I think
it is more error prone in the end.

Alternatively, we could drop the common type and start populating the
PT_LOPROC range with multiple small scope constants. As you said, it is
a big space. For example:

PT_LOPROC + 1 for ARM's MTE
PT_LOPROC + 2 for SPARC's ADI
PT_LOPROC + 3 for CHERI's capability tags
...

That means we need to update the code whenever a new constant is added
to the list. But otherwise the types are pretty clear and non-ambiguous.
And you don't need to (ab)use the program header structure's fields.
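
The update burden being weighed is roughly one table entry per constant, along these lines (values mirror the hypothetical list above and are not allocated; in the real processor-specific space they would also be scoped by e_machine, which is glossed over here):

```c
#include <stdint.h>

#define PT_LOPROC 0x70000000u

/* Hypothetical per-arch constants -- none of these are allocated. */
#define PT_ARM_MEMTAG_MTE    (PT_LOPROC + 1)
#define PT_SPARC_MEMTAG_ADI  (PT_LOPROC + 2)
#define PT_CHERI_MEMTAG_CAP  (PT_LOPROC + 3)

/* Every new constant needs one more entry here before tools stop
   printing a raw "PT_LOPROC + n" value. */
static const char *memtag_type_name(uint32_t p_type)
{
    switch (p_type) {
    case PT_ARM_MEMTAG_MTE:   return "PT_ARM_MEMTAG_MTE";
    case PT_SPARC_MEMTAG_ADI: return "PT_SPARC_MEMTAG_ADI";
    case PT_CHERI_MEMTAG_CAP: return "PT_CHERI_MEMTAG_CAP";
    default:                  return "PT_LOPROC+<unknown>";
    }
}
```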

>
> Even if you do need more than one PT_ value, I really don't see why
> it's such a big inconvenience.

I wouldn't say it is a big inconvenience, but a single type seems
cleaner to me. Maybe that is also a subjective matter.

>
> However, if you do pick a single value in the processor-specific range
> to be shared across all processor architectures, I think I'd be
> willing to list that in the ELF spec. I think it's a reasonable place
> to document inter-architecture commonalities, even if they aren't
> gABI-blessed.

That's good to know. Before we get anything upstreamed, I think it is
best to get this potentially documented in the gABI. After the constants
are picked and upstreamed, it gets more difficult to modify the design.

Florian Weimer

Oct 29, 2021, 4:31:27 AM
to Luis Machado, gener...@googlegroups.com, Cary Coutant
* Luis Machado:

> Hi Cary,
>
> On 10/28/21 7:06 PM, Cary Coutant wrote:
>> I'm unconvinced that PT_MEMTAG makes sense as a standard gABI feature,
>> but even as an extension, I'm worried about the direction you're
>> going.
>
> The number of architectures supporting some form of memory tagging is
> still not that great for sure.

We have more page-associated data that isn't carried over into core
dumps: protection keys (POWER and x86), sub-page protection flags (POWER
and probably others), and various mmap and madvice settings.

So it's only rare if you restrict yourself to memory tagging in the very
narrow sense.

Thanks,
Florian

Luis Machado

Oct 29, 2021, 8:40:37 AM
to Florian Weimer, gener...@googlegroups.com, Cary Coutant
That's true. But no common definition has been put in place so far.
Usually extra data gets dumped to notes, but memory tags involve
potentially much more data, and they are similar enough to PT_LOAD
segments that it seems to make sense to have a common definition.

Are you implying we should broaden the scope of these new segments to
include the other data you listed above?

Florian Weimer

Oct 29, 2021, 8:44:09 AM
to Luis Machado, gener...@googlegroups.com, Cary Coutant
* Luis Machado:

> That's true. But no common definition has been put in place so
> far. Usually extra data gets dumped to notes, but memory tags have
> potentially much more data and similarities to PT_LOAD segments that
> it seems to make sense to have a common definition.
>
> Are you implying we should broaden the scope of these new segments to
> include the other data you listed above?

I think it would make sense to define a generic way to associate
additional data with a load segment. At least the load
segment/ancillary data association is fully generic.

Thanks,
Florian

Luis Machado

Oct 29, 2021, 8:51:28 AM
to gener...@googlegroups.com, Florian Weimer, Cary Coutant
Making things more generic runs into the lack of flexibility of the
program header structure. It would probably require a header in the data
dumps, something we have been trying to avoid due to the extra complexity.

Florian Weimer

Oct 29, 2021, 8:59:49 AM
to Luis Machado, gener...@googlegroups.com, Cary Coutant
* Luis Machado:
Right, that's why I suggest making at least the way of associating
things generic.

We could add a new PT_LOAD_INFO segment that contains the number of a
PT_LOAD segment, followed by a nested tag, followed by arbitrary data
described by that nested tag.

Thanks,
Florian

Ali Bahrami

Nov 2, 2021, 12:57:20 AM
to gener...@googlegroups.com
[I sent a reply a couple of days ago, but botched the
"from" address, so it didn't register, and I didn't get
a bounce. Sorry for the delay]

The proposal is for a generic PT_MEMTAG program header,
where flags are assigned to indicate the actual memtag type.
The examples given were ARM, SPARC, and CHERI, but there's
the possibility of more, plus it's been suggested that a
given architecture might support more than one of these.
And the use of the generic type is expected to be that
dumping programs would use that as a hint to show an
octal/hex dump if they don't understand the actual format
well enough to decode it?

I have several thoughts/questions:

- The ELF dumper that I maintain (elfdump)
produces hex dumps for a few things that it
doesn't understand. In my experience, this
is pretty useless to the reader, and in fact,
I've been slowly eliminating it where I can.
I don't think that's a very compelling selling
point.

- I don't think ELF should carry around types or
flags solely to give dump programs hints. In
the rare case where such output would be useful,
there's nothing wrong with the dump program having
a specific check for it.

- There aren't that many flag bits, and using a
bunch of them for sub-memtag types means that
we could run out of flag room eventually.

- I'm confused about how a given processor would
support more than one such memtag format. Surely
that's unlikely in most cases, so burdening all
implementations with all those flags seems
high overhead.

If there's a way to generalize these notions so that one
format (no subtype) could handle all of them, and any likely
other ones that come along, then a generic PT_MEMTAG would
make sense. That seems like a tall order though --- there's
such a thing as over generalization, plus we just can't know
what sort of crazy new invention is just around the corner
that would not be able to conform.

If however, these are really different (though conceptually
similar) records, then I see no real benefit to pulling them
together in one generic type, and then introducing sub-types.
In such a case, it's better for each psABI to just define
what it needs directly, as a PT_xxx_ (where xxx is
initially ARM, SPARC, and CHERI) value in the LOPROC/HIPROC.
Platforms that support more than one can define more than
one such PT_xxx value.

- Ali

Luis Machado

Nov 2, 2021, 9:48:37 AM
to gener...@googlegroups.com, Ali Bahrami
Hi,

Thanks for the feedback!

On 11/2/21 1:57 AM, Ali Bahrami wrote:
> [I sent a reply a couple of days ago, but botched  the
> "from" address, so it didn't register, and I didn't get
> a bounce. Sorry for the delay]
>
>    The proposal is for a generic PT_MEMTAG program header,
> where flags are assigned to indicate the actual memtag type.
> The examples given were ARM, SPARC, and CHERI, but there's
> the possibility of more, plus it's been suggested that a
> given architecture might support more than one of these.
> And the use of the generic type is expected to be that
> dumping programs would use that as a hint to show an
> octal/hex dump if they don't understand the actual format
> well enough to decode it?
>
> I have several thoughts/questions:
>
>     - The ELF dumper that I maintain (elfdump)
>       produces hex dumps for a few things that it
>       doesn't understand. In my experience, this
>       is pretty useless to the reader, and in fact,
>       I've been slowly eliminating it where I can.
>       I don't think that's a very compelling selling
>       point.

That's fair, but my goal with a generic PT_MEMTAG p_type is not the hex
dump itself, but rather showing a helpful name as opposed to something
like PT_LOPROC + 0x1.

If you have a single p_type, then it means the code only needs to be
updated once. Otherwise, with every new p_type, you need to update the
code to handle that. More than once I've seen tools lagging behind and
not supporting the latest constants (GNU objdump/readelf for example).

It's not a big issue, but an unnecessary burden.

Besides that, debuggers are the tools that need to know most of these
details, since they need to read/write this data. If we pick multiple
p_type's, then it is more complexity for debuggers to try to figure out
what kind of data they are working with.

>
>     - I don't think ELF should carry around types or
>       flags solely to give dump programs hints. In
>       the rare case where such output would be useful,
>       there's nothing wrong with the dump program having
>       a specific check for it.

As I mentioned before, this is not solely for dump programs. Actually,
the debuggers are the most important tools in this particular situation.
The example about dumping/showing the p_type is just a small improvement
for convenience.

>
>     - There aren't that many flag bits, and using a
>       bunch of them for sub-memtag types means that
>       we could run out of flag room eventually.

That's fair, but I find it unlikely we will run out of flag bits for
this particular purpose. And I'm not so sure using the field as flags is
the best solution. It would have to carry a constant rather than a flag bit.

>
>     - I'm confused about how a given processor would
>       support more than one such memtag format. Surely
>       that's unlikely in most cases, so burdening all
>       implementations with all those flags seems
>       high overhead.

One such example is the Morello (CHERI) architecture, with capability
tags, being able to support MTE tags as well. Though unlikely, I don't
think it is reasonable to just ignore this data point.

>
> If there's a way to generalize these notions so that one
> format (no subtype) could handle all of them, and any likely
> other ones that come along, that a generic PT_MEMTAG would
> make sense. That seems like a tall order though --- there's
> such a thing as over generalization, plus we just can't know
> what sort of crazy new invention is just around the corner
> that would not be able to conform.

I agree that over generalization can be bad, but so is forcing code
duplication and maintenance for the sake of not generalizing even a
little. Some tools have departed from the usual C, so increased
generalization (not over generalization) is getting more and more common.

>
> If however, these are really different (though conceptually
> similar) records, then I see no real benefit to pulling them
> together in one generic type, and then introducing sub-types.
> In such a case, it's better for each psABI to just define
> what it needs directly, as a PT_xxx_ (where xxx is
> initially ARM, SPARC, and CHERI) value in the LOPROC/HIPROC.
> Platforms that support more than one can define more than
> one such PT_xxx value.

Sure. We have considered that approach. It is actually easier to go that
route, but we may end up with code that is not as easily maintainable as
it could have been.

The risk of using multiple very specific constants is having duplicated
code to handle essentially the same semantics. Whether duplication is
good or bad is subjective, of course. Your mileage may vary.

>
> - Ali
>

Ali Bahrami

Nov 2, 2021, 6:12:41 PM
to Luis Machado, gener...@googlegroups.com

On 11/2/21 7:48 AM, Luis Machado wrote:
> That's fair, but my goal with a generic PT_MEMTAG p_type is not the hex dump itself, but rather showing a helpful name as opposed to something like PT_LOPROC + 0x1.
>
> If you have a single p_type, then it means the code only needs to be updated once. Otherwise, with every new p_type, you need to update the code to handle that. More than once I've seen tools lagging behind and not supporting the latest constants (GNU objdump/readelf for example).

OK, I see. Don't you think though, that if a p_type shows
up as a hex number, and that bothers more than a couple of
people, that the tool that displays it would get fixed pretty
quickly? If not, doesn't that tend to suggest that it's not
really a problem that bothers anyone? Given that recognizing
such sections well enough to print a proper name is little
more than an entry in a table, or a switch statement, I think
it's going to get fixed quickly.

Going a step further, why shouldn't the person introducing
that new type also fix up things like objdump/readelf and
submit patches to their maintainers?


> It's not a big issue, but an unnecessary burden.

>
> Besides that, debuggers are the tools that need to know most of these details, since they need to read/write this data. If we pick multiple p_type's, then it is more complexity for debuggers to try to figure out what kind of data they are working with.

I don't see any difference to the cost for debuggers.

In either case, they need to locate the data, and understand
it well enough to use it. Finding it as a separate psABI
section, or as a subtype of a generic one, seems about
the same amount of work.

Perhaps a bad analogy, but whether we have 3 boxes of "stuff",
or 3 sub-boxes nested inside an outer box, the stuff in each
box is roughly the same, and maintaining it would be about
the same amount of work in each case.


> Sure. We have considered that approach. It is actually easier to go that route, but we may end up with code that is not as easily maintainable as it could have been.

Can you say more about why processing data from a standalone
section is less maintainable than processing data that comes
as a sub-type of a common memtag section? I really don't see
much of a hit to maintenance there. In either case, the way
to make the code maintainable is to provide a single implementation
that can be called by multiple consumers.

>
> The risk of using multiple very specific constants is having duplicated code to handle essentially the same semantics. Whether duplication is good or bad is subjective, of course. Your mileage may vary.

I don't like code duplication, but I think a reference implementation
(libmemtag?) that provides support for the various flavors, possibly
using common code where appropriate internally, might let you process
these sections from common code, while maintaining distinct ELF
types in the psABI.

Thanks!

- Ali

Luis Machado

Nov 15, 2021, 7:18:33 AM
to Generic System V Application Binary Interface
Hi,

Thanks for all the feedback. We've decided to go with multiple
arch-specific constants from the PT_LOPROC/PT_HIPROC range
(PT_ARM_MEMTAG_MTE, PT_CHERI_MEMTAG etc) as opposed to a single generic
constant in that range (PT_MEMTAG).

Do we need to reserve them (to prevent clashes) or should we just define
the ones we need and use them?

Florian Weimer

Nov 15, 2021, 7:38:06 AM
to Luis Machado, Generic System V Application Binary Interface
* Luis Machado:

> Do we need to reserve them (to prevent clashes) or should we just
> define the ones we need and use them?

I think you need to document them in the psABI supplement for the
architecture, so there's no gABI impact.

It would also be nice to submit patches to the various elf.h curators
(binutils, glibc in the GNU context, perhaps elfutils). This will
inform those projects that dumpers need updating to decode those
constants.

Thanks,
Florian

Luis Machado

Nov 15, 2021, 7:42:32 AM
to gener...@googlegroups.com, Florian Weimer
On 11/15/21 9:37 AM, Florian Weimer wrote:
> * Luis Machado:
>
>> Do we need to reserve them (to prevent clashes) or should we just
>> define the ones we need and use them?
>
> I think you need to document them in the psABI supplement for the
> architecture, so there's no gABI impact.

That's planned. I wanted to make sure we didn't have to coordinate
things with the gABI first. Thanks for confirming.

>
> It would also be nice to submit patches to the various elf.h curators
> (binutils, glibc in the GNU context, perhaps elfutils). This will
> inform those projects that dumpers need updating to decode those
> constants

Will do. I'll handle GDB and Binutils myself, but will also get in touch
with other consumers/users so everything is in sync. Kernel changes are
also on the way.

Mark Wielaard

Nov 15, 2021, 7:43:45 AM
to gener...@googlegroups.com, Luis Machado
On Mon, 2021-11-15 at 13:37 +0100, Florian Weimer wrote:
> I think you need to document them in the psABI supplement for the
> architecture, so there's no gABI impact.
>
> It would also be nice to submit patches to the various elf.h curators
> (binutils, glibc in the GNU context, perhaps elfutils).

No need to submit a patch for elfutils specifically. elfutils will sync
with the glibc elf.h update when it happens.

Cheers,

Mark