"B" extension and misa register?

David Weaver

Jun 2, 2022, 12:05:19 PM

to RISC-V ISA Dev

Everything I earlier read suggested that misa[1] (its "B" bit) should be set to ‘1’ when the bit-manipulation extension is implemented. However, I was just verbally told that misa[1] should not be set, even if all subsets of “B” extension are implemented(!?), that there is or was going to be some other mechanism (that did not yet exist) for detecting which portions of the “B” extension are implemented.

That makes it sound like the whole "misa" story is unravelling and we need a more elaborate mechanism to identify which extension (and extension subsets) are implemented.

Can someone point me at a document that clarifies the current state of this? Thanks!

Greg Favor

Jun 2, 2022, 12:19:57 PM

to David Weaver, RISC-V ISA Dev

On Thu, Jun 2, 2022 at 9:05 AM David Weaver <dweaver...@gmail.com> wrote:

Everything I earlier read suggested that misa[1] (its "B" bit) should be set to ‘1’ when the bit-manipulation extension is implemented. However, I was just verbally told that misa[1] should not be set, even if all subsets of “B” extension are implemented(!?),

"All subsets of B" is a moving target. For example, a first set of B extensions were ratified early last year, and then some further ones were ratified at the end of the year. And there could very well be more in the future.

that there is or was going to be some other mechanism (that did not yet exist) for detecting which portions of the “B” extension are implemented.

A "low-level unified discovery method" is being developed to support discovery of things like this and more (across the RISC-V ISA).

That makes it sound like the whole "misa" story is unravelling and we need a more elaborate mechanism to identify which extension (and extension subsets) are implemented.

It has been recognized that the number of letters in the alphabet pale in comparison to the number of extensions that have been ratified and will be ratified in coming years. (Plus the usefulness is unclear of, for example, a misa.B that is set only when some definition of "all" B extensions are supported.) Which is one of the motivations for development of the aforementioned standard discovery method. (The tech-config TG started development of this, but it is currently being re-org'ed a little and restarted.)

Greg
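
For reference, the misa encoding being discussed can be sketched in a few lines. In the privileged spec, the low 26 bits of misa hold one bit per letter ('A' is bit 0, so 'B' is bit 1), and the top two bits hold MXL (1 = 32-bit, 2 = 64-bit, 3 = 128-bit). The helper names and the example value below are illustrative assumptions, not taken from any real toolchain:

```python
def misa_bit(letter: str) -> int:
    """Bit index of a single-letter extension in misa ('A' -> 0, 'B' -> 1, ...)."""
    if len(letter) != 1 or not letter.isascii() or not letter.isalpha():
        raise ValueError(f"not a single-letter extension: {letter!r}")
    return ord(letter.upper()) - ord("A")


def misa_has(misa: int, letter: str) -> bool:
    """True if the misa bit for the given letter is set."""
    return bool((misa >> misa_bit(letter)) & 1)


def misa_letters(misa: int) -> str:
    """All letters whose bits are set, in alphabetical order."""
    return "".join(chr(ord("A") + i) for i in range(26) if (misa >> i) & 1)


def make_misa(letters: str, mxl: int = 2, xlen: int = 64) -> int:
    """Build an example misa value (hypothetical helper, for illustration only)."""
    value = mxl << (xlen - 2)  # MXL lives in the top two bits
    for letter in letters:
        value |= 1 << misa_bit(letter)
    return value


example = make_misa("IMAFDC")  # an RV64 core with no 'B' bit set
print(misa_letters(example), misa_has(example, "B"))
```

Whatever value misa[1] takes, it is still a single bit, so it cannot say which subsets of the bit-manipulation extensions are present. That is the gap the discovery mechanism mentioned above is meant to fill.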

BGB

Jun 2, 2022, 3:08:07 PM

to isa...@groups.riscv.org

On 6/2/2022 11:19 AM, Greg Favor wrote:
> On Thu, Jun 2, 2022 at 9:05 AM David Weaver <dweaver...@gmail.com>
> wrote:
>
> Everything I earlier read suggested that misa[1] (its "B" bit)
> should be set to ‘1’ when the bit-manipulation extension is
> implemented. However, I was just verbally told that misa[1]
> should not be set, even if all subsets of “B” extension are
> implemented(!?),
>
>
> "All subsets of B" is a moving target. For example, a first set of B
> extensions were ratified early last year, and then some further ones
> were ratified at the end of the year. And there could very well be
> more in the future.
>
> that there is or was going to be some other mechanism (that did
> not yet exist) for detecting which portions of the “B” extension
> are implemented.
>
>
> A "low-level unified discovery method" is being developed to support
> discovery of things like this and more (across the RISC-V ISA).
>
> That makes it sound like the whole "misa" story is unravelling and
> we need a more elaborate mechanism to identify which extension
> (and extension subsets) are implemented.
>
>
> It has been recognized that the number of letters in the alphabet pale
> in comparison to the number of extensions that have been ratified and
> will be ratified in coming years. (Plus the usefulness is unclear of,
> for example, a misa.B that is set only when some definition of "all" B
> extensions are supported.) Which is one of the motivations for
> development of the aforementioned standard discovery method. (The
> tech-config TG started development of this, but it is currently being
> re-org'ed a little and restarted.)
>

Yeah, the use of letters doesn't scale very well.
It works for terse names and config strings, but has its limits.

The pattern seems to be that letters are pretty much used up, so nearly
everything else ends up as Zxxx and similar.

Could have maybe made it case-sensitive, and then say:
Lower letters represent the last letter of an extension name;
Upper case letters mean that more letters follow.
This would have allowed for multi-letter extension names (without
needing to prefix everything with 'Z').
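
A rough sketch of how such a case-sensitive string could be split, assuming only the rule described above (a lower-case letter ends a name, upper-case letters mean more letters follow). The extension names in the example are made up for illustration:

```python
def split_names(config: str) -> list[str]:
    """Split a config string into extension names.

    Rule (as described above): upper-case letters mean "more letters
    follow"; a lower-case letter is the last letter of a name.
    """
    names = []
    current = ""
    for ch in config:
        if not (ch.isascii() and ch.isalpha()):
            raise ValueError(f"unexpected character: {ch!r}")
        current += ch
        if ch.islower():  # lower-case terminates the current name
            names.append(current)
            current = ""
    if current:
        raise ValueError(f"unterminated name: {current!r}")
    return names


# 'i', 'm', 'a' are single-letter names; 'BITm' and 'VECx' are hypothetical
# multi-letter names.
print(split_names("imaBITmVECx"))
```

Single-letter names stay as terse as they are today, and multi-letter names need no prefix, at the cost of making the string case-sensitive.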

Then there is the issue that, in RISC-V, many of the extensions are very
large and complex. Say, RV64IM or similar is reasonably modest; go much
past this point, and the complexity of the extensions *explodes*.

Even something like 'A': To implement A, one needs to deviate from
strict Load/Store, because it has operations which operate directly on
memory. These also go beyond what is strictly needed for implementing
spinlocks or mutexes.

The 'F' and 'D' extensions add FPU registers, along with a whole
boatload of new instructions beyond the minimum required for
floating-point math (e.g., ADD/SUB/MUL, compare, and type conversion).

Zfinx and Zdinx are at least a little more reasonable here.
Never mind the issue of whether or not a minimal FPU would be standards
conformant.

Another option would be dividing extensions into smaller pieces, and
then identifying them with short names or FOURCC's, though how to probe
for extensions is less clear.

Bit Masks:
Requires selecting into N bit-mask registers, N likely to get ever
bigger as the number of extensions increases.

This doesn't scale ideally, and requires some centralized way to define
which bits correspond to which extensions.
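
As a sketch of the bit-mask approach: a central registry assigns each extension a global bit number, which then maps onto (register index, bit within register). The register width, extension names, and bit numbers below are all assumptions for illustration:

```python
BITS_PER_REG = 64  # assumed width of each bit-mask register

# Hypothetical central registry: extension name -> global bit number.
REGISTRY = {"Zxaa": 0, "Zxbb": 1, "Zxcc": 70}


def locate(name: str) -> tuple[int, int]:
    """Map an extension name to (register index, bit within register)."""
    return divmod(REGISTRY[name], BITS_PER_REG)


def has_extension(mask_regs: list[int], name: str) -> bool:
    """Check one extension against the values read from the mask registers."""
    reg, bit = locate(name)
    if reg >= len(mask_regs):  # hardware predates this register
        return False
    return bool((mask_regs[reg] >> bit) & 1)


def regs_needed() -> int:
    """How many mask registers the registry currently requires (N)."""
    return max(REGISTRY.values()) // BITS_PER_REG + 1


print(locate("Zxcc"), regs_needed())
```

Both drawbacks show up directly: `regs_needed()` only ever grows, and nothing works without everyone agreeing on `REGISTRY`.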

Array of 16-bit magic numbers (another possibility):
Much less dense than bit-masks;
Likely to effectively require a specialized ROM space (potentially kB).

Array of FOURCC's:
Basically the above, just with FOURCC's, needing twice the storage.

Both 16-bit numbers and FOURCC's still have the drawback of parties
needing to make an effort to avoid clashes.
This still implies having a registry somewhere of known extension IDs;
however, parties can operate mostly independently with a relatively low
probability of FOURCC clash.
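
A minimal sketch of the FOURCC-array idea, assuming the IDs are stored as packed little-endian 32-bit words in a ROM-like byte array that software scans linearly. The tags are invented for the example:

```python
import struct


def fourcc(tag: str) -> int:
    """Pack a four-character ASCII tag into a 32-bit little-endian value."""
    raw = tag.encode("ascii")
    if len(raw) != 4:
        raise ValueError(f"FOURCC must be exactly 4 characters: {tag!r}")
    return int.from_bytes(raw, "little")


def build_table(tags: list[str]) -> bytes:
    """Build the ROM image: one 32-bit word per implemented extension."""
    return b"".join(struct.pack("<I", fourcc(t)) for t in tags)


def probe(rom: bytes, tag: str) -> bool:
    """Linear scan of the ROM image for a given FOURCC."""
    want = fourcc(tag)
    return any(value == want for (value,) in struct.iter_unpack("<I", rom))


rom = build_table(["BMAN", "FPUS"])  # hypothetical extension tags
print(len(rom), probe(rom, "BMAN"), probe(rom, "VECT"))
```

Each entry costs 4 bytes, versus 2 for a 16-bit magic number, which is the "twice the storage" trade-off. In return, a tag can be picked without a central allocator and will usually not collide.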

However, FOURCC's are at least a bit more "open ended" than bit flags or
similar.

However, needing to avoid clashes in terms of opcode assignments is
still a problem, as this faces a similar issue to the bit-flags.
Sadly, there is no real viable way to "FOURCC the opcode space".

Well, unless maybe someone did something like an "encoding space
bank-select" mechanism:
Load magic numbers which specify encodings into some special registers;
Which encodings are interpreted is changed based on the magic numbers;
Program then dynamically reloads these special registers based on
which instructions it wants to use.
However, this would suck...
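
To make the cost concrete, here is a toy model of such a bank-select decoder. The magic numbers, opcodes, and mnemonics are all invented, and a real design would have several bank registers rather than one:

```python
# Hypothetical encoding sets, keyed by magic number. The same opcode value
# means different things depending on which set is selected.
BANK32 = 0x1001
BANK64 = 0x2002
ENCODING_SETS = {
    BANK32: {0x0: "add32", 0x1: "sub32"},
    BANK64: {0x0: "add64", 0x1: "sub64"},
}


class ToyDecoder:
    def __init__(self) -> None:
        self.bank = None   # currently selected encoding set
        self.reloads = 0   # how many times the bank register was rewritten

    def select(self, magic: int) -> None:
        if magic not in ENCODING_SETS:
            raise ValueError(f"unknown encoding set: {magic:#x}")
        if magic != self.bank:
            self.bank = magic
            self.reloads += 1

    def decode(self, opcode: int) -> str:
        if self.bank is None:
            raise RuntimeError("no encoding set selected")
        return ENCODING_SETS[self.bank][opcode]


def run(program):
    """Decode (magic, opcode) pairs; return the mnemonics and reload count."""
    dec = ToyDecoder()
    decoded = []
    for magic, opcode in program:
        dec.select(magic)
        decoded.append(dec.decode(opcode))
    return decoded, dec.reloads


mixed = [(BANK32, 0), (BANK64, 0), (BANK32, 1), (BANK64, 1)]
print(run(mixed))
```

Code that interleaves instructions from the two sets pays for a reload on every switch, which is where the wasted cycles come from.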

I once did something like this as an early attempt at a 64-bit version
of SuperH, which could bank-select between the 32-bit encodings and some
modified encodings for working with 64-bit values. This approach was
awkward and would have wasted a considerable number of clock cycles
mostly switching between sub-modes.

I later abandoned this for a partial redesign of the encoding space,
which lost some features, but gained the ability to fit everything into
a single encoding space (but would have still required mode-switching to
be able to run 32-bit code). However, this still turned into a horrible
mess (*1).

I eventually "hit the reset button" on the instruction encodings, which
effectively led to my current ISA design (after a few more encoding
tweaks and reorganizations). Though, at this point it is pretty much
beaten beyond recognition.

*1: Its original parent ISA was SH-4, which had fixed-length 16-bit
instructions. In some cases, prefix instructions were used to modify
the following instruction, but this led to a bunch of separate and
non-orthogonal encoding spaces.

The original form of my current ISA was built around a similar concept:
16/32, with the 32-bit space as prefixed 16-bit instructions;
However, this later changed, with the 32-bit encodings redefined as
the canonical ISA format;
And, then switched around encodings such that encodings were more
consistent between encoding blocks;
Effectively abandoning the concept that the 32-bit encodings were
modifier prefixes.

Though, most of my encodings ended up based around a ZZnm scheme rather
than ZnmZ (as in SuperH), but whether or not this was the best option in
retrospect is debatable (say, if one has a few "Znii" encodings, then Rn
is no longer in the same place).

Had I stuck with the original SuperH layout, then most of the 32-bit
instructions would likely have (eventually) ended up as:
ZnmZ_ZeoZ rather than ZZnm_ZeoZ (mostly organized around 4-bit
fields; with each character representing a hex digit field).
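
The field-placement point can be illustrated with a small nibble extractor, assuming the notation above: one pattern character per hex digit, most significant first, with 'Z' as opcode digits, 'n'/'m' as register fields, and 'i' as immediate digits. The instruction words are arbitrary example values:

```python
def fields(word: int, pattern: str) -> dict[str, int]:
    """Split a 16-bit word into fields named by a 4-character pattern.

    Digits that share a letter are concatenated, most significant first.
    """
    if len(pattern) != 4:
        raise ValueError("pattern must describe exactly 4 hex digits")
    out: dict[str, int] = {}
    for i, ch in enumerate(pattern):
        nibble = (word >> (4 * (3 - i))) & 0xF
        out[ch] = (out.get(ch, 0) << 4) | nibble
    return out


def digit_of(pattern: str, field: str) -> int:
    """Index (0 = most significant) of the first hex digit of a field."""
    return pattern.index(field)


print(fields(0x312C, "ZnmZ"))  # SuperH-style layout
print(fields(0x3C12, "ZZnm"))  # ZZnm-style layout
print(fields(0xE17F, "Znii"))  # register plus 8-bit immediate
print(digit_of("ZnmZ", "n"), digit_of("ZZnm", "n"), digit_of("Znii", "n"))
```

With ZnmZ, Rn sits in the same digit as in a Znii encoding. With ZZnm it moves one digit down, so the decoder has to pick Rn from two different places.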

Still not exactly perfect, but sorta works I guess...

Still ended up with 16-bit encodings which were a little less "dog
chewed" compared with the RISC-V 'C' extension though...
Like, someone looked at Thumb and was like "hold my beer" or something...

Personally though, I am not particularly a fan of instruction formats
which look like a dog's chew toy.

But, alas...

> Greg
>