lightweight hart id

ron minnich

unread,

Nov 16, 2016, 4:57:28 PM11/16/16

to RISC-V ISA Dev

on many CPUs getting a core id involves reading a register. riscv envisions, as I understand it, an SBI call.

Any chance of making mhartid readonly at lower privelege levels? As you may have noticed I'm not a huge an of requiring any but the minimum sbi calls, for a number of reasons. I don't see the need to get the value of such a basic thing via an sbicall. Just asking.

Another option, given that the SBI address seems to be on part of a page, is to dump per-hart config strings on the page containing the SBI vector.

Or a better idea than any of these is fine too :-)

Andrew Waterman

unread,

Nov 16, 2016, 5:06:35 PM11/16/16

to ron minnich, RISC-V ISA Dev

Exposing mhartid directly would be a virtualization hole. It could
conceivably be exposed in the nonvirtualized case. You'd still get at
it via the SBI call; it would just return more quickly.

Not a direct answer to your question, but the Linux port gets around
this by storing the hart id in the thread_info struct, which it always
knows how to get to from the stack pointer. So it only ever needs to
execute the SBI call on boot.

> --
> You received this message because you are subscribed to the Google Groups
> "RISC-V ISA Dev" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to isa-dev+u...@groups.riscv.org.
> To post to this group, send email to isa...@groups.riscv.org.
> Visit this group at
> https://groups.google.com/a/groups.riscv.org/group/isa-dev/.
> To view this discussion on the web visit
> https://groups.google.com/a/groups.riscv.org/d/msgid/isa-dev/CAP6exY%2B8h7h%2BoA%2BvwpZTHTCrVnwhbwekRPQ5JrXE4h9bmXNnBw%40mail.gmail.com.

ron minnich

unread,

Nov 16, 2016, 6:30:54 PM11/16/16

to Andrew Waterman, RISC-V ISA Dev

On Wed, Nov 16, 2016 at 2:06 PM Andrew Waterman <and...@sifive.com> wrote:

Not a direct answer to your question, but the Linux port gets around
this by storing the hart id in the thread_info struct, which it always
knows how to get to from the stack pointer. So it only ever needs to
execute the SBI call on boot.

yeah, we do the same in plan 9. I just keep wondering about something simpler.

Paolo Bonzini

unread,

Nov 17, 2016, 7:21:44 AM11/17/16

to Andrew Waterman, ron minnich, RISC-V ISA Dev

On 16/11/2016 23:06, Andrew Waterman wrote:
> Exposing mhartid directly would be a virtualization hole. It could
> conceivably be exposed in the nonvirtualized case. You'd still get at
> it via the SBI call; it would just return more quickly.

It's always possible for virtualization extensions to include shadow
registers that provide the virtualized view to S mode.

I wouldn't worry too much about exposing mhartid directly. H mode would
then add something like an hmhartid register, and when S mode reads
mhartid the processor would actually return the value of hmhartid (which
in turn is completely inaccessible to S mode).

This would work with both "ARM-style" H mode and the "s390-style" scheme
I outlined a while ago.

Paolo

Jacob Bachmeyer

unread,

Nov 17, 2016, 6:09:04 PM11/17/16

to Paolo Bonzini, Andrew Waterman, ron minnich, RISC-V ISA Dev

Paolo Bonzini wrote:
> On 16/11/2016 23:06, Andrew Waterman wrote:
>
>> Exposing mhartid directly would be a virtualization hole. It could
>> conceivably be exposed in the nonvirtualized case. You'd still get at
>> it via the SBI call; it would just return more quickly.
>>
>
> It's always possible for virtualization extensions to include shadow
> registers that provide the virtualized view to S mode.
>
> I wouldn't worry too much about exposing mhartid directly. H mode would
> then add something like an hmhartid register, and when S mode reads
> mhartid the processor would actually return the value of hmhartid (which
> in turn is completely inaccessible to S mode).
>
> This would work with both "ARM-style" H mode and the "s390-style" scheme
> I outlined a while ago.
>

The problem with this as written is that it proposes making exceptions
to the CSR address layout convention. I propose an alternative:

Define an "shartid" CSR in the S-mode read-only group, initialized at
power-on-reset to mhartid, and (on implementations supporting
virtualized S-mode harts) writable as "hshartid" in the H-mode
read/write shadows group. (Implementations lacking the ability to
virtualize S-mode harts could simply alias mhartid as shartid.)
Analogous "hhartid" and "mhhartid" CSRs can also be used on
implementations that virtualize H-mode harts.

-- Jacob

Paolo Bonzini

unread,

Nov 18, 2016, 7:57:38 AM11/18/16

to jcb6...@gmail.com, Andrew Waterman, ron minnich, RISC-V ISA Dev

On 18/11/2016 00:09, Jacob Bachmeyer wrote:
> Paolo Bonzini wrote:
>> On 16/11/2016 23:06, Andrew Waterman wrote:
>>
>>> Exposing mhartid directly would be a virtualization hole. It could
>>> conceivably be exposed in the nonvirtualized case. You'd still get at
>>> it via the SBI call; it would just return more quickly.
>>>
>>
>> It's always possible for virtualization extensions to include shadow
>> registers that provide the virtualized view to S mode.
>>
>> I wouldn't worry too much about exposing mhartid directly. H mode would
>> then add something like an hmhartid register, and when S mode reads
>> mhartid the processor would actually return the value of hmhartid (which
>> in turn is completely inaccessible to S mode).
>>
>> This would work with both "ARM-style" H mode and the "s390-style" scheme
>> I outlined a while ago.
>
> The problem with this as written is that it proposes making exceptions
> to the CSR address layout convention.

Of course, this makes complete sense.

Paolo

kr...@berkeley.edu

unread,

Nov 18, 2016, 8:14:37 AM11/18/16

to jcb6...@gmail.com, Paolo Bonzini, Andrew Waterman, ron minnich, RISC-V ISA Dev

Can someone explain what problem would be solved by adding hardware
for this?

Krste

| --
| You received this message because you are subscribed to the Google Groups "RISC-V ISA Dev" group.
| To unsubscribe from this group and stop receiving emails from it, send an email to isa-dev+u...@groups.riscv.org.
| To post to this group, send email to isa...@groups.riscv.org.
| Visit this group at https://groups.google.com/a/groups.riscv.org/group/isa-dev/.

| To view this discussion on the web visit https://groups.google.com/a/groups.riscv.org/d/msgid/isa-dev/582E388C.8080504%40gmail.com.

Paolo Bonzini

unread,

Nov 18, 2016, 8:18:02 AM11/18/16

to kr...@berkeley.edu, jcb6...@gmail.com, Andrew Waterman, ron minnich, RISC-V ISA Dev

On 18/11/2016 14:14, kr...@berkeley.edu wrote:
>
> Can someone explain what problem would be solved by adding hardware
> for this?

The hart id becomes accessible with a single instruction. In fact, the
hart id should be accessible even from U mode as fast as possible, so it
is insufficient to place it in the SBI.

This is a real world problem, Intel for example is adding an RDPID
instruction for this (and even without it, the Linux kernel is hacking
the LSL instruction to do getcpu() faster than a regular system call).

Paolo

kr...@berkeley.edu

unread,

Nov 18, 2016, 8:35:19 AM11/18/16

to Paolo Bonzini, kr...@berkeley.edu, jcb6...@gmail.com, Andrew Waterman, ron minnich, RISC-V ISA Dev

>>>>> On Fri, 18 Nov 2016 14:17:59 +0100, Paolo Bonzini <bon...@gnu.org> said:
| On 18/11/2016 14:14, kr...@berkeley.edu wrote:
||
|| Can someone explain what problem would be solved by adding hardware
|| for this?
| The hart id becomes accessible with a single instruction. In fact, the
| hart id should be accessible even from U mode as fast as possible, so it
| is insufficient to place it in the SBI.

"As fast as possible" might really hurt system performance.

How many cycles is OK: <1, ~1, <10, <100, <1000, <10,000.

Put another way, what %age of execution time is spent on SBI calls in
a RISC-V systems to read the mhartid register?

How much cost/performance/energy for remaining execution time are you
willing to sacrifice to make this instruction go faster (0.1%, 1%,
10%)?

| This is a real world problem, Intel for example is adding an RDPID
| instruction for this (and even without it, the Linux kernel is hacking
| the LSL instruction to do getcpu() faster than a regular system call).
| Paolo

How much slower does OS run on Intel platforms without the rdpid instruction?

How much faster does OS run with the rdpid instruction?

I'm not really responding directly to Paolo's email as much as
pointing out to list that everything has a tradeoff, and spending
HW/SW implementation energy elsewhere, e.g., on fast SBI calls, might
result in much more efficient systems that mandating extra hardware
registers for everything now behind an SBI call.

Krste

ron minnich

unread,

Nov 18, 2016, 9:17:17 AM11/18/16

to kr...@berkeley.edu, Paolo Bonzini, jcb6...@gmail.com, Andrew Waterman, RISC-V ISA Dev

riscv may yet prove to be the exception, but every system I've ever used that started out with an SBI-like mechanism has ended up with giant bloatware running in the equivalent of M-mode. I'm thinking here of PC BIOS, Sun open boot, Alpha SRM, Intel UEFI, ... the list never ends, because every new firmware that provides a runtime callback mechanism takes this path. I most recently ran into it on Blue Gene/Q IIRC, and on that platform we worked around the issues by not using it (as did many). It bothers me, a little, that it seems baked into the architecture spec. It reminds me of PALcode, which similarly got to be quite a mess, as well as a way for vendors (usually DEC) to lock in customers to their own platform.

You all have the best intentions in the world and it's a clean sheet, true. But if you look at how these callback systems evolve, over time you end up with many megabytes of code, problems with SMP safety, bugs that can't be fixed since vendors end up putting critical functions in places that can't be changed, and performance issues with jitter and noise induced as higher protection levels unwittingly callback to firmware without realizing the cost -- a particular problem with x86-based systems used in realtime and suffering from ACPI-induced noise.

Just look at the discussions we've seen here: ELF parsers in firmware, kexec() in firmware ... it's starting already. Hence I think that any SBI-like mechanism intended for routine use is a problem in about 10 years, because it opens the door to all kinds of abuse.

So I got to wondering what it would take to run with 0 SBI calls.

I almost want to have a way to throw a switch in coreboot, by having an SBI call that disables all SBI calls, so that

- we don't come to depend on them

- we can catch errors caused by calling them.

It would be nice if the rules for SBI included "no backward branches" since it rules out too much cleverness. I surprise myself: I'm actually serious about this idea.

Nevertheless we've worked it out, so you can consider my request dropped :-)

Samuel Falvo II

unread,

Nov 18, 2016, 11:16:32 AM11/18/16

to ron minnich, Krste Asanovic, Paolo Bonzini, Jacob Bachmeyer, Andrew Waterman, RISC-V ISA Dev

On Fri, Nov 18, 2016 at 6:17 AM, ron minnich <rmin...@gmail.com> wrote:
> riscv may yet prove to be the exception, but every system I've ever used
> that started out with an SBI-like mechanism has ended up with giant
> bloatware running in the equivalent of M-mode. I'm thinking here of PC BIOS,
> Sun open boot, Alpha SRM, Intel UEFI, ... the list never ends, because every

I fear that this is irreversible and unavoidable in the long term; the
universe always tends towards entropy.

However, that said, I'm not sure what exposing a hart ID will do to
avoid this situation. The entropy either exists in a binary blob, or
in a silicon wafer.

I'm not saying this to justify Paolo's point of view, nor even
Krste's. Ultimately, it boils down to the community using the
features in the first place. WHY do we need a hartid or PID available
at the user-mode level? An application should never care what
processor it's running on; only that it is assigned one to begin with.
Any U-mode software which requires knowledge of the current processor
ID is, I argue, by definition something of a special snowflake, and
should be treated exactly as such.

The bloat is not the U-mode hartid register. The bloat is needing
that information in the first place. If we truly desire staving off
feature-creep, which I think this squarely is, we should be at least
as comfortable recommending alternatives to requested features as we
are considerate in adding new features at all. RISC-V ISA's
orthogonal use of design features was one of the key reasons the ISA
appealed to me when I selected it for use with the Kestrel project. I
think we should work hard to ensure this remains the case.

> new firmware that provides a runtime callback mechanism takes this path. I

Not quite relevant, but a topic to consider for the future -- "runtime
callback mechanism" -- we need to stop using this phrase, I think,
because this is easily confused with evented software architectures.
When I think "callback," I think of some other module calling a
user-supplied subroutine of my choosing in response to something. Can
we use something like "upcall" or "system call" instead?

> Just look at the discussions we've seen here: ELF parsers in firmware,
> kexec() in firmware ... it's starting already. Hence I think that any
> SBI-like mechanism intended for routine use is a problem in about 10 years,
> because it opens the door to all kinds of abuse.

It's been my observation that the only features which should appear in
the SEE should be those which are at or nearly at the level of a
single CPU instruction. Anything else should be considered BIOS-level
at a minimum, or more preferably, located at the supervisor-level.

You might argue, BIOS *is* the SEE. I argue no, it's not. BIOS is a
collection of device drivers that allows a system to boot, along with
the minimum software required to load the second S-mode image (the
first being BIOS itself). The features needed to make this happen
includes:

* Reporting the motherboard's config-string or device tree, if one exists.
* Providing a minimal approach to identifying which block devices
exist in the system, if a CS/DT doesn't exist.
* Providing the minimal interface to these block device drivers:
- Volume changed?
- Read block
- Write block
- Report device capacity
* Providing read/write access to a real-time clock if one is available.
* Providing the logic to select a viable boot device, read its master
boot sector, and jump to the code loaded therefrom.

It should never have to be more sophisticated than this. If you do
want some additional sophistication for some reason, you should use
something like Atari ST's "cookie jar" or follow the AmigaOS approach
to discovering and making available statically-linked libraries via
jump tables. (It pains me to be a broken record; yet, I see routinely
how good ideas from the past are just discarded solely for the reason
that they have no Unix heritage.) The code to make this happen is
less than 1KB in size on any CPU I've written this logic for. But,
honestly, I wouldn't recommend even this UNLESS you also embed your
host OS into ROM as well.

The SEE, however, is an M-mode binary blob[1], which is responsible
for providing the following:

* Software implementations for missing CPU instructions used by the
BIOS during its bootstrap,
* A means of booting into the BIOS in S-mode (or H-mode as the case may be).

I can't think of anything else the SEE should offer at this point.

Notes:

1. The SEE ROM image is fundamentally tied to the precise CPU you
have installed, while the BIOS is tied fundamentally to the
motherboard you're bringing up. The OS you boot, then, is tied to the
storage devices you boot from. Simple.[2]

2. By Amiga's standard, this is actually too simple, but it has the
benefit of having its executive in ROM and can therefore afford a bit
more dynamic responsiveness to attached peripherals. This approach is
at least as capable as any Macintosh or PC-based boot sequence I've
observed.

--
Samuel A. Falvo II

Stefan O'Rear

unread,

Nov 18, 2016, 11:24:50 AM11/18/16

to ron minnich, Krste Asanovic, Paolo Bonzini, Jacob Bachmeyer, Andrew Waterman, RISC-V ISA Dev

On Fri, Nov 18, 2016 at 6:17 AM, ron minnich <rmin...@gmail.com> wrote:

> So I got to wondering what it would take to run with 0 SBI calls.

Likewise, but I don't have the answer yet either.

-s

Paolo Bonzini

unread,

Nov 18, 2016, 11:50:35 AM11/18/16

to Samuel Falvo II, ron minnich, Krste Asanovic, Jacob Bachmeyer, Andrew Waterman, RISC-V ISA Dev

On 18/11/2016 17:16, Samuel Falvo II wrote:
> I'm not saying this to justify Paolo's point of view, nor even
> Krste's. Ultimately, it boils down to the community using the
> features in the first place. WHY do we need a hartid or PID available
> at the user-mode level? An application should never care what
> processor it's running on; only that it is assigned one to begin with.
> Any U-mode software which requires knowledge of the current processor
> ID is, I argue, by definition something of a special snowflake, and
> should be treated exactly as such.

I agree, but unfortunately it's also going to be a performance-sensitive
snowflake.

A simple case is to allocate data from memory that comes from the
current NUMA node (which in turn is a function of the current processor
id). Since multiple CPUs contend anyway on memory used by each NUMA
node, they must protect their accesses anyway with locks. So it's okay
if this code gets preempted and uses the wrong CPU number. Assuming
that migrating a process to a completely different NUMA node migration
is rare, things work nicely performance-wise, too.

Another case was the "restartable sequence" proposal for Linux
(https://lwn.net/Articles/650333/) which however hasn't gone in yet.

Paolo

Samuel Falvo II

unread,

Nov 18, 2016, 12:06:06 PM11/18/16

to Paolo Bonzini, ron minnich, Krste Asanovic, Jacob Bachmeyer, Andrew Waterman, RISC-V ISA Dev

On Fri, Nov 18, 2016 at 8:50 AM, Paolo Bonzini <bon...@gnu.org> wrote:
>
> A simple case is to allocate data from memory that comes from the
> current NUMA node (which in turn is a function of the current processor
> id).

My concern here is that at time T0, you're on CPU 0, and you allocate
a node in pool 0. Then, after a pre-emption (time T1), you're now
running on CPU 1, and now you're still stuck accessing that node in
pool 0. That allocated node does not migrate with the process/thread.
Unless you explicitly set a processor affinity somewhere, you may as
well select a pool at random.

But, if you have a processor affinity set, then you clearly already
know the processor ID as a cached value.

Hence my confusion. I just don't see this as a problem. :/

Paolo Bonzini

unread,

Nov 18, 2016, 12:29:21 PM11/18/16

to Samuel Falvo II, ron minnich, Krste Asanovic, Jacob Bachmeyer, Andrew Waterman, RISC-V ISA Dev

On 18/11/2016 18:06, Samuel Falvo II wrote:
> On Fri, Nov 18, 2016 at 8:50 AM, Paolo Bonzini <bon...@gnu.org> wrote:
>>
>> A simple case is to allocate data from memory that comes from the
>> current NUMA node (which in turn is a function of the current processor
>> id).
>
> My concern here is that at time T0, you're on CPU 0, and you allocate
> a node in pool 0. Then, after a pre-emption (time T1), you're now
> running on CPU 1, and now you're still stuck accessing that node in
> pool 0. That allocated node does not migrate with the process/thread.
> Unless you explicitly set a processor affinity somewhere, you may as
> well select a pool at random.

You can always do best effort. If you're moved to a different NUMA
node, you're screwed anyway, and anyway the scheduler will penalize that
a lot.

The restartable sequence case is more interesting.

Paolo

Jacob Bachmeyer

unread,

Nov 18, 2016, 6:16:05 PM11/18/16

to ron minnich, kr...@berkeley.edu, Paolo Bonzini, Andrew Waterman, RISC-V ISA Dev

ron minnich wrote:
> riscv may yet prove to be the exception, but every system I've ever
> used that started out with an SBI-like mechanism has ended up with
> giant bloatware running in the equivalent of M-mode. I'm thinking here
> of PC BIOS, Sun open boot, Alpha SRM, Intel UEFI, ... the list never
> ends, because every new firmware that provides a runtime callback
> mechanism takes this path. I most recently ran into it on Blue Gene/Q
> IIRC, and on that platform we worked around the issues by not using it
> (as did many). It bothers me, a little, that it seems baked into the
> architecture spec. It reminds me of PALcode, which similarly got to be
> quite a mess, as well as a way for vendors (usually DEC) to lock in
> customers to their own platform.

I would suggest prohibiting that kind of abuse (for vendor lock-in) by
strictly specifying the SBI and adding: "any undocumented or
non-freely-implementable extensions (except for internal trap vectors
used by SBI S-mode code) to this interface make an implementation
non-conforming". Ideally, such a non-conformant implementation could
not even be called "RISC-V". Yes, this is extreme, but vendor lock-in
needs to be prevented.

> You all have the best intentions in the world and it's a clean sheet,
> true. But if you look at how these callback systems evolve, over time
> you end up with many megabytes of code, problems with SMP safety, bugs
> that can't be fixed since vendors end up putting critical functions in
> places that can't be changed, and performance issues with jitter and
> noise induced as higher protection levels unwittingly callback to
> firmware without realizing the cost -- a particular problem with
> x86-based systems used in realtime and suffering from ACPI-induced noise.
>
> Just look at the discussions we've seen here: ELF parsers in firmware,
> kexec() in firmware ... it's starting already. Hence I think that any
> SBI-like mechanism intended for routine use is a problem in about 10
> years, because it opens the door to all kinds of abuse.

I see two related uses for the SBI: hardware abstraction (IPI, hartid,
PLIC control, remote fences) and standardized hypercalls (config string
access, virtual I/O, sbi_sexec(), etc.). There is a well-known quip
that the price of freedom is eternal vigilance and that is what I see as
needed here--to prevent those abuses and keep the SBI from growing to
excess.

> So I got to wondering what it would take to run with 0 SBI calls.
>
> I almost want to have a way to throw a switch in coreboot, by having
> an SBI call that disables all SBI calls, so that
> - we don't come to depend on them
> - we can catch errors caused by calling them.

I would suggest that that should be in the system configuration, instead
of an SBI call--just like I advocate that implementations supporting
H-mode should have a hardware jumper/DIP-switch to disable H-mode
completely.

Alternately, a supervisor that wishes to eschew SBI could simply unmap
the page.

> It would be nice if the rules for SBI included "no backward branches"
> since it rules out too much cleverness. I surprise myself: I'm
> actually serious about this idea.

That could be very good for the hardware abstraction side, at least for
SBI code that actually runs in S-mode. A supervisor could check the SBI
page(s) and complain at boot if the "no backward branches" rule is
violated. On the other hand, I could see some hardware needing SBI code
more complex than that, but able to justify those costs. Maybe we need
some kind of "slim SBI" option that an implementation could advertise?

-- Jacob

Samuel Falvo II

unread,

Nov 18, 2016, 8:30:23 PM11/18/16

to Jacob Bachmeyer, ron minnich, Krste Asanovic, Paolo Bonzini, Andrew Waterman, RISC-V ISA Dev

On Fri, Nov 18, 2016 at 3:16 PM, Jacob Bachmeyer <jcb6...@gmail.com> wrote:
> I would suggest prohibiting that kind of abuse (for vendor lock-in) by
> strictly specifying the SBI and adding: "any undocumented or
> non-freely-implementable extensions (except for internal trap vectors used
> by SBI S-mode code) to this interface make an implementation non-conforming"

On the surface, I really like this idea. I really miss this from
AmigaOS developer documents, and I still think its the right way to do
things.

The alternative approach, where you constantly strive to be backward
compatible with known-broken software, is simply not scalable. It
works for Windows and IBM mainframes only because of their respective
market dominance and their *massive* workforces. The primary
difference between IBM and MS, though, is that IBM is generally MUCH
more thoughtful about their designs, and they're always forward
looking[1], while MS just implements APIs willy-nilly. This leads me
to my primary concern with respect to the SBI: I'm _deeply_ concerned
about premature freezing of any SBI postulated. That includes the SBI
in the current privilege spec document.

(In fact, I firmly believe the SBI should exist as a separate document
all-together; however, that's a separate topic of discussion.)

On the one hand, I'm seeing the SBI almost as a dumping ground for new
features, a land-grab for stuff everybody wants to see for their own
projects, yet isn't impactful enough to be included in the official
RISC-V specifications. I think this is more or less healthy at an
early stage, but when it comes time to discuss the SBI in earnest, we
really need to crack down on that. This includes me, of course.

On the other hand, we have the case where the SBI is just not
complete, as we haven't foreseen some new marketable feature or trend.
This is arguably worse, as new features will be introduced in
incompatible ways until the SBI specification is revised. Then, we
need to figure out how to communicate which SBI features are available
for use.

Basically, I'm just wondering if we need to eat a few incompatible
systems before we actually *learn* what is actually necessary to
implement in the SBI. So far, a lot of big-design-up-front seems to
be happening, but that can lead to fragile designs later on.

Hyper-speculation about what should and should not go into the SBI can
lead to ITU-like specifications documents, where you've got to invest
hundreds of pages to specify something that, under ideal conditions,
supervisor software would only use relatively rarely. I can easily
see SBIs that I would not use for many years to come, for example.
E.g., IPIs: none of my computer designs are multi-processor so far, so
I question the value of supporting such SBIs until I actually support
multi-processor main boards. That's just code bloat from my POV.

Again, not trying to squelch SBI discussions; rather, trying to focus
them on what is known to be essential, and/or to convince folks that
maybe undertaking a few projects we know will end up incompatible is
necessary before we have enough experience to know what belongs in the
SBI or not.

Just my 2 bits; I'll shush now. ;)

________________
1. There are some cases, though, where what looks like
short-sightedness is actually a case of wrong assumptions. For
example, they expected core memory to be the primary kind of RAM for
many decades to come, which is why the S/360 was limited to 16MB
address space, and why their PSW is arranged as it is.

Jacob Bachmeyer

unread,

Nov 18, 2016, 9:42:53 PM11/18/16

to Samuel Falvo II, ron minnich, Krste Asanovic, Paolo Bonzini, Andrew Waterman, RISC-V ISA Dev

Samuel Falvo II wrote:
> On Fri, Nov 18, 2016 at 3:16 PM, Jacob Bachmeyer <jcb6...@gmail.com> wrote:
>
>> I would suggest prohibiting that kind of abuse (for vendor lock-in) by
>> strictly specifying the SBI and adding: "any undocumented or
>> non-freely-implementable extensions (except for internal trap vectors used
>> by SBI S-mode code) to this interface make an implementation non-conforming"
>>
>
> On the surface, I really like this idea. I really miss this from
> AmigaOS developer documents, and I still think its the right way to do
> things.
>
> The alternative approach, where you constantly strive to be backward
> compatible with known-broken software, is simply not scalable. It
> works for Windows and IBM mainframes only because of their respective
> market dominance and their *massive* workforces. The primary
> difference between IBM and MS, though, is that IBM is generally MUCH
> more thoughtful about their designs, and they're always forward

> looking, while MS just implements APIs willy-nilly. This leads me

> to my primary concern with respect to the SBI: I'm _deeply_ concerned
> about premature freezing of any SBI postulated. That includes the SBI
> in the current privilege spec document.
>
> (In fact, I firmly believe the SBI should exist as a separate document
> all-together; however, that's a separate topic of discussion.)
>
> On the one hand, I'm seeing the SBI almost as a dumping ground for new
> features, a land-grab for stuff everybody wants to see for their own
> projects, yet isn't impactful enough to be included in the official
> RISC-V specifications. I think this is more or less healthy at an
> early stage, but when it comes time to discuss the SBI in earnest, we
> really need to crack down on that. This includes me, of course.
>

I have tried to avoid that trap, but I can see where some of my
proposals (and possibly the original plan for the SBI) could be seen
that way.

I think of the SBI as having two parts: hardware abstraction and
hypercalls. The former are simple things like IPI, PLIC control, remote
fences, while the latter includes the "heavier" features--virtual I/O,
sbi_sexec(), and so on.

> On the other hand, we have the case where the SBI is just not
> complete, as we haven't foreseen some new marketable feature or trend.
> This is arguably worse, as new features will be introduced in
> incompatible ways until the SBI specification is revised. Then, we
> need to figure out how to communicate which SBI features are available
> for use.
>

I chose my words carefully in that, forbidding only "undocumented or
non-freely-implementable" extensions. I envision the SBI being
extensible however you like, but you are not allowed to preclude others
from implementing an interface you define. If you do so, your product
is not "RISC-V".

Providing a list of supported features is easy--the configuration
string! ... Except, er, that the SBI calls are linked by name before
the supervisor is actually running and the SBI does not offer dynamic
linking. We could define some way for the SEE to indicate "that SBI
call is unknown--avoid using it" to a supervisor at load-time....

> Basically, I'm just wondering if we need to eat a few incompatible
> systems before we actually *learn* what is actually necessary to
> implement in the SBI. So far, a lot of big-design-up-front seems to
> be happening, but that can lead to fragile designs later on.
>
> Hyper-speculation about what should and should not go into the SBI can
> lead to ITU-like specifications documents, where you've got to invest
> hundreds of pages to specify something that, under ideal conditions,
> supervisor software would only use relatively rarely. I can easily
> see SBIs that I would not use for many years to come, for example.
> E.g., IPIs: none of my computer designs are multi-processor so far, so
> I question the value of supporting such SBIs until I actually support
> multi-processor main boards. That's just code bloat from my POV.
>

The IPI calls are trivial stubs on UP hardware, but you make a good
point. The SBI is limited to 512 slots due to its calling convention.
While the SBI is linked by name, so a larger set could be "available",
any one S-mode process can only have 512 SBI entry points. I think that
this limit can help us, as a reminder that the uses for the SBI are and
must be limited.

-- Jacob

Michael Clark

unread,

Nov 19, 2016, 3:25:50 AM11/19/16

to Samuel Falvo II, Paolo Bonzini, ron minnich, Krste Asanovic, Jacob Bachmeyer, Andrew Waterman, RISC-V ISA Dev

The other thought raised earlier about fast access to tid or hartid would be for relative loads and stores in a cluster of cores in one clock domain, similar to some of the converged CPU/APU/GPU/TPU/NPU architectures. Many of the modern SOC designs share FP units at the geometry stage between the CPU and the GPU.

There is the hwacha architecture or some derivative architectures that may send the program counter to a vector cluster which could use the tid or hartid for memory read offsets coming from multiple vector units that are coalesced in a wide memory controller (e.g. 4096-bit HBM). From looking at hwacha, this sounds like some of the kinds of applications that the RISC-V Base ISA could form the foundation for. Of course there is certainly a limit to what is appropriate for the Base ISA and the Privileged Spec i.e. these types of things are likely extensions. OpenCL implementations come to mind. MPI also needs fast tid access.

Many of these high performance application wouldn’t be running a pre-emptable scheduler like Linux (or have SMM and ACPI interfering). They would have real time schedulers, or even small hardware schedulers.

There are also applications with FPGAs and multiple 10Gb network cards (intrusion detection comes to mind) where multiple device queues would have hart affinity and each queue is wired to one hart id to balance traffic. Many network appliances running BSD or Linux often have cores that are not under the control of the OS, rather they have been removed from the OS scheduler and instead have their interrupts and timers wired up for realtime packet processing and such. There are many different applications that RISC-V will be potentially used for.

These where just reasons why one may want to access the hart id quickly… e.g. {m,h,s,u}hartid. I don’t know how expensive this is. In a straightforward implementation they could alias the same register.

~mc

> --
> You received this message because you are subscribed to the Google Groups "RISC-V ISA Dev" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to isa-dev+u...@groups.riscv.org.
> To post to this group, send email to isa...@groups.riscv.org.
> Visit this group at https://groups.google.com/a/groups.riscv.org/group/isa-dev/.

> To view this discussion on the web visit https://groups.google.com/a/groups.riscv.org/d/msgid/isa-dev/CAEz%3DsomkPW05YUyZbLY6NGSHp3iY9CedAC%2BFU6WpFpgFdy_oXg%40mail.gmail.com.

Stefan O'Rear

unread,

Nov 19, 2016, 1:07:01 PM11/19/16

to ron minnich, RISC-V ISA Dev

On Wed, Nov 16, 2016 at 1:57 PM, ron minnich <rmin...@gmail.com> wrote:

> on many CPUs getting a core id involves reading a register. riscv envisions,
> as I understand it, an SBI call.

Thinking about this for a bit I think the solution is more subtle than
that. Generally speaking you don't want a hart id for its own sake.
You want a hart id in order to locate per-hart data structures, which
means you want a pointer which can be set per-hart, and the ABI
already reserves a GPR for exactly that purpose: x4/tp. S-mode trap
entry code will use sscratch, directly or indirectly, to set tp, and
U-mode code gets tp set from above by context switches.

You only actually need the hartid once when setting up the per-hart
struct, so my proposal is:

Eliminate sbi_hart_id and add a hart_id parameter to the S-mode entry point.

("shartid" is a bad idea _because_ it is state that would require
hypervisor context switching that is effectively redundant with
sscratch+tp)

-s

Paolo Bonzini

unread,

Nov 19, 2016, 1:52:21 PM11/19/16

to Stefan O'Rear, ron minnich, RISC-V ISA Dev

Il 19/nov/2016 19:07, "Stefan O'Rear" <sor...@gmail.com> ha scritto:
>
> On Wed, Nov 16, 2016 at 1:57 PM, ron minnich <rmin...@gmail.com> wrote:
> > on many CPUs getting a core id involves reading a register. riscv envisions,
> > as I understand it, an SBI call.
>
> Thinking about this for a bit I think the solution is more subtle than
> that. Generally speaking you don't want a hart id for its own sake.
> You want a hart id in order to locate per-hart data structures, which
> means you want a pointer which can be set per-hart, and the ABI
> already reserves a GPR for exactly that purpose: x4/tp. S-mode trap
> entry code will use sscratch, directly or indirectly, to set tp, and
> U-mode code gets tp set from above by context switches.

U-mode would use tp for per-thread data structures, not per-hart. I suppose that the OS, if it desired to provide a fast access to the hartid, could place it on migration at a well-known offset from tp, but this additional complication may be a side effect with security implications.

> You only actually need the hartid once when setting up the per-hart
> struct, so my proposal is:
>
> Eliminate sbi_hart_id and add a hart_id parameter to the S-mode entry point.

Speaking of which, what does the spec say of SMP bring-up?

Paolo

> ("shartid" is a bad idea _because_ it is state that would require
> hypervisor context switching that is effectively redundant with
> sscratch+tp)
>
> -s
>

> --
> You received this message because you are subscribed to the Google Groups "RISC-V ISA Dev" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to isa-dev+u...@groups.riscv.org.
> To post to this group, send email to isa...@groups.riscv.org.
> Visit this group at https://groups.google.com/a/groups.riscv.org/group/isa-dev/.

> To view this discussion on the web visit https://groups.google.com/a/groups.riscv.org/d/msgid/isa-dev/CADJ6UvOdr_U6-6jJ_fFd7asR_XWG1E_rtg3dLoFqVUYpofV3Hw%40mail.gmail.com.

Stefan O'Rear

unread,

Nov 19, 2016, 2:11:44 PM11/19/16

to Paolo Bonzini, ron minnich, RISC-V ISA Dev

On Sat, Nov 19, 2016 at 10:52 AM, Paolo Bonzini <paolo....@gmail.com> wrote:
> U-mode would use tp for per-thread data structures, not per-hart. I suppose
> that the OS, if it desired to provide a fast access to the hartid, could

A lot of the discussion around here uses "hart" to mean "execution
context"; I made that mistake here and I meant "S-mode execution
context" and "U-mode execution context", the latter of which reduces
to "thread".

> place it on migration at a well-known offset from tp, but this additional
> complication may be a side effect with security implications.

My intention is to use the results of the preemptable sequences
programme once that finishes, not to preempt it. So no action there
at this time.

>> Eliminate sbi_hart_id and add a hart_id parameter to the S-mode entry
>> point.
>
> Speaking of which, what does the spec say of SMP bring-up?

As far as I can tell, nothing. It's also not clear to me whether
RISC-V Linux has ever been successfully tested in an SMP
configuration.

The M-mode-as-expected-by-BBL view of SMP bringup is that all cores
start executing at the same entry point at the same time with
different values in mhartid, and the firmware passes them off to
S-mode as they come in.

This doesn't feel particularly friendly to CPU hotplug use cases, but
I'm not up enough on that field to know what would be.

-s

Paolo Bonzini

unread,

Nov 19, 2016, 2:39:00 PM11/19/16

to Stefan O'Rear, ron minnich, RISC-V ISA Dev

I think only a bootstrap hart should start executing the kernel. There should be an SBI service to stop execution of the current processor, and one to start execution of a stopped processor from another. A stopped processor stays in M- or H-mode until woken up.

The set of hart ids at startup should be sent through the configuration string/device tree. If hotplug is ever going to be supported, one would need an event mechanism as part of the SEE, but that is way in the future.

Paolo

> -s
>
> --
> You received this message because you are subscribed to the Google Groups "RISC-V ISA Dev" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to isa-dev+u...@groups.riscv.org.
> To post to this group, send email to isa...@groups.riscv.org.
> Visit this group at https://groups.google.com/a/groups.riscv.org/group/isa-dev/.

> To view this discussion on the web visit https://groups.google.com/a/groups.riscv.org/d/msgid/isa-dev/CADJ6UvOWjSUBk5wt%3DfeG_Yd2vG95hm2GEkSNvoi3TujDne_akQ%40mail.gmail.com.

Andrew Waterman

unread,

Nov 19, 2016, 3:25:14 PM11/19/16

to Stefan O'Rear, ron minnich, RISC-V ISA Dev

On Sat, Nov 19, 2016 at 10:06 AM, Stefan O'Rear <sor...@gmail.com> wrote:
> On Wed, Nov 16, 2016 at 1:57 PM, ron minnich <rmin...@gmail.com> wrote:
>> on many CPUs getting a core id involves reading a register. riscv envisions,
>> as I understand it, an SBI call.
>
> Thinking about this for a bit I think the solution is more subtle than
> that. Generally speaking you don't want a hart id for its own sake.
> You want a hart id in order to locate per-hart data structures, which
> means you want a pointer which can be set per-hart, and the ABI
> already reserves a GPR for exactly that purpose: x4/tp. S-mode trap
> entry code will use sscratch, directly or indirectly, to set tp, and
> U-mode code gets tp set from above by context switches.
>
> You only actually need the hartid once when setting up the per-hart
> struct, so my proposal is:
>
> Eliminate sbi_hart_id and add a hart_id parameter to the S-mode entry point.

I wholeheartedly concur with this proposal.

>
> ("shartid" is a bad idea _because_ it is state that would require
> hypervisor context switching that is effectively redundant with
> sscratch+tp)
>
> -s
>

> --
> You received this message because you are subscribed to the Google Groups "RISC-V ISA Dev" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to isa-dev+u...@groups.riscv.org.
> To post to this group, send email to isa...@groups.riscv.org.
> Visit this group at https://groups.google.com/a/groups.riscv.org/group/isa-dev/.

> To view this discussion on the web visit https://groups.google.com/a/groups.riscv.org/d/msgid/isa-dev/CADJ6UvOdr_U6-6jJ_fFd7asR_XWG1E_rtg3dLoFqVUYpofV3Hw%40mail.gmail.com.

Jacob Bachmeyer

unread,

Nov 19, 2016, 6:18:52 PM11/19/16

to Paolo Bonzini, Stefan O'Rear, ron minnich, RISC-V ISA Dev

Paolo Bonzini wrote:
>
> Il 19/nov/2016 20:11, "Stefan O'Rear" <sor...@gmail.com

> <mailto:sor...@gmail.com>> ha scritto:

> >
> > On Sat, Nov 19, 2016 at 10:52 AM, Paolo Bonzini
> <paolo....@gmail.com <mailto:paolo....@gmail.com>> wrote:
> > > Speaking of which, what does the spec say of SMP bring-up?
> >
> > As far as I can tell, nothing. It's also not clear to me whether
> > RISC-V Linux has ever been successfully tested in an SMP
> > configuration.
> >
> > The M-mode-as-expected-by-BBL view of SMP bringup is that all cores
> > start executing at the same entry point at the same time with
> > different values in mhartid, and the firmware passes them off to
> > S-mode as they come in.
> >
> > This doesn't feel particularly friendly to CPU hotplug use cases, but
> > I'm not up enough on that field to know what would be.
>
> I think only a bootstrap hart should start executing the kernel.
>

I have been thinking about a proposal for more-or-less that for loading
a monitor on "PC-ish"
(portable/desktop/workstation/server/large-network-equipment) systems--a
specialized "bootstrap service processor" (envisioned as an RV32E core)
with its own SRAM configures the main DRAM controller and loads the
system monitor from flash into DRAM, then starts the main processor.
The big detail I am currently hung up on is handling discovery of
processor modules. I think that a standard system architecture that
specifies enough to mix-and-match processor modules from different
vendors (presumably with varying specialized accelerators) would be nice.

> There should be an SBI service to stop execution of the current
> processor, and one to start execution of a stopped processor from
> another. A stopped processor stays in M- or H-mode until woken up.
>

Could these needs be met with the WFI opcode to stop a processor and
using an IPI to wake one?

> The set of hart ids at startup should be sent through the
> configuration string/device tree. If hotplug is ever going to be
> supported, one would need an event mechanism as part of the SEE, but
> that is way in the future.
>

I have sketched an outline of an SBI event mechanism, using virtual
stream devices.

-- Jacob

Jacob Bachmeyer

unread,

Nov 19, 2016, 6:28:10 PM11/19/16

to Andrew Waterman, Stefan O'Rear, ron minnich, RISC-V ISA Dev

Andrew Waterman wrote:
> On Sat, Nov 19, 2016 at 10:06 AM, Stefan O'Rear <sor...@gmail.com> wrote:
>
>> On Wed, Nov 16, 2016 at 1:57 PM, ron minnich <rmin...@gmail.com> wrote:
>>
>>> on many CPUs getting a core id involves reading a register. riscv envisions,
>>> as I understand it, an SBI call.
>>>
>> Thinking about this for a bit I think the solution is more subtle than
>> that. Generally speaking you don't want a hart id for its own sake.
>> You want a hart id in order to locate per-hart data structures, which
>> means you want a pointer which can be set per-hart, and the ABI
>> already reserves a GPR for exactly that purpose: x4/tp. S-mode trap
>> entry code will use sscratch, directly or indirectly, to set tp, and
>> U-mode code gets tp set from above by context switches.
>>
>> You only actually need the hartid once when setting up the per-hart
>> struct, so my proposal is:
>>
>> Eliminate sbi_hart_id and add a hart_id parameter to the S-mode entry point.
>>
>
> I wholeheartedly concur with this proposal.
>

Instead of an explicit parameter, would we be better off to place the
hart_id in x4/tp at the supervisor entry point? Could the SEE store any
other meaningful value in that register or is it an ideal place for this
parameter?

This leads to another question: how exactly does a RISC-V SMP system
start? Does every available hart enter the supervisor "simultaneously"
or is one chosen as the "initial hart" and the others must be explicitly
"woken up"? Or some other option?

-- Jacob

Stefan O'Rear

unread,

Nov 19, 2016, 6:34:12 PM11/19/16

to Jacob Bachmeyer, Paolo Bonzini, ron minnich, RISC-V ISA Dev

On Sat, Nov 19, 2016 at 3:18 PM, Jacob Bachmeyer <jcb6...@gmail.com> wrote:
> Paolo Bonzini wrote:
>> I think only a bootstrap hart should start executing the kernel.
>
> I have been thinking about a proposal for more-or-less that for loading a
> monitor on "PC-ish"
> (portable/desktop/workstation/server/large-network-equipment) systems--a
> specialized "bootstrap service processor" (envisioned as an RV32E core) with
> its own SRAM configures the main DRAM controller and loads the system
> monitor from flash into DRAM, then starts the main processor. The big
> detail I am currently hung up on is handling discovery of processor modules.
> I think that a standard system architecture that specifies enough to
> mix-and-match processor modules from different vendors (presumably with
> varying specialized accelerators) would be nice.

The focus of this discussion is SBI (which to me means the _entire
interface_, not just SBILIB). DRAM controller configuration is
outside the purview of the SBI, so I'd rather not discuss it right
now.

Basically if it doesn't make sense to provide to a VM it probably
doesn't make sense to have in SBI.

>> There should be an SBI service to stop execution of the current processor,
>> and one to start execution of a stopped processor from another. A stopped
>> processor stays in M- or H-mode until woken up.
>
> Could these needs be met with the WFI opcode to stop a processor and using
> an IPI to wake one?

Yeah, using WFI for S-mode code to indicate required availability
states to higher levels is something we've discussed, possibly also
playing games with SIE (the register) to indicate depth-of-sleep. I'm
not sold on it, I'd kinda rather something more explicit.

>> The set of hart ids at startup should be sent through the configuration
>> string/device tree. If hotplug is ever going to be supported, one would need
>> an event mechanism as part of the SEE, but that is way in the future.
>>
>
> I have sketched an outline of an SBI event mechanism, using virtual stream
> devices.

Still need to look at that, sorry.

There's probably room for several design points here. Mainframe/IaaS
wants a rich monitor, embedded systems will want S-mode to handle
everything, and the "PC/smartphone" space could go either way but
years of experience with ACPI and EFI have made me nervous about
anything resembling a rich monitor on a PC.

-s

Stefan O'Rear

unread,

Nov 19, 2016, 6:38:00 PM11/19/16

to Jacob Bachmeyer, Andrew Waterman, ron minnich, RISC-V ISA Dev

On Sat, Nov 19, 2016 at 3:28 PM, Jacob Bachmeyer <jcb6...@gmail.com> wrote:
> Instead of an explicit parameter, would we be better off to place the
> hart_id in x4/tp at the supervisor entry point? Could the SEE store any
> other meaningful value in that register or is it an ideal place for this
> parameter?

It could be done, but that's more of a discussion for boot protocols.
I'd rather use a0, a1, etc so we're not inventing entirely new calling
conventions. Maybe something auxv-ish or config string-ish.
Definitely not random registers, and I'd really rather not get into
details until we have implementation experience.

> This leads to another question: how exactly does a RISC-V SMP system start?
> Does every available hart enter the supervisor "simultaneously" or is one
> chosen as the "initial hart" and the others must be explicitly "woken up"?
> Or some other option?

Spec is silent. BBL enters the supervisor simultaneously on every
hart, hart has to use sbi_hart_id to find out which is which. Paolo's
last message has a counterproposal that I kind of like.

-s

Jacob Bachmeyer

unread,

Nov 19, 2016, 7:27:13 PM11/19/16

to Stefan O'Rear, Andrew Waterman, ron minnich, RISC-V ISA Dev

Stefan O'Rear wrote:
> On Sat, Nov 19, 2016 at 3:28 PM, Jacob Bachmeyer <jcb6...@gmail.com> wrote:
>
>> Instead of an explicit parameter, would we be better off to place the
>> hart_id in x4/tp at the supervisor entry point? Could the SEE store any
>> other meaningful value in that register or is it an ideal place for this
>> parameter?
>>
>
> It could be done, but that's more of a discussion for boot protocols.
> I'd rather use a0, a1, etc so we're not inventing entirely new calling
> conventions. Maybe something auxv-ish or config string-ish.
> Definitely not random registers, and I'd really rather not get into
> details until we have implementation experience.
>

That was not a randomly-chosen register. We only have so many argument
registers, and I was asking if there is anything other than the hart_id
that could be meaningfully placed in the "thread pointer" register
(x4/tp) at the SBI entry point.

If we define some kind of per-hart structure passed from the SEE, x4/tp
would be a logical place to store the pointer and the hart_id would
obviously be in that structure somewhere, but I think that is a bit much
to expect in terms of the SEE setting up the execution context, thus the
suggestion to simply put the hart_id in the thread pointer register and
let the supervisor sort it out.

-- Jacob

Stefan O'Rear

unread,

Nov 19, 2016, 7:34:55 PM11/19/16

to Jacob Bachmeyer, Andrew Waterman, ron minnich, RISC-V ISA Dev

On Sat, Nov 19, 2016 at 4:27 PM, Jacob Bachmeyer <jcb6...@gmail.com> wrote:
> That was not a randomly-chosen register.

Sorry, I did not mean to imply that it was randomly-chosen.

If we're going to be passing it in a register tp is not a bad choice.
Let's discuss this more when we're ready to have a dedicated thread
for kernel entry-point calling conventions, this is too big a subject
for a side conversation on hart IDs.

-s

Michael Clark

unread,

Nov 19, 2016, 7:53:30 PM11/19/16

to Stefan O'Rear, ron minnich, RISC-V ISA Dev

The question is which fast paths one has in mind.

VMs running on isolated harts e.g. the non-oversubscribed or say Amazon case; there would be nearly nil hypervisor content switch cost between different VMs on the same hart, they would be pinned. Hypervisor is kind of a bad example when looking at tid usage (MPI, OpenCL). The apps that need it fast will be on bare metal.

Likewise quite a few custom data structures use thread offsets in members of structures that are on the heap. This is very different to thread local storage which is a pointer. I imagine there are per_cpu heap structures in the kernel like this too. Another example is MPMC Queues (Multiple Producer Multiple Consumer). I’ve looked at quite a few implementations and they tend to have to create for example std::thread::hardware_concurrency threads and then create a map from pthread_self (tp) to an integer.

Which then means the hartid in thread local data. i.e. lw t0, hartid(tp) although possibly further indirected through a TCB.

It depends on whether we envisage the “ultra-fine” fast path in MPI and OpenCL type code that wants a tid for fast memory access in thread parallel compute kernels.

The VM context switch case is likely quite infrequent i.e.100HZ at most, whereas MPI and OpenCL tid based array accesses are in 10000HZ (kernel launches per second). I guess they can chew a register.

It deserves some thought as to the frequency of which “fast paths” and which applications, and if uhartid is just an alias to one machine register (in the simplest case) then it is just an access issue.

I guess it depends on whether RISC-V will be optimised for OpenCL or MPI and whether this will be frequent:

int tid = get_global_id(0)

Jacob Bachmeyer

unread,

Nov 19, 2016, 7:54:16 PM11/19/16

to Stefan O'Rear, Paolo Bonzini, ron minnich, RISC-V ISA Dev

Stefan O'Rear wrote:
> On Sat, Nov 19, 2016 at 3:18 PM, Jacob Bachmeyer <jcb6...@gmail.com> wrote:
>
>> Paolo Bonzini wrote:
>>
>>> The set of hart ids at startup should be sent through the configuration
>>> string/device tree. If hotplug is ever going to be supported, one would need
>>> an event mechanism as part of the SEE, but that is way in the future.
>> I have sketched an outline of an SBI event mechanism, using virtual stream
>> devices.
>>
>
> Still need to look at that, sorry.
>

It was the "SBI virtio" proposal--a revision is needed that actually
specifies the asynchronous I/O interface, and probably changes the name,
since "virtio" has lead to confusion with a "virtio" spec for
virtualized devices that uses a different model.

> There's probably room for several design points here. Mainframe/IaaS
> wants a rich monitor, embedded systems will want S-mode to handle
> everything, and the "PC/smartphone" space could go either way but
> years of experience with ACPI and EFI have made me nervous about
> anything resembling a rich monitor on a PC.
>

Very much agreed. I would put PCs and smartphones on opposite sides of
the line, since the architecture I am planning to propose includes
hardware support for a diagnostics interface, a cost which I believe is
more likely to be justifiable on machines that normally have peripherals
(even if that amounts to "console port" on a server) and boot from
LBA-addressed storage. A system lacking LBA-addressed block storage is
clearly embedded, and a system that is entirely self-contained (like a
smartphone or tablet) may or may not have use for the standard
diagnostics interface. The planned standard physical RVDIAG port
certainly would not fit on a smartphone or tablet, although electrically
it needs only 5 pins or so, one of which is ground.

As for the monitor, I strongly support strictly limiting the interface
between the system software and the monitor to the standard SBI/HBI. A
UEFI-dependent operating system could be booted by first loading a
RISC-V-standards-conformant UEFI environment, which then performs a UEFI
boot, all within S-mode. The RISC-V monitor then deals only with the
UEFI environment and does not care that the "kernel" it loaded has
essentially replaced itself. OpenFirmware could be handled similarly if
anyone wants it. Just another bootloader.

I do want a useful firmware prompt, however. Think "GRUB in ROM without
menus" for the minimum that I would like to see. It should be possible
to boot a machine from valid boot media using only a keyboard, even if
NVRAM is blank or corrupted. I resent that time I have wasted dealing
with broken UEFI implementations.

-- Jacob

Stefan O'Rear

unread,

Nov 19, 2016, 7:59:42 PM11/19/16

to Michael Clark, ron minnich, RISC-V ISA Dev

On Sat, Nov 19, 2016 at 4:53 PM, Michael Clark <michae...@mac.com> wrote:
> Which then means the hartid in thread local data. i.e. lw t0, hartid(tp) although possibly further indirected through a TCB.

[...]

> I guess it depends on whether RISC-V will be optimised for OpenCL or MPI and whether this will be frequent:

True, there are valid use cases. I'm coming at this from an
assumption that SIMT harts will have SIMT-specific extensions anyway,
and trying to minimize the complexity cost that everyone pays — a new
S-mode register, and especially a writable one, has to clear a very
high bar.

-s

Jacob Bachmeyer

unread,

Nov 19, 2016, 8:26:12 PM11/19/16

to Stefan O'Rear, Michael Clark, ron minnich, RISC-V ISA Dev

To be clear, I suggest that it would only be writable through an H-mode
shadow CSR that would only exist on implementations that actually
support virtualized S-mode harts. Implementations lacking that
virtualization support could simply have a read-only alias of mhartid.

-- Jacob

Paolo Bonzini

unread,

Nov 20, 2016, 8:39:07 AM11/20/16

to jcb6...@gmail.com, Stefan O'Rear, ron minnich, RISC-V ISA Dev

On 20/11/2016 00:18, Jacob Bachmeyer wrote:
>> There should be an SBI service to stop execution of the current
>> processor, and one to start execution of a stopped processor from
>> another. A stopped processor stays in M- or H-mode until woken up.
>>
>
> Could these needs be met with the WFI opcode to stop a processor and
> using an IPI to wake one?

M-mode can do that, but the "stop a processor" service should be in
M-mode before executing WFI and should ensure that interrupts are
processed in M-mode.

Otherwise, there is no guarantee that the interrupt will be taken
correctly. There are so many things that can be wrong: the page tables
may have been overwritten, the exception handler may not be valid, the
instruction after WFI (which mepc will point to) may not be valid, and
so on.

Likewise, the restart functionality would ensure that the page tables
are valid (or specify a value for VM, though as said before I'd rather
leave enter the supervisor with paging disabled, and leaving it in full
control of enabling/disabling it).

Paolo

kr...@berkeley.edu

unread,

Nov 20, 2016, 3:13:59 PM11/20/16

to jcb6...@gmail.com, Stefan O'Rear, Michael Clark, ron minnich, RISC-V ISA Dev

There seems to an assumption in this email thread that mandating that
{s/u}hartid is in a CSR will make access to it more efficient than
reading it from a memory structure.

This assumption is probably false for most future high-performance
processors.

It does not take a great leap of faith to believe that future
high-performance processors will be highly optimized to read
frequently accessed values from memory. As this is needed everywhere,
it will be highly tuned.

It does take a much larger dose of optimism to believe that future CPU
architects will optimize access to this CSR, as opposed to introducing
pipeline hiccups. Even if the s/uhartid was accessed so frequently
that optimizing its access made sense, those future CPU architects
will be cursing whoever mandated its existence.

I am sure that read-mostly info passed from SEE to OS can be placed in
carefully optimized and appropriately protected memory structures, and
will perform much better than any CSR implementation in practice, as
well as improving flexbility for future software stacks and reducing
design/verification/performance overhead on all future hardware
implementations.

Krste

| --
| You received this message because you are subscribed to the Google Groups "RISC-V ISA Dev" group.
| To unsubscribe from this group and stop receiving emails from it, send an email to isa-dev+u...@groups.riscv.org.
| To post to this group, send email to isa...@groups.riscv.org.
| Visit this group at https://groups.google.com/a/groups.riscv.org/group/isa-dev/.

| To view this discussion on the web visit https://groups.google.com/a/groups.riscv.org/d/msgid/isa-dev/5830FBB1.7060201%40gmail.com.

Allen J. Baum

unread,

Nov 20, 2016, 3:36:08 PM11/20/16

to kr...@berkeley.edu, jcb6...@gmail.com, Stefan O'Rear, Michael Clark, ron minnich, RISC-V ISA Dev

I've seen this from the Intel side. CSR (MSRs in Intel processors, typically) access is a HW intensive resource (there are a LOT of them) and typically this is performed using a serial access chain that winds through the chips, possibly going through multiple clock domains on its way.

Several hundred cycles to get a register may be optimistic. Waaaay worse than that if multiple HARTs are trying to read a shared resource (since it is outside the core and there is latency and contention delay on top of the serial access chain) though this is not necessarily the case for hart_id, though the simplest implementation would be, oddly).

I agree with Krste.
The only CSR that needs really fast access (that I'm aware of) is a high performance timer; there are applications that need that to be precise and fast, else they perturb the measurement. Predictable access may be even more important than that, and might be sufficient.

The various interrupt status registers are the next CSRs that may need faster access; I don't know how important that is. Its clearly more important in environments that have frequent interrupts (networking applications?), but other than that I'm not qualified to say.

>|| and trying to minimize the complexity cost that everyone pays - a new

>|| S-mode register, and especially a writable one, has to clear a very
>|| high bar.
>
>| To be clear, I suggest that it would only be writable through an H-mode
>| shadow CSR that would only exist on implementations that actually
>| support virtualized S-mode harts. Implementations lacking that
>| virtualization support could simply have a read-only alias of mhartid.
>
>| -- Jacob
>
>| --
>| You received this message because you are subscribed to the Google Groups "RISC-V ISA Dev" group.
>| To unsubscribe from this group and stop receiving emails from it, send an email to isa-dev+u...@groups.riscv.org.
>| To post to this group, send email to isa...@groups.riscv.org.
>| Visit this group at https://groups.google.com/a/groups.riscv.org/group/isa-dev/.
>| To view this discussion on the web visit https://groups.google.com/a/groups.riscv.org/d/msgid/isa-dev/5830FBB1.7060201%40gmail.com.
>
>--
>You received this message because you are subscribed to the Google Groups "RISC-V ISA Dev" group.
>To unsubscribe from this group and stop receiving emails from it, send an email to isa-dev+u...@groups.riscv.org.
>To post to this group, send email to isa...@groups.riscv.org.
>Visit this group at https://groups.google.com/a/groups.riscv.org/group/isa-dev/.

>To view this discussion on the web visit https://groups.google.com/a/groups.riscv.org/d/msgid/isa-dev/22578.1027.101855.902025%40KAiMac.local.

--
**************************************************
* Allen Baum tel. (908)BIT-BAUM *
* 248-2286 *
**************************************************

Jacob Bachmeyer

unread,

Nov 20, 2016, 5:00:07 PM11/20/16

to Allen J. Baum, kr...@berkeley.edu, Stefan O'Rear, Michael Clark, ron minnich, RISC-V ISA Dev

Allen J. Baum wrote:
> I've seen this from the Intel side. CSR (MSRs in Intel processors, typically) access is a HW intensive resource (there are a LOT of them) and typically this is performed using a serial access chain that winds through the chips, possibly going through multiple clock domains on its way.
>
> Several hundred cycles to get a register may be optimistic. Waaaay worse than that if multiple HARTs are trying to read a shared resource (since it is outside the core and there is latency and contention delay on top of the serial access chain) though this is not necessarily the case for hart_id, though the simplest implementation would be, oddly).
>
> I agree with Krste.
> The only CSR that needs really fast access (that I'm aware of) is a high performance timer; there are applications that need that to be precise and fast, else they perturb the measurement. Predictable access may be even more important than that, and might be sufficient.
>

I have previously suggested a means for the high-performance realtime
timer to be a core-local register.

> The various interrupt status registers are the next CSRs that may need faster access; I don't know how important that is. Its clearly more important in environments that have frequent interrupts (networking applications?), but other than that I'm not qualified to say.
>

The trap-handling CSRs--*scratch, *cause, *epc, *badaddr--are
essentially general-purpose registers, some of which are written by
hardware when taking a trap. I reiterate my previous suggestion that
*ip be moved out of this group and that this range of CSR addresses
(0xX4X) be dedicated to side-effect-free CSRs.

-- Jacob

Samuel Falvo II

unread,

Nov 20, 2016, 6:10:35 PM11/20/16

to Allen J. Baum, Krste Asanovic, Jacob Bachmeyer, Stefan O'Rear, Michael Clark, ron minnich, RISC-V ISA Dev

On Sun, Nov 20, 2016 at 12:36 PM, Allen J. Baum
<allen...@esperantotech.com> wrote:
> The various interrupt status registers are the next CSRs that may need faster access; I don't know how important that is

This is vitally important. This needs to be comparable to the timer
registers, because hard real-time operating systems often depends on
latencies measured in clock cycles. A CPU which cannot provide
statically predictable and rapid access to at least xstatus, xie,
xcause, and/or xbadaddr _cannot_ support real-time operating systems,
and will have measurable, and depending on the construction of the
embedded device, observable effects on operation and latency.

I would argue that any CSR in the S-mode register set should be
accessed with haste, since they affect the operational performance of
an operating system directly. If you add I/O registers to your CSRs,
that's different; maybe those can afford some additional latency. But
when it comes to those affecting interrupts and other OS-level
resources, those should definitely be handled with some priority.

Reply all

Reply to author

Forward