ARM BKPT <imm8> equivalent on RISC-V


Liviu Ionescu

Nov 28, 2017, 7:33:27 PM
to RISC-V ISA Dev, Krste Asanovic, Andrew Waterman, Yunsup Lee, Megan Wachs, Drew Barbier
I'm starting a new thread, since the related discussion in the debug list apparently was not very well understood.

According to the current ISA specs, there is a single instruction, EBREAK, that breaks execution to a debugger.


Other architectures have multiple such instructions. For example, the ARM BKPT has an 8-bit immediate value. Value #0 is the default breakpoint instruction, and all other values are available for specific needs such as semihosting, a technique supported by many debuggers (J-Link, OpenOCD, QEMU, etc.) that forwards some POSIX calls to the debugger and helps when writing unit tests.


The question is: in the absence of multiple EBREAKs, how can we define a sequence of instructions that includes RISC-V's single EBREAK but also safely passes an additional value to the debugger?


Megan suggested reserving some values of the `mscratch` register and having the debugger test for them; assuming this CSR is used to store a pointer, small values that are not multiples of 4, like 1, 2, 3, 5, 6, 7, ..., might be used for this purpose.

Currently only one additional EBREAK is needed, for semihosting, so, if I have to choose, I would take 7.
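
A minimal sketch of the debugger-side test described above, assuming that `mscratch` normally holds 0 or a 4-byte aligned pointer. The function name, the `SEMIHOSTING_MAGIC` constant and the returned strings are illustrative assumptions, not part of any specification.

```python
# Hypothetical debugger-side check; names and the alignment rule are
# assumptions based on the proposal in this message, not a published spec.
SEMIHOSTING_MAGIC = 7  # the value proposed above for semihosting


def classify_mscratch(value: int) -> str:
    """Classify an mscratch value read by the debugger after an EBREAK."""
    if value % 4 == 0:
        # 0 or a 4-byte aligned pointer: treat as an ordinary breakpoint.
        return "breakpoint"
    if value == SEMIHOSTING_MAGIC:
        return "semihosting"
    # Any other value that is not a multiple of 4 is unassigned here.
    return "unassigned"
```

With this rule an ordinary pointer such as 0x80001000 is never mistaken for a request, while 7 selects semihosting.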


Any other suggestions?


Thank you,

Liviu



Andrew Waterman

Nov 28, 2017, 7:48:09 PM
to Liviu Ionescu, Drew Barbier, Krste Asanovic, Megan Wachs, RISC-V ISA Dev, Yunsup Lee
I don’t support adding additional EBREAKs, as placing odd-valued magic numbers in mscratch seems sufficient. There is an additional DBI (debug binary interface) constraint on mscratch, that it must be initialized on boot and in normal operation only hold aligned pointers or 0. But that is vastly preferable to an ISA modification.

Bruce Hoult

Nov 28, 2017, 8:43:07 PM
to Liviu Ionescu, RISC-V ISA Dev, Krste Asanovic, Andrew Waterman, Yunsup Lee, Megan Wachs, Drew Barbier
What is the reason for not using the same ECALL interface for semihosting as for running under an actual *nix OS?




--
You received this message because you are subscribed to the Google Groups "RISC-V ISA Dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to isa-dev+unsubscribe@groups.riscv.org.
To post to this group, send email to isa...@groups.riscv.org.
Visit this group at https://groups.google.com/a/groups.riscv.org/group/isa-dev/.
To view this discussion on the web visit https://groups.google.com/a/groups.riscv.org/d/msgid/isa-dev/5CEC97CF-A405-46F0-A873-20A646265C14%40livius.net.

Michael Clark

Nov 28, 2017, 11:41:31 PM
to Bruce Hoult, Liviu Ionescu, RISC-V ISA Dev, Krste Asanovic, Andrew Waterman, Yunsup Lee, Megan Wachs, Drew Barbier


> On 29/11/2017, at 2:43 PM, Bruce Hoult <br...@hoult.org> wrote:
>
> What is the reason for not using the same ECALL interface for semihosting as for running under an actual *nix OS?

I don’t know. That was my suggestion. Get the debugger to hook mtvec and have the handler route via the debug transport or otherwise handle locally.

It’s an elegant solution because it allows one to maintain binary compatibility.

AFAICT the intent is redirecting POSIX calls so it’s not exactly a case of bare-metal e.g. like bbl, linux or any other low level code that doesn’t actually make POSIX calls.


Liviu Ionescu

Nov 29, 2017, 3:53:31 AM
to Andrew Waterman, Drew Barbier, Krste Asanovic, Megan Wachs, RISC-V ISA Dev, Yunsup Lee


> On 29 Nov 2017, at 02:47, Andrew Waterman <and...@sifive.com> wrote:
>
> I don’t support adding additional EBREAKs, as placing odd-valued magic numbers in mscratch seems sufficient. There is an additional DBI (debug binary interface) constraint on mscratch, that it must be initialized on boot and in normal operation only hold aligned pointers or 0. But that is vastly preferable to an ISA modification.

yes, this constraint of holding aligned pointers and the set of allowed magic numbers must be clearly documented, with the additional consequence that mscratch must be cleared at reset.

I foresee a small problem with using mscratch for magic numbers: the code must be placed in an interrupt critical section, i.e. disable interrupts, swap mscratch, ebreak, restore mscratch, enable interrupts. that is probably no longer reasonable for an inline function.

otherwise we risk entering an interrupt while mscratch does not hold an aligned pointer.

things get pretty messy, only to compensate in software what could have been an EBREAK with at least a 2-4 bits immediate value field. :-(
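
A toy model of the critical section listed above (disable interrupts, swap mscratch, ebreak, restore mscratch, enable interrupts). The `Hart` class, its fields and the function name `call_host_sketch` are hypothetical; this only illustrates the ordering and is not target code.

```python
# Toy model of a hart; the class and field names are illustrative only.
class Hart:
    def __init__(self, mscratch: int):
        self.mscratch = mscratch
        self.interrupts_enabled = True
        self.seen_by_debugger = None

    def ebreak(self) -> None:
        # The debugger halts the hart and reads mscratch.
        self.seen_by_debugger = self.mscratch


def call_host_sketch(hart: Hart, magic: int = 7) -> None:
    """Disable interrupts, swap mscratch, EBREAK, restore, re-enable."""
    was_enabled = hart.interrupts_enabled
    hart.interrupts_enabled = False              # enter critical section
    saved, hart.mscratch = hart.mscratch, magic  # swap in the magic value
    hart.ebreak()                                # debugger sees the magic value
    hart.mscratch = saved                        # restore the aligned pointer
    hart.interrupts_enabled = was_enabled        # leave critical section
```

The magic value is only visible between the swap and the restore, which is exactly the window in which interrupts have to stay disabled.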



regards,

Liviu

Andrew Waterman

Nov 29, 2017, 4:13:13 AM
to Liviu Ionescu, Drew Barbier, Krste Asanovic, Megan Wachs, RISC-V ISA Dev, Yunsup Lee
On Wed, Nov 29, 2017 at 12:53 AM, Liviu Ionescu <i...@livius.net> wrote:
>
>
>> On 29 Nov 2017, at 02:47, Andrew Waterman <and...@sifive.com> wrote:
>>
>> I don’t support adding additional EBREAKs, as placing odd-valued magic numbers in mscratch seems sufficient. There is an additional DBI (debug binary interface) constraint on mscratch, that it must be initialized on boot and in normal operation only hold aligned pointers or 0. But that is vastly preferable to an ISA modification.
>
> yes, this constraint of holding aligned pointers and the set of allowed magic numbers must be clearly documented, with the additional consequence that mscratch must be cleared at reset.
>
> I foresee a small problem with using the mscratch for magic numbers, the need to place the code in an interrupt critical section, i.e. disable interrupts, swap mscratch, ebreak, restore mscratch, enable interrupts. probably no longer reasonable for an inline function.
>
> otherwise we risk entering an interrupt while mscratch does not hold an aligned pointer.

Right - this needs to be done with interrupts disabled.

>
> things get pretty messy, only to compensate in software what could have been an EBREAK with at least a 2-4 bits immediate value field. :-(

I can see your point, but changing the ISA is all the more messy.

>
>
>
> regards,
>
> Liviu
>

Liviu Ionescu

Nov 29, 2017, 4:27:21 AM
to Andrew Waterman, Drew Barbier, Krste Asanovic, Megan Wachs, RISC-V ISA Dev, Yunsup Lee


> On 29 Nov 2017, at 11:12, Andrew Waterman <and...@sifive.com> wrote:
>
> On Wed, Nov 29, 2017 at 12:53 AM, Liviu Ionescu <i...@livius.net> wrote:
>>
>>
>>> On 29 Nov 2017, at 02:47, Andrew Waterman <and...@sifive.com> wrote:
>>>
>>> I don’t support adding additional EBREAKs, as placing odd-valued magic numbers in mscratch seems sufficient. There is an additional DBI (debug binary interface) constraint on mscratch, that it must be initialized on boot and in normal operation only hold aligned pointers or 0. But that is vastly preferable to an ISA modification.
>>
>> yes, this constraint of holding aligned pointers and the set of allowed magic numbers must be clearly documented, with the additional consequence that mscratch must be cleared at reset.
>>
>> I foresee a small problem with using the mscratch for magic numbers, the need to place the code in an interrupt critical section, i.e. disable interrupts, swap mscratch, ebreak, restore mscratch, enable interrupts. probably no longer reasonable for an inline function.
>>
>> otherwise we risk entering an interrupt while mscratch does not hold an aligned pointer.
>
> Right - this needs to be done with interrupts disabled.

Ok.

Megan, can you suggest a new implementation for the `call_host()` function?

>
>>
>> things get pretty messy, only to compensate in software what could have been an EBREAK with at least a 2-4 bits immediate value field. :-(
>
> I can see your point, but changing the ISA is all the more messy.

I can see your point too, probably for this specific case we can live with an elaborate workaround, but, more generally, does it mean that if, in time, more issues with the current (tight) ISA are discovered, there will never be new versions to fix them, and the software workarounds required to compensate for the problems will become more and more elaborate?


Regards,

Liviu


Alex Marshall

Nov 29, 2017, 6:56:18 PM
to Liviu Ionescu, Andrew Waterman, Drew Barbier, Krste Asanovic, Megan Wachs, RISC-V ISA Dev, Yunsup Lee
> things get pretty messy, only to compensate in software what could have
> been an EBREAK with at least a 2-4 bits immediate value field. :-(
>
I believe there is actually a rather large number of unused bits in the EBREAK instruction encoding that could have been used to include an immediate field. Theoretically it should be possible to slightly hack up your assembler to allow you to fill those bits, if your RISC-V implementation supports ignoring them (that is, doesn't trap on filled 'reserved' bits).

Why this wasn't the original way EBREAK was designed, I can't say. I also may be incorrect about the reserved bits existing, maybe I've read the encodings incorrectly.

Thanks,
Alex



Jacob Bachmeyer

Nov 29, 2017, 7:15:44 PM
to Liviu Ionescu, Andrew Waterman, Drew Barbier, Krste Asanovic, Megan Wachs, RISC-V ISA Dev, Yunsup Lee
Liviu Ionescu wrote:
>> On 29 Nov 2017, at 11:12, Andrew Waterman <and...@sifive.com> wrote:
>>
>> On Wed, Nov 29, 2017 at 12:53 AM, Liviu Ionescu <i...@livius.net> wrote:
>>
>>>> On 29 Nov 2017, at 02:47, Andrew Waterman <and...@sifive.com> wrote:
>>>>
>>>> I don’t support adding additional EBREAKs, as placing odd-valued magic numbers in mscratch seems sufficient. There is an additional DBI (debug binary interface) constraint on mscratch, that it must be initialized on boot and in normal operation only hold aligned pointers or 0. But that is vastly preferable to an ISA modification.
>>>>
>>> yes, this constraint of holding aligned pointers and the set of allowed magic numbers must be clearly documented, with the additional consequence that mscratch must be cleared at reset.
>>>
>>> I foresee a small problem with using the mscratch for magic numbers, the need to place the code in an interrupt critical section, i.e. disable interrupts, swap mscratch, ebreak, restore mscratch, enable interrupts. probably no longer reasonable for an inline function.
>>>
>>> otherwise we risk entering an interrupt while mscratch does not hold an aligned pointer.
>>>
>> Right - this needs to be done with interrupts disabled.
>>
>
> Ok.
>
> Megan, can you suggest a new implementation for the `call_host()` function?
>

Why not just use ECALL for 'call_host()'? In a semi-hosted system, the
"magic" link to the host is the environment, after all.


-- Jacob

Bruce Hoult

Nov 29, 2017, 7:15:45 PM
to Alex Marshall, Liviu Ionescu, Andrew Waterman, Drew Barbier, Krste Asanovic, Megan Wachs, RISC-V ISA Dev, Yunsup Lee
ECALL and EBREAK are SYSTEM opcode (bits 0..6), with 000 in funct3. That's enough to uniquely identify them (at present) as nothing else uses that combination.

Then rs1 and rdst are specified as both 00000 and imm12 as 000000000000 for ECALL and 000000000001 for EBREAK.

Plenty of room to do something with a few more bits of imm12 while leaving rs1 and rdst nonzero for future expansion.
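
To make the field layout concrete, here is a small sketch that splits the two instruction words into the fields named above. The helper name `decode_system` and the dictionary keys are assumptions for illustration; the two encodings are the standard ECALL and EBREAK words.

```python
def decode_system(insn: int) -> dict:
    """Split a 32-bit I-type instruction word into its fields."""
    return {
        "opcode": insn & 0x7F,          # bits 0..6
        "rd": (insn >> 7) & 0x1F,       # bits 7..11
        "funct3": (insn >> 12) & 0x7,   # bits 12..14
        "rs1": (insn >> 15) & 0x1F,     # bits 15..19
        "imm12": (insn >> 20) & 0xFFF,  # bits 20..31
    }


ECALL = 0x00000073   # imm12 = 000000000000
EBREAK = 0x00100073  # imm12 = 000000000001
```

Both words decode to opcode 0x73 (SYSTEM) with funct3, rs1 and rd all zero; only the imm12 field differs.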


Liviu Ionescu

Nov 29, 2017, 7:31:09 PM
to jcb6...@gmail.com, Andrew Waterman, Drew Barbier, Krste Asanovic, Megan Wachs, RISC-V ISA Dev, Yunsup Lee


> On 30 Nov 2017, at 02:15, Jacob Bachmeyer <jcb6...@gmail.com> wrote:
>
>> Megan, can you suggest a new implementation for the `call_host()` function?
>>
>
> Why not just use ECALL for 'call_host()'? In a semi-hosted system, the "magic" link to the host is the environment, after all.

Jacob,

please take a HiFive1 board, write two small programs one with an EBREAK and one with an ECALL, and run them in a debugger.

when you hit EBREAK, the debugger will halt and you'll see that the PC is exactly at the EBREAK instruction. a semihosted debugger will examine the EBREAK and if some specific condition is met (like mscratch is 0x7), will process the request and continue execution after the EBREAK.
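
A rough sketch of the halt handling described above, assuming the debugger can read the halted PC, target memory and mscratch. The function name `on_halt`, the dictionary used as a memory stand-in and the returned tuples are hypothetical; the 16-bit case covers the compressed C.EBREAK encoding.

```python
EBREAK = 0x00100073      # 32-bit EBREAK encoding
C_EBREAK = 0x9002        # 16-bit compressed C.EBREAK encoding
SEMIHOSTING_MAGIC = 0x7  # condition mentioned above (mscratch is 0x7)


def on_halt(pc: int, memory: dict, mscratch: int):
    """Return (action, resume_pc) for a hart halted at pc.

    memory maps halfword-aligned addresses to 16-bit values, a stand-in
    for the debugger's real memory reads.
    """
    low = memory[pc]
    if low == C_EBREAK:
        length = 2
    elif low | (memory.get(pc + 2, 0) << 16) == EBREAK:
        length = 4
    else:
        return ("not-ebreak", pc)
    if mscratch == SEMIHOSTING_MAGIC:
        # Service the request, then continue after the EBREAK.
        return ("semihosting", pc + length)
    # Plain breakpoint: stay halted at the EBREAK itself.
    return ("breakpoint", pc)
```

The key point is that the debugger, not the target, decides whether to resume past the instruction.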

when you hit ECALL, the FE310-G000 device will most probably crash, or at best will remain in a loop in the trap handler, if you cared to install one. the debugger has absolutely no idea about this, you have to manually halt execution and examine the PC.


regards,

Liviu


Liviu Ionescu

Nov 29, 2017, 7:37:57 PM
to Bruce Hoult, Alex Marshall, Andrew Waterman, Drew Barbier, Krste Asanovic, Megan Wachs, RISC-V ISA Dev, Yunsup Lee


> On 30 Nov 2017, at 02:15, Bruce Hoult <br...@hoult.org> wrote:
>
> ECALL and EBREAK are SYSTEM opcode (bits 0..6), with 000 in funct3. That's enough to uniquely identify them (at present) as nothing else uses that combination.
>
> Then rs1 and rdst are specified as both 00000 and imm12 as 000000000000 for ECALL and 000000000001 for EBREAK.
>
> Plenty of room to do something with a few more bits of imm12 while leaving rs1 and rdst nonzero for future expansion.

sure, if there are any free bits in the current encoding, it would be great to reserve some for an extended EBREAK that has an immediate value (ARM Cortex-M uses 8 bits, but I guess 2-3 bits would probably be enough).


regards,

Liviu

Jacob Bachmeyer

Nov 29, 2017, 10:25:43 PM
to Liviu Ionescu, Andrew Waterman, Drew Barbier, Krste Asanovic, Megan Wachs, RISC-V ISA Dev, Yunsup Lee
There is no way for the debugger to intercept ECALL? That sounds like
an oversight in the HiFive chip, not an architectural shortcoming.


-- Jacob

Jacob Bachmeyer

Nov 29, 2017, 10:32:20 PM
to Liviu Ionescu, Bruce Hoult, Alex Marshall, Andrew Waterman, Drew Barbier, Krste Asanovic, Megan Wachs, RISC-V ISA Dev, Yunsup Lee
We can easily get 10 bits using the rs1 and rd fields, or XLEN bits by
using rs1 to refer to an actual register holding a function code. The
immediate field in SYSTEM/PRIV is listed as "funct12", so it seems to be
intended as a 12-bit function code.

Using rs1 to select a register holding a parameter is a smaller change
to the ISA than giving magic semantics to mscratch. Further, EBREAK is
available in all modes, while mscratch is accessible only in M-mode.


-- Jacob

Liviu Ionescu

Nov 30, 2017, 2:18:44 AM
to Jacob Bachmeyer, Bruce Hoult, Alex Marshall, Andrew Waterman, Drew Barbier, Krste Asanovic, Megan Wachs, RISC-V ISA Dev, Yunsup Lee


> On 30 Nov 2017, at 05:32, Jacob Bachmeyer <jcb6...@gmail.com> wrote:
>
>> sure, if there are any free bits in the current encoding, it would be great to reserve some for an extended EBREAK that has an immediate value (ARM Cortex-M uses 8 bits, but I guess 2-3 bits would probably be enough).
>>
>
> We can easily get 10 bits using the rs1 and rd fields, or XLEN bits by using rs1 to refer to an actual register holding a function code. The immediate field in SYSTEM/PRIV is listed as "funct12", so it seems to be intended as a 12-bit function code.

yes, for the future, if Andrew agrees, I welcome such a solution (the mscratch magic is not a solution, it is a workaround), but for now, as long as the specs do not mention it, I doubt it is a safe choice.

> Using rs1 to select a register holding a parameter is a smaller change to the ISA than giving magic semantics to mscratch. Further, EBREAK is available in all modes, while mscratch is accessible only in M-mode.

maybe I'm short sighted, but why would I use semihosting in any other mode but M?


Liviu

Andrew Waterman

Nov 30, 2017, 2:45:26 AM
to Liviu Ionescu, Alex Marshall, Bruce Hoult, Drew Barbier, Jacob Bachmeyer, Krste Asanovic, Megan Wachs, RISC-V ISA Dev, Yunsup Lee
On Wed, Nov 29, 2017 at 11:18 PM Liviu Ionescu <i...@livius.net> wrote:


> On 30 Nov 2017, at 05:32, Jacob Bachmeyer <jcb6...@gmail.com> wrote:
>
>> sure, if there are any free bits in the current encoding, it would be great to reserve some for an extended EBREAK that has an immediate value (ARM Cortex-M uses 8 bits, but I guess 2-3 bits would probably be enough).
>>
>
> We can easily get 10 bits using the rs1 and rd fields, or XLEN bits by using rs1 to refer to an actual register holding a function code.  The immediate field in SYSTEM/PRIV is listed as "funct12", so it seems to be intended as a 12-bit function code.

yes, for the future, if Andrew agrees, I welcome such a solution (the mscratch magic is not a solution, it is a workaround), but for now, as long as the specs do not mention it, I doubt it is a safe choice.

It’s not my decision; it’s just against my advisement.

Liviu Ionescu

Nov 30, 2017, 3:31:13 AM
to Andrew Waterman, Alex Marshall, Bruce Hoult, Drew Barbier, Jacob Bachmeyer, Krste Asanovic, Megan Wachs, RISC-V ISA Dev, Yunsup Lee


> On 30 Nov 2017, at 09:45, Andrew Waterman <and...@sifive.com> wrote:
>
>
> yes, for the future, if Andrew agrees, I welcome such a solution (the mscratch magic is not a solution, it is a workaround), but for now, as long as the specs do not mention it, I doubt it is a safe choice.
>
> It’s not my decision; it’s just against my advisement.

as Jacob noticed,

> mscratch is accessible only in M-mode

so, for other use cases (more elaborate than my M-mode semihosting), the mscratch workaround might not be available.

perhaps a short study on how other architectures use multiple BRK instructions would be useful. Cortex-M reserves an 8-bit value, but personally I used only the semihosting one (but I used it heavily). x86 also has an 8-bit value for INT, but they use it for slightly different purposes.

if anyone else has more experience with BRK use on other architectures, please share.


regards,

Liviu

Cesar Eduardo Barros

Nov 30, 2017, 6:05:17 AM
to jcb6...@gmail.com, Liviu Ionescu, Bruce Hoult, Alex Marshall, Andrew Waterman, Drew Barbier, Krste Asanovic, Megan Wachs, RISC-V ISA Dev, Yunsup Lee
Em 30-11-2017 01:32, Jacob Bachmeyer escreveu:
> Liviu Ionescu wrote:
>>> On 30 Nov 2017, at 02:15, Bruce Hoult <br...@hoult.org> wrote:
>>>
>>> ECALL and EBREAK are SYSTEM opcode (bits 0..6), with 000 in funct3.
>>> That's enough to uniquely identify them (at present) as nothing else
>>> uses that combination.
>>>
>>> Then rs1 and rdst are specified as both 00000 and imm12 as
>>> 000000000000 for ECALL and 000000000001 for EBREAK.
>>>
>>> Plenty of room to do something with a few more bits of imm12 while
>>> leaving rs1 and rdst nonzero for future expansion.
>>
>> sure, if there are any free bits in the current encoding, it would be
>> great to reserve some for an extended EBREAK that has an immediate
>> value (ARM Cortex-M uses 8 bits, but I guess 2-3 bits would probably
>> be enough).
>
> We can easily get 10 bits using the rs1 and rd fields, or XLEN bits by
> using rs1 to refer to an actual register holding a function code.  The
> immediate field in SYSTEM/PRIV is listed as "funct12", so it seems to be
> intended as a 12-bit function code.

One of the coolest properties of the design of the base RISC-V ISA is
that the major opcode (bits 0-6) is enough to determine the instruction
format, and that it is enough to determine which registers should be
read (rs1, rs2, rs3) and written (rd), without decoding anything else.
Let's not lose that property. Immediate mode CSR instructions might do a
useless read of a register on a naive implementation, since they reuse
rs1 as an immediate, but that's not a problem; reusing rd for an
immediate, on the other hand, would require special-casing to suppress
the register write.

The SYSTEM opcode is I-type, and both ECALL and EBREAK have rd=x0; let's
keep it that way, unless the instruction is actually writing to a register.

>
> Using rs1 to select a register holding a parameter is a smaller change
> to the ISA than giving magic semantics to mscratch.  Further,  EBREAK is
> available in all modes, while mscratch is accessible only in M-mode.

Since SYSTEM is I-type, and a naive implementation might already be
reading from rs1, I agree that it's a small change to the ISA. Using rs1
as an immediate is also a small change to the ISA, given that there's
precedent with the CSR immediate instructions, and a naive
implementation can simply discard the eagerly read rs1 register.

You could even do both, using separate funct12 values for "EBREAK
reading from rs1" and "EBREAK treating rs1 as an immediate".

--
Cesar Eduardo Barros
ces...@cesarb.eti.br

Liviu Ionescu

Nov 30, 2017, 6:47:02 AM
to Cesar Eduardo Barros, Jacob Bachmeyer, Bruce Hoult, Alex Marshall, Andrew Waterman, Drew Barbier, Krste Asanovic, Megan Wachs, RISC-V ISA Dev, Yunsup Lee


> On 30 Nov 2017, at 13:04, Cesar Eduardo Barros <ces...@cesarb.eti.br> wrote:
>
> One of the coolest properties of the design of the base RISC-V ISA is that the major opcode (bits 0-6) is enough to determine the instruction format, and that it is enough to determine which registers should be read (rs1, rs2, rs3) and written (rd), without decoding anything else. Let's not lose that property.

agree.

> You could even do both, using separate funct12 values for "EBREAK reading from rs1" and "EBREAK treating rs1 as an immediate".

great!

this means we have at least two options to choose from.

from my point of view, a short immediate would be enough. however, without being an ISA guru, using register fields for immediate values doesn't look very nice.

---

if I understand the encoding right, from the imm[11:0] field, only two values are used, 000000000000 for ECALL and 000000000001 for EBREAK.

perhaps we can use 00000000xxx1 for EBREAK, similar to what Jacob suggested, and extend the syntax of EBREAK to EBREAK #N.

if so, I would reserve EBREAK #7 for semihosting.
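
A sketch of how the proposed 00000000xxx1 pattern could be encoded and recognised. `EBREAK #N` is purely hypothetical here and is not part of the RISC-V ISA; the function names are illustrative assumptions.

```python
SYSTEM_OPCODE = 0x73


def encode_ebreak_n(n: int) -> int:
    """Encode the hypothetical EBREAK #N with imm[11:0] = 00000000xxx1."""
    if not 0 <= n <= 7:
        raise ValueError("N must fit in the three 'xxx' bits")
    imm12 = (n << 1) | 1                  # low bit stays 1, as in EBREAK
    return (imm12 << 20) | SYSTEM_OPCODE  # rs1, funct3 and rd stay zero


def decode_ebreak_n(insn: int):
    """Return N for a word matching the pattern, else None."""
    imm12 = insn >> 20
    # Low 20 bits must be a bare SYSTEM opcode; imm12 must be 00000000xxx1.
    if insn & 0xFFFFF != SYSTEM_OPCODE or imm12 & 0xFF1 != 0x001:
        return None
    return (imm12 >> 1) & 0x7
```

Under this scheme `EBREAK #0` is bit-for-bit the existing EBREAK (0x00100073), and the suggested semihosting value `EBREAK #7` would be 0x00F00073.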

I guess this is a minor change to the ISA, which does not break any compatibility and would not consume too much of the encoding space. and I guess it would not cause any problems for the compressed encoding, there are 5+1 bits for the imm field, more than enough.

and, not relying on mscratch, the extended EBREAKs are available in all modes, not only to M.


regards,

Liviu

Bruce Hoult

Nov 30, 2017, 7:46:56 AM
to Liviu Ionescu, Andrew Waterman, Alex Marshall, Drew Barbier, Jacob Bachmeyer, Krste Asanovic, Megan Wachs, RISC-V ISA Dev, Yunsup Lee
Most architectures, including x86 and Thumb1, that have a literal argument to INT/SVC/BRK/whatever seem to top out at 8 bits.

But ARM has 24 bits! AArch64 has individual instructions (SVC, HVC, SMC) to make system calls to exception levels 1/2/3, or to make calls to the debugger at exception levels 1/2/3 (DCPS1, DCPS2, DCPS3), as well as BRK (self hosted) and HLT. All have a 16 bit immediate.

If you look at...

https://github.com/torvalds/linux/blob/master/include/uapi/asm-generic/unistd.h

.. standard Linux system call numbers currently go up to 292 plus some deprecated or nonstandard ones starting at 1024.

Almost every Linux architecture passes the syscall number in a register.

IBM s390(x) can pass system calls less than 256 in an immediate in the svc instruction, but if the number is larger than that, then it falls back to using svc 0 with the syscall number in register r1.

ARM OABI uses an immediate in the SVC instruction which as noted has a 24 bit immediate in ARM32 mode but only 8 bit in Thumb1. Thumb2 doesn't add any larger SVC immediate and there is no register fallback. So OABI can't be used with modern Linux.

ARM EABI uses SVC 0 and passes the system call number in r7.

i386 uses INT 0x80 with the syscall number in eax.
x86_64 uses the syscall instruction with the syscall number in rax


I think it's nice to be able to use a single instruction with the syscall number in an immediate, but you need to make it big enough! Which means at least 9 bits today for standard Linux syscalls, or 11 bits for deprecated ones. Maybe 12 bits would be future-proof.
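
The bit counts above can be checked with a short calculation. The helper name `bits_needed` is an assumption; 292 and 1024 are the syscall numbers quoted earlier in this message.

```python
def bits_needed(max_syscall_number: int) -> int:
    """Number of immediate bits needed to encode 0..max_syscall_number."""
    return max(1, max_syscall_number.bit_length())
```

For example, `bits_needed(292)` gives 9 and `bits_needed(1024)` gives 11, while an 8-bit immediate tops out at 255.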

Copying the s390 fallback to a register would be a good idea. In that case an 8 bit immediate might be enough as the syscalls above 255 look like they're probably rarely used.

Being able to put the syscall number in a register as well as in the immediate would also vastly simplify code to dynamically call different system calls. For example wrappers for logging or whatever. If you can't use a register then you're faced with either a huge switch or else runtime code generation. Ugly. 
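
A minimal sketch of the s390-style fallback, assuming an 8-bit immediate. The name `plan_syscall`, the `IMM_BITS` constant and the returned pair are illustrative assumptions rather than any real ABI.

```python
IMM_BITS = 8  # assumed width of the immediate field


def plan_syscall(number: int):
    """Return (immediate, register_value) for an s390-style call.

    Small nonzero numbers travel in the immediate; anything else uses
    immediate 0 and passes the number in a register instead.
    """
    if 0 < number < (1 << IMM_BITS):
        return (number, None)
    return (0, number)
```

A logging wrapper that receives the syscall number at run time could always take the register path, which avoids both the large switch and runtime code generation.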



kr...@berkeley.edu

Nov 30, 2017, 9:05:32 AM
to Bruce Hoult, Liviu Ionescu, Andrew Waterman, Alex Marshall, Drew Barbier, Jacob Bachmeyer, Krste Asanovic, Megan Wachs, RISC-V ISA Dev, Yunsup Lee

Thanks Bruce,

I think you've just argued very convincingly for passing the syscall
number in a register.

Krste

Liviu Ionescu

Nov 30, 2017, 9:43:42 AM
to Bruce Hoult, Andrew Waterman, Alex Marshall, Drew Barbier, Jacob Bachmeyer, Krste Asanovic, Megan Wachs, RISC-V ISA Dev, Yunsup Lee


> On 30 Nov 2017, at 14:46, Bruce Hoult <br...@hoult.org> wrote:
>
> Most architectures, including x86 and Thumb1, that have a literal argument to INT/SVC/BRK/whatever seem to top out at 8 bits. ... But ARM has 24 bits! ... I think it's nice to be able to use a single instruction with the syscall number in an immediate, but you need to make it big enough! ...

thank you, Bruce.

I was also used to seeing system calls with literal arguments, and I was surprised that the RISC-V ISA took the minimalistic approach and defined a single ECALL and a single EBREAK, delegating everything to software (as if transistors were expensive and software cheap).

but, at least for ECALL, it is possible to pass the syscall number in a register, while for EBREAK it is not.

ECALLs with literal arguments might be useful to support multiple ABIs, or multiple ABI versions.

even for small embedded devices it is common to use different SVCs, for various internal RTOS functions (see ARM RTX, for example).


but ECALLs are not my main concern, I'd be happy to have a few more EBREAKs, to no longer need the mscratch workaround...


regards,

Liviu






Michael Chapman

Nov 30, 2017, 9:54:32 AM
to isa...@groups.riscv.org
EBREAK is just a request from the software for the debugger to intervene.

If you sometimes want to pass parameters, then create a specific
function for that, place all the parameters in registers and put a label
on the EBREAK instruction.  If you want to emulate system calls in the
debugger, then all you need is for the SYSCALL handler for ECALL to call
that function.

The debugger can examine the label on the EBREAK in the ELF executable
being debugged to determine the service requested be it a SW breakpoint
(no special label), SYSCALL emulation (special label such as
$dbg_syscall) or some other service. Indeed, if you really want 256
different kinds of EBREAKs, then just use 256 different special labels!
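
A sketch of the label-based dispatch described above, assuming the debugger has already extracted an address-to-label map from the ELF symbol table. The function name `classify_ebreak`, the dictionary and the returned strings are hypothetical; `$dbg_syscall` is the example label from this message.

```python
def classify_ebreak(pc: int, symbols: dict) -> str:
    """Decide which service an EBREAK at pc requests, based on its label.

    symbols is a hypothetical map from address to label, built by the
    debugger from the ELF executable being debugged.
    """
    label = symbols.get(pc)
    if label is None:
        return "breakpoint"        # no special label: SW breakpoint
    if label == "$dbg_syscall":
        return "syscall-emulation"
    return "service:" + label      # some other labelled service
```

Nothing in the instruction encoding changes; the distinction lives entirely in the debug information on the host side.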

There is no point in encoding stuff in instructions in HW on the target
device which does not need to be there.

Megan Wachs

Nov 30, 2017, 10:02:08 AM
to Michael Chapman, RISC-V ISA Dev
Right, now that I know that debuggers like OpenOCD can get this information from GDB, I'd prefer the ELF-labeling solution to changing the ISA or relying on values in mscratch (apologies to those who are on both threads, I think we have the same discussion going on in two places).

Megan




--
Megan A. Wachs
Engineer | SiFive, Inc 
1875 South Grant Street
Suite 600
San Mateo, CA 94402

Liviu Ionescu

Nov 30, 2017, 10:51:10 AM11/30/17
to Bruce Hoult, Krste Asanovic, Andrew Waterman, Alex Marshall, Drew Barbier, Jacob Bachmeyer, Krste Asanovic, Megan Wachs, RISC-V ISA Dev, Yunsup Lee


> On 30 Nov 2017, at 17:30, Bruce Hoult <br...@hoult.org> wrote:
>
> Passing in a register is definitely the simplest and most future-proof. ... So, it's pretty clear for syscalls.

how do you suggest to handle multiple ABIs?

with multiple ECALLs it is easy, you assign one for each ABI, each of which may have completely different syscall numbers, registers, etc, and all work in parallel.

> But what about debugger traps? If you're going to insert breakpoints by replacing user instructions (rather than using hardware breakpoint address registers) then you'd like to replace as few bytes of them and as few instructions as possible.

if you really need to replace multiple user instructions with long magic sequences simply because the ISA does not support multiple BRKs, probably you selected the wrong ISA.

> Ideally you'd want a 16 bit debugger trap instruction with a few bits of literal, so you could replace a single RVC instruction. Or if the instruction is a bigger one, replace it with a 16 bit debugger trap instruction and one or more 16 bit NOPs.

that's correct.

> PLUS .. where do you find a register to put the constant for the debugger? ... Hence the suggestion to use odd-valued magic numbers in mscratch for this communication. But that also needs several instructions .. and looks like m mode only.

yes. also ugly and limited.

for the existing embedded RISC-V devices, which are M-mode only, that would not be a problem, but future devices might also take the ARM path, which is promoting dual mode embedded devices, with lots of security improvements (see large Cortex-M devices, especially M8), and in this case the mscratch workaround cannot be used. :-(


regards,

Liviu

Andrew Waterman

Nov 30, 2017, 12:22:14 PM11/30/17
to Liviu Ionescu, Alex Marshall, Bruce Hoult, Drew Barbier, Jacob Bachmeyer, Krste Asanovic, Krste Asanovic, Megan Wachs, RISC-V ISA Dev, Yunsup Lee
On Thu, Nov 30, 2017 at 7:51 AM Liviu Ionescu <i...@livius.net> wrote:


> > On 30 Nov 2017, at 17:30, Bruce Hoult <br...@hoult.org> wrote:
> >
> > Passing in a register is definitely the simplest and most future-proof. ... So, it's pretty clear for syscalls.
>
> how do you suggest to handle multiple ABIs?
>
> with multiple ECALLs it is easy, you assign one for each ABI, each of which may have completely different syscall numbers, registers, etc, and all work in parallel.
>
> > But what about debugger traps? If you're going to insert breakpoints by replacing user instructions (rather than using hardware breakpoint address registers) then you'd like to replace as few bytes of them and as few instructions as possible.
>
> if you really need to replace multiple user instructions with long magic sequences simply because the ISA does not support multiple BRKs, probably you selected the wrong ISA.
>
> > Ideally you'd want a 16 bit debugger trap instruction with a few bits of literal, so you could replace a single RVC instruction. Or if the instruction is a bigger one, replace it with a 16 bit debugger trap instruction and one or more 16 bit NOPs.
>
> that's correct.
>
> > PLUS .. where do you find a register to put the constant for the debugger? ... Hence the suggestion to use odd-valued magic numbers in mscratch for this communication. But that also needs several instructions .. and looks like m mode only.
>
> yes. also ugly and limited.

A fixed-size immediate is limiting. What we’ve done is to introduce a level of indirection to avoid this limitation, which at the same time avoids changing the ISA for a pretty dumb reason.

> for the existing embedded RISC-V devices, which are M-mode only, that would not be a problem, but future devices might also take the ARM path, which is promoting dual mode embedded devices, with lots of security improvements (see large Cortex-M devices, especially M8), and in this case the mscratch workaround cannot be used. :-(

There will always be some register available to indicate the purpose of the request. The combination of the opcode, active ABI, and originating privilege mode will—by design—suffice to indicate where to look.

> regards,
>
> Liviu

Liviu Ionescu

Nov 30, 2017, 12:39:11 PM11/30/17
to Andrew Waterman, Alex Marshall, Bruce Hoult, Drew Barbier, Jacob Bachmeyer, Krste Asanovic, Krste Asanovic, Megan Wachs, RISC-V ISA Dev, Yunsup Lee


> On 30 Nov 2017, at 19:21, Andrew Waterman <and...@sifive.com> wrote:
>
> ... mscratch, M-mode only ... There will always be some register available to indicate the purpose of the request. The combination of the opcode, active ABI, and originating privilege mode will—by design—suffice to indicate where to look.

with the mscratch ruled out for being M-mode only, what other options do we have?

is there any other register, available in all modes, that is guaranteed to **not** have a certain range of values, so we can use it to temporarily store the magic?

otherwise the single EBREAK alone is not enough, since the debugger can place it as a breakpoint at any location.


regards,

Liviu

Andrew Waterman

Nov 30, 2017, 12:44:37 PM11/30/17
to Liviu Ionescu, Alex Marshall, Bruce Hoult, Drew Barbier, Jacob Bachmeyer, Krste Asanovic, Krste Asanovic, Megan Wachs, RISC-V ISA Dev, Yunsup Lee
I was talking about ECALL more than EBREAK.

U-mode doesn’t (shouldn’t) need to know about semihosting. There are
existing ABIs for U-mode to request services of greater privilege modes.

Liviu Ionescu

Nov 30, 2017, 1:35:47 PM11/30/17
to Andrew Waterman, Alex Marshall, Bruce Hoult, Drew Barbier, Jacob Bachmeyer, Krste Asanovic, Krste Asanovic, Megan Wachs, RISC-V ISA Dev, Yunsup Lee


> On 30 Nov 2017, at 19:44, Andrew Waterman <and...@sifive.com> wrote:
>
>
> U-mode doesn’t (shouldn’t) need to know about semihosting. There are
> existing ABIs for U-mode to request services of greater privilege modes.

Not necessarily.

I don't know the details of the U-mode in the context of embedded RISC-V devices, but in the Cortex-M world the so-called 'user mode' has nothing to do with a kernel, ABIs, or things like this, it only restricts access to some registers, and as such, breaking to the debugger shouldn't be a problem.

Actually, when asked about semihosting, most ARM users will reply that it is a method to display printf() messages. Yes, it is, but not only; semihosting has two separate calls to output bytes/strings on a 'debug channel', which indeed can be used for trace::printf() messages during regular debug sessions; in addition, it also has a larger set of calls for fully semihosted applications.

I see no reason why semihosting calls should not be available in both machine and user mode; how else could I write unit tests that run in user mode?

Please note that during semihosting calls, the application is simply halted, regardless of its mode, since the core executes a regular BRK.


Regards,

Liviu





Michael Chapman

Nov 30, 2017, 1:43:30 PM11/30/17
to isa...@groups.riscv.org

Cores which support compressed mode already have two EBREAK
instructions. The debugger will use the compressed one for SW
breakpoints. The other one can be used for semihosting and any other
interaction required with the debugger.

For cores without compressed mode, OpenOCD will know whether it placed an
EBREAK there or not. If it placed it there, then it is a SW breakpoint.
Otherwise it is a semihosting call to the debugger. Also, to make this
distinction, the EBREAK could be followed by a word holding an illegal
instruction pattern which is extremely unlikely to occur in any
application, i.e. one that would never be the next instruction after a
SW breakpoint.
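
The two cases above can be sketched as a small decision function. This is only an illustration in Python: the `classify_trap` helper, the demo memory and the breakpoint set are hypothetical, and the only facts relied on are the standard encodings of EBREAK (0x00100073) and C.EBREAK (0x9002).

```python
EBREAK = 0x00100073    # 32-bit EBREAK encoding
C_EBREAK = 0x9002      # 16-bit C.EBREAK encoding


def classify_trap(pc, read_halfword, debugger_breakpoints, has_compressed):
    """Classify the instruction at `pc` following the scheme sketched above."""
    low = read_halfword(pc)
    if has_compressed and low == C_EBREAK:
        return "breakpoint"             # debugger plants C.EBREAK only
    word = low | (read_halfword(pc + 2) << 16)   # little-endian halves
    if word != EBREAK:
        return "not-ebreak"
    if has_compressed:
        return "semihosting"            # full-size EBREAK is left to software
    # No compressed extension: rely on the debugger's own bookkeeping.
    return "breakpoint" if pc in debugger_breakpoints else "semihosting"


# Hypothetical target memory, indexed by halfword address.
DEMO_MEMORY = {0x100: 0x9002, 0x200: 0x0073, 0x202: 0x0010}


def demo_read(addr):
    return DEMO_MEMORY.get(addr, 0)


if __name__ == "__main__":
    print(classify_trap(0x100, demo_read, set(), True))
    print(classify_trap(0x200, demo_read, set(), True))
    print(classify_trap(0x200, demo_read, {0x200}, False))
```

The sketch assumes that the debugger never plants a full-size EBREAK on a core with the compressed extension, which is a convention of this scheme rather than something the ISA enforces.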


On 30-Nov-17 18:39, Liviu Ionescu wrote:
> ...

Tommy Thorn

Nov 30, 2017, 2:47:27 PM11/30/17
to Liviu Ionescu, Andrew Waterman, Drew Barbier, Krste Asanovic, Megan Wachs, RISC-V ISA Dev, Yunsup Lee
Just catching up to an old discussion, but it seems to me that there's another
option that is simpler: put the magic number in the instruction word following
the EBREAK; optionally we should use an ADDI x0, x0, MAGIC if we want
to keep it sane.

The code/debugger that handles the EBREAK obviously has the address of
the EBREAK instruction and can inspect the code stream.
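
A minimal sketch of what inspecting the code stream could look like, in Python. The helper names and the example magic value 7 are hypothetical; the encodings follow the standard I-type layout (a 12-bit immediate in bits 31:20, opcode 0x13 for ADDI).

```python
EBREAK = 0x00100073
OPCODE_OP_IMM = 0x13   # ADDI: opcode OP-IMM with funct3 = 0


def encode_addi_x0_magic(magic):
    """Encode `addi x0, x0, magic` (12-bit signed immediate)."""
    if not -2048 <= magic <= 2047:
        raise ValueError("immediate does not fit in 12 bits")
    return ((magic & 0xFFF) << 20) | OPCODE_OP_IMM


def decode_magic(word):
    """Return the immediate if `word` is `addi x0, x0, imm`, else None."""
    if word & 0x000FFFFF != OPCODE_OP_IMM:   # rs1, funct3 and rd must be 0
        return None
    imm = word >> 20
    return imm - 4096 if imm & 0x800 else imm   # sign-extend 12 bits


def service_after_ebreak(read_word, pc):
    """Magic number following an EBREAK at `pc`, or None if there is none."""
    if read_word(pc) != EBREAK:
        return None
    return decode_magic(read_word(pc + 4))


if __name__ == "__main__":
    code = {0x0: EBREAK, 0x4: encode_addi_x0_magic(7)}
    print(service_after_ebreak(lambda a: code.get(a, 0), 0x0))
```

Since the destination is x0, the marker instruction has no architectural effect if it is ever executed.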

Tommy


On Wed, Nov 29, 2017 at 12:53 AM, Liviu Ionescu <i...@livius.net> wrote:


> > On 29 Nov 2017, at 02:47, Andrew Waterman <and...@sifive.com> wrote:
> >
> > I don’t support adding additional EBREAKs, as placing odd-valued magic numbers in mscratch seems sufficient. There is an additional DBI (debug binary interface) constraint on mscratch, that it must be initialized on boot and in normal operation only hold aligned pointers or 0. But that is vastly preferable to an ISA modification.
>
> yes, this constraint of holding aligned pointers and the set of allowed magic numbers must be clearly documented, with the additional consequence that mscratch must be cleared at reset.
>
> I foresee a small problem with using the mscratch for magic numbers, the need to place the code in an interrupt critical section, i.e. disable interrupts, swap mscratch, ebreak, restore mscratch, enable interrupts. probably no longer reasonable for an inline function.
>
> otherwise we risk entering an interrupt while mscratch does not hold an aligned pointer.
>
> things get pretty messy, only to compensate in software what could have been an EBREAK with at least a 2-4 bits immediate value field. :-(
>
> regards,
>
> Liviu

Liviu Ionescu

Nov 30, 2017, 2:54:16 PM11/30/17
to Tommy Thorn, Andrew Waterman, Drew Barbier, Krste Asanovic, Megan Wachs, RISC-V ISA Dev, Yunsup Lee


> On 30 Nov 2017, at 21:47, Tommy Thorn <tommy...@esperantotech.com> wrote:
>
> ... put the magic number in the instruction word following
> the EBREAK, optionally we should use an ADDI x0, x0, MAGIC if we want
> to keep it sane.

if `ADDI x0, x0, MAGIC` is a legal instruction, it can appear in a program, and if the debugger places a breakpoint on the instruction just before it, it'll look like a semihosting call.

we need a solution that, in a legal program, **guarantees** that it only matches semihosting calls, and never matches breakpoints placed by the debugger.


regards,

Liviu




Michael Chapman

Nov 30, 2017, 3:17:14 PM11/30/17
to isa...@groups.riscv.org

Breakpoints are placed by the debugger/OpenOCD. They both know where
the breakpoints are. Therefore an EBREAK at some other location is a semihosting call.

Michael Chapman

Nov 30, 2017, 3:31:30 PM11/30/17
to Liviu Ionescu, isa...@groups.riscv.org

You should use a semihosting call (to print out the PC and a message)
for that!


On 30-Nov-17 21:28, Liviu Ionescu wrote:
>
>> On 30 Nov 2017, at 22:20, Michael Chapman <michael.c...@gmail.com> wrote:
>>
>>
>> Breakpoints are placed by the debugger/OpenOCD. They both know where
>> the breakpoints are. Therefore an EBREAK at some other location is a semihosting call.
> Not necessarily; for Debug build configurations I manually add BREAK instructions instead of (actually before) infinite loops (like in asserts, unused trap handlers, or should-not-reach-this places); it is much more convenient to have the debugger halt and immediately show you the location than to enter an infinite loop, wait for a while, start wondering what happened and finally manually halt execution.
>
> Regards,
>
> Liviu

Jacob Bachmeyer

Nov 30, 2017, 11:45:29 PM11/30/17
to Liviu Ionescu, Bruce Hoult, Alex Marshall, Andrew Waterman, Drew Barbier, Krste Asanovic, Megan Wachs, RISC-V ISA Dev, Yunsup Lee
Liviu Ionescu wrote:
>> On 30 Nov 2017, at 05:32, Jacob Bachmeyer <jcb6...@gmail.com> wrote:
>>
>>
>>> sure, if there are any free bits in the current encoding, it would be great to reserve some for an extended EBREAK that has an immediate value (ARM Cortex-M uses 8 bits, but I guess 2-3 bits would probably be enough).
>>>
>>>
>> We can easily get 10 bits using the rs1 and rd fields, or XLEN bits by using rs1 to refer to an actual register holding a function code. The immediate field in SYSTEM/PRIV is listed as "funct12", so it seems to be intended as a 12-bit function code.
>>
>
> yes, for the future, if Andrew agrees, I welcome such a solution (the mscratch magic is not a solution, it is a workaround), but for now, as long as the specs do not mention it, I doubt it is a safe choice.
>

To clarify, the 12-bit "funct12" code is part of the opcode -- one of
those functions is EBREAK. That field is *not* available for EBREAK
<imm>. Using the rs1 field in EBREAK to provide "EBREAK <reg>" and a
semihost call number in that register is probably a better option.
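
To make the field layout concrete, here is a small Python sketch that splits a SYSTEM-format instruction word into its fields. The `system_fields` helper is hypothetical. The last example word, with rs1 set to x10 (a0), is NOT a defined instruction; it only shows what the "EBREAK <reg>" idea above would look like if it were adopted.

```python
ECALL = 0x00000073
EBREAK = 0x00100073


def system_fields(word):
    """Split a 32-bit SYSTEM-format instruction word into its fields."""
    return {
        "opcode": word & 0x7F,
        "rd": (word >> 7) & 0x1F,
        "funct3": (word >> 12) & 0x7,
        "rs1": (word >> 15) & 0x1F,
        "funct12": (word >> 20) & 0xFFF,
    }


# Hypothetical "EBREAK a0": the EBREAK word with rs1 = x10.
# This encoding is an assumption for illustration, not part of the ISA.
HYPOTHETICAL_EBREAK_A0 = EBREAK | (10 << 15)

if __name__ == "__main__":
    print(system_fields(ECALL))
    print(system_fields(EBREAK))
    print(hex(HYPOTHETICAL_EBREAK_A0), system_fields(HYPOTHETICAL_EBREAK_A0))
```

ECALL and EBREAK differ only in funct12 (0 versus 1), with rs1 and rd both zero, which is why those two 5-bit fields are the candidates for carrying extra information.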

>> Using rs1 to select a register holding a parameter is a smaller change to the ISA than giving magic semantics to mscratch. Further, EBREAK is available in all modes, while mscratch is accessible only in M-mode.
>>
>
> maybe I'm short sighted, but why would I use semihosting in any other mode but M?
>

Exactly the reason I do not like the mscratch workaround -- EBREAK is
available in all modes, but you are suggesting a use specific to
M-mode. Interestingly, ECALL is also available in all modes and is
expressly intended for performing calls to an environment. Why not use
ECALL for your semihosting platform? (If nothing else, your M-mode trap
handler can recognize an M-mode ECALL and execute an EBREAK at a
specific, known address. The semihosting interface recognizes an EBREAK
at *that* address as a semihost call, handles it, and resumes execution,
while passing other EBREAKs to the user's debugger. This is really a
workaround for insufficient debugging support -- the debugger should be
able to intercept an M-mode ECALL directly.)



-- Jacob

Jacob Bachmeyer

Nov 30, 2017, 11:54:03 PM11/30/17