ARM BKPT <imm8> equivalent on RISC-V


Liviu Ionescu

Nov 28, 2017, 7:33:27 PM
to RISC-V ISA Dev, Krste Asanovic, Andrew Waterman, Yunsup Lee, Megan Wachs, Drew Barbier
I'm starting a new thread, since the related discussion in the debug list apparently was not very well understood.

According to the current ISA specs, there is a single instruction, EBREAK, that breaks execution to a debugger.


Other architectures have multiple such instructions; for example, the ARM BKPT has an 8-bit immediate value. Value #0 is the default breakpoint instruction, and all other values are available for specific needs such as semihosting, a technique supported by many debuggers (J-Link, OpenOCD, QEMU, etc.) that forwards some POSIX calls to the debugger, which helps when writing unit tests.


The question is: in the absence of multiple EBREAKs, how do we define a sequence of instructions that includes the single RISC-V EBREAK but also safely passes an additional value to the debugger?


Megan suggested reserving some values of the `mscratch` register and having the debugger test for them; assuming this CSR is used to store a pointer, small values that are not multiples of 4 (1, 2, 3, 5, 6, 7, ...) might be used for this purpose.

Currently only one additional EBREAK is needed, for semihosting, so, if I have to choose, I would take 7.
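
A minimal sketch of the value test a debugger could apply under this proposal. The function names and the upper bound of 8 are assumptions made for illustration; the thread only fixes the idea that `mscratch` normally holds 0 or an aligned pointer, and proposes 7 for semihosting.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical marker: 7 is the value proposed above for semihosting. */
#define MSCRATCH_MAGIC_SEMIHOSTING 7u

/* Under the proposed convention mscratch holds 0 or a pointer aligned to
 * at least 4 bytes, so small values that are not multiples of 4 cannot be
 * mistaken for a pointer. The bound of 8 is an assumption of this sketch. */
static bool mscratch_is_magic(uintptr_t mscratch)
{
    return mscratch < 8u && (mscratch % 4u) != 0u;
}

/* True when the value is the marker reserved for semihosting. */
static bool mscratch_requests_semihosting(uintptr_t mscratch)
{
    return mscratch == MSCRATCH_MAGIC_SEMIHOSTING;
}
```

With this test, 0, 4, and any real aligned pointer are treated as ordinary contents, while 1, 2, 3, 5, 6 and 7 are available as markers.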


Any other suggestions?


Thank you,

Liviu



Andrew Waterman

Nov 28, 2017, 7:48:09 PM
to Liviu Ionescu, Drew Barbier, Krste Asanovic, Megan Wachs, RISC-V ISA Dev, Yunsup Lee
I don’t support adding additional EBREAKs, as placing odd-valued magic numbers in mscratch seems sufficient. There is an additional DBI (debug binary interface) constraint on mscratch, that it must be initialized on boot and in normal operation only hold aligned pointers or 0. But that is vastly preferable to an ISA modification.

Bruce Hoult

Nov 28, 2017, 8:43:07 PM
to Liviu Ionescu, RISC-V ISA Dev, Krste Asanovic, Andrew Waterman, Yunsup Lee, Megan Wachs, Drew Barbier
What is the reason for not using the same ECALL interface for semihosting as for running under an actual *nix OS?




--
You received this message because you are subscribed to the Google Groups "RISC-V ISA Dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to isa-dev+unsubscribe@groups.riscv.org.
To post to this group, send email to isa...@groups.riscv.org.
Visit this group at https://groups.google.com/a/groups.riscv.org/group/isa-dev/.
To view this discussion on the web visit https://groups.google.com/a/groups.riscv.org/d/msgid/isa-dev/5CEC97CF-A405-46F0-A873-20A646265C14%40livius.net.

Michael Clark

Nov 28, 2017, 11:41:31 PM
to Bruce Hoult, Liviu Ionescu, RISC-V ISA Dev, Krste Asanovic, Andrew Waterman, Yunsup Lee, Megan Wachs, Drew Barbier


> On 29/11/2017, at 2:43 PM, Bruce Hoult <br...@hoult.org> wrote:
>
> What is the reason for not using the same ECALL interface for semihosting as for running under an actual *nix OS?

I don’t know. That was my suggestion. Get the debugger to hook mtvec and have the handler route via the debug transport or otherwise handle locally.

It’s an elegant solution because it allows one to maintain binary compatibility.

AFAICT the intent is redirecting POSIX calls so it’s not exactly a case of bare-metal e.g. like bbl, linux or any other low level code that doesn’t actually make POSIX calls.


Liviu Ionescu

Nov 29, 2017, 3:53:31 AM
to Andrew Waterman, Drew Barbier, Krste Asanovic, Megan Wachs, RISC-V ISA Dev, Yunsup Lee


> On 29 Nov 2017, at 02:47, Andrew Waterman <and...@sifive.com> wrote:
>
> I don’t support adding additional EBREAKs, as placing odd-valued magic numbers in mscratch seems sufficient. There is an additional DBI (debug binary interface) constraint on mscratch, that it must be initialized on boot and in normal operation only hold aligned pointers or 0. But that is vastly preferable to an ISA modification.

yes, this constraint of holding aligned pointers and the set of allowed magic numbers must be clearly documented, with the additional consequence that mscratch must be cleared at reset.

I foresee a small problem with using mscratch for magic numbers: the code must be placed in an interrupt critical section, i.e. disable interrupts, swap mscratch, ebreak, restore mscratch, enable interrupts. this is probably no longer reasonable for an inline function.

otherwise we risk entering an interrupt while mscratch does not hold an aligned pointer.

things get pretty messy, only to compensate in software what could have been an EBREAK with at least a 2-4 bits immediate value field. :-(
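
The critical section listed above can be sketched as a host-side model. This is not target code: the struct and function names are hypothetical, the two CSRs are modelled as plain fields, and each step carries a comment naming the RISC-V instruction it would correspond to in M-mode. The value 7 is the marker proposed earlier in the thread.

```c
#include <stdbool.h>
#include <stdint.h>

#define MSTATUS_MIE       0x8u  /* machine interrupt enable, mstatus bit 3 */
#define SEMIHOSTING_MAGIC 7u    /* hypothetical marker proposed in this thread */

/* Minimal model of the two CSRs involved; not a real hart. */
struct hart_model {
    uintptr_t mstatus;
    uintptr_t mscratch;
    bool debugger_saw_magic;   /* set by the modelled debugger */
    bool irq_masked_at_break;  /* was MIE clear when EBREAK executed? */
};

/* Stand-in for EBREAK: the modelled debugger inspects the CSRs. */
static void model_ebreak(struct hart_model *h)
{
    h->debugger_saw_magic = (h->mscratch == SEMIHOSTING_MAGIC);
    h->irq_masked_at_break = (h->mstatus & MSTATUS_MIE) == 0;
}

static void call_host_model(struct hart_model *h)
{
    uintptr_t old_mie = h->mstatus & MSTATUS_MIE;  /* csrrci t0, mstatus, 8   */
    h->mstatus &= ~(uintptr_t)MSTATUS_MIE;         /* (same instruction)      */
    uintptr_t old_mscratch = h->mscratch;          /* csrrw  t1, mscratch, t2 */
    h->mscratch = SEMIHOSTING_MAGIC;               /* (same instruction)      */
    model_ebreak(h);                               /* ebreak                  */
    h->mscratch = old_mscratch;                    /* csrw   mscratch, t1     */
    h->mstatus |= old_mie;                         /* csrs   mstatus, t0      */
}
```

The model makes the cost visible: four CSR accesses surround a single EBREAK, and interrupts are re-enabled only if they were enabled on entry.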



regards,

Liviu

Andrew Waterman

Nov 29, 2017, 4:13:13 AM
to Liviu Ionescu, Drew Barbier, Krste Asanovic, Megan Wachs, RISC-V ISA Dev, Yunsup Lee
On Wed, Nov 29, 2017 at 12:53 AM, Liviu Ionescu <i...@livius.net> wrote:
>
>
>> On 29 Nov 2017, at 02:47, Andrew Waterman <and...@sifive.com> wrote:
>>
>> I don’t support adding additional EBREAKs, as placing odd-valued magic numbers in mscratch seems sufficient. There is an additional DBI (debug binary interface) constraint on mscratch, that it must be initialized on boot and in normal operation only hold aligned pointers or 0. But that is vastly preferable to an ISA modification.
>
> yes, this constraint of holding aligned pointers and the set of allowed magic numbers must be clearly documented, with the additional consequence that mscratch must be cleared at reset.
>
> I foresee a small problem with using the mscratch for magic numbers, the need to place the code in an interrupt critical section, i.e. disable interrupts, swap mscratch, ebreak, restore mscratch, enable interrupts. probably no longer reasonable for an inline function.
>
> otherwise we risk entering an interrupt while mscratch does not hold an aligned pointer.

Right - this needs to be done with interrupts disabled.

>
> things get pretty messy, only to compensate in software what could have been an EBREAK with at least a 2-4 bits immediate value field. :-(

I can see your point, but changing the ISA is all the more messy.

>
>
>
> regards,
>
> Liviu
>

Liviu Ionescu

Nov 29, 2017, 4:27:21 AM
to Andrew Waterman, Drew Barbier, Krste Asanovic, Megan Wachs, RISC-V ISA Dev, Yunsup Lee


> On 29 Nov 2017, at 11:12, Andrew Waterman <and...@sifive.com> wrote:
>
> On Wed, Nov 29, 2017 at 12:53 AM, Liviu Ionescu <i...@livius.net> wrote:
>>
>>
>>> On 29 Nov 2017, at 02:47, Andrew Waterman <and...@sifive.com> wrote:
>>>
>>> I don’t support adding additional EBREAKs, as placing odd-valued magic numbers in mscratch seems sufficient. There is an additional DBI (debug binary interface) constraint on mscratch, that it must be initialized on boot and in normal operation only hold aligned pointers or 0. But that is vastly preferable to an ISA modification.
>>
>> yes, this constraint of holding aligned pointers and the set of allowed magic numbers must be clearly documented, with the additional consequence that mscratch must be cleared at reset.
>>
>> I foresee a small problem with using the mscratch for magic numbers, the need to place the code in an interrupt critical section, i.e. disable interrupts, swap mscratch, ebreak, restore mscratch, enable interrupts. probably no longer reasonable for an inline function.
>>
>> otherwise we risk entering an interrupt while mscratch does not hold an aligned pointer.
>
> Right - this needs to be done with interrupts disabled.

Ok.

Megan, can you suggest a new implementation for the `call_host()` function?

>
>>
>> things get pretty messy, only to compensate in software what could have been an EBREAK with at least a 2-4 bits immediate value field. :-(
>
> I can see your point, but changing the ISA is all the more messy.

I can see your point too; for this specific case we can probably live with an elaborate workaround. but, more generally, does it mean that if, in time, more issues with the current (tight) ISA are discovered, there will never be new versions to fix them, and the software workarounds required to compensate will become more and more elaborate?


Regards,

Liviu


Alex Marshall

Nov 29, 2017, 6:56:18 PM
to Liviu Ionescu, Andrew Waterman, Drew Barbier, Krste Asanovic, Megan Wachs, RISC-V ISA Dev, Yunsup Lee
> things get pretty messy, only to compensate in software what could have
> been an EBREAK with at least a 2-4 bits immediate value field. :-(
>
I believe there is actually a rather large number of unused bits in the EBREAK instruction encoding that could have been used to include an immediate field. Theoretically it should be possible to slightly hack up your assembler to allow you to fill those bits, if your RISC-V implementation supports ignoring them (that is, doesn't trap on filled 'reserved' bits).

Why this wasn't the original way EBREAK was designed, I can't say. I also may be incorrect about the reserved bits existing, maybe I've read the encodings incorrectly.

Thanks,
Alex



Jacob Bachmeyer

Nov 29, 2017, 7:15:44 PM
to Liviu Ionescu, Andrew Waterman, Drew Barbier, Krste Asanovic, Megan Wachs, RISC-V ISA Dev, Yunsup Lee
Liviu Ionescu wrote:
>> On 29 Nov 2017, at 11:12, Andrew Waterman <and...@sifive.com> wrote:
>>
>> On Wed, Nov 29, 2017 at 12:53 AM, Liviu Ionescu <i...@livius.net> wrote:
>>
>>>> On 29 Nov 2017, at 02:47, Andrew Waterman <and...@sifive.com> wrote:
>>>>
>>>> I don’t support adding additional EBREAKs, as placing odd-valued magic numbers in mscratch seems sufficient. There is an additional DBI (debug binary interface) constraint on mscratch, that it must be initialized on boot and in normal operation only hold aligned pointers or 0. But that is vastly preferable to an ISA modification.
>>>>
>>> yes, this constraint of holding aligned pointers and the set of allowed magic numbers must be clearly documented, with the additional consequence that mscratch must be cleared at reset.
>>>
>>> I foresee a small problem with using the mscratch for magic numbers, the need to place the code in an interrupt critical section, i.e. disable interrupts, swap mscratch, ebreak, restore mscratch, enable interrupts. probably no longer reasonable for an inline function.
>>>
>>> otherwise we risk entering an interrupt while mscratch does not hold an aligned pointer.
>>>
>> Right - this needs to be done with interrupts disabled.
>>
>
> Ok.
>
> Megan, can you suggest a new implementation for the `call_host()` function?
>

Why not just use ECALL for 'call_host()'? In a semi-hosted system, the
"magic" link to the host is the environment, after all.


-- Jacob

Bruce Hoult

Nov 29, 2017, 7:15:45 PM
to Alex Marshall, Liviu Ionescu, Andrew Waterman, Drew Barbier, Krste Asanovic, Megan Wachs, RISC-V ISA Dev, Yunsup Lee
ECALL and EBREAK are SYSTEM opcode (bits 0..6), with 000 in funct3. That's enough to uniquely identify them (at present) as nothing else uses that combination.

Then rs1 and rdst are specified as both 00000 and imm12 as 000000000000 for ECALL and 000000000001 for EBREAK.

Plenty of room to do something with a few more bits of imm12 while leaving rs1 and rdst nonzero for future expansion.
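
The field layout described above can be checked with a small decoder. This is a sketch only; the helper names are made up for illustration, while the field positions and the two encodings (0x00000073 for ECALL, 0x00100073 for EBREAK) follow the base ISA.

```c
#include <stdbool.h>
#include <stdint.h>

#define OPCODE_SYSTEM 0x73u

/* Extract `width` bits starting at bit `lo`. */
static uint32_t field(uint32_t insn, unsigned lo, unsigned width)
{
    return (insn >> lo) & ((1u << width) - 1u);
}

/* True for a SYSTEM instruction with funct3 = 000 and rd = rs1 = x0. */
static bool is_priv_system(uint32_t insn)
{
    return field(insn, 0, 7) == OPCODE_SYSTEM
        && field(insn, 7, 5) == 0      /* rd     */
        && field(insn, 12, 3) == 0     /* funct3 */
        && field(insn, 15, 5) == 0;    /* rs1    */
}

/* imm12 (bits 31:20) selects between the two instructions. */
static bool is_ecall(uint32_t insn)
{
    return is_priv_system(insn) && field(insn, 20, 12) == 0;
}

static bool is_ebreak(uint32_t insn)
{
    return is_priv_system(insn) && field(insn, 20, 12) == 1;
}
```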


Liviu Ionescu

Nov 29, 2017, 7:31:09 PM
to jcb6...@gmail.com, Andrew Waterman, Drew Barbier, Krste Asanovic, Megan Wachs, RISC-V ISA Dev, Yunsup Lee


> On 30 Nov 2017, at 02:15, Jacob Bachmeyer <jcb6...@gmail.com> wrote:
>
>> Megan, can you suggest a new implementation for the `call_host()` function?
>>
>
> Why not just use ECALL for 'call_host()'? In a semi-hosted system, the "magic" link to the host is the environment, after all.

Jacob,

please take a HiFive1 board, write two small programs one with an EBREAK and one with an ECALL, and run them in a debugger.

when you hit EBREAK, the debugger will halt and you'll see that the PC is exactly at the EBREAK instruction. a semihosted debugger will examine the EBREAK and if some specific condition is met (like mscratch is 0x7), will process the request and continue execution after the EBREAK.

when you hit ECALL, the FE310-G000 device will most probably crash, or at best will remain in a loop in the trap handler, if you cared to install one. the debugger has absolutely no idea about this, you have to manually halt execution and examine the PC.
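
The debugger-side handling described for EBREAK can be sketched as follows. This is a model under stated assumptions, not the behaviour of any particular debugger: the type and function names are hypothetical, the marker 7 is the value proposed in this thread, and the caller is assumed to have already read the PC, the instruction at the PC, and mscratch from the halted hart.

```c
#include <stdint.h>

#define INSN_EBREAK       0x00100073u  /* 32-bit EBREAK                   */
#define INSN_C_EBREAK     0x9002u      /* 16-bit C.EBREAK (C extension)   */
#define SEMIHOSTING_MAGIC 7u           /* hypothetical marker in mscratch */

enum halt_action { HALT_FOR_USER, SEMIHOST_AND_RESUME };

struct halt_decision {
    enum halt_action action;
    uintptr_t resume_pc;   /* where execution continues if resumed */
};

/* Decide what to do when the hart halts with `insn_at_pc` at `pc`. */
static struct halt_decision on_halt(uintptr_t pc, uint32_t insn_at_pc,
                                    uintptr_t mscratch)
{
    struct halt_decision d = { HALT_FOR_USER, pc };
    unsigned len = 0;

    if (insn_at_pc == INSN_EBREAK)
        len = 4;
    else if ((insn_at_pc & 0xFFFFu) == INSN_C_EBREAK)
        len = 2;

    /* Only an EBREAK accompanied by the marker is a semihosting request;
     * any other halt is left for the user to inspect. */
    if (len != 0 && mscratch == SEMIHOSTING_MAGIC) {
        d.action = SEMIHOST_AND_RESUME;
        d.resume_pc = pc + len;   /* continue after the EBREAK */
    }
    return d;
}
```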


regards,

Liviu


Liviu Ionescu

Nov 29, 2017, 7:37:57 PM
to Bruce Hoult, Alex Marshall, Andrew Waterman, Drew Barbier, Krste Asanovic, Megan Wachs, RISC-V ISA Dev, Yunsup Lee


> On 30 Nov 2017, at 02:15, Bruce Hoult <br...@hoult.org> wrote:
>
> ECALL and EBREAK are SYSTEM opcode (bits 0..6), with 000 in funct3. That's enough to uniquely identify them (at present) as nothing else uses that combination.
>
> Then rs1 and rdst are specified as both 00000 and imm12 as 000000000000 for ECALL and 000000000001 for EBREAK.
>
> Plenty of room to do something with a few more bits of imm12 while leaving rs1 and rdst nonzero for future expansion.

sure, if there are any free bits in the current encoding, it would be great to reserve some for an extended EBREAK that has an immediate value (ARM Cortex-M uses 8 bits, but I guess 2-3 bits would probably be enough).


regards,

Liviu

Jacob Bachmeyer

Nov 29, 2017, 10:25:43 PM
to Liviu Ionescu, Andrew Waterman, Drew Barbier, Krste Asanovic, Megan Wachs, RISC-V ISA Dev, Yunsup Lee
There is no way for the debugger to intercept ECALL? That sounds like
an oversight in the HiFive chip, not an architectural shortcoming.


-- Jacob

Jacob Bachmeyer

Nov 29, 2017, 10:32:20 PM
to Liviu Ionescu, Bruce Hoult, Alex Marshall, Andrew Waterman, Drew Barbier, Krste Asanovic, Megan Wachs, RISC-V ISA Dev, Yunsup Lee
We can easily get 10 bits using the rs1 and rd fields, or XLEN bits by
using rs1 to refer to an actual register holding a function code. The
immediate field in SYSTEM/PRIV is listed as "funct12", so it seems to be
intended as a 12-bit function code.

Using rs1 to select a register holding a parameter is a smaller change
to the ISA than giving magic semantics to mscratch. Further, EBREAK is
available in all modes, while mscratch is accessible only in M-mode.


-- Jacob

Liviu Ionescu

Nov 30, 2017, 2:18:44 AM
to Jacob Bachmeyer, Bruce Hoult, Alex Marshall, Andrew Waterman, Drew Barbier, Krste Asanovic, Megan Wachs, RISC-V ISA Dev, Yunsup Lee


> On 30 Nov 2017, at 05:32, Jacob Bachmeyer <jcb6...@gmail.com> wrote:
>
>> sure, if there are any free bits in the current encoding, it would be great to reserve some for an extended EBREAK that has an immediate value (ARM Cortex-M uses 8 bits, but I guess 2-3 bits would probably be enough).
>>
>
> We can easily get 10 bits using the rs1 and rd fields, or XLEN bits by using rs1 to refer to an actual register holding a function code. The immediate field in SYSTEM/PRIV is listed as "funct12", so it seems to be intended as a 12-bit function code.

yes, for the future, if Andrew agrees, I welcome such a solution (the mscratch magic is not a solution, it is a workaround), but for now, as long as the specs do not mention it, I doubt it is a safe choice.

> Using rs1 to select a register holding a parameter is a smaller change to the ISA than giving magic semantics to mscratch. Further, EBREAK is available in all modes, while mscratch is accessible only in M-mode.

maybe I'm short sighted, but why would I use semihosting in any other mode but M?


Liviu

Andrew Waterman

Nov 30, 2017, 2:45:26 AM
to Liviu Ionescu, Alex Marshall, Bruce Hoult, Drew Barbier, Jacob Bachmeyer, Krste Asanovic, Megan Wachs, RISC-V ISA Dev, Yunsup Lee
On Wed, Nov 29, 2017 at 11:18 PM Liviu Ionescu <i...@livius.net> wrote:


> On 30 Nov 2017, at 05:32, Jacob Bachmeyer <jcb6...@gmail.com> wrote:
>
>> sure, if there are any free bits in the current encoding, it would be great to reserve some for an extended EBREAK that has an immediate value (ARM Cortex-M uses 8 bits, but I guess 2-3 bits would probably be enough).
>>
>
> We can easily get 10 bits using the rs1 and rd fields, or XLEN bits by using rs1 to refer to an actual register holding a function code.  The immediate field in SYSTEM/PRIV is listed as "funct12", so it seems to be intended as a 12-bit function code.

> yes, for the future, if Andrew agrees, I welcome such a solution (the mscratch magic is not a solution, it is a workaround), but for now, as long as the specs do not mention it, I doubt it is a safe choice.

It’s not my decision; it’s just against my advisement.

Liviu Ionescu

Nov 30, 2017, 3:31:13 AM
to Andrew Waterman, Alex Marshall, Bruce Hoult, Drew Barbier, Jacob Bachmeyer, Krste Asanovic, Megan Wachs, RISC-V ISA Dev, Yunsup Lee


> On 30 Nov 2017, at 09:45, Andrew Waterman <and...@sifive.com> wrote:
>
>
> yes, for the future, if Andrew agrees, I welcome such a solution (the mscratch magic is not a solution, it is a workaround), but for now, as long as the specs do not mention it, I doubt it is a safe choice.
>
> It’s not my decision; it’s just against my advisement.

as Jacob noticed,

> mscratch is accessible only in M-mode

so, for other use cases (more elaborate than my M-mode semihosting), the mscratch workaround might not be available.

perhaps a short study on how other architectures use multiple BRK instructions would be useful. Cortex-M reserves an 8-bit value, but personally I used only the semihosting one (but I used it heavily). x86 also has an 8-bit value for INT, but they use it for slightly different purposes.

if anyone else has more experience with BRK use on other architectures, please share.


regards,

Liviu

Cesar Eduardo Barros

Nov 30, 2017, 6:05:17 AM
to jcb6...@gmail.com, Liviu Ionescu, Bruce Hoult, Alex Marshall, Andrew Waterman, Drew Barbier, Krste Asanovic, Megan Wachs, RISC-V ISA Dev, Yunsup Lee
On 30-11-2017 01:32, Jacob Bachmeyer wrote:
> Liviu Ionescu wrote:
>>> On 30 Nov 2017, at 02:15, Bruce Hoult <br...@hoult.org> wrote:
>>>
>>> ECALL and EBREAK are SYSTEM opcode (bits 0..6), with 000 in funct3.
>>> That's enough to uniquely identify them (at present) as nothing else
>>> uses that combination.
>>>
>>> Then rs1 and rdst are specified as both 00000 and imm12 as
>>> 000000000000 for ECALL and 000000000001 for EBREAK.
>>>
>>> Plenty of room to do something with a few more bits of imm12 while
>>> leaving rs1 and rdst nonzero for future expansion.
>>
>> sure, if there are any free bits in the current encoding, it would be
>> great to reserve some for an extended EBREAK that has an immediate
>> value (ARM Cortex-M uses 8 bits, but I guess 2-3 bits would probably
>> be enough).
>
> We can easily get 10 bits using the rs1 and rd fields, or XLEN bits by
> using rs1 to refer to an actual register holding a function code.  The
> immediate field in SYSTEM/PRIV is listed as "funct12", so it seems to be
> intended as a 12-bit function code.

One of the coolest properties of the design of the base RISC-V ISA is
that the major opcode (bits 0-6) is enough to determine the instruction
format, and that it is enough to determine which registers should be
read (rs1, rs2, rs3) and written (rd), without decoding anything else.
Let's not lose that property. Immediate mode CSR instructions might do a
useless read of a register on a naive implementation, since they reuse
rs1 as an immediate, but that's not a problem; reusing rd for an
immediate, on the other hand, would require special-casing to suppress
the register write.

The SYSTEM opcode is I-type, and both ECALL and EBREAK have rd=x0; let's
keep it that way, unless the instruction is actually writing to a register.

>
> Using rs1 to select a register holding a parameter is a smaller change
> to the ISA than giving magic semantics to mscratch.  Further,  EBREAK is
> available in all modes, while mscratch is accessible only in M-mode.

Since SYSTEM is I-type, and a naive implementation might already be
reading from rs1, I agree that it's a small change to the ISA. Using rs1
as an immediate is also a small change to the ISA, given that there's
precedent with the CSR immediate instructions, and a naive
implementation can simply discard the eagerly read rs1 register.

You could even do both, using separate funct12 values for "EBREAK
reading from rs1" and "EBREAK treating rs1 as an immediate".

--
Cesar Eduardo Barros
ces...@cesarb.eti.br

Liviu Ionescu

Nov 30, 2017, 6:47:02 AM
to Cesar Eduardo Barros, Jacob Bachmeyer, Bruce Hoult, Alex Marshall, Andrew Waterman, Drew Barbier, Krste Asanovic, Megan Wachs, RISC-V ISA Dev, Yunsup Lee


> On 30 Nov 2017, at 13:04, Cesar Eduardo Barros <ces...@cesarb.eti.br> wrote:
>
> One of the coolest properties of the design of the base RISC-V ISA is that the major opcode (bits 0-6) is enough to determine the instruction format, and that it is enough to determine which registers should be read (rs1, rs2, rs3) and written (rd), without decoding anything else. Let's not lose that property.

agree.

> You could even do both, using separate funct12 values for "EBREAK reading from rs1" and "EBREAK treating rs1 as an immediate".

great!

this means we have at least two options to choose from.

from my point of view, a short immediate would be enough. however, without being an ISA guru, using register fields for immediate values doesn't look very nice.

---

if I understand the encoding right, from the imm[11:0] field, only two values are used, 000000000000 for ECALL and 000000000001 for EBREAK.

perhaps we can use 00000000xxx1 for EBREAK, similar to what Jacob suggested, and extend the syntax of EBREAK to EBREAK #N.

if so, I would reserve EBREAK #7 for semihosting.

I guess this is a minor change to the ISA, which does not break any compatibility, and would not consume too much of the encoding space. and I guess it will not cause any problems for the compressed encoding, where there are 5+1 bits for the imm field, more than enough.

and, not relying on mscratch, the extended EBREAKs are available in all modes, not only to M.
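
A sketch of what the proposed `EBREAK #N` encoding would look like. This is purely hypothetical: it is not part of any RISC-V specification, and the function names are invented for illustration. It only shows that imm[11:0] = 00000000nnn1 leaves `EBREAK #0` identical to today's EBREAK.

```c
#include <stdint.h>

/* Hypothetical: imm[11:0] = 0000 0000 nnn1, placed in bits 31:20,
 * with opcode SYSTEM (0x73) and rd, funct3, rs1 all zero. */
static uint32_t encode_ebreak_n(unsigned n)
{
    uint32_t imm12 = ((uint32_t)(n & 7u) << 1) | 1u;
    return (imm12 << 20) | 0x73u;
}

/* Returns N in 0..7, or -1 if insn is not of the proposed form. */
static int decode_ebreak_n(uint32_t insn)
{
    uint32_t imm12 = insn >> 20;

    if ((insn & 0x000FFFFFu) != 0x73u)   /* opcode, rd, funct3, rs1 */
        return -1;
    if ((imm12 & ~0xEu) != 1u)           /* bit 0 set, bits 11:4 clear */
        return -1;
    return (int)((imm12 >> 1) & 7u);
}
```

Under this sketch `EBREAK #0` encodes as 0x00100073, the existing EBREAK, and the proposed semihosting value `EBREAK #7` encodes as 0x00F00073.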


regards,

Liviu

Bruce Hoult

Nov 30, 2017, 7:46:56 AM
to Liviu Ionescu, Andrew Waterman, Alex Marshall, Drew Barbier, Jacob Bachmeyer, Krste Asanovic, Megan Wachs, RISC-V ISA Dev, Yunsup Lee
Most architectures, including x86 and Thumb1, that have a literal argument to INT/SVC/BRK/whatever seem to top out at 8 bits.

But ARM has 24 bits! AArch64 has individual instructions (SVC, HVC, SMC) to make system calls to exception levels 1/2/3, or to make calls to a debugger at exception levels 1/2/3 (DCPS1, DCPS2, DCPS3), as well as BRK (self-hosted) and HLT. All have a 16-bit immediate.

If you look at...

https://github.com/torvalds/linux/blob/master/include/uapi/asm-generic/unistd.h

.. standard Linux system call numbers currently go up to 292 plus some deprecated or nonstandard ones starting at 1024.

Almost every Linux architecture passes the syscall number in a register.

IBM s390(x) can pass system call numbers less than 256 in an immediate in the svc instruction, but if the number is 256 or greater then it falls back to using svc 0 with the syscall number in register r1.

ARM OABI uses an immediate in the SVC instruction which as noted has a 24 bit immediate in ARM32 mode but only 8 bit in Thumb1. Thumb2 doesn't add any larger SVC immediate and there is no register fallback. So OABI can't be used with modern Linux.

ARM EABI uses SVC 0 and passes the system call number in r7.

i386 uses INT 0x80 with the syscall number in eax.
x86_64 uses the syscall instruction with the syscall number in rax.


I think it's nice to be able to use a single instruction with the syscall number in an immediate, but you need to make it big enough! Which means at least 9 bits today for standard Linux syscalls, or 11 bits for deprecated ones. Maybe 12 bits would be future-proof.

Copying the s390 fallback to a register would be a good idea. In that case an 8 bit immediate might be enough as the syscalls above 255 look like they're probably rarely used.

Being able to put the syscall number in a register as well as in the immediate would also vastly simplify code to dynamically call different system calls. For example wrappers for logging or whatever. If you can't use a register then you're faced with either a huge switch or else runtime code generation. Ugly. 
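
The immediate-with-register-fallback scheme discussed above can be sketched as a pair of helpers. This is a generic model, not the s390 ABI or any RISC-V proposal; the struct and function names are assumptions, and a zero immediate is taken to mean that the number is in a register.

```c
#include <stdbool.h>
#include <stdint.h>

/* How a caller would issue a given syscall number under the scheme. */
struct trap_encoding {
    uint8_t  imm8;       /* immediate carried by the trap instruction   */
    bool     uses_reg;   /* true when the number travels in a register  */
    uint32_t reg_value;  /* register contents when uses_reg is true     */
};

/* Caller side: small numbers fit in the immediate, others fall back. */
static struct trap_encoding encode_syscall(uint32_t nr)
{
    struct trap_encoding e = { 0, false, 0 };

    if (nr >= 1 && nr <= 255) {
        e.imm8 = (uint8_t)nr;
    } else {
        e.uses_reg = true;
        e.reg_value = nr;
    }
    return e;
}

/* Handler side: a zero immediate means "look in the register". */
static uint32_t decode_syscall(uint8_t imm8, uint32_t reg_value)
{
    return imm8 != 0 ? imm8 : reg_value;
}
```

A wrapper that forwards arbitrary syscalls can always take the register path, which avoids the large switch or runtime code generation mentioned above.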



kr...@berkeley.edu

Nov 30, 2017, 9:05:32 AM
to Bruce Hoult, Liviu Ionescu, Andrew Waterman, Alex Marshall, Drew Barbier, Jacob Bachmeyer, Krste Asanovic, Megan Wachs, RISC-V ISA Dev, Yunsup Lee

Thanks Bruce,

I think you've just argued very convincingly for passing the syscall
number in a register.

Krste

Liviu Ionescu

Nov 30, 2017, 9:43:42 AM
to Bruce Hoult, Andrew Waterman, Alex Marshall, Drew Barbier, Jacob Bachmeyer, Krste Asanovic, Megan Wachs, RISC-V ISA Dev, Yunsup Lee


> On 30 Nov 2017, at 14:46, Bruce Hoult <br...@hoult.org> wrote:
>
> Most architectures, including x86 and Thumb1, that have a literal argument to INT/SVC/BRK/whatever seem to top out at 8 bits. ... But ARM has 24 bits! ... I think it's nice to be able to use a single instruction with the syscall number in an immediate, but you need to make it big enough! ...

thank you, Bruce.

I was also used to seeing system calls with literal arguments, and I was surprised that the RISC-V ISA took the minimalistic approach and defined a single ECALL and a single EBREAK, delegating everything to software (as if transistors were expensive and software cheap).

but, at least for ECALL, it is possible to pass the syscall number in a register, while for EBREAK it is not.

ECALLs with literal arguments might be useful to support multiple ABIs, or multiple ABI versions.

even for small embedded devices it is common to use different SVCs, for various internal RTOS functions (see ARM RTX, for example).


but ECALLs are not my main concern; I'd be happy to have a few more EBREAKs, to no longer need the mscratch workaround...


regards,

Liviu






Michael Chapman

Nov 30, 2017, 9:54:32 AM
to isa...@groups.riscv.org
EBREAK is just a request from the software for the debugger to intervene.

If you sometimes want to pass parameters, then create a specific
function for that, place all the parameters in registers and put a label
on the EBREAK instruction.  If you want to emulate system calls in the
debugger, then all you need is for the SYSCALL handler for ECALL to call
that function.

The debugger can examine the label on the EBREAK in the ELF executable
being debugged to determine the service requested be it a SW breakpoint
(no special label), SYSCALL emulation (special label such as
$dbg_syscall) or some other service. Indeed, if you really want 256
different kinds of EBREAKs, then just use 256 different special labels!

There is no point in encoding stuff in instructions in HW on the target
device which does not need to be there.
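
The label-based approach above can be sketched as a lookup the debugger performs when the hart halts at an EBREAK. This is an illustration only: the symbol table is modelled as a plain array rather than read from an ELF file, and every name except `$dbg_syscall` (taken from the message above) is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum ebreak_service { SERVICE_BREAKPOINT, SERVICE_SYSCALL };

/* One entry of the executable's symbol table, as the debugger sees it. */
struct symbol {
    uintptr_t   addr;
    const char *name;
};

/* Classify the EBREAK at `pc` by the label attached to its address. */
static enum ebreak_service classify_ebreak(uintptr_t pc,
                                           const struct symbol *syms,
                                           size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (syms[i].addr == pc && strcmp(syms[i].name, "$dbg_syscall") == 0)
            return SERVICE_SYSCALL;
    }
    /* No special label: an ordinary software breakpoint. */
    return SERVICE_BREAKPOINT;
}
```

Supporting more services would mean more special label names, with no change to the instruction encoding on the target.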

Megan Wachs

Nov 30, 2017, 10:02:08 AM
to Michael Chapman, RISC-V ISA Dev
Right, now that I know that debuggers like OpenOCD can get this information from GDB, I'd prefer the ELF-labeling solution to changing the ISA or relying on values in mscratch (apologies to those who are on both threads, I think we have the same discussion going on in two places).

Megan




--
Megan A. Wachs
Engineer | SiFive, Inc 
1875 South Grant Street
Suite 600
San Mateo, CA 94402

Liviu Ionescu

unread,
Nov 30, 2017, 10:51:10 AM11/30/17
to Bruce Hoult, Krste Asanovic, Andrew Waterman, Alex Marshall, Drew Barbier, Jacob Bachmeyer, Krste Asanovic, Megan Wachs, RISC-V ISA Dev, Yunsup Lee


> On 30 Nov 2017, at 17:30, Bruce Hoult <br...@hoult.org> wrote:
>
> Passing in a register is definitely the simplest and most future-proof. ... So, it's pretty clear for syscalls.

how do you suggest to handle multiple ABIs?

with multiple ECALLs it is easy, you assign one for each ABI, and which may have completely different syscall numbers, registers, etc, and all work in parallel.

> But what about debugger traps? If you're going to insert breakpoints by replacing user instructions (rather than using hardware breakpoint address registers) then you'd like to replace as few bytes of them and as few instructions as possible.

if you really need to replace multiple user instructions with long magic sequences simply because the ISA does not support multiple BRKs, probably you selected the wrong ISA.

> Ideally you'd want a 16 bit debugger trap instruction with a few bits of literal, so you could replace a single RVC instruction. Or if the instruction is a bigger one, replace it with a 16 bit debugger trap instruction and one or more 16 bit NOPs.

that's correct.

> PLUS .. where do you find a register to put the constant for the debugger? ... Hence the suggestion to use odd-valued magic numbers in mscratch for this communication. But that also needs several instructions .. and looks like m mode only.

yes. also ugly and limited.

for the existing embedded RISC-V devices, which are M-mode only, that would not be a problem, but future devices might also take the ARM path, which is promoting dual mode embedded devices, with lots of security improvements (see large Cortex-M devices, especially M8), and in this case the mscratch workaround cannot be used. :-(


regards,

Liviu

Andrew Waterman

unread,
Nov 30, 2017, 12:22:14 PM11/30/17
to Liviu Ionescu, Alex Marshall, Bruce Hoult, Drew Barbier, Jacob Bachmeyer, Krste Asanovic, Krste Asanovic, Megan Wachs, RISC-V ISA Dev, Yunsup Lee
On Thu, Nov 30, 2017 at 7:51 AM Liviu Ionescu <i...@livius.net> wrote:


> On 30 Nov 2017, at 17:30, Bruce Hoult <br...@hoult.org> wrote:
>
> Passing in a register is definitely the simplest and most future-proof. ... So, it's pretty clear for syscalls.

how do you suggest to handle multiple ABIs?

with multiple ECALLs it is easy, you assign one for each ABI, and which may have completely different syscall numbers, registers, etc, and all work in parallel.

> But what about debugger traps? If you're going to insert breakpoints by replacing user instructions (rather than using hardware breakpoint address registers) then you'd like to replace as few bytes of them and as few instructions as possible.

if you really need to replace multiple user instructions with long magic sequences simply because the ISA does not support multiple BRKs, probably you selected the wrong ISA.

> Ideally you'd want a 16 bit debugger trap instruction with a few bits of literal, so you could replace a single RVC instruction. Or if the instruction is a bigger one, replace it with a 16 bit debugger trap instruction and one or more 16 bit NOPs.

that's correct.

> PLUS .. where do you find a register to put the constant for the debugger? ... Hence the suggestion to use odd-valued magic numbers in mscratch for this communication. But that also needs several instructions .. and looks like m mode only.

yes. also ugly and limited.

A fixed-size immediate is limiting. What we’ve done is to introduce a level of indirection to avoid this limitation, which at the same time avoids changing the ISA for a pretty dumb reason.



for the existing embedded RISC-V devices, which are M-mode only, that would not be a problem, but future devices might also take the ARM path, which is promoting dual mode embedded devices, with lots of security improvements (see large Cortex-M devices, especially M8), and in this case the mscratch workaround cannot be used. :-(

There will always be some register available to indicate the purpose of the request. The combination of the opcode, active ABI, and originating privilege mode will—by design—suffice to indicate where to look.




regards,

Liviu

Liviu Ionescu

unread,
Nov 30, 2017, 12:39:11 PM11/30/17
to Andrew Waterman, Alex Marshall, Bruce Hoult, Drew Barbier, Jacob Bachmeyer, Krste Asanovic, Krste Asanovic, Megan Wachs, RISC-V ISA Dev, Yunsup Lee


> On 30 Nov 2017, at 19:21, Andrew Waterman <and...@sifive.com> wrote:
>
> ... mscratch, M-mode only ... There will always be some register available to indicate the purpose of the request. The combination of the opcode, active ABI, and originating privilege mode will—by design—suffice to indicate where to look.

with the mscratch ruled out for being M-mode only, what other options do we have?

is there any other register, available in all modes, that is guaranteed to **not** have a certain range of values, so we can use it to temporarily store the magic?

otherwise the single EBREAK alone is not enough, since the debugger can place it as a breakpoint at any location.


regards,

Liviu

Andrew Waterman

unread,
Nov 30, 2017, 12:44:37 PM11/30/17
to Liviu Ionescu, Alex Marshall, Bruce Hoult, Drew Barbier, Jacob Bachmeyer, Krste Asanovic, Krste Asanovic, Megan Wachs, RISC-V ISA Dev, Yunsup Lee
I was talking about ECALL more than EBREAK.

U-mode doesn’t (shouldn’t) need to know about semihosting. There are
existing ABIs for U-mode to request services of greater privilege modes.




regards,

Liviu

Liviu Ionescu

unread,
Nov 30, 2017, 1:35:47 PM11/30/17
to Andrew Waterman, Alex Marshall, Bruce Hoult, Drew Barbier, Jacob Bachmeyer, Krste Asanovic, Krste Asanovic, Megan Wachs, RISC-V ISA Dev, Yunsup Lee


> On 30 Nov 2017, at 19:44, Andrew Waterman <and...@sifive.com> wrote:
>
>
> U-mode doesn’t (shouldn’t) need to know about semihosting. There are
> existing ABIs for U-mode to request services of greater privilege modes.

Not necessarily.

I don't know the details of the U-mode in the context of embedded RISC-V devices, but in the Cortex-M world the so-called 'user mode' has nothing to do with a kernel, ABIs, or things like this, it only restricts access to some registers, and as such, breaking to the debugger shouldn't be a problem.

Actually, when asked about semihosting, most ARM users will reply that it is a method to display printf() messages. Yes, it is, but not only; semihosting has two separate calls to output bytes/strings on a 'debug channel', which indeed can be used for trace::printf() messages during regular debug sessions; in addition, it also has a larger set of calls for fully semihosted applications.

I see no reason why semihosting calls should not be available in both machine and user mode; how else could I write unit tests that run in user mode?

Please note that during semihosting calls, the application is simply halted, regardless of its mode, since the core executes a regular BRK.


Regards,

Liviu





Michael Chapman

unread,
Nov 30, 2017, 1:43:30 PM11/30/17
to isa...@groups.riscv.org

Cores which support compressed mode already have two EBREAK
instructions. The debugger will use the compressed one for SW
breakpoint. The other one can be used for semihosting and any other
interaction required with the debugger.

For cores without compressed mode, OpenOCD will know whether it placed an
EBREAK there or not. If it placed it there then it is a SW breakpoint.
Otherwise it is a semihosting call to the debugger. To make this distinction
more robust, the EBREAK could also be followed by a word holding an illegal
instruction pattern which is extremely unlikely to occur in any application,
i.e. one that would never be the next instruction after a SW breakpoint.
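
A rough sketch of the two cases above, assuming the debugger can read target memory as 16-bit halfwords and keeps a set of the addresses where it planted breakpoints. The `fetch` and `classify` helpers and the memory layout are hypothetical; the two encodings (0x00100073 for EBREAK, 0x9002 for C.EBREAK) are the standard ones.

```python
EBREAK = 0x00100073    # 32-bit EBREAK
C_EBREAK = 0x9002      # 16-bit C.EBREAK


def fetch(halfwords, pc):
    """Return the instruction at pc from a dict of address -> 16-bit halfword."""
    low = halfwords[pc]
    if low & 0b11 != 0b11:
        # Low two bits other than 11 mark a 16-bit (compressed) encoding.
        return low
    return low | (halfwords[pc + 2] << 16)


def classify(halfwords, pc, has_rvc, placed):
    """Decide what the trap at pc means; placed is the debugger's breakpoint set."""
    insn = fetch(halfwords, pc)
    if has_rvc:
        # With RVC the debugger always plants C.EBREAK, so the 32-bit form
        # is left free for semihosting.
        if insn == C_EBREAK:
            return "breakpoint"
        if insn == EBREAK:
            return "semihosting"
        return "not-ebreak"
    if insn == EBREAK:
        # Without RVC the only clue is whether the debugger put it there.
        return "breakpoint" if pc in placed else "semihosting"
    return "not-ebreak"
```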


On 30-Nov-17 18:39, Liviu Ionescu wrote:
> ...

Tommy Thorn

unread,
Nov 30, 2017, 2:47:27 PM11/30/17
to Liviu Ionescu, Andrew Waterman, Drew Barbier, Krste Asanovic, Megan Wachs, RISC-V ISA Dev, Yunsup Lee
Just catching up to an old discussion, but it seems to me that there's another
option that is simpler: put the magic number in the instruction word following
the EBREAK, optionally we should use an ADDI x0, x0, MAGIC if we want
to keep it sane.

The code/debugger that handles the EBREAK obviously has the address of
the EBREAK instruction and can inspect the code stream.
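
To make the idea concrete, here is a small sketch of how the marker word could be encoded and recognised, assuming the code stream is available as a list of 32-bit words. The helper names are made up for illustration; the ADDI encoding (immediate in bits 31:20, opcode 0x13) is the standard one.

```python
EBREAK = 0x00100073


def addi_x0_x0(imm):
    """Encode ADDI x0, x0, imm, which has no architectural effect."""
    if not -2048 <= imm <= 2047:
        raise ValueError("immediate must fit in 12 signed bits")
    return ((imm & 0xFFF) << 20) | 0x13


def magic_after_ebreak(words, index):
    """Return the magic value if words[index] is EBREAK followed by
    ADDI x0, x0, imm; otherwise return None."""
    if index + 1 >= len(words) or words[index] != EBREAK:
        return None
    nxt = words[index + 1]
    # Bits 19:0 hold rs1, funct3, rd and the opcode; all must match ADDI x0, x0.
    if nxt & 0x000FFFFF != 0x13:
        return None
    imm = nxt >> 20
    return imm - 0x1000 if imm & 0x800 else imm   # sign-extend 12 bits
```

Note that the canonical NOP (ADDI x0, x0, 0) decodes as magic value 0 in this sketch, so 0 would have to be excluded from the usable magic numbers.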

Tommy


On Wed, Nov 29, 2017 at 12:53 AM, Liviu Ionescu <i...@livius.net> wrote:


> On 29 Nov 2017, at 02:47, Andrew Waterman <and...@sifive.com> wrote:
>
> I don’t support adding additional EBREAKs, as placing odd-valued magic numbers in mscratch seems sufficient. There is an additional DBI (debug binary interface) constraint on mscratch, that it must be initialized on boot and in normal operation only hold aligned pointers or 0. But that is vastly preferable to an ISA modification.

yes, this constraint of holding aligned pointers and the set of allowed magic numbers must be clearly documented, with the additional consequence that mscratch must be cleared at reset.

I foresee a small problem with using the mscratch for magic numbers, the need to place the code in an interrupt critical section, i.e. disable interrupts, swap mscratch, ebreak, restore mscratch, enable interrupts. probably no longer reasonable for an inline function.

otherwise we risk entering an interrupt while mscratch does not hold an aligned pointer.

things get pretty messy, only to compensate in software what could have been an EBREAK with at least a 2-4 bits immediate value field. :-(



regards,

Liviu


Liviu Ionescu

unread,
Nov 30, 2017, 2:54:16 PM11/30/17
to Tommy Thorn, Andrew Waterman, Drew Barbier, Krste Asanovic, Megan Wachs, RISC-V ISA Dev, Yunsup Lee


> On 30 Nov 2017, at 21:47, Tommy Thorn <tommy...@esperantotech.com> wrote:
>
> ... put the magic number in the instruction word following
> the EBREAK, optionally we should use an ADDI x0, x0, MAGIC if we want
> to keep it sane.

if `ADDI x0, x0, MAGIC` is a legal instruction, it can appear in a program, and if the debugger places a breakpoint to the instruction just before it, it'll look like a semihosting call.

we need a solution that, in a legal program, **guarantees** that it only matches semihosting calls, and never matches breakpoints placed by the debugger.


regards,

Liviu




Michael Chapman

unread,
Nov 30, 2017, 3:17:14 PM11/30/17
to isa...@groups.riscv.org

Breakpoints are placed by the debugger/openOCD. They both know where
they are. Therefore an EBREAK at some other location is a semihosting call.

Michael Chapman

unread,
Nov 30, 2017, 3:31:30 PM11/30/17
to Liviu Ionescu, isa...@groups.riscv.org

You should use a semihosting call (to print out the PC and a message)
for that!


On 30-Nov-17 21:28, Liviu Ionescu wrote:
>
>> On 30 Nov 2017, at 22:20, Michael Chapman <michael.c...@gmail.com> wrote:
>>
>>
>> Breakpoints are placed by the debugger/openOCD. They both know where
>> they are. Therefore an EBREAK at some other location is a semihosting call.
> Not necessarily; for Debug build configurations I manually add BREAK instructions instead of (actually before) infinite loops (like in asserts, unused trap handlers, or should-not-reach-this places); it is much more convenient to have the debugger halt and immediately show you the location than to enter an infinite loop, wait for a while, start wondering what happened and finally manually halt execution.
>
> Regards,
>
> Liviu
>

Jacob Bachmeyer

unread,
Nov 30, 2017, 11:45:29 PM11/30/17
to Liviu Ionescu, Bruce Hoult, Alex Marshall, Andrew Waterman, Drew Barbier, Krste Asanovic, Megan Wachs, RISC-V ISA Dev, Yunsup Lee
Liviu Ionescu wrote:
>> On 30 Nov 2017, at 05:32, Jacob Bachmeyer <jcb6...@gmail.com> wrote:
>>
>>
>>> sure, if there are any free bits in the current encoding, it would be great to reserve some for an extended EBREAK that has an immediate value (ARM Cortex-M uses 8 bits, but I guess 2-3 bits would probably be enough).
>>>
>>>
>> We can easily get 10 bits using the rs1 and rd fields, or XLEN bits by using rs1 to refer to an actual register holding a function code. The immediate field in SYSTEM/PRIV is listed as "funct12", so it seems to be intended as a 12-bit function code.
>>
>
> yes, for the future, if Andrew agrees, I welcome such a solution (the mscratch magic is not a solution, it is a workaround), but for now, as long as the specs do not mention it, I doubt it is a safe choice.
>

To clarify, the 12-bit "funct12" code is part of the opcode -- one of
those functions is EBREAK. That field is *not* available for EBREAK
<imm>. Using the rs1 field in EBREAK to provide "EBREAK <reg>" and a
semihost call number in that register is probably a better option.

>> Using rs1 to select a register holding a parameter is a smaller change to the ISA than giving magic semantics to mscratch. Further, EBREAK is available in all modes, while mscratch is accessible only in M-mode.
>>
>
> maybe I'm short sighted, but why would I use semihosting in any other mode but M?
>

Exactly the reason I do not like the mscratch workaround -- EBREAK is
available in all modes, but you are suggesting a use specific to
M-mode. Interestingly, ECALL is also available in all modes and is
expressly intended for performing calls to an environment. Why not use
ECALL for your semihosting platform? (If nothing else, your M-mode trap
handler can recognize an M-mode ECALL and execute an EBREAK at a
specific, known address. The semihosting interface recognizes an EBREAK
at *that* address as a semihost call, handles it, and resumes execution,
while passing other EBREAKs to the user's debugger. This is really a
workaround for insufficient debugging support -- the debugger should be
able to intercept an M-mode ECALL directly.)



-- Jacob

Jacob Bachmeyer

unread,
Nov 30, 2017, 11:54:03 PM11/30/17
to Bruce Hoult, Krste Asanovic, Liviu Ionescu, Andrew Waterman, Alex Marshall, Drew Barbier, Krste Asanovic, Megan Wachs, RISC-V ISA Dev, Yunsup Lee
Bruce Hoult wrote:
> But what about debugger traps? If you're going to insert breakpoints
> by replacing user instructions (rather than using hardware breakpoint
> address registers) then you'd like to replace as few bytes of them and
> as few instructions as possible. And you still (as Liviu argues) may
> want at least a few bits of argument/literal to be passed to the debugger.

The debugger also has a trick not available for syscalls -- the debugger
gets the address of the EBREAK that was executed and the debugger knows
*why* it put an EBREAK at *that* address.

> Ideally you'd want a 16 bit debugger trap instruction with a few bits
> of literal, so you could replace a single RVC instruction. Or if the
> instruction is a bigger one, replace it with a 16 bit debugger trap
> instruction and one or more 16 bit NOPs.

RVC currently has C.EBREAK.

> Having to put in 8 bytes of replacement instructions to call the
> debugger is ugly. One 4 byte instruction to load a code into a
> register, and another to actually trap. That's potentially up to four
> user instructions being lifted out to be emulated or whatever.
>
> PLUS .. where do you find a register to put the constant for the
> debugger? With syscall at least the compiler knows about it and you
> can register-allocate around it. But with a debugger breakpoint you
> need to work with arbitrary code, which may well be already using the
> register you want to pass the constant in.
>
> Hence the suggestion to use odd-valued magic numbers in mscratch for
> this communication. But that also needs several instructions .. and
> looks like m mode only.

This is why I do not like using EBREAK for this: EBREAK is supposed to
be a debugging breakpoint, while ECALL is supposed to be an environment
call. All of these problems are stemming from trying to abuse EBREAK
for ECALL's purpose.



-- Jacob

Jacob Bachmeyer

unread,
Dec 1, 2017, 12:02:02 AM12/1/17
to Liviu Ionescu, Cesar Eduardo Barros, Bruce Hoult, Alex Marshall, Andrew Waterman, Drew Barbier, Krste Asanovic, Megan Wachs, RISC-V ISA Dev, Yunsup Lee
Liviu Ionescu wrote:
> if I understand the encoding right, from the imm[11:0] field, only two
> values are used, 000000000000 for ECALL and 000000000001 for EBREAK.

This is true for the user ISA; the privileged ISA defines xRET and WFI
in this space as well, using a few more values. (Not actually a problem
for EBREAK <smallimm>, but why not just read rs1, or use the rs1 field
itself as a small immediate?)


-- Jacob

Liviu Ionescu

unread,
Dec 1, 2017, 2:12:53 AM12/1/17
to jcb6...@gmail.com, Bruce Hoult, Krste Asanovic, Andrew Waterman, Alex Marshall, Drew Barbier, Krste Asanovic, Megan Wachs, RISC-V ISA Dev, Yunsup Lee


> On 1 Dec 2017, at 06:53, Jacob Bachmeyer <jcb6...@gmail.com> wrote:
>
> This is why I do not like using EBREAK for this: EBREAK is supposed to be a debugging breakpoint, while ECALL is supposed to be an environment call. All of these problems are stemming from trying to abuse EBREAK for ECALL's purpose.

nope. this is exactly how semihosting was designed to work, by intentionally triggering breakpoint events.

> Exactly the reason I do not like the mscratch workaround -- EBREAK is available in all modes, but you are suggesting a use specific to M-mode.

that was a mistake, sorry; on second thought, the mscratch workaround is not usable: some semihosting calls, like SYS_WRITEC and SYS_WRITE0, should be available in all modes, as an independent `trace::printf()` output stream.

then I realised that in the future there may be embedded devices with both U and M modes, exactly like current Cortex-M devices, and, I'll need a method to write tests for applications running on them, so it became obvious that semihosting must be available for these dual-mode configurations too.

so the mscratch proposal was dismissed.

the current proposal is to use a sequence including guaranteed illegal instructions, like 'uncompressed EBREAK, uncompressed all zero, uncompressed all one', that should be safe enough for the 'debugger' to identify semihosting calls.
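
As a sketch only, the debugger-side check for such a sequence could look like this, assuming a `read_word(address)` callback that returns a 32-bit word from target memory. The callback and the helper name are hypothetical; the three words are the ones named above.

```python
EBREAK = 0x00100073      # uncompressed EBREAK
ALL_ZERO = 0x00000000    # all-zero word
ALL_ONES = 0xFFFFFFFF    # all-one word

SEMIHOSTING_SEQUENCE = [EBREAK, ALL_ZERO, ALL_ONES]


def is_semihosting_call(read_word, pc):
    """True if the three words starting at pc form the semihosting sequence."""
    words = [read_word(pc + offset) for offset in (0, 4, 8)]
    return words == SEMIHOSTING_SEQUENCE
```

A breakpoint planted over ordinary code fails the check, because the two words after it are still the original, legal instructions.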

> This is true for the user ISA; the privileged ISA defines xRET and WFI in this space as well, using a few more values. (Not actually a problem for EBREAK <smallimm>, but why not just read rs1, or use the rs1 field itself as a small immediate?)

`EBREAK <reg>` is also ok, if available. and there is no need to actually pass any value in the register, the call can be identified by the register number alone. (for example `EBREAK x7` == 'semihosting').


regards,

Liviu




Bruce Hoult

unread,
Dec 1, 2017, 2:45:50 AM12/1/17
to Liviu Ionescu, Jacob Bachmeyer, Krste Asanovic, Andrew Waterman, Alex Marshall, Drew Barbier, Krste Asanovic, Megan Wachs, RISC-V ISA Dev, Yunsup Lee
This seems silly. If you're going to put illegal instructions in the program, why not just put an illegal instruction that's almost exactly like an EBREAK except with some of the fields that are supposed to be zero non-zero? RS1 if you want. Or other bits in the imm12.

That will trap with a cause of "illegal instruction", you can examine it, find it's an enhanced EBREAK, and do what you want with it.

You may need to revise that in the future if more instructions are defined using those encodings, but on any given CPU you (and the debugger) know what instructions you have.
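
A sketch of the "examine it" step, assuming the enhanced form is the standard EBREAK word with a non-zero rs1 field (bits 19:15). That choice of field and the helper name are assumptions for illustration; nothing like this is defined in the ISA.

```python
EBREAK = 0x00100073
RS1_MASK = 0x1F << 15    # rs1 field, bits 19:15


def enhanced_ebreak_code(insn):
    """Return the rs1 field (1..31) if insn is EBREAK with a non-zero rs1,
    or None for a plain EBREAK or any other instruction."""
    if (insn & ~RS1_MASK & 0xFFFFFFFF) != EBREAK:
        return None          # some other instruction
    code = (insn >> 15) & 0x1F
    return code if code != 0 else None
```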


Liviu Ionescu

unread,
Dec 1, 2017, 3:31:37 AM12/1/17
to Bruce Hoult, Jacob Bachmeyer, Krste Asanovic, Andrew Waterman, Alex Marshall, Drew Barbier, Krste Asanovic, Megan Wachs, RISC-V ISA Dev, Yunsup Lee


> On 1 Dec 2017, at 09:45, Bruce Hoult <br...@hoult.org> wrote:
>
> ... why not just put an illegal instruction that's almost exactly like an EBREAK except with some of the fields that are supposed to be zero non-zero? RS1 if you want. Or other bits in the imm12.
> That will trap with a cause of "illegal instruction", you can examine it, find it's an enhanced EBREAK, and do what you want with it.

'do what you want with it'? I don't want/need to do anything, I don't want to trap any exception, and I don't want the program running on the device to be required to do anything, all I need is a standard method to break to the debugger, and on RISC-V the one and only such method is an EBREAK.

things are very simple, I don't know how to better explain them to you. :-(

placing illegal instructions after EBREAK is probably not the most elegant solution, but so far was the safest method. (mscratch is M-mode only, getting the runtime address of a symbol requires a lot of effort in all semihosting enabled debuggers).

one disadvantage of placing illegal instructions after EBREAK is that when running such code on a non-semihosting enabled debugger, if the debugger is instructed to return normally after EBREAK, the core will crash with illegal instructions, but generally trying to run semihosting applications in a non-semihosting enabled debugger makes little sense.

---

I think it becomes more and more obvious that, in order to avoid such ugly hacks, we need a proper ISA solution.


taking a look at the Table 6.1 in the privileged ISA manual, to be in line with the other instructions, it seems we can have 5 bits of immediate value, similar to the rs2 field in the SFENCE instruction.

there are many free encodings left, the next one would be:

0001010 xxxxx 00000 000 00000 1110011

or we can use the current EBREAK encoding, but with a non zero value for rs1:

0000000 00001 rrrrr 000 00000 1110011
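
For reference, a short sketch that assembles the two bit patterns above from their fields, assuming the usual R-type layout of the SYSTEM opcode (funct7, rs2, rs1, funct3, rd, opcode). The helper names and the example value 7 are illustrative only.

```python
SYSTEM = 0b1110011


def system_insn(funct7, rs2, rs1, funct3=0, rd=0):
    """Pack the fields of a SYSTEM instruction into a 32-bit word."""
    for value, width in ((funct7, 7), (rs2, 5), (rs1, 5), (funct3, 3), (rd, 5)):
        if not 0 <= value < (1 << width):
            raise ValueError("field value out of range")
    return (funct7 << 25) | (rs2 << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | SYSTEM


def bits(word):
    """Format a word in the same 7/5/5/3/5/7 grouping used above."""
    s = format(word, "032b")
    return " ".join([s[0:7], s[7:12], s[12:17], s[17:20], s[20:25], s[25:32]])


# First variant: new funct7, 5-bit immediate in the rs2 position.
variant_a = system_insn(0b0001010, 7, 0)
# Second variant: the existing EBREAK encoding with a non-zero rs1.
variant_b = system_insn(0, 1, 7)
```

With the value 7, the first variant comes out as 0x14700073 and the second as 0x00138073.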

Andrew, any thoughts on this?


regards,

Liviu

Michael Clark

unread,
Dec 1, 2017, 4:42:20 AM12/1/17
to Liviu Ionescu, Bruce Hoult, Krste Asanovic, Andrew Waterman, Alex Marshall, Drew Barbier, Jacob Bachmeyer, Krste Asanovic, Megan Wachs, RISC-V ISA Dev, Yunsup Lee

> On 1/12/2017, at 4:51 AM, Liviu Ionescu <i...@livius.net> wrote:
>
> how do you suggest to handle multiple ABIs?

The current wisdom is that the success of an ISA is inversely proportional to the number of ABIs, as one progressively becomes weighed down by supporting all of the ABI variants in compilers and associated libraries, e.g. dealing with all of the brokenness and incompatibility that multiple ABI variants create.

Successful ISAs have had a reducing number of ABIs as their success has increased (armv8, amd64). Failed ISAs, on the other hand, have had an increasing number of ABIs (MIPS, an old ISA, is still making ABI breaks, most recently in its NaN encoding).

If one has a 10 trillion yen market cap, then one might have the resources to support several ABIs; however, it's likely that arm now supports fewer ABIs than in the early days: a small amount of armv6, with mostly armv7 and armv8 little endian. In fact the number of supported ABIs has reduced in recent times. Hardfloat is standard on armv8 (2011).

arm on servers is exclusively armv8.x-a (2011), little endian, hard float.

Apple for example has stopped providing OS updates for armv7 devices. iOS 11 only runs on armv8. In armv8, all of the main armv7 extensions are mandatory (Thumb-2, NEON, VFPv4).

Arm embedded development today is most likely to be at least armv7. There might still be some soft float with no NEON or VFP in highly embedded applications.

x86-64 has 2 major ABIs, Windows PE/COFF and System V on Linux/*BSD. The BSDs share an ABI with Linux (calling conventions) but have different syscalls and syscall numbers. The BSDs are of course a little less popular and more fragmented when it comes to ABIs when compared to Linux variants which are all binary compatible (in large part due to the LSB ISO standardisation effort which covers symbols and symbol versions in all of the core libraries).

Power has moved to a 64-Bit ELF V2 ABI which is little endian. The V1 ABI and big endian is now legacy. SPARC as we know is disappearing as of this year.

Apple uses the upper bits in the syscall number register on both x86 and arm to control dispatch to either BSD or Mach kernels. Only the lower 12 bits are used for the syscall number. Same ABI (calling convention), different syscall numbers. The Darwin x86-64 syscall ABI is very similar to System V with the only difference being that condition codes are used to indicate error returns vs negative return values in %rax. It is essentially System V syscall ABI with the addition of condition codes.

I like the idea of an mtvec ECALL handler that calls EBREAK... ILLEGAL. It can, however, be added later to re-introduce binary compat. For the U500 series this would allow semihosting for existing Linux binaries. I don’t think it would be too expensive in comparison to the cycle count of the syscall itself. With this approach, someone could send you a binary without source and you could use it in a semihosting setup.

I’d suggest being parsimonious when considering adding additional ABI variants :-D

Cesar Eduardo Barros

unread,
Dec 1, 2017, 4:48:20 AM12/1/17
to Liviu Ionescu, Bruce Hoult, Jacob Bachmeyer, Krste Asanovic, Andrew Waterman, Alex Marshall, Drew Barbier, Krste Asanovic, Megan Wachs, RISC-V ISA Dev, Yunsup Lee
Em 01-12-2017 06:31, Liviu Ionescu escreveu:
>
>
>> On 1 Dec 2017, at 09:45, Bruce Hoult <br...@hoult.org> wrote:
>>
>> ... why not just put an illegal instruction that's almost exactly like an EBREAK except with some of the fields that are supposed to be zero non-zero? RS1 if you want. Or other bits in the imm12.
>> That will trap with a cause of "illegal instruction", you can examine it, find it's an enhanced EBREAK, and do what you want with it.
>
> placing illegal instructions after EBREAK is probably not the most elegant solution, but so far was the safest method. (mscratch is M-mode only, getting the runtime address of a symbol requires a lot of effort in all semihosting enabled debuggers).

What Bruce is proposing is not putting an invalid instruction after
EBREAK, but replacing the EBREAK itself with an invalid instruction. The
debugger can trap invalid instructions as well as it can trap breakpoints.

The encoding of the invalid instruction could even be the suggested
"EBREAK with rs1 set".

Michael Clark

unread,
Dec 1, 2017, 5:42:29 AM12/1/17
to jcb6...@gmail.com, Liviu Ionescu, Bruce Hoult, Alex Marshall, Andrew Waterman, Drew Barbier, Krste Asanovic, Megan Wachs, RISC-V ISA Dev, Yunsup Lee
I like this idea but i’m not sure if the current version of the debug spec can cause debug traps on ECALL like it can with EBREAK. It would be a nice addition, while not an ISA change, it might be a Debug spec change which is also problematic for existing silicon.

It can otherwise be added with a small amount of complexity in OpenOCD by trap patching mtvec, assuming the debugger has some RAM for the trap patch code containing EBREAK, e.g. so it can continue to forward other traps to an existing trap handler (if present). The trap interceptor could also be put in boot section of the XIP flash on the HiFive1 where there is plenty of room.

Trap patching was a common way to intercept system calls on the M68K with MacOS <= 9

Liviu Ionescu

unread,
Dec 1, 2017, 10:39:30 AM12/1/17
to jcb6...@gmail.com, Bruce Hoult, Alex Marshall, Andrew Waterman, Drew Barbier, Krste Asanovic, Megan Wachs, RISC-V ISA Dev, Yunsup Lee


> On 1 Dec 2017, at 06:45, Jacob Bachmeyer <jcb6...@gmail.com> wrote:
>
> ... execute an EBREAK at a specific, known address. The semihosting interface recognizes an EBREAK at *that* address as a semihost call, handles it, and resumes execution,

I'm very curious to learn how the semihosting interface knows what this address is. (substitute 'semihosting interface' by OpenOCD, QEMU, J-Link GDB Server).


Liviu

Michael Clark

unread,
Dec 1, 2017, 1:31:09 PM12/1/17
to Bruce Hoult, Liviu Ionescu, Jacob Bachmeyer, Krste Asanovic, Andrew Waterman, Alex Marshall, Drew Barbier, Krste Asanovic, Megan Wachs, RISC-V ISA Dev, Yunsup Lee
There is a difference between the canonical illegal instructions or some other marker sequence after the ebreak vs as yet unused opcode space. Using unused opcode space is essentially a de facto change in the ISA.

I think ebreak with sentinel instructions might be the way to go, and potentially we can put an ecall wrapper in XIP non volatile memory at some point in the future. ABI compat is not as big of an issue on the FE310 as it is on the U54MC. One could perhaps sidestep calling this a syscall ABI, rather it is a semi-hosting call ABI. If one wants to deploy binaries conforming to a well known ABI then one can install a trap handling stub to dispatch to a host, however we can add this later.

Bruce Hoult

unread,
Dec 1, 2017, 1:55:12 PM12/1/17
to Krste Asanovic, Liviu Ionescu, Andrew Waterman, Alex Marshall, Drew Barbier, Jacob Bachmeyer, Krste Asanovic, Megan Wachs, RISC-V ISA Dev, Yunsup Lee
Yes, but see the end of the post for debugger calls.

Passing in a register is definitely the simplest and most future-proof.

It's important that the register is not one normally used for argument passing, so that C stubs wrapping system calls can just stuff the system call number into that register without shuffling the arguments. Using the last argument passing register can be ok if system calls don't have many arguments (I seem to recall they max out at 4?)

I don't much like every call site needing an extra instruction to load the syscall number into the register. But .. in most toolchains there is only one instance of each syscall and user code calls a wrapper rather than it being inlined.

On the receiving end in the kernel a register is definitely simpler. If the system call number is encoded in a literal in the ECALL instruction then either:

1) hardware needs to extract the syscall number into a CSR, or

2) hardware needs to save the faulting opcode into a CSR and then kernel code needs to mask out the syscall number, or

3) kernel code needs to get the address of the faulting instruction from a CSR (NOT the return address! That would be OK if instructions are fixed length, but not with C extension or future longer instructions). And then the kernel code needs to load the instruction from user space and mask out the syscall number.

These are all very ugly, needing more instructions than if the caller simply put the syscall number into a register, and possibly also requiring extra hardware and CSRs.
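
The difference can be sketched as follows, modelling only the receiving side. The register name `a7`, the 5-bit literal in bits 19:15 and the `read_user_word` callback are all assumptions for illustration; no such ECALL-with-literal encoding exists.

```python
def number_from_register(regs):
    """Register convention: the caller already put the number in a register."""
    return regs["a7"]    # a7 is an assumed choice of register


def number_from_literal(read_user_word, epc):
    """Option 3: the number is a literal inside the trapping instruction."""
    insn = read_user_word(epc)    # extra load from user memory at the faulting pc
    return (insn >> 15) & 0x1F    # extra shift and mask (hypothetical field)
```

The register version is a single read, while the literal version needs the faulting address, a load from user space and a mask before dispatch can even start.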

So, it's pretty clear for syscalls.


But what about debugger traps? If you're going to insert breakpoints by replacing user instructions (rather than using hardware breakpoint address registers) then you'd like to replace as few bytes of them and as few instructions as possible. And you still (as Liviu argues) may want at least a few bits of argument/literal to be passed to the debugger.

Ideally you'd want a 16 bit debugger trap instruction with a few bits of literal, so you could replace a single RVC instruction. Or if the instruction is a bigger one, replace it with a 16 bit debugger trap instruction and one or more 16 bit NOPs.

Having to put in 8 bytes of replacement instructions to call the debugger is ugly. One 4 byte instruction to load a code into a register, and another to actually trap. That's potentially up to four user instructions being lifted out to be emulated or whatever.

PLUS .. where do you find a register to put the constant for the debugger? With syscall at least the compiler knows about it and you can register-allocate around it. But with a debugger breakpoint you need to work with arbitrary code, which may well be already using the register you want to pass the constant in.

Hence the suggestion to use odd-valued magic numbers in mscratch for this communication. But that also needs several instructions, and mscratch looks like it is accessible from M-mode only.

--
You received this message because you are subscribed to the Google Groups "RISC-V ISA Dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to isa-dev+unsubscribe@groups.riscv.org.

To post to this group, send email to isa...@groups.riscv.org.
Visit this group at https://groups.google.com/a/groups.riscv.org/group/isa-dev/.

Liviu Ionescu

Dec 1, 2017, 1:55:12 PM
to Michael Chapman, isa...@groups.riscv.org


> On 30 Nov 2017, at 22:20, Michael Chapman <michael.c...@gmail.com> wrote:
>
>
> Breakpoints are placed by the debugger/OpenOCD. They both know where
> they are. Therefore an EBREAK at some other location is a semihosting call.

Liviu Ionescu

Dec 1, 2017, 1:55:12 PM
to Michael Chapman, isa...@groups.riscv.org


> On 30 Nov 2017, at 22:35, Michael Chapman <michael.c...@gmail.com> wrote:
>
>
> You should use a semihosting call (to print out the PC and a message)
> for that!

:-)

I do, but support for `trace::printf()` is optional, i.e. I may have a `trace::printf()` in my code, but the user may decide not to implement the method `trace::write()`, and the default weak `trace::write()` does nothing, so no messages are shown.

and even if the message is shown, it is still convenient to have the debugger halt.

so my code usually looks like:

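```cpp
void do_something()
{
    should_not_return();

    // Shown only if the application provides its own trace::write().
    trace::printf("should_not_return() actually returned\n");

#if defined(DEBUG)
    // Halt in the debugger even when the message above went nowhere.
    os::arch::brk();
#endif
    while (true) {
        os::arch::wfi();
    }
}
```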

and, BTW, on Cortex-M devices, semihosting is not the only trace channel; in my code I provide implementations for `trace::write()` to forward the trace stream to:

- semihosting
- ARM ITM (the recommended ARM solution; when using common J-Link probes, speeds up to 6 Mbps are common)
- SEGGER RTT, a proprietary SEGGER solution, significantly faster (tens of Mbps)

currently, on the available RISC-V test boards, the only available trace channel is an FTDI UART, at the fantastic speed of 115200 bps. not exactly encouraging...


regards,

Liviu

Jacob Bachmeyer

Dec 1, 2017, 9:34:14 PM
to Liviu Ionescu, Bruce Hoult, Alex Marshall, Andrew Waterman, Drew Barbier, Krste Asanovic, Megan Wachs, RISC-V ISA Dev, Yunsup Lee
Configuration? Reading it from ELF headers or symbol table? I am
assuming that tethered semihosting involves *loading* the program from
the host, so the loader can simply tell the semihosting debugger that
the semihost EBREAK is at address X. (There is not much reason to store
the program in NVM if it cannot run without a host interface connected.)

Another option that others have mentioned is to use EBREAK followed by
32-bit zero followed by a semihost call packet (format determined by
semihosting tools in use) in the instruction stream. 32-bit zero is
guaranteed to be an invalid instruction in RISC-V, while all ones is
actually an infinitely long instruction, so 32 ones are only guaranteed
invalid on implementations that do not have instructions longer than 32
bits.
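
The debugger side of that convention could look roughly like the sketch below: on a halt, it reads the instruction at the stopped PC and the 32-bit word after it. `Stop` and `classify` are hypothetical names, the call packet format is left out because it depends on the tools in use, and only the 32-bit EBREAK encoding is considered.

```cpp
#include <cstdint>

constexpr std::uint32_t EBREAK = 0x00100073u;  // 32-bit EBREAK encoding

// Hypothetical classification of why the target stopped.
enum class Stop { NotEbreak, Breakpoint, SemihostCall };

// `insn_at_pc` is the word at the stopped PC, `next_word` the word after it.
// An all-zero word is never a valid instruction, so ordinary code cannot
// place one directly after an EBREAK by accident.
constexpr Stop classify(std::uint32_t insn_at_pc, std::uint32_t next_word)
{
    return insn_at_pc != EBREAK ? Stop::NotEbreak
         : next_word == 0u      ? Stop::SemihostCall
                                : Stop::Breakpoint;
}
```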


-- Jacob

Liviu Ionescu

Dec 2, 2017, 4:19:33 AM
to jcb6...@gmail.com, Bruce Hoult, Alex Marshall, Andrew Waterman, Drew Barbier, Krste Asanovic, Megan Wachs, RISC-V ISA Dev, Yunsup Lee


> On 2 Dec 2017, at 04:34, Jacob Bachmeyer <jcb6...@gmail.com> wrote:
>
> Reading it from ELF headers or symbol table?

ah, ok. please let me know when existing tools (especially commercial ones, like J-Link) will implement this.

> EBREAK followed by 32-bit zero followed by a semihost call packet

yes, this is the final workaround we decided to use, to compensate for the lack of multiple EBREAK instructions.


regards,

Liviu

Rishiyur Nikhil

Dec 2, 2017, 1:30:29 PM
to Liviu Ionescu, RISC-V ISA Dev
>  Jacob said:
>    This is why I do not like using EBREAK for this: EBREAK is
>    supposed to be a debugging breakpoint, while ECALL is supposed to
>    be an environment call.  All of these problems are stemming from
>    trying to abuse EBREAK for ECALL's purpose.

>  Liviu said:
>    nope. this is exactly how semihosting was designed to work, by
>    intentionally triggering breakpoint events.

Liviu, I don't follow this.  I agree with Jacob's comment that the
functionality certainly sounds like a special kind of environment call,
so why this certainty that it must be a break?  Note: I'm a newbie to
the concept of semi-hosting, so perhaps I'm missing some historical
development?

Nikhil



Andrew Waterman

Dec 2, 2017, 1:53:08 PM
to Rishiyur Nikhil, Liviu Ionescu, RISC-V ISA Dev
The main reason is that semihosting calls are usually serviced by the debugger, and EBREAK is an existing mechanism that signals the debugger. By contrast, ECALL does not signal the debugger; it just generates an exception locally.


Liviu Ionescu

Dec 2, 2017, 4:36:44 PM
to Andrew Waterman, Rishiyur Nikhil, RISC-V ISA Dev


> On 2 Dec 2017, at 20:52, Andrew Waterman <and...@sifive.com> wrote:
>
> The main reason is that semihosting calls are usually serviced by the debugger, and EBREAK is an existing mechanism that signals the debugger. By contrast, ECALL does not signal the debugger; it just generates an exception locally.

well said.

perhaps this will help all understand how semihosting works, and why it mandates an EBREAK, since my explanations were constantly disputed.

thank you Andrew.

Liviu

Jacob Bachmeyer

Dec 2, 2017, 4:40:21 PM
to Andrew Waterman, Rishiyur Nikhil, Liviu Ionescu, RISC-V ISA Dev
Andrew Waterman wrote:
> The main reason is that semihosting calls are usually serviced by the
> debugger, and EBREAK is an existing mechanism that signals the
> debugger. By contrast, ECALL does not signal the debugger; it just
> generates an exception locally.

But in full environments (think "workstation") EBREAK must also generate
an exception locally, since the debugger is another process and
debugging is actually mediated through the supervisor. (Or through a
hypervisor or even monitor when debugging a supervisor.)
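
As a rough sketch of that local path: a trap handler sees both instructions as exceptions and tells them apart only by the cause code (3 for a breakpoint; 8, 9 and 11 for ECALL from U-, S- and M-mode in the privileged spec). `Route` and `route_trap` are hypothetical names, and the value passed in is assumed to be the exception code with the interrupt bit already clear.

```cpp
#include <cstdint>

// Hypothetical destinations a supervisor or monitor might dispatch to.
enum class Route { Debugger, EnvironmentCall, Other };

// Map an exception cause code to the component that should service it.
constexpr Route route_trap(std::uint64_t cause)
{
    return cause == 3u                               ? Route::Debugger         // EBREAK
         : (cause == 8u || cause == 9u || cause == 11u) ? Route::EnvironmentCall  // ECALL
                                                     : Route::Other;
}
```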


-- Jacob

Liviu Ionescu

Dec 2, 2017, 6:09:47 PM