debug mechanism comparison


Tim Newsome

Dec 6, 2016, 4:27:30 PM
to RISC-V Debug Group
I'm hoping that even if we can't agree on how exactly to do debug, we can agree on the differences in the proposals. I've been working on a Google doc that you might have seen. I'd like other people to contribute, but just opening it up to everybody might result in a mess. So I've moved it to github where we can track what changes were made, and by who.

The new document lives at https://sifive.github.io/debug-mechanism-comparison/

I'd love to hear thoughts, pull requests, and other comments.

I'd especially like a better summary of what I've labeled the Direct Design. (I'm also happy to change names to whatever makes sense to people, but as was brought up in the meeting lightweight/full-featured didn't really highlight the differences.)

Tim

Alex Bradbury

Dec 7, 2016, 5:51:51 AM
to Tim Newsome, RISC-V Debug Group
On 6 December 2016 at 21:27, Tim Newsome <t...@sifive.com> wrote:
> I'm hoping that even if we can't agree on how exactly to do debug, we can
> agree on the differences in the proposals. I've been working on a Google doc
> that you might have seen. I'd like other people to contribute, but just
> opening it up to everybody might result in a mess. So I've moved it to
> github where we can track what changes were made, and by who.
>
> The new document lives at
> https://sifive.github.io/debug-mechanism-comparison/
>
> I'd love to hear thoughts, pull requests, and other comments.

Hi Tim, many thanks for writing this up. A few comments/questions:

* Could you elaborate on your comment about how "making the debugger
perform well will require some careful coding" for your spec. What
does debugger 'performance' mean in this context?

* Would it be worth incorporating some basic definitions into the doc,
much like you had in your slides? In addition to the definitions in
your slides, I'd perhaps add 'debug monitor' to the mix

* Does anybody understand the requirements for 'real-time' or low
latency debug? It seems in some use cases, as low latency as possible
is desired. What sort of latency do we see on the SiFive debug
implementation?

* I'm not quite sure where it fits in the document, but it would be
good to bring in clear descriptions of some of the key arguments
presented at the foundation meeting and on the mailing list. Maybe it
integrates somewhere in the comparison, or else a section at the end.
e.g.
* The Direct specification allows a higher degree of abstraction. An
implementer could provide that memory-mapped interface and implement
it by injecting instructions into the core. The counter-argument is
that by specifying the implementation approach, the Instruction spec
allows a more efficient design.

* It strikes me that the Direct approach is probably more amenable to
allowing configuration of a debugger via devicetree or config-string.
E.g. suppose you have a design with a vector register file: describing
how to access registers via a memory map is something we're very used
to doing and have various schemas to allow. The Instruction approach
would require some schema that allowed instruction templates to be
specified. Doable, but less natural than a simple memory map

If any of the above sounds like the kind of direction you'd like this
document to go in, I'd be happy to contribute pull requests but I
probably won't get a chance until next week (technically I'm on
holiday and probably shouldn't be writing a long email about debug).

Best,

Alex

Tim Newsome

Dec 7, 2016, 10:38:02 AM
to Alex Bradbury, RISC-V Debug Group
Hi Alex. You raise some good points. I'll reply to your questions here, and then see about updating the document.

On Wed, Dec 7, 2016 at 2:51 AM, Alex Bradbury <a...@asbradbury.org> wrote:
On 6 December 2016 at 21:27, Tim Newsome <t...@sifive.com> wrote:
> I'm hoping that even if we can't agree on how exactly to do debug, we can
> agree on the differences in the proposals. I've been working on a Google doc
> that you might have seen. I'd like other people to contribute, but just
> opening it up to everybody might result in a mess. So I've moved it to
> github where we can track what changes were made, and by who.
>
> The new document lives at
> https://sifive.github.io/debug-mechanism-comparison/
>
> I'd love to hear thoughts, pull requests, and other comments.

Hi Tim, many thanks for writing this up. A few comments/questions:

* Could you elaborate on your comment about how "making the debugger
perform well will require some careful coding" for your spec. What
does debugger 'performance' mean in this context?

Using the Instruction approach requires more thought than using the Direct approach. You have to consider which instructions to use, and with the Instruction Buffer you have to figure out reasonable instruction sequences. The simple implementation involves one round trip to the debug device (USB-JTAG adapter) per scan, which will be slow. You'll be forced to combine multiple JTAG scans into a single request to the debug device.
In the Direct approach you don't have to worry about this nearly as much.

Does that make sense? 
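Tim's round-trip point can be sketched numerically. This is a toy model, not a measurement: the constants, function names, and batch size below are all invented for illustration.

```python
# Toy model of debugger <-> USB-JTAG adapter traffic (all numbers hypothetical).
USB_ROUND_TRIP_US = 1000   # assumed ~1 ms per USB request/response pair
SCAN_SHIFT_US = 10         # assumed time to shift one JTAG scan at the TAP

def naive_time(num_scans):
    # One USB round trip per JTAG scan: USB latency dominates.
    return num_scans * (USB_ROUND_TRIP_US + SCAN_SHIFT_US)

def batched_time(num_scans, batch_size):
    # Pack batch_size scans into each USB request to amortize the round trip.
    batches = -(-num_scans // batch_size)  # ceiling division
    return batches * USB_ROUND_TRIP_US + num_scans * SCAN_SHIFT_US

print(naive_time(100))        # 101000 us
print(batched_time(100, 16))  # 8000 us
```

Even with these made-up numbers, batching turns a latency-bound sequence into one dominated by actual scan time, which is the "careful coding" the debugger would need.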

* Would it be worth incorporating some basic definitions into the doc,
much like you had in your slides? In addition to the definitions in
your slides, I'd perhaps add 'debug monitor' to the mix

Good idea.
Debug monitor: A program running on a target to allow remote debugging. It typically runs as a separate process on the target, in the kernel, or simply as an interrupt handler.

* Does anybody understand the requirements for 'real-time' or low
latency debug? It seems in some use cases, as low latency as possible
is desired. What sort of latency do we see on the SiFive debug
implementation?

I haven't heard any specific requirements. I suspect they run the gamut from "no impact at all" to "interrupting for 50ms is OK" but I don't know that. In the SiFive debug implementation (0.11 of the spec), the processor will be "halted" for about 100 instructions. They'll be fetched from Debug ROM/RAM, and I'm not sure how slow accessing them is. I haven't measured the impact, or even implemented it since gdb doesn't support the concept of accessing a target while it's running.

* I'm not quite sure where it fits in the document, but it would be
good to bring in clear descriptions of some of the key arguments
presented at the foundation meeting and on the mailing list. Maybe it
integrates somewhere in the comparison, or else a section at the end.
e.g.
  * The Direct specification allows a higher degree of abstraction. An
implementer could provide that memory-mapped interface and implement
it by injecting instructions into the core. The counter-argument is
that by specifying the implementation approach, the Instruction spec
allows a more efficient design.

I think the way to add that is to add e.g. "degree of abstraction" as a metric, and then compare.
I'll do that.

* It strikes me that the Direct approach is probably more amenable to
allowing configuration of a debugger via devicetree or config-string.
E.g. suppose you have a design with a vector register file: describing
how to access registers via a memory map is something we're very used
to doing and have various schemas to allow. The Instruction approach
would require some schema that allowed instruction templates to be
specified. Doable, but less natural than a simple memory map

I'll agree that specifying addresses for memory-mapped registers is more common, but assuming a debugger knows what to look for, specifying what kind of instruction to use to access the DM is just as easy. Section 7.10.3 in my proposed spec tries to address this.
 
If any of the above sounds like the kind of direction you'd like this
document to go in, I'd be happy to contribute pull requests but I
probably won't get a chance until next week (technically I'm on
holiday and probably shouldn't be writing a long email about debug).

I would like the document to concisely reflect the important differences between the proposals, and also contain the background required to understand it. The hope is that somebody who hasn't been following the discussion can read it in relatively little time, and have enough information to form their own opinion on what they prefer.

Tim 

Tim Newsome

Dec 7, 2016, 4:23:05 PM
to Alex Bradbury, RISC-V Debug Group
On Wed, Dec 7, 2016 at 7:38 AM, Tim Newsome <t...@sifive.com> wrote:

* Does anybody understand the requirements for 'real-time' or low
latency debug? It seems in some use cases, as low latency as possible
is desired. What sort of latency do we see on the SiFive debug
implementation?

I haven't heard any specific requirements. I suspect they run the gamut from "no impact at all" to "interrupting for 50ms is OK" but I don't know that. In the SiFive debug implementation (0.11 of the spec), the processor will be "halted" for about 100 instructions. They'll be fetched from Debug ROM/RAM, and I'm not sure how slow accessing them is. I haven't measured the impact, or even implemented it since gdb doesn't support the concept of accessing a target while it's running.

A kind of similar point to yours was brought up in the call, asking about how we define "ease of verification." The real issue is that the "metrics" aren't very precise in any case, but they are aspects to consider when choosing a debug spec. We can compare what each spec offers even when we can't put a number on a given "metric" for each one.
 

* I'm not quite sure where it fits in the document, but it would be
good to bring in clear descriptions of some of the key arguments
presented at the foundation meeting and on the mailing list. Maybe it
integrates somewhere in the comparison, or else a section at the end.
e.g.
  * The Direct specification allows a higher degree of abstraction. An
implementer could provide that memory-mapped interface and implement
it by injecting instructions into the core. The counter-argument is
that by specifying the implementation approach, the Instruction spec
allows a more efficient design.

I think the way to add that is to add e.g. "degree of abstraction" as a metric, and then compare.
I'll do that.

Or perhaps better: number of possible implementations
What do people think?

Tim

Megan Wachs

Dec 7, 2016, 4:39:11 PM
to Tim Newsome, Alex Bradbury, RISC-V Debug Group
* I'm not quite sure where it fits in the document, but it would be
good to bring in clear descriptions of some of the key arguments
presented at the foundation meeting and on the mailing list. Maybe it
integrates somewhere in the comparison, or else a section at the end.
e.g.
  * The Direct specification allows a higher degree of abstraction. An
implementer could provide that memory-mapped interface and implement
it by injecting instructions into the core. The counter-argument is
that by specifying the implementation approach, the Instruction spec
allows a more efficient design.

I think the way to add that is to add e.g. "degree of abstraction" as a metric, and then compare.
I'll do that.

Or perhaps better: number of possible implementations
What do people think?

I think "degree of abstraction" is an OK metric, but I don't agree that the memory mapped interface is "more abstract".  As I said in today's call, we need to agree *what* the abstraction should be. Both "do a memory mapped access" and "execute these instructions" are less abstract translations of the ultimate abstraction "give me the current value of what's in CSR x". In the extreme, we could send the actual GDB protocol directly into the hardware (this is why I was asking about adv_debug_system... is that sort of how it works?) and then the hardware implementer could do whatever they want to translate that command. But I *think* no one likes that idea, because we want to do more translation in the SW given assumptions about what is still easy to do in hardware.

Megan
 


--
You received this message because you are subscribed to the Google Groups "RISC-V Debug Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to debug+unsubscribe@groups.riscv.org.
To post to this group, send email to de...@groups.riscv.org.
Visit this group at https://groups.google.com/a/groups.riscv.org/group/debug/.
To view this discussion on the web visit https://groups.google.com/a/groups.riscv.org/d/msgid/debug/CAGDihembzu%3D8Dx94in%3DVCH6nWsTEoGdzNVhbeav59AW-ixwK_g%40mail.gmail.com.



--
Megan A. Wachs
Engineer | SiFive, Inc 
300 Brannan St, Suite 403 
San Francisco, CA  94107 

Alex Bradbury

Dec 7, 2016, 5:04:50 PM
to Tim Newsome, RISC-V Debug Group
On 7 December 2016 at 21:23, Tim Newsome <t...@sifive.com> wrote:
> On Wed, Dec 7, 2016 at 7:38 AM, Tim Newsome <t...@sifive.com> wrote:
>>
>>
>>> * Does anybody understand the requirements for 'real-time' or low
>>> latency debug? It seems in some use cases, as low latency as possible
>>> is desired. What sort of latency do we see on the SiFive debug
>>> implementation?
>>
>>
>> I haven't heard any specific requirements. I suspect they run the gamut
>> from "no impact at all" to "interrupting for 50ms is OK" but I don't know
>> that. In the SiFive debug implementation (0.11 of the spec), the processor
>> will be "halted" for about 100 instructions. They'll be fetched from Debug
>> ROM/RAM, and I'm not sure how slow accessing them is. I haven't measured the
>> impact, or even implemented it since gdb doesn't support the concept of
>> accessing a target while it's running.
>
>
> A kind of similar point to yours was brought up in the call, asking about
> how we define "ease of verification." The real issue is that the "metrics"
> aren't very precise in any case, but they are aspects to consider when
> choosing a debug spec. We can compare what each spec offers even when we
> can't put a number on a given "metric" for each one.

I agree, I think it's fine if we don't decide one is "better" than the
other, but instead just have a prose description of what sort of
use-cases it may be able to handle. Hopefully people can chime in with
experiences (e.g. on a previous SoC project, they required debugger
latency of N cycles).

The design space is too complex to be purely quantitative.

Best,

Alex

Alex Bradbury

Dec 7, 2016, 5:18:47 PM
to Megan Wachs, Tim Newsome, RISC-V Debug Group
On 7 December 2016 at 21:39, Megan Wachs <me...@sifive.com> wrote:
> I think "degree of abstraction" is an OK metric, but I don't agree that the
> memory mapped interface is "more abstract". As I said in today's call, we
> need to agree *what* the abstraction should be. Both "do a memory mapped
> access" and "execute these instructions" are less abstract translations of
> the ultimate abstraction "give me the current value of what's in CSR x". In
> the extreme, we could send the actual GDB protocol directly into the
> hardware (this is why I was asking about adv_debug_system... is that sort of
> how it works?) and then the hardware implementer could do whatever they want
> to translate that command. But I *think* no one likes that idea, because we
> want to do more translation in the SW given assumptions about what is still
> easy to do in hardware.

I think the point isn't so much to decide one is "more abstract" than
the other, but to highlight the key design trade-offs for a wider
community to chew over. I think we can agree both approaches use a
_different_ abstraction, and it is the case that the 'direct' approach
could be an interface to an implementation that works by inserting
instructions. At the same time, I take your point that exposing a
memory map for registers or the ability to insert instructions are
really just different ways of achieving the ultimate goal of exposing
information to debug from the processor core.

Another aspect that comes to mind regarding abstraction: exposing
registers and other resources directly via a memory map strikes me as
a more 'conventional' approach, and one that is likely to be used
across peripherals implemented on most RISC-V SoCs. That's not to say
it's better or worse than the Instruction approach, just something
else that comes to my mind when comparing them.

The most compelling argument in this whole debate for me so far has
been how e.g. adding vector support and a new vector register file
when using the Instruction debugging approach could require no changes
at all on the hardware implementation side in order to expose the VRF
to the debugger.

Best,

Alex

Rex McCrary

Dec 7, 2016, 8:22:48 PM
to Alex Bradbury, Megan Wachs, Tim Newsome, RISC-V Debug Group
Launching off Alex's comments... If we want to support very high-end processors, the memory map approach has some characteristics we need to understand better in order to handle them well. I have a concern that a slave interface on a processor running four threads in non-vector designs will make it more difficult to meet multi-GHz frequencies, and adding four banks of vector registers, as Alex mentioned above, would make it even harder. The concerns include wire congestion and area, among others. From what I have read and am told by our CPU designers, a lot of effort is placed in understanding every wire path in the core, and adding all these wires from the very GPRs that designers are so careful with, plus the interface that goes with them, would be a non-starter for some design teams.

One could argue that you could "multi-cycle" that logic, but that adds additional complexity and verification effort, and does not help the wire congestion concerns. One could then argue that you could "serialize" the wide buses over multiple cycles, but that adds more complexity as well.



Joe Xie

Dec 8, 2016, 8:24:27 AM
to Tim Newsome, RISC-V Debug Group
Hi Tim,

Does your proposal (i.e. the full-featured one) provide a mechanism to disable debug when the hart is running in M-mode? Consider a use case where we only want people to debug applications running in user mode but hardware will disable debug as long as it is running in M-mode; does your proposal support that use case?

Thanks
Joe



This email message is for the sole use of the intended recipient(s) and may contain confidential information.  Any unauthorized review, use, disclosure or distribution is prohibited.  If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message.

Stefan Wallentowitz

Dec 8, 2016, 8:53:18 AM
to de...@groups.riscv.org
On 08.12.2016 14:24, Joe Xie wrote:
> Hi Tim,
>
> Does your proposal (i.e. the full-featured one) provide a mechanism
> to disable debug when the hart is running in M-mode? Consider a use case
> where we only want people to debug applications running in user mode but
> hardware will disable debug as long as it is running in M-mode; does your
> proposal support that use case?
>
> Thanks
> Joe

Hi Joe,

that came up during the call yesterday, too. We have for now found that
the security/authentication discussion is orthogonal to both specs, and
both specs can generally support it (we specifically discussed the
example you are asking about).

Cheers,
Stefan



Richard Herveille

Dec 8, 2016, 9:47:55 AM
to RISC-V Debug Group, Richard Herveille, Tim Newsome
Hi all,

I thought about what was said during yesterday’s debug call and I took some time to consider/think it through.
I would like to propose the following (another proposal), which, hopefully, combines the best of the two proposals.

Rationale behind this proposal:
  1. Keep-it-simple. Use existing structures in the CPU
  2. Keep it flexible


Proposal:
  1. I really like having the hart’s Debug Unit being memory mapped. This allows it to be connected to a dedicated Debug Controller [Transport-Module] (point-to-point) or act as another slave on the (system) bus. This allows easily changing/updating the Debug Controller [Transport-Module] or having multiple Debug Controllers. For example imagine an SoC with multiple interfaces. To bring up the SoC a dedicated JTAG Debug Controller can be used. Once that’s up and running switching to a PCIe interface is straightforward. Or even using a 2nd hart/core to debug another one works out of the box.
  2. Making the Debug Controller [Transport-Module] a system-bus slave has the advantage that it can be used to access/inspect all slaves on the bus, without using the CPU. This is a big help in bringing up the peripherals while the CPU isn’t behaving nicely (or is simply dead, which unfortunately sometimes happens).
    I realise that this is then the physical view and doesn’t necessarily represent the view from the CPU. If that’s required, then use the debug monitor (see below).
  3. I don’t like the idea of having scratch-ram or instruction buffers inside the CPU. But I see and like the flexibility of this approach. Therefore I propose to remove these from (inside) the CPU and use existing memory instead. Now no changes to the CPU’s pipeline are needed at all.
    If you want to run a debug session, use the Debug Controller (or any other master which is on the system bus) to load the debug code/debug monitor into RAM. Then make the hart jump to that code when it hits a trigger (either hardware or software breakpoint). Of course if the debug monitor is in ROM, there’s no need to load it first.
    If we have a ‘jump-to’-register for each hardware-breakpoint we can have different code per breakpoint (if desired). Thus allowing chained triggers for example.
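The per-breakpoint ‘jump-to’ idea could be sketched like this. Purely illustrative: the addresses, the table, and the on_trigger behaviour are assumptions for the sake of discussion, not part of either spec.

```python
# Illustrative per-breakpoint 'jump-to' registers (all addresses invented).
# Each hardware breakpoint can redirect the hart to different monitor code,
# which is what would allow e.g. chained triggers to run different handlers.
jump_to = {
    0: 0x80001000,  # breakpoint 0 -> general-purpose debug monitor in RAM
    1: 0x80002000,  # breakpoint 1 -> dedicated handler for a chained trigger
}

def on_trigger(bp_index, pc):
    # The interrupted PC must be saved somewhere (e.g. a CSR) so the
    # monitor can return to it; the hart then fetches from jump_to[bp_index].
    saved_pc = pc
    return jump_to[bp_index], saved_pc

target, saved = on_trigger(1, 0x20000444)
print(hex(target), hex(saved))
```

The appeal is that this behaves much like an ordinary trap vector, so it reuses mechanisms hardware designers already understand.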

I scribbled a diagram that (hopefully) explains all this a bit better.


We still need to define the debug registers and decide whether these live in the CSR space or are dedicated to the Debug Unit. But this should not cause any serious issues.
These registers would define hardware breakpoints, expose the feature set to debug SW, determine the privilege level which may be debugged, ...


With the above proposal we’ve got a simple and clean way of jumping into a Debug Monitor. But how do we get access to the CPU’s internals? With the initial (Direct) proposal it’s just a matter of addressing the registers as a RAM (address, RW, data). It’s simple, direct, and low latency. My customers (all embedded microcontroller type applications) really like the simplicity. However I also realise that this can get big for multi-hart implementations with 128-bit RF/FRF/VRF.
Should we have a command/data CSR for handshaked communication? For example the debug software writes ‘read X1’ to the command register. The debug monitor polls this and sees the command, gets X1 and writes its contents into the data register and then clears the command register (signalling command complete). The debug sw polls the command register, sees it cleared, and gets the data?
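That handshake could be sketched as follows. This is a minimal software simulation; the register names, command encoding, and register value are invented for illustration.

```python
# Minimal simulation of the command/data CSR handshake described above.
# 'regs' stands in for the command/data CSRs; CMD_READ_X1 is an invented encoding.
regs = {"command": 0, "data": 0}
CMD_READ_X1 = 1
hart_state = {"x1": 0xDEADBEEF}  # pretend contents of register X1

def debug_monitor_poll():
    # Runs on the hart: sees the command, performs it, clears the register.
    if regs["command"] == CMD_READ_X1:
        regs["data"] = hart_state["x1"]  # monitor fetches X1
        regs["command"] = 0              # clearing 'command' signals completion

def debugger_read_x1():
    # Runs on the debug software side of the link.
    regs["command"] = CMD_READ_X1        # issue the command
    while regs["command"] != 0:          # poll until the monitor clears it
        debug_monitor_poll()             # (stands in for the monitor running)
    return regs["data"]                  # command cleared, data is valid

print(hex(debugger_read_x1()))
```

The cost of this scheme is one full poll round trip per register access, which ties back to the latency question raised earlier in the thread.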
In my initial (Direct) proposal SingleStep meant: stall the hart’s instruction fetch; clear the SingleStep bit and instruction fetch continues. I still like its simplicity, but I guess single stepping could be handled via the debug monitor in a similar fashion to accessing internal registers: a breakpoint causes a jump to the debug monitor, debug SW issues a ‘continue’ command, and the debug monitor acknowledges and returns to the PC from before the jump.


How much memory would the debug monitor need? What about the increased latency, is that an issue?


Some additional thoughts;
Tim asked me if my Debug Controller could be reduced in size. If the hart’s Debug Unit is mapped in the system bus space, then yes. It can be reduced by ~25%, which is a big plus.

Richard

DebugUnitProposal.pdf

Tim Newsome

Dec 8, 2016, 12:34:22 PM
to Joe Xie, RISC-V Debug Group
On Thu, Dec 8, 2016 at 5:24 AM, Joe Xie <jo...@nvidia.com> wrote:
Does your proposal (i.e. the full-featured one) provide a mechanism to disable debug when the hart is running in M-mode? Consider a use case where we only want people to debug applications running in user mode but hardware will disable debug as long as it is running in M-mode; does your proposal support that use case?

It does not currently, although that would be easy to add. I'm a bit nervous about security models that allow debugging in U-mode but not others, simply because the debugger can change the user program to execute arbitrary code. Generally being able to execute arbitrary code is sufficient to exploit some vulnerability. But I suppose if it's essential to let somebody debug U mode then this is better than just giving them M mode access as well.

Note that at present the debugger can also explicitly change the execution mode to a more privileged one. That behavior could also easily be limited to times when "full debug" is allowed, or something like that.

Thinking about this more, for security it's actually a nice feature of the instruction stuffing spec that memory accesses to the system go through the core (if the optional bus master isn't implemented).

Tim
 


Thanks
Joe

On Dec 7, 2016, at 5:27 AM, Tim Newsome <t...@sifive.com> wrote:

I'm hoping that even if we can't agree on how exactly to do debug, we can agree on the differences in the proposals. I've been working on a Google doc that you might have seen. I'd like other people to contribute, but just opening it up to everybody might result in a mess. So I've moved it to github where we can track what changes were made, and by who.

The new document lives at https://sifive.github.io/debug-mechanism-comparison/

I'd love to hear thoughts, pull requests, and other comments.

I'd especially like a better summary of what I've labeled the Direct Design. (I'm also happy to change names to whatever makes sense to people, but as was brought up in the meeting lightweight/full-featured didn't really highlight the differences.)

Tim


Tim Newsome

Dec 8, 2016, 3:30:18 PM
to Richard Herveille, RISC-V Debug Group
Hi Richard,

This is a promising direction. I have a few comments/questions:
1. Where is the PC saved when you jump to the debug monitor? (A CSR in the core seems like the obvious choice to me.)
2. Where do you save a scratch register when jumping to the debug monitor? You need to save at least one and possibly two. If you're thinking of storing it in the RAM, keep in mind you need to statically allocate space for each hart, since they all might jump to the debug monitor at any time.
3. If you want to keep the amount of RAM required for debug to a minimum, you can simply have the debugger write the minimum code required for whatever it needs to do next there. That's a bit more complex than using a monitor that understands commands, but shouldn't be too bad.

Thanks,
Tim


ROA LOGIC
Design Services and Silicon Proven IP

Richard Herveille
Managing Director
Cell +31 (6) 5207 2230






On Dec 7, 2016, at 5:27 AM, Tim Newsome <t...@sifive.com> wrote:

I'm hoping that even if we can't agree on how exactly to do debug, we can agree on the differences in the proposals. I've been working on a Google doc that you might have seen. I'd like other people to contribute, but just opening it up to everybody might result in a mess. So I've moved it to github where we can track what changes were made, and by who.

The new document lives at https://sifive.github.io/debug-mechanism-comparison/

I'd love to hear thoughts, pull requests, and other comments.

I'd especially like a better summary of what I've labeled the Direct Design. (I'm also happy to change names to whatever makes sense to people, but as was brought up in the meeting lightweight/full-featured didn't really highlight the differences.)

Tim


Tim Vogt

Dec 8, 2016, 7:14:32 PM
to RISC-V Debug Group, richard....@roalogic.com, t...@sifive.com

  1. I really like having the hart’s Debug Unit being memory mapped. This allows it to be connected to a dedicated Debug Controller [Transport-Module] (point-to-point) or act as another slave on the (system) bus. This allows easily changing/updating the Debug Controller [Transport-Module] or having multiple Debug Controllers. For example imagine an SoC with multiple interfaces. To bring up the SoC a dedicated JTAG Debug Controller can be used. Once that’s up and running switching to a PCIe interface is straightforward. Or even using a 2nd hart/core to debug another one works out of the box.
  2. Making the Debug Controller [Transport-Module] a system-bus slave has the advantage that it can be used to access/inspect all slaves on the bus, without using the CPU. This is a big help in bringing up the peripherals while the CPU isn’t behaving nicely (or is simply dead, which unfortunately sometimes happens).
    I realise that this is then the physical view and doesn’t necessarily represent the view from the CPU. If that’s required, then use the debug monitor (see below).
Since this is a physical view, I would recommend that it go in an appendix as an example implementation, and not be part of the main spec.  I would expect the spec itself to just describe 1) the architectural interface of the debug controller to the outside world (registers / memory addresses / behaviors), and 2) the architectural impact on the CPU core behavior (storing / restoring state like PC on a debug trigger, run control operations, etc.).  Individual implementations may choose to do it different ways (e.g., debug transport and debug controller integrated together; possibly even integrated with the hart in tiny systems).

FYI, my current focus is also on low-resource systems, so I'd like to see a debug solution which is lightweight but can scale up efficiently.
  3. I don’t like the idea of having scratch-ram or instruction buffers inside the CPU. But I see and like the flexibility of this approach. Therefore I propose to remove these from (inside) the CPU and use existing memory instead. Now no changes to the CPU’s pipeline are needed at all.
    If you want to run a debug session, use the Debug Controller (or any other master which is on the system bus) to load the debug code/debug monitor into RAM. Then make the hart jump to that code when it hits a trigger (either hardware or software breakpoint). Of course if the debug monitor is in ROM, there’s no need to load it first.
    If we have a ‘jump-to’ register for each hardware breakpoint, we can have different code per breakpoint (if desired), allowing chained triggers, for example.
If I understand correctly, this sounds like something akin to an Interrupt Vector Table (maybe call it a "Debug Vector Table"?).  If so, then it can be architecturally defined similar to an interrupt (i.e., what state is saved by the hart entering debug context, hart behavior/restrictions while in debug context, state restored when leaving debug context, etc.).  Interrupt behavior is something that is well defined and people already understand, so defining a debug context in a similar way seems appealing.

I like that aspect of the proposal, but it does seem to require a debug monitor implementation without a possibility of a direct implementation, which seems a bit restrictive.  Please correct me if I misunderstand.
 
We still need to define the debug registers and decide whether they live in the CSR space or are dedicated to the Debug Unit. But this should not cause any serious issues.
These registers would define hardware breakpoints, advertise the feature set to debug SW, determine which privilege levels may be debugged, ...

With the above proposal we get a simple and clean way of jumping into a Debug Monitor. But how do we get access to the CPU’s internals? With the initial (direct) proposal it’s just a matter of addressing the registers as a RAM (address, R/W, data). It’s simple, direct, and low latency. My customers (all embedded microcontroller-type applications) really like the simplicity. However, I also realise that this can get big for multi-hart implementations with 128-bit RF/FRF/VRF.
Should we have a command/data CSR pair for handshaked communication? For example, the debug software writes ‘read X1’ to the command register. The debug monitor polls this register, sees the command, reads X1, writes its contents into the data register, and then clears the command register (signalling command complete). The debug SW polls the command register, sees it cleared, and reads the data.
In my initial (Direct) proposal, SingleStep meant: stall the hart’s instruction fetch; clear the SingleStep bit; continue instruction fetch. I still like its simplicity, but I guess single stepping could be handled via the debug monitor in a similar fashion to accessing internal registers: a breakpoint causes a jump to the debug monitor; debug SW issues a ‘continue’ command; the debug monitor acknowledges and returns to the PC from before the jump.

To me, these registers, commands, and protocol are the heart of what needs to be defined in the debug spec.  If defined well, this interface could be generic enough to be implemented directly in HW, or in SW via a debug monitor.
I like the idea of essentially a message-based protocol (send a message to the debug controller, wait for the response message).  It's simple and very extensible.  For example, we could define a base set of commands supported by all implementations, with extensions of additional commands depending on the capabilities of the hart, similar to the way the instruction set is defined with base instructions and optional extensions (RV32I, M, A, etc.).

In a message-based scheme, standard (base) operations like reading a register or single-stepping could have small, fixed-size messages, whereas optional extensions could have longer (or even variable-length) messages.  For example, one such optional command could be "run the instruction sequence contained in the (variable length) payload of this message."

I haven't thought through all the implications of such a message-based protocol, but it would simplify the interface definition, would allow for both HW and SW implementations, and would be easily extensible.

--Tim Vogt (not the other Tim ;-)

Alex Bradbury

unread,
Dec 16, 2016, 11:30:16 AM
to Richard Herveille, RISC-V Debug Group, Tim Newsome
On 8 December 2016 at 14:47, Richard Herveille
<richard....@roalogic.com> wrote:
> With the above proposal we got a simple and clean way of jumping into a
> Debug Monitor. But how do we get access to the CPU’s internals? With the
> initial (direct) proposal it’s just a matter of addressing the registers as
> a RAM (address, RW, data). It’s simple, direct, and low latency. My
> customers (all embedded microcontroller type applications) really like the
> simplicity. However I also realise that can get big for multi-hart
> implementations with 128bit RF/FRF/VRF.
> Should we have a command/data CSR for handshaked communication? For example
> the debug software writes ‘read X1’ to the command register. The debug
> monitor polls this and sees the command, gets X1 and writes its contents
> into the data register and then clears the command register (signalling
> command complete). The debug sw polls the command register, sees it cleared,
> and gets the data?

So the obvious concern about this is that perhaps you're describing a
new command language in parallel to the existing instruction set. I
see the question of "why not just use an actual RISC-V instruction"
was asked in the recent meeting. Are there reasons beyond the recorded
response (that the instruction takes more bits)?

I feel that thanks to all of the discussions we've had so far, I have
a reasonably good understanding of the trade-offs between the
'instruction' and 'direct' approach. I haven't developed the same
understanding for this proposal, and as with any compromise proposal
there's always a concern we end up with something that's "worst of
both worlds". For large implementations, what advantage do you see of
the 'command register' vs exposing the 'direct' memory map, but
handling accesses via instruction stuffing?

Thanks,

Alex