Wow, that's big news. Congrats! I look forward to hearing more.
> This debug spec will be just as open as the RISC-V ISA, and hopefully will be as good as the ISA so everybody can just use it. SiFive will release open source implementations of the debug spec for both the Rocket Chip and Z-Scale cores.
>
> I've spent 8 years at GHS implementing software support for a variety of hardware debug interfaces, so I have a pretty good idea what features people might want, but I'd love to hear what you think before seriously starting to write a spec. Simple features (hardware data breakpoints), workflows (program flash and reboot), other requirements (must work well with 1024 cores), I want to hear it all. I can't promise to fit everything in, but I will do my best.
Having worked at semiconductor companies, I've been quite frustrated with the
after-thought debugging features typically offered. They tend to be driven by the
hardware implementation and don't fit what a SW developer needs. One typical
example: pick a small subset of the following random and obscure set of events.
Take an interrupt if any of them happens.
Further, in the face of multiple threads, I've found "single step" debugging impractical
and trace-based debugging the only way to make progress.
For my embedded cores, I implemented(*) two features I found very helpful:
- A trace buffer containing the pc of every taken control flow operation.
This allows you to better tell "how did we get here" than you can get from the stack.
- For a subset of memory locations, trace of the cycle-time, hartid, and PC of stores to
this location.
All traces should spill to external memory for large and useful traces.
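As a rough software model of the first feature (a buffer of taken control-flow PCs that spills its oldest entries, with all names and sizes hypothetical), something like this captures the "how did we get here" workflow:

```python
from collections import deque

class BranchTraceBuffer:
    """Model of a hardware trace buffer recording the PC and target of
    every taken control-flow operation (branch, call, return, jump)."""

    def __init__(self, capacity=128 * 1024):
        # Oldest entries are silently overwritten, like a hardware ring buffer.
        self.entries = deque(maxlen=capacity)

    def record_taken(self, pc, target):
        # Hardware would append one entry on every taken branch/jump.
        self.entries.append((pc, target))

    def how_did_we_get_here(self, n=8):
        # Walk the newest n entries backwards, the way a debugger would
        # present "how did we get here" from the trace.
        return list(self.entries)[-n:][::-1]

buf = BranchTraceBuffer(capacity=4)
buf.record_taken(0x1000, 0x2000)
buf.record_taken(0x2008, 0x3000)
buf.record_taken(0x3004, 0x1000)
buf.record_taken(0x1010, 0x4000)
buf.record_taken(0x4000, 0x5000)   # overwrites the oldest entry
print(buf.how_did_we_get_here(2))  # newest first
```

In real hardware the deque would be a RAM region plus head/tail CSRs, and "spilling to external memory" would replace the silent overwrite.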
Regards,
Tommy
(*) my implementation wasn't quite this general, alas.
<snip>
The current goal is to have a working system done by July 1, 2016. That means a RISC-V implementation on an FPGA with a real JTAG debugger connected to it, debugging RISC-V code. I'm getting started on this ASAP, so please let me know what you want the debug interface to do for you.
It may be interesting for you to follow or help us by reviewing at the
Open SoC Debug project:
* Website: opensocdebug.org
* Introductory slides: http://opensocdebug.org/slides/2015-11-12-overview/
We are transferring some stuff from a previous project, but mostly try
to do a clean room specification.
One of the first things we are adding, thanks to Andreas Traber and the Pulp
team, is support for run-control debug for cores based on the advanced
debug system (OpenRISC-ish, but pretty generic).
Please contact me if you are interested in digging deeper.
Best,
Stefan
I don't think I've ever used a JTAG-based debugger, but execution traces are extremely useful. At Nokia, what we mostly used was timestamped process ID and procedure entry/exit so that we could measure system performance. For example, looking at full stack performance including kernel, database, and application when flick-scrolling on the touchscreen so we could see where the time was going. Here at CSAIL we are working on a RISC-V implementation in BSV and we're using online tracing of execution to verify correct execution of our implementation.
As you point out, some tooling is required. The current state of the art for tracing on production hardware is extremely difficult to use. Surely with an open ISA and open source software we can do better than that. I am completely agnostic about what the hardware interface is for configuration and control of debug, as long as it is a well documented interface for which open source tools are available.
I think you ought to do an IEEE-standard debugger.
IEEE-ISTO 5001-2003 (Nexus) feature set, class I, is the basic JTAG interface, cheap, practical and it meets many needs.
Then as time and budget permits, add hardware and software options supporting classes 2 through 4.
The Nexus options are standardized, and were developed by demanding customers, e.g. people debugging nonstop engine controllers.
Most of the features people have asked for so far fit into these options.
And, then, you’re done. You have a range of good debuggers by any standard.
On Thu, Nov 19, 2015 at 1:03 PM Tim Newsome <t...@sifive.com> wrote:
> <snip>
> The current goal is to have a working system done by July 1, 2016. That means a RISC-V implementation on an FPGA with a real JTAG debugger connected to it, debugging RISC-V code. I'm getting started on this ASAP, so please let me know what you want the debug interface to do for you.
Along the lines of what Tommy and Jamey were suggesting, I'd like to see something like Intel Processor Trace (PT) on RISC-V:
> Having worked at semiconductor companies, I've been quite frustrated with the
> after-thought debugging features typically offered. They tend to be driven by the
> hardware implementation and don't fit what a SW developer needs. One typical
> example: pick a small subset of the following random and obscure set of events.
> Take an interrupt if any of them happens.
I don't quite follow your example of what doesn't fit. Are you talking about the hardware breakpoints these implementations provide?
> - A trace buffer containing the pc of every taken control flow operation.
> This allows you to better tell "how did we get here" than you can get from the stack.
How do you look at this data? I ask because in theory you can store more information in fewer bits, but then a lot more tooling is required in the debugger to show it in a way that makes sense.
If you're in the habit of just hex dumping a buffer, then that might not be what you want. But if you're always in an environment where it's easy to add some tooling, then a fancier trace format would be preferable.
> - For a subset of memory locations, trace of the cycle-time, hartid, and PC of stores to
> this location.
By cycle time do you mean a timestamp (in cycles) when the instruction was executed, or the time it actually took for the store to happen? (The latter actually seems pretty tricky to define, with caches in the mix.)
How many memory ranges would you typically be interested in?
How useful would it be to store the value of each store as well?
How large is large?
Are you interested at all in a dedicated trace port, so a program can be traced without affecting its timing at all?
Sorry for all the questions. I'm just trying to put together exactly what you want. I think I know what you want, but it's better to ask and be sure.
> > - A trace buffer containing the pc of every taken control flow operation.
> > This allows you to better tell "how did we get here" than you can get from the stack.
> How do you look at this data? I ask because in theory you can store more information in fewer bits, but then a lot more tooling is required in the debugger to show it in a way that makes sense.
I assume controls to flush the internal buffer and CSRs pointing to the start and end of the trace buffer. Access to the buffer would be the same way you access memory. You do provide a way to read and write memory over JTAG, right?
> If you're in the habit of just hex dumping a buffer, then that might not be what you want. But if you're always in an environment where it's easy to add some tooling, then a fancier trace format would be preferable.
I'm not sure what you mean, but it's hardly difficult to add tool support for my suggestion.
> How many memory ranges would you typically be interested in?
That's an important question. Having a small number (say, 4) of memory locations is IMO useless in practice. If the implementation has a TLB, then I'd consider enabling tracing on a per-page level, with the bit in the PTE.
> How useful would it be to store the value of each store as well?
Data + byte enables, yes, I forgot(!) that.
> How large is large?
128K+ events, but obviously "large" is context dependent.
> Are you interested at all in a dedicated trace port, so a program can be traced without affecting its timing at all?
IMO, anything that requires specialized hardware will not get as much usage, so *I* wouldn't, but you are of course correct that completely avoiding observation effects would require dedicated resources, such as a dedicated trace buffer [per core].
> Sorry for all the questions. I'm just trying to put together exactly what you want. I think I know what you want, but it's better to ask and be sure.
Thanks for asking. I appreciate that these are just my opinions, but as with everything RISC-V, this is an opportunity to do things better.
> You can save a lot of space by only storing a single bit for every branch taken/not taken, instead of storing full PCs.
No, that fails for, e.g., computed function calls.
Scratch that. It fails completely! Hint: many branches can target the same address. How do you know which one?
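This is in fact why shipping trace formats mix the two encodings: Intel PT, for instance, compresses conditional direct branches into taken/not-taken bits (their targets are recoverable from the program image), but emits the full target address for indirect branches and returns. A hedged sketch of that hybrid scheme (the event classification is hypothetical):

```python
def encode_trace(events):
    """Hybrid trace encoding: direct conditional branches cost one bit,
    while indirect control flow (computed calls, returns) must carry the
    full target, because many sites can branch to the same address and
    the static program image alone cannot tell you which one was taken."""
    tnt_bits = []   # taken/not-taken bits for direct conditional branches
    targets = []    # (event_index, target) for indirect control flow
    for i, ev in enumerate(events):
        if ev["kind"] == "cond":        # direct conditional: target is in the opcode
            tnt_bits.append(1 if ev["taken"] else 0)
        elif ev["kind"] == "indirect":  # jalr, return, computed call
            targets.append((i, ev["target"]))
    return tnt_bits, targets

events = [
    {"kind": "cond", "taken": True},
    {"kind": "cond", "taken": False},
    {"kind": "indirect", "target": 0x8000},  # computed function call
]
bits, tips = encode_trace(events)
print(bits)   # [1, 0]
print(tips)   # [(2, 32768)]
```

The decoder then replays the program image, consuming one bit per conditional branch and one recorded target per indirect branch.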
> I believe as a general design principle we don't like to put things in the TLB that don't "belong" there. How many pages might you want to set this bit on?
> What about memory ranges, instead of locations?
> Ie. the breakpoint hits if the address accessed is between A and B. 4 sets of those seems pretty typical, although I'm not sure if it's enough to be useful. (The spec will probably allow for a largish number of them, if the implementer so chooses.)
That is understandable, but unfortunate. The only thing I can say is that the more general such a feature is,
the more cases it would be applicable to.
One of the hardest things I ever had to debug was a generational garbage collector (each session took days).
A feature such as this would have been enormously useful, but only if it could cover the entire heap. YMMV.
Regards,
Tommy
On 19.11.2015 22:30, Tim Newsome wrote:
> I took a read through your slides. (I like this presentation format, but
> I'd like to see slide N/M displayed at the bottom.)
Yeah, I used reveal.js for the first time; it seems an easy way
to do online slides with markdown and host them on github. I will see if
there is an N/M display option. Thanks.
> Using glip over JTAG seems a bit heavy-weight for a small processor.
> This might be fine for a full-featured CPU with several cores, but it
> seems a bit wasteful for a microcontroller. For instance, it requires
> sending 48 bits of data just to set up a data transfer, which is a huge
> amount of overhead since typical data transfers are small. (User prints
> out the value of a register, changes the value of a variable, etc.) On
> most 32-bit processors, 48 bits might be the largest register in the
> chip, just to accommodate this overhead! There are large data transfers,
> but even there you're limited by the size of a reasonable register on a
> chip. Nobody's going to put a 4KB shift register on the chip just to be
> able to efficiently download a program. (Maybe you can reuse on-chip
> cache for that, but I suspect just adding the hardware to turn it into a
> shift register is prohibitively expensive.)
It is more or less I/O driven. If you have no high-speed I/O you can use
JTAG and that only brings you to decent speed for live tracing if you
use large shift registers. If you don't use a large shift register, your
throughput will be less of course. But that gives you what you also get
with standard JTAG debug interfaces, right?
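To make the overhead argument concrete, here is a back-of-the-envelope model. The 48-bit setup cost comes from Tim's example; the clock rate is an arbitrary assumption:

```python
def jtag_efficiency(payload_bits, setup_bits=48, tck_hz=10_000_000):
    """Rough JTAG throughput model: every transfer pays a fixed
    setup cost before any payload moves (numbers are illustrative)."""
    total = setup_bits + payload_bits
    efficiency = payload_bits / total
    payload_bps = tck_hz * efficiency
    return efficiency, payload_bps

# A 32-bit register read pays 48 bits of setup: under 50% efficient.
eff, bps = jtag_efficiency(32)
print(f"32-bit read: {eff:.0%} efficient, {bps / 1e6:.1f} Mbit/s payload")

# A long shift register amortizes the setup cost away.
eff, bps = jtag_efficiency(4096)
print(f"4096-bit shift: {eff:.0%} efficient, {bps / 1e6:.1f} Mbit/s payload")
```

This is the trade-off both sides are describing: small shift registers keep the hardware cheap but cap effective throughput well below the raw TCK rate; large ones approach line rate but cost area.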
But as I said, I see the JTAG option more as a generic fallback if you start
using Open SoC Debug.
> Your ideas of having one component trigger trace on another are more
> interesting to me.
This is indeed an interesting topic in research and already done in some
companies, like what someone else on the thread mentioned as online trace
processing. A good starting point if you are interested in this topic is
a nice presentation from a workshop we held
(http://www.mad-workshop.de/2013.html):
http://www.mad-workshop.de/slides/7_1.pdf
> Typo on the DEBUG PROCESSORS slides: microporgrams
Thanks for pointing that out. I will do a revision and also include plans to
add a logic analyzer module type.
Best,
Stefan
So these are all cool and awesome ideas and make a lot of sense for big systems, but we are in the microcontroller segment … and when the debug system gets bigger than the actual CPU, something is off. So how about creating a list of increasingly complex/more powerful debug facilities? Ranging from the bare minimum (i.e. single step), to slightly more complex (HW breakpoints/watchpoints), … to full per-thread traces.
I know NEXUS follows a similar approach; but NEXUS closed up. You need to pay to get even basic access to their specs, which contradicts why we chose to opt for RISC-V (free). Although the 1999 release is still available on the web and was released for free.
Also I would suggest to split the debug controller from the actual interface. There's a strong case to have the same debug system work over JTAG, USB, Ethernet, ….
What definitely needs to be defined is the set of registers/features the CPU must support. Then the behaviour, states, and features of the debug controller, then how it interfaces to multiple physical interfaces.
We are currently working on a JTAG interface, but there's no reason the debug controller couldn't communicate via USB. The reason we're currently focused on JTAG is that this interface is simply available on most (if not all) designs and platforms we work on, and adding additional pins or a different interface is not allowed by our customers.
On 20.11.2015 09:36, Richard Herveille wrote:
> I know NEXUS follows a similar approach; but NEXUS closed up. You need
> to pay to get even basic access to their specs. Which contradicts why we
> chose to opt for the RISC-V (free). Although the 1999 release is still
> available on the web and was released for free.
I fully agree on Richard's NEXUS opinion and would like to add that even
the industry that created it doesn't use it widely. Together with it
being a closed standard I think there remains no good reason to use it.
> Also I would suggest to split the debug controller from the actual
> interface. There’s strong case to have the same debug system work over
> JTAG, USB, Ethernet, ….
Exactly, I think the most important point is that it should not be a
_JTAG_ debug controller, but a debug interface controller with a
simple MMIO-ish interface.
> What definitely needs to be defined is the set of registers/features the
> CPU must support. Then the behaviour, states, and features of the debug
> controller, then how it interfaces to multiple physical interfaces.
Andreas and I recently started something in this direction and are at
the very beginning. Basically we start with the adv_dbg_sys interface
and plan an adapter to a rather generic MMIO-interface plus an interrupt
mechanism to signal events.
Coming back to Tim's original question, I think the most important thing
I want from a JTAG debug spec is that it is developed together with other
projects in the community for better interoperability.
Best,
Stefan
On Nov 19, 2015, at 3:32 PM, Tim Newsome <t...@sifive.com> wrote:
> On Thu, Nov 19, 2015 at 11:59 AM, Jamey Hicks <jamey...@gmail.com> wrote:
> > I don't think I've ever used a JTAG-based debugger, but execution traces are extremely useful. At Nokia, what we mostly used was timestamped process ID and procedure entry/exit so that we could measure system performance. For example, looking at full stack performance including kernel, database, and application when flick-scrolling on the touchscreen so we could see where the time was going. Here at CSAIL we are working on a RISC-V implementation in BSV and we're using online tracing of execution to verify correct execution of our implementation.
> What exactly does "online" mean in this context?
By online, I mean we run the ISA simulator and the FPGA implementation in lockstep and compare the trace output without storing it anywhere.
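A minimal sketch of that lockstep scheme, with a hypothetical per-instruction record format:

```python
def lockstep_check(golden, dut):
    """Compare two per-instruction trace streams in lockstep, the way a
    golden ISA simulator can check an FPGA implementation 'online':
    records are compared as they arrive and never stored."""
    for n, (g, d) in enumerate(zip(golden, dut)):
        if g != d:
            return f"divergence at instruction {n}: model={g} dut={d}"
    return "streams match"

# Each record is (pc, destination register, value written) -- illustrative.
model = iter([(0x100, "x1", 5), (0x104, "x2", 7)])
fpga  = iter([(0x100, "x1", 5), (0x104, "x2", 9)])
print(lockstep_check(model, fpga))  # reports divergence at instruction 1
```

Because both sides are iterators, nothing is buffered: the first mismatch stops the run with the exact instruction where the implementation diverged from the model.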
> > As you point out, some tooling is required. The current state of the art for tracing on production hardware is extremely difficult to use. Surely with an open ISA and open source software we can do better than that. I am completely agnostic about what the hardware interface is for configuration and control of debug, as long as it is a well documented interface for which open source tools are available.
> Is there anything you're not agnostic about? (Eg. type of data collected during trace, size of trace buffer, ...)
It's pretty clear the community needs a range of options for that, depending on the needs of the applications. Even on higher end embedded devices, I expect that there will be pressure to reduce pin counts, so we need a low pin count interface to the debug/trace functionality. I'm sure that JTAG will be required for some applications, but I would prefer a packet oriented interface, so we can multiplex debug commands/responses, messages from the application (e.g., kernel console), and various types and levels of traces. For extremely small devices, the physical interface could be two or three wires, like SPI or I2C. For higher performance devices, USB-3 seems like it might work. That is the direction we're leaning if we get funding to build RISC-V prototype chips. Off-the-shelf debug/trace controllers. :)
On 19 Nov 2015, at 18:02, Tim Newsome <t...@sifive.com> wrote:
>
> so please let me know what you want the debug interface to do for you.
The BERI debug unit (which we talk to over JTAG) allows:
- Recording stream traces into a buffer, which can be dumped over JTAG (or PCIe), with each entry containing:
* Current PC
* Executed instruction
* Register value written (for stores to memory, the address)
* Wrapping cycle count value
* The current ASID
- Pausing and resuming the processor
- Setting breakpoints at specific PC values
- Injecting arbitrary instructions to execute
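As an illustration of how compact such an entry could be, here is one possible packing; the field widths are guesses for illustration, not BERI's actual format:

```python
import struct

# Hypothetical fixed-width layout for one trace entry:
#   PC (8 bytes), instruction (4), written value / store address (8),
#   wrapping cycle count (2), ASID (2) -- 24 bytes per entry.
ENTRY = struct.Struct("<QIQHH")

def pack_entry(pc, insn, value, cycles, asid):
    # The cycle counter wraps, so only the low 16 bits are stored.
    return ENTRY.pack(pc, insn, value, cycles & 0xFFFF, asid)

raw = pack_entry(pc=0x9000000040001234, insn=0x00008067,
                 value=0xDEADBEEF, cycles=123456, asid=7)
print(ENTRY.size)  # bytes per entry
pc, insn, value, cycles, asid = ENTRY.unpack(raw)
assert cycles == (123456 & 0xFFFF)  # the stored count has wrapped
```

The wrapping cycle count is what forces the trace tooling to reconstruct absolute time by counting wrap events between entries.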
We currently don't have, but would find very useful (ARM and Intel equivalents have this), the ability for software to inject arbitrary data into the stream, for example changes to VM mappings and the current PID from the kernel.
I mostly use this for post-mortem debugging and have written an LLVM-based GUI tool that we use to inspect traces of a few tens of GBs.
Our trace allows us to reconstruct all of the values in registers quite quickly (even short traces often have a context switch fairly early on, which gives us everything), but it would be nice if the start of a trace could have a complete dump of all register values. The format of our traces is parameterised in our trace analysis library (C++ template with a thing that knows how to transform the on-disk trace format into something in memory) and would likely be easy to adapt to RISC-V (assuming a slightly less immature LLVM implementation). I discussed this briefly with Yunsup a couple of weeks ago.
I’ve not used the online debugging features very often, but for post-mortem debugging of OS and compiler issues it’s been invaluable to be able to easily explore large traces. I’m very glad we didn’t have to try to bring up a software stack without this support.
David
How do you envision USB/Ethernet debugging working? Would you have a USB/Ethernet controller on-chip which can optionally become a debug controller? That sounds doable assuming you can make those services work while the core is halted. Since JTAG as a transport just provides access to a few new registers, that should be easy to accommodate in USB/Ethernet as well.
Can you point me at any projects, preferably ones that have something at least half working?
Tim
On 20.11.2015 19:54, Tim Newsome wrote:
> Exactly, I think this is the most important that it should not be a
> _JTAG_ debug controller, but the debug interface controller with a
> simple MMIO-ish interface.
>
>
> I disagree, for the 2 reasons I mentioned in my e-mail to Richard just now:
> 1. JTAG is the simplest interface, so it will see the largest adoption.
> (Even microcontrollers might implement it.)
> 2. JTAG is the slowest interface, so it benefits the most from any
> optimization.
My statement was not against JTAG. What others and I are saying is that
a debug interface definition is needed, the control/access scheme
defined (MMIO) and the question how to access registers is independent
of this (let it be JTAG, SWP, Aurora, ..).
I think your question was in
the direction of the first and second mainly, right?
>
> Can you point me at any projects, preferably ones that have something at
> least half working?
There was a discussion between Richard and Andreas a week ago about
adv_debug_sys-based interfaces. This is a good starting point for a
discussion, I think.
Best,
Stefan
> How do you envision USB/Ethernet debugging working? Would you have a USB/Ethernet controller on-chip which can optionally become a debug controller? That sounds doable assuming you can make those services work while the core is halted. Since JTAG as a transport just provides access to a few new registers, that should be easy to accommodate in USB/Ethernet as well.
JTAG in its essence is a simple serial protocol that requires a few dedicated registers. That can easily be done via USB, even without having a CPU working, I think. Ethernet might be a bit more problematic to do without a CPU I guess, especially since it would be really neat if the communication is over TCP or UDP. But again, all you need to do is mimic the JTAG regs.
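A sketch of that "mimic the JTAG regs" idea, with an entirely hypothetical packet layout: the debug registers are the real interface, and any transport just moves small request/response packets to them:

```python
import struct

# Transport-agnostic debug transaction. Packet layout is hypothetical:
# op (1 byte), register address (1 byte), data (4 bytes, little-endian).
OP_READ, OP_WRITE = 0, 1
PKT = struct.Struct("<BBI")

regs = {0x10: 0}  # debug register file exposed by the target

def target_handle(packet):
    """Target-side handler: decode one request, apply it to the debug
    registers, and return a response carrying the register's value."""
    op, addr, data = PKT.unpack(packet)
    if op == OP_WRITE:
        regs[addr] = data
    return PKT.pack(op, addr, regs[addr])

# The same bytes could travel over JTAG DR shifts, USB bulk, or UDP.
target_handle(PKT.pack(OP_WRITE, 0x10, 0xCAFE))
resp = target_handle(PKT.pack(OP_READ, 0x10, 0))
print(hex(PKT.unpack(resp)[2]))  # 0xcafe
```

The debug controller only ever sees register reads and writes, which is exactly the split Richard and Stefan are arguing for: define the registers once, then let each physical interface carry the packets however it likes.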
While the core is paused. This is useful for dumping other information (e.g. doing a load at a particular address with $0 as the destination will put the value at that memory address into the stream trace).
>
>> I mostly use this for post-mortem debugging and have written an LLVM-based GUI tool that we use to inspect traces of a few tens of GBs.
>
> What are you using that lets you store tens of GBs of trace data?
Hard disks. There's a small RAM buffer that we dump via JTAG over USB or PCIe.
David
In general, USB, Eth or PCIe are a pain for smaller cores. They increase complexity and size, and all three need PHY support. The best compromise is a SPI variant, Quad SPI or the new Octal, and I think std quad SPI would suffice for most cases. SPI is much simpler to implement; the octal does need DDR. Since we plan to release a quad SPI as part of our open source IP portfolio, the community can avoid having to license anything. If, as proposed, we have a standard transport-agnostic interface, we can use JTAG or a variant like this for the physical transport.
We need to finalize this pretty soon. We have a bunch of real world SoCs coming up, and probably a start-up that may use our cores, so we would like to standardize on something sooner rather than later.