
Why does x86 triple fault reset?


Rick C. Hodgin

unread,
Apr 11, 2016, 11:25:44 PM4/11/16
to
Why didn't the Intel designers reserve some location in memory for
a triple fault handler? One which would allow the system memory to
be dumped to disk or output device if the condition arose?

Why just reboot? Seems improper.

Best regards,
Rick C. Hodgin

Robert Wessel

unread,
Apr 12, 2016, 12:32:03 AM4/12/16
to
On Mon, 11 Apr 2016 20:25:43 -0700 (PDT), "Rick C. Hodgin"
<rick.c...@gmail.com> wrote:

>Why didn't the Intel designers reserve some location in memory for
>a triple fault handler? One which would allow the system memory to
>be dumped to disk or output device if the condition arose?
>
>Why just reboot? Seems improper.


They already have a double fault handler. Those should already never
occur, but have a handler in case they do. What would be the point of
triple, quadruple, pentuple, etc. fault handlers? If the OS
implementers couldn't set up the double fault handler correctly (and
remember, the code only has to get to the first instruction of the
double fault handler before that no longer applies), what makes you
think that a triple fault handler will be correctly implemented?

In any event, a triple fault does not cause a reboot. It causes a
shutdown, which causes a special bus cycle, which external hardware
can recognize. PCs since the AT have recognized this via external
hardware, and have used that external hardware to then cause a reset,
which leads to a reboot. If that external hardware that caught the
shutdown were to set a testable flag, the reset handler could do some
sort of save operation (although some register contents would be
lost). Not unlike what it does now in the event of a reset via the
keyboard controller (where it checks a flag in the CMOS RAM to
determine what kind of reset to do). Unfortunately it does not. But
again, this is a non-issue - just get the double fault handler right.
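
For reference, that CMOS check amounts to something like this (a rough
sketch in C; ports 0x70/0x71 are the standard index/data pair, 0x0F is
the commonly documented shutdown status byte, and the exact status
codes vary by BIOS):

#include <stdint.h>

static inline void outb(uint16_t port, uint8_t val)
{
    __asm__ volatile ("outb %0, %1" : : "a"(val), "Nd"(port));
}

static inline uint8_t inb(uint16_t port)
{
    uint8_t val;
    __asm__ volatile ("inb %1, %0" : "=a"(val) : "Nd"(port));
    return val;
}

static uint8_t cmos_read(uint8_t index)
{
    outb(0x70, index);      /* select CMOS register */
    return inb(0x71);       /* read its contents */
}

/* Called very early in the reset path, before memory is reinitialized. */
int software_requested_reset(void)
{
    uint8_t shutdown_status = cmos_read(0x0F);

    /* 0x05 and 0x0A are the classic "resume via the far pointer at
     * 40:67" codes used when software resets the CPU on purpose to get
     * back to real mode; 0x00 is an ordinary boot. */
    return shutdown_status == 0x05 || shutdown_status == 0x0A;
}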

And if you really want to trap a triple fault, the virtualization
functions will let you do that (there's a specific VM Exit for a
shutdown).
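
For example, under Linux/KVM a guest triple fault comes back to the
host-side monitor as a shutdown exit, roughly like this (a sketch; VM
and vCPU creation are omitted):

#include <linux/kvm.h>
#include <stdio.h>
#include <sys/ioctl.h>

/* Dispatch one VM exit.  Returns 0 to keep running the vCPU, -1 to stop. */
static int handle_exit(int vcpu_fd, struct kvm_run *run)
{
    switch (run->exit_reason) {
    case KVM_EXIT_SHUTDOWN: {
        /* The guest triple-faulted (or otherwise entered shutdown).
         * This is where a debugging VMM can grab registers and dump
         * guest memory instead of just restarting the guest. */
        struct kvm_regs regs;
        if (ioctl(vcpu_fd, KVM_GET_REGS, &regs) == 0)
            fprintf(stderr, "guest shutdown, rip=0x%llx\n",
                    (unsigned long long)regs.rip);
        return -1;
    }
    case KVM_EXIT_HLT:
        fprintf(stderr, "guest executed HLT\n");
        return -1;
    default:
        fprintf(stderr, "unhandled exit reason %u\n", run->exit_reason);
        return -1;
    }
}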

Rick C. Hodgin

unread,
Apr 12, 2016, 11:18:21 AM4/12/16
to
On Tuesday, April 12, 2016 at 12:32:03 AM UTC-4, robert...@yahoo.com wrote:
> On Mon, 11 Apr 2016 20:25:43 -0700 (PDT), "Rick C. Hodgin"
> <rick.c...@gmail.com> wrote:
>
> >Why didn't the Intel designers reserve some location in memory for
> >a triple fault handler? One which would allow the system memory to
> >be dumped to disk or output device if the condition arose?
> >
> >Why just reboot? Seems improper.
>
>
> They already have a double fault handler. Those should already never
> occur, but have a handler in case they do. What would be the point of
> triple, quadruple, pentuple, etc. fault handlers? If the OS
> implementers couldn't set up the double fault handler correctly (and
> remember, the code only has to get to the first instruction of the
> double fault handler before that no longer applies), what makes you
> think that a triple fault handler will be correctly implemented?

There are slots defined in the TSS for multiple stack segments and
offsets. Another could be added for this cause.

My thinking is that there's some reason why a triple fault occurs.
It's an unexpected condition and something's awry. And as a result,
the conditions which brought about the total collapse may want to be
examined. The addition of a triple fault handler would put the
machine into a state where it begins executing code at the specified
address, using the values from the current TSS which indicate where
the stack should go, etc.

In this case, it would be in a state where every subsequent failure
does not do anything except restart at the values indicated in the
TSS, and signal a pin that it's in a triple fault condition. By
toggling the triple fault pin repeatedly, an external monitoring
device could then force the reboot.

That's my thinking.

> In any event, a triple fault does not cause a reboot. It causes a
> shutdown, which causes a special bus cycle, which external hardware
> can recognize. PCs since the AT have recognized this via external
> hardware, and have used that external hardware to then cause a reset,
> which leads to a reboot. If that external hardware that caught the
> shutdown were to set a testable flag, the reset handler could do some
> sort of save operation (although some register contents would be
> lost). Not unlike what it does now in the event of a reset via the
> keyboard controller (where it checks a flag in the CMOS RAM to
> determine what kind of reset to do). Unfortunately it does not. But
> again, this is a non-issue - just get the double fault handler right.
>
> And if you really want to trap a triple fault, the virtualization
> functions will let you do that (there's a specific VM Exit for a
> shutdown).

I'm thinking of the designer's mentality when they designed it. I can
see the case being made for the fact that the machine's already in such
an indeterminate state that to continue processing could mean a corruption
of I/O devices or their data. That's a valid point to me. I can also see
then a flag setting or TSS setting being added which indicates a triple-
fault disposition, to either do as it does today, or to enter into the
special triple-fault state where it runs a known block of code, and then
will stop at the HLT instruction permanently until reset.

I'm thinking of my implementation of LibSF 386-x40, and what I want the
CPU to do when it reaches code which causes a triple fault.

Quadibloc

unread,
Apr 12, 2016, 2:17:36 PM4/12/16
to
I don't know if this is relevant, but I was just recently reading an article on
the G4 processor for IBM's ESA/390 computers and how it had error-correcting
features that caught up with those on IBM's earlier mainframes which used
multiple chips to implement the CPU.

John Savard

Robert Wessel

unread,
Apr 12, 2016, 2:28:54 PM4/12/16
to
On Tue, 12 Apr 2016 08:18:18 -0700 (PDT), "Rick C. Hodgin"
<rick.c...@gmail.com> wrote:

>On Tuesday, April 12, 2016 at 12:32:03 AM UTC-4, robert...@yahoo.com wrote:
>> On Mon, 11 Apr 2016 20:25:43 -0700 (PDT), "Rick C. Hodgin"
>> <rick.c...@gmail.com> wrote:
>>
>> >Why didn't the Intel designers reserve some location in memory for
>> >a triple fault handler? One which would allow the system memory to
>> >be dumped to disk or output device if the condition arose?
>> >
>> >Why just reboot? Seems improper.
>>
>>
>> They already have a double fault handler. Those should already never
>> occur, but have a handler in case they do. What would be the point of
>> triple, quadruple, pentuple, etc. fault handlers? If the OS
>> implementers couldn't set up the double fault handler correctly (and
>> remember, the code only has to get to the first instruction of the
>> double fault handler before that no longer applies), what makes you
>> think that a triple fault handler will be correctly implemented?
>
>There are slots defined in the TSS for multiple stack segments and
>offsets. Another could be added for this cause
>
>My thinking is that there's some reason why a triple fault occurs.
>It's an unexpected condition and something's awry.


You're missing the point - that's what the double fault is for. At that
point the system is already fouled up beyond all hope. And it's the
double fault handler that should do some logging or attempt some sort
of recovery. What you're asking for is that if the system is so
screwed up that it can't start the double fault handler, that you're
going to try again with the triple fault handler. What's the point?
And if you do that, why not a quadruple fault handler to clean up if
*that* fails?


>And as a result,
>the conditions which brought about the total collapse may want to be
>examined. The addition of a triple fault handler would put the
>machine into a state where it begins executing code at the specified
>address, using the values from the current TSS which indicate where
>the stack should go, etc.


That just seems crazy. Why would you try to embed this
should-never-occur condition in every single TSS? Why not just use
the interrupt mechanism to select a new TSS for the triple fault?


>In this case, it would be in a state where every subsequent failure
>does not do anything except restart at the values indicated in the
>TSS, and signal a pin that it's in a triple fault condition. By
>toggling the triple fault pin repeatedly, an external monitoring
>device could then force the reboot.


Most likely that will just lead to loops of triple faults should you
ever get to such a point.


>That's my thinking.
>
>> In any event, a triple fault does not cause a reboot. It causes a
>> shutdown, which causes a special bus cycle, which external hardware
>> can recognize. PCs since the AT have recognized this via external
>> hardware, and have used that external hardware to then cause a reset,
>> which leads to a reboot. If that external hardware that caught the
>> shutdown were to set a testable flag, the reset handler could do some
>> sort of save operation (although some register contents would be
>> lost). Not unlike what it does now in the event of a reset via the
>> keyboard controller (where it checks a flag in the CMOS RAM to
>> determine what kind of reset to do). Unfortunately it does not. But
>> again, this is a non-issue - just get the double fault handler right.
>>
>> And if you really want to trap a triple fault, the virtualization
>> functions will let you do that (there's a specific VM Exit for a
>> shutdown).
>
>I'm thinking of the designer's mentality when they designed it. I can
>see the case being made for the fact that the machine's already in such
>an indeterminate state that to continue processing could mean a corruption
>of I/O devices or their data. That's a valid point to me. I can also see
>then a flag setting or TSS setting being added which indicates a triple-
>fault disposition, to either do as it does today, or to enter into the
>special triple-fault state where it runs a known block of code, and then
>will stop at the HLT instruction permanently until reset.


Again, the machine is already FUBAR'd at the double fault. And it's
the double fault handler that's supposed to deal with that. You only
get a triple fault if the machine is so screwed up it can't start the
double fault handler. The event you're trying to handle is *why*
there's a double fault handler.

Rick C. Hodgin

unread,
Apr 12, 2016, 3:32:40 PM4/12/16
to
On Tuesday, April 12, 2016 at 2:28:54 PM UTC-4, robert...@yahoo.com wrote:
> On Tue, 12 Apr 2016 08:18:18 -0700 (PDT), "Rick C. Hodgin"
> <rick.c...@gmail.com> wrote:
>
> >On Tuesday, April 12, 2016 at 12:32:03 AM UTC-4, robert...@yahoo.com wrote:
> >> On Mon, 11 Apr 2016 20:25:43 -0700 (PDT), "Rick C. Hodgin"
> >> <rick.c...@gmail.com> wrote:
> >>
> >> >Why didn't the Intel designers reserve some location in memory for
> >> >a triple fault handler? One which would allow the system memory to
> >> >be dumped to disk or output device if the condition arose?
> >> >
> >> >Why just reboot? Seems improper.
> >>
> >>
> >> They already have a double fault handler. Those should already never
> >> occur, but have a handler in case they do. What would be the point of
> >> triple, quadruple, pentuple, etc. fault handlers? If the OS
> >> implementers couldn't set up the double fault handler correctly (and
> >> remember, the code only has to get to the first instruction of the
> >> double fault handler before that no longer applies), what makes you
> >> think that a triple fault handler will be correctly implemented?
> >
> >There are slots defined in the TSS for multiple stack segments and
> >offsets. Another could be added for this cause
> >
> >My thinking is that there's some reason why a triple fault occurs.
> >It's an unexpected condition and something's awry.
>
>
> You're missing the point - that's what the double fault is for.

No. Double-faults are recoverable, and could be brought about by a
faulty driver only. The rest of the system wouldn't need to go down,
just restart the driver.

> At that
> point the system is already fouled up beyond all hope. And it's the
> double fault handler that should do some logging or attempt some sort
> of recovery.

That's what it does. It will terminate whatever process caused the
issue, including loadable driver modules, and then report the event in
some way for examination.

> What you're asking for is that if the system is so
> screwed up that it can't start the double fault handler, that you're
> going to try again with the triple fault handler.

The double-fault handler is signaled when there's an interrupt to some
vector and, due to whatever cause, another interrupt occurs.

These are completely recoverable in most situations. It's why there's
a dedicated handler for it. If it were always a fatal condition, the 386
designers would've had it reboot then. It's not always fatal, so it
provides recovery mechanisms.

> What's the point?
> And if you do that, why not a quadruple fault handler to clean up if
> *that* fails?

When a triple fault occurs, something unexpected has occurred beyond a
normal degree of unexpectedness. It is expected that loadable modules
and drivers could fail, so the kernel is designed to catch those errors
and terminate those processes.

When a triple fault occurs, however, something has gone completely awry,
and there's a reason for it that wasn't previously accounted for.

It is desirable to be able to flush memory to disk, or to output to some
debug port, so that an inspection of the machine could be made post-
mortem.

> >And as a result,
> >the conditions which brought about the total collapse may want to be
> >examined. The addition of a triple fault handler would put the
> >machine into a state where it begins executing code at the specified
> >address, using the values from the current TSS which indicate where
> >the stack should go, etc.
>
> That just seems crazy. Why would you try to embed this
> should-never-occur condition in every single TSS? Why not just use
> the interrupt mechanism to select a new TSS for the triple fault?

Because the TSS was created and instantiated when the machine was in a
known, stable state. It would likely have been set up properly.

I'm not opposed to the idea of having a special global interrupt vector
for the triple-fault condition. A single internal register would work.
That would be fine with me.

> >In this case, it would be in a state where every subsequent failure
> >does not do anything except restart at the values indicated in the
> >TSS, and signal a pin that it's in a triple fault condition. By
> >toggling the triple fault pin repeatedly, an external monitoring
> >device could then force the reboot.
>
> Most likely that will just lead to loops of triple faults should you
> ever get to such a point.

The area of memory reserved for the triple fault handler, a
relatively small block of probably 1 KB at most, could be protected
by the hardware so that it cannot be written unless a particular
RING-0 protocol is followed. This would prevent it from being accidentally
overwritten by an errant program, and would exist for this purpose (to be
able to write out memory should the condition occur).

When I was developing my kernel, I had triple faults on a regular basis.
I never knew why. I wound up doing silly things like inserting:

; Display a "made it here" message on-screen, and then
; enter an infinite loop so I can see where the error
; was occurring.
@@:
jmp @B

Having an image of what was in memory would've aided in trying to track
down where the error occurred. Knowing at least the CS selector, and
EIP would've been beneficial, for example.

Perhaps it would be enough to have those pushed out on the address and
data pins so they can be read externally while the triple-fault pin is
high, though the stack trace would also be desirable, as would the code
that existed at those locations, and the memory they pointed to.

It's not in a fatal state at a double-fault. It has the potential of
being in a fatal state, but it is still potentially recoverable. When
it reaches the triple fault state, that's when it's beyond hope, and
that's why I think it should have this ability, because without it
everything about the current machine environment is lost, but with it
the current machine environment can be saved and examined post-mortem.

> >I'm thinking of my implementation of LibSF 386-x40, and what I want the
> >CPU to do when it reaches code which causes a triple fault.

Quadibloc

unread,
Apr 13, 2016, 2:57:33 PM4/13/16
to
On Tuesday, April 12, 2016 at 1:32:40 PM UTC-6, Rick C. Hodgin wrote:

> No. Double-faults are recoverable, and could be brought about by a
> faulty driver only.

This is true; having looked this up on Wikipedia, I see that a double fault
occurs if a fault takes place _during an interrupt service routine_.

So we're not talking about a hardware failure here. And software, as we all
know, has bugs. Even really stupid, horrible, bugs. So why should there be any
limit - let alone a very strict one - on what action the processor can take in
response to a software bug?

This seems like a very reasonable question you're asking.

However, I *can* also see the other side of this.

What happens when there's a fault? That is, a trap because of an instruction
doing something it shouldn't do - like dividing by zero - or even the most
legitimate and humdrum condition that has to be handled by software... say a
*page fault*?

After all, if one is going to have virtual memory, one shouldn't make it a trap
for the unwary?

The trouble is, of course, that whenever there _is_ a fault, what happens?
Why, there's an *interrupt*. Now, one _can_ stack interrupts several deep,
as a higher-priority interrupt can interrupt a lower-priority interrupt...
but instruction traps, since they result from conditions _internal_ to what
the computer is doing at the time, *cannot* be ignored.

So they are at the _highest_ priority. Normally, an interrupt service
routine for interrupts at a given priority cannot be interrupted by another
interrupt of the _same_ priority, only by one of a _higher_ priority.

So this means that the part of the kernel that services traps has to be
*permanently resident in memory* so that there won't *be* any page faults
when it runs - and it has to be very carefully written, so that it won't
divide by zero or do anything else like that.

A double fault means the operating system is a piece of crap, and Intel can't
save it from itself. At least it gets to put up a proper blue screen of death.

Except...

If some piece of user software has an error like a divide by zero, the trap
handler presumably needs to look at _where the error occurred_ in order to
provide even a core dump, let alone try to recover. And it might be that the
location of the instruction causing the trap is immediately adjacent in virtual
address space to instructions not yet physically resident in RAM.

However, trap handlers *are* expected to deal with this.

Basically, one _returns from the trap_ but not to the trapped, failed routine -
but to a part of the operating system that may not be part of the kernel
(likely it will be, as it may still need privileged access to memory, though it
might be able to get by with just access to the failed task)... but since one
is no longer within the *interrupt service routine* when doing the core dump or
whatever, if a page fault _is_ hit, it will now be a single fault, not double,
let alone triple.

It isn't just Intel; pretty much any CPU maker will consider itself entitled to
assume that OS developers get at least interrupt service handlers *right*.
Particularly for traps, since these are, after all, part of the OS.

Interrupt service handlers supplied by peripheral manufacturers... at least
they're not fault handlers, so errors there are a trifle less catastrophic.

John Savard

paul wallich

unread,
Apr 13, 2016, 4:45:11 PM4/13/16
to
On 4/13/16 2:57 PM, Quadibloc wrote:
[...]
> Except...
>
> If some piece of user software has an error like a divide by zero, the trap
> handler presumably needs to look at _where the error occurred_ in order to
> provide even a core dump, let alone try to recover. And it might be that the
> location of the instruction causing the trap is immediately adjacent in virtual
> address space to instructions not yet physically resident in RAM.
>
> However, trap handlers *are* expected to deal with this.
>
> Basically, one _returns from the trap_ but not to the trapped, failed routine -
> but to a part of the operating system that may not be part of the kernel
> (likely it will be, as it may still need privileged access to memory, though it
> might be able to get by with just access to the failed task)... but since one
> is no longer within the *interrupt service routine* when doing the core dump or
> whatever, if a page fault _is_ hit, it will now be a single fault, not double,
> let alone triple.

Which means that if you get a triple fault something has gone so
horribly awry that the justification about looking at where the original
fault occurred and/or doing some kind of dump under the control of the
OS as it currently stands is no longer valid. You don't know that any of
the information the machine thinks it has about what's going on is
valid. And attempting to unwind far enough to do something useful may
produce even more garbage.

Rick C. Hodgin

unread,
Apr 13, 2016, 5:04:18 PM4/13/16
to
I do not advocate unwinding in a triple fault state. I only advocate
branching to a known, fixed location in memory, one which has been set up
and protected from corruption by hardware protocol, to begin executing
that code.

That secure code would contain whatever the OS or hypervisor needs for
logging error information in a triple fault condition, which may include
doing nothing except displaying a blue screen, or simply issuing an ISA
instruction sequence which forces the reboot when already in a triple
fault state.

However, the more useful reason for having this feature would be to
include code which force-reinitializes the environment and some piece
of hardware (such as a network card), contacts a known remote IP address,
and sends a full memory dump to that machine along with whatever debugging
info the kernel last recorded (should the kernel be compiled in debug
mode, recording extra information about its services and activities so
as to help track down the bug).

I think LibSF 386-x40 will provide this facility. It's a minimal amount
of effort, memory is cheap, an isolated MB or so would not hurt anything
for the minimal drivers, and it would provide utility in cases where
there is a triple fault state, which, today, happens so quickly it's just
a head scratcher as to what just happened. It shouldn't be like that.

Users should be able to see something about their crashed machine, and
possibly even be given the option to restart it, or do the memory dump.
It's only natural in my opinion.

I can see why they didn't do it in the 1980s. Limited design tools.
Memory was expensive. Several factors. But in the 2010s, things are a
little different, and machines are and should be far more capable, and
give far more utility to their users, and developers.

-----
FWIW, I believe that Intel's vPro has the ability to do something similar
to this at the chipset level, rather than at the CPU level, though in
order for vPro to work it requires a vPro-enabled network card, chipset,
and CPU, potentially providing for much more "behind closed doors"
activity than what I'm in pursuit of.

Quadibloc

unread,
Apr 13, 2016, 5:33:37 PM4/13/16
to
On Wednesday, April 13, 2016 at 3:04:18 PM UTC-6, Rick C. Hodgin wrote:

> Users should be able to see something about their crashed machine, and
> possibly even be given the option to restart it, or do the memory dump.
> It's only natural in my opinion.

Well, they do, for every other kind of crash.

This particular kind of crash can only happen if the _trap_ service routine
causes another trap. This is a small, critical piece of operating system code.
It can be written so that it won't do that.

So a *double fault* condition should be purely hypothetical, never occurring in
practice. Of course, a hardware failure could cause the appearance of a
software bug.

But despite this, Intel does make provision for the double fault condition,
which is already beyond what would be useful in any normal situation. If one is
writing an operating system, it makes it possible to debug the trap service
routine... if one can't get it correct the first time.

Thus, a double-fault is fully provided for, despite the fact that it should
_never_ occur in normal computer operation, even though all kinds of other
crashes and interrupts and faults certainly do. The triple-fault case is not
included since there is no rational provision that could be made for it in an
operating system.

In the case of virtualization, the double-fault, at least, makes some sense,
since one might use virtualization to run an operating system one is debugging
or that one does not trust.

John Savard

Robert Wessel

unread,
Apr 13, 2016, 6:31:45 PM4/13/16
to
On Wed, 13 Apr 2016 11:57:30 -0700 (PDT), Quadibloc
<jsa...@ecn.ab.ca> wrote:

>On Tuesday, April 12, 2016 at 1:32:40 PM UTC-6, Rick C. Hodgin wrote:
>
>> No. Double-faults are recoverable, and could be brought about by a
>> faulty driver only.
>
>This is true; having looked this up on Wikipedia, I see that a double fault
>occurs if a fault takes place _during an interrupt service routine_.
>
>So we're not talking about a hardware failure here. And software, as we all
>know, has bugs. Even really stupid, horrible, bugs. So why should there be any
>limit - let alone a very strict one - on what action the processor can take in
>response to a software bug?


No, that's simply wrong.

Somewhat simplified, you can (sometimes) get a double fault between
the time the exception is raised and the CPU gets ready to start
fetching the first instruction of the handler.

You can *only* get a double fault from before the point in time where
the first instruction in the handler is executed. IOW, an exception
in code in an interrupt handler does not generate* a double fault.

Not even all exceptions during the processing of an exception (and
before the execution of the first instruction in the handler) generate
a double fault. Consider the table at the beginning of the Intel x86
reference for double faults, where they first classify exceptions into
three categories (benign, contributory and page fault), and then have a
3x3 table which describes what happens if a second exception happens
during the first (for example, a page fault during a divide by zero
exception just results in a page fault, not a double fault).
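
Boiled down, the rule in that table is (a rough sketch in C; the vector
numbers are the architectural ones, everything else is just
illustration):

enum fault_class { BENIGN, CONTRIBUTORY, PAGE_FAULT_CLASS };

static enum fault_class classify(int vector)
{
    switch (vector) {
    case 0:  /* #DE divide error        */
    case 10: /* #TS invalid TSS         */
    case 11: /* #NP segment not present */
    case 12: /* #SS stack-segment fault */
    case 13: /* #GP general protection  */
        return CONTRIBUTORY;
    case 14: /* #PF page fault          */
        return PAGE_FAULT_CLASS;
    default: /* #DB, #BP, #UD, #NM, ... and external interrupts */
        return BENIGN;
    }
}

/* Returns 1 if the second exception escalates to #DF, 0 if the two are
 * simply handled one after the other. */
int becomes_double_fault(int first_vector, int second_vector)
{
    enum fault_class f = classify(first_vector);
    enum fault_class s = classify(second_vector);

    if (f == CONTRIBUTORY && s == CONTRIBUTORY)
        return 1;
    if (f == PAGE_FAULT_CLASS && (s == CONTRIBUTORY || s == PAGE_FAULT_CLASS))
        return 1;
    return 0;   /* e.g. a #PF during a #DE: just deliver the #PF */
}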

Anyway, here's what Intel has to say about it:

"A program-state following a double-fault exception is undefined. The
program or task cannot be resumed or restarted. The only available
action of the double-fault exception handler is to collect all
possible context information for use in diagnostics and then close the
application and/or shut down or reset the processor."



*With the obvious exception that if the code in the ISR generates
an exception which itself triggers a double fault because the handler
is not set up properly - but that's really not on point.
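
As a practical aside, the usual way on x86-64 to make sure the double
fault handler's first instruction is always reachable is to give #DF its
own known-good stack via the Interrupt Stack Table. A rough sketch in C
(structure layout per the SDM; df_handler, idt and the TSS/IST stack are
assumed to be set up elsewhere):

#include <stdint.h>

struct idt_gate {
    uint16_t offset_low;
    uint16_t selector;
    uint8_t  ist;        /* bits 0..2: IST index; 0 = no IST stack switch */
    uint8_t  type_attr;  /* 0x8E = present, DPL 0, 64-bit interrupt gate */
    uint16_t offset_mid;
    uint32_t offset_high;
    uint32_t reserved;
} __attribute__((packed));

extern void df_handler(void);       /* assembly entry stub for #DF */
extern struct idt_gate idt[256];    /* the IDT, defined elsewhere */

#define DF_VECTOR   8
#define DF_IST_SLOT 1               /* TSS.IST1 points at a dedicated stack */

void install_double_fault_gate(uint16_t kernel_cs)
{
    uint64_t addr = (uint64_t)df_handler;
    struct idt_gate *g = &idt[DF_VECTOR];

    g->offset_low  = (uint16_t)(addr & 0xFFFF);
    g->selector    = kernel_cs;
    g->ist         = DF_IST_SLOT;   /* always switch to the IST1 stack */
    g->type_attr   = 0x8E;
    g->offset_mid  = (uint16_t)((addr >> 16) & 0xFFFF);
    g->offset_high = (uint32_t)(addr >> 32);
    g->reserved    = 0;
}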

Robert Wessel

unread,
Apr 14, 2016, 2:33:31 AM4/14/16
to
On Tue, 12 Apr 2016 12:32:38 -0700 (PDT), "Rick C. Hodgin"
Not normally - unless you're letting the DD play around in the setup
for an interrupt/exception handler, and it screws those up (or screws
one of those up by overlaying storage).

Quadibloc

unread,
Apr 14, 2016, 3:24:59 AM4/14/16
to
On Wednesday, April 13, 2016 at 4:31:45 PM UTC-6, robert...@yahoo.com wrote:

> No, that's simply wrong.
>
> Somewhat simplified, you can (sometimes) get a double fault between
> the time the exception is raised and the CPU gets ready to start
> fetching the first instruction of the handler.
>
> You can *only* get a double fault from before the point in time where
> the first instruction in the handler is executed. IOW, and exception
> in code in an interrupt handler does not generate* a double fault.

Ah, then it's even worse than I thought, and even clearer why Intel would not
provide for the case of a triple fault.

John Savard

Rick C. Hodgin

unread,
Apr 14, 2016, 8:22:12 AM4/14/16
to
The double-fault is recoverable. If it's possible to do so, you can
correct the code issue, run the appropriate IRETD instructions, and it
will resume right where it left off. In the alternative, if it's possible
to do so, you can terminate the task and continue on with the rest of the
system, restarting whatever needs restarted, etc.

In some cases the double fault is not recoverable. But the CPU provides
mechanisms to capture it. And, FWIW, in my kernel development, there were
several times when I would be debugging something and my debugger code
was buggy and it would signal a double fault on its own. Had I had my
debugger sufficiently developed at that point, I could've made the
appropriate changes to my code and continued on.

-----
In short: When the double fault occurs, the processor does trap to the OS
or debugger, allowing an examination of the system, and the opportunity for
recovery.

When a triple fault occurs, it does nothing except enter its shutdown mode,
which on PCs universally means a total reboot.

My argument is that it should not do this. Triple faults should not occur.
And when they do, it's for some reason that was not anticipated. That's
why an examination of the system should be possible through the triple-fault-
level features I propose be added to the CPU, which allow for not just a
hardware shutdown, but rather a controlled software shutdown of the hardware
when/if it reaches that state.

EricP

unread,
Apr 14, 2016, 11:35:25 AM4/14/16
to
Rick C. Hodgin wrote:
> On Thursday, April 14, 2016 at 2:33:31 AM UTC-4, robert...@yahoo.com wrote:
>> On Tue, 12 Apr 2016 12:32:38 -0700 (PDT), "Rick C. Hodgin"
>> <rick.c...@gmail.com> wrote:
>>
>>> On Tuesday, April 12, 2016 at 2:28:54 PM UTC-4, robert...@yahoo.com wrote:
>>>> On Tue, 12 Apr 2016 08:18:18 -0700 (PDT), "Rick C. Hodgin"
>>>> <rick.c...@gmail.com> wrote:
>>>>
>>> No. Double-faults are recoverable, and could be brought about by a
>>> faulty driver only. The rest of the system wouldn't need to go down,
>>> just restart the driver.
>> Not normally - unless you're letting the DD play around in the setup
>> for an interrupt/exception handler, and it screws those up (or screws
>> one of those up by overlaying storage).
>
> The double-fault is recoverable. If it's possible to do so, you can
> correct the code issue, run the appropriate IRETD instructions, and it
> will resume right where it left off. In the alternative, if it's possible
> to do so, you can terminate the task and continue on with the rest of the
> system, restarting whatever needs restarted, etc.
>
> In some cases the double fault is not recoverable. But the CPU provides
> mechanisms to capture it. And, FWIW, in my kernel development, there were
> several few times when I would be debugging something and my debugger code
> was buggy and it would signal a double fault on its own. Had I had my
> debugger sufficiently developed at that point, I could've made the
> appropriate changes to my code and continued on.

Not according to the Intel documentation.
You may have gotten away with that, but that was random chance.

A double fault is an instruction *abort*, triggered when there is
serious corruption of protected kernel memory data structures
such that a prior exception or interrupt could not be delivered.

There are 5 classes of error, within which are specific triggers,
as well as the original interrupt vector or exception error code.
All that information is lost on a double fault.
The stack-saved CS and EIP registers are undefined,
the program-state following a double-fault exception is undefined.

For example, if an external interrupt triggers a segment-not-present
exception, you don't know the original interrupt address,
which segment was violated, what the violation was,
the pre-interrupt CS and EIP, or the general registers.

The only thing a double fault should do is invoke crash dump save
and kick to the kernel debugger.

Eric

Rick C. Hodgin

unread,
Apr 14, 2016, 11:52:13 AM4/14/16
to
That is one type of double fault. There are others which signal a
double fault, but are not caused by that type of condition.

> There are 5 classes of error, within which are specific triggers,
> as well as the original interrupt vector or exception error code.
> All that information is lost on a double fault.
> The stack-saved CS and EIP registers are undefined,
> the program-state following a double-fault exception is undefined.
>
> For example, if an external interrupt triggers a segment-not-present
> exception, you don't know the original interrupt address,
> which segment was violated, what the violation was,
> the pre-interrupt CS and EIP, or the general registers.
>
> The only thing a double fault should do is invoke crash dump save
> and kick to the kernel debugger.

In my kernel I can set it up to give you an example. It will show the
address which caused the double-fault, and it allows for recovery.

It is possible that, when a second fault occurs during the course of a
particular fault (while the CPU is preparing to invoke the first fault's
handler), that information is lost - though I would argue that in that
case nothing meaningful is lost, because the double fault came about
from the same condition which would've triggered the first fault.

In any event, I remember the first time I saw the "double fault
breakpoint at [address]" and I thought, "What?" I then traced through
my code and saw where I was and why it had happened. It was a fault
triggered by a bug in my fault handler, and it was recoverable. I
was even able to continue debugging that bit of code afterward. And,
it's happened maybe a dozen other times.

I'm speaking from experience. And I'm happy to compile my kernel
which has been purposefully broken in some code bits for the purpose of
demonstrating what I'm talking about.

-----
My argument is: At the point of the CPU shutdown, everything's a total
loss. So why not retain the ability to do something which MAY help in
tracking down the bug. That is all.

EricP

unread,
Apr 14, 2016, 12:17:05 PM4/14/16
to
Rick C. Hodgin wrote:
>
> In any event, I remember the first time I saw the "double fault
> breakpoint at [address]" and I thought, "What?" I then trace through
> my code and saw where I was and why it had happened. It was a fault
> triggered by a bug in my fault handler, and it was recoverable. I
> was even able to continue debugging that bit of code afterward. And,
> it's happened maybe a dozen other times.
>
> My speaking is from experience. And I'm happy to compile my kernel
> which has been purposefully broken in some code bits for the purpose of
> demonstrating what I'm talking about.

Well, the program state is explicitly undefined after a double fault.
Now undefined can include your processor model choosing not to
totally barf all over itself in your particular situation.
However that is just one outcome within "undefined"'s
potential state space of possibilities.

Eric


Rick C. Hodgin

unread,
Apr 14, 2016, 12:22:28 PM4/14/16
to
I just looked it up in the 80386 manual, and it says very simply that
it is an ABORT and that you cannot know the address of the exception.

In my experience, I assume that means there may be cases where it
is known, but in other cases it is unknown and cannot be known, due to
the nature of whatever it was that triggered the second fault. It may
be that in my cases I was simply lucky: the way it happened allowed
the values to be pushed onto the stack, which my debugger was able
to intercept, show, and let me respond to with ongoing mouse and
keyboard events.

Alright. I'll accept that I'm wrong about the nature of the double fault.
And, I'll further submit myself for disciplinary action, and am willing to
make a public apology.

Rick C. Hodgin

unread,
Apr 14, 2016, 12:27:59 PM4/14/16
to
I think that's what it was in my case. Because of the nature of my
double fault scenario, it was able to store the address and was in a
recoverable state. But in other cases, it would not be.

I'll have to consider this in designing LibSF 386-x40's fault handlers.
I think it may behoove me to introduce a triple fault handler which is
in an Integrity Selector area of memory which will always be present,
and have the CPU internally save the addresses attempted by the interrupts,
which can then all be recovered by the triple fault handler. In fact,
I may set it up so that a triple fault is used for that purpose, and
then any subsequent fault introduces a quadruple fault which would signal
the CPU shutdown.

There has to be a way for modern hardware to keep enough information about
the cause of the problem so the system either (1) doesn't have to crash,
or (2) if it does have to crash, can do so politely knowing everything
about what just caused it to crash, even if this (2) option is only enabled
by a special flag setting which may introduce a performance slowdown, but
would be useful in helping to track down bugs.

It would be nice if logic compilers could automatically generate a sim
machine which doesn't tie back to VHDL or Verilog, but rather creates a
virtual runtime that is fast and efficient, and fully emulates the
underlying hardware logic in a rapidly executable and generic way. If
it could generate C source code, for example... :-)

wolfgang kern

unread,
Apr 14, 2016, 3:43:53 PM4/14/16
to

Rick C. Hodgin wrote:
...
>> Well, the program state is explicitly undefined after a double fault.
>> Now undefined can include your processor model choosing not to
>> totally barf all over itself in your particular situation.
>> However that is just one outcome within "undefined"'s
>> potential state space of possibilities.

> I think that's what it was in my case. Because of the nature of my
> double fault scenario, it was able to store the address and was in a
> recoverable state. But in other cases, it would not be.

A double fault is nothing else than the only way to terminate,
cleanup and restart the OS rather than any faulting application.

So there is no need for a triple fault handler, because it won't/can't
do more than the double fault handler could/should do.

and Intel/AMD do well to enter shutdown when the double fault
handler invokes exceptions of any kind.
__
wolfgang

Rick C. Hodgin

unread,
Apr 15, 2016, 1:43:16 PM4/15/16
to
On Thursday, April 14, 2016 at 3:43:53 PM UTC-4, wolfgang kern wrote:
> Rick C. Hodgin wrote:
> ...
> >> Well, the program state is explicitly undefined after a double fault.
> >> Now undefined can include your processor model choosing not to
> >> totally barf all over itself in your particular situation.
> >> However that is just one outcome within "undefined"'s
> >> potential state space of possibilities.
>
> > I think that's what it was in my case. Because of the nature of my
> > double fault scenario, it was able to store the address and was in a
> > recoverable state. But in other cases, it would not be.
>
> A double fault is nothing else than the only way to terminate,
> cleanup and restart the OS rather than any faulting application.

That's not entirely true. It is possible in a number of cases that the
system is still recoverable when a double-fault occurs. It's just that
it's not ALWAYS that way, so because it's an unknown, the best thing for
a production system to do is assume the worst and politely shut down.

However, in my case the double-faults arose from something that was
recoverable, and that was my experience with double-faults in practice.
The rest was just theory from reading about them.

> So there is no need for a triple fault handler, because it won't/can't
> do more than the double fault handler could/should do.

On LibSF 386-x40 it will, as the double-fault handler will also provide
a mechanism to obtain the guaranteed information about where it was
when both the first fault and the second fault occurred, allowing for
a complete restart.

I will probably also set up the triple fault handler to do the same,
and to become an N-fault handler up to some maximum number of nestings,
at which point it will then signal the quadruple fault.

> and Intel/AMD do well to enter shutdown when the double fault
> handler invokes exceptions of any kind.

I agree. At the time they created the architecture, resources were
limited, recovery was unlikely, and there were many more developers
who had a true, solid knowledge of low-level development who would
likely be able to very quickly resolve the issue in their mind before
seeing any code or data.

Whereas that's still true in a more limited fashion, and certainly in
a far lower percentage-wise ratio, the tools have also matured in very
big ways. As such, the capabilities of the machine should be there to
help out the developer, should it be desirable to do so.

I have also considered, for example, having LibSF 386-x40 provide a
quadruple fault protocol that repeatedly signals all of its internal
state to a port, with the ability to also request a memory dump by
asserting an external signal, making both production and debug
motherboards products I would produce.
Message has been deleted
Message has been deleted

Rick C. Hodgin

unread,
Jun 30, 2018, 8:58:56 AM6/30/18
to
On 6/29/2018 11:11 PM, c_gr...@yahoo.com wrote:
> On Monday, April 11, 2016 at 10:25:44 PM UTC-5, Rick C. Hodgin wrote:
>> Why didn't the Intel designers reserve some location in memory for
>> a triple fault handler? One which would allow the system memory to
>> be dumped to disk or output device if the condition arose?
>>
>> Why just reboot? Seems improper.

Old thread, but I provide my thoughts for clarification.

Your response version 1:

> Basically, if a computer manages to crash while crashing while
> crashing (that's basically what a triple fault is), it's safe to
> assume at that point that it's never gonna end.

Your response version 2:

> Honestly, if a machine manages to crash while crashing while crashing,
> it's probably safe to assume that the cycle's gonna continue as long
> as the system's still running.

Note: Both of these comments were deleted on Google Groups, but they
remain on Usenet.

-----
The purpose of the "final fault" handler would be to acknowledge that
totally corrupted state, to assume that the machine is in a completely
invalid state, but to still be able to off-load information about the
machine so it can be stored for post-mortem diagnosis and debugging.

How many people have never known what the cause of their triple fault
was and had to sit there for hours single-stepping through code to
find which line triple faults it? I've done that in my OS kernel.

Having a reserved area of memory protected by the OS so that once it's
setup by the kernel at startup, a lock instruction is issued which then
completely seals and isolates that block of code from all subsequent
writes, allowing it to always be known to be in the original state,
allowing the crashed system to begin off-loading data about the machine,
do a memory dump, etc., to a serial port, or a dedicated debugging
network card and fixed IP address reserved for this purpose which lets
a remote machine capture the information for real-time analysis before
the machine reboots. It would allow remote interrogation, examination,
interaction, etc.

Intel later added this ability, by the way, with their vPro extensions.
vPro operates above the hypervisor layer so they can remote into a
completely crashed and unresponsive machine and obtain information from
it, reboot, etc.

It would be a great debugging tool to have.

My Arxoda CPU will have a dedicated final fault handler like this.
And in my personal opinion, to not have one is not a good design.
It leaves the state of buggy software at least one step away from
being able to be reasonably debugged. It places an undue burden
on kernel and device driver authors. It removes the ability to
diagnose the machine state when the machine has crashed.

--
Rick C. Hodgin

MitchAlsup

unread,
Jun 30, 2018, 12:39:29 PM6/30/18
to
You also want that memory protected FROM the OS.

Rick C. Hodgin

unread,
Jun 30, 2018, 1:40:35 PM6/30/18
to
On 6/30/2018 12:39 PM, MitchAlsup wrote:
> On Saturday, June 30, 2018 at 7:58:56 AM UTC-5, Rick C. Hodgin wrote:
>> Having a reserved area of memory protected by the OS so that once it's
>
> You also want that memory protected FROM the OS.

That's what I meant above by this part:

>> setup by the kernel at startup, a lock instruction is issued which then
>> completely seals and isolates that block of code from all subsequent
>> writes, allowing it to always be known to be in the original state,
>> allowing the crashed system to begin off-loading data about the machine,
>> do a memory dump, etc., to a serial port, or a dedicated debugging
>> network card and fixed IP address reserved for this purpose which lets
>> a remote machine capture the information for real-time analysis before
>> the machine reboots. It would allow remote interrogation, examination,
>> interaction, etc.

It reads in part:

...setup by the kernel at startup, a lock instruction is issued
which then completely seals and isolates that block of code from
all subsequent writes, allowing it to always be known to be in
the original state, allowing the crashed system to...

Once the kernel issues that lock instruction, it cannot be unlocked
without a reset / reboot. The protocol would be:

reboot
bootstrap loader
kernel sets up final fault recovery program
lock final fault memory block
system runs

If it ever gets to a triple fault state, it would call the final fault
code and begin running the program there. That code could be something
simple like begin sending data out a serial port, or write to dedicated
NVRAM for debug recovery, or whatever else would be required for the
developer's needs.
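
As a purely hypothetical sketch of that protocol (none of these
registers or instructions exist on any real CPU; wrmsr_final_fault()
and lock_final_fault() are stand-ins for whatever the hardware would
actually provide):

#include <stdint.h>
#include <string.h>

/* Everything below is hypothetical: final_fault_region, ff_dump_code and
 * the two "instructions" are stand-ins for what such a CPU might offer. */
extern uint8_t        final_fault_region[64 * 1024]; /* reserved physical block */
extern const uint8_t  ff_dump_code[];                /* dump-to-serial/NVRAM program */
extern const uint32_t ff_dump_code_size;

extern void wrmsr_final_fault(uint64_t phys_base, uint64_t size); /* hypothetical */
extern void lock_final_fault(void);                  /* hypothetical one-way lock */

void kernel_setup_final_fault(void)
{
    /* Copy the recovery program in while the block is still writable. */
    memcpy(final_fault_region, ff_dump_code, ff_dump_code_size);

    /* Tell the CPU where the handler lives, then seal the block until
     * the next reset; later writes are silently dropped. */
    wrmsr_final_fault((uint64_t)(uintptr_t)final_fault_region,
                      sizeof(final_fault_region));
    lock_final_fault();
}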

--
Rick C. Hodgin

MitchAlsup

unread,
Jul 1, 2018, 3:04:43 PM7/1/18
to
On Saturday, June 30, 2018 at 12:40:35 PM UTC-5, Rick C. Hodgin wrote:
> On 6/30/2018 12:39 PM, MitchAlsup wrote:
> > On Saturday, June 30, 2018 at 7:58:56 AM UTC-5, Rick C. Hodgin wrote:
> >> Having a reserved area of memory protected by the OS so that once it's
> >
> > You also want that memory protected FROM the OS.
>
> That's what I meant above by this part:
>
> >> setup by the kernel at startup, a lock instruction is issued which then
> >> completely seals and isolates that block of code from all subsequent
> >> writes, allowing it to always be known to be in the original state,
> >> allowing the crashed system to begin off-loading data about the machine,
> >> do a memory dump, etc., to a serial port, or a dedicated debugging
> >> network card and fixed IP address reserved for this purpose which lets
> >> a remote machine capture the information for real-time analysis before
> >> the machine reboots. It would allow remote interrogation, examination,
> >> interaction, etc.
>
> It reads in part:
>
> ...setup by the kernel at startup, a lock instruction is issued
> which then completely seals and isolates that block of code from
> all subsequent writes, allowing it to always be known to be in
> the original state, allowing the crashed system to...

You would be better off if there were no translation to that page, the
page is not on any free-list, and the triple fault handler had to put
the translation into the MMU tables or access it directly in physical
memory with the MMU off.

Malicious code will not be looking for an open lock, but will simply write to
the page if it can find it.

Rick C. Hodgin

unread,
Jul 1, 2018, 3:42:57 PM7/1/18
to
On Sunday, July 1, 2018 at 3:04:43 PM UTC-4, MitchAlsup wrote:
> On Saturday, June 30, 2018 at 12:40:35 PM UTC-5, Rick C. Hodgin wrote:
> > It reads in part:
> >
> > ...setup by the kernel at startup, a lock instruction is issued
> > which then completely seals and isolates that block of code from
> > all subsequent writes, allowing it to always be known to be in
> > the original state, allowing the crashed system to...
>
> You would be better off if there were no translation to that page, the page
> is not on any free-list, and the triple fault handler had to put the translation
> into the MMU tables or access it directly in physical memory with the MMU off.
>
> Malicious code will not be looking for an open lock, but will simply write to
> the page if they can find it.

I'm talking about a block that exists in the memory controller, so that if
any write attempt is issued in the range of the locked memory area it doesn't
honor the write request, but silently ignores it. This would allow the CPU
to enforce an initial setup phase, and then a lockdown phase which persists
until reset, which will never be used unless there is a triple fault.

It would not be in any tables within the CPU, but only in a new finalFault
register used to load the physical read offset for when the finalFault
condition initiates.

In addition, once the finalFault state is reached, the lock is lifted and
read/write access is restored, allowing for interrogation of the system,
and even the possibility of recovery if the system were debugged by a
remote debugger able to implement changes to the environment and then
issue a type of "ffiret" instruction, which would perform a "finalFault
interrupt return."

The mechanics of implementation can be adapted or adjusted as needed, but
in concept I think the idea is sound. And it is something I intend to
incorporate into my own CPU.

--
Rick C. Hodgin

MitchAlsup

unread,
Jul 1, 2018, 3:47:17 PM7/1/18
to
On Sunday, July 1, 2018 at 2:42:57 PM UTC-5, Rick C. Hodgin wrote:
> On Sunday, July 1, 2018 at 3:04:43 PM UTC-4, MitchAlsup wrote:
> > On Saturday, June 30, 2018 at 12:40:35 PM UTC-5, Rick C. Hodgin wrote:
> > > It reads in part:
> > >
> > > ...setup by the kernel at startup, a lock instruction is issued
> > > which then completely seals and isolates that block of code from
> > > all subsequent writes, allowing it to always be known to be in
> > > the original state, allowing the crashed system to...
> >
> > You would be better off if there were no translation to that page, the page
> > is not on any free-list, and the triple fault handler had to put the translation
> > into the MMU tables or access it directly in physical memory with the MMU off.
> >
> > Malicious code will not be looking for an open lock, but will simply write to
> > the page if they can find it.
>
> I'm talking about a block that exists in the memory controller, so that if
> any write attempt is issued in the range of the locked memory area it doesn't
> honor the write request, but silently ignores it. This would allow the CPU
> to enforce an initial setup phase, and then a lockdown phase which persists
> until reset, which will never be used unless there is a triple fault.

I do something like this in my new ISA design.

Certain virtual pages are accessed by hardware on behalf of software yet
under control of the MMU tables. When HW accesses such a page it checks
that none of the Readable, Writable, nor Executable bits have been set in
the PTE. If they have been set, an OS_check is raised at the Hypervisor
level.
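
In rough pseudo-C the check amounts to something like this (bit
positions and names invented for illustration; the real mechanism is in
hardware, not code):

#include <stdbool.h>
#include <stdint.h>

#define PTE_R (1ull << 0)   /* illustrative bit positions only */
#define PTE_W (1ull << 1)
#define PTE_X (1ull << 2)

extern void raise_os_check(uint64_t pte);   /* hypervisor-level escalation */

/* Hardware-side check before touching a page it owns on software's behalf. */
bool hw_access_allowed(uint64_t pte)
{
    if (pte & (PTE_R | PTE_W | PTE_X)) {
        /* Software has mapped the page for itself: refuse and escalate. */
        raise_os_check(pte);
        return false;
    }
    return true;
}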

Rick C. Hodgin

unread,
Jul 1, 2018, 3:57:59 PM7/1/18
to
On Sunday, July 1, 2018 at 3:47:17 PM UTC-4, MitchAlsup wrote:
> On Sunday, July 1, 2018 at 2:42:57 PM UTC-5, Rick C. Hodgin wrote:
> > I'm talking about a block that exists in the memory controller, so that if
> > any write attempt is issued in the range of the locked memory area it doesn't
> > honor the write request, but silently ignores it. This would allow the CPU
> > to enforce an initial setup phase, and then a lockdown phase which persists
> > until reset, which will never be used unless there is a triple fault.
>
> I do something like this in my new ISA design.
>
> Certain virtual pages are accessed by hardware on behalf of software yet under
> control of the MMU tables. When HW accesses such a page it checks that none of
> the Readable, Writable, nor Executable bits have been set in the PTE. If they
> have been set, an OS_check is raised at the Hypervisor level.

We would make a good team. You have the ability to teach me what I'd need
to know to implement my ideas in hardware, and I could help you implement
yours. I have high software skills and could help you with simulations or
analysis, etc.

If you ever wanted to step into the role of venture capitalist and fund a
research project for a couple years ... I'm in. My productivity is severely
hampered by me having a regular full-time job unrelated to these endeavors.
I'm relegated to using my after-hours time, evenings, weekends, sick /
vacation days, holidays, etc.

To work full-time on hardware design would be a dream come true. I would
be willing to sign up for a two-year project with an extension option to be
implemented based on our inevitable success. :-)

--
Rick C. Hodgin

Quadibloc

unread,
Jul 1, 2018, 10:20:06 PM7/1/18
to
On Saturday, June 30, 2018 at 10:39:29 AM UTC-6, MitchAlsup wrote:

> You also want that memory protected FROM the OS.

Actually, I think this contains an important clue.

Normally, when people are using their computers, they're trying to get useful work
out of applications that run under the OS. If something is completely toasted,
there's nothing more to be done with it.

Sometimes, though, they are doing debugging - and even debugging routines that
have to run in kernel mode. Having the computer provide features to assist with
that *all the time* might create security holes.

But having an alternate (or virtualized) mode of operation that facilitates that
sort of work would still be a good thing.

John Savard

Stefan Monnier

unread,
Jul 3, 2018, 4:11:02 PM7/3/18
to
> How many people have never known what the cause of their triple fault
> was and had to sit there for hours single-stepping through code to
> find which line triple faults it?

Very few. And nowadays, I'd expect most of those who might be affected
will use VMs for easier debugging.


Stefan

Rick C. Hodgin

unread,
Jul 3, 2018, 4:17:08 PM7/3/18
to
I have spent days. :-) That was mostly in the '90s and early '00s.
Today I would use a VM, but even then you're still left to single-
stepping at times.

A VM with the final fault handler would be a great benefit.

--
Rick C. Hodgin

wolfgang kern

unread,
Jul 3, 2018, 6:13:51 PM7/3/18
to

Stefan Monnier replied to a Rick post:
We (started by Rick) already had this discussion a while back.

I repeat:
what do you think the x86 double fault exception is good for?
if it raises another exception we are lost anyway,
so AMD/Intel decided, somewhat wisely, to reset the whole PC then.
__
wolfgang
(several warnings ignored, Rick will remain in my killfile forever).

George Neuner

unread,
Jul 3, 2018, 8:42:29 PM7/3/18
to
Problem with most VM software is that it replicates (virtualizes) the
exact processor you have. Not so useful if you want to futz with 386
code on an i7.

An ISA emulator like Bochs or Unicorn is a reasonable choice ... if it
supports the chip you want in the mode you need. There are a number
of emulators available, but many have *complete* support only for user
mode instructions.

YMMV,
George

MitchAlsup

unread,
Jul 3, 2018, 9:35:58 PM7/3/18
to
On Tuesday, July 3, 2018 at 7:42:29 PM UTC-5, George Neuner wrote:
> On Tue, 03 Jul 2018 16:11:00 -0400, Stefan Monnier
> <mon...@iro.umontreal.ca> wrote:
>
> >> How many people have never known what the cause of their triple fault
> >> was and had to sit there for hours single-stepping through code to
> >> find which line triple faults it?
> >
> >Very few. And nowadays, I'd expect most of those who might be affected
> >will use VMs for easier debugging.
>
> Problem with most VM software is that it replicates (virtualizes) the
> exact processor you have. Not so useful if you want to futz with 386
> code on an i7.

I had a proposal at AMD to have a processor contain a version register.
The lowest version supported in my proposal was 486 at the time Opteron
was on Rev G:: HT2 and DDR3. It decorated the decoder with a table indexed
by the version, and the table contained Unimplemented flag bits.

One of the more difficult things here was that, after CPUID was
implemented, CPUID itself was data-dependent based on the version.

......never went anywhere........but was easy to implement.........
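
For concreteness, a minimal C++ sketch of the shape of the thing (the
version encodings, instruction classes and table contents here are made
up for illustration -- they are not the actual AMD proposal):

#include <cstdint>
#include <cstdio>

// Hypothetical instruction classes gated by the version register.
enum InsnClass { CL_BASE = 0, CL_CPUID, CL_SSE, CL_SSE2, CL_LONGMODE, CL_COUNT };

// Hypothetical version encodings; the lowest supported is "486".
enum Version { V_486 = 0, V_P5, V_P6, V_K8, V_COUNT };

// One row per version; 'true' means that class decodes to #UD at that version.
static const bool unimplemented[V_COUNT][CL_COUNT] = {
    /* 486 */ { false, true,  true,  true,  true  },
    /* P5  */ { false, false, true,  true,  true  },
    /* P6  */ { false, false, true,  true,  true  },
    /* K8  */ { false, false, false, false, false },
};

// The decode-time check is just a table lookup indexed by the version register.
bool decodes_to_ud(Version ver, InsnClass cls) {
    return unimplemented[ver][cls];
}

int main() {
    // With the version register set to "486", CPUID itself has to fault; at
    // later versions CPUID exists but its *output* must also match the
    // selected version, which is the data dependence mentioned above.
    std::printf("CPUID at 486 -> #UD: %d\n", decodes_to_ud(V_486, CL_CPUID));
    std::printf("CPUID at K8  -> #UD: %d\n", decodes_to_ud(V_K8, CL_CPUID));
    return 0;
}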

Rick C. Hodgin

unread,
Jul 4, 2018, 7:30:04 AM7/4/18
to
On 7/3/2018 8:42 PM, George Neuner wrote:
> An ISA emulator like Bochs or Unicorn is a reasonable choice ... if it
> supports the chip you want in the mode you need. There are a number
> of emulators available, but many have *complete* support only for user
> mode instructions.

The biggest issue I've found with software-based ISA emulators is their
lack of general chipset support. For custom kernel development it's usually
okay. For trying to debug other software, I believe they often have a
2 GB memory limit, which limits what can be installed and run.

On the whole, I have found Bochs to be very desirable. And I've added
extensions to version 2.3.5 to support off-loading of the 0xb0000 memory
segment for monochrome memory to a remote monitor, allowing for kernel
debugging using a true and accurate hardware model (Hercules monochrome
graphics). I also modified the ISA to allow a particular INT + param
setting to query remote hardware, like the keyboard and mouse. It made
it easier to debug in a virtualized environment. I could never get the
hardware timer to be cycle-accurate, however. As a result, the timing
on things that rely on a fixed timer does not work properly.
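
The off-loading part needs very little machinery. A minimal sketch of
the idea in C++ (this is not the actual Bochs extension or its hook API;
the function names, host and port are made up): forward every guest
write that lands in the 0xb0000 monochrome window to a remote viewer as
an (offset, value) pair, and let the viewer keep its own copy of the
MDA/Hercules frame buffer.

#include <cstdint>
#include <cstring>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <sys/socket.h>

// Assume the emulator calls mda_forward_write() from its physical-memory
// write path whenever a guest store hits 0xB0000..0xB7FFF.

static int mda_sock = -1;

// Connect once to a hypothetical remote monitor process.
bool mda_connect(const char *host, uint16_t port) {
    mda_sock = socket(AF_INET, SOCK_STREAM, 0);
    if (mda_sock < 0) return false;
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_port   = htons(port);
    inet_pton(AF_INET, host, &addr.sin_addr);
    return connect(mda_sock, (sockaddr *)&addr, sizeof(addr)) == 0;
}

// Forward one guest write in the monochrome window as (offset, value).
void mda_forward_write(uint32_t phys_addr, uint8_t value) {
    if (phys_addr < 0xB0000 || phys_addr >= 0xB8000 || mda_sock < 0) return;
    uint32_t off = phys_addr - 0xB0000;
    uint8_t pkt[5];
    std::memcpy(pkt, &off, 4);          // offset into the window, host order
    pkt[4] = value;
    send(mda_sock, pkt, sizeof(pkt), 0);
}

int main() {
    // Hypothetical viewer address; fails gracefully if nothing is listening.
    if (!mda_connect("127.0.0.1", 5555)) return 1;
    mda_forward_write(0xB0000, 'A');    // first character cell, text mode
    mda_forward_write(0xB0001, 0x07);   // its attribute byte
    return 0;
}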

As technology has advanced, I've planned to move away from MDA and into
the modern realm of full-screen full-color GUIs. When I resume develop-
ment on my kernel, I will use some of the tools I'm developing now (Lord
willing, James 4:15), and it will speed up development notably.

I see Bochs as a very desirable choice for emulation since it's
completely extensible. I also plan to create a custom emulator for my
Arxoda CPU. I will probably complete that before I have actual hard-
ware working in an FPGA.

--
Rick C. Hodgin

George Neuner

unread,
Jul 5, 2018, 1:45:49 AM7/5/18
to
On Wed, 4 Jul 2018 07:30:01 -0400, "Rick C. Hodgin"
<rick.c...@gmail.com> wrote:

>On 7/3/2018 8:42 PM, George Neuner wrote:
>> An ISA emulator like Bochs or Unicorn is a reasonable choice ... if it
>> supports the chip you want in the mode you need. There are a number
>> of emulators available, but many have *complete* support only for user
>> mode instructions.
>
>The biggest issue I've found with software-based ISA emulators is their
>lack of general chipset support. For custom kernel development it's usually
>okay. For trying to debug other software, I believe they often have a
>2 GB memory limit, which limits what can be installed and run.

>On the whole, I have found Bochs to be very desirable. And I've added
>extensions to version 2.3.5 to support off-loading of the 0xb0000 memory
>segment for monochrome memory to a remote monitor, allowing for kernel
>debugging using a true and accurate hardware model (Hercules monochrome
>graphics). I also modified the ISA to allow a particular INT + param
>setting to query remote hardware, like the keyboard and mouse. It made
>it easier to debug in a virtualized environment. I could never get the
>hardware timer to be cycle-accurate, however. As a result, the timing
>on things that rely on a fixed timer does not work properly.

For cycle accurate timing you need a *simulator* - not an *emulator*.
[Aside: it seems that many people really don't understand the
difference: there are many packages available which quite obviously
are "emulators", but which mistakenly are being called "simulators".]
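
A toy illustration of the difference (hypothetical machine, made-up
timing numbers): an emulator only has to get the architectural state
transitions right, while a simulator also has to account for where
every cycle goes.

#include <cstdint>

struct Machine {
    uint64_t regs[16] = {};
    uint64_t pc = 0;
    uint64_t cycles = 0;                 // only meaningful to the simulator
};

struct Insn { int opcode; int dst, src; };

// Emulator: architectural effect only; "how long it took" is not modeled.
void emulate_step(Machine &m, const Insn &i) {
    switch (i.opcode) {
    case 0: m.regs[i.dst] += m.regs[i.src]; break;   // ADD
    case 1: m.regs[i.dst]  = m.regs[i.src]; break;   // MOV
    }
    m.pc += 4;
}

// Simulator: same architectural effect plus a timing model. A real one
// models pipelines, caches and buses; this per-opcode table is a stand-in.
void simulate_step(Machine &m, const Insn &i) {
    static const uint64_t cost[] = { 1 /*ADD*/, 1 /*MOV*/ };
    emulate_step(m, i);
    m.cycles += cost[i.opcode];          // a real model would charge stalls too
}

int main() {
    Machine m;
    m.regs[1] = 40; m.regs[2] = 2;
    Insn add = {0, 1, 2};                // regs[1] += regs[2]
    simulate_step(m, add);
    return (m.regs[1] == 42 && m.cycles == 1) ? 0 : 1;
}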


I am not aware of freeware / open source software simulators for any
halfway modern CPUs [I do know of a couple for 8-bitters]. Part of
the problem is that the chip makers don't publish cycle timing any
longer. There are commercial simulators, of course, but they tend to
be expen$ive.

It's probably cheaper to just get a development board with jtag and
the chip you want. Which probably is what everyone does and why there
are no low cost chip simulators [he says naively, being a software
person with limited hardware design experience].



>As technology has advanced, I've planned to move away from MDA and into
>the modern realm of full-screen full-color GUIs.

Geez, I hope so. I haven't seen an MDA display in over 30 years.


>I see Bochs as a very desirable choice for emulation since it's
>completely extensible. I also plan to create a custom emulator for my
>Arxoda CPU. I will probably complete that before I have actual hard-
>ware working in an FPGA.

Bochs is good, but by design it is limited to x86. If you need
something more flexible, you might want to take a look at Unicorn.
[Not having seen the internals] It's supposed to be extremely modular
and easy to extend.

George

Bruce Hoult

unread,
Jul 5, 2018, 3:49:05 AM7/5/18
to
Verilator works pretty well, assuming you have the RTL of course. We use it a lot before we tape out.

I guess Intel RTL is super secret. ARM give real RTL to (certain types of) their licensees to do their own integration and place&route with. We give working but limited RTL to people evaluating our cores .. they can run it in Verilator, or do their own PPA calculations.
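
The Verilator flow is small enough to sketch. This assumes a top-level
module named "top" with clk/reset inputs (adjust for the real port list),
built with something like "verilator --cc top.v --exe sim_main.cpp":

#include <cstdint>
#include "Vtop.h"        // generated by Verilator from the RTL
#include "verilated.h"

int main(int argc, char **argv) {
    Verilated::commandArgs(argc, argv);
    Vtop *top = new Vtop;

    top->reset = 1;
    top->clk = 0;

    for (uint64_t cycle = 0; cycle < 1000 && !Verilated::gotFinish(); ++cycle) {
        if (cycle == 10) top->reset = 0;       // release reset after a while
        top->clk = 0; top->eval();             // falling edge
        top->clk = 1; top->eval();             // rising edge
    }

    top->final();
    delete top;
    return 0;
}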


> It's probably cheaper to just get a development board with jtag and
> the chip you want. Which probably is what everyone does and why there
> are no low cost chip simulators [he says naively, being a software
> person with limited hardware design experience].

If the chip already exists then that's by far the easiest and cheapest way.


> >I see Bochs as a very desirable choice for emulation since it's
> >completely extensible. I also plan to create a custom emulator for my
> >Arxoda CPU. I will probably complete that before I have actual hard-
> >ware working in an FPGA.
>
> Bochs is good, but by design it is limited to x86. If you need
> something more flexible, you might want to take a look at Unicorn.
> [Not having seen the internals] It's supposed to be extremely modular
> and easy to extend.

Unicorn is purely a CPU emulator. It's just that part stripped out of QEMU.

We use QEMU, in both of its modes:

1) user mode CPU emulation, loading emulated-ISA Linux ELF binaries and passing system calls on to the host OS. This lets you transparently run ARM or MIPS or RISC-V or whatever binaries from the command line or makefiles etc on an x86 Linux machine. This is great for building software that doesn't play well with cross-compilation. Especially when combined with a chroot containing a guest ISA root file system.

2) full system emulation, including not only the user and privileged CPU modes but also MMU and block and character devices, networking etc. You can use generic peripherals, or completely emulate a board such as a Raspberry Pi or our Hifive1 (300 MHz 32 bit microcontroller) or Hifive Unleashed (1.5 GHz penta-core 64 bit Linux). In this case you can boot and run a completely unmodified disk image.

Rick C. Hodgin

unread,
Jul 5, 2018, 8:32:23 AM7/5/18
to
Correct. The Perfect6502 simulator is a logic simulator for the
chip itself; it's a very small C program that simulates the hardware logic:

https://www.youtube.com/watch?v=fWqBmmPQP40&t=29m53s

This is the goal I have for my Logician tool, but in a visual manner
using something like Blender's nodes for logic unit hookup:

Begins at 6:26:
www.youtube.com/watch?v=aaJhmYA5q4c&6m26s

I want to be able to create logical units like those nodes, and then
hook them up using those noodles. I also want to be able to hide the
noodles so the screen isn't messy, with each unit fading away unless
it's highlighted or pinned.
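
The core of that kind of logic simulator is tiny. A toy sketch of the
idea in C++ (this has nothing to do with Perfect6502's actual code,
which, as I understand it, works at the transistor level from the
Visual6502 netlist): gates as nodes, nets as values, and a loop that
re-evaluates gates until the netlist settles.

#include <cstdio>
#include <vector>

// Toy gate-level simulator: a NAND-only netlist, evaluated until stable.
struct Gate { int a, b, out; };          // indices into the net array

struct Netlist {
    std::vector<bool> net;               // current value of every net
    std::vector<Gate> gates;

    // Re-evaluate all gates until no net changes (or we give up).
    void settle(int max_passes = 64) {
        for (int pass = 0; pass < max_passes; ++pass) {
            bool changed = false;
            for (const Gate &g : gates) {
                bool v = !(net[g.a] && net[g.b]);
                if (net[g.out] != v) { net[g.out] = v; changed = true; }
            }
            if (!changed) return;
        }
    }
};

int main() {
    // Nets: 0=A, 1=B, 2=NAND(A,B), 3=AND built from two NANDs.
    Netlist n;
    n.net   = { true, true, false, false };
    n.gates = { {0, 1, 2}, {2, 2, 3} };
    n.settle();
    std::printf("A AND B = %d\n", (int)n.net[3]);   // prints 1
    return 0;
}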

> I am not aware of freeware / open source software simulators for any
> halfway modern CPUs [I do know of a couple for 8-bitters]. Part of
> the problem is that the chip makers don't publish cycle timing any
> longer. There are commercial simulators, of course, but they tend to
> be expen$ive.
>
> It's probably cheaper to just get a development board with jtag and
> the chip you want. Which probably is what everyone does and why there
> are no low cost chip simulators [he says naively, being a software
> person with limited hardware design experience].
>
> >As technology has advanced, I've planned to move away from MDA and into
> >the modern realm of full-screen full-color GUIs.
>
> Geez, I hope so. I haven't seen an MDA display in over 30 years.

The last daily/regular kernel development I did was back in the
early 2000s, with the bulk of it between 1998 and 2002. Here's
the dual VGA/MDA environment I had. I miss my little green monitor
there for real debugging. It had a slow refresh when it changed
from one display to another, which had a nice effect. My dyslexic
brain actually liked that transition, as I could see things changing
better:

http://www.visual-freepro.org/videos/2014_02_13__exodus_debi_debugger.ogv

MDAs were used for kernel-level debugging because they operate in a
RAM and register/port space that's completely isolated from VGA and
later extensions. That allows debugging on the physical machine without
altering the state of the VGA or corrupting anything. It could be
done in text mode or graphics mode. I used the card in graphics mode
(Hercules), which gave 720x348 pixels; an 8x8 font resulted in 90 x 43.5
characters, which was quite a bit for a console-like window back then.
I added a flashing cursor and a mouse pointer, click, drag-and-drop,
etc. It was a good system. The source code for it is in MASM 6.x
assembly here:

http://www.libsf.org:8990/projects/LIB/repos/libsf/browse/exodus/source/vga
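
For anyone who wants to poke at that memory themselves: Hercules-style
720x348 graphics at 0xB0000 is four-way row-interleaved with 90 bytes
per scan line. A small sketch of the pixel addressing (the arithmetic
only -- actually touching the card needs real mode, or an emulator's
guest-memory pointer standing in for segment 0xB000):

#include <cstdint>
#include <cstdio>

// Hercules graphics addressing: 720x348, base 0xB0000, four-way row
// interleave (0x2000-byte banks), 90 bytes per scan line, MSB = leftmost.

struct PixelLoc { uint32_t offset; uint8_t mask; };

PixelLoc herc_pixel(unsigned x, unsigned y) {
    PixelLoc p;
    p.offset = 0x2000u * (y % 4) + 90u * (y / 4) + (x / 8);
    p.mask   = static_cast<uint8_t>(0x80u >> (x % 8));
    return p;
}

// Set or clear one pixel in a 32 KiB buffer standing in for B000:0000.
void herc_put_pixel(uint8_t *vram, unsigned x, unsigned y, bool on) {
    PixelLoc p = herc_pixel(x, y);
    if (on) vram[p.offset] |=  p.mask;
    else    vram[p.offset] &= ~p.mask;
}

int main() {
    static uint8_t vram[0x8000] = {};      // stand-in for the 32 KiB window
    herc_put_pixel(vram, 719, 347, true);  // bottom-right pixel
    PixelLoc p = herc_pixel(719, 347);
    std::printf("offset=0x%04X mask=0x%02X\n", (unsigned)p.offset, (unsigned)p.mask);
    return 0;
}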

You could also use multiple video cards back then, but they were ex-
pensive. Nowadays it's much easier to use remote network debugging
or a single card that supports multiple outputs and dedicate a monitor
to debugging. For my Arxoda CPU and kernel designs, I intend to use
remote debugging when I have it running on physical hardware. In my
Logician tool, I will simulate a network card and allow remote debug-
ging even there (James 4:15 "Lord willing").

> >I see Bochs as a very desirable choice for emulation since it's
> >completely extensible. I also plan to create a custom emulator for my
> >Arxoda CPU. I will probably complete that before I have actual hard-
> >ware working in an FPGA.
>
> Bochs is good, but by design it is limited to x86. If you need
> something more flexible, you might want to take a look at Unicorn.
> [Not having seen the internals] It's supposed to be extremely modular
> and easy to extend.

That's been okay for me. The vast majority of my development has
been on x86, and specifically on 80386-80686 targets. The rest has
been on ARM.

--
Rick C. Hodgin

George Neuner

unread,
Jul 5, 2018, 5:46:26 PM7/5/18
to
On Thu, 5 Jul 2018 00:49:03 -0700 (PDT), Bruce Hoult
<bruce...@gmail.com> wrote:

>Unicorn is purely a CPU emulator. It's just that part stripped out of QEMU.

According to Unicorn:

QEMU cannot emulate a chunk of raw binary code without any context:
it requires either a proper executable binary (for example, a file in
ELF format), or a whole system image with a full OS inside.
Meanwhile, Unicorn just focuses on CPU operations, and can emulate
raw code without context

I have not tried to use QEMU for bare metal work, so I don't know if
that is correct. But the ability to run bare code is very desirable
for people trying to write systems.

Bochs can do it, but only for x86 architectures.

George

Bruce Hoult

unread,
Jul 5, 2018, 7:10:50 PM7/5/18
to
You can't do "bare metal" work with just a CPU emulation. You need a way to get data in and out of it -- that is, some form of I/O. That's what QEMU does, and we run "bare metal" programs on QEMU all the time -- for example, programs for our HiFive1.

A CPU emulator and nothing else (well, it's got to have some form of RAM emulation too) might be useful for testing for regressions in compiler code generation using tiny code snippets: set it up with initial RAM and register contents, run a handful of emulated instructions, and then compare the resulting RAM and register contents against those expected.
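
That snippet-regression use is about a dozen lines with Unicorn's C API,
along the lines of its own x86 sample (the two hand-assembled bytes here
are just "inc ecx; dec edx"):

#include <cstdio>
#include <unicorn/unicorn.h>

#define ADDRESS    0x1000000          // where we map and run the snippet
#define X86_CODE32 "\x41\x4a"         // inc ecx; dec edx

int main() {
    uc_engine *uc;
    int r_ecx = 0x1234, r_edx = 0x7890;

    if (uc_open(UC_ARCH_X86, UC_MODE_32, &uc) != UC_ERR_OK) return 1;

    // Map 2 MB, drop the snippet in, and set the initial register state.
    uc_mem_map(uc, ADDRESS, 2 * 1024 * 1024, UC_PROT_ALL);
    uc_mem_write(uc, ADDRESS, X86_CODE32, sizeof(X86_CODE32) - 1);
    uc_reg_write(uc, UC_X86_REG_ECX, &r_ecx);
    uc_reg_write(uc, UC_X86_REG_EDX, &r_edx);

    // Run just those instructions, then read back and compare.
    uc_emu_start(uc, ADDRESS, ADDRESS + sizeof(X86_CODE32) - 1, 0, 0);
    uc_reg_read(uc, UC_X86_REG_ECX, &r_ecx);
    uc_reg_read(uc, UC_X86_REG_EDX, &r_edx);
    std::printf("ECX=0x%x EDX=0x%x\n", r_ecx, r_edx);  // expect 0x1235 0x788f

    uc_close(uc);
    return (r_ecx == 0x1235 && r_edx == 0x788f) ? 0 : 1;
}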

I can't think what else it would be useful for .. certainly not for developing real software that runs on the hardware.