--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
> Could you tell me about the future plan if possible?
>
We saw a lot of problems and shortcomings in the emulator recently, so
the plan is to improve its correctness. There is also a requirement to be
able to single-step emulated code. Having setjmp/longjmp will greatly simplify
the code. What are you interested in?
Does your plan also include making the emulator independent of KVM?
Could you tell me about the future plan if possible?
> handle some special cases correctly (code execution from ROM, ins/outs
I am mainly interested in clearly understanding the KVM x86 emulator.
In that sense, what I felt first was that it is impossible to understand why
it works without deep knowledge of the whole KVM architecture.
If the emulator itself is self-contained, it will be a great help to me.
Though I do not think every instruction should be implemented, it would be
nice if each instruction emulated is independent of KVM: if we can check the
validity of them using only SDM, it would be really nice!
Thanks,
Takuya
>
> Though I do not think every instruction should be implemented, it would be
> nice if each instruction emulated is independent of KVM: if we can check the
> validity of them using only SDM, it would be really nice!
>
Agree. That is my goal too.
I'm all for radical ideas, but from a pragmatic point of view, you
shouldn't use longjmp in the kernel. Seriously bad things are happening
with it; it leaves local variables undefined, doesn't undo global state
changes.
So if you:
    spin_lock(&s->lock);
    if (!s->active)
        longjmp(buf, -1);
... you are broken. This case can be made very much more complex and
hard to reason about by using local variables which are reset by the
longjmp.
Further, it requires use of the volatile keyword to interact properly
with logic involving more than one variable, and thus, by definition is
impossible to use in the kernel, which does not implement the volatile
keyword. :)
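
The two hazards described above can be reproduced in userspace. What follows is a minimal sketch, not kernel code: fake_lock_held is a hypothetical stand-in for a spinlock, and both function names are made up for illustration. Note that the C standard makes a local indeterminate only if it is non-volatile and was modified between the setjmp() and the longjmp(); the second function shows the volatile case, which is well defined.

```c
#include <assert.h>
#include <setjmp.h>

/* Hypothetical stand-in for a spinlock: 1 means held, 0 means free. */
static int fake_lock_held;
static jmp_buf env;

static void take_lock_then_bail(int active)
{
    fake_lock_held = 1;          /* plays the role of spin_lock() */
    if (!active)
        longjmp(env, 1);         /* jumps past the unlock below */
    fake_lock_held = 0;          /* plays the role of spin_unlock() */
}

/* Returns 1 if the non-local exit left the lock held, 0 otherwise. */
int lock_leaked_after_longjmp(void)
{
    fake_lock_held = 0;
    if (setjmp(env) == 0)
        take_lock_then_bail(0);
    return fake_lock_held;
}

/* Returns the value of a local that was modified after setjmp(). */
int volatile_local_survives(void)
{
    /* volatile makes the value well defined after the longjmp();
     * without it the value read below would be indeterminate. */
    volatile int progress = 0;

    if (setjmp(env) == 0) {
        progress = 42;
        longjmp(env, 1);
    }
    return progress;
}
```

Nothing undoes the lock acquisition on the way out, so the first function reports the lock as still held.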
Instead, for this case, use the fact that there is an architecturally
designed finite number of exceptions that can be processed
simultaneously. This means if you queue exceptions to a pending list of
control-flow interrupting events to be processed, as long as the queue
is appropriately sized, you will never overflow this queue and never
require dynamic allocation. Further, you can then naturally follow the
exception priority rules at the top-level of the emulator and never need
to pass back complex exception structures, merely a simple return value
which indicates whether to return to top-level control logic or continue
with instruction emulation. I believe using this style of programming
will make your need for setjmp/longjmp go away.
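
A rough sketch of the queueing idea, under stated assumptions: MAX_PENDING, the struct layout, the function names and the numeric priorities are all hypothetical and are not taken from KVM or from the SDM priority tables. The queue is a fixed array, so no dynamic allocation is needed, and the emulation code only ever hands back a simple return value.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PENDING 4   /* assumed architectural bound, illustrative only */

enum emu_rc { EMU_CONTINUE = 0, EMU_EVENT_QUEUED = 1 };

struct pending_event {
    int vector;
    int priority;       /* lower number wins in this sketch */
};

struct event_queue {
    struct pending_event ev[MAX_PENDING];
    size_t count;
};

/* Record an event; the caller just propagates the return value upward. */
enum emu_rc queue_event(struct event_queue *q, int vector, int priority)
{
    assert(q->count < MAX_PENDING);   /* bounded by design, never grows */
    q->ev[q->count].vector = vector;
    q->ev[q->count].priority = priority;
    q->count++;
    return EMU_EVENT_QUEUED;
}

/* Top-level logic: pick the winning event, or return -1 if none is pending.
 * Lower-priority events are simply dropped here, which is a simplification. */
int deliver_highest_priority(struct event_queue *q)
{
    size_t i, best = 0;
    int vector;

    if (q->count == 0)
        return -1;
    for (i = 1; i < q->count; i++)
        if (q->ev[i].priority < q->ev[best].priority)
            best = i;
    vector = q->ev[best].vector;
    q->count = 0;
    return vector;
}
```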
Zach
> ... you are broken. This case can be made very much more complex
> and hard to reason about by using local variables which are reset by
> the longjmp.
>
> Further, it requires use of the volatile keyword to interact
> properly with logic involving more than one variable, and thus, by
> definition is impossible to use in the kernel, which does not
> implement the volatile keyword. :)
volatile is a language keyword; how can it not be implemented by the
kernel? And why is volatile needed to implement longjmp?
>
> Instead, for this case, use the fact that there is an
> architecturally designed finite number of exceptions that can be
> processed simultaneously. This means if you queue exceptions to a
> pending list of control-flow interrupting events to be processed, as
> long as the queue is appropriately sized, you will never overflow
> this queue and never require dynamic allocation. Further, you can
> then naturally follow the exception priority rules at the top-level
> of the emulator and never need to pass back complex exception
> structures, merely a simple return value which indicates whether to
> return to top-level control logic or continue with instruction
> emulation. I believe using this style of programming will make your
> need for setjmp/longjmp go away.
>
Of course it is possible to use return values instead. This is what the
code does currently, and it is completely unrelated to exception queue
depth. The code would be much simpler if we were able to bail out from the
depths of the emulator immediately when an exception condition is met or an
exit to userspace is required, instead of passing the condition up the call
chain.
--
Gleb.
Local variables which are not volatile are "undefined" after a longjmp.
Thus setjmp() return value is the only valid rvalue otherwise.
As I said, the kernel does not implement the volatile keyword :)
(i.e. its use is heavily discouraged to the point one can consider it
not implemented)
>> Instead, for this case, use the fact that there is an
>> architecturally designed finite number of exceptions that can be
>> processed simultaneously. This means if you queue exceptions to a
>> pending list of control-flow interrupting events to be processed, as
>> long as the queue is appropriately sized, you will never overflow
>> this queue and never require dynamic allocation. Further, you can
>> then naturally follow the exception priority rules at the top-level
>> of the emulator and never need to pass back complex exception
>> structures, merely a simple return value which indicates whether to
>> return to top-level control logic or continue with instruction
>> emulation. I believe using this style of programming will make your
>> need for setjmp/longjmp go away.
>>
>>
> Of course it is possible to use return values instead. This is what code
> does currently and this is completely unrelated to exception queue
> depth. Code will be much simpler if we will be able to bail out from the
> depth of emulator immediately if exception condition is met or exit to
> userspace is required instead of passing the condition up the call
> chain.
>
Anything that can generate exceptions is going to need logic to handle
error cases anyway... the depth can not be that bad. Especially if you
structure it so as to optimize for tail calling.
Zach
This avoids all concerns about local variables and should be cleaner,
faster and simpler to implement.
In practice the return value from setjmp is all I need.
> As I said, the kernel does not implement the volatile keyword :)
> (i.e. its use is heavily discouraged to the point one can consider
> it not implemented)
>
> >>Instead, for this case, use the fact that there is an
> >>architecturally designed finite number of exceptions that can be
> >>processed simultaneously. This means if you queue exceptions to a
> >>pending list of control-flow interrupting events to be processed, as
> >>long as the queue is appropriately sized, you will never overflow
> >>this queue and never require dynamic allocation. Further, you can
> >>then naturally follow the exception priority rules at the top-level
> >>of the emulator and never need to pass back complex exception
> >>structures, merely a simple return value which indicates whether to
> >>return to top-level control logic or continue with instruction
> >>emulation. I believe using this style of programming will make your
> >>need for setjmp/longjmp go away.
> >>
> >Of course it is possible to use return values instead. This is what code
> >does currently and this is completely unrelated to exception queue
> >depth. Code will be much simpler if we will be able to bail out from the
> >depth of emulator immediately if exception condition is met or exit to
> >userspace is required instead of passing the condition up the call
> >chain.
>
> Anything that can generate exceptions is going to need logic to
> handle error cases anyway... the depth can not be that bad.
> Especially if you structure it so as to optimize for tail calling.
>
Tail call is not what usually happens. Usually emulation goes like this:
    if (check some conditions) {
        queue exception A
        return exception queued
    }
    if (check other conditions) {
        queue exception B
        return exception queued
    }
    do some emulation
    try to read guest memory
    if (read failed) {
        queue exception C
        return exception queued
    }
    if (read needs exit to userspace for device emulation)
        return please go out and retrieve me the data
    continue emulation
    try to write guest memory
    if (write failed) {
        queue exception C
        return exception queued
    }
    if (write needs exit to userspace for device emulation)
        return please go out and process the data
    emulate some more.
    return emulation done
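
For illustration only, the read/write part of the flow above could look roughly like this in C. Every name here (emu_ctx, check_access, emulate_one, the enum values) is hypothetical and is not the emulate.c API; the memory access outcome is supplied by the caller so the sketch stays self-contained.

```c
#include <assert.h>

enum emu_rc { EMU_CONTINUE, EMU_EXCEPTION_QUEUED, EMU_NEED_USERSPACE };
enum mem_rc { MEM_OK, MEM_FAULT, MEM_MMIO };

#define PF_VECTOR 14    /* x86 page-fault vector */

struct emu_ctx {
    enum mem_rc read_result;    /* outcome the simulated read will report */
    enum mem_rc write_result;   /* outcome the simulated write will report */
    int queued_vector;          /* -1 when nothing is queued */
};

/* Turn a memory access outcome into the code passed up the call chain. */
static enum emu_rc check_access(struct emu_ctx *c, enum mem_rc rc)
{
    if (rc == MEM_FAULT) {
        c->queued_vector = PF_VECTOR;   /* "queue exception C" */
        return EMU_EXCEPTION_QUEUED;
    }
    if (rc == MEM_MMIO)
        return EMU_NEED_USERSPACE;      /* "please go out and get the data" */
    return EMU_CONTINUE;
}

enum emu_rc emulate_one(struct emu_ctx *c)
{
    enum emu_rc rc;

    c->queued_vector = -1;

    rc = check_access(c, c->read_result);    /* try to read guest memory */
    if (rc != EMU_CONTINUE)
        return rc;

    rc = check_access(c, c->write_result);   /* try to write guest memory */
    if (rc != EMU_CONTINUE)
        return rc;

    return EMU_CONTINUE;                     /* emulation done */
}
```

Each layer between the access and the top level has to repeat the "if (rc != EMU_CONTINUE) return rc;" test, which is the boilerplate being discussed.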
--
Gleb.
It's going to be ugly to emulate segmentation, NX and write protect
support without hardware to do this checking for you, but it's just what
you have to do in this slow path - tedious, fully specified emulation.
Just because it's tedious doesn't mean we need to use setjmp / longjmp.
Throw / catch might be effective, but it's still pretty bizarre to do
tricks like that in C.
Zach
> Think about what happens if in the middle of
> instruction emulation some data from device emulated in userspace is
> needed. Emulator should be able to tell KVM that exit to userspace is
> needed and restart instruction emulation when data is available.
setjmp/longjmp are useful constructs in general but
IME are better suited for infrequent exceptions vs.
routine usage.
If the issue is finding some clean and regular way
to back out from (and possibly reenter) logic
expressed within nested function invocations, have
you considered turning the problem inside out and
using a state machine approach?
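
To make the suggestion concrete, here is a minimal sketch of what a restartable state machine might look like. The states, struct fields and the "operand + 1" placeholder are all invented for illustration; a real instruction would need many more states.

```c
#include <assert.h>

enum emu_state { ST_READ_OPERAND, ST_EXECUTE, ST_DONE };
enum step_rc { STEP_DONE, STEP_NEED_USERSPACE };

struct insn_machine {
    enum emu_state state;   /* where to resume on the next call */
    int operand_is_mmio;    /* operand must come from userspace */
    int mmio_data_ready;    /* set by the caller once data arrived */
    int operand;
    int result;
};

/* Run until the instruction completes or outside help is needed.
 * All progress lives in *m, so the function can be reentered later. */
enum step_rc run_machine(struct insn_machine *m)
{
    for (;;) {
        switch (m->state) {
        case ST_READ_OPERAND:
            if (m->operand_is_mmio && !m->mmio_data_ready)
                return STEP_NEED_USERSPACE;   /* state is unchanged */
            m->state = ST_EXECUTE;
            break;
        case ST_EXECUTE:
            m->result = m->operand + 1;       /* placeholder for real work */
            m->state = ST_DONE;
            break;
        case ST_DONE:
            return STEP_DONE;
        }
    }
}
```

The caller handles STEP_NEED_USERSPACE by filling in operand, setting mmio_data_ready and calling run_machine() again; no stack has to be unwound or rebuilt.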
--
john....@third-harmonic.com
Well, setjmp/longjmp really is not much more than exception handling in C.
-hpa
For what it's worth, I think that setjmp/longjmp is not anywhere near as
dangerous as people want to make it out to be. gcc will warn for
dangerous uses (and a lot of non-dangerous uses), but generally the
difficult problems can be dealt with by moving the setjmp-protected code
into a separate function.
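
As a userspace sketch of that last point (all function names are hypothetical): the only function that calls setjmp() modifies none of its own locals afterwards, and the state that must survive lives in the caller's frame and is reached through a pointer.

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf bail_out;

/* Deep in the call chain: bail out immediately on failure. */
static void deep_helper(int fail)
{
    if (fail)
        longjmp(bail_out, 1);
}

/* Intermediate layer: no error checking or propagation needed. */
static int middle_layer(int fail)
{
    deep_helper(fail);
    return 7;
}

/* The only function that calls setjmp(). It changes none of its own
 * locals after the call, so nothing here becomes indeterminate. */
static int protected_region(int fail, int *out)
{
    if (setjmp(bail_out))
        return -1;              /* reached via longjmp() */
    *out = middle_layer(fail);
    return 0;
}

int emulate(int fail)
{
    int value = 0;   /* lives in this frame, outside the setjmp function */

    if (protected_region(fail, &value) != 0)
        return -1;
    return value;
}
```

This does nothing about locks or other global state taken below the setjmp(); it only addresses the local-variable concern.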
I'd be curious to see if it would need to evolve into preemptsetjmp /
irqlongjmp or some other more complex forms in time.
But I'd rather implement a new language where acquisition of resources
such as locks, dynamically allocated objects, and ref counts are
predicated in the function typing and are heavily encouraged to possess
defined inverses. Then the closure of a particular layer of nesting
already has enough information to provide release upon escape, and the
compiler can easily take the burden of checking for a large class of
lock and resource violation.
And it would have to be prettier than the current languages that do
that, meaning operator overloading would be banned. Although it would
define rational numbers, super-extended precision arithmetic, imaginary
numbers, quaternions and matrices as part of the spec, so there would be
no need to use arithmetic overrides anyway, and then all the nonsensical
operators could die, die, die, especially the function () and logical
operator overrides.
Zach
/me takes away Zach's caffeine.
-hpa
> If the issue is finding some clean and regular way
> to back out from (and possibly reenter) logic
> expressed within nested function invocations, have
> you considered turning the problem inside out and
> using a state machine approach?
I don't see how a state machine will help. But the goal
is not to rewrite emulator.c (this will not be accepted
by kvm maintainers), but to improve it gradually.
--
Gleb.
> But I'd rather implement a new language where acquisition of
> resources such as locks, dynamically allocated objects, and ref
> counts are predicated in the function typing and are heavily
> encouraged to possess defined inverses. Then the closure of a
> particular layer of nesting already has enough information to
> provide release upon escape, and the compiler can easily take the
> burden of checking for a large class of lock and resource violation.
>
> And it would have to be prettier than the current languages that do
> that, meaning operator overloading would be banned. Although it
> would define rational numbers, super-extended precision arithmetic,
> imaginary numbers, quaternions and matrices as part of the spec, so
> there would be no need to use arithmetic overrides anyway, and then
> all the nonsensical operators could die, die, die, especially the
> function () and logical operator overrides.
>
Will your language have a lot of parentheses?
--
Gleb.
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index cfcb6f0..089a405 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -35,6 +35,45 @@
#include "x86.h"
#include "tss.h"
+typedef unsigned long jmp_buf[8];
+int setjmp(jmp_buf);
+void longjmp(jmp_buf, int);
+
+asm (
+" .align 4\n"
+" .type setjmp, @function\n"
+"setjmp:\n"
+" pop %rsi # Return address, and adjust the stack\n"
+" xorl %eax,%eax # Return value\n"
+" movq %rbx,(%rdi)\n"
+" movq %rsp,8(%rdi) # Post-return %rsp!\n"
+" push %rsi # Make the call/return stack happy\n"
+" movq %rbp,16(%rdi)\n"
+" movq %r12,24(%rdi)\n"
+" movq %r13,32(%rdi)\n"
+" movq %r14,40(%rdi)\n"
+" movq %r15,48(%rdi)\n"
+" movq %rsi,56(%rdi) # Return address\n"
+" ret\n"
+" .size setjmp,.-setjmp\n"
+
+" .align 4\n"
+" .type longjmp, @function\n"
+"longjmp:\n"
+" movl %esi,%eax # Return value (int)\n"
+" movq (%rdi),%rbx\n"
+" movq 8(%rdi),%rsp\n"
+" movq 16(%rdi),%rbp\n"
+" movq 24(%rdi),%r12\n"
+" movq 32(%rdi),%r13\n"
+" movq 40(%rdi),%r14\n"
+" movq 48(%rdi),%r15\n"
+" jmp *56(%rdi)\n"
+" .size longjmp,.-longjmp\n"
+ );
+
+static jmp_buf jb;
+
/*
* Opcode effective-address decode tables.
* Note that we only emulate instructions that have at least one memory
@@ -1729,7 +1768,7 @@ static inline int writeback(struct x86_emulate_ctxt *ctxt,
c->dst.bytes,
ctxt->vcpu);
if (rc != X86EMUL_CONTINUE)
- return rc;
+ longjmp(jb, 1);
break;
case OP_NONE:
/* no writeback */
@@ -2391,6 +2430,11 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
memcpy(c->regs, ctxt->vcpu->arch.regs, sizeof c->regs);
saved_eip = c->eip;
+ if (setjmp(jb)) {
+ printk(KERN_ERR"setjump() == 1\n");
+ return 0;
+ }
+
if (ctxt->mode == X86EMUL_MODE_PROT64 && (c->d & No64)) {
kvm_queue_exception(ctxt->vcpu, UD_VECTOR);
goto done;
--
Gleb.
Well, with mmio you'd expect it to happen every read access.
> Although setjmp/longjmp that I know about
> are routine usage. See QEMU TCG main loop or userspace
> thread libraries.
>
Agreed, nothing magical about it.
>> If the issue is finding some clean and regular way
>> to back out from (and possibly reeneter) logic
>> expressed within nested function invocations, have
>> you considered turning the problem inside out and
>> using a state machine approach?
>>
> I don't see how a state machine will help. But the goal
> is not to rewrite emulator.c (this will not be accepted
> by kvm maintainers), but to improve it gradually.
>
That is orthogonal. If we decide a state machine is the best
implementation, then we'll find a way to move over to that. However, I
don't think a state machine is a good representation considering some of
the code paths are very complicated and depend on many memory accesses
(e.g. hardware task switches).
--
error compiling committee.c: too many arguments to function
The setjmp/longjmp implementation should definitely live in arch/*/lib,
even if kvm is the only user.
--
error compiling committee.c: too many arguments to function
--
Obviously.
-hpa
> On 03/02/2010 09:28 AM, Gleb Natapov wrote:
>> On Mon, Mar 01, 2010 at 02:13:32PM -0500, john cooper wrote:
>>
>>> Gleb Natapov wrote:
>>>
>>>
>>>> Think about what happens if in the middle of
>>>> instruction emulation some data from device emulated in userspace is
>>>> needed. Emulator should be able to tell KVM that exit to userspace is
>>>> needed and restart instruction emulation when data is available.
>>>>
>>> setjmp/longjmp are useful constructs in general but
>>> IME are better suited for infrequent exceptions vs.
>>> routine usage.
>>>
>> Exception condition during instruction emulation _is_
>> infrequent.
>
> Well, with mmio you'd expect it to happen every read access.
Of course if you are hitting that kind of case very often
you don't want to do the emulation in the kernel but
in userspace so you don't have to take the context switch
overhead and everything else.
I know that for dosemu, running emulation in userspace was
the difference between a 16 color ega emulation on X
that was unusable and one that was good enough to play video
games like wolfenstein and doom.
Eric
> I know running emulations in userspace was for dosemu
> the difference between a 16 color ega emulation on X
> that was unusable to one that was good enough to play video
> games like wolfenstein and doom.
>
> Eric
--
Gleb.