[PATCH v1 0/2] Better fpu save/restore

20 views
Skip to first unread message

Avi Kivity

unread,
May 27, 2014, 5:12:42 AM5/27/14
to osv...@googlegroups.com
These two patches fix areas where we need to save the fpu. Uncovered by
sse memcpy.

Avi Kivity (2):
fpu: early save/restore in interrupt/exception context
elf: allow elf_resolve_pltgot() to use the fpu

arch/aarch64/exceptions.cc | 2 ++
arch/aarch64/mmu.cc | 5 ++---
arch/x64/arch-cpu.hh | 7 +++++++
arch/x64/elf-dl.S | 17 ++++++++++++-----
arch/x64/exceptions.cc | 4 ++++
arch/x64/mmu.cc | 5 ++---
core/elf.cc | 6 +++++-
core/sched.cc | 19 -------------------
include/osv/sched.hh | 1 -
9 files changed, 34 insertions(+), 32 deletions(-)

--
1.9.3

Avi Kivity

unread,
May 27, 2014, 5:12:44 AM5/27/14
to osv...@googlegroups.com
Since we cannot guarantee that the fpu will not be used in interrupts and
exceptions, we must save it earlier rather than later.

This was discovered with an fpu-based memcpy, but can be triggered in other
ways.

Signed-off-by: Avi Kivity <a...@cloudius-systems.com>
---
arch/aarch64/exceptions.cc | 2 ++
arch/aarch64/mmu.cc | 5 ++---
arch/x64/arch-cpu.hh | 7 +++++++
arch/x64/exceptions.cc | 4 ++++
arch/x64/mmu.cc | 5 ++---
core/sched.cc | 19 -------------------
include/osv/sched.hh | 1 -
7 files changed, 17 insertions(+), 26 deletions(-)

diff --git a/arch/aarch64/exceptions.cc b/arch/aarch64/exceptions.cc
index 2b7e2d0..c92f39e 100644
--- a/arch/aarch64/exceptions.cc
+++ b/arch/aarch64/exceptions.cc
@@ -82,6 +82,8 @@ extern "C" { void interrupt(exception_frame* frame); }

void interrupt(exception_frame* frame)
{
+ sched::fpu_lock fpu;
+ SCOPE_LOCK(fpu);
/* remember frame in a global, need to change if going to nested */
current_interrupt_frame = frame;

diff --git a/arch/aarch64/mmu.cc b/arch/aarch64/mmu.cc
index a54928b..2887eb2 100644
--- a/arch/aarch64/mmu.cc
+++ b/arch/aarch64/mmu.cc
@@ -16,6 +16,8 @@

void page_fault(exception_frame *ef)
{
+ sched::fpu_lock fpu;
+ SCOPE_LOCK(fpu);
debug_early_entry("page_fault");
u64 addr;
asm volatile ("mrs %0, far_el1" : "=r"(addr));
@@ -39,10 +41,7 @@ void page_fault(exception_frame *ef)
assert(!(ef->spsr & processor::daif_i));

DROP_LOCK(irq_lock) {
- sched::arch_fpu fpu;
- fpu.save();
mmu::vm_fault(addr, ef);
- fpu.restore();
}

debug_early("leaving page_fault()\n");
diff --git a/arch/x64/arch-cpu.hh b/arch/x64/arch-cpu.hh
index 6040063..5580666 100644
--- a/arch/x64/arch-cpu.hh
+++ b/arch/x64/arch-cpu.hh
@@ -87,6 +87,13 @@ struct fpu_state_inplace {
typedef save_fpu<fpu_state_alloc_page> arch_fpu;
typedef save_fpu<fpu_state_inplace> inplace_arch_fpu;

+// lock adapter for inplace_arch_fpu
+class fpu_lock {
+ inplace_arch_fpu _state;
+public:
+ void lock() { _state.save(); }
+ void unlock() { _state.restore(); }
+};

inline arch_cpu::arch_cpu()
: gdt{0, 0x00af9b000000ffff, 0x00cf93000000ffff, 0x00cf9b000000ffff,
diff --git a/arch/x64/exceptions.cc b/arch/x64/exceptions.cc
index 501e3de..06bd8db 100644
--- a/arch/x64/exceptions.cc
+++ b/arch/x64/exceptions.cc
@@ -209,6 +209,8 @@ extern "C" { void interrupt(exception_frame* frame); }

void interrupt(exception_frame* frame)
{
+ sched::fpu_lock fpu;
+ SCOPE_LOCK(fpu);
// Rather that force the exception frame down the call stack,
// remember it in a global here. This works because our interrupts
// don't nest.
@@ -255,6 +257,8 @@ extern "C" void nmi(exception_frame* ef)
extern "C"
void general_protection(exception_frame* ef)
{
+ sched::fpu_lock fpu;
+ SCOPE_LOCK(fpu);
if (fixup_fault(ef)) {
return;
}
diff --git a/arch/x64/mmu.cc b/arch/x64/mmu.cc
index cb34d84..38301b3 100644
--- a/arch/x64/mmu.cc
+++ b/arch/x64/mmu.cc
@@ -17,6 +17,8 @@

void page_fault(exception_frame *ef)
{
+ sched::fpu_lock fpu;
+ SCOPE_LOCK(fpu);
sched::exception_guard g;
auto addr = processor::read_cr2();
if (fixup_fault(ef)) {
@@ -33,10 +35,7 @@ void page_fault(exception_frame *ef)

// And since we may sleep, make sure interrupts are enabled.
DROP_LOCK(irq_lock) { // irq_lock is acquired by HW
- sched::inplace_arch_fpu fpu;
- fpu.save();
mmu::vm_fault(addr, ef);
- fpu.restore();
}
}

diff --git a/core/sched.cc b/core/sched.cc
index 4126bda..0abf292 100644
--- a/core/sched.cc
+++ b/core/sched.cc
@@ -216,10 +216,6 @@ static constexpr bool scheduler_uses_fpu = true;

void cpu::reschedule_from_interrupt(bool preempt)
{
- if (scheduler_uses_fpu && preempt) {
- thread::current()->_fpu.save();
- }
-
assert(sched::exception_depth <= 1);
need_reschedule = false;
handle_incoming_wakeups();
@@ -246,9 +242,6 @@ void cpu::reschedule_from_interrupt(bool preempt)
// lowest runtime, and update the timer until the next thread's turn.
if (runqueue.empty()) {
preemption_timer.cancel();
- if (scheduler_uses_fpu && preempt) {
- p->_fpu.restore();
- }
return;
} else {
auto &t = *runqueue.begin();
@@ -258,9 +251,6 @@ void cpu::reschedule_from_interrupt(bool preempt)
if (delta > 0) {
preemption_timer.set(now + delta);
}
- if (scheduler_uses_fpu && preempt) {
- p->_fpu.restore();
- }
return;
}
}
@@ -287,11 +277,6 @@ void cpu::reschedule_from_interrupt(bool preempt)

if (preempt) {
trace_sched_preempt();
- if (!scheduler_uses_fpu) {
- // If runtime is not a float, we only need to save the FPU here,
- // just when deciding to switch threads.
- p->_fpu.save();
- }
}
if (p->_detached_state->st.load(std::memory_order_relaxed) == thread::status::queued
&& p != idle_thread) {
@@ -310,10 +295,6 @@ void cpu::reschedule_from_interrupt(bool preempt)
p->_detached_state->_cpu->terminating_thread->destroy();
p->_detached_state->_cpu->terminating_thread = nullptr;
}
-
- if (preempt) {
- p->_fpu.restore();
- }
}

void cpu::timer_fired()
diff --git a/include/osv/sched.hh b/include/osv/sched.hh
index 05b2b72..e1c851a 100644
--- a/include/osv/sched.hh
+++ b/include/osv/sched.hh
@@ -562,7 +562,6 @@ private:
attr _attr;
int _migration_lock_counter;
arch_thread _arch;
- arch_fpu _fpu;
unsigned int _id;
std::atomic<bool> _interrupted;
std::function<void ()> _cleanup;
--
1.9.3

Avi Kivity

unread,
May 27, 2014, 5:12:45 AM5/27/14
to osv...@googlegroups.com
elf_resolve_pltgot() can be called from anywhere, including with dirty fpu
state. Ensure the fpu is not clobbered during its execution.

Signed-off-by: Avi Kivity <a...@cloudius-systems.com>
---
arch/x64/elf-dl.S | 17 ++++++++++++-----
core/elf.cc | 6 +++++-
2 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/arch/x64/elf-dl.S b/arch/x64/elf-dl.S
index ebb8050..15d5f23 100644
--- a/arch/x64/elf-dl.S
+++ b/arch/x64/elf-dl.S
@@ -20,13 +20,20 @@ __elf_resolve_pltgot:
.cfi_rel_offset %rip, 16
sub $8, %rsp # make room for resolved address
.cfi_adjust_cfa_offset 8
+ pushq_cfi %rbp
+ .cfi_adjust_cfa_offset 8
+ mov %rsp, %rbp
+ .cfi_def_cfa %rbp, 16
+
+ # align stack
+ andq $-16, %rsp
+
pushq_cfi %rax
pushq_cfi %rbx
pushq_cfi %rcx
pushq_cfi %rdx
pushq_cfi %rsi
pushq_cfi %rdi
- pushq_cfi %rbp
pushq_cfi %r8
pushq_cfi %r9
pushq_cfi %r10
@@ -35,10 +42,10 @@ __elf_resolve_pltgot:
pushq_cfi %r13
pushq_cfi %r14
pushq_cfi %r15
- mov 16*8+8(%rsp), %edi
- mov 16*8+0(%rsp), %rsi
+ mov 24(%rbp), %edi
+ mov 16(%rbp), %rsi
call elf_resolve_pltgot
- mov %rax, 15*8(%rsp)
+ mov %rax, 8(%rbp)
popq_cfi %r15
popq_cfi %r14
popq_cfi %r13
@@ -47,12 +54,12 @@ __elf_resolve_pltgot:
popq_cfi %r10
popq_cfi %r9
popq_cfi %r8
- popq_cfi %rbp
popq_cfi %rdi
popq_cfi %rsi
popq_cfi %rdx
popq_cfi %rcx
popq_cfi %rbx
popq_cfi %rax
+ leave
ret $16
.cfi_endproc
diff --git a/core/elf.cc b/core/elf.cc
index b4bd225..a822c4f 100644
--- a/core/elf.cc
+++ b/core/elf.cc
@@ -1311,7 +1311,11 @@ extern "C" { void* elf_resolve_pltgot(unsigned long index, elf::object* obj); }

void* elf_resolve_pltgot(unsigned long index, elf::object* obj)
{
- return obj->resolve_pltgot(index);
+ // symbol resolution of a variable can happen with a dirty fpu
+ sched::fpu_lock fpu;
+ WITH_LOCK(fpu) {
+ return obj->resolve_pltgot(index);
+ }
}

struct module_and_offset {
--
1.9.3

Glauber Costa

unread,
May 27, 2014, 5:27:35 AM5/27/14
to Avi Kivity, Osv Dev
Won't this make some paths - like the fixups - more expensive?

Avi, sorry, but given all this, it is no longer clear for me if the SSE part of evil memcpy would bring us
that much value. No to mention this is messing with the fpu for any memcpy size, whereas the sse problem exists
only for a short range. We have seen, I guess, that a single branch instruction is not *that* expensive for memcpy, so
maybe it is better to branch at memcpy site and only use sse if it is safe?



--
1.9.3

--
You received this message because you are subscribed to the Google Groups "OSv Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to osv-dev+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Avi Kivity

unread,
May 27, 2014, 7:23:22 AM5/27/14
to Glauber Costa, Osv Dev
Fixups are rare and not performance sensitive.  So while this makes them more expensive, we don't care.

Sleeping faults will save and restore the fpu anyway, so they aren't changed by this.

Non-sleeping faults (say a COW or demand-zero) will be made more expensive by about 300 cycles (100-150ns).  That is comparable with a 4K memcpy so it will suffer, but the fault itself will still dominate.  Also COW/demand zero should be rare.

Interrupts that cause a context switch will see no change.  Interrupts that don't will see an extra 300 cycles.  Given that interrupts are very expensive in a virtualized environment, again this should be negligible.


Avi, sorry, but given all this, it is no longer clear for me if the SSE part of evil memcpy would bring us
that much value. No to mention this is messing with the fpu for any memcpy size, whereas the sse problem exists
only for a short range. We have seen, I guess, that a single branch instruction is not *that* expensive for memcpy, so
maybe it is better to branch at memcpy site and only use sse if it is safe?

The problem is not limited to memcpy.  We are making liberal use of library code, and so far we've been lucky that none of it touched the fpu, but there is no guarantee of that.


Commit Bot

unread,
May 28, 2014, 3:55:26 AM5/28/14
to osv...@googlegroups.com
From: Avi Kivity <a...@cloudius-systems.com>

fpu: early save/restore in interrupt/exception context

Since we cannot guarantee that the fpu will not be used in interrupts and
exceptions, we must save it earlier rather than later.

This was discovered with an fpu-based memcpy, but can be triggered in other
ways.

Signed-off-by: Avi Kivity <a...@cloudius-systems.com>

---
diff --git a/arch/aarch64/exceptions.cc b/arch/aarch64/exceptions.cc
--- a/arch/aarch64/exceptions.cc
+++ b/arch/aarch64/exceptions.cc
@@ -82,6 +82,8 @@ extern "C" { void interrupt(exception_frame* frame); }

void interrupt(exception_frame* frame)
{
+ sched::fpu_lock fpu;
+ SCOPE_LOCK(fpu);
/* remember frame in a global, need to change if going to nested */
current_interrupt_frame = frame;

diff --git a/arch/aarch64/mmu.cc b/arch/aarch64/mmu.cc
--- a/arch/aarch64/mmu.cc
+++ b/arch/aarch64/mmu.cc
@@ -16,6 +16,8 @@

void page_fault(exception_frame *ef)
{
+ sched::fpu_lock fpu;
+ SCOPE_LOCK(fpu);
debug_early_entry("page_fault");
u64 addr;
asm volatile ("mrs %0, far_el1" : "=r"(addr));
@@ -39,10 +41,7 @@ void page_fault(exception_frame *ef)
assert(!(ef->spsr & processor::daif_i));

DROP_LOCK(irq_lock) {
- sched::arch_fpu fpu;
- fpu.save();
mmu::vm_fault(addr, ef);
- fpu.restore();
}

debug_early("leaving page_fault()\n");
diff --git a/arch/x64/arch-cpu.hh b/arch/x64/arch-cpu.hh
--- a/arch/x64/arch-cpu.hh
+++ b/arch/x64/arch-cpu.hh
@@ -87,6 +87,13 @@ struct fpu_state_inplace {
typedef save_fpu<fpu_state_alloc_page> arch_fpu;
typedef save_fpu<fpu_state_inplace> inplace_arch_fpu;

+// lock adapter for inplace_arch_fpu
+class fpu_lock {
+ inplace_arch_fpu _state;
+public:
+ void lock() { _state.save(); }
+ void unlock() { _state.restore(); }
+};

inline arch_cpu::arch_cpu()
: gdt{0, 0x00af9b000000ffff, 0x00cf93000000ffff, 0x00cf9b000000ffff,
diff --git a/arch/x64/exceptions.cc b/arch/x64/exceptions.cc
--- a/arch/x64/exceptions.cc
+++ b/arch/x64/exceptions.cc
@@ -209,6 +209,8 @@ extern "C" { void interrupt(exception_frame* frame); }

void interrupt(exception_frame* frame)
{
+ sched::fpu_lock fpu;
+ SCOPE_LOCK(fpu);
// Rather that force the exception frame down the call stack,
// remember it in a global here. This works because our interrupts
// don't nest.
@@ -255,6 +257,8 @@ extern "C" void nmi(exception_frame* ef)
extern "C"
void general_protection(exception_frame* ef)
{
+ sched::fpu_lock fpu;
+ SCOPE_LOCK(fpu);
if (fixup_fault(ef)) {
return;
}
diff --git a/arch/x64/mmu.cc b/arch/x64/mmu.cc
--- a/arch/x64/mmu.cc
+++ b/arch/x64/mmu.cc
@@ -17,6 +17,8 @@

void page_fault(exception_frame *ef)
{
+ sched::fpu_lock fpu;
+ SCOPE_LOCK(fpu);
sched::exception_guard g;
auto addr = processor::read_cr2();
if (fixup_fault(ef)) {
@@ -33,10 +35,7 @@ void page_fault(exception_frame *ef)

// And since we may sleep, make sure interrupts are enabled.
DROP_LOCK(irq_lock) { // irq_lock is acquired by HW
- sched::inplace_arch_fpu fpu;
- fpu.save();
mmu::vm_fault(addr, ef);
- fpu.restore();
}
}

diff --git a/core/sched.cc b/core/sched.cc

Commit Bot

unread,
May 28, 2014, 3:55:26 AM5/28/14
to osv...@googlegroups.com
From: Avi Kivity <a...@cloudius-systems.com>

elf: allow elf_resolve_pltgot() to use the fpu

elf_resolve_pltgot() can be called from anywhere, including with dirty fpu
state. Ensure the fpu is not clobbered during its execution.

Signed-off-by: Avi Kivity <a...@cloudius-systems.com>

---
diff --git a/arch/x64/elf-dl.S b/arch/x64/elf-dl.S
--- a/core/elf.cc
+++ b/core/elf.cc
@@ -1311,7 +1311,11 @@ extern "C" { void* elf_resolve_pltgot(unsigned long
index, elf::object* obj); }

void* elf_resolve_pltgot(unsigned long index, elf::object* obj)
{
- return obj->resolve_pltgot(index);
+ // symbol resolution of a variable can happen with a dirty fpu
+ sched::fpu_lock fpu;

Pekka Enberg

unread,
May 28, 2014, 4:06:08 AM5/28/14
to Avi Kivity, Osv Dev
On Wed, May 28, 2014 at 10:55 AM, Commit Bot <b...@cloudius-systems.com> wrote:
> From: Avi Kivity <a...@cloudius-systems.com>
>
> fpu: early save/restore in interrupt/exception context
>
> Since we cannot guarantee that the fpu will not be used in interrupts and
> exceptions, we must save it earlier rather than later.
>
> This was discovered with an fpu-based memcpy, but can be triggered in other
> ways.
>
> Signed-off-by: Avi Kivity <a...@cloudius-systems.com>

Lets switch back to your evil memcpy?

Avi Kivity

unread,
May 28, 2014, 4:25:55 AM5/28/14
to Pekka Enberg, Osv Dev
Actually I committed this by accident. But yes.

Pekka Enberg

unread,
May 28, 2014, 4:33:58 AM5/28/14
to Avi Kivity, Osv Dev
On Wed, May 28, 2014 at 11:25 AM, Avi Kivity <a...@cloudius-systems.com> wrote:
>> Lets switch back to your evil memcpy?
>
> Actually I committed this by accident. But yes.

Will you be committing a revert of a revert by accident too?

Nadav Har'El

unread,
May 29, 2014, 3:55:39 AM5/29/14
to Avi Kivity, Osv Dev
On Wed, May 28, 2014 at 10:55 AM, Commit Bot <b...@cloudius-systems.com> wrote:
From: Avi Kivity <a...@cloudius-systems.com>

fpu: early save/restore in interrupt/exception context

Since we cannot guarantee that the fpu will not be used in interrupts and
exceptions, we must save it earlier rather than later.

This was discovered with an fpu-based memcpy, but can be triggered in other
ways.

Signed-off-by: Avi Kivity <a...@cloudius-systems.com>

---
diff --git a/arch/aarch64/exceptions.cc b/arch/aarch64/exceptions.cc
--- a/arch/aarch64/exceptions.cc
+++ b/arch/aarch64/exceptions.cc
@@ -82,6 +82,8 @@ extern "C" { void interrupt(exception_frame* frame); }

 void interrupt(exception_frame* frame)
 {
+    sched::fpu_lock fpu;
+    SCOPE_LOCK(fpu);

This uses a significant chunk of the per-thread interrupt stack, so maybe we need to grow it accordingly  - or not, if it was already too big...
The good news is that this won't take more memory from before, since this patch removes the per-thread fpu state.

 
I think now the whole "preempt" parameter has become redundant. Would you mind that I send a patch to remove it? Or do you think it might be useful again in the future to keep this separation?
 
--
You received this message because you are subscribed to the Google Groups "OSv Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to osv-dev+unsubscribe@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Avi Kivity

unread,
May 29, 2014, 4:09:31 AM5/29/14
to Nadav Har'El, Osv Dev

On 05/29/2014 10:55 AM, Nadav Har'El wrote:
On Wed, May 28, 2014 at 10:55 AM, Commit Bot <b...@cloudius-systems.com> wrote:
From: Avi Kivity <a...@cloudius-systems.com>

fpu: early save/restore in interrupt/exception context

Since we cannot guarantee that the fpu will not be used in interrupts and
exceptions, we must save it earlier rather than later.

This was discovered with an fpu-based memcpy, but can be triggered in other
ways.

Signed-off-by: Avi Kivity <a...@cloudius-systems.com>

---
diff --git a/arch/aarch64/exceptions.cc b/arch/aarch64/exceptions.cc
--- a/arch/aarch64/exceptions.cc
+++ b/arch/aarch64/exceptions.cc
@@ -82,6 +82,8 @@ extern "C" { void interrupt(exception_frame* frame); }

 void interrupt(exception_frame* frame)
 {
+    sched::fpu_lock fpu;
+    SCOPE_LOCK(fpu);

This uses a significant chunk of the per-thread interrupt stack, so maybe we need to grow it accordingly  - or not, if it was already too big...
The good news is that this won't take more memory from before, since this patch removes the per-thread fpu state.

We'll actually have to move it right back. xsave state (for xsave/xsaveopt/xrestore/xsavec/xsaves/xrestores) size is determined dynamically and wants 64-byte alignment.
Please do, I hate parameters.

Nadav Har'El

unread,
Jun 1, 2014, 10:15:46 AM6/1/14
to Avi Kivity, Osv Dev
On Wed, May 28, 2014 at 10:55 AM, Commit Bot <b...@cloudius-systems.com> wrote:
From: Avi Kivity <a...@cloudius-systems.com>

elf: allow elf_resolve_pltgot() to use the fpu

elf_resolve_pltgot() can be called from anywhere, including with dirty fpu
state.  Ensure the fpu is not clobbered during its execution.

Signed-off-by: Avi Kivity <a...@cloudius-systems.com>

---
diff --git a/arch/x64/elf-dl.S b/arch/x64/elf-dl.S
--- a/arch/x64/elf-dl.S
+++ b/arch/x64/elf-dl.S
@@ -20,13 +20,20 @@ __elf_resolve_pltgot:
        .cfi_rel_offset %rip, 16
        sub $8, %rsp  # make room for resolved address
        .cfi_adjust_cfa_offset 8
+       pushq_cfi %rbp
+       .cfi_adjust_cfa_offset 8
+       mov %rsp, %rbp
+       .cfi_def_cfa %rbp, 16
+
+       # align stack
+       andq $-16, %rsp
+


Hi Avi, is it possible that after this patch, something is wrong with the dwarf debugging data here? I won't pretend I remember anything from this CFI stuff (once I understood it pretty well, but I no longer do), but the "andq" without a matching cfi thing is most suspicious (?). Here is an example stack trace I'm seeing from inside __elf_resolve_pltgot:

#0  elf::object::resolve_pltgot (this=<optimized out>, index=<optimized out>)
    at /home/nyh/osv/core/elf.cc:574
#1  0x0000000000339050 in elf_resolve_pltgot (index=<optimized out>,
    obj=<optimized out>) at /home/nyh/osv/core/elf.cc:1356
#2  0x0000000000370e82 in __elf_resolve_pltgot ()
    at /home/nyh/osv/arch/x64/elf-dl.S:47
#3  0x0000000000000010 in ?? ()
#4  0x000000000033b12d in elf::program::get_library (this=<optimized out>,
    name=..., extra_path=...) at /home/nyh/osv/core/elf.cc:999

Frame #3 appears crap, and I'm missing a bunch of other frames (in this case I was expecting to see a call to some static constructor, which calls one function, which then calls the function that needs to be resolved so ends up calling __elf_resolve_pltgot().

By the way, I am curious - why is special stack-alignment code actually needed in __elf_resolve_pltgot? I would have thought that this function only gets called when the caller is prepared to call a real function, so the caller would already have ensured that regular ABI guarantees are followed before making this call. Isn't it so?

 
--
You received this message because you are subscribed to the Google Groups "OSv Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to osv-dev+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Avi Kivity

unread,
Jun 1, 2014, 10:22:24 AM6/1/14
to Nadav Har'El, Osv Dev

On 06/01/2014 05:15 PM, Nadav Har'El wrote:

On Wed, May 28, 2014 at 10:55 AM, Commit Bot <b...@cloudius-systems.com> wrote:
From: Avi Kivity <a...@cloudius-systems.com>

elf: allow elf_resolve_pltgot() to use the fpu

elf_resolve_pltgot() can be called from anywhere, including with dirty fpu
state.  Ensure the fpu is not clobbered during its execution.

Signed-off-by: Avi Kivity <a...@cloudius-systems.com>

---
diff --git a/arch/x64/elf-dl.S b/arch/x64/elf-dl.S
--- a/arch/x64/elf-dl.S
+++ b/arch/x64/elf-dl.S
@@ -20,13 +20,20 @@ __elf_resolve_pltgot:
        .cfi_rel_offset %rip, 16
        sub $8, %rsp  # make room for resolved address
        .cfi_adjust_cfa_offset 8
+       pushq_cfi %rbp
+       .cfi_adjust_cfa_offset 8
+       mov %rsp, %rbp
+       .cfi_def_cfa %rbp, 16
+
+       # align stack
+       andq $-16, %rsp
+


Hi Avi, is it possible that after this patch, something is wrong with the dwarf debugging data here?

It's very possible.


I won't pretend I remember anything from this CFI stuff (once I understood it pretty well, but I no longer do), but the "andq" without a matching cfi thing is most suspicious (?). Here is an example stack trace I'm seeing from inside __elf_resolve_pltgot:


We switch cfa to %rbp, which doesn't change on andq %rsp.  But it does mean we need to s/pushq_cfi/pushq/, for the same reason.


#0  elf::object::resolve_pltgot (this=<optimized out>, index=<optimized out>)
    at /home/nyh/osv/core/elf.cc:574
#1  0x0000000000339050 in elf_resolve_pltgot (index=<optimized out>,
    obj=<optimized out>) at /home/nyh/osv/core/elf.cc:1356
#2  0x0000000000370e82 in __elf_resolve_pltgot ()
    at /home/nyh/osv/arch/x64/elf-dl.S:47
#3  0x0000000000000010 in ?? ()
#4  0x000000000033b12d in elf::program::get_library (this=<optimized out>,
    name=..., extra_path=...) at /home/nyh/osv/core/elf.cc:999


Well that's clearly broken.


Frame #3 appears crap, and I'm missing a bunch of other frames (in this case I was expecting to see a call to some static constructor, which calls one function, which then calls the function that needs to be resolved so ends up calling __elf_resolve_pltgot().

By the way, I am curious - why is special stack-alignment code actually needed in __elf_resolve_pltgot? I would have thought that this function only gets called when the caller is prepared to call a real function, so the caller would already have ensured that regular ABI guarantees are followed before making this call. Isn't it so?

Interesting, I didn't think of that at all.  However it's better to be extra safe here and assume nothing.


 
To unsubscribe from this group and stop receiving emails from it, send an email to osv-dev+u...@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Nadav Har'El

unread,
Jun 1, 2014, 10:30:58 AM6/1/14
to Avi Kivity, Osv Dev
On Sun, Jun 1, 2014 at 5:22 PM, Avi Kivity <a...@cloudius-systems.com> wrote:

By the way, I am curious - why is special stack-alignment code actually needed in __elf_resolve_pltgot? I would have thought that this function only gets called when the caller is prepared to call a real function, so the caller would already have ensured that regular ABI guarantees are followed before making this call. Isn't it so?

Interesting, I didn't think of that at all.  However it's better to be extra safe here and assume nothing.

My question also applies to the whole FPU-saving thing too - if the caller was about to call a function, it would also save its own FPU state, and we don't need to save it in the callee...

Did you see a problem before this patch? I wonder what it could be. Can __elf_resolve_pltgot ever be used for something other than function calls? Is it ever used to resolve variable names or something? (I didn't think so, but maybe?)

If this whole patch is not necessary I'm not sure the "extra safe" is warranted here, especially if it creates weird unwinding bugs (although those obviously can be solved).

Avi Kivity

unread,
Jun 1, 2014, 10:39:31 AM6/1/14
to Nadav Har'El, Osv Dev

On 06/01/2014 05:30 PM, Nadav Har'El wrote:
On Sun, Jun 1, 2014 at 5:22 PM, Avi Kivity <a...@cloudius-systems.com> wrote:

By the way, I am curious - why is special stack-alignment code actually needed in __elf_resolve_pltgot? I would have thought that this function only gets called when the caller is prepared to call a real function, so the caller would already have ensured that regular ABI guarantees are followed before making this call. Isn't it so?

Interesting, I didn't think of that at all.  However it's better to be extra safe here and assume nothing.

My question also applies to the whole FPU-saving thing too - if the caller was about to call a function, it would also save its own FPU state, and we don't need to save it in the callee...


No.  If it's calling a function with floating point arguments, then these would be loaded into the fpu registers just prior to calling that function.


Did you see a problem before this patch?

Yes, very early crashes with fpu-using memcpy.


I wonder what it could be. Can __elf_resolve_pltgot ever be used for something other than function calls? Is it ever used to resolve variable names or something? (I didn't think so, but maybe?)


No.


If this whole patch is not necessary I'm not sure the "extra safe" is warranted here, especially if it creates weird unwinding bugs (although those obviously can be solved).


I'll take a stab at fixing it.

Raphael S. Carvalho

unread,
Jun 1, 2014, 10:44:01 AM6/1/14
to Avi Kivity, Nadav Har'El, Osv Dev
On Sun, Jun 1, 2014 at 11:39 AM, Avi Kivity <a...@cloudius-systems.com> wrote:

On 06/01/2014 05:30 PM, Nadav Har'El wrote:
On Sun, Jun 1, 2014 at 5:22 PM, Avi Kivity <a...@cloudius-systems.com> wrote:

By the way, I am curious - why is special stack-alignment code actually needed in __elf_resolve_pltgot? I would have thought that this function only gets called when the caller is prepared to call a real function, so the caller would already have ensured that regular ABI guarantees are followed before making this call. Isn't it so?

Interesting, I didn't think of that at all.  However it's better to be extra safe here and assume nothing.

My question also applies to the whole FPU-saving thing too - if the caller was about to call a function, it would also save its own FPU state, and we don't need to save it in the callee...


No.  If it's calling a function with floating point arguments, then these would be loaded into the fpu registers just prior to calling that function.
I think Nadav means the FPU registers respective to the current function being saved prior to calling a new function (to allow reentrancy).


Did you see a problem before this patch?

Yes, very early crashes with fpu-using memcpy.


I wonder what it could be. Can __elf_resolve_pltgot ever be used for something other than function calls? Is it ever used to resolve variable names or something? (I didn't think so, but maybe?)


No.


If this whole patch is not necessary I'm not sure the "extra safe" is warranted here, especially if it creates weird unwinding bugs (although those obviously can be solved).


I'll take a stab at fixing it.

--
You received this message because you are subscribed to the Google Groups "OSv Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to osv-dev+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Raphael S. Carvalho

Avi Kivity

unread,
Jun 1, 2014, 11:03:49 AM6/1/14
to Raphael S. Carvalho, Nadav Har'El, Osv Dev

On 06/01/2014 05:44 PM, Raphael S. Carvalho wrote:



On Sun, Jun 1, 2014 at 11:39 AM, Avi Kivity <a...@cloudius-systems.com> wrote:

On 06/01/2014 05:30 PM, Nadav Har'El wrote:
On Sun, Jun 1, 2014 at 5:22 PM, Avi Kivity <a...@cloudius-systems.com> wrote:

By the way, I am curious - why is special stack-alignment code actually needed in __elf_resolve_pltgot? I would have thought that this function only gets called when the caller is prepared to call a real function, so the caller would already have ensured that regular ABI guarantees are followed before making this call. Isn't it so?

Interesting, I didn't think of that at all.  However it's better to be extra safe here and assume nothing.

My question also applies to the whole FPU-saving thing too - if the caller was about to call a function, it would also save its own FPU state, and we don't need to save it in the callee...


No.  If it's calling a function with floating point arguments, then these would be loaded into the fpu registers just prior to calling that function.
I think Nadav means the FPU registers respective to the current function being saved prior to calling a new function (to allow reentrancy).


f(1.0) is translated as

  movsd mem, %xmm0
  callq f@plt

this in turn calls __elf_resolve_pltgot, so we enter it with a dirty fpu which must be restored for the eventual call into the real f.

Reply all
Reply to author
Forward
0 new messages