[RFC] Relax handling of missimg symbols when relocating on load

11 views
Skip to first unread message

Waldemar Kozaczuk

unread,
Aug 15, 2019, 11:35:22 PM8/15/19
to osv...@googlegroups.com, Waldemar Kozaczuk
The commit https://github.com/cloudius-systems/osv/commit/f5cc12d56dd986d2c7982c5738b1f859702b07fb
addressed the issue #993 to relax handling of missing symbols in relocate_pltgot()
when loading ELF objects with BIND_NOW.

This patch on other hand attempts to close the gap of handling
missing symbols this time in relocate_rela() during loading phase of ELF objects
by dynamic linker. The solution is similar to what we do in relocate_pltgot()
- let missing symbol get ignored.

Given it is actually an RFC here are some questions:

1) Currently, unlike relocate_pltgot(), relocate_rela() does not take
into account presence of BIND_NOW tag at all and this patch
assumes that BIND_NOW is on. Should we take BIND_NOW into account?
Would it change anything?

2) Should we be letting missing symbol be ignored for all relocation
types below? Maybe not all? What about symbol_other() case?

Fixes #1023

Signed-off-by: Waldemar Kozaczuk <jwkoz...@gmail.com>
---
arch/x64/arch-elf.cc | 41 +++++++++++++++++++++++++++++------------
1 file changed, 29 insertions(+), 12 deletions(-)

diff --git a/arch/x64/arch-elf.cc b/arch/x64/arch-elf.cc
index ebe40996..1d03fa4c 100644
--- a/arch/x64/arch-elf.cc
+++ b/arch/x64/arch-elf.cc
@@ -66,20 +66,28 @@ bool object::arch_relocate_rela(u32 type, u32 sym, void *addr,
case R_X86_64_NONE:
break;
case R_X86_64_COPY: {
- symbol_module sm = symbol_other(sym);
+ symbol_module sm = symbol_other(sym); //TODO
memcpy(addr, sm.relocated_addr(), sm.size());
break;
}
- case R_X86_64_64:
- *static_cast<void**>(addr) = symbol(sym).relocated_addr() + addend;
+ case R_X86_64_64: {
+ auto _sym = symbol(sym, true);
+ if (_sym.symbol) {
+ *static_cast<void**>(addr) = _sym.relocated_addr() + addend;
+ }
break;
+ }
case R_X86_64_RELATIVE:
*static_cast<void**>(addr) = _base + addend;
break;
case R_X86_64_JUMP_SLOT:
- case R_X86_64_GLOB_DAT:
- *static_cast<void**>(addr) = symbol(sym).relocated_addr();
+ case R_X86_64_GLOB_DAT: {
+ auto _sym = symbol(sym, true);
+ if (_sym.symbol) {
+ *static_cast<void**>(addr) = _sym.relocated_addr();
+ }
break;
+ }
// The next 3 types are intended to relocate symbols of thread local variables
// defined with __thread modifier
//
@@ -100,23 +108,32 @@ bool object::arch_relocate_rela(u32 type, u32 sym, void *addr,
} else {
// The thread-local variable being accessed is located
// in DIFFERENT shared object that the caller
- *static_cast<u64*>(addr) = symbol(sym).obj->_module_index;
+ auto _sym = symbol(sym, true);
+ if (_sym.symbol) {
+ *static_cast<u64*>(addr) = _sym.obj->_module_index;
+ }
}
break;
- case R_X86_64_DTPOFF64:
+ case R_X86_64_DTPOFF64: {
// The thread-local variable being accessed is located
// in DIFFERENT shared object that the caller
- *static_cast<u64*>(addr) = symbol(sym).symbol->st_value;
+ auto _sym = symbol(sym, true);
+ if (_sym.symbol) {
+ *static_cast<u64*>(addr) = _sym.symbol->st_value;
+ }
break;
+ }
case R_X86_64_TPOFF64:
// This type is intended to resolve symbols of thread-local variables in static TLS
// accessed in initial-exec mode and is handled to calculate the virtual address of
// target thread-local variable
if (sym) {
- auto sm = symbol(sym);
- sm.obj->alloc_static_tls();
- auto tls_offset = sm.obj->static_tls_end() + sched::kernel_tls_size();
- *static_cast<u64*>(addr) = sm.symbol->st_value + addend - tls_offset;
+ auto sm = symbol(sym, true);
+ if (sm.symbol) {
+ sm.obj->alloc_static_tls();
+ auto tls_offset = sm.obj->static_tls_end() + sched::kernel_tls_size();
+ *static_cast<u64*>(addr) = sm.symbol->st_value + addend - tls_offset;
+ }
} else {
// TODO: Which case does this handle?
alloc_static_tls();
--
2.20.1

Waldek Kozaczuk

unread,
Aug 15, 2019, 11:39:40 PM8/15/19
to OSv Development
Another question: with the patch as is, would missing symbol be handled when invoked same way as with relocate_pltgot() by elf_resolv_pltgot() or would we have to do something special for that?

Waldek 

Waldek Kozaczuk

unread,
Aug 16, 2019, 8:21:19 AM8/16/19
to OSv Development
I think my understanding is that OSv would NOT handle missing symbol gracefully when invoked later if we skipped its resolution per this patch. Unlike relocate_pltgot() there is no equivalent __elf_resolve_pltgot for relocate_rela(). 

As I understand the __elf_resolve_pltgot is used to handle standard lazy symbol binding. Is there something like "lazy relocation"? OSv code seems to be relocating symbols eagerly and does not care about BIND_NOW. Should it change?

I am thinking we need some kind of "missing relocated symbol handler" similar to __elf_resolve_pltgot which we would place address of when symbols are missing during relocation. That new handler would be called and at least print to the user the message that symbol is missing before abort(). Ideally, we should also say WHAT symbol was being invoked and that would complicate things as I am not how we would know that at this point? Would we need the handler per relocation type? For starters in the first iteration, it would be enough to just print a generic message that symbol was missing, no?

Ideas?

Waldek Kozaczuk

unread,
Aug 16, 2019, 11:43:48 PM8/16/19
to OSv Development
So here are some statistics from the real use case: running ffmpeg executable from host as is which actually has sparked my interest in trying to address this issue.

So most of ~100 missing symbols (around 90%) are already handled by delaying binding by arch_relocate_jump_slot(). The remaining ones are relocations this patch delays relocating of:

/usr/lib/libmount.so.1: ignoring missing symbol versionsort of type: 6
/usr/lib/librsvg-2.so.2: ignoring missing symbol posix_spawnattr_destroy of type: 6
/usr/lib/librsvg-2.so.2: ignoring missing symbol posix_spawnattr_setsigmask of type: 6
/usr/lib/librsvg-2.so.2: ignoring missing symbol posix_spawnattr_setsigdefault of type: 6
/usr/lib/librsvg-2.so.2: ignoring missing symbol posix_spawn_file_actions_adddup2 of type: 6
/usr/lib/librsvg-2.so.2: ignoring missing symbol posix_spawn_file_actions_init of type: 6
/usr/lib/librsvg-2.so.2: ignoring missing symbol posix_spawn_file_actions_destroy of type: 6
/usr/lib/librsvg-2.so.2: ignoring missing symbol posix_spawnattr_setflags of type: 6
/usr/lib/librsvg-2.so.2: ignoring missing symbol posix_spawnattr_init of type: 6
/usr/lib/librsvg-2.so.2: ignoring missing symbol posix_spawnp of type: 6
/usr/lib/libgmp.so.10: ignoring missing symbol obstack_vprintf of type: 1
/usr/lib/libasound.so.2: ignoring missing symbol versionsort of type: 6
/usr/lib/libnsl.so.1: ignoring missing symbol _libc_intl_domainname of type: 6
/usr/lib/libasyncns.so.0: ignoring missing symbol __res_query of type: 6
/usr/lib/libasyncns.so.0: ignoring missing symbol __res_search of type: 6

I have tweaked the patch to capture the relocation types and as you can see there are 2 types:
6 - R_X86_64_GLOB_DAT
1 - R_X86_64_64

What I do not understand why most of the R_X86_64_GLOB_DAT relocations reference function symbols when R_X86_64_GLOB_DAT is supposed to denote a global variable reference. Does it mean that R_X86_64_GLOB_DAT does not mean relocating a variable but sometimes a function?

Either way, it makes me think that placing an address of some sort of missing symbol handler function will not work for all the relocations that reference the variables, right?
Reply all
Reply to author
Forward
0 new messages