tst-tls-pie.so crashes with kernel compiled with most symbols hidden

22 views
Skip to first unread message

Waldek Kozaczuk

unread,
Jan 5, 2022, 11:56:17 PM1/5/22
to OSv Development
Hi,

When one compiles kernel with most symbols hidden (which also hides completely STD C++ library), the tst-tls-pie.so crashes like so:

#0  0x00000000402da5b2 in processor::cli_hlt () at arch/x64/processor.hh:247
#1  arch::halt_no_interrupts () at arch/x64/arch.hh:48
#2  osv::halt () at arch/x64/power.cc:26
#3  0x00000000402211a0 in abort (fmt=fmt@entry=0x4049aef3 "Aborted\n") at runtime.cc:137
#4  0x00000000402211b2 in abort () at runtime.cc:103
#5  0x000000004039cade in osv::generate_signal (siginfo=..., ef=0xffff800000e87068) at libc/signal.cc:130
#6  0x000000004039cb9f in osv::handle_mmap_fault (addr=addr@entry=18446744073709547520, sig=sig@entry=11, ef=ef@entry=0xffff800000e87068)
    at libc/signal.cc:145
#7  0x000000004028709b in mmu::vm_sigsegv (ef=0xffff800000e87068, addr=18446744073709547520) at core/mmu.cc:1334
#8  mmu::vm_fault (addr=18446744073709547520, addr@entry=18446744073709551592, ef=ef@entry=0xffff800000e87068) at core/mmu.cc:1354
#9  0x00000000402d3d90 in page_fault (ef=0xffff800000e87068) at arch/x64/mmu.cc:42
#10 <signal handler called>
#11 0x0000100000156cf7 in std::ostream::sentry::sentry (this=0x200000200df0, __os=...)
    at /usr/src/debug/gcc-10.3.1-1.fc33.x86_64/obj-x86_64-redhat-linux/x86_64-redhat-linux/libstdc++-v3/include/bits/ostream.tcc:51
#12 0x000010000015743c in std::__ostream_insert<char, std::char_traits<char> > (__out=..., __s=0x40303a "PASS", __n=__n@entry=4)
    at /usr/src/debug/gcc-10.3.1-1.fc33.x86_64/obj-x86_64-redhat-linux/x86_64-redhat-linux/libstdc++-v3/include/bits/ostream_insert.h:82
#13 0x0000000000401ccf in std::operator<< <std::char_traits<char> > (__s=<optimized out>, __out=...)
    at /usr/include/c++/10/bits/char_traits.h:371
#14 report (ok=<optimized out>, msg="v7 in init function") at /home/wkozaczuk/projects/osv-true-master/tests/tst-tls.cc:57
#15 0x00000000004014b3 in before_main () at /home/wkozaczuk/projects/osv-true-master/tests/tst-tls.cc:127
#16 0x00000000402972c5 in elf::object::run_init_funcs (this=0xffffa0000094a600, argc=argc@entry=1, argv=argv@entry=0xffffa0000094a400)
    at core/elf.cc:1178
#17 0x0000000040298a0b in elf::program::init_library (this=<optimized out>, argc=1, argv=0xffffa0000094a400) at core/elf.cc:1500
#18 0x000000004020c239 in osv::application::main (this=0xffffa0000094cc10) at core/app.cc:319
#19 0x0000000040365cb9 in operator() (app=<optimized out>, __closure=0x0) at core/app.cc:236
#20 _FUN () at core/app.cc:238
#21 0x0000000040398a66 in operator() (__closure=0xffffa00000abda00) at libc/pthread.cc:116
#22 std::__invoke_impl<void, pthread_private::pthread::pthread(void* (*)(void*), void*, sigset_t, const pthread_private::thread_attr*)::<lambda()>&> (__f=...) at /usr/include/c++/10/bits/invoke.h:60
#23 std::__invoke_r<void, pthread_private::pthread::pthread(void* (*)(void*), void*, sigset_t, const pthread_private::thread_attr*)::<lambda()>&> (__fn=...) at /usr/include/c++/10/bits/invoke.h:153
#24 std::_Function_handler<void(), pthread_private::pthread::pthread(void* (*)(void*), void*, sigset_t, const pthread_private::thread_attr*)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...) at /usr/include/c++/10/bits/std_function.h:291
#25 0x00000000403391ea in sched::thread::main (this=0xffff800000e82040) at core/sched.cc:1267
#26 sched::thread_main_c (t=0xffff800000e82040) at arch/x64/arch-switch.hh:325
#27 0x00000000402d3b33 in thread_main () at arch/x64/entry.S:116

I have been researching this for a bit and realized that when I change the report() function not use std::cout but instead use printf() like so:

printf("%s: %s\n", ok ? "PASS" : "FAIL", msg.c_str());

the crash goes away.

I knew something was wrong with the initialization of some objects but could not quite pin it down. Then I tried to run the tst-tls-pie.so on the Linux host (it is a pie) and it crashed with a segmentation fault. This made me think that maybe something is wrong with the test program itself. 

Finally, I found this on the internet - https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94810 - where somebody tried to compile and run a program with a similar construct: std::cout used in the __constructor__ annotated function. The stack trace looks very similar as well. The bug was rejected with the explanation that the std::ios_base::Init object may not be initialized yet if used in the __constructor__ annotated function.

What is interesting this test happens to work with kernel exposing all symbols including stdc++ because the kernel copy of std::ios_base::Init was already initialized.

So I think the tst-tls.cc needs to be changed to not to use std:cout in the report function or uses different report() from before_main(). 

Waldek

PS1. Regarding using std:cout in various __constructor__  kernel functions like parse_madt() (which calls debug()) and called by smp_init(), I wonder if we are just lucky that std::ios_base::Init is already initialized and we do not see similar crashes.

PS2: This printout with dynamic linker info suggests that all global objects in stdc++.so should have been initialized before calling before_main(). So why is std::ios_base::Init  not initialized yet?

ELF [tid:26, mod:5, /usr/lib/libstdc++.so.6]: Executing DT_INIT function
ELF [tid:26, mod:5, /usr/lib/libstdc++.so.6]: Finished executing DT_INIT function
ELF [tid:26, mod:5, /usr/lib/libstdc++.so.6]: Executing 12 DT_INIT_ARRAYSZ functions
ELF [tid:26, mod:5, /usr/lib/libstdc++.so.6]: Finished executing 12 DT_INIT_ARRAYSZ functions
ELF [tid:26, mod:4, /tests/libtls.so]: Executing DT_INIT function
ELF [tid:26, mod:4, /tests/libtls.so]: Finished executing DT_INIT function
ELF [tid:26, mod:4, /tests/libtls.so]: Executing 1 DT_INIT_ARRAYSZ functions
ELF [tid:26, mod:4, /tests/libtls.so]: Finished executing 1 DT_INIT_ARRAYSZ functions
ELF [tid:26, mod:3, /usr/lib/libgcc_s.so.1]: Executing DT_INIT function
ELF [tid:26, mod:3, /usr/lib/libgcc_s.so.1]: Finished executing DT_INIT function
ELF [tid:26, mod:3, /usr/lib/libgcc_s.so.1]: Executing 2 DT_INIT_ARRAYSZ functions
ELF [tid:26, mod:3, /usr/lib/libgcc_s.so.1]: Finished executing 2 DT_INIT_ARRAYSZ functions
ELF [tid:26, mod:2, /tests/tst-tls-pie.so]: Executing DT_INIT function
ELF [tid:26, mod:2, /tests/tst-tls-pie.so]: Finished executing DT_INIT function
ELF [tid:26, mod:2, /tests/tst-tls-pie.so]: Executing 3 DT_INIT_ARRAYSZ functions
Aborted

[backtrace]
0x000000004039cadd <???+1077529309>
0x000000004039cb9e <???+1077529502>
0x000000004028709a <???+1076392090>
0x00000000402d3d8f <???+1076706703>
0x00000000402d2bb6 <???+1076702134>
0x00000000004050bf <???+4214975>

Nadav Har'El

unread,
Jan 6, 2022, 4:55:25 AM1/6/22
to Waldek Kozaczuk, OSv Development
On Thu, Jan 6, 2022 at 6:56 AM Waldek Kozaczuk <jwkoz...@gmail.com> wrote:
Hi,

When one compiles kernel with most symbols hidden (which also hides completely STD C++ library), the tst-tls-pie.so crashes like so:

#0  0x00000000402da5b2 in processor::cli_hlt () at arch/x64/processor.hh:247
#1  arch::halt_no_interrupts () at arch/x64/arch.hh:48
#2  osv::halt () at arch/x64/power.cc:26
#3  0x00000000402211a0 in abort (fmt=fmt@entry=0x4049aef3 "Aborted\n") at runtime.cc:137
#4  0x00000000402211b2 in abort () at runtime.cc:103
#5  0x000000004039cade in osv::generate_signal (siginfo=..., ef=0xffff800000e87068) at libc/signal.cc:130
#6  0x000000004039cb9f in osv::handle_mmap_fault (addr=addr@entry=18446744073709547520, sig=sig@entry=11, ef=ef@entry=0xffff800000e87068)
    at libc/signal.cc:145
#7  0x000000004028709b in mmu::vm_sigsegv (ef=0xffff800000e87068, addr=18446744073709547520) at core/mmu.cc:1334
#8  mmu::vm_fault (addr=18446744073709547520, addr@entry=18446744073709551592, ef=ef@entry=0xffff800000e87068) at core/mmu.cc:1354
#9  0x00000000402d3d90 in page_fault (ef=0xffff800000e87068) at arch/x64/mmu.cc:42
#10 <signal handler called>
#11 0x0000100000156cf7 in std::ostream::sentry::sentry (this=0x200000200df0, __os=...)
    at /usr/src/debug/gcc-10.3.1-1.fc33.x86_64/obj-x86_64-redhat-linux/x86_64-redhat-linux/libstdc++-v3/include/bits/ostream.tcc:51

Since your kernel does *not* export the C++ library, where did this symbol come from? Did you build the tests with the host's C++ library?
I'll continue reading with the assumption that you did.

 
#12 0x000010000015743c in std::__ostream_insert<char, std::char_traits<char> > (__out=..., __s=0x40303a "PASS", __n=__n@entry=4)
    at /usr/src/debug/gcc-10.3.1-1.fc33.x86_64/obj-x86_64-redhat-linux/x86_64-redhat-linux/libstdc++-v3/include/bits/ostream_insert.h:82
#13 0x0000000000401ccf in std::operator<< <std::char_traits<char> > (__s=<optimized out>, __out=...)
    at /usr/include/c++/10/bits/char_traits.h:371
#14 report (ok=<optimized out>, msg="v7 in init function") at /home/wkozaczuk/projects/osv-true-master/tests/tst-tls.cc:57
#15 0x00000000004014b3 in before_main () at /home/wkozaczuk/projects/osv-true-master/tests/tst-tls.cc:127
#16 0x00000000402972c5 in elf::object::run_init_funcs (this=0xffffa0000094a600, argc=argc@entry=1, argv=argv@entry=0xffffa0000094a400)
    at core/elf.cc:1178
#17 0x0000000040298a0b in elf::program::init_library (this=<optimized out>, argc=1, argv=0xffffa0000094a400) at core/elf.cc:1500
#18 0x000000004020c239 in osv::application::main (this=0xffffa0000094cc10) at core/app.cc:319
#19 0x0000000040365cb9 in operator() (app=<optimized out>, __closure=0x0) at core/app.cc:236
#20 _FUN () at core/app.cc:238
#21 0x0000000040398a66 in operator() (__closure=0xffffa00000abda00) at libc/pthread.cc:116
#22 std::__invoke_impl<void, pthread_private::pthread::pthread(void* (*)(void*), void*, sigset_t, const pthread_private::thread_attr*)::<lambda()>&> (__f=...) at /usr/include/c++/10/bits/invoke.h:60
#23 std::__invoke_r<void, pthread_private::pthread::pthread(void* (*)(void*), void*, sigset_t, const pthread_private::thread_attr*)::<lambda()>&> (__fn=...) at /usr/include/c++/10/bits/invoke.h:153
#24 std::_Function_handler<void(), pthread_private::pthread::pthread(void* (*)(void*), void*, sigset_t, const pthread_private::thread_attr*)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...) at /usr/include/c++/10/bits/std_function.h:291
#25 0x00000000403391ea in sched::thread::main (this=0xffff800000e82040) at core/sched.cc:1267
#26 sched::thread_main_c (t=0xffff800000e82040) at arch/x64/arch-switch.hh:325
#27 0x00000000402d3b33 in thread_main () at arch/x64/entry.S:116

I have been researching this for a bit and realized that when I change the report() function not use std::cout but instead use printf() like so:

printf("%s: %s\n", ok ? "PASS" : "FAIL", msg.c_str());

the crash goes away.

I knew something was wrong with the initialization of some objects but could not quite pin it down. Then I tried to run the tst-tls-pie.so on the Linux host (it is a pie) and it crashed with a segmentation fault. This made me think that maybe something is wrong with the test program itself. 

Interesting. I remember tests/tst-tls.cc did work on Linux in the past, and the "pie" compilation which you added in 70547a6c64c56cbdf4c5b5d0c7eaa1e85badb0f7 should supposedly have worked too but I don't see that you documented it. Maybe you never ran it on Linux?


Finally, I found this on the internet - https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94810 - where somebody tried to compile and run a program with a similar construct: std::cout used in the __constructor__ annotated function. The stack trace looks very similar as well. The bug was rejected with the explanation that the std::ios_base::Init object may not be initialized yet if used in the __constructor__ annotated function.

What is interesting this test happens to work with kernel exposing all symbols including stdc++ because the kernel copy of std::ios_base::Init was already initialized.

Very interesting, and nice detective work. Yes, I think this is exactly the same bug.
*However*, I'm not sure I understand why it shouldn't have worked:

If test-tls-pie's *depends* on libstdc++.so, unless I'm missing something, OSv should run libstdc++'s constructors *before* it runs test-tls-pie's.
So when tests/tst-tls's constructor function runs, libstdc++ should already have been fully constructed.
I see below you also raised this question....

That being said, I'm probably missing something because if I was correctly, it would run correctly on Linux and https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94810 would not exist...



So I think the tst-tls.cc needs to be changed to not to use std:cout in the report function or uses different report() from before_main(). 

That sounds like a good workaround, but I would have liked to understand better if this problem is expected, or some sort of OSv initialization-order bug.


Waldek

PS1. Regarding using std:cout in various __constructor__  kernel functions like parse_madt() (which calls debug()) and called by smp_init(), I wonder if we are just lucky that std::ios_base::Init is already initialized and we do not see similar crashes.

I think we already ironed out these issues in the kernel log ago. We have an explicit order for some things (see include/osv/prio.hh) and all the C++ code which depends on other stuff being initialized happens after those things. I don't remember the details though..
 

PS2: This printout with dynamic linker info suggests that all global objects in stdc++.so should have been initialized before calling before_main(). So why is std::ios_base::Init  not initialized yet?

ELF [tid:26, mod:5, /usr/lib/libstdc++.so.6]: Executing DT_INIT function
ELF [tid:26, mod:5, /usr/lib/libstdc++.so.6]: Finished executing DT_INIT function
ELF [tid:26, mod:5, /usr/lib/libstdc++.so.6]: Executing 12 DT_INIT_ARRAYSZ functions
ELF [tid:26, mod:5, /usr/lib/libstdc++.so.6]: Finished executing 12 DT_INIT_ARRAYSZ functions
ELF [tid:26, mod:4, /tests/libtls.so]: Executing DT_INIT function
ELF [tid:26, mod:4, /tests/libtls.so]: Finished executing DT_INIT function
ELF [tid:26, mod:4, /tests/libtls.so]: Executing 1 DT_INIT_ARRAYSZ functions
ELF [tid:26, mod:4, /tests/libtls.so]: Finished executing 1 DT_INIT_ARRAYSZ functions
ELF [tid:26, mod:3, /usr/lib/libgcc_s.so.1]: Executing DT_INIT function
ELF [tid:26, mod:3, /usr/lib/libgcc_s.so.1]: Finished executing DT_INIT function
ELF [tid:26, mod:3, /usr/lib/libgcc_s.so.1]: Executing 2 DT_INIT_ARRAYSZ functions
ELF [tid:26, mod:3, /usr/lib/libgcc_s.so.1]: Finished executing 2 DT_INIT_ARRAYSZ functions
ELF [tid:26, mod:2, /tests/tst-tls-pie.so]: Executing DT_INIT function
ELF [tid:26, mod:2, /tests/tst-tls-pie.so]: Finished executing DT_INIT function
ELF [tid:26, mod:2, /tests/tst-tls-pie.so]: Executing 3 DT_INIT_ARRAYSZ functions
Aborted

[backtrace]
0x000000004039cadd <???+1077529309>
0x000000004039cb9e <???+1077529502>
0x000000004028709a <???+1076392090>
0x00000000402d3d8f <???+1076706703>
0x00000000402d2bb6 <???+1076702134>
0x00000000004050bf <???+4214975>

--
You received this message because you are subscribed to the Google Groups "OSv Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to osv-dev+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/osv-dev/d0c7f7c0-91f4-4478-9ccf-c2cfe13c15afn%40googlegroups.com.

Waldek Kozaczuk

unread,
Jan 6, 2022, 1:48:42 PM1/6/22
to OSv Development
I think I have all the answers. See below.

On Thursday, January 6, 2022 at 4:55:25 AM UTC-5 Nadav Har'El wrote:
On Thu, Jan 6, 2022 at 6:56 AM Waldek Kozaczuk <jwkoz...@gmail.com> wrote:
Hi,

When one compiles kernel with most symbols hidden (which also hides completely STD C++ library), the tst-tls-pie.so crashes like so:

#0  0x00000000402da5b2 in processor::cli_hlt () at arch/x64/processor.hh:247
#1  arch::halt_no_interrupts () at arch/x64/arch.hh:48
#2  osv::halt () at arch/x64/power.cc:26
#3  0x00000000402211a0 in abort (fmt=fmt@entry=0x4049aef3 "Aborted\n") at runtime.cc:137
#4  0x00000000402211b2 in abort () at runtime.cc:103
#5  0x000000004039cade in osv::generate_signal (siginfo=..., ef=0xffff800000e87068) at libc/signal.cc:130
#6  0x000000004039cb9f in osv::handle_mmap_fault (addr=addr@entry=18446744073709547520, sig=sig@entry=11, ef=ef@entry=0xffff800000e87068)
    at libc/signal.cc:145
#7  0x000000004028709b in mmu::vm_sigsegv (ef=0xffff800000e87068, addr=18446744073709547520) at core/mmu.cc:1334
#8  mmu::vm_fault (addr=18446744073709547520, addr@entry=18446744073709551592, ef=ef@entry=0xffff800000e87068) at core/mmu.cc:1354
#9  0x00000000402d3d90 in page_fault (ef=0xffff800000e87068) at arch/x64/mmu.cc:42
#10 <signal handler called>
#11 0x0000100000156cf7 in std::ostream::sentry::sentry (this=0x200000200df0, __os=...)
    at /usr/src/debug/gcc-10.3.1-1.fc33.x86_64/obj-x86_64-redhat-linux/x86_64-redhat-linux/libstdc++-v3/include/bits/ostream.tcc:51

Since your kernel does *not* export the C++ library, where did this symbol come from? Did you build the tests with the host's C++ library?
I'll continue reading with the assumption that you did.

It turns out that even though the symbols are hidden and their names do not show up in the stack trace printed by the kernel, there are still in the original elf before stripping so when I connect 
with gdb pointing to loader.elf the symbol names show up nicely.

readelf -sW build/release/loader-stripped.elf | grep run_init_funcs #NOTHING

readelf -sW build/release/loader.elf | grep run_init_funcs
 14789: 0000000040297120   488 FUNC    LOCAL  DEFAULT    4 _ZN3elf6object14run_init_funcsEiPPc


 
#12 0x000010000015743c in std::__ostream_insert<char, std::char_traits<char> > (__out=..., __s=0x40303a "PASS", __n=__n@entry=4)
    at /usr/src/debug/gcc-10.3.1-1.fc33.x86_64/obj-x86_64-redhat-linux/x86_64-redhat-linux/libstdc++-v3/include/bits/ostream_insert.h:82
#13 0x0000000000401ccf in std::operator<< <std::char_traits<char> > (__s=<optimized out>, __out=...)
    at /usr/include/c++/10/bits/char_traits.h:371
#14 report (ok=<optimized out>, msg="v7 in init function") at /home/wkozaczuk/projects/osv-true-master/tests/tst-tls.cc:57
#15 0x00000000004014b3 in before_main () at /home/wkozaczuk/projects/osv-true-master/tests/tst-tls.cc:127
#16 0x00000000402972c5 in elf::object::run_init_funcs (this=0xffffa0000094a600, argc=argc@entry=1, argv=argv@entry=0xffffa0000094a400)
    at core/elf.cc:1178
#17 0x0000000040298a0b in elf::program::init_library (this=<optimized out>, argc=1, argv=0xffffa0000094a400) at core/elf.cc:1500
#18 0x000000004020c239 in osv::application::main (this=0xffffa0000094cc10) at core/app.cc:319
#19 0x0000000040365cb9 in operator() (app=<optimized out>, __closure=0x0) at core/app.cc:236
#20 _FUN () at core/app.cc:238
#21 0x0000000040398a66 in operator() (__closure=0xffffa00000abda00) at libc/pthread.cc:116
#22 std::__invoke_impl<void, pthread_private::pthread::pthread(void* (*)(void*), void*, sigset_t, const pthread_private::thread_attr*)::<lambda()>&> (__f=...) at /usr/include/c++/10/bits/invoke.h:60
#23 std::__invoke_r<void, pthread_private::pthread::pthread(void* (*)(void*), void*, sigset_t, const pthread_private::thread_attr*)::<lambda()>&> (__fn=...) at /usr/include/c++/10/bits/invoke.h:153
#24 std::_Function_handler<void(), pthread_private::pthread::pthread(void* (*)(void*), void*, sigset_t, const pthread_private::thread_attr*)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...) at /usr/include/c++/10/bits/std_function.h:291
#25 0x00000000403391ea in sched::thread::main (this=0xffff800000e82040) at core/sched.cc:1267
#26 sched::thread_main_c (t=0xffff800000e82040) at arch/x64/arch-switch.hh:325
#27 0x00000000402d3b33 in thread_main () at arch/x64/entry.S:116

I have been researching this for a bit and realized that when I change the report() function not use std::cout but instead use printf() like so:

printf("%s: %s\n", ok ? "PASS" : "FAIL", msg.c_str());

the crash goes away.

I knew something was wrong with the initialization of some objects but could not quite pin it down. Then I tried to run the tst-tls-pie.so on the Linux host (it is a pie) and it crashed with a segmentation fault. This made me think that maybe something is wrong with the test program itself. 

Interesting. I remember tests/tst-tls.cc did work on Linux in the past, and the "pie" compilation which you added in 70547a6c64c56cbdf4c5b5d0c7eaa1e85badb0f7 should supposedly have worked too but I don't see that you documented it. Maybe you never ran it on Linux?

I think I never ran it on Linux or if I did and my memory serves me well it did crash but I never got to the bottom of why. For example the tst-getopt-pie.so always worked on Linux.


Finally, I found this on the internet - https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94810 - where somebody tried to compile and run a program with a similar construct: std::cout used in the __constructor__ annotated function. The stack trace looks very similar as well. The bug was rejected with the explanation that the std::ios_base::Init object may not be initialized yet if used in the __constructor__ annotated function.

What is interesting this test happens to work with kernel exposing all symbols including stdc++ because the kernel copy of std::ios_base::Init was already initialized.

Very interesting, and nice detective work. Yes, I think this is exactly the same bug.
*However*, I'm not sure I understand why it shouldn't have worked:

If test-tls-pie's *depends* on libstdc++.so, unless I'm missing something, OSv should run libstdc++'s constructors *before* it runs test-tls-pie's.
So when tests/tst-tls's constructor function runs, libstdc++ should already have been fully constructed.
I see below you also raised this question....

That being said, I'm probably missing something because if I was correctly, it would run correctly on Linux and https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94810 would not exist...

After more research and using the simpler program below (taken from gcc report above), it turns out that std::ios_base::Init::Init() that initializes std::cout is called by an init function that is generated by the compiler as part of the test program itself NOT as we thought as part of libstdc++.so. So indeed it all depends on the order of the init functions in the same object and it seems that the function calling std::ios_base::Init::Init() happens to be after our init function - before_main.

Here is a simpler example:
#include <iostream>

__attribute__((constructor)) void constructor() {
    std::cout << "Starting" << std::endl;
}

int main() {
    std::cout << "Hello, World!" << std::endl;
    return 0;
}

__attribute__((destructor)) void destructor() {
    std::cout << "Finishing" << std::endl;
}

It crashes on Linux and on OSv. When connected with gdb with a breakpoint in elf::run_init_funcs() I see this:

(gdb) p funcs
$2 = (void (**)(int, char **)) 0x403db8
(gdb) p funcs[0]
$3 = (void (*)(int, char **)) 0x401150 <frame_dummy>
(gdb) p funcs[1]
$4 = (void (*)(int, char **)) 0x401156 <constructor()>
(gdb) p funcs[2]
$5 = (void (*)(int, char **)) 0x401230 <_GLOBAL__sub_I__Z11constructorv>
(gdb) disas 0x401156
Dump of assembler code for function _Z11constructorv:
   0x0000000000401156 <+0>:        push   %rbp
   0x0000000000401157 <+1>:        mov    %rsp,%rbp
   0x000000000040115a <+4>:        lea    0xeb0(%rip),%rsi        # 0x402011
   0x0000000000401161 <+11>:        lea    0x2ed8(%rip),%rdi        # 0x404040 <_ZSt4cout@@GLIBCXX_3.4>
   0x0000000000401168 <+18>:        call   0x401040 <_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc@plt>
   0x000000000040116d <+23>:        mov    %rax,%rdx
   0x0000000000401170 <+26>:        mov    0x2e69(%rip),%rax        # 0x403fe0
   0x0000000000401177 <+33>:        mov    %rax,%rsi
   0x000000000040117a <+36>:        mov    %rdx,%rdi
   0x000000000040117d <+39>:        call   0x401050 <_ZNSolsEPFRSoS_E@plt>
   0x0000000000401182 <+44>:        nop
   0x0000000000401183 <+45>:        pop    %rbp
   0x0000000000401184 <+46>:        ret    
End of assembler dump.
(gdb) disas 0x401230
Dump of assembler code for function _GLOBAL__sub_I__Z11constructorv:
   0x0000000000401230 <+0>:        push   %rbp
   0x0000000000401231 <+1>:        mov    %rsp,%rbp
   0x0000000000401234 <+4>:        mov    $0xffff,%esi
   0x0000000000401239 <+9>:        mov    $0x1,%edi
   0x000000000040123e <+14>:        call   0x4011e7 <_Z41__static_initialization_and_destruction_0ii>
   0x0000000000401243 <+19>:        pop    %rbp
   0x0000000000401244 <+20>:        ret    
End of assembler dump.
(gdb) disas _Z41__static_initialization_and_destruction_0ii
Dump of assembler code for function __static_initialization_and_destruction_0(int, int):
   0x00000000402029e0 <+0>:        push   %rbp
   0x00000000402029e1 <+1>:        mov    $0x405910d0,%edi
   0x00000000402029e6 <+6>:        mov    %rsp,%rbp
   0x00000000402029e9 <+9>:        sub    $0x170,%rsp
   0x00000000402029f0 <+16>:        call   0x403e9ec0 <_ZNSt8ios_base4InitC2Ev>
   0x00000000402029f5 <+21>:        mov    $0x40590888,%edx
   0x00000000402029fa <+26>:        mov    $0x405910d0,%esi
   0x00000000402029ff <+31>:        mov    $0x403ea6c0,%edi
   0x0000000040202a04 <+36>:        call   0x40221dc0 <__cxxabiv1::__cxa_atexit(destructor_t, void*, void*)>
   0x0000000040202a09 <+41>:        mov    $0x404a037e,%ecx
   0x0000000040202a0e <+46>:        mov    $0x404a0368,%r8d
   0x0000000040202a14 <+52>:        xor    %esi,%esi
   0x0000000040202a16 <+54>:        mov    $0x404a0598,%edx
   0x0000000040202a1b <+59>:        mov    $0x40591060,%edi
   0x0000000040202a20 <+64>:        call   0x40345500 <_ZN15tracepoint_baseC2EjRKSt9type_infoPKcS4_>
   0x0000000040202a25 <+69>:        mov    $0x40591060,%esi
   0x0000000040202a2a <+74>:        mov    $0x40234ca0,%edi
   0x0000000040202a2f <+79>:        mov    $0x40590888,%edx
   0x0000000040202a34 <+84>:        movq   $0x404a054f,0x38e641(%rip)        # 0x40591080 <_ZL15trace_tcp_state+32>
   0x0000000040202a3f <+95>:        call   0x40221dc0 <__cxxabiv1::__cxa_atexit(destructor_t, void*, void*)>
   0x0000000040202a44 <+100>:        mov    $0x404a03a0,%esi
   0x0000000040202a49 <+105>:        lea    -0x170(%rbp),%rdi
   0x0000000040202a50 <+112>:        mov    $0x2e,%ecx
   0x0000000040202a55 <+117>:        rep movsq %ds:(%rsi),%es:(%rdi)
   0x0000000040202a58 <+120>:        lea    -0x170(%rbp),%rax
   0x0000000040202a5f <+127>:        mov    %rbp,%rsi
   0x0000000040202a62 <+130>:        nopw   0x0(%rax,%rax,1)
   0x0000000040202a68 <+136>:        mov    (%rax),%rdx
   0x0000000040202a6b <+139>:        movzbl 0x8(%rax),%ecx
   0x0000000040202a6f <+143>:        add    $0x10,%rax
   0x0000000040202a73 <+147>:        mov    %cl,0x40590f60(%rdx)
   0x0000000040202a79 <+153>:        cmp    %rsi,%rax
   0x0000000040202a7c <+156>:        jne    0x40202a68 <__static_initialization_and_destruction_0(int, int)+136>
   0x0000000040202a7e <+158>:        leave  
   0x0000000040202a7f <+159>:        ret    
 
As you can see the _GLOBAL__sub_I__Z11constructorv which calls __static_initialization_and_destruction_0 which calls _ZNSt8ios_base4InitC2Ev is after the _Z11constructorv function defined as part of the test program. So indeed the std::cout is not initialized yet at this point.

So as the gcc report comment say, it is NOT valid to use std::cot (and others) in an init function as the order in which these function and generated by the compiler and called in undetermined. In other words that simple sample program above and our tst-tls-pie.so is not a valid C++ program.

The reason it works when symbols are NOT hidden is because we include and export libstdc++.so (statically linked) os the std::cout gets initialized as part of the kernel early on.




So I think the tst-tls.cc needs to be changed to not to use std:cout in the report function or uses different report() from before_main(). 

That sounds like a good workaround, but I would have liked to understand better if this problem is expected, or some sort of OSv initialization-order bug.
I think this is more than a workaround. It is actually fixing an invalid program that just happens to work when all symbols and libstdc++.so is exported. 

Waldek Kozaczuk

unread,
Jan 6, 2022, 2:49:10 PM1/6/22
to OSv Development
Also from this - https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html:

"
constructor
destructor
constructor (priority)
destructor (priority)
The constructor attribute causes the function to be called automatically before execution enters main (). Similarly, the destructor attribute causes the function to be called automatically after main () completes or exit () is called. Functions with these attributes are useful for initializing data that is used implicitly during the execution of the program.

On some targets, the attributes also accept an integer argument to specify a priority to control the order in which constructor and destructor functions are run. A constructor with a smaller priority number runs before a constructor with a larger priority number; the opposite relationship holds for destructors. Note that priorities 0-100 are reserved. So, if you have a constructor that allocates a resource and a destructor that deallocates the same resource, both functions typically have the same priority. The priorities for constructor and destructor functions are the same as those specified for namespace-scope C++ objects (see C++ Attributes). However, at present, the order in which constructors for C++ objects with static storage duration and functions decorated with attribute constructor are invoked is unspecified. In mixed declarations, attribute init_priority can be used to impose a specific ordering.

Using the argument forms of the constructor and destructor attributes on targets where the feature is not supported is rejected with an error."
Reply all
Reply to author
Forward
0 new messages