[llvm-dev] lldb stops on every call to dlopen


Steve Ravet via llvm-dev

Apr 16, 2018, 9:27:14 PM
to llvm...@lists.llvm.org
Hello lldb developers, I am running into a problem with lldb on Linux.  I am currently running llvm 6.0.0.

I have an executable that dynamically loads a large number of shared libraries at runtime.  These are explicitly loaded via dlopen (they are specified in a configuration file), and after loading a few (typically a dozen or so, but the number varies) lldb will halt during dlopen.  If I continue, it will load a few more then halt again, which makes debugging from startup impractical since there are so many libraries to be loaded (more than a hundred of them).

When I build and debug this same C++ on macOS, the debugger works fine.  I have verified that target.process.stop-on-sharedlibrary-events is false. I turned on dyld logging and I see lots of log messages about RendezvousBreakpoint being hit, but I don’t see anything that sheds light on why some libraries load without stopping but others don’t.

I have tried to recreate this in a trivial program that calls dlopen in a loop, but haven’t been able to reproduce it.

Can you offer any suggestions for further debugging this?  More supporting evidence follows.

Here is the message when the debugger stops:

Process 120004 stopped
* thread #1, name = 'xxxxxxxx', stop reason = trace
    frame #0: 0x00002aaaacfca6a0 libc.so.6`__restore_rt
libc.so.6`__restore_rt:
->  0x2aaaacfca6a0 <+0>: movq   $0xf, %rax
    0x2aaaacfca6a7 <+7>: syscall 
    0x2aaaacfca6a9 <+9>: nopl   (%rax)

libc.so.6`__libc_sigaction:
    0x2aaaacfca6b0 <+0>: subq   $0xd0, %rsp

I do not have the stop on shared library events setting enabled:

(lldb) settings show target.process.stop-on-sharedlibrary-events
target.process.stop-on-sharedlibrary-events (boolean) = false



The backtrace goes back to dlopen:

(lldb) bt
* thread #1, name = 'xxxxx', stop reason = trace
  * frame #0: 0x00002aaaacfca6a0 libc.so.6`__restore_rt
    frame #1: 0x00002aaaaaab9eb0 ld-linux-x86-64.so.2
    frame #2: 0x00002aaaaaabdc53 ld-linux-x86-64.so.2`dl_open_worker + 499
    frame #3: 0x00002aaaaaab9286 ld-linux-x86-64.so.2`_dl_catch_error + 102
    frame #4: 0x00002aaaaaabd63a ld-linux-x86-64.so.2`_dl_open + 186
    frame #5: 0x00002aaaac39df66 libdl.so.2`dlopen_doit + 102
    frame #6: 0x00002aaaaaab9286 ld-linux-x86-64.so.2`_dl_catch_error + 102
    frame #7: 0x00002aaaac39e29c libdl.so.2`_dlerror_run + 124
    frame #8: 0x00002aaaac39dee1 libdl.so.2`__dlopen_check + 49

The dyld debug log has a lot of this:
209 intern-state     DynamicLoaderPOSIXDYLD::RendezvousBreakpointHit pid 153501 stop_when_images_change=false
210 intern-state     DynamicLoaderPOSIXDYLD::RendezvousBreakpointHit called for pid 153501
211 intern-state     DYLDRendezvous::Resolve address size: 8, padding 4
212 intern-state     DYLDRendezvous::Resolve cursor = 0x2aaaaaccc160
213 intern-state     DynamicLoaderPOSIXDYLD::RendezvousBreakpointHit pid 153501 stop_when_images_change=false
214 intern-state     DynamicLoaderPOSIXDYLD::RendezvousBreakpointHit called for pid 153501
215 intern-state     DYLDRendezvous::Resolve address size: 8, padding 4
216 intern-state     DYLDRendezvous::Resolve cursor = 0x2aaaaaccc160



thanks,
--steve


 In the woods too, a man casts off his years, as the snake his slough, and at what period soever of life, is always a child.



Pavel Labath via llvm-dev

Apr 17, 2018, 9:00:37 AM
to steve...@apple.com, LLDB, LLVM Dev
[+lldb-dev]

Hello Steve,

thanks for the report.

The fact that you see the rendezvous breakpoint being hit many times is not
surprising. We get those every time a library is loaded (we need that to
load relevant debug info and set potential breakpoints). However, they
should generally not be surfaced to the user (unless you have the
stop-on-sharedlibrary-events setting set, which you don't).

The part that is suspicious to me is that __restore_rt shows up on the top
of the backtrace. This is a trampoline used to return from signal handlers,
and it would seem to indicate that you got some sort of a signal while
loading the libraries. I don't know why this would happen, but it could be
that this is confusing lldb's auto-resume logic.

The interesting part to see here is what lldb thinks are the stop reasons
for individual threads in the process (is the process multi-threaded?) for
the last couple of stops. The "lldb step" and "gdb-remote packets" log
categories are the most interesting to observe here. If you are able to
send me the log traces, I can help you interpret them.
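
As a rough sketch of how those two log categories could be captured, the
commands below enable each one into its own file and then list the stop
reason lldb reports for every thread. The file names under /tmp are
hypothetical placeholders, not anything from this thread; the logs need to
be enabled before the process is launched so that the stops during dlopen
are recorded.

```
# Log file paths are assumptions; pick any writable location.
(lldb) log enable -f /tmp/lldb-step.log lldb step
(lldb) log enable -f /tmp/gdb-remote-packets.log gdb-remote packets
(lldb) run
# After an unexpected stop, show the stop reason for each thread:
(lldb) thread list
```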

regards,
pavel

_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Jim Ingham via llvm-dev

Apr 17, 2018, 12:27:23 PM
to Pavel Labath, LLVM Dev, LLDB, steve...@apple.com
It is interesting that the stop reason on the thread that stopped is "trace". That's what you would expect when returning from the single-step that steps over the breakpoint. It looks like we got a signal while single-stepping, but somebody misreported the stop reason.

Jim

> lldb-dev mailing list
> lldb...@lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev

Steve Ravet via llvm-dev

Apr 17, 2018, 1:28:59 PM
to Jim Ingham, LLVM Dev, LLDB
Pavel asked for a dump of gdb-remote commands.  I got that and ran it through the gdbremote decoder, and trimmed to include what looks like the last successful continue after breakpoint and then the halt on dlopen.  Both cases stop on signal 5.

After the stop message the debugger issues two binary reads and then apparently decides that it should stop rather than continue.  The stopping case is missing the equivalent of "Element 1: Single stepping past breakpoint site 2 at 0x2aaaaaab9eb0", which appears in the continuing case.  I’ve attached the file here:

out