I found that this strange behavior occurs because the client code crashes after
fork()ing, but before exec()ing the science app. So the forked process gets
a segfault, but the parent process (the core client) keeps running just
fine.
I've yet to find the exact cause.
Stacktrace:
ACTIVE_TASK::start(bool) at client/app_start.C:760
ACTIVE_TASK::resume_or_start(bool) at client/app_start.C:875
CLIENT_STATE::enforce_schedule() at client/cpu_sched.C:986
CLIENT_STATE::poll_slow_events() at client/client_state.C:605
boinc_main_loop() at client/main.C:475
main at client/main.C:738
Something there throws a std::logic_error.
Problem found. getenv returns NULL if the requested environment variable
isn't set. $LD_LIBRARY_PATH isn't set on my machine.
std::string libpath(getenv("LD_LIBRARY_PATH")) makes the constructor throw a
std::logic_error when getenv returns NULL.
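To illustrate the kind of guard that avoids this, here is a minimal sketch. The helper name `env_or` is hypothetical and not part of BOINC or Synecdoche, and the actual fix may look different. The idea is just to check getenv's result before handing it to the std::string constructor, since constructing a std::string from a null pointer is not valid.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper (not from the BOINC/Synecdoche sources).
// Returns the value of the environment variable `name`, or `fallback`
// when the variable is not set, so std::string never sees a null pointer.
std::string env_or(const char* name, const std::string& fallback = "") {
    const char* value = std::getenv(name);  // null if the variable is unset
    return value ? std::string(value) : fallback;
}
```

With something like this, `std::string libpath = env_or("LD_LIBRARY_PATH");` would yield an empty string on a machine where the variable is unset, instead of throwing.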
The question is, why didn't it crash before the r234 change? The original
printf-based code wasn't checking getenv's return value either.
Turns out glibc printf doesn't crash with NULL pointers. printf("%s", NULL)
prints the string "(null)". However, BOINC would have crashed with other C
implementations that don't handle the null pointer this way (like Solaris,
afaik). printf'ing a null pointer is undefined behavior.
I have reported the latter to the boinc_dev mailing list. A fix for the
synecdoche string-based code is coming soon.
> Turns out glibc printf doesn't crash with NULL pointers. printf("%s", NULL)
> prints the string "(null)". However, BOINC would have crashed with other C
> implementations that don't handle the null pointer this way (like Solaris,
> afaik). printf'ing a null pointer is undefined behavior.
That's interesting. But I didn't see anything about this in the manpage
for printf. Seems to be nonstandard behaviour on Linux, which is bad
(but obviously we can't do anything about that).
Anyway, thanks for the fix.
Non-standard behavior for GNU libc, to be more specific.