Thread safety guarantees

Francesco Biscani

Jul 23, 2019, 6:29:13 AM
to ProjectChrono
Hello,

I need to integrate multiple Chrono systems simultaneously using multiple threads. Is this allowed or are there special considerations I need to be aware of?

My experiments suggest that initialising multiple Chrono systems in parallel is not safe: the backtraces I have collected point to the logging module. Perhaps a global variable that is not initialised in a thread-safe fashion?
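As an illustration of the suspected problem: a global that is lazily created without synchronisation (e.g. `if (!g_logger) g_logger = new Logger;`) is a data race when two threads reach it for the first time simultaneously. Since C++11, a function-local static is initialised exactly once even under concurrent calls. A minimal sketch, where `Logger`, `global_logger` and `init_systems_in_parallel` are hypothetical names and not Chrono's or easylogging's actual code:

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Counts how many times a Logger has been constructed.
std::atomic<int> g_logger_inits{0};

// Hypothetical stand-in for a global logging object.
struct Logger {
    std::mutex m;                    // serialises appends to `lines`
    std::vector<std::string> lines;

    Logger() { ++g_logger_inits; }

    void log(const std::string& s) {
        std::lock_guard<std::mutex> lock(m);
        lines.push_back(s);
    }
};

// Since C++11, initialisation of a function-local static is thread-safe:
// if several threads call this concurrently, exactly one constructs the
// Logger and the others wait until construction has finished.
Logger& global_logger() {
    static Logger instance;
    return instance;
}

// Simulate initialising n "systems" in parallel, each logging one message.
// Returns the number of lines in the shared log afterwards.
std::size_t init_systems_in_parallel(int n) {
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i) {
        threads.emplace_back([i] {
            global_logger().log("init system " + std::to_string(i));
        });
    }
    for (auto& t : threads) t.join();
    return global_logger().lines.size();  // safe: all writers have joined
}
```

`std::call_once` with a `std::once_flag` gives the same guarantee when the initialisation cannot be expressed as a single static.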

Further experiments indicate that another possible problem lies in the profiling framework (of which, curiously, there seem to be two identical, or very similar, copies in different files with different names). I manually defined BT_NO_PROFILE and CH_NO_PROFILE in btQuickprof.h and ChProfiler.h, and that seems to have taken care of some additional issues I was experiencing as well (e.g., memory leaks in the profiling code).

I am now at a point where I no longer see any data races/crashes in my usage scenario, but I am a bit wary that there might be something else I am missing (and, as is well known, threading heisenbugs can be hard to reproduce).

May I ask if the Chrono developers have any guidance to provide in this regard?

Thanks a lot and kind regards,

  Francesco.

Alessandro Tasora

Jul 23, 2019, 9:39:36 AM
to Francesco Biscani, ProjectChrono

Dear Francesco

In fact, except for the logging system, the Chrono library should be thread-safe. (You also noticed the real-time profiler, but that is optional anyway and can be disabled with the flags you mentioned.)

In the past there were some places that used static variables, and other small details that precluded thread safety, but those have now been removed. I am not saying I am 100% sure that Chrono is thread-safe and reentrant, but that is what we aim for.
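To illustrate the kind of static-variable issue described above: a scratch buffer declared `static` inside a function is shared by every instance and every thread, so two systems stepping concurrently can resize and write it at the same time. Making it a per-instance member removes the sharing. A minimal sketch with a hypothetical `Solver` class (not actual Chrono code):

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Unsafe pattern, shown only as a comment: the static buffer is shared by
// every caller on every thread, so concurrent calls race on resize/write.
//
//   double sum_squares_unsafe(const std::vector<double>& x) {
//       static std::vector<double> scratch;  // shared between threads!
//       scratch.resize(x.size());
//       ...
//   }

// Safer pattern: each Solver owns its scratch buffer.
class Solver {
public:
    double sum_squares(const std::vector<double>& x) {
        scratch_.resize(x.size());
        for (std::size_t i = 0; i < x.size(); ++i) scratch_[i] = x[i] * x[i];
        return std::accumulate(scratch_.begin(), scratch_.end(), 0.0);
    }

private:
    std::vector<double> scratch_;  // per-instance working memory
};

// Run one independent Solver per thread, with inputs of different sizes.
// Thread t sums the squares of (100 + t) ones, so its result is 100 + t.
std::vector<double> run_independent(int n_threads) {
    std::vector<double> results(n_threads);
    std::vector<std::thread> threads;
    for (int t = 0; t < n_threads; ++t) {
        threads.emplace_back([t, &results] {
            Solver solver;
            std::vector<double> x(100 + t, 1.0);
            for (int rep = 0; rep < 1000; ++rep) results[t] = solver.sum_squares(x);
        });
    }
    for (auto& th : threads) th.join();
    return results;
}
```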

Thank you for reporting the profiler and logger issues.

Alessandro Tasora

--
You received this message because you are subscribed to the Google Groups "ProjectChrono" group.
To unsubscribe from this group and stop receiving emails from it, send an email to projectchrono+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/projectchrono/4b05305a-fec4-455a-b829-e5a47f6e7754%40googlegroups.com.

Francesco Biscani

Jul 23, 2019, 9:55:04 AM
to Alessandro Tasora, ProjectChrono
Hi Alessandro,

thanks for the reply!

Just to clarify, is my interpretation of the logging system's thread safety in my original mail correct? That is, will I be OK as long as I don't initialise multiple Chrono systems concurrently, or should I take special precautions (for example, globally turning off all logging in some way)?
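A conservative way to follow the "don't initialise concurrently" approach is to guard only construction/initialisation with a global mutex and let each thread step its own system without locking. A minimal sketch, assuming (not verified) that the remaining thread-unsafe code runs only during initialisation; `Sim`, `make_sim` and `run` are hypothetical stand-ins for a Chrono system and its setup:

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for one simulation system.
struct Sim {
    double t = 0.0;
    void step(double dt) { t += dt; }
};

// Global mutex used only to serialise construction/initialisation.
std::mutex g_init_mutex;

// Only one thread at a time may run the (possibly thread-unsafe) setup code.
std::unique_ptr<Sim> make_sim() {
    std::lock_guard<std::mutex> lock(g_init_mutex);
    return std::make_unique<Sim>();
}

// Each thread builds its own system (serialised), then steps it with no locking.
std::vector<double> run(int n_sims, int n_steps, double dt) {
    std::vector<double> final_times(n_sims);
    std::vector<std::thread> threads;
    for (int i = 0; i < n_sims; ++i) {
        threads.emplace_back([&final_times, i, n_steps, dt] {
            auto sim = make_sim();
            for (int k = 0; k < n_steps; ++k) sim->step(dt);
            final_times[i] = sim->t;
        });
    }
    for (auto& th : threads) th.join();
    return final_times;
}
```

The lock is held only for the short setup phase, so the stepping itself still runs fully in parallel.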

I took a quick look at the GitHub page of what I believe is the logging framework you are using, but it was not immediately clear to me how to switch off all logging from within the code...


Cheers,

  Francesco.

Radu Serban

Jul 24, 2019, 2:54:27 PM
to projec...@googlegroups.com

Hi Francesco,

I'm a bit confused: are you using Chrono::Parallel? It doesn't sound like it. easylogging is only used in Chrono::Parallel. If it gives you trouble and you'd like to disable it, you can do what we already do for MSVC (where easylogging does not work); see chrono_parallel/ChParallelDefines.h, lines 44-50. Or simply undefine LOGGINGENABLED (comment out line 55).
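Disabling a logging subsystem through a preprocessor define usually means that every logging macro expands to a no-op when the define is absent, so no shared logging state is touched at run time. A minimal sketch of that general pattern, where `MY_LOG` and `g_log` are hypothetical; how ChParallelDefines.h actually wires LOGGINGENABLED to easylogging is not reproduced here and should be checked in the file itself:

```cpp
#include <cassert>
#include <sstream>

// Hypothetical switch, left undefined here (the line is commented out),
// so logging is compiled away entirely.
// #define LOGGINGENABLED

std::ostringstream g_log;  // hypothetical shared log sink

#ifdef LOGGINGENABLED
// Logging on: every call writes to the shared sink.
#define MY_LOG(msg) (g_log << (msg) << '\n')
#else
// Logging off: the macro expands to a no-op expression.
#define MY_LOG(msg) ((void)0)
#endif

// Some work that logs at start and end.
int solve_with_logging(int n) {
    MY_LOG("solver start");
    int iters = 0;
    for (int i = 0; i < n; ++i) ++iters;
    MY_LOG("solver done");
    return iters;
}
```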

--Radu

Francesco Biscani

Jul 25, 2019, 4:35:59 AM
to Radu Serban, projec...@googlegroups.com
Hi Radu,

I was trying many different setups for my multithreaded simulations, and at some point I probably ended up running multiple Chrono::Parallel systems in different threads. I suspect the logging-related crash happened in one of those trial runs.

Thanks for the suggestion about the LOGGINGENABLED define!

Cheers,

  Francesco.

Francesco Biscani

Aug 2, 2019, 10:59:19 AM
to Radu Serban, projec...@googlegroups.com
Hello Radu,

I have another threading-related question.

For a variety of reasons, I am experimenting with running multiple Chrono::Parallel systems concurrently from multiple threads (specifically, from a TBB parallel_for construct).

This does not work out of the box because, as I understand it, Chrono::Parallel uses OpenMP by default for the CPU backend, and the simulation aborts complaining about nested parallel sections. I tried mucking around a bit with OpenMP environment variables and settings, but did not get very far (I don't touch OpenMP often; I am much more of a TBB type of person).
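A common way to avoid problems with nested parallelism is to give each inner system a thread budget so that (outer tasks) × (inner threads) does not exceed the available hardware threads, which often means simply one inner thread per system when the outer loop is already parallel. A minimal sketch of that arithmetic; the function names are hypothetical, and passing the result to the inner backend (e.g. via OpenMP's `omp_set_num_threads`, or whatever thread-count setting Chrono::Parallel exposes) is an assumption not covered by the code:

```cpp
#include <algorithm>
#include <cassert>
#include <thread>

// Split a thread budget between an outer loop (one task per system) and the
// inner parallel backend of each system, so that outer * inner does not
// exceed the number of hardware threads. Always returns at least 1.
unsigned inner_threads_per_system(unsigned outer_tasks, unsigned hw_threads) {
    if (hw_threads == 0) hw_threads = 1;            // hardware_concurrency() may report 0
    if (outer_tasks == 0) return hw_threads;        // no outer parallelism: use everything
    return std::max(1u, hw_threads / outer_tasks);  // at least one thread per system
}

// Same computation using the thread count reported for this machine.
unsigned inner_threads_for_this_machine(unsigned outer_tasks) {
    return inner_threads_per_system(outer_tasks, std::thread::hardware_concurrency());
}
```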

I noticed, however, that Chrono's build system has an option to enable TBB-based parallelisation instead of the default OpenMP backend. So my first question is: is this option well-tested/production-ready? It did not compile out of the box because some remaining calls to OpenMP functions are not properly ifdef'd out, but apart from that it built, and at least one parallel example/demo ran fine.

However, when I tried to use the TBB-compiled Chrono in my simulations, I quickly started running into memory-related crashes. It was not easy to get a proper backtrace, but I managed to produce one with the address sanitizer:

==26489==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6060000191a0 at pc 0x7f49260fa8c8 bp 0x7f491dde6a70 sp 0x7f491dde6a60
WRITE of size 4 at 0x6060000191a0 thread T9
    #0 0x7f49260fa8c7 in f_Store_AABB_BIN_Intersection /home/yardbird/repos/chrono/src/chrono_parallel/collision/ChBroadphaseUtils.h:274
    #1 0x7f49260fa8c7 in chrono::collision::ChCBroadphase::OneLevelBroadphase() /home/yardbird/repos/chrono/src/chrono_parallel/collision/ChBroadphase.cpp:300
    #2 0x7f4926101757 in chrono::collision::ChCBroadphase::DispatchRigid() /home/yardbird/repos/chrono/src/chrono_parallel/collision/ChBroadphase.cpp:243
    #3 0x7f49261e975a in chrono::collision::ChCollisionSystemParallel::Run() /home/yardbird/repos/chrono/src/chrono_parallel/collision/ChCollisionSystemParallel.cpp:227
    #4 0x7f4925feaa42 in chrono::ChSystemParallel::Integrate_Y() /home/yardbird/repos/chrono/src/chrono_parallel/physics/ChSystemParallel.cpp:115
    #5 0x7f4926e507d2 in chrono::ChSystem::DoStepDynamics(double) /home/yardbird/repos/chrono/src/chrono/physics/ChSystem.cpp:1398
    #6 0x5590db1ee805 in integrate_single_ch_sys /home/yardbird/repos/gcoll/src/main_loop_integrate_ch_sys.cpp:242
    #7 0x5590db2097cd in operator()<tbb::blocked_range<long unsigned int> > /home/yardbird/repos/gcoll/src/main_loop_integrate_ch_sys.cpp:321
    #8 0x5590db209899 in run_body /usr/include/tbb/parallel_for.h:116

I tried to follow the code, but I am a bit at a loss as to how to debug this. At first glance this looks like a data race resulting from shared state being mutated by multiple threads, but I am not sure...
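One way to test the shared-state hypothesis is to run the same set of independent, deterministic simulations once serially and once concurrently and compare the results: if the systems truly share no state, the outputs must be identical. Building with ThreadSanitizer (`-fsanitize=thread` in GCC/Clang) can also report races directly, whereas AddressSanitizer only reports the memory errors a race may cause. A minimal sketch, where `simulate` is a hypothetical deterministic stand-in for stepping one system:

```cpp
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical deterministic "simulation": iterates a linear congruential
// recurrence from a per-run seed (unsigned overflow wraps, which is well-defined).
std::uint64_t simulate(std::uint64_t seed, int steps) {
    std::uint64_t x = seed;
    for (int i = 0; i < steps; ++i)
        x = x * 6364136223846793005ULL + 1442695040888963407ULL;
    return x;
}

// Run the same independent simulations serially and concurrently.
// With no hidden shared state, both runs must produce identical results.
bool serial_matches_concurrent(int n_sims, int steps) {
    std::vector<std::uint64_t> serial(n_sims), concurrent(n_sims);
    for (int i = 0; i < n_sims; ++i) serial[i] = simulate(i, steps);

    std::vector<std::thread> threads;
    for (int i = 0; i < n_sims; ++i)
        threads.emplace_back([&concurrent, i, steps] { concurrent[i] = simulate(i, steps); });
    for (auto& t : threads) t.join();

    return serial == concurrent;
}
```

With real systems, a mismatch (or a crash that appears only in the concurrent run) points to state shared between supposedly independent instances.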

Is there a chance that what I am trying to do might work?

Thanks and kind regards,

  Francesco.