De-serialization and thread-safety?

Erik Schnetter

Sep 20, 2014, 11:25:00 PM
to julia...@googlegroups.com
I am trying to track down a segfault in a Julia application. Currently I am zooming in on "deserialize", as avoiding calling it seems to reliably cure the problem, while calling it (even if not using the result) seems to reliably trigger the segfault.

I am using many threads (tasks), and deserialize is called concurrently. Is this safe? I've been bitten in the past by this; e.g. I've accidentally added an "info" statement into a sequence of statements that needs to be atomic, and I/O apparently switches threads. Is there a list of known-to-be-safe or known-to-be-unsafe functions? Is deserialization thread-safe in this respect?

I am in particular deserializing function calls and lambda expressions, and I see global variables ("lambda_numbers", "known_lambda_data"). Are the respective data structures (WeakKeyDict and Dict) thread-safe?

Is there a locking mechanism in Julia? This would temporarily only allow a single thread (task) to run, aborting with an error if this thread becomes unrunnable. In other words, calling "yield" when holding a lock would be a no-op.
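
On the locking question: current Julia releases provide `ReentrantLock` in Base, which can serialize calls across tasks. The following is a minimal sketch, assuming a recent Julia 1.x with the `Serialization` standard library (the 2014-era API differed). `DESER_LOCK` and `guarded_deserialize` are hypothetical names, not part of any library. Note that this lock makes other tasks wait rather than turning "yield" into a no-op, and it does not by itself show whether unguarded deserialization is unsafe.

```julia
using Serialization

# Hypothetical global lock; every task must go through guarded_deserialize.
const DESER_LOCK = ReentrantLock()

# Only one task at a time runs deserialize; other tasks block until it returns.
function guarded_deserialize(io::IO)
    lock(DESER_LOCK) do
        deserialize(io)
    end
end

# Round-trip a "function call" (a function plus its arguments) through a buffer.
buf = IOBuffer()
serialize(buf, (+, (1, 2)))
seekstart(buf)
f, args = guarded_deserialize(buf)
result = f(args...)
```

If a task switch happens inside the guarded call, any other task that reaches `guarded_deserialize` waits for the lock instead of entering the region.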

-erik

Tim Holy

Sep 21, 2014, 5:26:27 AM
to julia...@googlegroups.com
Hi Erik,

First, one comment: tasks are not "true" (kernel) threads. Currently a Julia
process is single-threaded. Tasks are better considered as a form of
cooperative multitasking.

Yes, I've also found that I/O causes task switching. I don't personally know a
great way around this. One option would presumably be to have some form of
message queue; I am pretty sure that push!ing a new message on it---as long as
you don't need to touch I/O to create the message---would not cause a switch.
You can also use time() and other markers to indicate the status of control
flow.
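
A rough sketch of the message-queue idea, assuming current Julia 1.x syntax: status messages are pushed onto an in-memory vector inside the critical region and printed only afterwards. `LOG_QUEUE`, `critical_update!`, and `flush_log!` are hypothetical names. That `push!` never triggers a task switch is the assumption stated above, not something this example proves.

```julia
# Hypothetical in-memory message queue; appending to it performs no I/O.
const LOG_QUEUE = String[]

# A region meant to run without interruption: record messages, do not print them.
function critical_update!(state::Dict{Symbol,Int})
    push!(LOG_QUEUE, "enter update at t=$(time())")
    state[:count] = get(state, :count, 0) + 1
    push!(LOG_QUEUE, "leave update, count=$(state[:count])")
    return state
end

# Drain the queue outside the critical region, where a task switch is harmless.
function flush_log!(io::IO=stdout)
    while !isempty(LOG_QUEUE)
        println(io, popfirst!(LOG_QUEUE))
    end
end

state = Dict{Symbol,Int}()
critical_update!(state)
flush_log!()
```

The printing happens in `flush_log!`, so any task switch caused by I/O occurs after the update has completed.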

I haven't been reading things carefully enough to know whether there's any
history behind this, but if you haven't said so already...what does gdb (or
equivalent) say about the segfault?

--Tim

Erik Schnetter

Sep 21, 2014, 10:25:14 AM
to julia...@googlegroups.com
I'm aware that Julia's threads are "green threads". The issue of
thread safety still remains; if one thread is suspended in a critical
region, another can enter that region. Storing handles in global data
structures and incrementing global variables are such actions, and I'm
not 100% sure that the respective regions in serialize.jl are
yield-free, even without my info output. I was surprised to see that
I/O causes task switches -- maybe something else (hashing?
dictionaries? creating new lambdas in C?) also causes task switches?
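
To illustrate the concern, here is a small sketch in current Julia 1.x syntax of a read-modify-write on shared state that goes wrong when a task switch happens in the middle. The explicit `yield()` is an assumed stand-in for whatever call switches tasks (I/O, for instance), and all function names are hypothetical. The second variant holds a `ReentrantLock` across the region.

```julia
# Read-modify-write with a task switch in the middle: updates can be lost.
function unsafe_increment!(counter::Ref{Int})
    old = counter[]
    yield()              # stand-in for I/O or any other call that switches tasks
    counter[] = old + 1
end

# Same sequence, but other tasks must wait for the lock before entering.
function safe_increment!(counter::Ref{Int}, lk::ReentrantLock)
    lock(lk) do
        old = counter[]
        yield()
        counter[] = old + 1
    end
end

# Start n cooperative tasks that each call f, and wait for all of them.
function run_tasks(f, n)
    @sync for _ in 1:n
        @async f()
    end
end

unsafe_counter = Ref(0)
run_tasks(() -> unsafe_increment!(unsafe_counter), 100)

safe_counter = Ref(0)
lk = ReentrantLock()
run_tasks(() -> safe_increment!(safe_counter, lk), 100)

println("without lock: ", unsafe_counter[])
println("with lock:    ", safe_counter[])
```

The unlocked count falls short of 100 because several tasks read the counter before any of them writes it back; the locked variant reaches 100.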

gdb points to memory allocation routines in libc, called from gc.c or
array.c. I assume that something overwrites memory, destroying libc
malloc's data structures, leading to a crash later.

-erik
--
Erik Schnetter <schn...@cct.lsu.edu>
http://www.perimeterinstitute.ca/personal/eschnetter/

Tim Holy

Sep 21, 2014, 1:04:27 PM
to julia...@googlegroups.com
If you have or find a clean example, posting an issue would certainly make sense. I
can't comment on whether the task switch during I/O is inevitable.

--Tim

Erik Schnetter

Sep 21, 2014, 1:44:23 PM
to julia...@googlegroups.com
Unfortunately I don't have a simple example that reproduces the
problem. So far, I've managed to whittle it down to an application
running in a single process without dependencies on external packages.

-erik

Jake Bolewski

Sep 21, 2014, 2:00:38 PM
to julia...@googlegroups.com
I saw a couple of posts back that you are using MPI? Any chance that MPI is issuing a callback on a different thread? This could be an issue with C interop and can sometimes be solved by following the steps in the thread safety section of the manual.

Erik Schnetter

Sep 21, 2014, 2:30:26 PM
to julia...@googlegroups.com
Yes, I thought the same. I thus removed the dependency on MPI; I'm now
serializing and deserializing directly, without using MPI. My current
code is at <https://bitbucket.org/eschnett/funhpc.jl/branch/memdebug>,
and running "julia Wave.jl" triggers the problem reliably in a few
seconds.

The deserialization call is in Comm.jl in the function recv_item.

-erik