brian2 and multiprocessing - general considerations


Wilhelm Braun

Feb 3, 2020, 11:36:08 AM
to Brian Development
Dear all,

I intend to use brian2 (version 2.3) on larger clusters, using Python 3 with multiprocessing. I sometimes observe that, if too many processes are started on different nodes, some of the processes that were started earlier go to sleep and don't wake up again (i.e. the program is not executed anymore). Using

clear_cache('cython')

after cancelling the jobs that have fallen asleep restores normal behavior, and processes can be started again.
Do people have experience in how to use brian2 with multiprocessing?

I am looking forward to any answers and suggestions.

Best, Wilhelm

Marcel Stimberg

Feb 3, 2020, 12:20:56 PM
to brian-de...@googlegroups.com
Hi Wilhelm,

we have a mechanism in place that tries to prevent conflicts between
multiple processes that try to compile the same file (because they use
identical code). Your problem sounds as if there is a problem with this
mechanism, i.e. some deadlock between processes. It is hard to debug a
problem like this, but maybe we could add additional information to the
log file to help.

In the meantime, I can think of two potential workarounds:

1. You run one of your simulations first, so that it can create and
compile all the necessary files. You can then start all of the other
simulations in parallel; they should then all be able to fetch the
compiled files from disk without interfering with each other. In that
case, you could even run them with the
codegen.runtime.cython.multiprocess_safe preference set to False, since
our usual locking mechanism wouldn't be necessary anymore.
2. You can assign each process its own cache directory so that they
are completely independent of each other. You can configure this
directory with the codegen.runtime.cython.cache_dir preference. By
default, all processes use something like ~/.cython/brian_extensions as
their cache.
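
The first workaround could be sketched roughly as follows. This is only an illustration under assumptions: run_simulation, split_warmup, run_all and n_processes are hypothetical names, the model code is left out, and the warm-up only helps if the later runs generate exactly the same code as the first one.

```python
import multiprocessing


def run_simulation(param, use_lock=True):
    """Hypothetical worker: build and run one Brian 2 simulation."""
    # Imported inside the worker so that every process sets its own preferences.
    from brian2 import prefs
    prefs.codegen.target = 'cython'
    # Preference named above; only switch it off once the cache is warm.
    prefs['codegen.runtime.cython.multiprocess_safe'] = use_lock
    # ... build the model and call run(...) here ...
    return param


def split_warmup(params):
    """Return (first, rest): one parameter set to run alone, the others in parallel.

    Assumes that params contains at least one element.
    """
    params = list(params)
    return params[0], params[1:]


def run_all(params, n_processes=4):
    """Run one simulation on its own, then all remaining ones in parallel."""
    first, rest = split_warmup(params)
    with multiprocessing.Pool(n_processes) as pool:
        # apply() blocks until the first run has compiled and cached its files.
        results = [pool.apply(run_simulation, (first,))]
        # The remaining runs should only read from the cache, so locking is off.
        results += pool.starmap(run_simulation, [(p, False) for p in rest])
    return results
```

Calling run_all(list_of_parameters) from a script (under an if __name__ == '__main__': guard) would start the whole batch. Note that multiprocessing.Pool only covers a single node; across nodes the same idea would mean submitting the first job alone and the others after it has finished.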

Best,

  Marcel


Wilhelm Braun

Feb 4, 2020, 7:50:08 AM
to Brian Development
Hi Marcel,

thanks for your input. The error I get when the processes go to sleep is

WARNING    Cannot use Cython, a test compilation failed: [Errno 35] Resource deadlock avoided (OSError) [brian2.codegen.runtime.cython_rt.cython_rt.failed_compile_test]
INFO       Cannot use compiled code, falling back to the numpy code generation target. Note that this will likely be slower than using compiled code. Set the code generation to numpy manually to avoid this message:
prefs.codegen.target = "numpy" [brian2.devices.device.codegen_fallback]

I tried to make the cython cache use a different output directory, using the command you mention directly in the python simulation script, after importing brian2. I then get this error:

codegen.runtime.cython.cache_dir = '/home/wilhelm/projects/test_1'
AttributeError: module 'brian2.codegen.runtime' has no attribute 'cython'

This seems odd and may point to a bigger issue.

Thanks for your answer and suggestions.

Best, Wilhelm

Marcel Stimberg

Feb 4, 2020, 8:09:47 AM
to brian-de...@googlegroups.com

Hi Wilhelm,


> thanks for your input. The error I get when the processes go to sleep is
>
> WARNING    Cannot use Cython, a test compilation failed: [Errno 35] Resource deadlock avoided (OSError) [brian2.codegen.runtime.cython_rt.cython_rt.failed_compile_test]
> INFO       Cannot use compiled code, falling back to the numpy code generation target. Note that this will likely be slower than using compiled code. Set the code generation to numpy manually to avoid this message:
> prefs.codegen.target = "numpy" [brian2.devices.device.codegen_fallback]


Interesting. The deadlock occurs not when compiling the code of your actual model but when compiling the test file used to determine whether Cython is available. You can avoid this compilation by setting

prefs.codegen.target = 'cython'

explicitly (by default, it is "auto"). But there might be something wrong with our implementation of the locking mechanism in general. I'm thinking of replacing it with a somewhat more sophisticated version: https://github.com/benediktschmitt/py-filelock
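
To illustrate what a lock with a timeout buys you (an explicit error instead of a process that sleeps forever), here is a minimal sketch using only the standard library. It is neither Brian's current implementation nor the py-filelock API; lock_file, timeout and poll_interval are names made up for this example.

```python
import os
import time
from contextlib import contextmanager


@contextmanager
def lock_file(path, timeout=10.0, poll_interval=0.05):
    """Hold an exclusive lock that is represented by the existence of `path`."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_CREAT | O_EXCL fails if the file already exists, so only one
            # process at a time can create it successfully.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"could not acquire {path} within {timeout} s")
            time.sleep(poll_interval)
    try:
        yield
    finally:
        # Release the lock by closing and deleting the lock file.
        os.close(fd)
        os.remove(path)
```

A compile step would then be wrapped in "with lock_file(some_path + '.lock'):". The sketch does not handle stale lock files left behind by a crashed process, and exclusive file creation may behave differently on network file systems, which matters on a cluster.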


> I tried to make the cython cache use a different output directory, using the command you mention directly in the python simulation script, after importing brian2. I then get this error:
>
> codegen.runtime.cython.cache_dir = '/home/wilhelm/projects/test_1'
> AttributeError: module 'brian2.codegen.runtime' has no attribute 'cython'

codegen.runtime.cython.cache_dir is a preference, so you have to set it via the prefs object:

prefs.codegen.runtime.cython.cache_dir = ...

or

prefs['codegen.runtime.cython.cache_dir'] = ...
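
Combined with the idea of one cache directory per process, this could look roughly like the following sketch. private_cache_dir and configure_brian are hypothetical helper names, and the directory naming scheme (host name plus process ID under the system's temporary directory) is just one possible choice.

```python
import os
import socket
import tempfile


def private_cache_dir(base=None):
    """Return (and create) a cache directory unique to this host and process."""
    base = base or tempfile.gettempdir()
    name = f"brian_cython_{socket.gethostname()}_{os.getpid()}"
    path = os.path.join(base, name)
    os.makedirs(path, exist_ok=True)
    return path


def configure_brian():
    """Call at the start of each worker, before any simulation is set up."""
    from brian2 import prefs
    # Setting the target explicitly avoids the Cython test compilation.
    prefs.codegen.target = 'cython'
    # Each process compiles into its own directory, so no lock is shared.
    prefs['codegen.runtime.cython.cache_dir'] = private_cache_dir()
```

The price is that every process compiles all of its code itself and nothing is reused between processes; the directories also have to be cleaned up at some point.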


Best,

  Marcel


Marcel Stimberg

Feb 4, 2020, 9:33:30 AM
to brian-de...@googlegroups.com
Hi again,

I swapped out the file locking mechanism we used. I'd be grateful if you
could try it out and see whether it fixes the issue for you. You'd have
to install brian2 from the filelock branch, e.g. by using

pip install https://github.com/brian-team/brian2/archive/filelock.zip

If you installed brian2 via conda, I'd recommend uninstalling it first.

Best,

  Marcel


Wilhelm Braun

Feb 4, 2020, 10:08:07 AM
to Brian Development
Thanks for this suggestion, I did another install of brian2 using your link in a new virtual env and it worked. As a small detail, I now get a lot of these messages:

INFO       Lock 139655436729872 acquired on /home/wilhelm/.cython/brian_extensions/_cython_magic_ffbd4e816d2c35fa3fa6d87e8b22f05d.lock [brian2.utils.filelock]
INFO       Lock 139655436729872 released on /home/wilhelm/.cython/brian_extensions/_cython_magic_ffbd4e816d2c35fa3fa6d87e8b22f05d.lock [brian2.utils.filelock]

which can be switched off using

logging.console_log_level = 'CRITICAL'

or similar.
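
A narrower alternative that keeps other warnings visible might be to raise the threshold only for the logger that emits the lock messages. The sketch below assumes that these messages go through Python's standard logging module under the name shown in square brackets (brian2.utils.filelock); quiet_logger is a hypothetical helper name.

```python
import logging


def quiet_logger(name="brian2.utils.filelock", level=logging.WARNING):
    """Raise the threshold of one named logger and leave all others untouched."""
    logger = logging.getLogger(name)
    # Records below `level` are dropped by this logger before reaching any handler.
    logger.setLevel(level)
    return logger
```

Calling quiet_logger() once after importing brian2 should then hide the INFO lines about acquired and released locks, if the assumption above holds.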

What also worked with the old installation of brian2 was setting

prefs.codegen.target = 'numpy'

but this slows down the simulations by approx. a factor of 2.


Best,

Wilhelm

Marcel Stimberg

Feb 4, 2020, 10:12:29 AM
to brian-de...@googlegroups.com
> Thanks for this suggestion, I did another install of brian2 using your link in a new virtual env and it worked. As a small detail, I now get a lot of these messages:
>
> INFO       Lock 139655436729872 acquired on /home/wilhelm/.cython/brian_extensions/_cython_magic_ffbd4e816d2c35fa3fa6d87e8b22f05d.lock [brian2.utils.filelock]
> INFO       Lock 139655436729872 released on /home/wilhelm/.cython/brian_extensions/_cython_magic_ffbd4e816d2c35fa3fa6d87e8b22f05d.lock [brian2.utils.filelock]

Oh, we definitely don't want this. Filelock logs these messages at the info level, which we display by default. I'll downgrade them to debug instead.

Are you confident that the new version fixes all your deadlock issues (i.e., could you reliably trigger them before?) or do you need to test for a longer time?

Best,

  Marcel


Wilhelm Braun

Feb 4, 2020, 10:37:43 AM
to Brian Development
Ok, great.

Sorry, I have to contradict my last post. The problem is not yet completely solved: I can now run simulations on larger parts of my cluster, but I am not sure how large these parts can be. I'll look into it in more depth and then let you know.

Best, Wilhelm

Wilhelm Braun

Feb 5, 2020, 5:53:58 AM
to Brian Development
Hi,

another observation is that with the new Filelock, there seems to be a lot of overhead, i.e. the processes are constantly starting and stopping, which really slows down the computation. I don't understand why this sometimes happens, and sometimes not, even after clearing the cython cache.

Best, Wilhelm

Marcel Stimberg

Feb 5, 2020, 12:34:04 PM
to brian-de...@googlegroups.com
Hi Wilhelm,

I opened a new GitHub issue about this topic. Let's continue discussing
there: https://github.com/brian-team/brian2/issues/1154

> another observation is that with the new Filelock, there seems to be a
> lot of overhead, i.e. the processes are constantly starting and
> stopping, which really slows down the computation. I don't understand
> why this sometimes happens, and sometimes not, even after clearing the
> cython cache.


Do you mean that processes are starting and stopping during the run? Or
are your runs so short that this is difficult to say? All the
compilation/locking should only be relevant at the very start of a run,
after that it should run all code from memory.

Best,

  Marcel

Wilhelm Braun

Feb 5, 2020, 2:31:51 PM
to Brian Development
Hi Marcel,

I hope it's ok to answer your last question below here.

> Do you mean that processes are starting and stopping during the run? Or
> are your runs so short that this is difficult to say?

I observe two kinds of erroneous behavior. The first is that after starting many processes, they all go to sleep and never start. The second is that some start, and execute, but not all, which slows down the computation.

On a cluster with many nodes, what I also observed was that there is a 'critical' number of processes I can start on different nodes, all using the same cython cache directory, before all processes fall asleep indefinitely. All of this happens with the brian2 version using the new filelock.

Best, Wilhelm