Running Dedalus on ARC4 at the University of Leeds

Andrei Igoshev

Feb 11, 2022, 7:06:22 AM2/11/22
to Dedalus Users
Dear All,

Thank you very much for answering my previous questions.

I am trying to scale up my simulation and run it on the local cluster of the University of Leeds (ARC4).


I updated the variables MPI_PATH, FFTW_PATH, and HDF5_DIR to use the software pre-installed on the cluster. I also added the --user flag to all pip install commands so that packages are installed locally.
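For reference, the kind of configuration I mean looks roughly like the following (the paths here are placeholders, not the actual ARC4 module locations):

```shell
# Hypothetical sketch of the build environment; the real paths depend on
# where ARC4's modules install each library (check with `module show`).
export MPI_PATH=/path/to/cluster/intelmpi
export FFTW_PATH=/path/to/cluster/fftw
export HDF5_DIR=/path/to/cluster/hdf5

# --user installs into ~/.local instead of the system site-packages
pip3 install --user dedalus
```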
However, the tests do not work correctly.

Please find below some first lines showing the error output.

If I run my own code, which works perfectly on my laptop, it starts, but each core computes everything. Dedalus essentially behaves as if multiple copies of the code were running in parallel, each confined to a single core. What can I do to resolve this problem?

python3 -m dedalus test

2022-02-11 12:00:57,995 dedalus 0/1 WARNING :: Threading has not been disabled. This may massively degrade Dedalus performance.

2022-02-11 12:00:57,995 dedalus 0/1 WARNING :: We strongly suggest setting the "OMP_NUM_THREADS" environment variable to "1".

--------------------------------------------------------------------------

A process has executed an operation involving a call to the

"fork()" system call to create a child process.  Open MPI is currently

operating in a condition that could result in memory corruption or

other system errors; your job may hang, crash, or produce silent

data corruption.  The use of fork() (or system() or other calls that

create child processes) is strongly discouraged.


The process that invoked fork was:


  Local host:          [[17667,1],0] (PID 125498)


If you are *absolutely sure* that your application will successfully

and correctly survive a call to fork(), you may disable this warning

by setting the mpi_warn_on_fork MCA parameter to 0.

--------------------------------------------------------------------------

============================================================================================= test session starts ==============================================================================================

platform linux -- Python 3.7.4, pytest-7.0.0, pluggy-1.0.0

benchmark: 3.4.1 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)

rootdir: /home/home01/amtai

plugins: cov-3.0.0, benchmark-3.4.1, parallel-0.1.1

collected 7010 items / 1618 deselected / 5392 selected                                                                                                                                                         

pytest-parallel: 40 workers (processes), 1 test per worker (thread)

Fatal Python error: Segmentation fault


Current thread 0x00007f1d440f4700 (most recent call first):

  File "/home/home01/amtai/.local/lib/python3.7/site-packages/dedalus/core/distributor.py", line 110 in __init__

  File "/home/home01/amtai/.local/lib/python3.7/site-packages/dedalus/tests/test_cartesian_operators.py", line 23 in build_FF

  File "/home/home01/amtai/.local/lib/python3.7/site-packages/dedalus/tools/cache.py", line 86 in __call__

  File "/home/home01/amtai/.local/lib/python3.7/site-packages/dedalus/tests/test_cartesian_operators.py", line 95 in test_skew_explicit

  File "/home/home01/amtai/.local/lib/python3.7/site-packages/_pytest/python.py", line 192 in pytest_pyfunc_call

  File "/home/home01/amtai/.local/lib/python3.7/site-packages/pluggy/_callers.py", line 39 in _multicall

Fatal Python error: Segmentation fault  [further output garbled by interleaved test workers]


Curtis Saxton

Feb 11, 2022, 8:36:44 AM2/11/22
to Dedalus Users
I have installed and used Dedalus extensively on ARC4 (but perhaps not this latest version).  I can think of about six other colleagues who use/used it on ARC4 and ARC3.  Everyone ends up doing an individual user installation.  I would keep mine on the /nobackup disk.  Obstacles to installation arise on ARC4 more often than on ARC3.  I don't think that your problems are specific to the newest version of Dedalus, are they?

For me, it helps *greatly* to unload all modules before starting the installer.  This allows the dedalus installer to use its own preferred FFTW, MPI and HDF5, which minimises conflicts when you're running Dedalus later.  

Later on, our simulation job scripts might need to unload all standard modules and then explicitly load a minimal number of modules needed by dedalus. 
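As a rough sketch of that workflow (the module names and versions here are illustrative; check `module avail` on ARC4 for the real list):

```shell
# Before installing: start from a clean slate so the Dedalus installer
# resolves its own preferred FFTW, MPI, and HDF5.
module purge

# In a job script: load only the minimal set of modules Dedalus needs.
module load licenses sge
module load python/3.7.4
```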

I guess you were one of the masked figures near my research group's luncheon today.  Several other people at that table are well practised at installing Dedalus on ARC4.  It's just a pity that nobody has a permanent office in the covid era.  In pre-covid times, someone would have fixed dedalus for you quickly.

I hope that these hints might be helpful. 
Persist, because it does work on ARC4.

Curtis Saxton

Feb 11, 2022, 8:43:13 AM2/11/22
to Dedalus Users
PS: I've never seen Dedalus on ARC4 run as multiple instances of a single-core process before. Perhaps you need to put "mpirun" or "mpiexec" before "python3"? If so, you will also need to pass options specifying how many cores you are using.


Andrei Igoshev

Feb 11, 2022, 8:56:38 AM2/11/22
to dedalu...@googlegroups.com
Hi Curtis,

>I guess you were one of the masked figures near my research group's luncheon today. 
Thank you for your fast reply! Indeed, it was me.

I do believe it is possible to run Dedalus on ARC4; I probably got some of the configuration wrong.
I load multiple modules:

module list
Currently Loaded Modulefiles:
  1) licenses              2) sge                   3) intel/19.0.4          4) user                  5) fftw/3.3.8            6) hdf5/1.10.5           7) python/3.7.4          8) intelmpi/2019.4.243

Ok, I will try to remove Dedalus and install it with no loaded modules.

P.S. This is how I try to run it for testing purposes:

mpirun -n 4 python3 ambipolar1.py

2022-02-11 13:48:57,415 dedalus 0/1 WARNING :: Threading has not been disabled. This may massively degrade Dedalus performance.

2022-02-11 13:48:57,415 dedalus 0/1 WARNING :: We strongly suggest setting the "OMP_NUM_THREADS" environment variable to "1".

2022-02-11 13:48:57,471 dedalus 0/1 WARNING :: Threading has not been disabled. This may massively degrade Dedalus performance.

2022-02-11 13:48:57,471 dedalus 0/1 WARNING :: We strongly suggest setting the "OMP_NUM_THREADS" environment variable to "1".

2022-02-11 13:48:57,493 dedalus 0/1 WARNING :: Threading has not been disabled. This may massively degrade Dedalus performance.

2022-02-11 13:48:57,493 dedalus 0/1 WARNING :: We strongly suggest setting the "OMP_NUM_THREADS" environment variable to "1".

2022-02-11 13:48:57,502 dedalus 0/1 WARNING :: Threading has not been disabled. This may massively degrade Dedalus performance.

2022-02-11 13:48:57,503 dedalus 0/1 WARNING :: We strongly suggest setting the "OMP_NUM_THREADS" environment variable to "1".

2022-02-11 13:48:57,728 numexpr.utils 0/1 INFO :: Note: NumExpr detected 40 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.

2022-02-11 13:48:57,728 numexpr.utils 0/1 INFO :: Note: NumExpr detected 40 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.

2022-02-11 13:48:57,728 numexpr.utils 0/1 INFO :: NumExpr defaulting to 8 threads.

2022-02-11 13:48:57,728 numexpr.utils 0/1 INFO :: Note: NumExpr detected 40 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.

2022-02-11 13:48:57,728 numexpr.utils 0/1 INFO :: NumExpr defaulting to 8 threads.

2022-02-11 13:48:57,728 numexpr.utils 0/1 INFO :: NumExpr defaulting to 8 threads.

2022-02-11 13:48:57,736 numexpr.utils 0/1 INFO :: Note: NumExpr detected 40 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.

2022-02-11 13:48:57,736 numexpr.utils 0/1 INFO :: NumExpr defaulting to 8 threads.

Numpy version:  1.21.5

Properties of MPI run: ncpu =  1

mesh =  [1]

Folder  run_81_cluster  already exists. We could rewrite some important previos results

Folder  run_81_cluster/init  already exists. We could rewrite some important previos results

Folder  run_81_cluster/B  already exists. We could rewrite some important previos results

Folder  run_81_cluster/A  already exists. We could rewrite some important previos results

Folder  run_81_cluster/B_surface_map  already exists. We could rewrite some important previos results

Folder  run_81_cluster/B_surface  already exists. We could rewrite some important previos results

Numerical parameters of the simulations are as following:

vamb0 =  1.24562826980753e-10

Am0   =  0.03127457063344173

K     =  520.2477434344698

d1    =  0.006420228068296214

d2    =  0.0019221611484528424

d3    =  0.0019221611484528424

d4    =  1.0

s     =  1e-06

2022-02-11 13:49:03,766 subsystems 0/1 INFO :: Building subproblem matrices 1/3160 (~0%) Elapsed: 0s, Remaining: 15m 34s, Rate: 3.4e+00/s

[The same startup block, again reporting "ncpu =  1" and "mesh =  [1]", is then printed independently by each of the remaining three MPI ranks.]


Curtis Saxton

Feb 11, 2022, 9:22:03 AM2/11/22
to Dedalus Users
I've never seen those "threading" warnings before.  Are they entirely responsible for stopping your simulation? 

If you copy and modify somebody else's ARC4 job submission script for dedalus, that might prevent some mysterious hidden problems involving the environment variables.  Everyone has a slightly different method, since everyone figures out a slightly different way of installing dedalus on arc4.  I could name colleagues who might help, or direct you to the local file-paths for some of my own arc4 scripts to copy/modify (but that might be inappropriate or insecure on a public discussion forum?).


Andrei Igoshev

Feb 11, 2022, 9:51:54 AM2/11/22
to dedalu...@googlegroups.com
Ok, I figured out the solution. 

The problem is solved if I use a different MPI:

module remove intelmpi/2019.4.243
module add openmpi/3.1.4

This is despite the fact that I specified the intelmpi path during configuration.
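For anyone hitting the same symptom (every rank reporting ncpu = 1), a quick sanity check before launching a long simulation is to ask each rank for its view of the communicator:

```shell
# Each of the four lines should report size 4. If every line shows size 1,
# Python is linked against a different MPI than the one mpirun belongs to.
mpirun -n 4 python3 -c "from mpi4py import MPI; print(MPI.COMM_WORLD.rank, MPI.COMM_WORLD.size)"
```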

Thank you for your help!

Cheers,
Andrei

Keaton Burns

Feb 11, 2022, 12:07:10 PM2/11/22
to dedalu...@googlegroups.com
Hi Andrei,

Great! Also, yes, the threading warning is new. I strongly recommend testing your scripts with the environment variable OMP_NUM_THREADS=1; I've found it makes a big difference in performance on my systems.
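One way to guarantee this from inside the script itself, assuming the lines run before numpy, numexpr, or dedalus are first imported, is:

```python
import os

# These variables are read once, at import time, so they must be set
# before the numerical libraries are first imported.
os.environ["OMP_NUM_THREADS"] = "1"       # disable OpenMP/BLAS threading
os.environ["NUMEXPR_MAX_THREADS"] = "1"   # silence the NumExpr thread notice

# import numpy, dedalus, ... only after this point
```

Alternatively, export OMP_NUM_THREADS=1 in the job submission script before calling mpirun.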
 
Best,
-Keaton


