
Unable to parallelize with high core count


Ryan Kelly

Dec 5, 2024, 12:48:45 PM
to Dedalus Users

Hi all,

 

I'm new to Dedalus and am working up to running some large simulations, so I want to use as many cores as possible. However, I seem to be hitting a limit and can't figure out why. Details below.

 

=========== SYSTEM ===========

TACC Stampede3 cluster:

Skylake - 48 cores/node

Icelake - 80 cores/node

Sapphire Rapids - 112 cores/node

 

Python Version: 3.13.0

MPI Version: Intel MPI v21.11 (Also tried with MVAPICH v4.0.0 and v3.0.0)

OpenSSL Version: 3.4.0 (possibly relevant, given the error message below)

 

=========== PROBLEM SETUP ===========

2D Rayleigh-Benard IVP from the Dedalus examples

I increased the resolution to 2048 x 512 to make sure there are enough grid points for higher core counts; otherwise, I didn't change anything in the code.

 

=========== WORKING ===========

I can run on all cores of a single node for each node type listed above, and I get the expected outputs. I can also run on all 96 cores of 2 Skylake nodes, and on up to 127 cores spread across 2 Icelake, 2 Sapphire Rapids, or 3 Skylake nodes.

I can also use any number of cores (well beyond 128) for other MPI codes written in C, Fortran, and Python, run within the dedalus3 conda environment I created following the installation instructions. So the problem seems specific to Dedalus in my particular configuration.

 

=========== NOT WORKING ===========

When I use 128 or more cores (on any node type), I get the error message below before any output is produced.

=========== ERROR MESSAGE ===========

Traceback (most recent call last):
  File "/scratch/07445/rkelly19/dedalus_examples/RB/rayleigh_benard.py", line 11, in <module>
    import dedalus.public as d3 # type: ignore
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home1/07445/rkelly19/.conda/envs/dedalus3/lib/python3.13/site-packages/dedalus/public.py", line 4, in <module>
    from .core.arithmetic import *
  File "/home1/07445/rkelly19/.conda/envs/dedalus3/lib/python3.13/site-packages/dedalus/core/arithmetic.py", line 17, in <module>
    from .field import Operand, Field
  File "/home1/07445/rkelly19/.conda/envs/dedalus3/lib/python3.13/site-packages/dedalus/core/field.py", line 14, in <module>
    import h5py
  File "/home1/07445/rkelly19/.conda/envs/dedalus3/lib/python3.13/site-packages/h5py/__init__.py", line 25, in <module>
    from . import _errors
ImportError: /opt/apps/xalt/xalt/lib64/libcrypto.so: version `OPENSSL_3.3.0' not found (required by /home1/07445/rkelly19/.conda/envs/dedalus3/lib/python3.13/site-packages/h5py/../../../././libssl.so.3)

=======================================

I have tried everything I can think of on my own, including:

- Reinstalling dedalus3 using conda

- A custom installation of dedalus3 from the GitHub repository

- Changing the MPI modules (mentioned above)

- Installing OpenSSL v3.3.0 to replace the default version (although the environment failed to solve, with no error report beyond the failure itself)

I have also contacted the TACC help desk, but so far without success, so I thought I'd ask this group. Thanks in advance!

- Ryan

Keaton Burns

Dec 6, 2024, 10:50:56 AM
to dedalu...@googlegroups.com
Hi Ryan,

The error indicates that this is an issue in importing the h5py library, not in Dedalus, so I’d suggest searching the h5py mailing list and issue tracker for help.

Best,
-Keaton
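The traceback shows the dynamic loader resolving `libcrypto.so` from `/opt/apps/xalt/xalt/lib64/` instead of from the conda environment that supplied `libssl.so.3`, and that mismatch is what produces the missing `OPENSSL_3.3.0` symbol version. As a minimal, Linux-only sketch (assuming `/proc/self/maps` is available, as on standard Linux compute nodes), the script below prints `LD_PRELOAD` and `LD_LIBRARY_PATH`, which commonly steer library resolution, then lists which `libcrypto`/`libssl` files the Python process actually has mapped. The helper names `mapped_libraries` and `report` are hypothetical, and importing the standard `ssl` module is only a stand-in trigger for loading OpenSSL.

```python
import importlib
import os


def mapped_libraries(maps_text, needle):
    """Return the unique file paths in a /proc/<pid>/maps dump whose file
    name contains `needle` (e.g. "libcrypto")."""
    paths = set()
    for line in maps_text.splitlines():
        fields = line.split()
        # File-backed mappings have 6 fields; the last one is the path.
        if len(fields) >= 6 and needle in os.path.basename(fields[-1]):
            paths.add(fields[-1])
    return sorted(paths)


def report(modules=("ssl",), needles=("libcrypto", "libssl")):
    """Import `modules`, then print loader-related environment variables and
    the OpenSSL libraries mapped into this process."""
    for var in ("LD_PRELOAD", "LD_LIBRARY_PATH"):
        print(f"{var}={os.environ.get(var, '<unset>')}")
    for name in modules:
        try:
            importlib.import_module(name)
        except ImportError as exc:  # e.g. the OPENSSL_3.3.0 error from h5py
            print(f"import {name} failed: {exc}")
    if not os.path.exists("/proc/self/maps"):  # Linux-only check
        print("/proc/self/maps not available on this system")
        return
    with open("/proc/self/maps") as fh:
        maps_text = fh.read()
    for needle in needles:
        for path in mapped_libraries(maps_text, needle):
            print(f"{needle}: {path}")


if __name__ == "__main__":
    # Include "h5py" to see which OpenSSL libraries its import pulls in.
    report(modules=("ssl", "h5py"))
```

Running it in the same job configuration that fails (same modules loaded, same rank count) makes it easier to see whether a library from outside the conda environment is being picked up.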



Ryan Kelly

Dec 6, 2024, 4:04:39 PM
to Dedalus Users
Thanks for pointing that out; I missed that h5py was the root cause. It turns out all I needed to do was force-reinstall h5py, and now it works fine.

Best,
Ryan
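Because the failure only appeared at 128 or more ranks, one way to confirm a fix like this is to have every MPI rank attempt the same imports and report any failures back to rank 0. The script below is a minimal sketch that assumes `mpi4py` is installed in the same environment (Dedalus itself depends on it); the helper names `try_import` and `check_imports` are hypothetical, and the script falls back to behaving as a single process if `mpi4py` cannot be imported.

```python
import importlib
import socket


def try_import(name):
    """Try to import `name`; return (True, "") or (False, error text)."""
    try:
        importlib.import_module(name)
        return True, ""
    except ImportError as exc:  # e.g. the OPENSSL_3.3.0 error in the traceback
        return False, str(exc)


def check_imports(modules=("h5py", "dedalus.public")):
    """Import `modules` on every rank and summarize failures on rank 0.

    Returns True/False on rank 0 (all ranks succeeded or not), None elsewhere.
    """
    try:
        from mpi4py import MPI
        comm = MPI.COMM_WORLD
        rank, size = comm.Get_rank(), comm.Get_size()
    except ImportError:  # no mpi4py: behave like a single rank
        comm, rank, size = None, 0, 1

    failures = {}
    for name in modules:
        ok, err = try_import(name)
        if not ok:
            failures[name] = err
    local = (rank, socket.gethostname(), failures)
    # Collect every rank's (rank, host, failures) tuple on rank 0.
    gathered = comm.gather(local, root=0) if comm is not None else [local]

    if rank == 0:
        bad = [entry for entry in gathered if entry[2]]
        print(f"{size - len(bad)}/{size} ranks imported: {', '.join(modules)}")
        for r, host, errs in bad:
            for name, err in errs.items():
                print(f"  rank {r} on {host}: {name}: {err}")
        return not bad
    return None


if __name__ == "__main__":
    check_imports()
```

Launching it with the same MPI launcher and rank count that previously failed shows whether every rank, on every node, can now import h5py and Dedalus, and which hosts still fail if not.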
