Hi all,
I'm new to Dedalus and I'm trying to work up to running some large simulations. Therefore, I want to be able to utilize as many cores as possible; however, I seem to be reaching a limit and I can't figure out why. Details below.
=========== SYSTEM ===========
TACC Stampede3 cluster:
Skylake - 48 cores/node
Icelake - 80 cores/node
Sapphire Rapids - 112 cores/node
Python Version: 3.13.0
MPI Version: Intel MPI v21.11 (Also tried with MVAPICH v4.0.0 and v3.0.0)
OpenSSL Version: 3.4.0 (This may be relevant based on error message below)
=========== PROBLEM SETUP ===========
2D Rayleigh-Benard IVP from examples
I increased the resolution to 2048 x 512 so that there are enough grid points for higher core counts; otherwise, I didn't change anything in the code.
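As a rough sanity check on the grid size, here is a minimal sketch of how many points each core would hold along each axis. It assumes a plain even block split along a single axis; that split, and the helper name block_sizes, are illustrative assumptions and not necessarily how Dedalus actually lays out the data.

```python
def block_sizes(n_points, n_procs):
    """Split n_points into n_procs contiguous blocks as evenly as possible.

    Hypothetical helper for illustration only; the even block split is an
    assumption, not a statement about Dedalus's internal layout.
    """
    base, extra = divmod(n_points, n_procs)
    # The first `extra` ranks each take one additional point.
    return [base + 1 if rank < extra else base for rank in range(n_procs)]


if __name__ == "__main__":
    # Core counts mentioned in this post, plus one larger target.
    for n_procs in (96, 127, 128, 256):
        x_blocks = block_sizes(2048, n_procs)
        z_blocks = block_sizes(512, n_procs)
        print(
            f"{n_procs} cores: x blocks {min(x_blocks)}-{max(x_blocks)}, "
            f"z blocks {min(z_blocks)}-{max(z_blocks)}"
        )
```

Under that assumption every core still holds at least a few points along either axis at 128 cores, so the grid size alone should not be what stops the run.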
=========== WORKING ===========
I can run on all cores of a single node for every node type listed above, and I get the expected output. I can also run on all 96 cores across 2 Skylake nodes, and on up to 127 cores across 2 Icelake, 2 Sapphire Rapids, or 3 Skylake nodes.
I can also utilize any number of cores (well beyond 128) for other codes which use MPI in C, Fortran, and Python within the dedalus3 conda environment I created based on the installation instructions. So this seems to be something specifically plaguing Dedalus in my particular configuration.
=========== NOT WORKING ===========
When I try 128 or more cores (on any of the node types), I get the error message shown below before any output is produced.
=========== ERROR CODE ===========
Traceback (most recent call last):
  File "/scratch/07445/rkelly19/dedalus_examples/RB/rayleigh_benard.py", line 11, in <module>
    import dedalus.public as d3 # type: ignore
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home1/07445/rkelly19/.conda/envs/dedalus3/lib/python3.13/site-packages/dedalus/public.py", line 4, in <module>
    from .core.arithmetic import *
  File "/home1/07445/rkelly19/.conda/envs/dedalus3/lib/python3.13/site-packages/dedalus/core/arithmetic.py", line 17, in <module>
    from .field import Operand, Field
  File "/home1/07445/rkelly19/.conda/envs/dedalus3/lib/python3.13/site-packages/dedalus/core/field.py", line 14, in <module>
    import h5py
  File "/home1/07445/rkelly19/.conda/envs/dedalus3/lib/python3.13/site-packages/h5py/__init__.py", line 25, in <module>
    from . import _errors
ImportError: /opt/apps/xalt/xalt/lib64/libcrypto.so: version `OPENSSL_3.3.0' not found (required by /home1/07445/rkelly19/.conda/envs/dedalus3/lib/python3.13/site-packages/h5py/../../../././libssl.so.3)
=======================================
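The ImportError above shows the conda environment's libssl.so.3 being paired with a libcrypto.so from /opt/apps/xalt/xalt/lib64 rather than one from the conda environment. One guess, and it is only a guess, is that an XALT directory or library reaches the dynamic loader through LD_LIBRARY_PATH or LD_PRELOAD. A minimal diagnostic sketch under that assumption (the function name find_path_entries and the "xalt" search string are illustrative choices, not part of any tool):

```python
import os


def find_path_entries(env, needle="xalt",
                      variables=("LD_LIBRARY_PATH", "LD_PRELOAD")):
    """Return {variable: [entries containing needle]} for an environment mapping.

    Hypothetical helper: it only inspects environment variables and does not
    ask the dynamic loader which library it actually resolved.
    """
    hits = {}
    for name in variables:
        value = env.get(name, "")
        # LD_LIBRARY_PATH is colon-separated; LD_PRELOAD may use colons or spaces.
        entries = [e for e in value.replace(" ", ":").split(":") if e]
        matches = [e for e in entries if needle in e.lower()]
        if matches:
            hits[name] = matches
    return hits


if __name__ == "__main__":
    found = find_path_entries(os.environ)
    if not found:
        print("No matching entries in LD_LIBRARY_PATH or LD_PRELOAD")
    for name, entries in found.items():
        for entry in entries:
            print(f"{name}: {entry}")
```

Running this with the same launcher at 127 cores and again at 128 cores would show whether the environment seen by the Python processes differs between the working and failing cases.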
I have tried everything I can think of on my own, including:
- Reinstalling dedalus3 using conda
- A custom installation of dedalus3 using the Github repository
- Changing the MPI modules (mentioned above)
- Installing OpenSSL v3.3.0 to replace the default version (although the environment solve failed, with no error report beyond the failure itself)
I have also tried reaching out to the TACC help desk, but so far I've had no luck with them, so I thought I'd throw it out to this group. Thanks in advance!
- Ryan