Hello,
I am attempting to configure / build UPC++ (master branch) on LLNL Lassen (Summit testbed).
UPC++ configures, builds, and links, but it does not show correct scaling behavior.
I believe the `--with-pmirun-cmd` option / argument is the issue: no obvious variation on the value shown in the configure command below yields correct behavior.
My understanding is that the library has been used successfully for applications at scale (MetaHipMer) on Summit.
I need to use UPC++ on Lassen, so I’m hoping someone can help me figure out how to make the library work on Lassen.
FWIW, on an AMD / Flux machine (Tioga), the summary at the end of configure DOES indicate that the UPC++ build has PMI support ("PMI support: yes (Cray)"); on the older, presumably more established, CUDA machine (Lassen) it reports "PMI support: no".
I’ve also pasted below a possibly relevant email on the same topic from John Gyllenhall, sent in 2023. The issue appeared to be resolved for a time, but that might simply be down to my not having tested the installation rigorously.
I would be grateful for any help you could offer on getting UPC++ correctly configured, built, and running on Lassen.
Best,
AJP
#UPC++ Configure
../configure \
--with-cxx=mpicxx \
--with-cc=mpicc \
--with-pmirun-cmd="jsrun -p %N %C" \
--disable-pshm-posix \
--enable-pshm-sysv \
--disable-smp \
--enable-udp \
--enable-mpi \
--enable-cuda \
--with-default-network=ibv \
--enable-ibv \
--with-ibv-physmem-max=3/4 \
--enable-ibv-multirail \
--with-ibv-max-hcas=4 \
--with-ibv-ports="mlx5_0+mlx5_3" \
--disable-ibv-odp \
--disable-ibv-conn-thread \
--disable-ibv-rcv-thread \
--with-cxxflags=-std=c++17 \
--with-gasnet=https://bitbucket.org/berkeleylab/gasnet/downloads/GASNet-stable.tar.gz \
--prefix=${INSTALL_DIR}
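For reference, a minimal sketch of how the launcher template above is expected to expand at run time. It assumes that `%N` stands for the total process count and `%C` for the program plus its arguments; that reading of the placeholders should be checked against the GASNet spawner documentation. The helper name `expand_template` and the program name `./hello-upcxx` are hypothetical, used only for illustration.

```shell
# Hypothetical helper: mimic the ASSUMED placeholder expansion of the
# --with-pmirun-cmd template (%N -> process count, %C -> command line).
# This is not GASNet code; it only illustrates the expected result.
expand_template() {
  template=$1   # e.g. "jsrun -p %N %C"
  nproc=$2      # total number of processes requested
  shift 2
  cmd="$*"      # program and its arguments
  # '|' is used as the sed delimiter so that '/' in paths needs no escaping.
  printf '%s\n' "$template" | sed -e "s|%N|$nproc|g" -e "s|%C|$cmd|g"
}

# Show the command line a 4-process launch would produce under this assumption.
expand_template "jsrun -p %N %C" 4 ./hello-upcxx
```

Comparing the expanded line with a hand-written `jsrun` invocation that is known to place ranks correctly on Lassen may help show whether the template itself is at fault.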
#Configure Result (Lassen)
--------------------------------------------------------------------
GASNet configure warning summary:
It appears your system has the required support for ucx-conduit.
However, ucx-conduit is still experimental, and may have performance and correctness bugs.
You can enable experimental support with --enable-ucx.
Otherwise, you can disable this message with --disable-ucx
----------------------------------------------------------------------
GASNet configuration:
Portable conduits:
-----------------
Portable SMP-loopback conduit (smp) OFF (disabled)
OpenFabrics Interfaces conduit (ofi) OFF (disabled)
Portable UDP/IP conduit (udp) ON (enabled)
Portable MPI conduit (mpi) ON (enabled)
Native, high-performance conduits:
---------------------------------
Unified Communication X conduit (ucx) OFF (auto)
InfiniBand IB Verbs conduit (ibv) ON (enabled)
Memory kinds:
------------
GPUs with NVIDIA CUDA API (cuda-uva) ON (enabled)
GPUs with AMD HIP API (hip) OFF (not enabled)
GPUs with Intel oneAPI *EXPERIMENTAL* (ze) OFF (not enabled)
Some conduits and memory kinds require --enable-XXX configure flags and/or
additional variables providing the install location of vendor drivers.
See the GASNet documentation for details.
Misc Settings
-------------
MPI compatibility: yes
Pthreads support: yes
Segment config: fast
PSHM support: sysv
Atomics support: native
PMI support: no
#Configure Result (Tioga)
However, ucx-conduit is still experimental, and may have performance and correctness bugs.
You can enable experimental support with --enable-ucx.
Otherwise, you can disable this message with --disable-ucx
----------------------------------------------------------------------
GASNet configuration:
Portable conduits:
-----------------
Portable SMP-loopback conduit (smp) OFF (disabled)
OpenFabrics Interfaces conduit (ofi) ON (enabled)
Portable UDP/IP conduit (udp) ON (enabled)
Portable MPI conduit (mpi) ON (enabled)
Native, high-performance conduits:
---------------------------------
Unified Communication X conduit (ucx) OFF (auto)
InfiniBand IB Verbs conduit (ibv) OFF (disabled)
Memory kinds:
------------
GPUs with NVIDIA CUDA API (cuda-uva) OFF (not enabled)
GPUs with AMD HIP API (hip) ON (enabled)
GPUs with Intel oneAPI *EXPERIMENTAL* (ze) OFF (not enabled)
Some conduits and memory kinds require --enable-XXX configure flags and/or
additional variables providing the install location of vendor drivers.
See the GASNet documentation for details.
Misc Settings
-------------
MPI compatibility: yes
Pthreads support: yes
Segment config: fast
PSHM support: posix
Atomics support: native
PMI support: yes (Cray)
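The difference between the two summaries can be checked mechanically by pulling out the "PMI support" line from saved configure output. This is a small sketch; the function name `pmi_support_status` is hypothetical, and the sample input simply reuses lines from the summaries above.

```shell
# Hypothetical helper: print the value of the "PMI support:" line
# from configure output supplied on standard input.
pmi_support_status() {
  sed -n 's/^[[:space:]]*PMI support:[[:space:]]*//p'
}

# Sample input taken from the Lassen summary above.
printf 'Atomics support: native\nPMI support: no\n' | pmi_support_status
```

In practice the configure output would first be saved to a file (for example with `tee`) and then passed to the function with a redirect.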
# John G.’s email in 2023
Hi Paul;
I will add that from ORNL’s summit documentation, it sounds like they set up special support there for upcxx under jsrun. They advise that it will only work using upcxx-jsrun and the upcxx-jsrun script appears mainly to set some magic environment variables that are of the form ORNL uses to do something special in their job launcher:
__UPCXX_RUN_SUMMIT_MODE=highbandwidth
__UPCXX_RUN_SUMMIT=1
__UPCXX_RUN_SUMMIT_nHCAS=1
Are there special daemons that get launched on summit when running upcxx?
Do you know the details of how upcxx support works on summit?
We are trying to figure out if there is something quick Amy can do. My tests with just jsrun have not been successful but at least runs on multiple nodes, which is a good first step.
Thanks,
-John G.