Depending on the libfabric provider in use, there may be restrictions on how
mpi-based spawning is used. In particular, the psm2 provider has the
property that each process may only open the network adapter once. If you
wish to use mpi-spawner, please consult its README for advice on how to set
your MPIRUN_CMD to use TCP/IP.
If one is using mpi-conduit, or is expecting to run hybrid GASNet+MPI
applications with any conduit, then MPIRUN_CMD should be set as one would
normally use mpirun for MPI applications. However, since mpi-spawner itself
uses MPI only for initialization and finalization (and MPI is not used for
GASNet communications other than in mpi-conduit), one can potentially reduce
resource use by setting MPIRUN_CMD such that MPI will use TCP/IP for its
communication. Below are recommendations for several MPIs. When recommended
to set an environment variable, the most reliable way is to prefix the mpirun
command. For instance
MPIRUN_CMD='env VARIABLE=value mpirun [rest of your template]'
- Open MPI
Pass "--mca btl tcp,self" to mpirun, OR
Set environment variable OMPI_MCA_BTL=tcp,self.
- MPICH or MPICH2
These normally default to TCP/IP, so no special action is required.
- MVAPICH family
These most often support only InfiniBand and therefore are not
recommended for use as a launcher for GASNet applications if one
is concerned with reducing resource usage.
- Intel MPI
Set environment variable I_MPI_DEVICE=sock.
- HP-MPI
Set environment variable MPI_IC_ORDER=tcp.
- LAM/MPI
Pass "-ssi rpi tcp" to mpirun, OR
Set environment variable LAM_MPI_SSI_rpi=tcp.
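For example, combining the env-prefix form shown above with the Open MPI
setting might look like the following. Here "-np %N %C" is only a stand-in for
whatever mpirun template is already in use (%N and %C are the usual GASNet
spawner substitutions for the process count and the command); adjust it to
match your existing MPIRUN_CMD.
MPIRUN_CMD='env OMPI_MCA_BTL=tcp,self mpirun -np %N %C'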
Setting I_MPI_DEVICE=sock or I_MPI_OFI_PROVIDER=sockets does not work. These are some of the error messages:
MPI startup(): I_MPI_DEVICE environment variable is not supported.
MPI startup(): To check the list of supported variables, use the impi_info utility or refer to https://software.intel.com/en-us/mpi-library/documentation/get-started.
FATAL ERROR (proc 0): in gasnetc_ofi_init() at 2.3.0_iofi/bld/GASNet-2022.3.0/ofi-conduit/gasnet_ofi.c:618: OFI provider 'psm2' selected at configure time is not available at run time and/or has been overridden by FI_PROVIDER='sockets' in the environment.
Instead I had to configure and "make all" again using --with-ofi-provider=sockets, which ended up working. This is the complete configure command:
./configure --prefix=upcxx-install-path --disable-smp --disable-udp --disable-ibv --enable-ofi --with-ofi-spawner=mpi --with-cc=mpiicc --with-cxx=mpiicpc --with-ofi-provider=sockets
I have some other questions. Before program execution I get the following warning:
WARNING: Using OFI provider (sockets), which has not been validated to provide
WARNING: acceptable GASNet performance. You should consider using a more
WARNING: hardware-appropriate GASNet conduit. See ofi-conduit/README.
WARNING: ofi-conduit is experimental and should not be used for
performance measurements.
But because I want to measure the performance of my program, for example its scalability, is there something I have to consider, or can I safely ignore this message?
Also, I noticed some inconsistencies in the output of my program. The program parallelizes a big for loop by assigning the iterations to the processes with
if (upcxx::rank_me() == i % upcxx::rank_n())
where each iteration also prints its loop index and UPC++ process rank. In an interactive session with two processes, mpirun, mpiexec, and upcxx-run all produce fairly even output from both processes, i.e. the two processes take turns printing roughly one message at a time. As a batch job, however, srun results in one process printing about 10 to 50 iterations at a time before the other process prints its next 10 to 50, while with mpirun/mpiexec it is about 100 to 200 iterations per turn. upcxx-run still doesn't work in the batch job, failing with the same error message as in my very first post. Do you have an idea what might cause all this?
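For illustration, a minimal self-contained sketch of the round-robin distribution described above might look like this (the iteration count N and the printed text are placeholders, not taken from the actual program):

#include <upcxx/upcxx.hpp>
#include <iostream>

int main() {
  upcxx::init();
  const long N = 1000;  // placeholder iteration count
  for (long i = 0; i < N; ++i) {
    // Round-robin assignment: rank r handles the iterations i with i % rank_n() == r.
    if (upcxx::rank_me() == i % upcxx::rank_n()) {
      std::cout << "iteration " << i << " on rank " << upcxx::rank_me() << std::endl;
    }
  }
  upcxx::barrier();   // make sure all ranks are done before shutting down
  upcxx::finalize();
  return 0;
}

Note that std::endl flushes each line as it is written; how lines from different processes then interleave on the console depends on the launcher's output forwarding and buffering rather than on anything in the UPC++ code itself.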
Hi Dan,
unfortunately, the patch didn't work and the error message was the same (OFI provider 'psm2' selected at configure time is not available at run time and/or has been overridden by FI_PROVIDER='sockets' in the environment).
So I switched to an older version of Intel MPI (ver. 2018.4.274) and setting the environment variable I_MPI_FABRICS=tcp ended up working.
I still get this warning:
WARNING: ofi-conduit is experimental and should not be used for
performance measurements.
Please see `ofi-conduit/README` for more details.
But this warning can be ignored in my performance evaluation, right?