Question about MPI_Comm_spawn, proc flags, and btl/sm


Florent GERMAIN

Apr 9, 2025, 10:52:24 AM
to Open MPI Developers
Hi,

I have a question regarding MPI_Comm_spawn and proc flags.

My understanding of procs and spawns in ompi:
Processes are identified by the proc structure.
The proc structure stores proc_name and proc_flags (among many other things).
proc_flags describes the locality of that process relative to the calling process.
proc_name is a unique (jobid, vpid) pair that identifies an ompi process.

proc_name.jobid identifies the job, i.e. the spawn generation, that the process belongs to.
With spawn, the original processes and the spawned processes have different jobids (I saw this in ompi 4.x and hope it is still the case in ompi 5.x).
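A minimal C sketch of this naming scheme, assuming the 4.x (ORTE-style) encoding in which the 32-bit jobid packs a job family in the upper 16 bits and a local job number in the lower 16 bits, printed as [[family,local],vpid]. The types and helpers (proc_name_t, make_jobid, same_job, format_name) are hypothetical, not Open MPI's own. Under this assumption, the two names in the error output below, [[58931,2],20] and [[58931,1],0], share a job family but have different jobids.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for the process name; the real
 * definitions live in the Open MPI sources. */
typedef uint32_t jobid_t;
typedef uint32_t vpid_t;

typedef struct {
    jobid_t jobid; /* which job (spawn generation) the process belongs to */
    vpid_t  vpid;  /* rank of the process inside that job */
} proc_name_t;

/* ASSUMPTION: ORTE-style encoding, upper 16 bits = job family (shared by
 * everything launched from one mpirun), lower 16 bits = local job number. */
uint16_t job_family(jobid_t j) { return (uint16_t)(j >> 16); }
uint16_t local_job(jobid_t j)  { return (uint16_t)(j & 0xffffu); }

jobid_t make_jobid(uint16_t family, uint16_t local)
{
    return ((jobid_t)family << 16) | local;
}

/* Print a name as the error message does: [[family,local],vpid].
 * Returns the snprintf result (length of the formatted name). */
int format_name(char *buf, size_t len, proc_name_t n)
{
    return snprintf(buf, len, "[[%u,%u],%u]",
                    (unsigned)job_family(n.jobid),
                    (unsigned)local_job(n.jobid),
                    (unsigned)n.vpid);
}

/* Same spawn generation only if the full jobid matches; sharing the
 * job family is not enough. */
int same_job(proc_name_t a, proc_name_t b)
{
    return a.jobid == b.jobid;
}
```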

In btl/sm, the add_procs loop checks this jobid:

    for (int32_t proc = 0; proc < (int32_t) nprocs; ++proc) {
        /* check to see if this proc can be reached via shmem (i.e.,
           if they're on my local host and in my job) */
        if (procs[proc]->proc_name.jobid != my_proc->proc_name.jobid
            || !OPAL_PROC_ON_LOCAL_NODE(procs[proc]->proc_flags)) {
            peers[proc] = NULL;
            continue;
        }

        if (my_proc != procs[proc] && NULL != reachability) {
            /* add this proc to shared memory accessibility list */
            rc = opal_bitmap_set_bit(reachability, proc);
            if (OPAL_SUCCESS != rc) {
                return rc;
            }
        }

        /* setup endpoint */
        rc = init_sm_endpoint(peers + proc, procs[proc]);
        if (OPAL_SUCCESS != rc) {
            break;
        }
    }
This check prevents btl/sm from being selected between processes that belong to different spawn generations (procs[proc]->proc_name.jobid != my_proc->proc_name.jobid).
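The decision made by the loop above can be isolated into a small, testable C function. This is a sketch only: sketch_proc_t, SKETCH_ON_LOCAL_NODE and sm_can_reach are hypothetical stand-ins for opal_proc_t, the OPAL locality flags and the inline test, and endpoint setup is omitted.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; the real code uses opal_proc_t, proc_flags
 * and the OPAL_PROC_ON_LOCAL_NODE() macro. */
#define SKETCH_ON_LOCAL_NODE 0x1u

typedef struct {
    uint32_t jobid;
    uint32_t vpid;
    uint32_t flags; /* locality bits relative to the calling process */
} sketch_proc_t;

/* Same test as the btl/sm loop: a peer is a shared-memory candidate only
 * if it is in my job AND flagged as being on my node. */
bool sm_can_reach(const sketch_proc_t *me, const sketch_proc_t *peer)
{
    if (peer->jobid != me->jobid) {
        return false; /* different spawn generation */
    }
    return (peer->flags & SKETCH_ON_LOCAL_NODE) != 0;
}

/* Fill reach[i] for every peer, skipping myself as the original loop does
 * when setting reachability bits. Returns the number of peers marked. */
size_t sm_mark_reachable(const sketch_proc_t *me,
                         const sketch_proc_t *const *peers,
                         size_t n, bool *reach)
{
    size_t count = 0;
    for (size_t i = 0; i < n; ++i) {
        reach[i] = (peers[i] != me) && sm_can_reach(me, peers[i]);
        if (reach[i]) {
            ++count;
        }
    }
    return count;
}
```

With me in job 1, a sibling in job 1 and a spawned peer in job 2, all flagged as on-node, only the sibling is marked: the spawned peer is rejected by the jobid test alone.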
A simple single-node spawn test fails with this error:

--------------------------------------------------------------------------
At least one pair of MPI processes are unable to reach each other for
MPI communications.  This means that no Open MPI device has indicated
that it can be used to communicate between these processes.  This is
an error; Open MPI requires that all MPI processes be able to reach
each other.  This error can sometimes be the result of forgetting to
specify the "self" BTL.

  Process 1 ([[58931,2],20]) is on host: pm0-nod48
  Process 2 ([[58931,1],0]) is on host: unknown!
  BTLs attempted: vader self

Your MPI job is now going to abort; sorry.
--------------------------------------------------------------------------

It also seems that proc_flags are not valid:
OPAL_PROC_ON_LOCAL_NODE(procs[proc]->proc_flags) returns true even for a process spawned on another node.
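One way to confirm this symptom is to cross-check the flags against the host names of both processes. The helper below is a hypothetical debugging sketch, not Open MPI code: SKETCH_ON_LOCAL_NODE is an invented bit value, and a NULL peer host (the "unknown!" case in the error above) is treated as not provably local.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_ON_LOCAL_NODE 0x1u /* hypothetical bit, not OPAL's value */

/* What the flag should be, derived from node names alone.
 * An unknown (NULL) peer host is treated as not provably local. */
unsigned expected_locality(const char *my_host, const char *peer_host)
{
    return (peer_host != NULL && strcmp(my_host, peer_host) == 0)
               ? SKETCH_ON_LOCAL_NODE
               : 0u;
}

/* Returns true (and prints a warning) when the stored flags disagree
 * with the host names, e.g. "local" for a peer on another node. */
bool locality_mismatch(unsigned flags, const char *my_host, const char *peer_host)
{
    bool claimed = (flags & SKETCH_ON_LOCAL_NODE) != 0;
    bool actual = expected_locality(my_host, peer_host) != 0;
    if (claimed != actual) {
        fprintf(stderr, "locality mismatch: flags say %s, hosts %s vs %s\n",
                claimed ? "local" : "remote", my_host,
                peer_host != NULL ? peer_host : "unknown");
        return true;
    }
    return false;
}
```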

The ompi build tested is based on 4.1.7 (plus some of our own code), configured with pmix-5.0.3 and hwloc=internal, and launched with salloc ... mpirun ...

My questions:

Is this intended?
Should I try to reproduce it with ompi 5 and open an issue?

Thanks,

Florent GERMAIN

Development Engineer – BDS-R&D
2 rue de la Piquetterie – Bruyères le Chatel – France
eviden.com

Gilles Gouaillardet

Apr 9, 2025, 11:08:43 AM
to de...@lists.open-mpi.org
Florent,

Long story short, yes, this is a known limitation.
btl/sm cannot be used for intra-node communication between processes from different "jobs" (MPI_Comm_spawn() creates a new job),
so you will use the interconnect (if it allows it) or btl/tcp (assuming pml/ob1 has been selected).
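A simplified C model of that fallback, assuming a first-match policy (pml/ob1 actually weighs BTL exclusivity and can use several BTLs per peer, which this sketch ignores). All names are hypothetical. A NULL result corresponds to the "unable to reach each other" abort.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of a peer relative to the calling process. */
typedef struct {
    bool same_job;
    bool same_node;
    bool is_self;
} peer_info_t;

/* Hypothetical BTL descriptor: a name plus a reachability predicate. */
typedef struct {
    const char *name;
    bool (*reaches)(const peer_info_t *);
} btl_sketch_t;

static bool self_reaches(const peer_info_t *p) { return p->is_self; }
static bool sm_reaches(const peer_info_t *p)   { return p->same_job && p->same_node; }
static bool tcp_reaches(const peer_info_t *p)  { (void)p; return true; }

/* Returns the name of the first BTL that can reach the peer, or NULL
 * when none can (the abort case). */
const char *pick_btl(const btl_sketch_t *btls, size_t n, const peer_info_t *peer)
{
    for (size_t i = 0; i < n; ++i) {
        if (btls[i].reaches(peer)) {
            return btls[i].name;
        }
    }
    return NULL;
}

/* Two example configurations: shared memory + self only, and with tcp. */
const btl_sketch_t SM_SELF_ONLY[] = { {"self", self_reaches}, {"sm", sm_reaches} };
const btl_sketch_t WITH_TCP[] = { {"self", self_reaches}, {"sm", sm_reaches},
                                  {"tcp", tcp_reaches} };
```

For a peer spawned on the same node, SM_SELF_ONLY (like the "BTLs attempted: vader self" line in the error) yields NULL, while WITH_TCP falls back to tcp.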

IIRC, the obstacle to using btl/sm is the size of the shared memory used for intra-node communication.
A straightforward implementation requires that the maximum size be known when the application starts.
I think an improvement was suggested (allocating "n" slots per node, so that up to n MPI tasks at any given time can use btl/sm), but I do not remember anyone trying a proof of concept.
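A rough C sketch of the "n slots per node" idea, with invented names and sizes: the shared segment is sized once for SLOTS_PER_NODE processes, and each process (including a spawned one) claims a free slot when it attaches. C11 atomic_flag keeps the claim race-free if the table lives in memory shared between processes; a process that finds no free slot would have to fall back to another BTL. This is a hypothetical illustration of the proposal, not existing btl/sm code.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical "n": fixed at startup, so the segment size is known then. */
#define SLOTS_PER_NODE 8

typedef struct {
    atomic_flag in_use[SLOTS_PER_NODE];
} slot_table_t;

/* Total bytes to reserve at startup, independent of later spawns. */
size_t slot_segment_bytes(size_t bytes_per_slot)
{
    return (size_t)SLOTS_PER_NODE * bytes_per_slot;
}

/* Mark every slot free; called once by whoever creates the segment. */
void slot_table_init(slot_table_t *t)
{
    for (int i = 0; i < SLOTS_PER_NODE; ++i) {
        atomic_flag_clear(&t->in_use[i]);
    }
}

/* Claim the first free slot; returns its index, or -1 if all n are taken. */
int slot_acquire(slot_table_t *t)
{
    for (int i = 0; i < SLOTS_PER_NODE; ++i) {
        /* test_and_set returns the previous value: false means we got it. */
        if (!atomic_flag_test_and_set(&t->in_use[i])) {
            return i;
        }
    }
    return -1;
}

/* Give a slot back when its process detaches or exits. */
void slot_release(slot_table_t *t, int slot)
{
    if (slot >= 0 && slot < SLOTS_PER_NODE) {
        atomic_flag_clear(&t->in_use[slot]);
    }
}
```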

I will let the developers shed some more light on that topic.

Cheers,

Gilles
