Thurs Nov 6 Devel telecon

Ralph Castain

Nov 3, 2025, 11:01:07 AM
to pm...@googlegroups.com
Hi folks

Just a friendly reminder that this month’s developers’ telecon is this Thursday, Nov 6, at the usual time (11am US Eastern).

NOTE: The US has changed its clocks, falling back one hour. Please adjust accordingly!

Topics for discussion:

Primarily want to discuss future plans for release strategies. We currently try to allow PRRTE to use earlier versions of PMIx - e.g., PRRTE v4 to use PMIx v5 and above. This is becoming increasingly difficult to support. I will provide a little history behind some of the design decisions that drive this problem, and will propose that we restrict PRRTE (via configure logic) more rigidly to a given PMIx series. This would, for example, mean that PRRTE v5 would be restricted to the PMIx v7 series.
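
For concreteness, the kind of restriction I have in mind amounts to the following sketch. The real gate would live in PRRTE’s configure logic rather than the preprocessor, and the sketch assumes the PMIX_VERSION_MAJOR macro installed in pmix_version.h:

    #include <pmix_version.h>

    /* Sketch only: restrict the build to a single PMIx series.
     * In practice this check would be performed by configure. */
    #if PMIX_VERSION_MAJOR != 7
    #error "This PRRTE release requires the PMIx v7 series"
    #endif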

Along that line, I’d like to discuss what to do with the next PMIx release. There has been a fair amount of change to the library, and I’m increasingly leery of trying to backport it all to the v6 series (there is absolutely no way it is going back to v5). We have new interfaces (i.e., the new server module entries), new attributes, and some internal behavioral changes. I’m therefore thinking that the next release should be a completely new v7.0 branch from the master branch.

Ditto for PRRTE - it would be a new v5.0 branch.

Probably the more controversial part of this: I propose to drop all support for the PMIx v5 and v6 series, along with all support for PRRTE v3 and v4. I confess that I am wearing out, and the continual pings from all the various releases and combinations are just too much. So I propose to support only the new releases; any problems found with prior ones will be met with a suggestion to upgrade.

Look forward to your thoughts!


Since we last met in October, the following things have been done.

PMIx:

* Many warnings from the Coverity static code analyzer have been resolved. Our defect density is now effectively zero. This does not mean that there are no bugs, of course - but it is a milestone.

* Extended the listener thread port specification to support ranges. The same attribute is used, but it can now hold either an integer (specifying the exact port to use) or a string of comma-delimited port ranges to try. The ports are tried in the order provided until one succeeds (a generic sketch of the semantics appears after this list).

* Switched from thread mutexes to atomic values for protecting internal flags. PMIx uses a series of flags internally for things like checking, at the start of each API, whether the library has been initialized. Rather than using a pthread mutex and condition variable to protect the flag during this check, the flag is now atomic and is accessed with atomic set/check semantics (illustrated in a generic sketch after this list).

* Removed unnecessary thread locks from inside the psec/munge module. I can’t find any reason for their existence, but someone with munge available should check it out.

* Avoided use of PMIx APIs inside the PMIx initialization routines. We don’t want anything coming through those APIs until _after_ initialization has been completed.

* Added support for tool_connection2 and log2 server module entries, per changes in the Standard

* Major update to the plog framework, including actual implementation of the smtp component. Reduced looping through the framework to simplify integration.

* Accepted contributions to fix a couple of bugs in the internal pmix_bitmap class and in flushing residual IO when releasing a namespace tracker

* Replaced all usage of “sprintf” with “snprintf”, as some compiler families have removed that function (a generic before/after example follows this list).
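
To illustrate the port-range semantics mentioned above without reproducing any PMIx internals, here is a standalone, hypothetical helper (the names and structure are mine, not the library’s) that walks a spec such as “10000-10010,12000” and binds to the first available port:

    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>
    #include <netinet/in.h>
    #include <sys/socket.h>
    #include <unistd.h>

    /* Hypothetical helper: try each port named in a comma-delimited spec,
     * in order, returning a bound socket on the first success (-1 if none). */
    static int bind_first_available(const char *spec)
    {
        char *copy = strdup(spec), *saveptr = NULL;
        int sd = -1;

        for (char *tok = strtok_r(copy, ",", &saveptr); tok != NULL && sd < 0;
             tok = strtok_r(NULL, ",", &saveptr)) {
            long lo = strtol(tok, NULL, 10), hi = lo;
            char *dash = strchr(tok, '-');
            if (dash != NULL) {
                hi = strtol(dash + 1, NULL, 10);  /* "lo-hi" range */
            }
            for (long port = lo; port <= hi && sd < 0; port++) {
                int fd = socket(AF_INET, SOCK_STREAM, 0);
                if (fd < 0) {
                    continue;
                }
                struct sockaddr_in addr = {0};
                addr.sin_family = AF_INET;
                addr.sin_addr.s_addr = htonl(INADDR_ANY);
                addr.sin_port = htons((uint16_t) port);
                if (bind(fd, (struct sockaddr *) &addr, sizeof(addr)) == 0) {
                    sd = fd;  /* first port that works wins */
                } else {
                    close(fd);
                }
            }
        }
        free(copy);
        return sd;
    }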
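
The mutex-to-atomic change is easiest to see in generic form - this is just the pattern, not the actual PMIx code:

    #include <stdatomic.h>
    #include <stdbool.h>

    /* Generic sketch: an "is the library initialized?" flag protected by
     * C11 atomics instead of a pthread mutex + condition variable. */
    static atomic_bool initialized = false;

    int example_init(void)
    {
        /* atomic set/check: only the first caller performs the setup */
        bool expected = false;
        if (!atomic_compare_exchange_strong(&initialized, &expected, true)) {
            return 0;  /* someone else already initialized the library */
        }
        /* ... one-time setup goes here ... */
        return 0;
    }

    int example_api_call(void)
    {
        /* every API entry point just checks the flag atomically */
        if (!atomic_load(&initialized)) {
            return -1;  /* library not initialized */
        }
        /* ... normal work ... */
        return 0;
    }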
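
And the sprintf change, again in generic before/after form rather than the actual PMIx call sites:

    #include <stdio.h>

    /* Generic illustration: every formatted write is now bounded by the
     * size of the destination buffer. */
    static void make_path(char *buf, size_t len, const char *dir, const char *file)
    {
        /* previously: sprintf(buf, "%s/%s", dir, file); */
        snprintf(buf, len, "%s/%s", dir, file);
    }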


PRRTE:

* Accepted contributions to fix a bug in the setting of PMIX_JOB_RECOVERABLE and to clarify some help messages

* Added support for tool_connection2 and log2 server module entries, per changes in the Standard

* Fixed ordering of hosts when mapping to avoid always starting with the DVM controller

* Extended support for printing process binding maps to allow users to request output in terms of physical (instead of logical) CPU IDs (see the hwloc sketch after this list)

* Ensure we error out when asymmetric topologies (e.g., as found in hypervisor-based systems) cannot support ppr mapping requests

* Do not require that seq and rankfile directives for MPMD jobs provide specific #procs for each application - we can compute them from the placement files

* Fixed a relative node indexing problem involving the number of empty nodes to use

* Extended timeout support to child jobs so they are terminated along with their parent

* Added a “launching-apps” section to the online docs - ported from OMPI (their legacy location)
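
On the physical-vs-logical CPU ID item above: hwloc exposes both numberings for each processing unit, and the new output option reports the OS (physical) IDs. A minimal, generic hwloc example (not the PRRTE code) showing the two indices side by side:

    #include <hwloc.h>
    #include <stdio.h>

    /* Generic hwloc sketch: print each processing unit's logical index
     * alongside its OS ("physical") index. */
    int main(void)
    {
        hwloc_topology_t topo;
        hwloc_topology_init(&topo);
        hwloc_topology_load(topo);

        int npus = hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_PU);
        for (int i = 0; i < npus; i++) {
            hwloc_obj_t pu = hwloc_get_obj_by_type(topo, HWLOC_OBJ_PU, i);
            printf("logical PU %u -> physical (OS) id %u\n",
                   pu->logical_index, pu->os_index);
        }

        hwloc_topology_destroy(topo);
        return 0;
    }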
