New release candidates available to test


Ralph Castain

Apr 29, 2026, 4:01:59 PM
to pm...@googlegroups.com, pmix-t...@googlegroups.com
Several critical patches have recently gone into both PMIx and PRRTE, so new releases are needed for both projects. I have prepared release candidates for each.

Please test these and let me know the results.
Ralph

--------------------
***Important***
This release contains the following important changes:

   * restoration/repair of the OmniPath support component
   * update of the Python bindings and fix to allow Python
      to retrieve all nspace-level info via PMIx_Get
   * fix messaging problem in the PMIx communication logic
      that could cause a message from one process to be
      incorrectly delivered to a pending receive for another
      process
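
For readers unfamiliar with this class of bug, here is a minimal sketch of how a pending receive can be matched against an incoming message. It is an illustration only: the thread does not describe the actual cause or fix in the ptl code, and the names used here (RecvQueue, PostedRecv, the (namespace, rank) peer tuple) are hypothetical and not part of PMIx.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical peer identifier: (namespace, rank). Not a PMIx type.
Peer = Tuple[str, int]


@dataclass
class PostedRecv:
    peer: Peer                          # process this receive is waiting on
    tag: int                            # message tag it expects
    callback: Callable[[bytes], None]   # invoked with the payload on a match


@dataclass
class RecvQueue:
    posted: List[PostedRecv] = field(default_factory=list)
    unexpected: List[Tuple[Peer, int, bytes]] = field(default_factory=list)

    def post(self, peer: Peer, tag: int, callback: Callable[[bytes], None]) -> None:
        """Register a pending receive for one specific peer and tag."""
        self.posted.append(PostedRecv(peer, tag, callback))

    def deliver(self, sender: Peer, tag: int, payload: bytes) -> bool:
        """Hand an incoming message to the first matching pending receive."""
        for i, recv in enumerate(self.posted):
            # Match on BOTH sender and tag. Matching on the tag alone would
            # give this payload to a receive posted for a different process.
            if recv.tag == tag and recv.peer == sender:
                del self.posted[i]
                recv.callback(payload)
                return True
        # No pending receive for this sender: park the message.
        self.unexpected.append((sender, tag, payload))
        return False


queue = RecvQueue()
received = {}
queue.post(("job1", 0), 7, lambda p: received.__setitem__("rank0", p))
queue.post(("job1", 1), 7, lambda p: received.__setitem__("rank1", p))

# Both receives use tag 7, but only the one posted for rank 1 is satisfied.
queue.deliver(("job1", 1), 7, b"from rank 1")
```

After the last call, only the "rank1" entry is filled in and the receive posted for rank 0 is still pending.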


--------------------

Detailed changes since v6.1.0:
 - PR #3856: Roll to rc1
 - PR #3855: Multiple commits
    - Cleanup error paths in net_get_hostname
    - Update Python bindings
    - Update Python-related workflow to actually test bindings
    - Add missing data types to value load/unload
    - Don't pass zero-byte stdin to host
    - test: add get() test without keys specified
    - bindings: capture None as NULL key in get()
    - Remove unnecessary hwloc version protection
    - Restore pnet/opa component
    - Add fn to print peer type
    - Fix messaging problem in ptl recv
    - Update NEWS
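
The two bindings entries above ("capture None as NULL key in get()" and the get() test without keys) suggest that a Python None is passed through as a NULL key, which in turn returns all available info. The sketch below only illustrates that idea with ctypes and an ordinary dict. It does not use or reproduce the real PMIx Python bindings; to_c_key, get, and the sample store are hypothetical.

```python
import ctypes


def to_c_key(key):
    """Convert a Python key to a C string pointer; None becomes NULL."""
    if key is None:
        return ctypes.c_char_p(None)
    return ctypes.c_char_p(key.encode("ascii"))


def get(store, key=None):
    """Return one entry for a named key, or every entry for a NULL key.

    `store` is a plain dict standing in for nspace-level data (an
    assumption made for illustration, not how PMIx stores values).
    """
    c_key = to_c_key(key)
    if c_key.value is None:
        # NULL key: the caller asked for everything that is available.
        return dict(store)
    name = c_key.value.decode("ascii")
    return {name: store[name]}


# Sample data only; the values are made up.
sample = {"pmix.job.size": 4, "pmix.univ.size": 8}

one = get(sample, "pmix.job.size")   # a single named key
everything = get(sample)             # no key given, so None maps to NULL
```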


-----------------------
***Important***
This release contains the following important changes:

   * PBS launch support has been switched from direct linkage
      against the TM library to use of the pbs_tmrsh command
      in the ssh launcher to avoid library confusion due to
      mixed dependencies. The --with-tm configure option has
      therefore been removed. PBS users are strongly advised
      to upgrade to this PRRTE version as soon as possible.
   * significant improvement has been made in the handling
      of heterogeneous nodes. Accordingly, the default is now
      to assume possible heterogeneity. Users who _know_ they
      have uniform nodes and do not wish to use the new logic
      may add the --uniform-nodes cmd line option. Note that
      the --hetero-nodes option has been removed.
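
Since two options have been removed, existing build and launch scripts may need a quick check. A minimal sketch, assuming the scripts' arguments are available as a list of strings; the helper name and the example command lines are hypothetical, and only the option names come from the notes above.

```python
# Options removed in this PRRTE release, per the notes above.
REMOVED_OPTIONS = {
    "--with-tm": "configure option removed; PBS launch now uses pbs_tmrsh "
                 "through the ssh launcher",
    "--hetero-nodes": "removed; heterogeneity is now assumed by default "
                      "(add --uniform-nodes to opt out)",
}


def find_removed_options(argv):
    """Return (option, advice) pairs for removed options found in argv."""
    findings = []
    for arg in argv:
        # Accept both "--opt" and "--opt=value" spellings.
        name = arg.split("=", 1)[0]
        if name in REMOVED_OPTIONS:
            findings.append((name, REMOVED_OPTIONS[name]))
    return findings


# Hypothetical command lines taken from an older script.
for option, advice in find_removed_options(["./configure", "--with-tm=/opt/pbs"]):
    print(f"{option}: {advice}")
for option, advice in find_removed_options(["prterun", "--hetero-nodes", "./app"]):
    print(f"{option}: {advice}")
```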

----------------------------

Detailed changes include:
 - PR #2437: Multiple commits
    - Default no-arg prun case to --help
    - Improve hetero node handling
    - Ignore race condition on IOF
    - Minor updates to slurm support
    - Switch PBS launch support to pbs_tmrsh
    - Update NEWS
    - Roll to rc1

Ralph Castain

Apr 30, 2026, 4:47:13 PM
to Felip Moll, pm...@googlegroups.com, pmix-t...@googlegroups.com
Can you pass along a description of the scenario (e.g., were you forwarding stdin, the app code being used, how it was executed) so I can try to locally reproduce? Can you configure `--enable-debug` to get a line number for the segfault?

The cited change was made a while ago, and we’ve seen no such problems from PRRTE. You shouldn’t have to make any new calls.


On Apr 30, 2026, at 1:16 PM, Felip Moll <fm...@nvidia.com> wrote:

Hi,

I am testing this version with Slurm, and I've found a segfault when a process calls PMIx_Abort() (though it may happen in other cases too).
This is a backtrace of the slurmstepd thread segfaulting in PMIx. This does not happen with PMIx 5.x versions.

Thread 6 "slurmstepd" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ff35f7fe000 (LWP 654626)]
0x00007ff3700b0791 in write_output_line () from /home/lipi/bin/pmix_6.1.1rc1/lib/libpmix.so.2
(gdb) bt
#0  0x00007ff3700b0791 in write_output_line () from /home/lipi/bin/pmix_6.1.1rc1/lib/libpmix.so.2
#1  0x00007ff3700b112e in pmix_iof_write_output () from /home/lipi/bin/pmix_6.1.1rc1/lib/libpmix.so.2
#2  0x00007ff37004bf28 in _iofdeliver () from /home/lipi/bin/pmix_6.1.1rc1/lib/libpmix.so.2
#3  0x00007ff37032e188 in event_process_active_single_queue () from /lib64/libevent_core-2.1.so.7
#4  0x00007ff37032ff9f in event_base_loop () from /lib64/libevent_core-2.1.so.7
#5  0x00007ff37008f921 in progress_engine () from /home/lipi/bin/pmix_6.1.1rc1/lib/libpmix.so.2
#6  0x00007ff371d0a464 in start_thread () from /lib64/libc.so.6
#7  0x00007ff371d8d5ec in __clone3 () from /lib64/libc.so.6

I think this might come from commit 62c9fbdb. I am not sure whether something changed that now requires a new call from slurmstepd itself (do I need to call PMIx_Progress_thread_stop()?),
or whether this should be addressed in PMIx itself.

Please let me know what you think and see.



From: 'Ralph Castain' via pmix-testers <pmix-t...@googlegroups.com>
Sent: Wednesday, April 29, 2026 22:01
To: pm...@googlegroups.com <pm...@googlegroups.com>; pmix-t...@googlegroups.com <pmix-t...@googlegroups.com>
Subject: New release candidates available to test
 

-- 
You received this message because you are subscribed to the Google Groups "pmix-testers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pmix-testers...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/pmix-testers/2107A6B0-4C82-465A-B06D-BA288A8CC231%40pmix.org.
For more options, visit https://groups.google.com/d/optout.

Ralph Castain

May 18, 2026, 3:47:53 PM
to pm...@googlegroups.com, pmix-t...@googlegroups.com
Hello all

I have updated the release candidates to rc2, and added one for the PMIx v5.0 series with a couple of critical fixes. Please give them a check and let me know.

IMPORTANT


This release contains the following important changes:

  • restoration/repair of the OmniPath support component
  • Repair support of OMPI default MCA params

Detailed changes include:

  • PR #3869: Multiple commits
    • Update NEWS and VERSION
    • pmdl/ompi: use the right type to enumerate myenvars
    • fix var group lookup in component repository release
    • Prepend I/O residuals to correct redirected streams
    • Restore pnet/opa component
    • Don't pass zero-byte stdin to host
    • Correct the wrapper compiler man page
    • Default output to file to nocopy
    • Properly handle qualified values in client get
    • Do not double-process IOF formats
    • Correct cflags used for check_compiler_version.m4
    • Fully support return of static values


Important

This release contains the following important changes:

  • restoration/repair of the OmniPath support component
  • update of the Python bindings and fix to allow Python
    to retrieve all nspace-level info via PMIx_Get
  • fix messaging problem in the PMIx communication logic
    that could cause a message from one process to be
    incorrectly delivered to a pending receive for another
    process

Detailed changes since v6.1.0:

  • PR #3868: Multiple commits
    • Update NEWS
    • pmdl/ompi: use the right type to enumerate myenvars
    • Add a little overflow protection
    • fix var group lookup in component repository release
    • Prepend I/O residuals to correct redirected streams
  • PR #3855: Multiple commits
    • Cleanup error paths in net_get_hostname
    • Update Python bindings
    • Update Python-related workflow to actually test bindings
    • Add missing data types to value load/unload
    • Don't pass zero-byte stdin to host
    • test: add get() test without keys specified
    • bindings: capture None as NULL key in get()
    • Remove unnecessary hwloc version protection
    • Restore pnet/opa component
    • Add fn to print peer type
    • Fix messaging problem in ptl recv
    • Update NEWS

Important

This release contains the following important changes:

  • PBS launch support has been switched from direct linkage
    against the TM library to use of the pbs_tmrsh command
    in the ssh launcher to avoid library confusion due to
    mixed dependencies. The --with-tm configure option has
    therefore been removed. PBS users are strongly advised
    to upgrade to this PRRTE version as soon as possible.
  • significant improvement has been made in the handling
    of heterogeneous nodes. Accordingly, the default is now
    to assume possible heterogeneity. Users who know they
    have uniform nodes and do not wish to use the new logic
    may add the --uniform-nodes cmd line option. Note that
    the --hetero-nodes option has been removed.

Detailed changes include:

  • PR #2450: Multiple commits
    • Ensure cmd line MCA params are recognized
    • Ignore irrelevant MCA params for prted cmd line
    • Transfer the default mapby modifiers to jobs
    • config: make m4 type-name sanitization locale-independent
    • Update NEWS and VERSION
  • PR #2437: Multiple commits
    • Default no-arg prun case to --help
    • Improve hetero node handling
    • Ignore race condition on IOF
    • Minor updates to slurm support
    • Switch PBS launch support to pbs_tmrsh
    • Update NEWS
    • Roll to rc1
