PMIx Tools WG - Meeting 8 - December 8th 2021

Isaías Alberto Comprés Ureña

Dec 9, 2021, 9:37:34 AM
to PMIx Standard Tools Working Group
Hello everyone,

Please find our minutes from meeting number 8 below.

Participants (in no particular order):
- Josef Weidendorfer
- Ralph Castain
- Norbert Eicker
- Simon Pickartz
- Isaias Compres

Dagstuhl Seminar on Dynamic Resource Management
- Martin Schulz' presentation about malleability with MPI Sessions
  - Proposal covers the resource adaptation of a running MPI application
    - 3 states: current allocation, transition, new allocation:
      1.- Application runs as normal with its current Session
      2.- A new session is opened to peek at allocation changes
        - Low overhead operation; a local handle to the MPI runtime
        - If there are no changes to allocation metadata, the Session is closed and the application continues as normal
        - If allocation metadata changes, the application parses the data and makes a decision
        3.a - To reject the changes, the new Session is closed; the original Session is kept and the application continues as normal
        3.b - To accept the allocation update, the application holds both Sessions momentarily, repartitions its domain, and then closes the original Session while keeping the new one to continue its progress.
  - No current proposal around negotiation: either RM-driven or application-driven
  - We have to work with this proposal while the standardization efforts are ongoing.
    - No clear timeline for new malleable API to be approved.
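
The three-state workflow above can be sketched as a small simulation. This is only an illustration under stated assumptions: `Allocation`, `Runtime`, `Session`, and `adapt` are hypothetical names that model the steps in plain Python. They are not part of MPI, PMIx, or the proposal itself, and no real MPI call is made.

```python
from dataclasses import dataclass


# All names here are hypothetical stand-ins used to model the workflow;
# none of them belong to MPI, PMIx, or any proposed malleability API.
@dataclass(frozen=True)
class Allocation:
    version: int
    nodes: tuple


class Runtime:
    """Holds the allocation metadata the resource manager currently grants."""

    def __init__(self, allocation):
        self.current = allocation


class Session:
    """Local handle that snapshots the runtime's allocation when opened."""

    def __init__(self, runtime):
        self.allocation = runtime.current
        self.open = True

    def close(self):
        self.open = False


def adapt(runtime, session, accept, repartition):
    """Run one adaptation check and return the session to continue with."""
    probe = Session(runtime)  # step 2: open a new session to peek at changes
    if probe.allocation == session.allocation:
        probe.close()  # no metadata change: continue as normal
        return session
    if not accept(session.allocation, probe.allocation):
        probe.close()  # step 3.a: reject, keep the original session
        return session
    # step 3.b: both sessions are held while the domain is repartitioned
    repartition(session.allocation, probe.allocation)
    session.close()
    return probe
```

An application would call `adapt` at a convenient point in its main loop, passing its own decision function (`accept`) and data redistribution routine (`repartition`), and then continue with whichever session is returned.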

- Flux resource manager: deeply hierarchical design with graph-based job requirements
- Co-scheduling discussions: need to utilize ever larger, more parallel single nodes
- Cloud computing RMs (e.g. Kubernetes) versus traditional schedulers for supercomputing: an open question

Slurm fork for PMIx, Open PMIx integration and testing:
- Ralph has created a new fork of Slurm for rapid development
  - Decouples our experimentation from upstream approval of patches
  - Found on GitHub:
    https://github.com/slurm-pmix/slurm
  - Ralph brings Open PMIx expertise, while others help with Slurm internals
  - Aligned with DEEP-SEA activities
  - Ralph integrated previous work and early testing is done
    - Need to help with testing first
    - Some issues are known and marked
      - Need to sort out some PMIx standard violations on caller rules
      - May need to do large refactorings to make the existing code more manageable
      - For malleability: need to revise threading and copying of allocation metadata
        - Isaias will look at threading and 'agent' use in Slurm
        - Isaias will identify which plugins can have PMIx versions, such as the 'launch' plugin

Organization:
- Upcoming holiday season: next meeting on 12th of January, 2022
- In that meeting: reschedule on a monthly basis


* I wish you all happy holidays, and we will continue our meetings after the new year! *