PMIx Tools WG - Minutes from meeting 17 (14.8.2022)

Isaías Alberto Comprés Ureña

Sep 22, 2022, 10:06:31 AM
to PMIx Standard Tools Working Group
Hello everyone,

Please find my notes below, and don't hesitate to add to them.  

- Dynamic Workflow WG's status 
  - Have not heard from the DW WG folks yet.
  - Today the discussion is relevant to this WG.

- Proposed changes to PMIx_Allocation_request
  - A new request cancellation directive was discussed
    - General agreement that this is desirable and is not a big change to the standard
    - Unfortunately, the implementation itself is not as simple (thanks to Ralph for the insights)
      - Careful evaluation of possible race conditions is necessary
      - Fortunately, these should be exceptional occurrences, but the implementation still needs to handle them.  For example:
        - A message to cancel a request arrives while the scheduler is in the process of making an allocation decision.
        - A scheduling decision has been made and the notification message is in flight, while the application has also sent a cancellation request.
  - We may not need to add a request ID if we restrict applications/runtimes to only one outstanding request at a time.
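
The two race conditions listed above can be illustrated with a minimal sketch. It assumes a scheduler-side record per request whose state changes atomically, so that whichever of "grant" or "cancel" arrives first wins and the other side learns the outcome. All names here (`AllocationRequest`, `try_grant`, `try_cancel`) are hypothetical and are not part of PMIx or of any proposal discussed in the meeting.

```python
import threading
from enum import Enum, auto


class State(Enum):
    PENDING = auto()
    GRANTED = auto()
    CANCELLED = auto()


class AllocationRequest:
    """Hypothetical scheduler-side record for one outstanding request."""

    def __init__(self):
        self._lock = threading.Lock()
        self.state = State.PENDING

    def try_grant(self):
        # Called when the scheduler finishes its allocation decision.
        # Fails if a cancellation was processed first.
        with self._lock:
            if self.state is State.PENDING:
                self.state = State.GRANTED
                return True
            return False

    def try_cancel(self):
        # Called when a cancellation message arrives.
        # Fails if the grant was already decided (notification may be in flight).
        with self._lock:
            if self.state is State.PENDING:
                self.state = State.CANCELLED
                return True
            return False


# Case 1: the cancellation arrives before the decision is final.
early = AllocationRequest()
cancel_won = early.try_cancel()
grant_after_cancel = early.try_grant()

# Case 2: the decision was made first; the late cancellation is refused.
late = AllocationRequest()
grant_won = late.try_grant()
cancel_after_grant = late.try_cancel()
```

In the second case the application would receive both a grant notification and a refused cancellation; how it should then proceed (for example, by releasing the resources) is a policy question that this sketch leaves open.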
  
- Can the application reject an approved allocation request at the time of receiving its approval notification?
  - General agreement that this is a bad idea.
  - Why would an application reject something it has requested?
    - Some examples were discussed, but they seem more like corner cases:
      - E.g. applications with unpredictable progress, an unbounded number of iterations, etc., where the situation has changed between the request and its approval.
      - Based on TUM's research, this feature is not necessary, but it was discussed regardless.
        - TUM's scheduler design should produce request responses in the sub-10-second range, even in extreme cases.  This can be discussed further in future meetings.
          - However, other scheduling approaches (probably superior, coming in the future) may have higher response times, comparable to current heuristics.

- Reverse malleability approach: scheduler and monitor driven.
  - In this case, the scheduler will produce resource offers
    - Ralph called these "resource advertisements", which is also an accurate name for them
  - This approach will be explored in future developments in the European community.
  - In this scenario, the application and/or runtime system needs to reply to these offers, unless it is required to accept.
  - What about timing: how long does the application have to accept an offer?
    - Resources may no longer be available because of new job submissions or any other change of state at the scheduler and the system.
  - This mode of operation will be discussed in future meetings.
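
The timing question for resource offers can be sketched as follows. This is only an illustration under two stated assumptions: each offer carries an explicit expiry time, and the scheduler re-checks availability when the reply arrives. The names (`ResourceOffer`, `accept_offer`) and the result strings are hypothetical and do not come from PMIx or from the meeting.

```python
from dataclasses import dataclass


@dataclass
class ResourceOffer:
    """Hypothetical offer ("resource advertisement") sent by the scheduler."""
    nodes: int
    expires_at: float  # scheduler clock, in seconds


def accept_offer(offer, now, nodes_still_free):
    """Scheduler-side handling of an application's acceptance reply."""
    if now > offer.expires_at:
        # The application took too long to reply.
        return "expired"
    if nodes_still_free < offer.nodes:
        # System state changed (e.g. new job submissions) since the offer.
        return "withdrawn"
    return "accepted"


offer = ResourceOffer(nodes=4, expires_at=100.0)
in_time = accept_offer(offer, now=50.0, nodes_still_free=8)
too_late = accept_offer(offer, now=150.0, nodes_still_free=8)
resources_gone = accept_offer(offer, now=50.0, nodes_still_free=2)
```

Separating "expired" from "withdrawn" lets the application distinguish its own slow reply from a change of state at the scheduler; whether such a distinction is wanted is one of the points left for future meetings.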
