Hello everyone,
Please find my notes below, and don't hesitate to add to them.
- Dynamic Workflow WG's status
  - Have not heard from the DW WG folks yet.
  - Today's discussion is relevant to this WG.
- Proposed changes to PMIx_Allocation_request
  - A new request cancellation directive was discussed
    - General agreement that this is desirable and is not a big change to the standard
    - Unfortunately, implementing it is not as simple (thanks to Ralph for the insights)
      - Possible race conditions must be evaluated carefully
      - Fortunately, these should be rare, but the implementation must still handle them. For example:
        - A cancellation message arrives while the scheduler is in the process of making an allocation decision.
        - A scheduling decision has been made and the notification message is in flight, while the application has concurrently sent a cancellation request.
    - We may not need to add a request ID if we restrict applications/runtimes to a single outstanding request at a time.
  - Can the application reject an approved allocation request at the time it receives the approval notification?
    - General agreement that this is a bad idea.
    - Why would an application reject something it has requested?
      - Some examples were discussed, but they seem more like corner cases:
        - E.g., applications with unpredictable progress or an unbounded number of iterations, where the situation has changed between the request and its approval.
    - Based on TUM's research, this feature is not necessary, but it was discussed regardless.
      - TUM's scheduler design should produce request responses in the sub-10-second range, even in extreme cases. This can be discussed further in future meetings.
      - However, other scheduling approaches (probably superior, coming in the future) may have longer response times, comparable to those of current heuristics.
- Reverse malleability approach: scheduler- and monitor-driven.
  - In this case, the scheduler will produce resource offers.
    - Ralph called these "resource advertisements", which is also an accurate term.
  - This approach will be explored in future developments in the European community.
  - In this scenario, the application and/or runtime system needs to reply to these offers, unless it is required to accept them.
    - Timing: how long does the application have to accept an offer?
      - Resources may no longer be available due to new job submissions or any other change of state at the scheduler or in the system.
  - This mode of operation will be discussed in future meetings.