Hi Andrey,
I’m a GSoC applicant interested in Kubeflow Trainer, and I went through the KEP-3328 proposal.
The autoconf plugin approach looks very promising, especially the use of runtimePatches for maintaining clear ownership.
I have a question about heterogeneous GPU clusters, where devices differ in memory and compute capabilities: how would the system handle them, and would the recommendation logic be device-aware?
Also, is there any plan to incorporate runtime feedback to improve recommendation accuracy over time?
Looking forward to the discussion.
Thanks,
Ayush