[GSoC 2026] Inquiry regarding Project 10: Dynamic LLM Trainer Framework (KEP-2839)


tejesh venkat

Mar 3, 2026, 8:23:40 AM
to kubeflow-discuss

Dear Kubeflow Maintainers,

I am Tejeshvenkat, an M.Tech student at IIT Madras specializing in Distributed Systems. I am writing to express my strong interest in pursuing Project 10: Dynamic LLM Trainer Framework as a Large (350-hour) GSoC 2026 project.

I have been studying the KEP-2839 design and the current v2 trainer implementation in depth. My background aligns closely with the architectural challenges of this project:

  • Infrastructure Experience: I have built a globally distributed task scheduler focusing on resource allocation, worker heartbeats, and fault-tolerant priority queuing.

  • Technical Interest: I am particularly interested in refactoring the hardcoded TorchTune integration into a modular LLMBackend registry. I am also keen on ensuring the controller-runtime integration handles backend lifecycle management and checkpoint recovery efficiently.

I have already introduced myself on the CNCF Slack (#kubeflow-contributors) and am currently setting up my local development environment to begin contributing to the kubeflow/trainer repository.

I would appreciate any guidance on specific technical areas or "entry point" issues (refactoring or testing) that the mentors would like me to focus on to demonstrate my proficiency with the v2 architecture.

Thank you for your time and for the opportunity to contribute to the Kubeflow ecosystem.

Best regards,

Tejeshvenkat 

[https://github.com/tejeshvenkat]

[www.linkedin.com/in/tejesh-venkat-776a32289]
