GSoC 2026: Interest in Project 1: Agentic RAG on KubeFlow — Sameer Ali Khan

Sameer Ali khan

Mar 18, 2026, 2:49:32 PM
to kubeflow-discuss

Hi Kubeflow community,

I'm Sameer Ali Khan, an M.Sc. CS student at Osmania University, Hyderabad. I'm applying for GSoC 2026, and Project 1 (Agentic RAG on KubeFlow) is my top choice.

I went through the docs-agent repo and the official ideas page before writing this. 

The current setup is clear: KFP handles the ETL (GitHub scrape → chunk → embed via sentence-transformers → Milvus upsert), KServe runs Llama 3.1-8B with vLLM and tool calling enabled, and the HTTPS API streams responses via WebSocket with citation tracking. It's a solid foundation.
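To check my understanding of the chunk step in that ETL, here is a minimal sketch of overlapping fixed-size chunking. I have not confirmed how docs-agent actually splits text, so the function name `chunk_text` and the `size` and `overlap` parameters are my own assumptions, not the repo's API.

```python
from typing import List


def chunk_text(text: str, size: int = 40, overlap: int = 10) -> List[str]:
    """Split text into windows of `size` characters that share `overlap` characters.

    Hypothetical helper for illustration only; not taken from docs-agent.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks: List[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        # Stop once this window already reaches the end of the text,
        # so no redundant tail chunk is produced.
        if start + size >= len(text):
            break
        start += step
    return chunks


# Made-up input: 100 characters of repeating digits.
sample = "0123456789" * 10
chunks = chunk_text(sample, size=40, overlap=10)
```

In a real pipeline the split would more likely be token-based or heading-aware; character windows just keep the sketch dependency-free.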

The way I read the GSoC scope, the real shift is from a reactive retrieval tool to an actual agentic loop, where the model reasons about whether what it retrieved is sufficient and can re-query or decompose the problem before responding. The ideas page mentions both LangGraph and Kagent as candidate frameworks. Is there already a preference between them on the mentors' side, or is that an open design decision for the contributor to propose?
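To make the question concrete, here is a framework-free sketch of the loop as I understand it: retrieve, judge sufficiency, and re-query up to a fixed budget. Everything in it is a stand-in I made up for illustration (`agentic_retrieve`, the toy corpus, the toy sufficiency check). In the real system the judge and the query rewriter would presumably be LLM calls, and the control flow would live in whichever framework is chosen.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class LoopResult:
    context: List[str]
    queries_tried: List[str] = field(default_factory=list)
    sufficient: bool = False


def agentic_retrieve(
    question: str,
    retrieve: Callable[[str], List[str]],
    is_sufficient: Callable[[str, List[str]], bool],
    rewrite: Callable[[str, List[str], int], str],
    max_rounds: int = 3,
) -> LoopResult:
    """Retrieve, check sufficiency, and re-query until satisfied or out of rounds."""
    query = question
    context: List[str] = []
    tried: List[str] = []
    for _ in range(max_rounds):
        tried.append(query)
        for chunk in retrieve(query):
            if chunk not in context:  # keep accumulated context free of duplicates
                context.append(chunk)
        if is_sufficient(question, context):
            return LoopResult(context, tried, True)
        query = rewrite(question, context, len(tried))
    return LoopResult(context, tried, False)


# Toy stand-ins (hypothetical): keyword lookup instead of a vector search,
# and a count-based check instead of an LLM judge.
CORPUS = {
    "kserve": "KServe serves models behind an inference endpoint.",
    "pipelines": "Kubeflow Pipelines runs containerized workflow steps.",
}


def toy_retrieve(query: str) -> List[str]:
    return [text for key, text in CORPUS.items() if key in query.lower()]


def toy_is_sufficient(question: str, context: List[str]) -> bool:
    return len(context) >= 2


def toy_rewrite(question: str, context: List[str], round_no: int) -> str:
    return "pipelines" if round_no == 1 else question


result = agentic_retrieve(
    "How does KServe fit in?", toy_retrieve, toy_is_sufficient, toy_rewrite
)
```

The `max_rounds` cap is the part I would want to keep whatever the framework, since an unbounded re-query loop is the obvious failure mode.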

On Golden Data: is the expectation to define a fixed schema upfront (query, context, expected answer, source doc), or is the design intentionally open-ended at this stage? That choice affects how the evaluation pipeline gets built downstream, so I want to make sure I'm scoping the proposal correctly.
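If a fixed schema is the direction, this is roughly the shape I would propose, using the four fields named above. The class name `GoldenRecord`, the JSONL storage format, and the sample values are all my assumptions for discussion, not something taken from the ideas page.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass(frozen=True)
class GoldenRecord:
    """One evaluation example; field names mirror the schema listed above."""
    query: str
    context: List[str]
    expected_answer: str
    source_doc: str


def to_jsonl(records: List[GoldenRecord]) -> str:
    """Serialize records as one JSON object per line."""
    return "\n".join(json.dumps(asdict(r), sort_keys=True) for r in records)


def from_jsonl(text: str) -> List[GoldenRecord]:
    """Parse JSONL back into records; unexpected or missing fields raise TypeError."""
    return [GoldenRecord(**json.loads(line)) for line in text.splitlines() if line.strip()]


# Made-up placeholder record, for illustration only.
records = [
    GoldenRecord(
        query="example question",
        context=["example retrieved chunk"],
        expected_answer="example answer",
        source_doc="docs/example.md",
    )
]
roundtrip = from_jsonl(to_jsonl(records))
```

A strict schema like this makes the evaluation pipeline simpler to build, at the cost of a migration step if the fields change later.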

I also noticed the README flags serving the embedding model as a KServe service (instead of reinstalling sentence-transformers on every pipeline run) as a future improvement. That seems worth tackling as part of the ingestion pipeline work.

My background: I'm a full-stack developer (React, FastAPI, Node) pivoting into AI/ML. I have built RAG pipelines with LangChain, worked with LangGraph for agentic flows, and am comfortable with Kubernetes. The end-to-end perspective helps here: I think about how the whole system behaves under load, not just the model layer.

GitHub: https://github.com/SameerAliKhan-git 

LinkedIn: https://linkedin.com/in/sameeralikhan1/

Looking forward to the discussion! 

Sameer Ali Khan
