Dear Jenkins Developers, Contributors, and Users,
My name is Chirag Gupta, and I am honored to be a Google Summer of Code 2025 contributor working on a project titled: “Jenkins Domain specific LLM based on actual Jenkins usage using ci.jenkins.io data.”
The core motivation behind this project is to develop a specialized Large Language Model (LLM) tailored for Jenkins. The aim is to fine-tune existing open-source models on data from ci.jenkins.io to create a tool that helps diagnose Jenkins failures, reduces troubleshooting time, and ultimately helps teams, including the Jenkins infrastructure team, troubleshoot more effectively. This project is for the community, and its success will be greatly enhanced by the community's collective expertise.
The primary focus will be to deliver:
A functional fine-tuned LLM/Agent capable of assisting with the diagnosis of common Jenkins infrastructure failures.
The pipeline used to train the model, so that new models can be trained as newer, better base models are released and more datasets can be added over time.
The project will span the GSoC 2025 timeline, from May to September.
Some technical aspects:
Data Source: Leveraging the invaluable, real-world usage data from ci.jenkins.io.
Model Selection: Evaluating and fine-tuning robust base models. Current candidates include Microsoft's Phi-4 and Qwen3 (14B & 8B) as mid-sized models, alongside smaller models such as Phi-4-mini-instruct and Qwen3 (4B & 1.7B).
Data Curation & Preparation: Implementing a thorough strategy for cleaning, structuring, and preparing the ci.jenkins.io dataset, addressing challenges like log noise and token inflation.
Evaluation Strategy: Employing a combination of targeted benchmarks (e.g., IFEval for instruction following, SimpleQA for factual accuracy adapted for Jenkins), a custom diagnostic benchmark, and qualitative human evaluation.
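To make the "log noise and token inflation" point under Data Curation & Preparation more concrete, here is a minimal sketch of the kind of cleaning step I have in mind. It is only an illustration, not the project's actual pipeline: the function name clean_log, the assumed timestamp prefix format, and the choice to collapse consecutive duplicate lines are all my own assumptions, and real ci.jenkins.io logs may need different rules.

```python
import re

# Matches ANSI colour/control sequences such as "\x1b[31m".
ANSI_ESCAPE = re.compile(r"\x1b\[[0-9;?]*[ -/]*[@-~]")

# Assumed timestamp prefix, e.g. "[2025-05-01T12:00:00.000Z] ".
# This format is a guess for illustration; adjust to the real logs.
TIMESTAMP_PREFIX = re.compile(
    r"^\[\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?Z\]\s*"
)


def clean_log(raw: str) -> str:
    """Strip common noise from a build log to reduce its token count.

    Hypothetical helper: removes ANSI escapes, the assumed timestamp
    prefix, blank lines, and collapses runs of identical lines.
    """
    cleaned = []
    previous = None
    repeats = 0
    for line in raw.splitlines():
        line = ANSI_ESCAPE.sub("", line)
        line = TIMESTAMP_PREFIX.sub("", line).rstrip()
        if not line:
            continue  # drop blank lines
        if line == previous:
            repeats += 1  # count the duplicate instead of keeping it
            continue
        if repeats:
            cleaned.append(f"[previous line repeated {repeats}x]")
            repeats = 0
        cleaned.append(line)
        previous = line
    if repeats:
        cleaned.append(f"[previous line repeated {repeats}x]")
    return "\n".join(cleaned)


if __name__ == "__main__":
    sample = (
        "\x1b[31mERROR\x1b[0m build failed\n"
        "[2025-05-01T12:00:00.000Z] Downloading...\n"
        "[2025-05-01T12:00:01.000Z] Downloading...\n"
        "\n"
        "Done"
    )
    print(clean_log(sample))
```

Collapsing repeated lines into a short marker, rather than deleting them silently, keeps a hint that the repetition happened, which may itself be a useful diagnostic signal. Suggestions on which kinds of noise matter most in practice are very welcome.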
Seeking Your Expertise and Feedback – A Community Effort:
To ensure this project truly benefits the Jenkins community, your input is crucial. In the spirit of open source, this project is being built for the community, and ideally, with the community. Your collective experience can significantly shape its development and help us create a tool that effectively reduces the time we all spend deciphering build failures.
I would be incredibly grateful for any suggestions, insights, or feedback you might have, particularly if you have experience with:
Jenkins infrastructure and common failure patterns.
Analyzing ci.jenkins.io logs.
Specific types of failures that are consistently challenging to diagnose.
Existing tools or techniques for log analysis that have proven useful.
Potential data sources or specific aspects of ci.jenkins.io data that would be most valuable for training.
Any advice, common pain points, or even specific examples of tricky build failures would be immensely helpful in guiding the development and fine-tuning of this LLM.
Thank you for your time and consideration. I look forward to your valuable contributions and to building a tool that can benefit the entire Jenkins ecosystem.
Best regards,
Chirag Gupta
GSoC 2025 Contributor@Jenkins