GSoC 2020 idea: Git plugin automatic caching for faster clone


Mark Waite

Dec 26, 2019, 3:36:52 PM
to jenkinsci-gsoc-all-public
Faster git fetch performance using existing local workspaces as a cache

The Jenkins git plugin clones remote git repositories into Jenkins workspaces on agents.  The Jenkins agents frequently have other copies of the remote git repository which could be used as a local cache to reduce network data transfer from the remote git server to the local workspace. Those local copies could be in workspaces for previous builds of this job or they could be in workspaces for other branches of the same repository. Those local copies might also be maintained by administrators with periodic updates from the central repository.

The project idea is to identify and use a well-chosen existing copy of the remote git repository as a reference repository for the git fetch that is being performed in the workspace. By default, the workspace repository should be dissociated from the reference repository during the fetch so that deleting the reference repository will not harm the workspace repository.
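
As a rough illustration of the reference-plus-dissociate idea, here is a minimal Python sketch that only assembles a git command line. It uses `git clone` rather than `git fetch` because `--reference` and `--dissociate` are options of `git clone`; how the git plugin would apply the same idea to its fetch is not specified in this thread. The function name, its parameters, and the example paths are hypothetical.

```python
from typing import List, Optional


def build_clone_command(remote_url: str,
                        workspace: str,
                        reference_repo: Optional[str] = None,
                        dissociate: bool = True) -> List[str]:
    """Build the argument list for a clone that may borrow objects locally.

    Hypothetical helper for illustration only; it is not part of the
    Jenkins git plugin.
    """
    cmd = ["git", "clone"]
    if reference_repo:
        # --reference lets git borrow objects from an existing local copy
        # instead of transferring them from the remote server.
        cmd += ["--reference", reference_repo]
        if dissociate:
            # --dissociate copies the borrowed objects into the new clone,
            # so deleting the reference repository later does no harm.
            cmd.append("--dissociate")
    cmd += [remote_url, workspace]
    return cmd


if __name__ == "__main__":
    # Example paths and URL are placeholders.
    print(" ".join(build_clone_command(
        "https://example.com/project.git",
        "workspace/project",
        reference_repo="/var/cache/git/project.git")))
```

Leaving `dissociate` at its default matches the proposal's suggestion that the workspace should not depend on the reference repository once the transfer is complete.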

Refer to https://docs.google.com/document/d/1M9Na1eJxwGZ7fY1L-PsBGv62jL_pPBtqmGePpwogUBw/edit# as a draft location for the project proposal and for refinement of the proposal.

Mark Waite

RAJAT BANSAL

Dec 28, 2019, 8:03:49 AM
to jenkinsci-gsoc-all-public
Hi Mark and others,
I am Rajat Bansal, a 4th-year Computer Science and Applied Mathematics student at IIIT Delhi, and I would love to contribute to an esteemed organization like Jenkins. I recently joined the mailing list and really liked the idea you have proposed above. I would like to be assigned to this task if possible; however, I think that it may already have been done, since I found this. Also, I am planning to apply for GSoC 2020 under Jenkins and would like to know about any other pending projects or issues that would help me get familiar with the code base.
Looking forward to some help and guidance. 

Regards,
Rajat Bansal  

Mark Waite

Dec 28, 2019, 8:29:20 AM
to RAJAT BANSAL, jenkinsci-gsoc-all-public
Hi Rajat!

Thanks for your interest.  That's great!

The project definitely has not been done.  Using a reference repository as described in that article requires that the reference repository be updated by some process, usually outside Jenkins.  That is a lot of work that the Jenkins user or administrator should not need to do.  The project idea is seeking ways to make the use of reference repositories much more automatic and much more effective for Jenkins users.

I'm personally inexperienced with how the Google Summer of Code student selection process works.  I believe it is described at https://google.github.io/gsocguides/mentor/selecting-a-student .  I know that the Google Summer of Code subproject of Jenkins has lots of experience with the process and will help and coach as projects are discussed, evaluated, refined, selected, and assigned. 

If you're interested in starting on a project even before being selected as a Google Summer of Code student, you may be better served by starting with one of the "friendly issues" in the Jenkins project.  Friendly issues will give you experience with the Jenkins project and its infrastructure and take much less time than the Google Summer of Code project ideas.  

The participate/code page and an excellent blog post on becoming a Jenkins contributor will get you started.  The Jenkins contributing page also provides a link to friendly issues, or you can use the two-dimensional filter to find friendly issues by project and by severity of the issue.

--
You received this message because you are subscribed to the Google Groups "jenkinsci-gsoc-all-public" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-gsoc-all...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/jenkinsci-gsoc-all-public/0905c63c-09c5-44d6-b192-3e39baf5323f%40googlegroups.com.


--
Thanks!
Mark Waite

Jeff

Dec 28, 2019, 1:16:48 PM
to Mark Waite, RAJAT BANSAL, jenkinsci-gsoc-all-public
Hi Rajat.

I'm one of the org admins for GSoC in Jenkins. I appreciate your enthusiasm, and I hope you'll apply when we're ready to start taking applications in a few months.


At this point, we're not ready to accept applications. We expect to start engaging with students February 20, 2020, and the application period will be March 16 - 31. You can consult the GSoC Timeline for full information.


I encourage you to get involved in one of the "newbie friendly issues" to get acquainted with the Jenkins project.  The ultimate goal of GSoC is to encourage students to get involved in open source, so we'd definitely welcome any contributions.


Best 
Jeff

RAJAT BANSAL

Dec 28, 2019, 2:03:29 PM
to Jeff, Mark Waite, jenkinsci-gsoc-all-public
Thanks a lot, Mark and Jeff, for the guidance. I will start working on some of the friendly issues to get better acquainted with Jenkins, and I hope to be of help.

Regards,
Rajat Bansal

rishabhb...@gmail.com

Feb 23, 2020, 4:19:52 PM
to jenkinsci-gsoc-all-public
Before I ask questions about this project, please bear with my current understanding of it.

Each SCMHead and SCMRevision has an SCMFileSystem associated with it. This file system is created by locking the git repository using a cache.
The objective of this project seems to be to use that cache as a reference repository (on the local machine, i.e. the agent node), which would reduce network and local storage costs because fewer objects would have to be copied from the repository being cloned.

My queries regarding the project are: 

1) Is the criterion for choosing an existing copy of the remote git repo the number of git objects that the "local copy" and the remote git repo share? Would the repo that shares the most objects be the most up to date, and therefore the preferred candidate for this job?

2) I have seen PR 502. The comment made by Jesse Glick about the Mercurial Plugin's approach of storing two levels of cache, one on the master and one on the agents, seems like a good way to implement caching for the git-plugin. Would you consider that a good choice?

Also, I want to make the case for building this functionality as a separate extension plugin so that it can be an opt-in feature, since using a cache could have downsides that affect some users.
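
The selection criterion asked about in question 1 could be sketched roughly as follows. This is only an illustration of ranking candidates by shared objects, not a description of how the project would implement it. It assumes the sets of object IDs are already available, which in practice is a strong assumption because the full set of remote objects is not known before fetching. The function name and the example paths are hypothetical.

```python
from typing import Dict, Optional, Set


def pick_reference(remote_objects: Set[str],
                   candidates: Dict[str, Set[str]]) -> Optional[str]:
    """Return the candidate path sharing the most object IDs with the remote.

    Returns None when no candidate shares any object. Hypothetical helper
    for illustration only.
    """
    best_path: Optional[str] = None
    best_shared = 0
    # Iterate in sorted order so that ties are resolved deterministically
    # in favour of the first path alphabetically.
    for path in sorted(candidates):
        shared = len(candidates[path] & remote_objects)
        if shared > best_shared:
            best_path, best_shared = path, shared
    return best_path


if __name__ == "__main__":
    remote = {"a1", "b2", "c3"}
    local_copies = {
        "/ws/job-main": {"a1"},
        "/ws/job-feature": {"a1", "b2", "ff"},
    }
    print(pick_reference(remote, local_copies))
```

A count of shared objects says nothing about how recently a candidate was updated, so a real heuristic would probably need additional signals.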

Gyanesha Prajjwal

Mar 5, 2020, 2:32:58 PM
to jenkinsci-gsoc-all-public
Hi Everyone,

I am Gyanesha Prajjwal. I am really interested in contributing to Jenkins and working with this esteemed organisation in GSoC 2020. 

I went through the ideas list and I found "Git Repository Caching" really interesting. 

Before submitting any proposal, I would really like to get familiar with the Jenkins code base. I have made some pull requests on git-plugin and git-client-plugin; before that, I contributed to jenkinsci/analysis-model and jenkinsci/warnings-ng-plugin. 

Also, I would like to understand more about the aforementioned project.

Looking forward to some guidance.

Thank you and Regards,
Gyanesha Prajjwal

Marky Jackson

Mar 5, 2020, 2:34:01 PM
to Gyanesha Prajjwal, jenkinsci-gsoc-all-public