Resetting our Maven repository caches


Daniel Beck

Jun 14, 2019, 7:05:48 AM
to Jenkins Developers
Hi everyone,

Until very recently, most of the caches of various repositories ('remote repositories') we operate on repo.jenkins-ci.org (Maven Central, JCenter, and a bunch of others) were configured to use plain HTTP to download artifacts. This has now been fixed where possible. While I didn't find any unexplainable differences during a rudimentary check of the dependencies used in the most popular Jenkins plugins, this still isn't a great situation and I'd like to get rid of these existing caches. These repos are part of the virtual repository repo.jenkins-ci.org/public referenced from pom.xml files, so are involved in dependency resolution from Jenkins components like plugins.

Note that I wrote 'unexplainable differences'. Our caches have, in some cases, diverged from upstream. All occurrences of this that I've seen can be traced to a project re-releasing an artifact after it was already cached by us.

Wiping our caches would therefore change some artifacts, or possibly lose them altogether. Not wiping caches OTOH isn't a reasonable alternative either if we want to be able to trust these artifacts. Manually confirming our caches are identical to upstream, and, in case of differences, confirming they're legitimate, would be extremely time-consuming given the number of cached artifacts[1], and we still wouldn't know what to do about any artifacts cached by us that are gone from upstream[2].

My suggestion would be that we configure our current cache repositories to be "blacked out". That's an Artifactory feature that allows repositories to remain, but disables downloads and artifact resolution. In their place, we would add new remote repositories to the same upstream repos, but configured to use HTTPS from the start. (Alternatively, we could remove these old remote repositories from the 'public' virtual repository referenced in pom.xml files, largely accomplishing the same goal.)
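For illustration, both steps map onto Artifactory's repository-configuration REST endpoint. This sketch only builds the requests (an authenticated POST updates an existing repo, PUT creates one); the base URL, repo keys, and the exact field names should be checked against the Artifactory version we run:

```python
import json

# Illustrative base URL for our Artifactory instance.
ARTIFACTORY = "https://repo.jenkins-ci.org/artifactory"

def blackout_request(repo_key: str):
    """Build the config update that would black out an existing remote repo:
    the repository and its cache remain, but resolution and downloads stop."""
    url = f"{ARTIFACTORY}/api/repositories/{repo_key}"
    payload = {"blackedOut": True}
    return url, json.dumps(payload)

def https_remote_request(repo_key: str, upstream: str):
    """Build the config for a replacement remote repository pointing at the
    same upstream, but configured to use HTTPS from the start."""
    assert upstream.startswith("https://"), "new remotes must use HTTPS"
    url = f"{ARTIFACTORY}/api/repositories/{repo_key}"
    payload = {"rclass": "remote", "url": upstream}
    return url, json.dumps(payload)
```

The same could of course be done through the Artifactory UI; the point is that the old caches stay restorable rather than being wiped.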

If we actually encounter problems due to unintended, major changes to artifacts, we will be able to restore them from these blacked out caches to a "manual cache" repository with higher precedence than whatever would be available from remote repositories.
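As I understand Artifactory's resolution order, a virtual repository tries its member repositories in list order, so the precedence above amounts to listing a hypothetical "manual cache" repo first in the 'public' virtual repository's members:

```python
def virtual_repo_config(manual_cache: str, remotes: list):
    """Sketch of the 'public' virtual repository's member list. Artifactory
    resolves from members in list order, so putting the manual-cache repo
    first lets deliberately restored artifacts shadow anything the new
    HTTPS remotes would serve."""
    return {
        "rclass": "virtual",
        "repositories": [manual_cache, *remotes],
    }
```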

WDYT?

Daniel

(And while we're at it: Do we want our caches to expire, or have consistent builds in favor of alignment with upstream re-releases?)


1: JCenter: 1M files / 360.37 GB; Maven Central: 5.6M files / 1.45 TB
2: I gave up on this task for our cache of Atlassian's repo after 76830 cached artifacts, of which 29 actually existed upstream.


James Nord

Sep 23, 2019, 11:29:08 AM
to Jenkins Developers
Whatever you choose, if you change what would be built as a result, then you have to take into account that others downstream will have cached what they downloaded (even if it is a developer machine rather than a Nexus/Artifactory etc.), and then you are allowing different builds in your CI than a developer or another CI consuming those artifacts would get.  I would heavily caution you against doing anything that knowingly introduces a difference between what happens in CI and what happens on a developer machine configured with the Jenkins Maven proxy.

Given that most plugins are released from someone's own machine, which will have cached these libraries, you will continue to have these libraries used and included in plugins.



> (And while we're at it: Do we want our caches to expire, or have consistent builds in favor of alignment with upstream re-releases?)

No, never (unless it is a snapshot) - especially because of 2, but also because if a cache entry expires you are allowing changes, and then CI and a developer machine will get different results.
Re-releasing artifacts is considered a very, very bad thing to do; if a repo is doing that, it's likely doing other bad things too, and we need to tread very carefully.
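On the Maven side, that policy maps onto the repository `updatePolicy`: released artifacts can be pinned while snapshots are allowed to refresh. A sketch of the relevant fragment for a repository definition in `settings.xml` or a `pom.xml` (the id and URL shown are the usual Jenkins ones):

```xml
<repository>
  <id>repo.jenkins-ci.org</id>
  <url>https://repo.jenkins-ci.org/public/</url>
  <releases>
    <!-- released artifacts are treated as immutable: never re-check upstream -->
    <updatePolicy>never</updatePolicy>
  </releases>
  <snapshots>
    <!-- snapshots are expected to change: re-check at most daily -->
    <updatePolicy>daily</updatePolicy>
  </snapshots>
</repository>
```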





