Artifactory cost control

Mark Waite

Sep 15, 2020, 7:43:46 PM
to Jenkins Infrastructure
JFrog has hosted the Jenkins artifact repository at https://repo.jenkins-ci.org/ since 2012.  We're deeply grateful for their donation of the hosting and the bandwidth to host the released artifacts, pre-release artifacts, cached dependencies, and incremental releases.

Unfortunately, we learned in a conference call today that the storage and bandwidth costs of the Jenkins Artifactory instance are much too high for JFrog to continue bearing alone.

JFrog has recommended some initial changes to the repository so that we can continue to use the repository and bring the costs under control.  Daniel Beck and I will meet with them again for more detailed planning.

Some of the initial changes proposed include:
  • Require authentication for artifact read (disable anonymous read)
  • Remove outdated cached copies of artifacts from other repositories
More information will be shared as we learn more.

Mark Waite

Gavin Mogan

Sep 15, 2020, 9:49:53 PM
to Jenkins Infrastructure
Would disabling anonymous read prevent people from building plugins? Or can Maven Central make up the difference?

--
You received this message because you are subscribed to the Google Groups "Jenkins Infrastructure" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkins-infr...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/jenkins-infra/c27aef9a-8e50-41a5-861d-1ffa6fb57c8cn%40googlegroups.com.

Jesse Glick

Sep 16, 2020, 9:33:38 AM
to jenkin...@googlegroups.com
On Tue, Sep 15, 2020 at 7:43 PM Mark Waite <mark.ea...@gmail.com> wrote:
> Require authentication for artifact read (disable anonymous read)

Unless I am missing something, this would break almost every job on
ci.jenkins.io as well as routine builds of Jenkins-related sources for
thousands of people. What would be the purpose, anyway?


If the `incrementals` repo is a significant portion of the problem, we
can discuss solutions.
https://github.com/jenkins-infra/iep/blob/master/iep-009/README.adoc#garbage-collection
was proposed but never implemented. Ideas include:

· Delete (or move to cheap “glacial” storage) artifacts which were
never downloaded according to HTTP access logs.
· Delete (or, again, glaciate) artifacts older than some threshold,
say a year, or not downloaded within that period.
· Stop publishing artifacts automatically at the end of a stable build
and switch to some sort of explicit gesture. (Not trivial to deal with
authentication & authorization, and sometimes it is desirable to
publish artifacts from `master`, not just PRs.)
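
As a rough illustration of the first two ideas, here is a minimal sketch of how candidates for deletion or cold storage might be selected. Everything in it is an assumption made for illustration: the `Artifact` record, the `gc_candidates` function, and the one-year default are hypothetical, and real input would have to be assembled from Artifactory metadata and the HTTP access logs. It also assumes that an artifact published within the threshold is always kept, even if nobody has downloaded it yet.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Artifact:
    # Hypothetical record; real data would come from repository
    # metadata and HTTP access logs.
    path: str
    published: datetime
    last_downloaded: Optional[datetime]  # None means never downloaded


def gc_candidates(
    artifacts: List[Artifact],
    now: datetime,
    threshold: timedelta = timedelta(days=365),  # "say a year"
) -> List[Artifact]:
    """Return artifacts that could be deleted or moved to cold storage."""
    cutoff = now - threshold
    candidates = []
    for artifact in artifacts:
        # Assumption: anything published within the threshold is kept,
        # so fresh builds get a chance to be downloaded.
        if artifact.published > cutoff:
            continue
        # Older artifacts qualify if they were never downloaded or were
        # last downloaded before the cutoff.
        if artifact.last_downloaded is None or artifact.last_downloaded < cutoff:
            candidates.append(artifact)
    return candidates
```

The function only reports candidates. Whether they are deleted or glaciated would be a separate, explicit step.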

Oleg Nenashev

Sep 16, 2020, 9:47:18 AM
to jenkin...@googlegroups.com
It would be great to get some details about their cost expectations before we deep dive.

If the main concern is traffic, we need to see how much of that comes from ci.jenkins.io. Removal of the Azure-internal cache has likely caused a significant rise in traffic on the Artifactory side.

For artifact retention, garbage collection of the incrementals repository and cleanup of unused history are nice steps to consider. There are components (e.g. Hudson WARs or old API plugin lib versions) which can be safely archived.

Best regards,
Oleg

Mark Waite

Sep 16, 2020, 10:23:51 AM
to Jenkins Infrastructure
On Wed, Sep 16, 2020 at 7:33 AM Jesse Glick <jgl...@cloudbees.com> wrote:
> On Tue, Sep 15, 2020 at 7:43 PM Mark Waite <mark.ea...@gmail.com> wrote:
> > Require authentication for artifact read (disable anonymous read)
>
> Unless I am missing something, this would break almost every job on
> ci.jenkins.io as well as routine builds of Jenkins-related sources for
> thousands of people. What would be the purpose, anyway?


The suggestion came from the suspicion that many people may be
using https://repo.jenkins-ci.org/ as a convenient mirror of other
repositories rather than using it for work that serves the Jenkins project.

JFrog wants to continue supporting the Jenkins project, but the current
costs are too high for them to carry alone.

They indicated that public anonymous access to an artifact repository is
not as common now as it was in 2012 when the repository was created.
 

> If the `incrementals` repo is a significant portion of the problem, we
> can discuss solutions.
> https://github.com/jenkins-infra/iep/blob/master/iep-009/README.adoc#garbage-collection
> was proposed but never implemented. Ideas include:
>
> · Delete (or move to cheap “glacial” storage) artifacts which were
> never downloaded according to HTTP access logs.
> · Delete (or, again, glaciate) artifacts older than some threshold,
> say a year, or not downloaded within that period.
> · Stop publishing artifacts automatically at the end of a stable build
> and switch to some sort of explicit gesture. (Not trivial to deal with
> authentication & authorization, and sometimes it is desirable to
> publish artifacts from `master`, not just PRs.)


The initial look at space utilization indicated that we'll need to apply
one or more of those ideas to the incrementals repo.  However, we
still need to understand the contribution of various parts to the total
cost before we choose which items to implement.  It's not yet clear
to me whether bandwidth use or disk use is the greatest cost
element.

Mark Waite

Sep 16, 2020, 10:26:42 AM
to Jenkins Infrastructure
On Wed, Sep 16, 2020 at 7:47 AM Oleg Nenashev <o.v.ne...@gmail.com> wrote:
> It would be great to get some details about their cost expectations before we deep dive.


Yes, I agree that we need to understand how much we need to reduce costs for them
to be acceptable to JFrog.  We'll take up that topic in upcoming meetings with them.
 
> If the main concern is traffic, we need to see how much of that comes from ci.jenkins.io. Removal of the Azure-internal cache has likely caused a significant rise in traffic on the Artifactory side.


That's a good point.  I think that we need to understand whether bandwidth
is the greatest cost contributor or if it is storage space or something else.
 
> For artifact retention, garbage collection of the incrementals repository and cleanup of unused history are nice steps to consider. There are components (e.g. Hudson WARs or old API plugin lib versions) which can be safely archived.


Agreed.
 

Gavin Mogan

Oct 21, 2020, 12:58:07 AM
to Jenkins Infrastructure
So it looks like this was enabled. I'm seeing various chats and issues about people getting 401s from repo.jenkins-ci.org.

For example (both related)


So I think it's the right decision, but an official announcement should be made so that it can be pointed to when people comment.

Gavin

