github quota limit when scanning with the addition of tags


j.kn...@travelaudience.com

Jan 3, 2018, 9:19:44 AM
to Jenkins Users
Now that we've added Discover tags[1] and a Build everything[2] strategy, we're running into GitHub quota limits quite frequently.

18:58:09 GitHub API Usage: Current quota has 1110 remaining (5 over budget). Next quota of 5000 in 13 min. Sleeping for 29 sec.

We've had to extend the Scan Organization Triggers -> Periodically if not otherwise run setting to 8 hours to help limit the number of scans, but that hasn't completely solved this issue, nor is it the goal we want to achieve.

There's an open bug about the time setting and GitHub quota limits (JENKINS-47154[3]), but it's not relevant in this case.
So I'm wondering: is this a bug in the github-branch-source-plugin, or in the Build everything extension? Or is there simply a way to request a higher API quota from GitHub for Jenkins?


REF:
1. https://issues.jenkins-ci.org/browse/JENKINS-34395
2. https://github.com/jenkinsci/github-branch-source-plugin/pull/158#issuecomment-332842623
3. https://issues.jenkins-ci.org/browse/JENKINS-47154

Stephen Connolly

Jan 3, 2018, 9:42:15 AM
to jenkins...@googlegroups.com
This comes down to GitHub's limit of 5000 requests per hour.

Ideally we would look into caching the GitHub responses so that duplicate requests could be eliminated... but my preliminary analysis shows that would save only about 50% of the requests.
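
To illustrate the kind of caching meant here, a minimal Groovy sketch (not plugin code) using GitHub v3 conditional requests; a 304 Not Modified response is served from the local cache and, per GitHub's documentation, does not count against the rate limit:

// Illustrative sketch only. GitHub v3 honours ETags, so repeating an
// identical request with If-None-Match turns it into a "free" 304.
String fetch(String url, Map<String, String> etags, Map<String, String> bodies) {
    def conn = new URL(url).openConnection()
    if (etags[url]) {
        conn.setRequestProperty('If-None-Match', etags[url])   // conditional request
    }
    if (conn.responseCode == 304) {
        return bodies[url]                                     // cache hit, no quota spent
    }
    etags[url] = conn.getHeaderField('ETag')
    bodies[url] = conn.inputStream.text
    return bodies[url]
}

def etags = [:], bodies = [:]
println fetch('https://api.github.com/repos/jenkinsci/jenkins', etags, bodies)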

The recommendation for "Scan Organization Triggers -> Periodically if not otherwise run" is at least 8 hours, and more likely somewhere between 24 hours and 7 days, depending on how long you are willing to wait when GitHub fails to deliver an event.

There are only two good reasons to scan periodically:

1. To recover from missed events (keep in mind that follow-up commits will typically recover anyway, so the only case here is a commit before bedtime not being built by morning because that event was not delivered by GitHub)
2. To run the orphaned item strategies (which is probably fine at once per week for most people)

The only other reason to scan periodically is a bad one, namely

* You cannot set up push notification from GitHub
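
For reference, a hedged Job DSL sketch of that recommendation (organizationFolder and periodicFolderTrigger are Job DSL symbols for organization folders; verify them against your plugin versions):

// Hedged sketch: rely on webhooks for normal triggering and scan rarely.
organizationFolder('example-org') {
    triggers {
        periodicFolderTrigger {
            interval('7d')   // once a week is enough if it mainly drives orphaned-item cleanup
        }
    }
}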




Stephen Connolly

Jan 3, 2018, 9:47:07 AM
to jenkins...@googlegroups.com
On 3 January 2018 at 14:19, <j.kn...@travelaudience.com> wrote:
> So I'm wondering: is this a bug in the github-branch-source-plugin, or in the Build everything extension? Or is there simply a way to request a higher API quota from GitHub for Jenkins?

Good luck with that... They seem to follow the principle that 5000/hr is all anyone gets... if you must have more, I think they want you to move to GitHub Enterprise.

At some point we will probably need to move to the v4 API, which might let us tune what we fetch... but that still has a limit approximately equivalent to 5000/hr: https://developer.github.com/v4/guides/resource-limitations/ The only difference is that we might be able to bulk-fetch in a single request a lot of the things that currently take 3-4 requests to collect.

We will still hit issues when we then need to check for marker files, as I do not think that is something that can be done in a single v4 API call.
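
For illustration, a hedged Groovy sketch of the kind of bulk fetch v4 allows: listing up to 100 tags and probing each tagged commit for a fixed-path Jenkinsfile in one request. refs(refPrefix:) and Commit.file(path:) are fields in GitHub's published v4 schema; the owner/repo values are placeholders, and annotated tags resolve to a Tag object rather than a Commit, which would need an extra "... on Tag" unwrap omitted here:

import groovy.json.JsonOutput

// Hedged sketch, not plugin code: a null "file" in the response means the
// tag's commit has no Jenkinsfile at that path.
def query = '''
query {
  repository(owner: "example-org", name: "example-repo") {
    refs(refPrefix: "refs/tags/", first: 100) {
      nodes {
        name
        target {
          ... on Commit {
            file(path: "Jenkinsfile") { name }
          }
        }
      }
    }
  }
}
'''
def conn = new URL('https://api.github.com/graphql').openConnection()
conn.requestMethod = 'POST'
conn.setRequestProperty('Authorization', "bearer ${System.getenv('GITHUB_TOKEN')}")   // token via environment
conn.setRequestProperty('Content-Type', 'application/json')
conn.doOutput = true
conn.outputStream.withWriter { it << JsonOutput.toJson([query: query]) }
println conn.inputStream.text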

j.kn...@travelaudience.com

Jan 3, 2018, 10:52:03 AM
to Jenkins Users
> There are only two good reasons to scan periodically:
> 1. To recover from missed events (keep in mind that follow-up commits will typically recover anyway, so the only case here is a commit before bedtime not being built by morning because that event was not delivered by GitHub)

From my experience working with developers, that isn't the only scenario. The more common case when an event is missed is that a developer has pushed a commit and is waiting for it to go through the pipeline and be notified; fast notification is key to good CI/CD. So while missed events are not a frequent occurrence, waiting 7 days isn't an option, and the only other recourse is for the developer to have in-depth knowledge of Jenkins and know that this issue exists.

> 2. To run the orphaned item strategies (which is probably fine at once per week for most people)

Totally agree, that's fine.


As we already have a few repos with over 500 tags (and mind you, these are still new repos), I expect this issue will impact others as they begin scanning tags, even with a 24-hour interval.

----

Also, the recommendation in the UI for the interval setting is: "Subsequent commits should trigger indexing anyway and result in the commit being picked up, so most people will pick between 4 hours and 1 day."

j.kn...@travelaudience.com

Jan 4, 2018, 8:27:42 AM
to Jenkins Users
@Stephen
You mention that caching the responses would "save about 50% of the requests." That seems like a significant savings to me.

I'm also wondering about something: I'm seeing a lot of entries like this in the scan log:
Checking tag v0.28.1
  Jenkinsfile not found
  Does not meet criteria
19:01:38 GitHub API Usage: Current quota has 901 remaining (4 under budget). Next quota of 5000 in 10 min

Checking tag v0.28.2
  Jenkinsfile not found
  Does not meet criteria
19:01:38 GitHub API Usage: Current quota has 897 remaining (0 under budget). Next quota of 5000 in 10 min

Checking tag v0.28.3
  Jenkinsfile not found
  Does not meet criteria
19:01:38 GitHub API Usage: Current quota has 894 remaining (3 over budget). Next quota of 5000 in 10 min. Sleeping for 27 sec.
19:02:06 GitHub API Usage: Current quota has 894 remaining (26 under budget). Next quota of 5000 in 9 min 53 sec


It looks to me like each tag triggers an API request. And with 500+ tags, that seems like a lot of unneeded calls (especially when Jenkins doesn't even track/build the tag). Or am I reading the logs incorrectly? If I'm reading them right, a cache might save over 90% of the requests in this case.
Should I create a Jira ticket for this? 

Stephen Connolly

Jan 4, 2018, 9:02:11 AM
to jenkins...@googlegroups.com
On 4 January 2018 at 13:27, <j.kn...@travelaudience.com> wrote:
> It looks to me like each tag triggers an API request. And with 500+ tags, that seems like a lot of unneeded calls (especially when Jenkins doesn't even track/build the tag).

Why are you discovering tags if you don't want tags?

Every branch/tag/PR you discover needs at least one request to verify that the marker file is present.

If you don't want tags, don't discover them and you will save a lot of requests.
  

j.kn...@travelaudience.com

Jan 4, 2018, 9:13:06 AM
to Jenkins Users
I do want tags. I want tags very much, and I'm very happy this feature is finally available.
There just happen to be some tags in that repo that reference commits in which no Jenkinsfile exists, and I happened to copy those examples.

Here is a better example:
Checking tag v1.1.0
  Jenkinsfile found
  Met criteria
No changes detected: v1.1.0 (still at d11d5c94130db1b43dea147091c2cfc2d260b2c1)
19:05:09 GitHub API Usage: Current quota has 677 remaining (0 under budget). Next quota of 5000 in 6 min 50 sec

Checking tag v1.1.1
  Jenkinsfile found
  Met criteria
No changes detected: v1.1.1 (still at 20b7a9ccd47f9e10165268ccc252bc4b793a61fc)
19:05:09 GitHub API Usage: Current quota has 675 remaining (2 over budget). Next quota of 5000 in 6 min 50 sec. Sleeping for 26 sec.
19:05:36 GitHub API Usage: Current quota has 675 remaining (26 under budget). Next quota of 5000 in 6 min 23 sec



Stephen Connolly

Jan 4, 2018, 9:27:37 AM
to jenkins...@googlegroups.com
If you know those tags will never match, you could add a filter to exclude them from discovery.

Part of the issue here is that Multibranch doesn't know whether the SCMSourceCriteria has changed since the last time it saw that revision (because Jenkins config lives on a filesystem; who knows what was restored, edited with vi, etc.)... and on top of that, a tag that was not discovered doesn't have a place to store its revision.

Consequently, it will check for the Jenkinsfile every time you do a full scan.
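
For illustration, a hedged Job DSL sketch of such a filter. The trait class (jenkins.scm.impl.trait.WildcardSCMHeadFilterTrait) is real; the configure-block path below is an assumption and may differ between plugin versions:

multibranchPipelineJob('example-repo') {
    // ...branch source definition elided...
    configure { node ->
        // Append a "Filter by name (with wildcards)" behaviour to the source's traits.
        node / sources / data / 'jenkins.branch.BranchSource' / source / traits <<
            'jenkins.scm.impl.trait.WildcardSCMHeadFilterTrait' {
                includes('*')
                excludes('v0.28.*')   // tags known to reference commits without a Jenkinsfile
            }
    }
}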


j.kn...@travelaudience.com

Jan 4, 2018, 11:43:38 AM
to Jenkins Users
Ok, then I think I misunderstand what the scan is doing.
During the scan, Jenkins builds an in-memory list of all branches, tags, and PRs. Does it do that from a single API call, or one call per type?
And then, while iterating over that list, does Jenkins make an API call for each entity to fetch the Jenkinsfile (or to find out it doesn't exist)?

If that's the case, it doesn't sound like there is much to be done in the current setup. 

This is a problem, though, because as more and more tags accumulate, there is no practical way to keep adding them to the filter when Jenkins is the only source of truth on whether a tag has already been built. The few tags that don't reference a commit with a Jenkinsfile could simply be deleted from GitHub, but that doesn't fix the problem, it just delays it by a couple of weeks.


Stephen Connolly

Jan 4, 2018, 12:01:02 PM
to jenkins...@googlegroups.com
On 4 January 2018 at 16:43, <j.kn...@travelaudience.com> wrote:
> Ok, then I think I misunderstand what the scan is doing.
> During the scan, Jenkins builds an in-memory list of all branches, tags, and PRs. Does it do that from a single API call, or one call per type?

At least one API call for each requested (or implied requested) type.

e.g. if there are more than 100 branches, it will take more than one request to list them all, as the page size is 100
e.g. if you ask to build branches that are not also filed as pull requests, that implies we need the list of pull requests (even if you didn't select Discover Pull Requests)
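
To make the arithmetic concrete, a quick illustrative Groovy sketch (not plugin code):

// Illustrative only: request count for one full scan of a repo with 500 tags.
int tags = 500
int listCalls  = Math.ceil(tags / 100.0) as int   // paging through the tag list, 100 per page
int probeCalls = tags                             // one marker-file check per discovered tag
println "${listCalls + probeCalls} requests to scan ${tags} tags"
// => "505 requests to scan 500 tags", before branches and PRs are even counted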
 
> And then, while iterating over that list, does Jenkins make an API call for each entity to fetch the Jenkinsfile (or to find out it doesn't exist)?

Correct.
 

> If that's the case, it doesn't sound like there is much to be done in the current setup.

A quick win might be to maintain a secondary state file that tracks the hash of the XML config for the SCMSourceCriteria and the hash of the XML of each revision for each discovered "head". If the hashes are the same, then we can assume there is no need to re-check.
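
Purely as a sketch of that idea (all names here are hypothetical, not plugin API):

import java.security.MessageDigest

// Hypothetical sketch: remember (criteria hash, revision hash) pairs per head
// and skip the marker-file probe when the pair has been seen before.
static String sha256(String s) {
    MessageDigest.getInstance('SHA-256').digest(s.getBytes('UTF-8')).encodeHex().toString()
}

boolean needsRecheck(File stateFile, String criteriaXml, String headName, String revisionXml) {
    String key = "${headName}:${sha256(criteriaXml)}:${sha256(revisionXml)}"
    Set<String> seen = stateFile.exists() ? (stateFile.readLines() as Set) : ([] as Set)
    if (seen.contains(key)) {
        return false                     // same criteria, same revision: skip the API call
    }
    seen << key
    stateFile.text = seen.join('\n')     // persist for the next scan
    return true
}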
  

Alicia Doblas

Jan 5, 2018, 11:13:43 AM
to Jenkins Users
Hi,

a couple of months ago we had the same problem. After improving the discovery filters for the branch/tag/PR calls, there wasn't much left to do... so we decided to "bypass" the GitHub API limit by using different users. The limit appears to apply per user, so the "solution" for us was to split our jobs into different groups, each of them using a different user for the API calls.

By changing this configuration we can handle 5000 req/hour x N groups.

R. Tyler Croy

Jan 5, 2018, 11:30:40 PM
to jenkins...@googlegroups.com
(replies inline)
This is very, very much against the GitHub.com terms of service, which state that one legal entity may have only one free machine account.

https://help.github.com/articles/github-terms-of-service/#b-account-terms



- R. Tyler Croy

------------------------------------------------------
Code: <https://github.com/rtyler>
Chatter: <https://twitter.com/agentdero>
xmpp: rty...@jabber.org

% gpg --keyserver keys.gnupg.net --recv-key 1426C7DC3F51E16F
------------------------------------------------------


j.kn...@travelaudience.com

Oct 4, 2018, 9:29:19 AM
to Jenkins Users
I've started looking into this issue again. It's been an ongoing problem that only gets worse over time. We're currently using the basic-branch-build-strategies-plugin, which provides an option to skip building tags older than a week. But that option does nothing to limit our quota usage, and based on the investigation I just did, the issue will persist no matter which build strategy is applied.

The GitHubSCMSource plugin processes each tag with an API request BEFORE the build strategies are evaluated.

This is a bit frustrating, because the information used to filter (the tag date) is already known before the additional request to GitHub is made.
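
For illustration, a hypothetical Groovy sketch of the ordering I'd expect instead, applying the already-known tag timestamp before spending a request per tag (the tag list and the one-week cutoff are stand-ins, not plugin code):

// Hypothetical sketch: evaluate the age cutoff from discovered tag metadata
// first, so old tags never cost an API request.
long cutoff = System.currentTimeMillis() - 7L * 24 * 60 * 60 * 1000   // one week

def discoveredTags = [   // stand-in for the scan's in-memory tag list
    [name: 'v1.1.0',  timestamp: System.currentTimeMillis() - 2L * 24 * 60 * 60 * 1000],
    [name: 'v0.28.1', timestamp: System.currentTimeMillis() - 90L * 24 * 60 * 60 * 1000],
]

discoveredTags.each { tag ->
    if (tag.timestamp < cutoff) {
        println "Skipping ${tag.name}: older than the cutoff, no API request needed"
        return   // move on to the next tag
    }
    println "Checking ${tag.name} for a Jenkinsfile (one GitHub API request)"
}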

--------------

In related news, I noticed that the GitHub cache is going to be added back in. That may ease some of the pain this is causing us, but regardless, there seems to be a fault in the way build strategies and API requests interact.