[JIRA] (JENKINS-47154) GitHub Rate Limits are compared using the Jenkins master time not the http response's Date


calvarez@libon.com (JIRA)

Apr 11, 2019, 6:09:02 AM
to jenkinsc...@googlegroups.com
Carmen Alvarez commented on Bug JENKINS-47154
 
Re: GitHub Rate Limits are compared using the Jenkins master time not the http response's Date

Our Jenkins machine's clock is configured correctly.

We are often finding jobs blocked for 5, 10, 20 minutes or more, because of supposed quota issues, but we're actually far from going over quota. Example:

09:47:29 GitHub API Usage: The quota may have been refreshed earlier than expected, rechecking...
09:47:30 GitHub API Usage: Current quota has 3104 remaining (331 over budget). Next quota of 5000 in 50 min. Sleeping for 5 min 42 sec.

It would be nice to be able to disable this check, and just take the risk that a quota limit may be reached during a job. I don't think that would actually be the case for us, given that we have plenty of requests remaining.


stephen.alan.connolly@gmail.com (JIRA)

Apr 11, 2019, 7:36:03 AM
to jenkinsc...@googlegroups.com

> Current quota has 3104 remaining (331 over budget). Next quota of 5000 in 50 min.
>
> It would be nice to be able to disable this check, and just take the risk that a quota limit may be reached during a job. I don't think that would actually be the case for us, given that we have plenty of requests remaining.

So it may help to understand how the plugin manages the quota.

The quota gets divided into three parts:

  • Burst (this is about 15-20%)
  • Normal (this is about 75-80%)
  • Exception (this is about 5%)

The idea is that the Normal quota is divided up over the whole hour. The Burst quota is for "bursty" use cases. The Exception quota is in case any requests accidentally use more than we have managed (because all your Jenkins instances are - per the GitHub ToS - supposed to be using the same API key, so if you have 5 masters then, without the Exception quota, 4 of them could erroneously think that there is an API call unused, and then they would fail).
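
To make the split concrete, here is a minimal Groovy sketch (illustrative only, not the plugin's actual code; the burst formula matches the Connector line linked later in this thread, while the ~5% Exception slice is just the estimate from the list above):

// Illustrative only: rough three-way split of a 5000-request hourly quota.
int limit = 5000
int burst = Math.max(200, limit.intdiv(5))  // ~20% kept back for bursty use
int exception = limit.intdiv(20)            // ~5% safety margin (approximation)
int normal = limit - burst - exception      // remainder, rationed over the hour
assert burst == 1000 && exception == 250 && normal == 3750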

Further complications arise because the GitHub API Java client library we use does not make it easy to predict how many API calls will be made. For example, if you ask for a list of all the repositories in an organization, that is a paged API, and the client library just returns an Iterator that masks all the background API calls made while you iterate the list. It may even serve the request from cached state, so a single "API call" can translate to anywhere from 0 to an unbounded number of requests (realistically no more than 3-4 for most organizations). It's worse for listing PRs, where you can have many thousands and the page size is typically 50-100.
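
For illustration, a hypothetical Groovy snippet against the github-api client library ("example-org" is made up):

import org.kohsuke.github.GitHub

GitHub gh = GitHub.connect()
int count = 0
// listRepositories() returns a lazily-paged iterable: each new page is
// fetched with a hidden background API call as you iterate, and some
// results may be served from cache, so the number of requests this loop
// consumes cannot be known up front.
for (repo in gh.getOrganization("example-org").listRepositories()) {
    count++
}
println "iterated $count repositories"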

 

So what we do is the following (all numbers are from memory and for illustrative purposes only):

  • We use a linear allocation strategy for the Normal quota, e.g. we allow approx 4000 requests per hour; at 10 minutes into the hour the budget says we should have 4000*(60-10)/60 requests left in the Normal quota, i.e. 3333.
  • Then we add on the Exception quota (approx 150).
  • This gives the budget plan for now, i.e. 3333+150 = 3483 requests should be remaining.
  • If the actual requests remaining is more than this number, the request will be made. If the requests remaining is less than this number, you get the log message you saw. For example, if 3104 requests remain then we have spent 379 requests more than the budget. Since we know the rate at which the Normal quota is divided out, we can sleep until the spend is expected to be back within budget: at 66 requests per minute, an overspend of 379 is back on track if we make no more requests for approx the next 6 minutes (see the sketch after this list).
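
A minimal sketch of that budget arithmetic (illustrative only; the variable names are made up and the real logic lives in the plugin's Connector class):

// Numbers from the example above: 10 minutes into the hour.
int normalPerHour = 4000
int exceptionQuota = 150
int minutesIntoHour = 10
int actualRemaining = 3104

// Linear budget: what should still be unspent right now, plus the reserve.
int budget = (normalPerHour * (60 - minutesIntoHour)).intdiv(60) + exceptionQuota  // 3483
int overspend = budget - actualRemaining                                           // 379
if (overspend > 0) {
    double perMinute = normalPerHour / 60.0                                        // ~66/min
    println "over budget by $overspend; sleep ~${Math.round(overspend / perMinute)} min"
}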

So what has happened is: in the first 10 minutes you already burned through the Burst allocation of 1000 requests. Yes we could burst more, but that just means that probabilistically your builds will be delayed for an hour once the bigger burst is burned through.

The current strategy means that all jobs have an equal chance of getting the "allocation". The larger the burst the less even the spread of those 1.4 requests per second will be and then you run the risk that specific jobs will never win and thus never get CI.

I did a lot of experimenting with different strategies, and this one was the least worst. We could tweak the Burst-to-Normal ratio somewhat, but 1:4 produced fairer results overall while allowing for faster response to webhook notifications.

calvarez@libon.com (JIRA)

Apr 11, 2019, 7:42:03 AM
to jenkinsc...@googlegroups.com

Thanks for the explanations on the strategy.

I would still like an option to disable the check. Maybe the strategy is currently the best possible on average for most cases, but in case it's not well suited to a particular project, it would be nice to be able to skip the check.

stephen.alan.connolly@gmail.com (JIRA)

Apr 11, 2019, 8:50:01 AM
to jenkinsc...@googlegroups.com

I have yet to see an actual project where a larger burst results in better behaviour. I suggest you build a custom version of the plugin, as I predict you will come to the same conclusion once you see the side-effects of a larger burst. I believe you only need to change this line: https://github.com/jenkinsci/github-branch-source-plugin/blob/c3cc6c52992ccdf9f7fd03dc7e56264cfb3f1f26/src/main/java/org/jenkinsci/plugins/github_branch_source/Connector.java#L584 - probably changing `Math.max(200, rateLimit.limit / 5)` to `Math.max(200, rateLimit.limit * 4 / 5)`, which would swap the burst-to-normal split from 1:4 to 4:1.
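
For a 5000-request quota, that one-line change would move the burst allocation like so (sketch only; intdiv mirrors Java's integer division):

int limit = 5000
int currentBurst  = Math.max(200, limit.intdiv(5))       // 1000, i.e. 1:4 burst:normal
int proposedBurst = Math.max(200, (limit * 4).intdiv(5)) // 4000, i.e. 4:1 burst:normal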

nathan@nightsys.net (JIRA)

Jul 11, 2019, 3:24:02 PM
to jenkinsc...@googlegroups.com

> because all your Jenkins instances are - per the GitHub ToS - supposed to be using the same API Key

Before reading this, I had always wondered if we could set up multiple GitHub "bot" accounts with keys to try to parallelize things and work around this issue, but the ToS specifically say "You may not share API tokens to exceed GitHub's rate limitations.", so that idea is out. I understand why they specify that, and it's a valid clause.

Would it be possible to implement some form of log of the quantity of each type of API call to GitHub (maybe by endpoint?) to see which areas of usage are hurting the quotas the most? Are there places where information could be cached to cut down on API call volume?

Or, to rephrase: should <someone> be trying to make the plugin's API call usage leaner and more efficient, as a solution to the root cause?

ingwar@ingwar.eu.org (JIRA)

Oct 8, 2019, 7:11:03 AM
to jenkinsc...@googlegroups.com

I don't think it's a plugin issue.

I found a bug in the GitHub API that sometimes shifts the reset time. I even created a ticket with GitHub, but so far they have done nothing about it.

Here is a script that lets you check for the bug yourself (sometimes it takes a few hours to catch it for us):

import org.kohsuke.github.GHRateLimit
import org.kohsuke.github.GitHub
import org.slf4j.LoggerFactory

def log = LoggerFactory.getLogger(this.class)
def client = GitHub.connect() // here configure your client; connect() reads ~/.github by default
GHRateLimit oldStats = client.getRateLimit()
while (true) {
    GHRateLimit stats = client.getRateLimit()
    // The bug: remaining keeps dropping while the reset date shifts later, instead of the quota resetting
    if (oldStats.remaining < 4800 && stats.remaining < oldStats.remaining && stats.resetDate > oldStats.resetDate) {
        log.error("Rate limits error: ${client.myself.login}\nold: ${oldStats}\nnew: ${stats}")
    }
    oldStats = stats // compare consecutive samples; without this the first reading is used forever
    sleep 10000
}

 

And the results are:

[main] ERROR java.lang.Class - Rate limits error: xx
old: GHRateLimit{remaining=4355, limit=5000, resetDate=Wed Sep 04 00:01:33 CEST 2019}
new: GHRateLimit{remaining=4303, limit=5000, resetDate=Wed Sep 04 00:04:58 CEST 2019}
[main] ERROR java.lang.Class - Rate limits error: xx
old: GHRateLimit{remaining=1258, limit=5000, resetDate=Wed Sep 04 00:04:58 CEST 2019}
new: GHRateLimit{remaining=1205, limit=5000, resetDate=Wed Sep 04 00:16:08 CEST 2019}
[main] ERROR java.lang.Class - Rate limits error: xx
old: GHRateLimit{remaining=3888, limit=5000, resetDate=Wed Sep 04 00:38:51 CEST 2019}
new: GHRateLimit{remaining=3856, limit=5000, resetDate=Wed Sep 04 00:43:13 CEST 2019}

As you can see, the remaining limit was not reset but the reset time was shifted, and during that time our Jenkins just stops waiting for the quota.

 

 


mshade@mshade.org (JIRA)

Oct 29, 2019, 4:44:05 PM
to jenkinsc...@googlegroups.com

> I have yet to see an actual project where a larger burst results in better behaviour.

 

Consider a project (or org) whose API key usage is shared across multiple tools (as the GitHub ToS require). What we're finding is that sometimes, when another tool consumes a burst of requests, Jenkins rate-limits itself to its own detriment. We may never drop below, say, 3k available API requests, yet we are unable to spend them on queued jobs in Jenkins. It's maddening to hit this and not have a ToS-compliant workaround available within the plugin.

Any chance of reconsidering a simple toggle to disable the rate limit function? Forking and separately maintaining the plugin for this purpose seems excessive.

 


bitwiseman@gmail.com (JIRA)

Nov 4, 2019, 10:04:05 PM
to jenkinsc...@googlegroups.com

The 10 hours issue may have been due to a date parsing issue in github-api: https://github.com/github-api/github-api/commit/4802c97e89b0386fe6bb9cac63c97783e314b61d#diff-199201938f94e2f9ab096fe20ec4ab78 

As for disabling the rate limit function, we've already got a PR in the works to provide a simpler implementation (https://github.com/jenkinsci/github-branch-source-plugin/pull/242), but the last bit of testing seems to have stalled out. Helping get this PR over the line would seem to be a much better use of your time than forking.

However, not checking it is not an option.  If you exceed the rate limit, GitHub will make you feel their pain.

 

 

 
