What does "Dry run: Exceeded global retry quota" mean?

Brett Wilson

Jul 1, 2015, 1:22:10 PM
to Chromium-dev
This seems to be the current version of the "Your try run failed" error message, but it doesn't make any sense to me.

Can we replace it with something more clear, like passive-aggressive insults about my (*) coding abilities?

Brett


(*) By "my" I mean the patch author. Please do not blame Brett for every failed try run, tempting though it may be.

Ryan Tseng

Jul 1, 2015, 2:50:19 PM
to Brett Wilson, infr...@chromium.org, Chromium-dev
Do you have a link?

--
--
Chromium Developers mailing list: chromi...@chromium.org
View archives, change email options, or unsubscribe:
http://groups.google.com/a/chromium.org/group/chromium-dev

To unsubscribe from this group and stop receiving emails from it, send an email to chromium-dev...@chromium.org.

Brett Wilson

Jul 1, 2015, 2:58:11 PM
to Ryan Tseng, infr...@chromium.org, Chromium-dev
It's a commit-bot message. You can see 3 such examples on this CL: https://codereview.chromium.org/1220653002
 
Brett

Nodir Turakulov

Jul 1, 2015, 5:20:01 PM
to Brett Wilson, Ryan Tseng, tan...@chromium.org, aku...@chromium.org, infr...@chromium.org, Chromium-dev

--
You received this message because you are subscribed to the Google Groups "infra-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to infra-dev+...@chromium.org.
To post to this group, send email to infr...@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/infra-dev/CABiGVV-UgAkQC9WAGDbz7UqgmRvgcWEiK-j_3Ru4%2BNyB_KfdGg%40mail.gmail.com.

Nico Weber

Jul 1, 2015, 6:19:57 PM
to Nodir Turakulov, Brett Wilson, Ryan Tseng, tan...@chromium.org, aku...@chromium.org, infr...@chromium.org, Chromium-dev
It seems like this just means "something is wrong with your patch, but you need to click through to the bots to see what". On a few CLs where I saw this today, it was caused by the patch not applying on trunk and needing a rebase. One of your red bots was a compile failure.

Primiano Tucci

Jul 1, 2015, 6:49:42 PM
to Nico Weber, Adrian Kuegel, Ryan Tseng, Brett Wilson, Nodir Turakulov, infr...@chromium.org, Chromium-dev, tan...@chromium.org

Paweł Hajdan, Jr.

Jul 2, 2015, 11:24:09 AM
to Primiano Tucci, Nico Weber, Adrian Kuegel, Ryan Tseng, Brett Wilson, Nodir Turakulov, infr...@chromium.org, Chromium-dev, tan...@chromium.org
Sorry about that - I admit it's a poor error message and a bad experience. I have landed a CL to make it better.

Curious, do you mostly look at email for CQ status, or something else? FWIW I usually just visit the CL page in Rietveld and the corresponding CQ status page. Maybe that doesn't reflect the real usage scenarios, so I'd like to understand them better and make improvements. Do you have suggestions for what we could change further?

For a general discussion of retry quotas, see the bug link Primiano provided above. I'd certainly be interested in your thoughts on this and on what matters most for the CQ. Please note that cycle time and false rejection rate are closely related: we can easily trade one for the other, but keeping both low may require more work, e.g. making tests less flaky in general.

Paweł

Brett Wilson

Jul 2, 2015, 11:44:58 AM
to Paweł Hajdan, Jr., Primiano Tucci, Nico Weber, Adrian Kuegel, Ryan Tseng, Nodir Turakulov, infr...@chromium.org, Chromium-dev, tan...@chromium.org
I only look at the CL page.

Brett

Nico Weber

Jul 2, 2015, 5:24:12 PM
to Paweł Hajdan, Jr., Primiano Tucci, Adrian Kuegel, Ryan Tseng, Brett Wilson, Nodir Turakulov, infr...@chromium.org, Chromium-dev, tan...@chromium.org
I mostly look at email and click through to rietveld if I get a "things failed" email. I look at the CQ status page only every now and then, when a CL feels like it takes too long to land and I want to see why.

Dana Jansens

Jul 2, 2015, 7:42:11 PM
to Nico Weber, Paweł Hajdan, Jr., Primiano Tucci, Adrian Kuegel, Ryan Tseng, Brett Wilson, Nodir Turakulov, infr...@chromium.org, Chromium-dev, tan...@chromium.org
This.

Paweł Hajdan, Jr.

Jul 7, 2015, 9:18:06 AM
to Dana Jansens, Nico Weber, Primiano Tucci, Adrian Kuegel, Ryan Tseng, Brett Wilson, Nodir Turakulov, infr...@chromium.org, Chromium-dev, tan...@chromium.org
Alright. Do you have a suggestion for what we could do to improve the workflow?

Paweł

Dana Jansens

Jul 7, 2015, 4:03:47 PM
to Paweł Hajdan, Jr., Nico Weber, Primiano Tucci, Adrian Kuegel, Ryan Tseng, Brett Wilson, Nodir Turakulov, infr...@chromium.org, Chromium-dev, tan...@chromium.org
Why isn't this error just internal CQ state, with the CQ continuing to make progress when it's able to? Why does this error require human visibility/interaction?

Nico Weber

Jul 7, 2015, 4:07:17 PM
to Paweł Hajdan, Jr., Dana Jansens, Primiano Tucci, Adrian Kuegel, Ryan Tseng, Brett Wilson, Nodir Turakulov, infr...@chromium.org, Chromium-dev, tan...@chromium.org
Last week, this message appeared when bots failed for any reason (if a patch didn't apply or whatnot). In these cases "Jobs failed, check your try bots to see what's up" would be a better message.

Adrian Kuegel

Jul 10, 2015, 7:49:10 AM
to Dana Jansens, Paweł Hajdan, Jr., Nico Weber, Primiano Tucci, Ryan Tseng, Brett Wilson, Nodir Turakulov, infr...@chromium.org, Chromium-dev, tan...@chromium.org
Just to clarify: this is not an internal error of the CQ. It is a sign that there might be serious problems with the CL being tested, because it fails on several builders. Unfortunately, it also happens when there is a general serious problem (for example, the tree is broken), and the CQ has difficulty distinguishing the two cases. So unless we can tell them apart, there is no point in "continuing to make progress"; we would just waste resources.
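Adrian's description suggests a retry budget shared across builders: once failures show up on several builders, the CQ stops retrying and reports the run as failed. The Python sketch below models that idea. It is purely hypothetical and is not the CQ's actual code; the class `RetryQuota`, the function `run_cq`, and the limits (`global_limit=2`, `per_builder_limit=1`) are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class RetryQuota:
    """Hypothetical model of CQ retry budgets (assumed limits, not real CQ code)."""
    global_limit: int = 2       # assumed: total retries allowed across all builders
    per_builder_limit: int = 1  # assumed: retries allowed for any single builder
    used_global: int = 0
    used_per_builder: dict = field(default_factory=dict)

    def try_retry(self, builder: str) -> bool:
        """Return True if a failed job on `builder` may still be retried."""
        if self.used_global >= self.global_limit:
            return False  # global quota exceeded: failures on too many builders
        if self.used_per_builder.get(builder, 0) >= self.per_builder_limit:
            return False  # this builder has already used up its own retries
        self.used_global += 1
        self.used_per_builder[builder] = self.used_per_builder.get(builder, 0) + 1
        return True


def run_cq(failures: list, quota: RetryQuota) -> str:
    """Walk failed jobs in order; give up on the first one that cannot be retried."""
    for builder in failures:
        if not quota.try_retry(builder):
            return f"Try jobs failed on {builder}"
    return "retrying"


if __name__ == "__main__":
    # Three different builders fail: the third one exhausts the global quota.
    print(run_cq(["linux", "mac", "win"], RetryQuota()))
```

With these assumed limits, failures on three different builders exhaust the global quota on the third one, which is the situation the "(exceeded global retry quota)" message was describing. Like the CQ as Adrian describes it, the model only counts failures, so it cannot tell a bad CL from a broken tree.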

Nico Weber

Jul 10, 2015, 10:54:51 AM
to Adrian Kuegel, Dana Jansens, Paweł Hajdan, Jr., Primiano Tucci, Ryan Tseng, Brett Wilson, Nodir Turakulov, infr...@chromium.org, Chromium-dev, tan...@chromium.org
As suggested above, maybe a good first step would be to change the error message to be less cryptic?

Brett Wilson

Jul 10, 2015, 11:34:33 AM
to Nico Weber, Adrian Kuegel, Dana Jansens, Paweł Hajdan, Jr., Primiano Tucci, Ryan Tseng, Nodir Turakulov, infr...@chromium.org, Chromium-dev, tan...@chromium.org
Yes, it was obvious when I got this message that my patch failed. But then this message should say "CQ run failed." If it fails sometimes due to other issues, then that's fine, I'm used to things being a little bit flaky. But I've never known what a global retry quota is.

Brett

Brett Wilson

Jul 10, 2015, 3:29:17 PM
to Nico Weber, Adrian Kuegel, Dana Jansens, Paweł Hajdan, Jr., Primiano Tucci, Ryan Tseng, Nodir Turakulov, infr...@chromium.org, Chromium-dev, tan...@chromium.org
Actually, this message should just be deleted. The CQ messages are:

Dry run: Try jobs failed on following builders:
  ios_dbg_simulator_ninja on tryserver.chromium.mac (JOB_FAILED,

(exceeded global retry quota)

The last line there provides no information and is just misleading. It implies to me that my patch failed for reasons *other* than my patch being bad.

Brett

Dirk Pranke

Jul 10, 2015, 5:11:30 PM
to Brett Wilson, Nico Weber, Adrian Kuegel, Dana Jansens, Paweł Hajdan, Jr., Primiano Tucci, Ryan Tseng, Nodir Turakulov, infr...@chromium.org, Chromium-dev, tan...@chromium.org
+1

Paweł Hajdan, Jr.

Jul 17, 2015, 12:22:44 PM
to Dirk Pranke, Brett Wilson, Nico Weber, Adrian Kuegel, Dana Jansens, Primiano Tucci, Ryan Tseng, Nodir Turakulov, infr...@chromium.org, Chromium-dev, tan...@chromium.org
I removed the confusing message. Thanks for the feedback about that.

Dana, I agree CQ could be smarter and do some retries "in the background". It's just not obvious how to cleanly do that in the current design/architecture.

We're moving to a next-generation system based on Dungeon Master (and not depending on buildbot), which will make smarter logic possible.

Paweł