PSA: Cancel your old Try pushes

Ryan VanderMeulen

Apr 15, 2016, 11:47:13 AM

to dev-platform, Firefox Dev

I'm sure most of you have experienced the pain of long backlogs on Try
(Windows in particular). While we'd all love to have larger pools of test
machines (and our Ops people are actively working on improving that!), one
often-overlooked thing people can do to help with the backlog Right Now is
to cancel pending jobs on pushes they no longer need (i.e. newer push to
Try, broken patch, already pushed to inbound, etc).

Treeherder makes it easy to do this - just hit the little circle with an X
icon on the right hand side adjacent to the "XX% - Y in progress" text
along the top bar of the push. You will be prompted whether you really want
to cancel all jobs on the push. Just hit OK and you're done.

Killing off unnecessary jobs can have a significant impact on wait times
and backlog, so your consideration is greatly appreciated!

Thanks,
Ryan

Tim Guan-tin Chien

Apr 15, 2016, 1:10:01 PM

to Ryan VanderMeulen, dev-platform, Firefox Dev

I wonder if there are any use cases for doing multiple Try pushes of different
changesets but with the same bug number. Should we automatically cancel the
old ones when there is a new one?


James Graham

Apr 15, 2016, 1:20:20 PM

to dev-pl...@lists.mozilla.org

On 15/04/16 18:09, Tim Guan-tin Chien wrote:
> I wonder if there are any use cases for doing multiple Try pushes of different
> changesets but with the same bug number. Should we automatically cancel the
> old ones when there is a new one?

Unfortunately there are legitimate uses for e.g. comparing the effects
of two different changesets related to the same bug.

On the other hand, without thinking too hard about the implementation
details (which I am inclined to believe would be more complex than you
might expect due to missing APIs, auth, etc.), it seems like it might be
possible to extend |mach try| to prompt to cancel old pushes for the
same bug.

Jonas Sicking

Apr 15, 2016, 3:37:04 PM

to James Graham, dev-platform

We could also make the default behavior be to cancel old pushes. And
then enable push message syntax for opting in to not cancelling.

/ Jonas


Jim Blandy

Apr 15, 2016, 3:45:05 PM

to Jonas Sicking, James Graham, dev-platform

On Fri, Apr 15, 2016 at 12:36 PM, Jonas Sicking <jo...@sicking.cc> wrote:

> We could also make the default behavior be to cancel old pushes. And
> then enable push message syntax for opting in to not cancelling.
>
>

This could be very frustrating (and cause farm work to be wasted) if it
happened accidentally.

Perhaps it would be less error-prone to require an explicit choice of
overlapping or cancellation, and immediately reject pushes that haven't
chosen one or the other, for bugs that already have running try pushes.
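
As a rough illustration of this explicit-choice rule (not an existing
hook), the decision could look like the sketch below. The flag names
--overlap and --cancel-previous and the check_push function are
hypothetical, invented only for this example.

```python
from enum import Enum


class Decision(Enum):
    ACCEPT = "accept"
    ACCEPT_AND_CANCEL = "accept and cancel older pushes"
    REJECT = "reject"


# Hypothetical flag names; no such try syntax is defined in this thread.
OVERLAP_FLAG = "--overlap"
CANCEL_FLAG = "--cancel-previous"


def check_push(try_message, running_pushes_for_bug):
    """Decide what to do with a new Try push for a bug.

    try_message: the try syntax string from the commit message.
    running_pushes_for_bug: revisions with unfinished jobs for the same bug.
    """
    if not running_pushes_for_bug:
        # Nothing to overlap with, so no explicit choice is required.
        return Decision.ACCEPT
    tokens = try_message.split()
    wants_overlap = OVERLAP_FLAG in tokens
    wants_cancel = CANCEL_FLAG in tokens
    if wants_overlap == wants_cancel:
        # Neither flag or both flags: the intent is ambiguous, so reject.
        return Decision.REJECT
    return Decision.ACCEPT_AND_CANCEL if wants_cancel else Decision.ACCEPT
```

Rejecting up front, instead of guessing, is what keeps an accidental
cancellation from wasting work already done by the farm.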

Mike Connor

Apr 15, 2016, 3:46:32 PM

to Ryan VanderMeulen, dev-platform, Firefox Dev

If this is a serious problem, and I can easily believe that it is, have we
considered having a default behaviour of cancelling all unfinished Try jobs
running for a given user when they push again? Based on how I've seen
people use Try over the years, I suspect a significant majority of pushes
are updated versions of previous pushes.

For cases where a developer needs to run multiple runs at once, we can add
an override in the trychooser syntax. I think that's a corner case, and I
don't think it would be a major burden vs. the cost/benefit for everyone
else.

For the Treeherder path, I think we wouldn't need to auto-cancel, but we would
prompt when the user adds jobs from the web interface.

-- Mike


Gijs Kruitbosch

Apr 15, 2016, 3:56:54 PM

to Mike Connor, Firefox Dev

On 15/04/2016 20:46, Mike Connor wrote:
> For cases where a developer needs to run multiple runs at once, we can add
> an override in the trychooser syntax. I think that's a corner case

It isn't when you do any kind of talos "need to fix / not regress perf"
work.

I agree with Jim that we should either force the user to choose or
"only" warn when duplicate-bug trypushes happen.

~ Gijs

Steve Fink

Apr 15, 2016, 3:57:02 PM

to dev-pl...@lists.mozilla.org

See bug 593096. (Hah! I anticipated you guys by almost 700,000 bugs!)
It's currently WONTFIX, but there's some relevant discussion in there.

On 04/15/2016 12:46 PM, Mike Connor wrote:
> If this is a serious problem, and I can easily believe that it is, have we
> considered having a default behaviour of cancelling all unfinished Try jobs
> running for a given user when they push again? Based on how I've seen
> people use Try over the years, I suspect a significant majority of pushes
> are updated versions of previous pushes.

Boris Zbarsky

Apr 15, 2016, 4:02:47 PM

On 4/15/16 1:09 PM, Tim Guan-tin Chien wrote:
> I wonder if there are any use cases for doing multiple Try pushes of different
> changesets but with the same bug number.

A significant fraction of my try pushes are like this, actually.
Typically this happens when I do a try push of a multi-changeset queue
for a bug, get some mysterious failures, then do several pushes of more
and more of the queue in an attempt to narrow down which changeset is
causing the failures.

> Should we automatically cancel the old ones when there is a new one?

If we set this up, I would probably just add whatever the "do not
cancel" override is to my try syntax for everything, to avoid the
inevitable footguns. Unless there were a way to uncancel.

-Boris

Kartikaya Gupta

Apr 15, 2016, 4:48:52 PM

to Boris Zbarsky, dev-platform

I also often have multiple pushes going at the same time. My
suggestion to solve this problem is: have a cron job that detects
users who have more than N pushes with jobs still going, and send them
an email saying "you have a lot of jobs going, here's the list; you
might find something you should cancel in there".

Personally, I do keep an eye on my pushes and try to cancel stuff as
it becomes unnecessary, but sometimes there might be a few pending
jobs on very old pushes that I've neglected to cancel, and will likely
not push with that bug number anymore. An email reminder that lists
those old pushes would make me realize I can cancel them.
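
A minimal sketch of what such a cron job might compute, assuming push
data is already available as plain dicts. The keys "author", "revision"
and "unfinished_jobs" are hypothetical, as are both function names;
fetching the data and sending the mail are left out.

```python
from collections import defaultdict

# Revision URL pattern as it appears in Try push output.
TREEHERDER_URL = "https://treeherder.mozilla.org/#/jobs?repo=try&revision="


def users_to_remind(pushes, max_active):
    """Return {author: [revision, ...]} for authors with more than
    max_active pushes that still have unfinished jobs.

    pushes: iterable of dicts with the hypothetical keys "author",
    "revision" and "unfinished_jobs" (an int).
    """
    active = defaultdict(list)
    for push in pushes:
        if push["unfinished_jobs"] > 0:
            active[push["author"]].append(push["revision"])
    return {a: revs for a, revs in active.items() if len(revs) > max_active}


def reminder_text(revisions):
    """Format the body of the reminder email for one author."""
    lines = [
        "You have a lot of jobs going, here's the list; you might "
        "find something you should cancel in there:",
        "",
    ]
    lines += ["  " + TREEHERDER_URL + rev for rev in revisions]
    return "\n".join(lines)
```

The threshold N is passed in as max_active so it can be tuned without
touching the logic.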

kats

Brian Grinstead

Apr 15, 2016, 5:07:56 PM

to Jim Blandy, James Graham, dev-platform, Jonas Sicking

Explicit choice sounds good. I'd rather it not be required before pushing if it were a prompt. If it were a try syntax option I would likely set "do not cancel" as a default to prevent accidental cancellation.

My proposal: enhance mach try to surface this information and allow convenient cancellation. And if it were pushed using some other manner like a web ui or hg push then the default behavior would remain as it is today (to prevent losing work by default). So something like this:

$ ./mach try args

remote: adding changesets
remote: adding manifests
remote: adding file changes
remote: added 2 changesets with 1 changes to 2 files (+1 heads)
remote: recorded push in pushlog
remote:
remote: View your changes here:
remote: https://hg.mozilla.org/try/rev/REV1
remote: https://hg.mozilla.org/try/rev/REV2
remote:
remote: Follow the progress of your build on Treeherder:
remote: https://treeherder.mozilla.org/#/jobs?repo=try&revision=REV2
remote: recorded changegroup in replication log in 0.093s

Please help make Try faster by canceling old jobs. You have two existing builds for this bug:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=REV3 (50% complete)
https://treeherder.mozilla.org/#/jobs?repo=try&revision=REV4 (90% complete)

Would you like to cancel these jobs? (Y/N)
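
The prompt part of that mock-up could be sketched roughly as follows.
This is not existing mach code: format_prompt and revisions_to_cancel
are hypothetical names, and looking up the older pushes and actually
cancelling them are assumed to happen elsewhere.

```python
TREEHERDER_URL = "https://treeherder.mozilla.org/#/jobs?repo=try&revision={rev}"


def format_prompt(existing):
    """Build the message shown after a push.

    existing: list of (revision, percent_complete) tuples for older
    pushes on the same bug. Returns "" when there is nothing to cancel.
    """
    if not existing:
        return ""
    lines = [
        "Please help make Try faster by canceling old jobs. "
        "You have {} existing builds for this bug:".format(len(existing)),
        "",
    ]
    for rev, percent in existing:
        url = TREEHERDER_URL.format(rev=rev)
        lines.append("{} ({}% complete)".format(url, percent))
    lines += ["", "Would you like to cancel these jobs? (Y/N) "]
    return "\n".join(lines)


def revisions_to_cancel(existing, ask=input):
    """Return the revisions the user agreed to cancel (all or none).

    ask is injectable so the prompt can be exercised without a terminal.
    """
    prompt = format_prompt(existing)
    if not prompt:
        return []
    answer = ask(prompt).strip().lower()
    return [rev for rev, _ in existing] if answer in ("y", "yes") else []
```

Anything other than an explicit yes leaves the older pushes alone, which
matches the goal of not losing work by default.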

Brian

Xidorn Quan

Apr 15, 2016, 8:11:55 PM

to Ryan VanderMeulen, dev-platform, Firefox Dev

On Sat, Apr 16, 2016 at 1:47 AM, Ryan VanderMeulen <rvande...@mozilla.com> wrote:

> I'm sure most of you have experienced the pain of long backlogs on Try
> (Windows in particular). While we'd all love to have larger pools of test
> machines (and our Ops people are actively working on improving that!), one
> often-overlooked thing people can do to help with the backlog Right Now is
> to cancel pending jobs on pushes they no longer need (i.e. newer push to
> Try, broken patch, already pushed to inbound, etc).
>
> Treeherder makes it easy to do this - just hit the little circle with an X
> icon on the right hand side adjacent to the "XX% - Y in progress" text
> along the top bar of the push. You will be prompted whether you really want
> to cancel all jobs on the push. Just hit OK and you're done.
>
> Killing off unnecessary jobs can have a significant impact on wait times
> and backlog, so your consideration is greatly appreciated!
>

Could we provide an additional banner at the top of Treeherder that shows
all of one's Try pushes that are still in progress? I suppose it would make
it easier for people to find and switch between their own Try pushes, and
also make it more convenient to cancel old pushes they no longer need,
without adding much annoyance.

- Xidorn

Steve Fink

Apr 15, 2016, 8:24:11 PM

to dev-pl...@lists.mozilla.org

Doesn't everyone keep a tab open to their try page? eg I have
https://treeherder.mozilla.org/#/jobs?repo=try&author=sf...@mozilla.com
open all the time. I used to use the try emails to find my previous
pushes, which was a PITA. But that page is really very nice, and
provides an easy way to cancel pushes too.

Though I'd kind of like to be able to click on something to remove one
of those from the view, to make it easier to keep track of what's still
relevant to me.

Xidorn Quan

Apr 15, 2016, 8:33:31 PM

to Steve Fink, dev-pl...@lists.mozilla.org

No, I don't. I open a separate page for each try push, and put it in the
subtree of the tab for its corresponding bug.

A page showing all data for all my try pushes could be too long to be
useful. I think a brief summary of in-progress pushes, serving as a
light warning, would be best.

- Xidorn

KWierso

Apr 15, 2016, 9:16:58 PM

There's a Treeherder bug on file for providing a view like that.

Gijs Kruitbosch

Apr 16, 2016, 5:50:32 AM

On 16/04/2016 01:24, Steve Fink wrote:
> Doesn't everyone keep a tab open to their try page? eg I have
> https://treeherder.mozilla.org/#/jobs?repo=try&author=sf...@mozilla.com
> open all the time.

No. Treeherder is too resource-intensive to keep open for long periods
of time. I tend to see multi-second pauses on my regular browser (beta)
on Windows. No idea what it does, but it's not good.

~ Gijs

L. David Baron

Apr 16, 2016, 1:12:19 PM

to Gijs Kruitbosch, dev-pl...@lists.mozilla.org

Same for me. (Especially when interacting with
https://bugzilla.mozilla.org/show_bug.cgi?id=1229654 , although I've
had a smaller session the last few months.)

So I generally keep track of try pushes in email (using a
combination of the email try server sends, and an email that my
push-to-try alias sends to myself with headers that cause the two to
thread together), and only open the treeherder windows briefly when
I think to look at them.

On another note: I think we have two separate problems:
(a) some people use Try too much (trigger too many builds)
(b) some people don't use Try enough (break the tree)
There are probably a decent number of people in neither group.
There are probably very few if any people in both groups.

The two groups should get different advice about how to use try.
Advice that is important for one group may be counterproductive for
the other group, since people may not have a great idea of which
group (if either) that they're in.

We should instead use data to target advice on use of try to the
correct people, and also use that data to allow people to see where
they fit in in terms of ratios of Try resource usage to pushes, and
breaking the tree to pushes.
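
As a sketch of the two ratios meant here, assuming per-developer
counters were available: the "try_hours", "pushes" and "backouts" keys
and the function name are hypothetical, and no such data source is
implied to exist.

```python
def try_usage_ratios(stats):
    """Compute per-developer ratios from hypothetical counters.

    stats: {developer: {"try_hours": float, "pushes": int, "backouts": int}}
    where "pushes" counts landings on an integration branch and
    "backouts" counts landings that broke the tree.
    """
    ratios = {}
    for dev, s in stats.items():
        if s["pushes"] == 0:
            continue  # no landings, so neither ratio is defined
        ratios[dev] = {
            "try_hours_per_push": s["try_hours"] / s["pushes"],
            "breakage_rate": s["backouts"] / s["pushes"],
        }
    return ratios
```

A high first ratio suggests group (a); a high second ratio suggests
group (b).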

-David

--
𝄞 L. David Baron http://dbaron.org/ 𝄂
𝄢 Mozilla https://www.mozilla.org/ 𝄂
Before I built a wall I'd ask to know
What I was walling in or walling out,
And to whom I was like to give offense.
- Robert Frost, Mending Wall (1914)


Nicholas Nethercote

Apr 17, 2016, 12:38:04 AM

to L. David Baron, dev-platform, Gijs Kruitbosch

On Sun, Apr 17, 2016 at 3:11 AM, L. David Baron <dba...@dbaron.org> wrote:
>
> We should instead use data to target advice on use of try to the
> correct people, and also use that data to allow people to see where
> they fit in in terms of ratios of Try resource usage to pushes, and
> breaking the tree to pushes.

We still have https://secure.pub.build.mozilla.org/builddata/reports/reportor/daily/highscores/highscores.html
which indicates who is using try resources the most. Slightly more
detail, such as percentages, could be useful.

We have no viewable data that I know of about who breaks the tree. I
personally would love to know where I stand in relation to others on
that front, so that I can adjust my habits if my rate is relatively
high :)

I use T-pushes on try a lot, where I build on all platforms (debug and
opt) but run tests only on one platform (debug and opt), usually
Linux64. I.e.: "try: -b do -p all -u none[x64] -t none[x64]".

- Building on all platforms is a good idea because it's easy to
introduce compile errors on only one platform. Likewise for building
both on debug and opt.

- Linux64 is a good choice for tests because they run on AWS and so
they rarely get backlogged, and they are fast, and you also get ASAN
and Valgrind and static analysis coverage.

- But testing only on one platform saves a lot of machine time because
tests take a lot longer than builds.

Nick

Steve Fink

Apr 17, 2016, 1:55:10 PM

to dev-pl...@lists.mozilla.org

That's a very good point about keeping the treeherder page open eating
resources. I guess I've found that (1) my personal try push page takes
much, much longer to get overloaded than eg the inbound page (or worse,
the *full* try page); and (2) Firefox is so flaky on me for other
reasons that I end up having to restart it fairly often anyway. (And no,
it's not because of that page -- half the time I need to restart, that
tab isn't even loaded.)

Generally speaking, Firefox's stability has not been good for me for 2-3
months. I'd like to file a bug, but I've already used up my quota of
unactionable bugs, and if I dug into all of my idiosyncratic issues I'd
never get any work done. I seem to do things differently, in some
problematic way. I wish I could get more use out of the profiler &
cleopatra, but when I'm having issues it often can't manage to load that
page successfully. Plus, for some reason I am mentally incapable of
getting useful (actionable) data out of that UI (or the necessary data
isn't there? not sure.) PEBKAC, I'm sure.

On 04/16/2016 08:32 PM, Nicholas Nethercote wrote:
> I use T-pushes on try a lot, where I build on all platforms (debug and
> opt) but run tests only on one platform (debug and opt), usually
> Linux64. I.e.: "try: -b do -p all -u none[x64] -t none[x64]".

I think you meant "try: -b do -p all -u all[x64] -t all[x64]".
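
For anyone who wants to script this, a tiny helper that assembles the
corrected T-push string might look like the sketch below. The function
and parameter names are made up for illustration; only the resulting
string comes from this thread.

```python
def t_push_syntax(test_platform="x64", builds="do", platforms="all"):
    """Build the try syntax for a T-push: build on all platforms,
    run unit tests and talos only on test_platform."""
    return "try: -b {b} -p {p} -u all[{t}] -t all[{t}]".format(
        b=builds, p=platforms, t=test_platform
    )


print(t_push_syntax())
```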

Mike Conley

Apr 17, 2016, 5:56:19 PM

to Gijs Kruitbosch, Mozilla dev-platform mailing list

If I had to guess, I'd say that it's just consuming more and more memory as
more and more nodes are getting added to the DOM, and more AngularJS stuff
gets instantiated.

I would put good money on those multi-second pauses being attempts to GC /
CC

Mike Conley

Apr 17, 2016, 6:02:12 PM

to Steve Fink, Mozilla dev-platform mailing list

> Generally speaking, Firefox's stability has not been good for me for 2-3
> months. I'd like to file a bug, but I've already used up my quota of
> unactionable bugs, and if I dug into all of my idiosyncratic issues I'd
> never get any work done. I seem to do things differently, in some
> problematic way. I wish I could get more use out of the profiler &
> cleopatra, but when I'm having issues it often can't manage to load that
> page successfully. Plus, for some reason I am mentally incapable of
> getting useful (actionable) data out of that UI (or the necessary data
> isn't there? not sure.) PEBKAC, I'm sure.

(Sorry for the dupe, sfink, since the first version only went to you)

This kind of sentiment concerns me a bit. Oftentimes, people will report
odd or undesirable behaviour from Firefox and we have to do painstaking
(and sometimes lossy) diagnosis over Bugzilla. When it's somebody within
the organization, I really don't think there's an excuse for us not finding
some kind of explanation for what's going on in a situation like this - like,
we have (sorta[1]) access to the machine. I feel like that should be a
golden opportunity.

If we can't keep people in the org satisfied with the performance and
stability of Firefox, then I think that's a problem.

sfink - it's not clear to me yet what exact issues you've been experiencing
for the past 2-3 months, but I can try to help you dig into it, bit by bit
- I will also happily try to extract useful information from any Profiles
you're able to gather (the fact that you sometimes have difficulty
gathering them is concerning). Feel free to reach out to me on IRC.

[1]: Since I believe sfink is remote, but in the Bay Area.

Kartikaya Gupta

Apr 17, 2016, 8:42:24 PM

to Steve Fink, dev-platform

On Apr 17, 2016 1:55 PM, "Steve Fink" <sf...@mozilla.com> wrote:
>
> Generally speaking, Firefox's stability has not been good for me for 2-3
> months. I'd like to file a bug, but I've already used up my quota of
> unactionable bugs, and if I dug into all of my idiosyncratic issues I'd
> never get any work done.

Also (and I don't mean to single out sfink; I've heard similar things from
other people, and have been guilty of this myself) - filing bugs and
digging into issues *is* work, so saying "I'd never get any work done"
really means "I won't be able to get around to what I'm supposed to be
doing instead".

My view is that if we have a lot of bugs and regressions, time spent
investigating and fixing those naturally acts as a backflow to new feature
work, which prevents the introduction of even more bugs and regressions. So
really time spent investigating these issues is good in that it encourages
a self-correcting cycle, rather than just adding more regressions and work
for everybody that will never get done.

Of course, the tradeoff is that we have this goals system where you have to
state up front what you are going to do in a quarter and that makes it
harder to justify spending time on these other issues that often take a lot
of time as you poke around in unfamiliar code. Also falling behind on
features means we fall behind other browsers from a user perspective. My
interpretation of the push on quality, though, is that this is the correct
tradeoff to make for us at this time, because great bug-free experiences
are more useful to us right now than a pile of half-baked features.

Cheers,
kats

Gijs Kruitbosch

Apr 18, 2016, 4:55:31 AM

to Mike Conley, Steve Fink, Andrew Overholt

On 17/04/2016 22:54, Mike Conley wrote:
>> Generally speaking, Firefox's stability has not been good for me for 2-3
>> months. I'd like to file a bug, but I've already used up my quota of
>> unactionable bugs, and if I dug into all of my idiosyncratic issues I'd
>> never get any work done. I seem to do things differently, in some
>> problematic way. I wish I could get more use out of the profiler &
>> cleopatra, but when I'm having issues it often can't manage to load that
>> page successfully. Plus, for some reason I am mentally incapable of
>> getting useful (actionable) data out of that UI (or the necessary data
>> isn't there? not sure.) PEBKAC, I'm sure.
>
> (Sorry for the dupe, sfink, since the first version only went to you)
>
> This kind of sentiment concerns me a bit. Oftentimes, people will report
> odd or undesirable behaviour from Firefox and we have to do painstaking
> (and sometimes lossy) diagnosis over Bugzilla. When it's somebody within
> the organization, I really don't think there's an excuse for us not finding
> some kind of explanation for what's going on in a situation like this - like,
> we have (sorta[1]) access to the machine. I feel like that should be a
> golden opportunity.
>
> If we can't keep people in the org satisfied with the performance and
> stability of Firefox, then I think that's a problem.

I'll bite on a somewhat related issue to sfink's original point:
reasonably often, perf bugs get filed that reach a stalling point
sometime after we have a cleopatra profile and before we know what's
going on. Sometimes I can't reproduce (
https://bugzilla.mozilla.org/show_bug.cgi?id=1205110 ) and sometimes I
can ( https://bugzilla.mozilla.org/show_bug.cgi?id=1225900 ).

Unfortunately, usually (not always) I'm in the same place as sfink in
that I struggle to find something specific in the profiler to point at
and use to triage the bug, irrespective of whether the profile is mine
or provided by someone else. Some folks on the QA team ping me about
bugs every now and then, I take a look, and if I get stuck, too, I'm not
really sure where to go with them.

So the more general question I have here is: what to do with bugs that
get "stuck" like this? Obviously it doesn't scale to just point mconley
to all these bugs (awesome though he may be), or to do the same with
:overholt (though I notice that the second bug has since gotten at least
slightly more traction through his efforts - thanks!). Web perf bugs
don't seem to have a clear home before we triage what, specifically, is
causing the issue, which is often difficult on production sites with
minified JS, A/B testing, region-specific ads, etc. Am I just missing
something? Is there someone / a team / a flag that takes point on stuff
like this?

~ Gijs

Patrick McManus

Apr 18, 2016, 10:49:26 AM

to Jonas Sicking, James Graham, dev-pl...@lists.mozilla.org

The default should probably be to fail the push rather than auto-cancel. But +1 to
opting into parallel pushes explicitly. I've certainly used that on a few
occasions.

But the PSA here may be the most important part.

On Apr 15, 2016 3:37 PM, "Jonas Sicking" <jo...@sicking.cc> wrote:

We could also make the default behavior be to cancel old pushes. And
then enable push message syntax for opting in to not cancelling.

/ Jonas

On Fri, Apr 15, 2016 at 10:19 AM, James Graham <ja...@hoppipolla.co.uk>
wrote:

> On 15/04/16 18:09, Tim Guan-tin Chien wrote:
>>
>> I wonder if there are any use cases for doing multiple Try pushes of
>> different changesets but with the same bug number. Should we
>> automatically cancel the old ones when there is a new one?
>
> Unfortunately there are legitimate uses for e.g. comparing the effects of
> two different changesets related to the same bug.
>
> On the other hand, without thinking too hard about the implementation
> details (which I am inclined to believe would be more complex than you
> might expect due to missing APIs, auth, etc.), it seems like it might be
> possible to extend |mach try| to prompt to cancel old pushes for the same
> bug.

William Lachance

Apr 18, 2016, 2:46:54 PM

Treeherder did trigger a rather large memory leak which got fixed in the
browser a while back (Dec 2015), so please consider revisiting it if you
gave up around then:

https://bugzilla.mozilla.org/show_bug.cgi?id=1223445

(I've also since fixed the offending animation code that was triggering
some other problems)

It's possible we're just doing something dumb (it's been a while since I
last checked), but the bottom line is that treeherder is loading and
rendering a fair bit of data when it comes to presenting the jobs list [1].
This part of the UI *doesn't* use AngularJS (at least when it comes to
DOM modification), so I don't think that's the culprit if this problem
is still there.

If anyone feels like profiling and submitting patches, we'd welcome the
help. :) Getting a UI-only development environment going is trivial:
http://treeherder.readthedocs.org/ui/installation.html#installation

Will

[1] This jobs representation gets loaded and inserted into the DOM every
time you load a push (so 5x on the default landing page):
https://treeherder.mozilla.org/api/project/mozilla-inbound/jobs/?count=2000&result_set_id=30202&return_type=list

William Lachance

Apr 18, 2016, 2:54:30 PM

On 2016-04-18 2:46 PM, William Lachance wrote:
>
> Treeherder did trigger a rather large memory leak which got fixed in the
> browser a while back (Dec 2015), so please consider revisiting it if you
> gave up around then:

> ...

> If anyone feels like profiling and submitting patches, we'd welcome the
> help. :) Getting a UI-only development environment going is trivial:
> http://treeherder.readthedocs.org/ui/installation.html#installation

Just realized that this last comment might make it seem like I'm passing
the buck, which wasn't my intention. :) I do have a lot of other things
to do, but if you continue to have problems using treeherder due to
memory leaks or whatever feel free to file a bug against treeherder and
needinfo me; I promise to give it my attention.

Will

Tim Guan-tin Chien

Apr 21, 2016, 7:32:31 PM

to dev-platform

Any conclusions out of the discussion here? Try is getting slower as we
speak...

I would opt for a less disruptive way at first, per what Brian Grinstead said.
We don't even have to implement the interactive prompt first. If that makes
people cancel old Try runs more, great; if not, we could consider other
workflow-breaking solutions.

I've filed https://bugzilla.mozilla.org/show_bug.cgi?id=1266602 for this.


Jan Beich

Apr 25, 2016, 9:57:23 AM

to Ryan VanderMeulen, dev-platform, Firefox Dev

Ryan VanderMeulen <rvandermeulen-4eJt...@public.gmane.org> writes:

> Treeherder makes it easy to do this - just hit the little circle with an X
> icon on the right hand side adjacent to the "XX% - Y in progress" text
> along the top bar of the push. You will be prompted whether you really want
> to cancel all jobs on the push. Just hit OK and you're done.

Try usage requires level 1 access, but cancelling one's own jobs via
Treeherder requires at least level 3. This may only matter if the ratio of
level 1 to level 3 pushes is high.

What's the advice for such contributors? Skimp on testing for re-Try pushes?


James Graham

Apr 26, 2016, 9:02:27 AM

to dev-pl...@lists.mozilla.org, firef...@mozilla.org

On 15/04/16 16:47, Ryan VanderMeulen wrote:
> I'm sure most of you have experienced the pain of long backlogs on Try
> (Windows in particular). While we'd all love to have larger pools of test
> machines (and our Ops people are actively working on improving that!), one
> often-overlooked thing people can do to help with the backlog Right Now is
> to cancel pending jobs on pushes they no longer need (i.e. newer push to
> Try, broken patch, already pushed to inbound, etc).

Based on a conversation yesterday, it seems that the features of |mach
try| are not well known. In particular it allows running only a subset
of tests in cases where you are doing an experimental push that you
expect to affect mainly one area of the code. For example:

mach try -b do -p linux64 dom

would run every test under dom/ on linux64 only. The other command line
arguments work like trychooser syntax. For technical reasons the
resulting tests will all be run in a single chunk (hopefully TaskCluster
will eventually allow this limitation to be lifted).

Gabor Krizsanits

Apr 26, 2016, 9:49:18 AM

to Ryan VanderMeulen, dev-platform, Firefox Dev

As someone who was high on the list of try server usage for two weeks....
My problem was a test I tried to fix for both e10s and non-e10s, and it
timed out _sometimes_ on _some_ platforms even depending on debug/release
build. It was a whack-a-mole game by fiddling with the test and a complex
patch. I did stop old builds but I did not run only the test in question
but the rest of them as well because of the invasive nature of the patch
the whole thing was sitting on. Probably I could have been smarter, BUT...

What would have helped me a lot in this case and most cases when I rely on
the try server is the ability to push a new changeset on top of my previous
one, and tell the server to use the previous session instead of a full
rebuild (if there is only a change in the tests that's even better, no
rebuild at all) and then tell the server exactly which tests I want to
re-run with those changes (as it can be an empty set this can be used to
trigger additional tests for a previous push). This could all be done by an
extension to the try syntax like -continue [hash]. In addition, this
follow-up push would also kill the previous job.

Maybe there is already such functionality available, just I'm not aware of
it (I would be so happy if this were the case, and would feel bad for the
machine hours I wasted...), if so please let me know.

- Gabor

On Fri, Apr 15, 2016 at 5:47 PM, Ryan VanderMeulen <
rvande...@mozilla.com> wrote:

> I'm sure most of you have experienced the pain of long backlogs on Try
> (Windows in particular). While we'd all love to have larger pools of test
> machines (and our Ops people are actively working on improving that!), one
> often-overlooked thing people can do to help with the backlog Right Now is
> to cancel pending jobs on pushes they no longer need (i.e. newer push to
> Try, broken patch, already pushed to inbound, etc).
>

> Treeherder makes it easy to do this - just hit the little circle with an X
> icon on the right hand side adjacent to the "XX% - Y in progress" text
> along the top bar of the push. You will be prompted whether you really want
> to cancel all jobs on the push. Just hit OK and you're done.
>

> Killing off unnecessary jobs can have a significant impact on wait times
> and backlog, so your consideration is greatly appreciated!
>

> Thanks,
> Ryan

Gijs Kruitbosch

Apr 26, 2016, 4:04:33 PM

to James Graham, firef...@mozilla.org

On 26/04/2016 14:01, James Graham wrote:
> Based on a conversation yesterday, it seems that the features of |mach
> try| are not well known. In particular it allows running only a subset
> of tests in cases where you are doing an experimental push that you
> expect to affect mainly one area of the code. For example:
>
> mach try -b do -p linux64 dom

What is the equivalent try syntax and how do I generate it from/in/with
mozreview, which is generally how I push to try these days?

~ Gijs

Mike Hommey

Apr 26, 2016, 5:54:44 PM

to Gabor Krizsanits, arm...@mozilla.com, Ryan VanderMeulen, dev-platform, Firefox Dev

On Tue, Apr 26, 2016 at 03:49:11PM +0200, Gabor Krizsanits wrote:
> As someone who was high on the list of try server usage for two
> weeks.... My problem was a test I tried to fix for both e10s and
> non-e10s, and it timed out _sometimes_ on _some_ platforms even
> depending on debug/release build. It was a whack-a-mole game by
> fiddling with the test and a complex patch. I did stop old builds but
> I did not run only the test in question but the rest of them as well
> because of the invasive nature of the patch the whole thing was
> sitting on. Probably I could have been smarter, BUT...
>
> What would have helped me a lot in this case and most cases when I
> rely on the try server is the ability to push a new changeset on top
> of my previous one, and tell the server to use the previous session
> instead of a full rebuild (if there is only a change in the tests
> that's even better, no rebuild at all) and then tell the server
> exactly which tests I want to re-run with those changes (as it can be
> an empty set this can be used to trigger additional tests for a
> previous push). This could all be done by an extension to the try
> syntax like -continue [hash]. In addition, this follow-up push would
> also kill the previous job.
>
> Maybe there is already such functionality available, just I'm not
> aware of it (I would be so happy if this were the case, and would feel
> bad for the machine hours I wasted...), if so please let me know.

You can do that more or less with moz-ci. IIRC, the setup is detailed
somewhere on Armen's blog (CCed, he might be able to point you there)

Mike

Kartikaya Gupta

Apr 27, 2016, 1:53:32 AM

to Gijs Kruitbosch, dev-platform

Running that mach try command with the additional --no-push argument
produces this mouthful:

try: -b do -p linux64 -u
crashtest,crashtest-e10s,mochitest-1,mochitest-browser-chrome-1,mochitest-e10s-1,mochitest-e10s-browser-chrome-1,mochitest-o,reftest,reftest-e10s,xpcshell
-t none --try-test-paths browser-chrome:dom chrome:dom crashtest:dom
mochitest:dom reftest:dom xpcshell:dom
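
To make that string easier to read, here is a small sketch that pulls
the flavor:path pairs out of the --try-test-paths section. It assumes
the whitespace-separated "flavor:path" layout seen above; the function
name is hypothetical and this is not how the try parser itself works.

```python
def parse_test_paths(try_syntax):
    """Extract {flavor: [path, ...]} from the --try-test-paths part
    of a try syntax string. Returns {} if the option is absent."""
    tokens = try_syntax.split()
    if "--try-test-paths" not in tokens:
        return {}
    start = tokens.index("--try-test-paths") + 1
    result = {}
    for token in tokens[start:]:
        if token.startswith("-"):
            break  # the next option begins here
        flavor, _, path = token.partition(":")
        result.setdefault(flavor, []).append(path)
    return result
```

For the string above this yields one entry per test flavor, each
pointing at the dom directory.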

Armen Zambrano G.

Apr 28, 2016, 9:24:49 AM

to Mike Hommey, Gabor Krizsanits, Ryan VanderMeulen, dev-platform, Firefox Dev

It is possible for Buildbot jobs (not TaskCluster) with a Python script;
you need to specify where the builds and test bundles are [1].

However, this is not optimal as it is.
You need to upload the new test bundle somewhere and point to that.

I've filed this as https://bugzilla.mozilla.org/show_bug.cgi?id=1268481
and am tracking it under the "making try awesome" meta bug.

[1]
https://github.com/mozilla/mozilla_ci_tools/blob/master/scripts/trigger.py#L87

--
Zambrano Gasparnian, Armen
Automation & Tools Engineer
http://armenzg.blogspot.ca

Armen Zambrano G.

Apr 28, 2016, 9:30:09 AM

to Mike Hommey, Gabor Krizsanits, Ryan VanderMeulen, dev-platform, Firefox Dev

On 2016-04-26 05:54 PM, Mike Hommey wrote: