Happily some of these things are getting taken care of. Ted and
Johnathan have been rolling out a hook to disable landing on a closed
tree, and I've pretty much finished a patch to make the try server run unit
tests. I figure that once the mad dash to the finish line of this
release is over we might have a chance to take the time to take care of
some more of these issues. So I think it would be useful to get some
idea of the things that we think need to be improved.
So I just want people to chime in with what ideas they've had. Ideally
let's have specific ideas. Just "Tinderbox sucks" isn't helpful, but
"Tinderbox would be better if ..." is.
I'll get the ball rolling with a few of my ideas:
The graph server would be far more useful if we could see the hg
changesets associated with the graph. Ideally selecting a range in the
graph and getting a table of the perf values against changesets would be
awesome, but just being able to see the changeset for each point would
speed up figuring out what might have caused the regression immensely
for me.
Reinstating the C links on tinderbox to show the changes between one
cycle and the previous cycle would help show what changesets might have
caused problems. This should be as simple as linking to a search range
in the pushlog.
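As a rough sketch of what such a C link could point at, assuming the pushlog accepts a from/to changeset range in its query string (the base URL and the `fromchange`/`tochange` parameter names are assumptions to check against the deployed pushlog, and `pushlog_range_url` is a made-up helper name):

```python
from urllib.parse import urlencode

# Assumed base URL and query parameter names; verify against the
# pushlog that is actually deployed before relying on them.
PUSHLOG_URL = "http://hg.mozilla.org/mozilla-central/pushloghtml"


def pushlog_range_url(previous_rev, current_rev, base=PUSHLOG_URL):
    """Return a pushlog link covering the pushes between two build cycles."""
    query = urlencode({"fromchange": previous_rev, "tochange": current_rev})
    return f"{base}?{query}"


# previous_rev: changeset the previous cycle built
# current_rev: changeset the current cycle built
print(pushlog_range_url("59040f379535", "tip"))
```

Tinderbox would only need to remember the changeset each cycle built in order to generate the link.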
The unit test machines take too long to cycle. We could speed this up
by distributing the different types of tests across multiple builders.
The actual pull and compile tends to take about 10 minutes on each
slave, so doing this repeats that work on every builder, but it should mean
we can get an entire test run down to around 30-40 minutes rather than
the current 1hr 30.
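To make the trade-off concrete, here is a back-of-the-envelope sketch. Only the 10 minute build overhead and the 1hr 30 total come from the numbers above; the per-suite durations are invented so that a single builder adds up to 90 minutes, and `split_suites` is a hypothetical helper that greedily hands the longest suite to the least-loaded builder.

```python
import heapq

BUILD_MINUTES = 10  # pull + compile overhead that every builder pays


def split_suites(suite_minutes, builders):
    """Greedily assign suites (longest first) to the least-loaded builder.

    Returns (wall_clock_minutes, total_machine_minutes).
    """
    # Every builder starts with the build overhead already on the clock.
    loads = [BUILD_MINUTES] * builders
    heapq.heapify(loads)
    for minutes in sorted(suite_minutes.values(), reverse=True):
        lightest = heapq.heappop(loads)
        heapq.heappush(loads, lightest + minutes)
    return max(loads), sum(loads)


# Hypothetical durations, chosen only so one builder totals 90 minutes.
suites = {"mochitest": 30, "reftest": 20, "crashtest": 10,
          "xpcshell": 10, "chrome": 10}

print(split_suites(suites, 1))  # (90, 90)
print(split_suites(suites, 3))  # (40, 110)
```

With these made-up numbers three builders bring the wall-clock time down to 40 minutes, at the cost of 20 extra machine-minutes of repeated pull and compile.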
If I may be blunt, the graph server would be far more useful if it was
fast enough that one could get/check data in a decent time, and for me,
it would be vastly more useful if we could display various Thunderbird
and SeaMonkey graphs there.
While the old graphs stuff had its own share of problems, it was very
fast to load graphs and linked from all relevant tinderbox trees
directly where they printed their data.
> Reinstating the C links on tinderbox to show the changes between one
> cycle and the previous cycle would help show what changesets might have
> caused problems. This should be as simple as linking to a search range
> in the pushlog.
Reinstating both C and D links on tinderbox would be very nice for
developers and testers.
Also, while you're mentioning pushlog, showing touched files in the
pushlog (see bonsai) would help in finding what checkin can be the cause
of a regression, and being able to search the pushlog for changes to a
certain directory makes it easier for people to watch e.g. ported code
they should sync with or theme/locale changes they want to sync with
(bugs are filed for both those pushlog improvements).
> The unit test machines take too long to cycle. We could speed this up
> by distributing the different types of tests across multiple builders.
That only makes sense if we test builds compiled by a different slave;
otherwise the build time on every slave would cost us more in total
than what we are doing now. What I see with SeaMonkey is that
mochitests actually take most of the time of the whole testerbox run.
Robert Kaiser
1) I filed a bug a while back about reordering the tinderbox columns so
that those which require looking at to see issues (talos boxes) require
less scrolling. https://bugzilla.mozilla.org/show_bug.cgi?id=450988
2) The tinderbox blame column would be good to have back, of course.
3) hg annotate produces HTML output that's much bigger (hence slower to
download) than bonsai blame and much much slower to render (4 minutes vs
8 seconds for nsDocShell.cpp blame on Gecko trunk), while not being as
informative (e.g. no popup checkin comments). This is
<https://bugzilla.mozilla.org/show_bug.cgi?id=459823>.
4) In hg annotate, it would be really nice if I could look at a
particular line and easily go back to blame for "the revision before
this line was last modified". Maybe there is already a quick way to do
this, but I'm not aware of it. Note that just going up the parent chain
doesn't quite give what I want, because the parent of the current
changeset may be very very old, with lots of other changes that happened
in between, if the changeset graph looks like this:
      ---B -- C -- D ---
     /                  \
    A                    merge
     \________E_________/
If all the changesets involved change the line in question, it would be
good to be able to see what it was at the various points. Not sure what
a good way to expose that is. :(
5) Maybe a better tinderbox display that by default makes it very small
so you can see the whole thing, but mousing over a particular cell makes
it bigger (using CSS transforms, maybe?) Not sure how easy or desirable
this is, since it makes it even harder to see the talos box numbers.
6) An easy way to tell whether a changeset has cycled on all boxes (such
that it makes sense to go look at the performance graphs). Not sure how
to best set this up.
7) An easy way to diff per-page data for talos runs would be wonderful.
Something like the output the sunspider test run diff produces, say.
8) A way to get stacks when builds crash during tests would be really
helpful. Maybe just a way for said builds to automatically send in
breakpad reports without user interaction?
9) Integration of dromaeo into our tinderbox tests.
10) Integration of some of Patrick McManus' "slow network" (or rather
"realistic network") tests. At the very least, it would be interesting
to run Tp against a server that simulates reasonable cable/dsl
bandwidths and latencies.
11) Separation of Tp from other talos runs, possibly (makes the
turnaround on the other tests a _lot_ faster, at the cost of more boxes).
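For item 6, one possible way to set it up, sketched under the assumption that every box can report the highest pushlog push id it has finished a full cycle on. That data feed does not exist as far as this thread says; the box names, the ids, and `boxes_still_pending` are all made up for illustration.

```python
def boxes_still_pending(push_id, latest_tested_push):
    """Return the boxes that have not yet finished a run covering push_id.

    latest_tested_push maps box name -> highest push id that box has
    completed a full cycle on (a hypothetical data feed). Push ids are
    assumed to increase monotonically with each push.
    """
    return sorted(box for box, tested in latest_tested_push.items()
                  if tested < push_id)


status = {"linux-unit": 1204, "win32-talos": 1201, "mac-talos": 1203}
print(boxes_still_pending(1202, status))  # ['win32-talos']
```

An empty list would mean the changeset has cycled everywhere and the performance graphs are worth looking at.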
-Boris
FYI:
I have a flock of small pet projects locally right now, living in
http://hg.mozilla.org/users/axel_mozilla.com/django-site/. Largely
undocumented, partly untested.
Two are probably interesting in particular. "pushes" is a django app
that's cloning a series of pushlog dbs from hg.m.o, and adding
information about touched files. It does already support querying for
changesets touching particular files, and supports querying across
multiple repositories. The other is mbdb, short for mozilla build data
base. It models most of what buildbot knows about builds in a database,
filled in by nothing but a scheduler and a status plugin per master.
This would allow for some richer UI on top of our build data.
I created both of these to get a better handle on l10n repos and builds,
but I guess that other tasks in engineering could be helped with this, too.
Axel
Something I've just thought of that is a real pain on the Firefox
tinderbox is sideways scrolling. I frequently want to look at box x at
a certain time, so I go down to the time, then look up and down, then
lose my place and have to go sideways again.
Having a table which could lock the left-hand 2 columns (i.e. date &
blame) and the top row (i.e. tinderbox name) to always be there, would
make this kind of lookup much easier.
This might make the scrolling mechanism a bit harder, but certainly
doing a spreadsheet style version may make things easier.
Standard8
> The graph server would be far more useful if we could see the hg
> changesets associated with the graph. Ideally selecting a range in
> the graph and getting a table of the perf values against changesets
> would be awesome, but just being able to see the changeset for each
> point would speed up figuring out what might have caused the
> regression immensely for me.
Further to this, I think it would be immensely useful if we could get
some sort of warning / indicator on the main tinderbox page of whether
or not the performance metrics we watch have improved or regressed;
even if it's just a straight percentage delta from the previous run
(ie: Ts 1305.23 (+4%)).
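A minimal sketch of that straight percentage delta, assuming the previous run's value is available next to the current one (`format_perf` is a made-up name, and the previous value below is invented to reproduce the +4% example):

```python
def format_perf(name, current, previous):
    """Render a metric with its percentage change from the previous run."""
    if previous == 0:
        # No meaningful baseline to compare against.
        return f"{name} {current:.2f} (n/a)"
    delta = (current - previous) / previous * 100
    return f"{name} {current:.2f} ({delta:+.0f}%)"


print(format_perf("Ts", 1305.23, 1255.03))  # Ts 1305.23 (+4%)
```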
cheers,
mike
Locking the left is hard, because you don't want it to lock. You want
it to scroll up/down but not left/right. Locking just the top sorta
works, but not quite perfectly due to some bugs we have with scrollable
rowgroups. Here's a bookmarklet that I use to do just that:
javascript:var%20t%20=%20document.getElementsByTagName("table")[2];%20var%20b1%20=%20document.createElement("tbody");%20/*%20Header%20is%202%20rows%20*/%20b1.appendChild(t.rows[0]);%20b1.appendChild(t.rows[0]);%20var%20b2%20=%20t.tBodies[0];%20t.insertBefore(b1,%20b2);%20b1.id%20=%20"mytable";%20b2.style.height%20=%20(window.innerHeight%20-%20b1.getBoundingClientRect().height-%2025)%20+%20"px";%20b2.style.overflowY%20=%20"scroll";%20b2.style.overflowX%20=%20"hidden";%20location.hash="mytable";%20for%20(var%20node%20=%20b1.parentNode,%20parent%20=%20node.parentNode;%20parent;%20node%20=%20parent,%20parent%20=%20node.parentNode)%20{%20while%20(parent.firstChild%20&&%20parent.firstChild%20!=%20node)%20{%20parent.removeChild(parent.firstChild);%20}%20while%20(parent.lastChild%20&&%20parent.lastChild%20!=%20node)%20{%20parent.removeChild(parent.lastChild);%20}%20}%20document.body.style.margin%20=%20"0";%20document.body.style.padding%20=%20"0";%20void(0)
I haven't submitted it anywhere yet because it only "sorta" works.
-Boris
We can still not search for stack frames:
https://bugzilla.mozilla.org/show_bug.cgi?id=444749
and viewing Top Crash lists fails sporadically:
https://bugzilla.mozilla.org/show_bug.cgi?id=454640
Querying the crash data really helps me when working with crash bugs
and other developers have expressed this too.
This service has been more or less broken for the past 6 months now.
It should be a top priority to fix it IMHO.
Thanks,
Mats Palmgren
We should also consider management techniques, i.e. allow project QA to
link bugs to crashes and have them appear on the main summary lists.
Otherwise there is another level of management required.
Standard8
This is https://bugzilla.mozilla.org/show_bug.cgi?id=431372 but I
expect you already knew that.
-Jeff
> Please fix http://crash-stats.mozilla.com/
>
> We can still not search for stack frames:
> https://bugzilla.mozilla.org/show_bug.cgi?id=444749
If you refine your search a bit, this shouldn't be a problem.
Searching for all crashes in a branch will timeout, sure. Searching
for all crashes in a product && branch won't. Is there something else
that's not working for you?
> and viewing Top Crash lists fails sporadically:
> https://bugzilla.mozilla.org/show_bug.cgi?id=454640
This has been fixed for a few weeks now (it was resolved in a
different bug though).
> This service has been more or less broken for the past 6 months now.
> It should be a top priority to fix it IMHO.
And now it's actively being fixed. If you're seeing things that are
common and broken, please file bugs (or search; most are on file). I
think crash-stats is mostly fixed now, though there are features
missing that are actively being worked on.
-Sam
> We should also consider management techniques, i.e. allow project QA
> to link bugs to crashes and have them appear on the main summary
> lists. Otherwise there is another level of management required.
That's bug 411357 and bug 464934 and they're both being targeted for
within the next three months.
fwiw, we've been using https://wiki.mozilla.org/QA/Topcrashes to track
topcrashes for Firefox.
-Sam
I'm not sure this quite falls under what you are asking, but I'll raise
it anyway:
In bugzilla, provide a field alongside patches where we could enter the
appropriate repository/changeset id once the patch had been checked in.
This would therefore provide appropriate consistent indications, direct
links etc. Editable from the bug page (not attachment page) would be a
bonus!
I've already filed this, but no activity yet:
https://bugzilla.mozilla.org/show_bug.cgi?id=455295
Standard8
* Get the blame column back into tinderbox.
* Graphs for the most important numbers on the top of the tinderbox page
so we more easily see performance regressions.
* Have pools of machines that run various tests so that the regression
windows are smaller. I.e. even if it takes 5 hours for the unit test
machines to cycle, if we have 5 of them doing overlapping tests the
window of patches that needs to be backed out is still just 1 hour.
* Show assertion counts for all test runs so that we can try to drive
the number of assertions down to 0. Applies to both talos tests and
unit tests.
* Change the default links in HG annotate to not just show the diff
that changed that line, but also the commit comment that went along
with it.
I.e. change the link from
http://hg.mozilla.org/mozilla-central/diff/59040f379535/content/base/src/nsXMLHttpRequest.cpp
to
http://hg.mozilla.org/mozilla-central/rev/59040f379535
Better yet if this information is displayed in a popup, as bonsai
did.
* Ability to jump to the revision previous to the current one. I.e. I
often see something like
    hg@1         95  #include ...
    hg@1         96  #define ...
    jonas@16665  97  class Foo { ..
    hg@1         98  Foo();
And I want to see why line 97 was changed and most likely go
to the revision before the change was made. With greasemonkey I could
make this work very well in bonsai. Haven't yet been able to figure out
how to do so for Hg, and of course ideally greasemonkey hacks shouldn't
be needed.
* We need to compress the data on tinderbox. Something like Jesse's
tidybox greasemonkey script is a great start.
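The arithmetic behind the pool-of-machines item above, as a small sketch. It assumes the machines' start times are evenly staggered, which the list does not spell out, and `staggered_starts` is a hypothetical helper.

```python
def staggered_starts(cycle_hours, machines):
    """Return (regression_window_hours, start_offsets_in_hours).

    With evenly staggered starts, a new test run begins every
    cycle_hours / machines, so each run only has to cover the patches
    that landed since the previous machine started.
    """
    window = cycle_hours / machines
    return window, [i * window for i in range(machines)]


print(staggered_starts(5, 5))  # (1.0, [0.0, 1.0, 2.0, 3.0, 4.0])
```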
/ Jonas
We also need sensible pushlog date formatting in the query fields.
Pushlog list display uses:
Sat Nov 08 01:06:43 2008
Tinderbox and bonsai use:
yyyy/mm/dd 06:48:03
Pushlog query uses:
mm/dd/yyyy
It's just taken me 5 attempts to work this out again.
I guess I need to go file a bug if there isn't one already.
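Until the tools agree on one format, converting between the three by hand could look roughly like this. The format strings are my reading of the examples above, not taken from any of the tools' source, and `%a`/`%b` assume an English (C) locale.

```python
from datetime import datetime

# strptime/strftime patterns inferred from the examples in this message.
FORMATS = {
    "pushlog_list": "%a %b %d %H:%M:%S %Y",   # Sat Nov 08 01:06:43 2008
    "tinderbox": "%Y/%m/%d %H:%M:%S",         # yyyy/mm/dd 06:48:03
    "pushlog_query": "%m/%d/%Y",              # mm/dd/yyyy
}


def convert(text, source, target):
    """Re-express a timestamp from one tool's format in another's."""
    parsed = datetime.strptime(text, FORMATS[source])
    return parsed.strftime(FORMATS[target])


stamp = "Sat Nov 08 01:06:43 2008"
print(convert(stamp, "pushlog_list", "tinderbox"))      # 2008/11/08 01:06:43
print(convert(stamp, "pushlog_list", "pushlog_query"))  # 11/08/2008
```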
Standard8
https://bugzilla.mozilla.org/show_bug.cgi?id=454995
https://bugzilla.mozilla.org/show_bug.cgi?id=455000
https://bugzilla.mozilla.org/show_bug.cgi?id=455369
are date related bugs, covering your questions, I think.
Axel
A while back I added a "json" interface to Tinderbox server, to enable
people to make rapid improvements to the UI by decoupling it from the
server code (this was suggested by Vlad in fact). Unfortunately it did
not really come out the way I would have liked; it's basically a dump
of the internal structure of Tinderbox server, and hard to work with.
However I still believe that Tinderbox server is an impediment here,
as I've seen many attempts at what are just minor UI improvements
fall flat, and the amount of work needed to really decouple data
handling, storage, and UI isn't really worth it. This
is my opinion; I'd love to see someone try, although some of the
attempts so far have made many people very sad :(
I've been working on a project, codenamed Millicent, to take over the
very critical jobs that Tinderbox server performs, which I believe are
not really appropriate for Buildbot (I think buildbot should go down
this same UI/backend decoupling in fact, but that's for a different
day):
* receive data via email (or HTTP/JSON) from build/test machines
** backwards compatible with Tinderbox client/server
* store data using database
* serve data via HTTP/JSON (and HTML)
I have some of the above done. Here's a mockup:
http://roberthelmer.com/mozilla/mockups/millicent/
Code is here:
http://hg.mozilla.org/users/robert_roberthelmer.com/Millicent
This is actually a Javascript app, pulling JSON from a static SQLite
DB via a Python app (using SQLAlchemy, so the DB is not hardcoded).
You can see some of my UI philosophy in this:
* collapse into as few columns as possible
* visually express recent perf history (using JQuery sparklines in
this example)
* provide links to drill down to more info
I really like bz's suggestion in this thread to be able to expand info
in this view, that's actually what I have in mind here. It will be
possible to expand this view into something equivalent to a full
Tinderbox waterfall.
The idea is that you get the most compressed info by default, and can
keep asking for more info as desired. I'd like to have a REST
interface for the UI too so you can easily bookmark just the level of
detail desired.
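To show the receive/store/serve split listed earlier in miniature, here is a toy sketch. It is not Millicent's code or schema: it uses the standard library's sqlite3 directly rather than SQLAlchemy, and the table layout, field names, and the `open_store`/`receive`/`serve` functions are all invented for illustration.

```python
import json
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the database that holds build reports."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS builds ("
        "tree TEXT, machine TEXT, status TEXT, start_time INTEGER)"
    )
    return db


def receive(db, report_json):
    """Store one report sent by a build/test machine as a JSON object."""
    report = json.loads(report_json)
    db.execute(
        "INSERT INTO builds VALUES (:tree, :machine, :status, :start_time)",
        report,
    )
    db.commit()


def serve(db, tree):
    """Return the tree's builds, newest first, as a JSON array."""
    rows = db.execute(
        "SELECT machine, status, start_time FROM builds "
        "WHERE tree = ? ORDER BY start_time DESC",
        (tree,),
    ).fetchall()
    keys = ("machine", "status", "start_time")
    return json.dumps([dict(zip(keys, row)) for row in rows])


db = open_store()
receive(db, '{"tree": "Firefox", "machine": "linux-unit", '
            '"status": "success", "start_time": 1226106403}')
print(serve(db, "Firefox"))
```

Any UI (waterfall, compressed view, mobile page) could then be built on the JSON alone without touching the storage side.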
Also, Beltzner's suggestion of using some kind of ticker for perf is
spot-on IMHO, although I think just simple +/-n is not enough; you
should not have to comb through a ton of info to see if you missed a
big perf drop. I think a sparkline would be a better tool for this.
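A text-only illustration of the point, not the jQuery sparklines plugin used in the mockup. The `sparkline` helper and the sample numbers are made up.

```python
BARS = "▁▂▃▄▅▆▇█"


def sparkline(values):
    """Map each value onto one of eight bar heights, scaled to the series.

    Expects a non-empty sequence of numbers.
    """
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid dividing by zero for a flat series
    top = len(BARS) - 1
    return "".join(BARS[int((v - lo) / span * top)] for v in values)


# Hypothetical Ts history: a regression lands, then is backed out.
history = [1300, 1302, 1299, 1420, 1418, 1305]
print(sparkline(history))
```

With this history the delta from the previous run alone reads as an improvement (about -8%), while the sparkline still shows the spike that came before it.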
Obviously Millicent is not a drop-in replacement for Tinderbox server
today, but I intend it to be. I'd like it to be able to take in info
from other systems as well (but goal #1 is Tinderbox compat), and make
it trivial to mash with other web services (hgweb, graphs, bugzilla,
etc) instead of requiring all data to be in one system.
http://sites.google.com/a/chromium.org/dev/developers/testing/tour-of-the-chromium-buildbot
I think it has a lot of the informational overload problem of
tinderbox.mozilla.org, in terms of columns etc. but there are some
very good ideas here.
Seems like we need to blog or post more about pipedream ideas?
I think there are some common design principles here, like separating
the html interface from the master, or using some ajax to get details
for builds. And even more basic, that neither the tinderbox code nor the
buildbot waterfall are the answer to the question.
I'm personally not so fond of the idea of mailing logs around; we seem
to jump through a whole lot of hoops to get data back and forth. But
then I tend to just take buildbot for granted, probably more than you
guys do. Thus I'm currently focused on getting the buildbot status out
into a db. I kinda assume that logs can be just shared via a remote
filesystem. Not that one couldn't rsync or move them over.
Anyway, if dropping tinderbox is the answer, we should probably create a
real project for doing just that.
Axel
Note that it's good to still have something that is basic HTML and shows
a quick status - I very much like being able to check whether everything's
alright from a mobile device, for example (but I agree that works much
more easily with the SeaMonkey tinderbox waterfall than the Firefox one).
Robert Kaiser
I just happened to be working on this project, and I thought it
relevant to the thread. Feel free to ignore or use it, I actually
wasn't planning on announcing it so soon. There's obviously a lot of
overlap between Tinderbox, Buildbot and Millicent, I happen to need
something more with
As a more realistic near-term suggestion, I say take what the Chromium
project has done with the Buildbot waterfall, which pretty much makes
Tinderbox server redundant AFAICT.
I agree 100%. Build status should be accessible as HTML and JSON at
the very least, might be nice to have other formats as well (text,
rss, etc).
I think that the current waterfall-style display is not a very useful
format for pretty much any usecase (especially if you are building
continuously and not on-checkin), unless you have very few builders.
I have experimented (with some success) adding JSON and XML output to
Buildbot a while ago, and it provides cleaner ways to add alternative
views, so replacing Buildbot with Tinderbox server outright is pretty
doable.
Not sure if we can replace tinderbox without having something to star
the builds, sadly.
Axel
That should read "replacing Tinderbox server with Buildbot"... Thanks,
timeless, for pointing this out. I don't switch horses *that* fast :)
Refining the search doesn't seem to help.
Product: Firefox
Branch: 1.9.1
Version: Firefox 3.1b2pre
Platform: Windows
Stack signature contains: _de_casteljau
Date range: 1 months
After 20 minutes there's still no response.
>> and viewing Top Crash lists fails sporadically:
>> https://bugzilla.mozilla.org/show_bug.cgi?id=454640
>
> This has been fixed for a few weeks now (it was resolved in a different
> bug though).
Thanks, it seems to be working now, although some versions lack proper
stack symbols, e.g.
http://crash-stats.mozilla.com/topcrasher/byversion/Firefox/3.1b1pre
/Mats
I just realized that there's a hidden requirement here that needs
spelling out.
What I really want, given a line in annotate, is easy access to:
1) The change that last changed that line (linkified bug number,
checkin comment, diff of the changes made in that checkin to
this file).
2) The annotated file from before the last time this line was changed.
In particular, if the line was last changed before we imported into hg,
I _still_ want easy access to the information (presumably using CVS diff
and CVS blame to back it up) without having to switch to a different MXR
branch, etc, etc as I have to right now.
The CVS integration is a much higher priority for me than fancy
branchiness handling. Right now I end up having to guess whether to
start looking at hg blame or CVS blame when I need blame for a line. I
get it right about 80% of the time in code I'm familiar with, but the
accuracy is going down, and I'm running into more and more cases where
there are cosmetic hg changes and I end up having to dig from the hg
history into the CVS history, which is made even more annoying by the
overlap between the histories.
A unified history view for the two repositories is something that was
considered a must have for the switch at some point...
It might be that the right solution here is to stop using hgweb annotate
and use bonsai for the hg blame view, by the way; I believe timeless has
a test bonsai installation that does just that.
-Boris