In general, 3.1 wants to be a very light platform-update / fixup release
with very little product-y aspect to it. After some discussion with
davida and clarkbw, we'd like to push forward with the feature roadmaps
fairly quickly, in the interest of prepping for the post 3.1 work.
We'll start driving that separately shortly. In the meantime, my
suggestion for folks doing code work is to work on polishing off rough
edges from 3.0, along with landing small, scoped features that were
intended to make 3.0 but didn't.
Here's a road map that's based on 1.9.2 for Thunderbird 3.1 that I think
could work, and it would start next Monday:
http://www.flickr.com/photos/dmose/4151232741/
If you have trouble reading it, remember Firefox's zoom feature in the
View menu. :-) Each tick on the timeline is intended to be 5 working
days. There are effectively 3 milestones (M1, M2, and RC1), and the last
20 working days are baking time for RC1 and an RC2 if necessary. Note
that string and feature freeze is proposed to happen when the tree
closes for M2, and RC1 is for bug-fixing only.
Things to keep in mind:
* Any UX changes with non-trivial string implications (e.g. > 20 strings
or so) should land early in either M1 or M2, and localizers should be
notified with a message to dev-l10n at the time of landing.
* Reviewers will be requiring changes to mail/, mailnews/, editor/, and
directory/ to have automated tests unless there are specific, cogent
reasons why that's impractical or wrong-headed.
* In order to help keep the number of live branches at any given time
manageable, it would be ideal if we could follow Firefox 3.6 and 3.7's
lead and do a "minor" update (automatic with no way to opt-out) from 3.0
to 3.1 fairly quickly after 3.1 is released, so that we could then stop
supporting the 1.9.1 branch entirely. The fewer changes (particularly
API and front-end changes) that we allow into 3.1, the more likely it is
that we'll be able to do this.
Thoughts on all this?
Dan
rkent
* I suspect this will vary heavily milestone-to-milestone, depending
specifically on who is signed up to work on what.
* My guess is that since 3.1 is not feature-focused, reviewers will find
it easier to interleave review work with other work than they otherwise
might.
* It would be interesting to talk to some Firefox contributors who have
been around for a few years as they've shortened up their cycles to
understand their experience. In particular, philor might have some
thoughts there...
* I haven't yet taken the time to chime in on the extension and branch
development threads because I feel like it's important to get 3.1
rolling before spending more time discussing the longer-term stuff.
That said, I do intend to get back to them, and I think there's a bunch
of promise in aspects of both of those models that we'll want to explore
going forward, specifically to help alleviate issues (like this) that we
find as we speed up our cycles.
Dan
I have some bigger changes waiting or in the pipe:
* Login code overhaul, to fix Kerberos
Bug 525238, precondition for bug 339050 (fix wrong error) and
524698 (show OS password dialog)
* Rewrite of message header pane as XHTML
During the subject wrap work, I thought it would have been faster
to just rewrite the whole thing. There are still serious problems,
mainly bug 520249 and friends. I considered the latter a hard
release blocker for 3.0, I think it's unacceptable that we're
unable to show the full Author/From (including hiding the star)
and similar things.
I have time allocated for that - if nothing else happens, I expect
to be able to start working on that in 1-2 months from now. Given
that it entirely replaces the header pane, that means double work
for any changes on that during that time, so the branch can't live
for half a year. I doubt that I can just start in 6 months either,
as that depends on my work loads and other priorities.
If you really want to keep 3.1 limited to bugfixes, then maybe branch it
from 3.0 instead of trunk. You'll have considerable management overhead
to decide (for 6 months) what goes on the 3.1 and what not. Actually, if
you use trunk for 3.1, you'll have the same discussions, in other
places/forms, just devs have no other option and end up frustrated.
In any case, I don't think a development model where the whole project
cannot make bigger changes or new features over a period of one month or
longer is workable. It's disruptive for those wanting to make changes.
Even more so when such a period is over 6 months. So, my suggestion is
to open the 3.1 branch now, and leave trunk open for real meat changes.
Ben
This can and should start its life as an extension.
Andrew
Or feature branch. But that doesn't make any difference.
We either cannot make any changes to the old header pane anymore, or we
have to do them all twice. Furthermore, I want to be *done* within a
reasonable timeframe, as I have to move on to other things.
Maybe you and Kent both didn't notice the green line on the left of the
picture, cause it looked boring =). Think of it as mozilla-central
trunk, and think of the line on the right as 1.9.2. By design it'd be
(likely almost) always open, which I think was one of your points
earlier =).
We'd take feature work on trunk always, modulo the (real) review
bandwidth issues, and assuming test coverage (otherwise trunk will
degenerate and the _next_ release will be impossible).
(One thing which we haven't discussed yet but maybe should is whether we
should use the same policy that m-c does, which is that you check in on
trunk first, then move to branch.)
--david
Should I imagine M1 and M2 as real milestone names? Something like
"Thunderbird 3.1 M1" or "Thunderbird 3.1 M2"? I'm not happy to see names
like this because people (especially the press) don't understand what they
mean. I don't want to see the same situation in the local press as with
Firefox 3.6 Beta revisions. Maybe MoCo calls these betas "revisions", but
in reality I see titles like "Firefox 3.6 Beta 4 was released".
Regards,
--
Pavel Cvrček <pcv...@mozilla.cz>
http://www.mozilla.cz/
Looking at the graph you posted
<http://www.flickr.com/photos/dmose/4151232741/>, it seems you intend to
do exactly that.
So, please ignore my posting.
Ben
Dan
Yes, indeed. In fact, I didn't look at the picture at all, only read the
text.
And I can see only now that you say it that it's green :).
> By design it'd be (likely almost) always open, which I think was one
> of your points earlier =).
Yes, thanks! Great.
> (One thing which we haven't discussed yet but maybe should is whether
> we should use the same policy that m-c does, which is that you check
> in on trunk first, then move to branch.)
Sounds reasonable. I always did that anyways.
Ben
I just mean that there's no way things land on branch w/o something
related also landing on trunk, or already having been on trunk. We
never want branch to have more "features" than trunk, is my guess.
-da
What I meant here by "blocking" is the formal "blocking" status for the
bug. This term is overloaded in practice. At the beginning of a cycle,
it means "things that we, the drivers, are determined to see added to
the next release" but then it morphs into "we will not ship until this
bug is fixed".
> In either case, it
> would be helpful if you could describe what specifically you're trying
> to avoid and/or find frustrating and why, just to ensure that we're on
> the same page...
Just to reiterate my comments on the extension thread, for TB3 I took
advantage of the long development cycle, so I did not feel significantly
restricted by the process. But as you shorten up the process, it is
critical that you recognize that there are contributions coming into TB
that are not "blockers" as recognized by drivers. The open window for
getting in such patches ("informally thawed" is the term that I use
here) should not be optimized to zero. And the thawed state requires not
only that the tree be open, but that reviewers have sufficient slack
that they can look at bugs that are outside of the "blocking" list.
>> The informal thaw exists when the tree is not only formally open, but
>> the critical reviewers that I need (bienvenu, standard8, neil,
>> sometimes clarkbw) are not so heads down with work that reviews are
>> effectively frozen.
> I don't know that that's really possible to guess from where we're
> sitting now.
What I am trying to subtly pressure you to do is to plan (not guess)
when the thawed times are, and then try your best to not overload the
reviewers with blockers during that time period. Otherwise, I fear you
will optimize the thawed time to zero.
I suppose I could just go with the flow, and try to get my projects
recognized as blockers. It just does not really seem necessary unless
the thawed time gets optimized to zero.
rkent
In other news, Motorola just released "Motorola Milestone" in Europe,
its Android-based phone known as "Droid" in US.
Dan
* It'd be awful nice to have a metric or two here so that we could
quantify what we're talking about a bit. Would review turnaround time
be the right metric to capture your concern?
* I think the idea of structurally building in review-bandwidth to our
cycles somehow is interesting and worth exploring. One could imagine
asking folks to spend a few days setting aside their coding and catching
up their review queues each time there was a code freeze. Is that the
sort of thing you had in mind?
* I'd be very interested in thoughts from the more heavily-loaded
reviewers. Guys?
Dan
What other choice is there? None that I see. We all agree that shorter
cycles, like you are proposing, are really important. With the current
processes, that means that there will be more frequent formal freezes
and deadline pushes.
> I think your concern is entirely reasonable. It also feels like the
> system that we're touching has enough moving parts that trying to
> optimize review-turnaround without actually knowing what it feels like
> to operate under the new rhythms and where the hot spots are may be
> premature. Some thoughts:
>
> * It'd be awful nice to have a metric or two here so that we could
> quantify what we're talking about a bit. Would review turnaround time be
> the right metric to capture your concern?
Probably, yes.
By my current estimate, my productivity in working on core code is about
1/10th that of bienvenu and standard8, whose work is the closest in some
ways to mine. A few months back, I tried to understand why that is. OK,
they both have a lot more experience with the code base, so I waste time
learning things for new bugs that they already know. But that is not the
whole story. The key contributors have review loops, reviewing each
other's code, and they have a great incentive to give very short review
times to each other, even during crunch times. And they do. My review
times on average are much longer, and that is a significant drain on
productivity.
I have to continually reiterate that these are not intended as
complaints against any individuals. They are behaving exactly as I would
behave under similar constraints and pressures. And there are times,
when we are unthawed, where that is the correct behavior pattern.
But it would be nice if we could all agree that there are significant
periods of time when we are thawed, and during those time periods review
work takes priority over development. That is really the main thing I am
asking.
>
> * I think the idea of structurally building in review-bandwidth to our
> cycles somehow is interesting and worth exploring. One could imagine
> asking folks to spend a few days setting aside their coding and catching
> up their review queues each time there was a code freeze. Is that the
> sort of thing you had in mind?
In the review loops, where people are being most productive, review
delays from my limited survey are hours, not days. Your proposal also
implies there would also be "a few days" when they ignore the review
queues. That is not optimal for productivity of the people being reviewed.
Maybe my requests are unreasonable, and it is best for the whole project
if the key developers are not expected to give priority to reviews. You
are free to decide that if you want. You just need to understand that it
has a significant effect on the productivity of non-core developers.
rkent
Totally reasonable requests. I think that we do need to look at
everything when thinking about how to release more often, including
review processes, whatever management happens, etc.
In regard to your particular point, I think that we should do a bunch of
things:
1) spread the review load across more people
2) maximize the automation that can happen before it gets to a human
(jst-review-bot, l10n-string-detector, etc.)
3) push hard on the test framework so that more patches have more
passing tests from the get-go
4) work on modularizing our code. It feels to me like bitrot happens
more often than it could.
5) more proactively deal with reviews. At this point reviewing code
that's not foremost in one's mind takes a lot of mental effort and
discipline (speaking as a non-reviewer), and I can imagine that to do
that well, taking into account doing right by the codebase, by the
project stability, all while being diplomatic and careful, is a
significant challenge. I'd like to explore ways of mitigating some of that.
Some of my rough thoughts:
- detect lagging reviews and deal with them proactively. I'm working
on getting some stats, but I suspect that a 3-week old patch is almost
guaranteed to be r-'ed. A computer can do that, impersonally, and spare
the reviewer the angst. (while also keeping track of who that affects,
so that we can manage that, both for submitters & reviewers).
- crazy idea: introduce some fake currency/point system to reward
behaviors we want to reward. This would apply both to reviewers and
patch submitters.
5) figure out how to allocate peoplepower across prioritized reviews,
non-prioritized reviews, prioritized dev activities, other tasks.
6) other things I haven't thought of...
[There are other, unrelated implications for every part of our system,
including which builds we want nightly testers on, how we facilitate
getting more users on those channels, etc.]
--david
(Nice!)
> * It'd be awful nice to have a metric or two here so that we could
> quantify what we're talking about a bit. Would review turnaround time
> be the right metric to capture your concern?
For me, the metric is the time - both calendar time and work time - from
the point when I think I'm done until I'm really done: reviewed, checked
in, no more work (minus the parts where I really messed up and had bugs).
> * I think the idea of structurally building in review-bandwidth to
> our cycles somehow is interesting and worth exploring.
Yes, that's necessary.
> One could imagine asking folks to spend a few days setting aside their
> coding and catching up their review queues each time there was a code
> freeze.
Actually, a few hours per day would be better. For me, it's critically
important to get reviews shortly after I attached the patch, because
then the memory (and build and test setup) is fresh. If I have to get
into it after a week or even a month, I basically have to start from
scratch, which is a considerable burden and the reason why many patches
are dormant.
Ben
No. Names like alpha or beta are ok. One example:
Mozilla Corporation says:
"Firefox 3.6 Beta (revision 4) was released"
Press says:
"Firefox 3.6 Beta 4 was released"
As you can see, the press didn't understand what "revision" means. Even Mike
Beltzner sometimes talks about "Firefox 3.6 Beta x". When I asked about
it on IRC, the answer was that "revision x" isn't the same as "beta x".
Now I see something like M1 or M2 and I'm a little bit scared about how I
will describe it on our website. I think that "alpha" and "beta" will get
more press coverage than M1 or M2.
But maybe I'm just too pessimistic. I just hope that your roadmap is
realistic and that I won't have to keep writing things like "Thunderbird
3.1 was delayed" or "new milestones were added" on our local website.
The Thunderbird 3.0 branch now explicitly requires it:
Standard8
I think this is where branching 3.1 earlier would actually help - we've
touched on it in other parts of the thread, but at times in the 3.x
betas/RCs I've explicitly left reviews alone to work on other things
(like driving, infrastructure improvements) because I've known the
repository is going to be closed for x days and I can catch up at the end.
Having a trunk that is pretty much always open would encourage me to
keep the reviews going through those periods as folks can still land new
patches.
I'm not saying we should necessarily branch 3.1 straight away - IMO we
need to get builders set up and try and close down any trunk versus
1.9.2 issues first before we do, but certainly branching by M2 would be
worth a try.
> By my current estimate, my productivity in working on core code is about
> 1/10th that of bienvenu and standard8, whose work is the closest in some
> ways to mine. A few months back, I tried to understand why that is. OK,
> they both have a lot more experience with the code base, so I waste time
> learning things for new bugs that they already know. But that is not the
> whole story. The key contributors have review loops, reviewing each
> other's code, and they have a great incentive to give very short review
> times to each other, even during crunch times. And they do. My review
> times on average are much longer, and that is a significant drain on
> productivity.
In the recent months I've been trying to avoid any favouritism,
especially when processing big backlogs. I typically try to go through
my review queue from the oldest to the newest requests. However there
are times when I just have a little bit of time available, and then I'll
typically choose simple patches. At other times, blockers will tend to
get reviewed first as they are priorities, but that doesn't mean to say
you shouldn't just put patches in my review queue anyway.
The other thing I need to get better at is seeing if I can push reviews
to other people more and share them out.
> In the review loops, where people are being most productive, review
> delays from my limited survey are hours, not days. Your proposal also
> implies there would also be "a few days" when they ignore the review
> queues. That is not optimal for productivity of the people being reviewed.
Agreed it isn't optimal, but there are times when working on big complex
patches (e.g. the password manager changes we did) where I've just had
to work on them and nothing else because otherwise I would lose the
continuous trains of thought and have to keep on starting from scratch.
That's probably a bit of a rare case, but it is one time when we need to
consider sharing reviews around a bit more.
Standard8
I have some mozilla-specific understanding of alpha and beta, which is
alpha is non-localized and beta is localized, which maps to the target
audience of those releases, i.e., technology testers vs app testers.
I don't think that picking up an old name (M13, dude) really helps us in
messaging this out, and can come with a loss of messaging internally.
Axel
I see a week of string freeze for each milestone? Sounds good.
The timing of M1 looks tricky from an l10n point of view. Most folks
will be mostly AFK in the week of new year.
Axel
From all I heard in this week's meetings, it sounded to me like FF 3.6
will be a "major update" (with prompting) over 3.5 after all - I might
have heard it wrong, though.
Also, if TB 3.1 is planned to not introduce new features, we probably
need to branch very soon and make the 1.9.3-targeted tree enable landing
of new feature work. This would also support Thunderbird doing a
1.9.2-based 3.1 release (somewhere mid-way between FF 3.6 and FF 3.7
releases) while SeaMonkey is going for a 1.9.3-based 2.1 release to be
shipped in (early) summer, as much in sync with FF 3.7 as possible.
Robert Kaiser
Dan
Yeah, I agree, but Simon should weigh in.
We should probably do some laid-back outreach for that milestone. Stuff
to talk about next week, I guess.
Axel
Dan
I'm not actually convinced there's a good one-size-fits-all answer here,
given that everyone has different work styles. That's why I'd much
prefer to have real data so that we can draw much more specific
conclusions about how to help specific sorts of reviews, patches, tests,
modules, etc.
> Maybe my requests are unreasonable, and it is best for the whole
> project if the key developers are not expected to give priority to
> reviews. You are free to decide that if you want. You just need to
> understand that it has a significant effect on the productivity of
> non-core developers.
I find your requests quite reasonable. One of the things that's really
critical for the long-term success of Thunderbird is that we leverage
our decentralized development model, which means (in part) focusing on
making it as easy and as rewarding as possible to contribute
functionality that belongs in the core.
Dan
Well firstly I think Robert's comment about not introducing new features
isn't quite right. We're likely to introduce some small ones, just not
significant - for example, send in background is one feature that could
be relatively simple to finish off.
It is a bit hard to determine which areas Robert means by new
features. Is this Thunderbird, SeaMonkey, MailNews or all?
With the current plans of extension/branch based development I can't see
Thunderbird having any significant non-3.1 features ready to land for a
month or two. I also doubt that SeaMonkey feature work will
significantly affect MailNews (though I could be wrong).
So unless we've got feature work I don't know about going on, then I
certainly believe we're not in a rush to branch.
We need a bit of time to stabilise both trunk and 1.9.2 and work out
where we need fixes on one or both (we're just starting to set up build
infrastructure for 1.9.2). IMO it is easier to do this if they are not
branched.
I'm now not actually sure what Robert is concerned about wrt branching.
As per your diagram I would call January "very soon", especially as
December tends to be quite quiet.
I think as per your diagram January/M1 is possibly a bit early
(considering where we are now), but at the same time would give us the
possibility for landing more features on trunk before branch.
February/M2 is certainly the latest I think we could reasonably branch
and would ensure that we have a stable branch.
Standard8
I just derived things from what I read in this thread, and yes, I
understood the small/large feature difference there. What I don't know
yet is if and how any SeaMonkey 2.1 work could affect mailnews or not,
we're not much in that work stream yet, while we all took some time to
breathe and figure out the major items in 2.0 feedback and how we can
improve things there in future updates.
I am mostly concerned about how such a policy on Thunderbird 3.1
development affects what can go into comm-central and how likely we are
to have a stable enough mailnews base when trying to do a 1.9.3-based
SeaMonkey 2.1 release as much as possible in sync with Firefox 3.7,
probably in (early) summer of 2010.
> February/M2 is certainly the latest I think we could reasonably branch
> and would ensure that we have a stable branch.
I think that should work well enough for everyone, given that 1.9.3 is
supposed to be branched in January and planned to go for a release in
June (not counting any possible slip, of course).
If Gecko/FF can hold their plan of roughly 6-month release cycles, I
hope we'll get Thunderbird and SeaMonkey closer in sync with each other
and Gecko within a few of those cycles. For now, while we're not there
yet, I'd like to keep the problems we're causing for each other as small
as possible - and I hope that works out nicely for all of us.
Robert Kaiser
Dan
Yes, I think the human element on both "sides" (author and reviewer) is
very easy to underestimate. Intentionally avoiding the details of
technical and process issues and ideas previously posted ...
Memory affects both the author and the reviewer, perhaps more so if the
process for a given patch is very iterative and long like on a large
"low value" bug. There is also a time-value to people's enthusiasm - a
week or a few months later a patch author may not have the enthusiasm or
time for follow up. Then there is a subtext in rkent's comments of "the
great unknown" ... will someone get to my patch, how shall I manage my
time and projects while waiting, etc ... akin to waiting for a doctor's
diagnosis :)
So perhaps shorter is not the main issue, although shorter helps.
Perhaps predictability being integral to the process is the bigger
issue. Things which *smooth out* the delivery of reviews over time and
make it more transparent and public. These would over time, I think,
also affect the speed of delivery as people's habits or expectations
change. Previously mentioned are workflow/process, manpower, automation,
individual work habits, management, goals, prioritization, valuing
reviews, (how about a junior reviewers corp?), etc, all of which can
contribute.
Of course, whatever solutions evolve should impact the goals of
releasing more often, with significant community contributions.
As this relates to big feature extensions developing out of band as it
were, is there an expectation that reviewer load might be significantly
lowered? Might it be even greater? Wouldn't there need to be help that
aids the extension development process similar to what happens in
reviews? In which case extension development may face the same perils
as patches do in the current process.
In my "extension driven development" environment, an extension would
have a significant life, with a significant user experience base, before
it is even considered to be upgraded in status beyond AMO. During that
period there is no review load on mailnews developers.
At the point where the extension is elevated in status, there would be
some requirements to meet, and a code review would probably be one of
them. But I don't think this need be the detailed, line-by-line analysis
that we currently do. Another hurdle would probably be automated
testing, and I would expect that testing to be the main quality control
method, not detailed code review. And after the extension is elevated,
although it would make sense to have review of changes, that should
avoid the super-reviewer step. Plus the extension author would probably
be an automatic reviewer if anyone else wanted to patch the extension,
so there would be some dispersion of the review load.
If you don't accept my proposal and definition of "extension based
development", so that instead you follow existing practice of fully
merging accepted work into the core code base, then you still really
have the same review requirements as at present. A large extension is
then no different than a large patch.
rkent
Totally understandable and reasonable.
And it's also exactly that effect which makes the long review queues so
expensive for patch contributors. I fully second Kent's descriptions.
How about the "coding standards"?
Wouldn't all those changes require a very strict policy about a
coding standard?
If there is one in place, I would be very much interested to know how
strictly it is followed by coders, for core or patches and
extensions. How about
https://developer.mozilla.org/En/Developer_Guide/Coding_Style ?
Searching for [mozilla "coding standard"], Google gives some very
interesting feedback at:
http://iloapp.mikek.dk/blog/developer?Home&post=46
How about a well-documented Thunderbird API?
As far as I remember there is some information about TB3 changes (e.g.
TB/AB), but is there a complete listing / reference of the whole "mail
system" picture?
günter
As said, I second Kent's description. Thunderbird project has gotten
dramatically better, but is still less than 'good'.
> I'm working on getting some stats, but I suspect that a 3-week old
> patch is almost guaranteed to be r-'ed. A computer can do that,
> impersonally
Are you seriously suggesting to automatically hand out r- for ignored
patches? I hope I misunderstood you.
Quite the opposite, a patch waiting 2 weeks should mean an automated
'slap' on the reviewer and automated escalation to another human, so
that it's reassigned to another qualified reviewer.
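As a rough illustration of that escalation idea, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the ReviewRequest record, its fields, and the overdue() helper are hypothetical, and the two-week limit is simply the figure from the paragraph above. It only finds the requests that are due for escalation; choosing the next qualified reviewer is left to a human.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Threshold taken from the suggestion above; an assumption, not project policy.
ESCALATE_AFTER = timedelta(weeks=2)


@dataclass
class ReviewRequest:
    bug: int         # hypothetical bug number
    reviewer: str    # hypothetical reviewer name
    requested: date  # day the review was requested


def overdue(requests, today, limit=ESCALATE_AFTER):
    """Return the requests that have waited at least `limit`, oldest first."""
    late = [r for r in requests if today - r.requested >= limit]
    return sorted(late, key=lambda r: r.requested)
```

Sorting oldest first means the patches that have waited longest get reassigned first.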
> 5) figure out how to allocate peoplepower across prioritized reviews,
> non-prioritized reviews, prioritized dev activities, other tasks.
That means that it's only getting harder for Kent who wants to get
non-priority patches in, as he specifically said.
Automation could help some (let me add whitespace issues and similar
nits), but I don't think it would speed up reviews, as these things are
easy to comment on, it's not what costs reviewers time, I'd think.
I think we just need more resources (meaning more reviewers) and more
attention by reviewers to the review delay.
I'd like to mention on this occasion that bwinton has helped me a great
deal with reviews right before 3.0 RC1. He was very cooperative. If we
hadn't had him as reviewer, we might have had a considerably worse 3.0,
because some patches probably couldn't have landed.
Ben
I do. We need less of the hodge-podge of dissimilar modules that TB
currently is (mostly due to being developed over 15 years, and by
different authors).
And that's also my concern: Once an extension has had a significant
life, I don't think that's the point at which to make considerable
changes. But they have to be done sometimes; sometimes the code design or
UX design needs to change considerably. I think that should be fleshed out
before even significant code is written, much less used. I don't think
that writing a whole module by yourself and then dumping it on TB is the
right process, as no major changes or changes to how it works in general
are realistically possible anymore. That's the "take it or leave it"
situation, and that's always awkward.
> Another hurdle would probably be automated testing, and I would expect
> that testing to be the main quality control method, not detailed code
> review.
I disagree there, too. You can't find all security holes by just
testing, for example. (And some security holes are in the design. I had
this at a former company, where we had to throw away essentially the
whole code.) Similarly, some bugs are much easier found in code than by
testing and debugging.
> [Afterwards] the extension author would probably be an automatic
> reviewer if anyone else wanted to patch
Agreed.
> If you don't accept my proposal and definition of "extension based
> development", so that instead you follow existing practice of fully
> merging accepted work into the core code base, then you still really
> have the same review requirements as at present. A large extension is
> then no different than a large patch.
You can't just skip review just because you developed something in the
form of an extension vs. the same code in form of a patch. That doesn't
change the risk of the code.
Ben
whoohoo
<http://www.bucksch.org/1/projects/mozilla/thelostmilestone/>