But if there are serious problems -- machine instability, intermittent 
test failures, or tinderbox-only crashes to name a few -- the current
system fails for two main reasons: there is no continuity as the sheriff 
changes from day-to-day, and the daily sheriffs have no clear escalation 
path if they are faced with a problem that they can't handle.
So, to resolve these two problems, I would suggest that we create a new 
super-sheriffs group, much like we have a super-reviewers group.  The 
set of people within this group would, as a group:
* have continuous responsibility for tests, crashes, etc. on the 
tinderbox, meaning that they would be on the hook for at least being 
aware of and tracking these areas;
* be the people who sheriff when the day's sheriff isn't around -- that 
is, add another step between the day's sheriff and #developers;
* give sheriffs a specific group of people whom they can contact when 
they have problems;
* have direct access to all tinderbox machines to be able to diagnose 
problems as they occur.
There are a number of people who have been performing a similar role 
already, but I think it would be helpful to formalize this and ensure 
that those people have the tools they need to do their jobs efficiently 
(specifically, direct access to the tinderboxes).
I think that this, along with creating a sheriff's newsgroup (do we have 
one already?) to better track discussions about both daily problems and 
ongoing ones, would help us get a handle on what's going on with the 
tree when problems arise.
- Vlad
This is an interesting idea, and seems to get to the heart of some of 
the concerns I've had with our current system too.  As you say, it works 
well enough when the system isn't stressed, but it falls down when 
things get tight.
I'm trying super-sheriffs on as an idea, and part of me wonders if 
really this is what the "sheriffs" group is supposed to be.  Are the 
sets really that distinct?  Because the Sheriffs are the group of people 
who have been tagged as "ought to be able to manage the tree, given 
their experience with/exposure to it, and dedication to its health" 
which is much of what you want in super-sheriffs.
I think the reasons you propose a distinct super-sheriffs group are:
1) Not all sheriffs actually do behave in the ways I describe - maybe 
they don't know how to sheriff, or don't want to, or don't even know 
that they are on the list?  The current list, after all, wasn't created 
by enlisting only interested and excited people.  It was closer to just 
an amalgam of the front-end and platform dev teams.
2) You're proposing additional access privileges, and it makes sense to 
limit that group to people who want and can make use of the authority.
3) You're proposing additional responsibilities, and it makes sense to 
limit that group to people who want and can honour those responsibilities.
Would it make sense, if that list is accurate, to consider instead just 
changing the sheriff list?  There are 6 weeks' worth of people in the 
calendar, and more than that CC'd on the bug. Is there a core of, say, 
15 people who all actually want the job?  If so, is it more valuable to 
have them be the only sheriffs and grow the role to enable the things 
you describe?
I'm not actually disagreeing with your proposal; I think there are 
several of us, as you say, who do pieces of this work already because we 
see it needs to be done.  And I think that kind of tiered structure, 
like we have with review, has worked well for us in terms of mentoring, 
socializing knowledge, and providing leadership at critical points.
I just want to make sure that this kind of redesign leaves us in a place 
with fewer problems and not with, say, a list of sheriffs which, having 
been pruned of its most interested sheriffs, is now even more populated 
with people who don't really understand or want their role.  Maybe those 
are separate problems though.  The kind of problems that a list of 
super-sheriffs could solve, say.
I'm becoming convinced.
> I think that this, along with creating a sheriff's newsgroup (do we have 
> one already?) to better track discussions about both daily problems and 
> ongoing ones, would help us get a handle on what's going on with the 
> tree when problems arise.
I think this is a great idea, almost regardless of the super-sheriffs 
question. If you haven't filed the bug yet, I will.  If you have, please 
cc me.
Cheers,
Johnathan
I agree that we should build out a list of super-sheriffs.
In addition to the benefits & responsibilities Vlad outlines...
> The set of people within this group would, as a group:
>
> * have continuous responsibility for tests, crashes, etc. on the  
> tinderbox, meaning that they would be on the hook for at least being  
> aware of and tracking these areas;
> * be the people who sheriff when the day's sheriff isn't around --  
> that is, add another step between the day's sheriff and #developers;
> * give sheriffs a specific group of people whom they can contact  
> when they have problems;
> * have direct access to all tinderbox machines to be able to  
> diagnose problems as they occur.
I think this also lets us improve the pool of sheriffs by having a  
group that can identify, mentor and build up new sheriffs.  Sheriffing  
is something that most people actively contributing code should be a  
part of - both to understand the larger ramifications of their coding  
work and to help keep the project, and the tree, healthy.
Super-sheriffs can identify new contributors and help integrate them  
into the pool.  Contrary to some of my musings last time, I think the  
pool of sheriffs should be GROWING, not shrinking, on balance.  But I  
think that can't happen if the barrier to entry remains as high as it  
is.  The work that Vlad outlines should also make day-to-day  
sheriffing an easier job, since a coordinated group like this should  
be taking on long-standing problems, such as reducing the number of  
random reds/oranges.
As Vlad mentions, we are already doing this.  Several of us try to  
bring up new sheriffs, try to understand & fix systematic problems,  
and try to keep the tree green.  Empowering those people to more  
effectively do that work in a more deliberate and direct way is a good  
thing.
Consider me sold,
Johnathan
---
Johnathan Nightingale
Human Shield
joh...@mozilla.com
There are a lot of details that need to be worked out before doing this. 
Build machines aren't LDAP controlled - it's a non-trivial thing to give 
a bunch of people access to them.
Additionally, we would need to audit some permissions to ensure that 
non-build folks do not have access to build ssh keys (which in turn 
would give them access to a number of critical systems).
Without discussing this with other RelEng folks I _think_ I'm okay with 
this in principle, but there's a fair amount of work to be done to make 
it possible.
- Ben
We already have too many people who don't really sheriff when they are 
listed as sheriff.  I'd hate to see that problem get worse.
/sdwilsh
> On 11/12/08 6:20 PM, Vladimir Vukicevic wrote:
>> * be the people who sheriff when the day's sheriff isn't around --
>> that is, add another step between the day's sheriff and #developers;
> My biggest problem with this is that it gives another incentive for  
> the sheriff to not be around.  Right now the incentive is something  
> like "hopefully people will be responsible, and hopefully nobody  
> will realize I'm not around".  With the set of super-sheriffs it  
> becomes "it's OK if I'm not around - one of the super-sheriffs will  
> have to step up".
>
> We already have too many people who don't really sheriff when they  
> are listed as sheriff.  I'd hate to see that problem get worse.
On the other hand, it also means that there is a group of people who  
can easily notice when that happens, and suggest that perhaps someone  
shouldn't actually be on the sheriff list.
On the gripping hand[1], it's important that everyone who has commit  
access understands the cost of pushing changes to the tree. I think  
that by making every committer responsible for a day (or even a few  
hours within a day) of checking against performance, regressions and  
test failures, we'll end up with a better set of committers. In  
parallel we can continue to invest in tools that reduce the pain of  
being a sheriff (many have long been whispered of in the halls of the  
sheriff: clearer indication of performance regressions, reinstatement  
of the blame column, a cleaner tinderbox layout) but in the main we  
need to make these activities more familiar to every committer, so the  
burden of being a sheriff is reduced.
cheers,
mike
[1]: http://en.wikipedia.org/wiki/The_Mote_in_God%27s_Eye - read it.
> But if there are serious problems -- machine instability, intermittent
> test failures, or tinderbox-only crashes to name a few -- the current
> system fails for two main reasons: there is no continuity as the sheriff
> changes from day-to-day, and the daily sheriffs have no clear escalation
> path if they are faced with a problem that they can't handle.
I think one additional problem with the current sheriff scheme is that 
it's such a short and infrequent shift. I've often felt like I'm 
relearning the basics each time, because either I forget things 
(documentation? ha!) or things have changed. That makes it a 
frustrating, inefficient experience.
Fixing that would probably require longer and/or more frequent sheriff 
duties. Say, a week at a time every other month. (Don't everyone 
volunteer at once!) Maybe it would help to have "sheriff and a deputy" 
-- the sheriff being on a longer duty cycle, and the deputy rotating 
daily. Sheriffs would become more efficient, gain experience faster, and 
be able to tackle longer-term projects. Deputies would be able to learn 
from someone experienced, and lend a hand on busy days (which seem to be 
increasingly common).
(This could all be in addition to super-sheriffs, or an alternative.)
> * have direct access to all tinderbox machines to be able to diagnose
> problems as they occur.
+1, this would really be useful. Even if it's just a handful of people 
due to access concerns, or if there's a way to clone troublesome 
tinderboxes for developer experimentation.
Justin
On 13/11/08 02:20, Vladimir Vukicevic wrote:
>
> Over the past few weeks (or months), it's become increasingly obvious
> that the current sheriff structure really doesn't work when there are
> serious problems that need to be dealt with. Things function fine if the
> sheriff serves as a watchdog for noticing performance regressions
> (though this should be everyone's responsibility, and we need tools to
> make this easier), if he/she's just watching to make sure we don't have
> too many checkins at once, or other similar 'maintenance' tasks.
>
> But if there are serious problems -- machine instability, intermittent
> test failures, or tinderbox-only crashes to name a few -- the current
> system fails for two main reasons: there is no continuity as the sheriff
> changes from day-to-day, and the daily sheriffs have no clear escalation
> path if they are faced with a problem that they can't handle.
>
> So, to resolve these two problems, I would suggest that we create a new
> super-sheriffs group, much like we have a super-reviewers group. The set
> of people within this group would, as a group:
>
> * have continuous responsibility for tests, crashes, etc. on the
> tinderbox, meaning that they would be on the hook for at least being
> aware of and tracking these areas;
I'm not entirely sure what you are suggesting the super-sheriff does 
here. Right now we kind of have a system where whoever spots a 
regression tends to end up having to try to work out what caused it, 
however long that takes. This can suck for that person, it certainly 
makes me wonder why I check perf graphs sometimes, but it has the 
benefit that at least that person has all the info on the issue.
Are you suggesting that instead whoever finds the regression merely 
hands off to a super sheriff?
> * be the people who sheriff when the day's sheriff isn't around -- that
> is, add another step between the day's sheriff and #developers;
This is, I think, good for spotting people who aren't sheriffing regularly 
and either kicking them or replacing them. I wonder though if you have 
considered the timezone problem. For a long time now the sheriffing has 
been focused on PST work hours. That is understandable, but it causes 
problems: people who are on the sheriff list can't actually work those 
hours, and there is commonly no sheriff outside of those hours.
> * give sheriffs a specific group of people whom they can contact when
> they have problems;
>
> * have direct access to all tinderbox machines to be able to diagnose
> problems as they occur.
I can see this being useful, but I'm not sure we need a super-sheriff 
group for it specifically. As it happens I've found merely having access 
to the buildbot waterfall to be seriously helpful when sheriffing.
>
> There are a number of people who have been performing a similar role
> already, but I think it would be helpful to formalize this and ensure
> that those people have the tools they need to do their jobs efficiently
> (specifically, direct access to the tinderboxes).
>
> I think that this, along with creating a sheriff's newsgroup (do we have
> one already?) to better track discussions about both daily problems and
> ongoing ones, would help us get a handle on what's going on with the
> tree when problems arise.
This I certainly agree with. I had thought about a wiki page or 
something to track things. But a newsgroup where a thread is created for 
perf regressions would be very useful. Equally useful would be a good 
way to hand off between sheriffs. It's pretty common that I get up in 
the morning to see confusing messages on the tinderbox and in bug 
reports and so I have to guess whether it is safe to check in or not. A 
simple message at the end of the day by a sheriff to say "this is the 
state of the tree, this is what to look for and this is when we can 
start checking in again" would be pretty helpful.
I have since filed the bug for the newsgroup (464609), which Gerv is 
ready to create.  Gerv suggested mozilla.dev.planning.sheriff, which is 
mostly fine, however I think that since this is a newsgroup for 
sheriffs, RelEng, IT and devs to come *together* on issues of tree 
management, it could stand a slightly broader name.
Before I go on, let's everybody take a firm grip on your bikeshedding 
instincts. :) Now, is there anything show-stoppingly-wrong with:
mozilla.dev.planning.treemanagement
If I don't hear of any such showstoppers in the next few days, I'll ask 
Gerv to create it as named.
Cheers,
Johnathan
Not bikeshedding, I think, but do we have a good consensus on what the 
scope of discussion in the newsgroup should be? I would expect the 
following:
* Threads about investigations into performance/test regressions
* Threads about planned tree outages
* Threads about changes to the tree rules
I'm positive I'm missing stuff, but I could also be way off base.
I would expect this newsgroup to be about tools and techniques for 
sheriffing, schedules for sheriffing, discussion of random test failures and 
slow build boxes, etc.
cheers,
mike
_______________________________________________
dev-planning mailing list
dev-pl...@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-planning
Thoughts after reading this thread:
1) Currently, we only have developers as sheriffs. However, intermittent
problems can be caused by bugs in the shipping product code, the test or
the RelEng infrastructure. What if QA and RelEng scheduled people to be
available, as advisors to those Developer sheriffs? That way,
Dev+QA+RelEng could immediately debug all aspects of a problem
*together*? Not sure how we'd figure out the timezones, and I think we
don't have enough people to do 24x7, but even if it's only during some
certain hours of the day, the idea of a joint group that can immediately
investigate all parts of the problem space seems useful.
2) I like the idea of longer "sheriff" shifts (a week, not a day) as it
will allow people time to re-learn, be productive and get into the
rhythm of it.
3) Doing "sheriff + deputy" for the week seems like a good way to have
cover in case the sheriff has to attend a meeting / take a day off during
the week... and it's a great way to help train up new sheriffs.
4) Personally, I'd be in favour of refining membership of the existing
sheriff group, and not bother creating "super-sheriffs"; it seems less
confusing to me.
5) Why do we need a different sheriff/treemanagement newsgroup?
Personally, the existing dev.planning newsgroup seems like the right
venue for all this, imho.
Just my $0.02!
tc
John.
=====
What we really seem to be missing is someone from RelEng on a European 
timeshift, esp. between about 12:00 and 18:00 UTC.
I really like the idea of having on-call people from other areas to 
support the sheriff, so the sheriff knows whom to ask if e.g. the red is 
a machine issue.
Robert Kaiser
> 1) Currently, we only have developers as sheriffs. However, intermittent
> problems can be caused by bugs in the shipping product code, the test or
> the RelEng infrastructure. What if QA and RelEng scheduled people to be
> available, as advisors to those Developer sheriffs? That way,
> Dev+QA+RelEng could immediately debug all aspects of a problem
> *together*? Not sure how we'd figure out the timezones, and I think we
> dont have enough people to do 24x7, but even if its only during some
> certain hours of the day, the idea of a joint group that can immediately
> investigate all parts of the problem space seems useful.
Agreed, this is required. So far I think we've made do by  
cross-communicating on IRC, usually pulling nthomas or bhearsum in for  
questions and assistance. A lot of the time people are told  
to file bugs (which is fine) but knowing that there's someone "on  
call" to deal with those issues in a timely fashion is critical and  
would reduce frustration.
> 2) I like the idea of longer "sheriff" shifts (a week, not a day) as it
> will allow people time to re-learn, be productive and get into the
> rhythm of it.
I don't recall seeing this suggestion, aside from the idea that the  
"super sheriff" (or if you will, sheriff with daily deputies) will  
have a lesser responsibility towards monitoring each checkin and a  
greater one towards helping that day's sheriff (or if you will,  
deputy) should odd problems occur.
I'm hesitant to increase the time commitment of super sheriffs, as  
there's a good degree of overlap between them and our senior code  
reviewing community.
> 3) Doing "sheriff + deputy" for the week seems like a good way to have
> cover in case sheriff have to attend meeting / take day off during the
> week... and its a great way to help train up new sheriffs.
Absolutely agreed. This seems like a good way to institute the model  
you posit above.
> 4) Personally, I'd be in favour of refining membership of the existing
> sheriff group, and not bother creating "super-sheriffs"; it seems less
> confusing to me.
As long as the idea is that they get a group of deputies, this makes  
sense. While I originally felt the same way (just boot the sheriffs  
who don't seem to care as much) I think that it's a necessary part of  
our community process and should be a requirement for code committers  
as a way of enforcing good behaviours and habits.
> 5) Why do we need a different sheriff/treemanagement newsgroup?
> Personally, the existing dev.planning newsgroup seems like the right
> venue for all this, imho.
Right now we're only discussing about 50% of what we should be  
discussing in newsgroups; the rest is on IRC, wiki pages and group  
knowledge. I don't think that dev.planning is the right place to  
discuss daily operational content as opposed to an area for  
project-wide announcements that affect planning and how we work (like this  
thread). The types of things I'd expect to see in a sheriffing  
newsgroup are discussions of "Hey, there's a random orange here, what  
could be causing it?" with the group collaborating to debug it; that  
would be a little much for dev-planning.
cheers,
mike
I love this idea. Having dedicated time where I am to *expect* 
interrupts makes me much more amenable to them. Personally, I'd like to 
see these shifts be a week long. This way, each person would only have 
one once every 6 weeks or so - giving folks a lot of time to *ignore* 
interrupts and focus on their day to day work.
> 5) Why do we need a different sheriff/treemanagement newsgroup?
> Personally, the existing dev.planning newsgroup seems like the right
> venue for all this, imho.
>
I agree with Mike and others here. dev.planning is a project wide 
newsgroup and sheriff stuff isn't relevant to most people outside of 
Engineering. A dev.whatever group would be *great* for tracking things 
like this and would hopefully reduce the occurrence of "I don't know 
where to file this, post this, or note this, so I won't do anything" 
incidents.
A couple more things on this:
If we implement this I think we should hold off on giving sheriffs 
direct tinderbox access. If there is a defined point of contact for 
RelEng I think it will greatly reduce the need for sheriffs to step in. 
I'm totally open to revisiting this later, though.
And to be clear, I don't think a RelEng person needs to monitor the tree 
the same way the sheriff does. They shouldn't spend their day *actively* 
watching for problems. However, they should be hanging out in 
#developers, watching the newsgroup, and watching for incoming bugs - so 
they can respond in a timely manner to them.
Mike Beltzner wrote:
> On 17-Nov-08, at 3:55 AM, John O'Duinn wrote:
>> 2) I like the idea of longer "sheriff" shifts (a week, not a day) as it
>> will allow people time to re-learn, be productive and get into the
>> rhythm of it.
> 
> I don't recall seeing this suggestion, aside from the idea that the
> "super sheriff" (or if you will, sheriff with daily deputies) will have
> a lesser responsibility towards monitoring each checkin and a greater
> one towards helping that day's sheriff (or if you will, deputy) should
> odd problems occur.
See Justin Dolske's post on this thread at 11/13/08 9:08 PM.
> I'm hesitant to increase the time commitment of super sheriffs, as
> there's a good degree of overlap between them and our senior code
> reviewing community.
Well, by doing longer "sheriff shifts", each shift is longer, but your
next shift is further in the future. Not sure if the actual overall time
commitment changes...
tc
John.
> hi;
>
> Thoughts after reading this thread:
>
> 1) ...What if QA and RelEng scheduled people to be
> available, as advisors to those Developer sheriffs? That way,
> Dev+QA+RelEng could immediately debug all aspects of a problem
> *together*?
I think this would be useful.  I don't know if it obviates the need  
for some sheriffs to have box access, I guess it comes down to  
availability of resources - if sheriffs can get the equivalent of  
hands-on control when they need to, for instance, attach a debugger to  
a test-failing process, I think that's a real improvement.
> 2) I like the idea of longer "sheriff" shifts (a week, not a day) as it
> will allow people time to re-learn, be productive and get into the
> rhythm of it.
> 3) Doing "sheriff + deputy" for the week seems like a good way to have
> cover in case sheriff have to attend meeting / take day off during the
> week... and its a great way to help train up new sheriffs.
I am not keen to do this - I think it's too much to put on every  
sheriff, and I don't think it should be necessary.  I'd really rather  
see us reduce the amount of "re-learning" sheriffs need, by having a  
consolidated group of people tracking down and eliminating recurrent  
"fake oranges" and box problems.  I don't think the time-calculus  
works out in reality the way it might on paper - I think it's  
substantially harder for most people to commit to a whole week of  
reduced productivity elsewhere in their jobs.
> 4) Personally, I'd be in favour of refining membership of the existing
> sheriff group, and not bother creating "super-sheriffs"; it seems less
> confusing to me.
I'm not sure how confusing this would be to developer-sheriffs who are  
already accustomed to the review/superreview process, and the peer/ 
module owner structure.  In any event though, there are long-standing  
problems with the current organization - there are well-documented  
failures, particularly during crunch periods.  I think that if we have  
organizational belief that a super-sheriff structure would help track  
those down, the cost of change-confusion is probably worth it.
> 5) Why do we need a different sheriff/treemanagement newsgroup?
> Personally, the existing dev.planning newsgroup seems like the right
> venue for all this, imho.
Oh this is the easiest part of it for me.  In fine Mozilla tradition,  
that's just recognizing something that's already happening.  We have  
been playing various ad hoc games in the meantime - email threads that  
get most-but-not-all interested parties, or IRC conversations that are  
(perforce) timezone-specific.  If it falls into disuse, fine, but I  
suspect it's a coordination point we've needed for a while.
Cheers,
Having one releng person on call is nice, but looking at the daily habit 
of Ted and Mossop, we need more people on different timezones empowered 
to fix the tree.
If it's not feasible to have them have access to the actual machines, we 
need to re-establish methods to clobber (TODO: define what that is for 
multiple slaves) and kick builds.
I'm still working on "ignorance is bliss" when it comes down to 
sheriffing, but the "European tree" is in an utterly sad state. I 
actually don't remember landing stuff with a good feeling about what the 
tree was up to for quite a while. Luckily, most of my landings these 
days are just changes to all-locales or shipped-locales, so I don't 
really bother.
Axel
Ben Hearsum wrote:
> On 11/17/08 10:39 AM, Ben Hearsum wrote:
>> On 11/17/08 3:55 AM, John O'Duinn wrote:
>>> hi;
>>>
>>> Thoughts after reading this thread:
>>>
>>> 1) Currently, we only have developers as sheriffs. However, intermittent
>>> problems can be caused by bugs in the shipping product code, the test or
>>> the RelEng infrastructure. What if QA and RelEng scheduled people to be
>>> available, as advisors to those Developer sheriffs? That way,
>>> Dev+QA+RelEng could immediately debug all aspects of a problem
>>> *together*? Not sure how we'd figure out the timezones, and I think we
>>> dont have enough people to do 24x7, but even if its only during some
>>> certain hours of the day, the idea of a joint group that can immediately
>>> investigate all parts of the problem space seems useful.
>>>
>>
>> I love this idea. Having dedicated time where I am to *expect*
>> interrupts makes me much more amiable to them. Personally, I'd like to
>> see these shifts be a week long. This way, each person would only have
>> one once every 6 weeks or so - giving folks a lot of time to *ignore*
>> interrupts and focus on their day to day work.
Cool. :-)
> A couple more things on this:
> If we implement this I think we should hold off an giving sheriffs
> direct tinderbox access. If there is a defined point of contact for
> Releng I think it will greatly reduce the need for sheriffs to step in.
> I'm totally open to revisiting this later, though.
Agreed. If there's a RelEng contact/advisor available, they can look
into machine issues on the spot, without worries of an accidental change
messing up the machine.
> And to be clear, I don't think a RelEng person needs to monitor the tree
> the same way the sheriff does. They shouldn't spend their day *actively*
> watching for problems. However, they should be hanging out in
> #developers, watching the newgroup, and watching for incoming bugs - so
> they can respond in a timely manner to them.
Yes, that's why I suggested "advisors to those Developer sheriffs". I'm
not suggesting that only one person be sheriff, with sometimes QA,
RelEng being that solo sheriff coordinating landings, etc. I'm
suggesting that the two Developers who are sheriff have a QA person and
RelEng person available to help investigate problems that arise.
tc
John.
Yeah - I think the key point is that only people who are responsible for
tree management should need to add that group to their reading list.
"Normal" developers should see announcements of important tree
management events elsewhere.
Gerv
Yes, this would be a big help.  Bugs are useful for tracking, but 
if a bug has to be filed to fix something that's blocking people 
from committing, people can usually forget about getting anything 
checked in for the next few hours.  We need a faster way to fix these 
issues.
>> 2) I like the idea of longer "sheriff" shifts (a week, not a day) as it
>> will allow people time to re-learn, be productive and get into the
>> rhythm of it.
>
> I don't recall seeing this suggestion, aside from the idea that the
> "super sheriff" (or if you will, sheriff with daily deputies) will have
> a lesser responsibility towards monitoring each checkin and a greater
> one towards helping that day's sheriff (or if you will, deputy) should
> odd problems occur.
Yes, that was the original idea -- I am very much against increasing the 
time commitment for active sheriffing; it's a huge drain and effort, and 
I think I would go crazy doing it for a week straight.
>> 3) Doing "sheriff + deputy" for the week seems like a good way to have
>> cover in case sheriff have to attend meeting / take day off during the
>> week... and its a great way to help train up new sheriffs.
>
> Absolutely agreed. This seems like a good way to institute the model you
> posit above.
Well, no -- sheriff + deputy for the week implies, well, being sheriff 
for a week.  Continuing sheriff duties as normal but creating a group 
that's a safety net for the daily sheriff is what I had in mind -- that 
is, I didn't want to provide an opportunity for sheriffs to not do their 
daily duties, but more to have someone to turn to if they need help, and 
for a group to be aware of longer-standing problems.  Hence the idea for 
a separate group, because I'm not proposing any changes to existing 
sheriffs, just additional structure around them.
>> 5) Why do we need a different sheriff/treemanagement newsgroup?
>> Personally, the existing dev.planning newsgroup seems like the right
>> venue for all this, imho.
>
> Right now we're only discussing about 50% of what we should be
> discussing in newsgroups; the rest is on IRC, wiki pages and group
> knowledge. I don't think that dev.planning is the right place to discuss
> daily operational content as opposed to an area for project-wide
> announcements that affect planning and how we work (like this thread).
> The types of things I'd expect to see in a sheriffing newsgroup are
> discussions of "Hey, there's a random orange here, what could be causing
> it?" with the group collaborating to debug it; that would be a little
> much for dev-planning.
Yep, I think this was discussed in other posts, but that's what I think 
as well.  m.d.planning is for planning, not for discussion of details of 
the tree.
     - Vlad
Yes, this would be a big help.  Bugs are useful for tracking, but 
usually if a bug has to be filed to fix something that's blocking people 
from committing, people can usually forget about getting anything 
checked in for the next few hours.  We need a faster way to fix these 
issues.
On 11/17/08 9:39 AM, Johnathan Nightingale wrote:
> On 17-Nov-08, at 3:55 AM, John O'Duinn wrote:
>
>> hi;
>>
>> Thoughts after reading this thread:
>>
>> 1) ...What if QA and RelEng scheduled people to be
>> available, as advisors to those Developer sheriffs? That way,
>> Dev+QA+RelEng could immediately debug all aspects of a problem
>> *together*?
>
> I think this would be useful. I don't know if it obviates the need for
> some sheriffs to have box access, I guess it comes down to availability
> of resources - if sheriffs can get the equivalent of hands-on control
> when they need to, for instance, attach a debugger to a test-failing
> process, I think that's a real improvement.
Sure, though "equivalent of hands-on control" is not the same as 
hands-on control.  Trying to remotely debug with someone copy-pasting 
(or typing back responses, since copy-pasting from some of these 
machines isn't easy) is a huge pain and really unnecessary.  Having 
someone around to do things like reboot machines, check 
network/diskspace/etc. would be very helpful though, because then the 
sheriff can work with that person instead of necessarily always going to 
the supersheriff.
>> 4) Personally, I'd be in favour of refining membership of the existing
>> sheriff group, and not bother creating "super-sheriffs"; it seems less
>> confusing to me.
>
> I'm not sure how confusing this would be to developer-sheriffs who are
> already accustomed to the review/superreview process, and the
> peer/module owner structure. In any event though, there are
> long-standing problems with the current organization - there are
> well-documented failures, particularly during crunch periods. I think
> that if we have organizational belief that a super-sheriff structure
> would help track those down, the cost of change-confusion is probably
> worth it.
I'm not sure what's confusing, really -- there's a set of sheriffs and 
then there's a set of super-sheriffs who are empowered to do additional 
things that the sheriffs normally aren't, and who are also charged with 
day-to-day continuity of the status of the tree.
>> 5) Why do we need a different sheriff/treemanagement newsgroup?
>> Personally, the existing dev.planning newsgroup seems like the right
>> venue for all this, imho.
>
> Oh this is the easiest part of it for me. In fine Mozilla tradition,
> that's just recognizing something that's already happening. We have been
> playing various ad hoc games in the meantime - email threads that get
> most-but-not-all interested parties, or IRC conversations that are
> (perforce) timezone-specific. If it falls into disuse, fine, but I
> suspect it's a coordination point we've needed for a while.
Ok -- as per bug 464609, mozilla.dev.tree-management should be showing 
up sometime this week which should help.  (dev.planning is not the right 
place for this, as others have said; talking about why test X is failing 
is not in the same scope as planning a schedule for the next release).
     - Vlad