> I'm managing a team of sysadmins who try to work on projects on a 2
> week scrum cycle.
>
> They're very frequently interrupted by various teams, nagios alerts,
> and general day-to-day issues. This results in their velocity being
> very hard to calculate, and them frequently failing to deliver in
> their sprints.
>
> I have a couple of ideas:
>
> 1) Divide the time available to the team in two - so say that roughly
> half of the time available will be spent in interrupt-driven
> activities, so we should only expect a limited amount of time to work
> on projects. As long as interrupts don't exceed 50% of the time, we
> should be able to perform reasonably consistently.
This is what I've seen in a few places and what we do here. We have
interruptible people who respond to small tasks/tickets and site
issues, and people working on project/story cards, and we rotate the
roles around the team. At the moment we do that on a two week cycle,
but I've worked at other places where there was an exposed dev/systems
pair who dealt with triage and proxied customer questions/issues.

We also have a mechanism to feed in issues we see as tech debt, so
that we continually improve the system; when we have capacity, people
can work on those.

Do you have any technical practices to support your systems team's
ability to deliver (config management, automated infrastructure, etc.),
and non-prod environments for them to work in? If you're just holding
daily standups and two week iterations/sprints without making change
safe and easy, then it's going to be hard work.
> I'm not convinced (2) works - sysadmin interrupts tend to be highly
> urgent, so perhaps the best approach is just to set our expectations
> fairly low and measure velocity. The trouble is that as a team we can
> rarely predict how much interruption there will be, so our velocity
> will be all over the place.
The problem with treating them as bugs is that the bugs are often
already in production at that stage. Both the cost of change and the
urgency to change are then high, so you're unlikely to see much in
the way of continual systems improvement.
> I've read good things about Kanban, but have no experience - can
> anyone share? It seems to fit the sysadmin model rather well, so
> perhaps it's worth a try?
With a lean approach you'd have a few options: you could "stop the
line" in case of catastrophic failure, and otherwise deal with issues
as they come in. The other key thing is flow: you know you have
capacity for a certain number of things in flight, and everything
gets pulled. Issues are just instantly prioritized bits of work. I've
not seen it in practice for a systems team, though.
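
To make the pull idea concrete, here is a minimal sketch in Python of a board with a work-in-progress limit. The class name, the priority numbers, and the example task titles are all made up for illustration; this is not a description of any real Kanban tool.

```python
import heapq
import itertools


class KanbanBoard:
    """Toy pull board: work is only started while under the WIP limit."""

    def __init__(self, wip_limit):
        self.wip_limit = wip_limit
        self.in_progress = []
        self._backlog = []                 # heap of (priority, order, title)
        self._order = itertools.count()    # tie-breaker: first in, first out

    def add(self, title, priority):
        # Lower number means more urgent; an incident is just priority 0.
        heapq.heappush(self._backlog, (priority, next(self._order), title))

    def pull(self):
        # Refuse to start new work when at capacity or when nothing is queued.
        if len(self.in_progress) >= self.wip_limit or not self._backlog:
            return None
        _, _, title = heapq.heappop(self._backlog)
        self.in_progress.append(title)
        return title

    def finish(self, title):
        self.in_progress.remove(title)


board = KanbanBoard(wip_limit=2)
board.add("Automate user account creation", priority=5)
board.add("Upgrade monitoring", priority=5)
board.add("Disk full on db host", priority=0)   # urgent issue arrives last

first = board.pull()    # the urgent issue jumps the queue
second = board.pull()   # oldest planned item
third = board.pull()    # None: WIP limit reached, nothing new starts
```

The point of the limit is that nothing new starts until something in flight is finished, so an urgent issue displaces planned work in the queue rather than piling on top of it.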
Paul
I've been lurking on this group for some time, but now I feel that I
might have something to contribute, so here goes.
On Wed, Oct 14, 2009 at 11:31 AM, Stephen
<atalanta...@googlemail.com> wrote:
> 1) Divide the time available to the team in two - so say that roughly
> half of the time available will be spent in interrupt-driven
> activities, so we should only expect a limited amount of time to work
> on projects. As long as interrupts don't exceed 50% of the time, we
> should be able to perform reasonably consistently.
This sounds like a reasonable approach. We have a team of four
sysadmins and in our two week sprints we roughly divide the time in
half. One half for interruption driven work and the other half for
"project" work. That means work which is planned and takes longer to
complete. The interruption work is usually short tasks such as
creating user accounts etc.
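
As a rough illustration of that split, the arithmetic can be sketched in Python. The function name is made up, and the figure of 10 working days per two week sprint is an assumption; the team size and the 50% reservation come from the description above.

```python
def project_capacity(team_size, sprint_days, interrupt_fraction):
    """Person-days left for planned work after reserving interrupt time."""
    if not 0 <= interrupt_fraction <= 1:
        raise ValueError("interrupt_fraction must be between 0 and 1")
    total_person_days = team_size * sprint_days
    return total_person_days * (1 - interrupt_fraction)


# Four sysadmins, an assumed 10 working days per sprint, half reserved.
capacity = project_capacity(team_size=4, sprint_days=10, interrupt_fraction=0.5)
print(capacity)  # 20.0
```

Planning against the 20 person-days rather than the full 40 is what keeps the sprint commitment realistic.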
We try to manage the interruptions by agreeing every morning who will
be the primary handler for them. This can be one person or several.
The idea is to give the people who work on long tasks the peace they
need so they can get their flow on.
> I'm not convinced (2) works - sysadmin interrupts tend to be highly
> urgent, so perhaps the best approach is just to set our expectations
This is usually how it goes. The interruptions need to be taken care
of immediately most of the time and you just can't help it. Better to
assign time for this in advance.
Just remember that the most important thing is to shape the process so
that it fits your needs. Remove the things that don't work and replace
them with something better.
The scrum that we use has been evolving for about two years now, and
basically we have picked the scrum practices that best fit sysadmin
work and removed the rest. I can say we have changed or removed quite
a lot and the system we now use is sometimes referred to (by us) as
"scam" instead of scrum because it has so little to do with scrum
anymore... :D
--
Joni Huuhtanen
Sysadmin
Reaktor (the one in Finland)
>
> 2009/10/14 Stephen <atalanta...@googlemail.com>:
> Hey
>
>> I'm managing a team of sysadmins who try to work on projects on a 2
>> week scrum cycle.
>>
>> They're very frequently interrupted by various teams, nagios alerts,
>> and general day-to-day issues. This results in their velocity being
>> very hard to calculate, and them frequently failing to deliver in
>> their sprints.
>>
>> I have a couple of ideas:
>>
>> 1) Divide the time available to the team in two - so say that roughly
>> half of the time available will be spent in interrupt-driven
>> activities, so we should only expect a limited amount of time to work
>> on projects. As long as interrupts don't exceed 50% of the time, we
>> should be able to perform reasonably consistently.
>
> This is what I've seen in a few places and what we do here. We have
> interruptible people to respond to small tasks/tickets and site
> issues, and people working on project/story cards, rotating the roles
> around the team. At the moment we do that on a two week cycle, but
> I've worked other places where there was an exposed dev/systems pair
> who dealt with triage/proxy customer questions/issues.
We had roughly the same setup. It worked for us.
>
> We also have a mechanism to feed in issues we see as tech debt to try
> and continually improve the system and when we have capacity people
> can work on that.
That's one of the most important parts of the whole story. You'll only
increase your velocity over time if you constantly do root cause
analysis of urgent issues and turn the fixes into stories to be worked
on. We've seen our team go from constant fire fighting mode to a very
stable and predictable work mode within a couple of months, purely by
making sure that we take the time to fix root causes!
>
> Do you have any technical practices to support your systems team's
> ability to deliver (config management, automated infrastructure, etc)
> and non-prod environments for them to work in? If you're just having
> the daily standups and two week iterations/sprints without making
> change safe and easy then it's going to be hard work.
If you have none of the above, then at least a non-prod environment
(close enough to the real prod env) for the team to work in is a must.
E.g. if you run a cluster of 5 servers in production, you should have
at least a two node cluster (same OS, same installed software, same
config where possible) as a test environment for your sysadmins.
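
One way to keep an eye on "close enough" is to compare a few facts per node. Below is a minimal Python sketch; the function name, the dictionary keys, and the sample values are all hypothetical, and in practice the facts would come from whatever inventory or config management tool you use.

```python
def parity_gaps(prod_node, test_node, keys=("os", "packages", "config_checksum")):
    """Return {key: (prod_value, test_value)} for every key that differs."""
    return {
        key: (prod_node.get(key), test_node.get(key))
        for key in keys
        if prod_node.get(key) != test_node.get(key)
    }


# Hypothetical inventory data, for illustration only.
prod = {"os": "example-os 1.0", "packages": {"webserver": "1.2"}, "config_checksum": "aaa"}
test = {"os": "example-os 1.0", "packages": {"webserver": "1.1"}, "config_checksum": "aaa"}

gaps = parity_gaps(prod, test)
for key, (prod_value, test_value) in gaps.items():
    print(f"{key}: prod={prod_value!r} test={test_value!r}")
```

An empty result means the test node matches production on the checked facts; anything listed is a difference worth either fixing or consciously accepting.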
>
>> I'm not convinced (2) works - sysadmin interrupts tend to be highly
>> urgent, so perhaps the best approach is just to set our expectations
>> fairly low and measure velocity. The trouble is that as a team we can
>> rarely predict how much interruption there will be, so our velocity
>> will be all over the place.
>
> The problem with treating them as bugs is that often the bugs are in
> production already at that stage and thus the cost of change is high
> and the urgency to change is high, meaning you're unlikely to see much
> in the way of driving continual systems improvement.
In your current mode of working, the variation in your velocity will
not change. You have to change the way you work (by addressing root
causes) to be able to change anything here. Mary Poppendieck's upcoming
book "Leading Lean Software Development" might be a good start on this
way of thinking (http://my.safaribooksonline.com/9780321699633).
Matthias
http://www.agileweboperations.com
http://twitter.com/mmarschall