Sysadmin Velocity

Stephen

Oct 14, 2009, 4:31:07 AM
to Agile System Administration
Hi,

I'm managing a team of sysadmins who try to work on projects on a 2
week scrum cycle.

They're very frequently interrupted by various teams, nagios alerts,
and general day-to-day issues. This results in their velocity being
very hard to calculate, and them frequently failing to deliver in
their sprints.

I have a couple of ideas:

1) Divide the time available to the team in two - so say that roughly
half of the time available will be spent in interrupt-driven
activities, so we should only expect a limited amount of time to work
on projects. As long as interrupts don't exceed 50% of the time, we
should be able to perform reasonably consistently.

2) Treat sysadmin interrupts as bugs - in a developer scrum we might
include a dozen bugs in the backlog, each of which would be
estimated. If a new bug comes up, it has to be assessed against the
bugs in the current sprint, and swapped out for a bug of equivalent
value, if it is considered more important.

I'm not convinced (2) works - sysadmin interrupts tend to be highly
urgent, so perhaps the best approach is just to set our expectations
fairly low and measure velocity. The trouble is that as a team we can
rarely predict how much interruption there will be, so our velocity
will be all over the place.

I appreciate that part of my job (and part of the point of applying
agile principles to sysadmin work) is to reduce the amount of chaos, and
get to the point where systems are more predictable, however, on that
journey, what is the best way to manage velocity?

I've read good things about Kanban, but have no experience - can
anyone share? It seems to fit the sysadmin model rather well, so
perhaps it's worth a try?

Thanks,

S.

Paul Nasrat

Oct 14, 2009, 5:26:49 AM
to agile-system-...@googlegroups.com
2009/10/14 Stephen <atalanta...@googlemail.com>:
Hey

> I'm managing a team of sysadmins who try to work on projects on a 2
> week scrum cycle.
>
> They're very frequently interrupted by various teams, nagios alerts,
> and general day-to-day issues.  This results in their velocity being
> very hard to calculate, and them frequently failing to deliver in
> their sprints.
>
> I have a couple of ideas:
>
> 1) Divide the time available to the team in two - so say that roughly
> half of the time available will be spent in interrupt-driven
> activities, so we should only expect a limited amount of time to work
> on projects.  As long as interrupts don't exceed 50% of the time, we
> should be able to perform reasonably consistently.

This is what I've seen in a few places and what we do here. We have
interruptible people to respond to small tasks/tickets and site
issues, and people working on project/story cards, rotating the roles
around the team. At the moment we do that on a two week cycle, but
I've worked other places where there was an exposed dev/systems pair
who dealt with triage/proxy customer questions/issues.
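
A rough sketch of the rotation described above, purely for illustration. The team names and the helper `rotation_schedule` are hypothetical; the only things taken from the description are that some people are interruptible, the rest do project work, and the roles rotate each cycle.

```python
def rotation_schedule(team, sprints, interruptible=1):
    """Return, per sprint, who takes interrupts and who does project work.

    The interruptible role moves through the team in list order, so that
    everybody takes a turn before anyone repeats.
    """
    schedule = []
    n = len(team)
    for s in range(sprints):
        start = (s * interruptible) % n
        on_interrupts = [team[(start + i) % n] for i in range(interruptible)]
        on_projects = [p for p in team if p not in on_interrupts]
        schedule.append(
            {"sprint": s + 1, "interrupts": on_interrupts, "projects": on_projects}
        )
    return schedule


team = ["alice", "bob", "carol", "dave"]  # hypothetical names
schedule = rotation_schedule(team, sprints=4, interruptible=1)
for row in schedule:
    print(row)
```

With `interruptible=2` the same function would give an exposed pair per cycle instead of a single person.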

We also have a mechanism to feed in issues we see as tech debt to try
and continually improve the system and when we have capacity people
can work on that.

Do you have any technical practices to support your systems team's
ability to deliver (config management, automated infrastructure, etc)
and non-prod environments for them to work in? If you're just having
the daily standups and two week iterations/sprints without making
change safe and easy then it's going to be hard work.

> I'm not convinced (2) works - sysadmin interrupts tend to be highly
> urgent, so perhaps the best approach is just to set our expectations
> fairly low and measure velocity.  The trouble is that as a team we can
> rarely predict how much interruption there will be, so our velocity
> will be all over the place.

The problem with treating them as bugs is that often the bugs are in
production already at that stage and thus the cost of change is high
and the urgency to change is high, meaning you're unlikely to see much
in the way of driving continual systems improvement.

> I've read good things about Kanban, but have no experience - can
> anyone share?  It seems to fit the sysadmin model rather well, so
> perhaps it's worth a try?

With a lean approach you'd have a few things available: you could
"stop the line" in case of catastrophic failure, or when issues come
in. The other key thing is flow: you know you have capacity for a
certain number of things in flight, and everything gets pulled. Issues
are just instantly prioritized bits of work. I've not seen it in
practice for a systems team, though.
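
To make the "capacity in flight, everything gets pulled" idea concrete, here is a minimal sketch. It is an assumption-laden toy, not a description of any real tool: the `Board` class, the card names and the WIP limit of 2 are all made up, and "stop the line" is not modelled at all.

```python
from collections import deque


class Board:
    """Toy pull board: a fixed WIP limit, urgent cards are pulled first."""

    def __init__(self, wip_limit):
        self.wip_limit = wip_limit
        self.urgent = deque()   # interrupts, always pulled before planned work
        self.backlog = deque()  # planned work, in priority order
        self.in_flight = []
        self.done = []

    def add(self, card, urgent=False):
        (self.urgent if urgent else self.backlog).append(card)

    def pull(self):
        """Start the next card if there is capacity; return it, else None."""
        if len(self.in_flight) >= self.wip_limit:
            return None
        queue = self.urgent or self.backlog  # an empty deque is falsy
        if not queue:
            return None
        card = queue.popleft()
        self.in_flight.append(card)
        return card

    def finish(self, card):
        self.in_flight.remove(card)
        self.done.append(card)


board = Board(wip_limit=2)
board.add("project A")                    # hypothetical card names
board.add("project B")
board.add("disk full alert", urgent=True)

first = board.pull()    # the urgent card jumps the queue
second = board.pull()   # then the top of the backlog
third = board.pull()    # None: at the WIP limit, nothing more is started
board.finish(first)
fourth = board.pull()   # capacity freed, the next card is pulled
```

The point of the limit is that new work only starts when something finishes, so a pile of interrupts shows up as planned cards that never get pulled.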

Paul

Julian Simpson

Oct 14, 2009, 5:33:24 AM
to agile-system-...@googlegroups.com
What Paul said.

We also track adhoc work during sprints in our (combined
developer/sysadmin/bottle washer) team. Each sprint we have one story
that we can then spawn tasks from. We can't prevent the production
issues but we can honestly report back what we did.

J.

2009/10/14 Paul Nasrat <pna...@googlemail.com>:
--
Julian Simpson
Software Build and Deployment
http://www.build-doctor.com

Joni Huuhtanen

Oct 14, 2009, 5:35:29 AM
to agile-system-...@googlegroups.com
Hiya.

I've been stalking this group for some time but now I feel that I
might have something to contribute so here goes.

On Wed, Oct 14, 2009 at 11:31 AM, Stephen
<atalanta...@googlemail.com> wrote:
> 1) Divide the time available to the team in two - so say that roughly
> half of the time available will be spent in interrupt-driven
> activities, so we should only expect a limited amount of time to work
> on projects.  As long as interrupts don't exceed 50% of the time, we
> should be able to perform reasonably consistently.

This sounds like a reasonable approach. We have a team of four
sysadmins and in our two week sprints we roughly divide the time in
half. One half for interruption driven work and the other half for
"project" work. That means work which is planned and takes longer to
complete. The interruption work is usually short tasks such as
creating user accounts etc.

We try to manage the interruptions by agreeing every morning who will
handle the interruptions primarily. This can be one person or many. The
idea is to give people who work on long tasks the needed peace so they
can get their flow on.


> I'm not convinced (2) works - sysadmin interrupts tend to be highly
> urgent, so perhaps the best approach is just to set our expectations

This is usually how it goes. The interruptions need to be taken care
of immediately most of the time and you just can't help it. Better to
assign time for this in advance.

Just remember that the most important thing is to shape the process so
that it fits your needs. Remove the things that don't work and replace
them with something better.

The scrum that we use has been evolving for about two years now and
basically we have picked the best of scrum practices that fit sysadmin
work and removed the rest. I can say we have changed or removed quite
a lot and the system we now use is sometimes referred to (by us) as
"scam" instead of scrum because it has so little to do with scrum
anymore... :D

--
Joni Huuhtanen
Sysadmin
Reaktor (the one in Finland)

Matthias Marschall

Oct 14, 2009, 5:37:51 AM
to agile-system-...@googlegroups.com
Hi

>
> 2009/10/14 Stephen <atalanta...@googlemail.com>:
> Hey
>
>> I'm managing a team of sysadmins who try to work on projects on a 2
>> week scrum cycle.
>>
>> They're very frequently interrupted by various teams, nagios alerts,
>> and general day-to-day issues. This results in their velocity being
>> very hard to calculate, and them frequently failing to deliver in
>> their sprints.
>>
>> I have a couple of ideas:
>>
>> 1) Divide the time available to the team in two - so say that roughly
>> half of the time available will be spent in interrupt-driven
>> activities, so we should only expect a limited amount of time to work
>> on projects. As long as interrupts don't exceed 50% of the time, we
>> should be able to perform reasonably consistently.
>
> This is what I've seen in a few places and what we do here. We have
> interruptible people to respond to small tasks/tickets and site
> issues, and people working on project/story cards, rotating the roles
> around the team. At the moment we do that on a two week cycle, but
> I've worked other places where there was an exposed dev/systems pair
> who dealt with triage/proxy customer questions/issues.

We had roughly the same setup. It worked for us.

>
> We also have a mechanism to feed in issues we see as tech debt to try
> and continually improve the system and when we have capacity people
> can work on that.

That's one of the most important parts of the whole story. You'll only
increase your velocity over time, if you constantly do root cause
analysis of urgent issues and address those as stories to be worked
on. We've seen our team coming out of constant fire fighting mode into
a very stable and predictable work mode within a couple of months only
by making sure that we take the time to fix root causes!

>
> Do you have any technical practices to support your systems team
> ability to deliver (config management, automated infrastructure, etc)
> and non-prod environments for them to work in? If you're just having
> the daily standups and two week iterations/sprints without making
> change safe and easy then it's going to be hard work.

If you've nothing of the above, at least a non-prod environment (close
enough to the real prod env) for the team to work in is a must. E.g.
if you run a cluster of 5 servers in production you should have at
least a two node cluster (same OS, same installed software, same
config where possible) as a test environment for your sysadmins.

>
>> I'm not convinced (2) works - sysadmin interrupts tend to be highly
>> urgent, so perhaps the best approach is just to set our expectations
>> fairly low and measure velocity. The trouble is that as a team we
>> can
>> rarely predict how much interruption there will be, so our velocity
>> will be all over the place.
>
> The problem with treating them as bugs is that often the bugs are in
> production already at that stage and thus the cost of change is high
> and the urgency to change is high, meaning you're unlikely to see much
> in the way of driving continual systems improvement.

In your current mode of working, the deviation of velocity will not
change. You have to change the way you work (by addressing root cause)
to be able to change something here. Mary Poppendieck's upcoming book
"Leading Lean Software Development" might be a good start on this way
of thinking (http://my.safaribooksonline.com/9780321699633)

Matthias

http://www.agileweboperations.com
http://twitter.com/mmarschall

Patrick Debois

Oct 14, 2009, 5:48:01 AM
to agile-system-...@googlegroups.com
Hi, Stephen,

interesting topic. I've had the same experiences with Scrum for
sysadmins, so you're not alone anymore ;-)

Scrum tries to increase predictability by breaking stories down into
smaller tasks on the sprint backlog.
Of course not everything goes according to plan, and there will be
interrupts too, but they will be limited.
Another mechanism is the scrum master, who keeps things going and
removes blocking factors.

If your interrupts are rather small, you will be able to buffer them.
That would fit approach 2.

But if they have a large impact on a sprint and appear often, it could
be due to technical debt in your infrastructure, too many projects
being handled by a small team, or just plain understaffing.
Technical debt can be handled by upgrading and automating. If this is
your problem, it might be related to the infrastructure of the project.
If so, you can schedule a few spikes to make corrections. If the
infrastructure is shared with other projects and the incidents come
from other factors, you have to rise above the level of the project.

The problem is that velocity is just an estimate, IMHO, but it is often
treated as a productivity indicator written in stone.
If you estimated your firefighting efforts and added them to the end of
the sprint burndown, it would not look that different.
So if the work was related to the project, it's easy to explain to
the product owner.

In most organizations, the sysadmin works under an Operations Manager
and is lent to projects (to fill up the time not spent on incidents).
If the product owner looks at it solely from the perspective of his
own project, then yes, velocity has decreased. As far as the operations
manager is concerned, the sysadmin did a good job.
This is inherent in the shared nature of sysadmins between operations
and projects.

I've learned to make this extra effort visible to the product owner.
Of course he will feel that HIS project has not advanced, but at least
he knows why.

Sometimes the product owner/scrum manager will shout that he needs the
sysadmin resource, and a discussion begins between ops and the project
manager.
Both have to remember that they work for the same company with the same
goal, and that solving the operational problem might give the company
more benefit than advancing the project.
So even if the project team is idle, it is not necessarily a bad thing. I
know, it's a tough one to sell (but read The Goal by Eliyahu Goldratt).

I find that Kanban focuses less on velocity (which is just an
estimate and nothing more) and more on the flow of getting things done.
I personally don't have experience of Kanban with sysadmins, but I know
that Mattias Skarin (http://blog.crisp.se/mattiasskarin) has
introduced Kanban with sysadmins.
He has an upcoming book that describes how he introduced it. He will be
coming to http://www.devopsdays.org/ and I hope to make a recording of it.

Patrick

Stephen

Oct 14, 2009, 11:15:35 AM
to Agile System Administration
Hi,

> We also have a mechanism to feed in issues we see as tech debt to try
> and continually improve the system and when we have capacity people
> can work on that.

I've done this religiously in other teams. This is a brand new team
for me. The team is established - I'm the new manager.

> Do you have any technical practices to support your systems team
> ability to deliver (config management, automated infrastructure, etc)
> and non-prod environments for them to work in? If you're just having
> the daily standups and two week iterations/sprints without making
> change safe and easy then it's going to be hard work.

At present they have a staging area, but it isn't clear how that works
in terms of its function, and who is able to use it and when.

I will be introducing puppet.

S.

Stephen

Oct 14, 2009, 11:20:30 AM
to Agile System Administration
Hi,

> We also track adhoc work during sprints in our (combined
> developer/sysadmin/bottle washer) team.  Each sprint we have one story
> that we can then spawn tasks from.  We can't prevent the production
> issues but we can honestly report back what we did.

How do you do that? Do you ask people to record elapsed time? At the
end of the sprint do you ask them for a retrospective figure?

The team currently estimates in story points (fib series to 13). They
have a few cards called 'Production Support' etc, which they estimate
at 8 each. They appear to have committed to about 200 story points in
the sprint, so they've set aside about 10-15% for interruptions.

Within those constraints I don't see that they can have one big
'interrupt' card valued at 100, and it seems daft to have eight
13-point cards. I would think it is more realistic to commit to 100
story points.
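
As a rough sketch of the arithmetic: the 200 committed points and the 8-point support cards come from the figures above, but the count of three support cards is only an assumption (the post just says "a few"), and the 50% interrupt fraction is the split proposed earlier in the thread.

```python
def interrupt_share(reserved_points, committed_points):
    """Fraction of the sprint commitment set aside for interrupts."""
    return reserved_points / committed_points


def project_commitment(capacity_points, interrupt_fraction):
    """Points left for project work once interrupts are budgeted for."""
    return round(capacity_points * (1 - interrupt_fraction))


committed = 200      # story points committed in the sprint
support_cards = 3    # ASSUMPTION: "a few" cards taken to mean three
points_per_card = 8  # each 'Production Support' card is estimated at 8

reserved = support_cards * points_per_card
share = interrupt_share(reserved, committed)
print(f"reserved for interrupts: {reserved} points ({share:.0%})")

# If interrupts really take about half the team's time:
realistic = project_commitment(committed, interrupt_fraction=0.5)
print(f"realistic project commitment: {realistic} points")
```

Under those assumptions the reserve works out to 24 points, which is inside the 10-15% range, and the 50% split gives the 100-point commitment suggested above.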

However, I do agree, some tracking of interrupt-driven work would be
very valuable.

S.

Stephen

Oct 14, 2009, 11:23:36 AM
to Agile System Administration
Hi,

> By reducing stories into smaller tasks on the sprint backlog, it tries
> to increase the predictability.

Yes - I'm not actually a fan of story points for sysadmin work. I
agree for development work it can be very handy (see Mike Cohn's
defence in Agile Estimating and Planning), but for systems work I
think it makes more sense to go in for detailed estimates in ideal
days.

> Another mechanism is the scrummaster, keeping things going and
> unblocking blocking factors.

I'm taking over as scrum master.

> I've learned to make this extra efforts visible to the product owner.

I think this is the key - how do you go about it?

> I find that Kanban focuses less on this velocity (which is just an
> estimate and nothing more) and more on the flow of getting things done.
> But I personally don't have experience with the Kanban with sysadmins,
> but I know that Mattias Skarin - http://blog.crisp.se/mattiasskarin has
> introduced Kanban with sysadmins.

It sounds like an interesting approach!

> He has an upcoming book that describes how he introduced it. He will be
> coming to http://www.devopsdays.org/ and I hope to make a recording of it.

See you there! :)

S.

Arne Roock

Oct 15, 2009, 5:32:35 AM
to Agile System Administration
Hi,
Sysadmins face different problems from software development
teams. The frequent interrupts and emergency tasks (which need to be
done NOW) are a particular problem when using timeboxes. Here Kanban
provides a set of useful techniques, above all different levels of
service agreement. These are trade-offs between customers and
developers/sysadmins that establish certain rules: which tickets
should be done in which order and at which priority?
Kanban and Scrum are not really competing. A lot of teams keep the
scrum practices that work and add the kanban practices they consider
to be useful.
Most important sites to read about kanban:
http://www.limitedwipsociety.org/
http://www.agilemanagement.net/

And talk to Marcel in Ghent - he is thinking about Kanban in system
administration.

Cheers,
Arne

littleidea

Oct 15, 2009, 6:34:37 PM
to Agile System Administration

For what it's worth, I think Kanban maps much better onto sysops work
than Scrum. (we won't get into dev work...)

Iterations are arbitrary. Manage flow.

Particularly when you have a lot of interrupts.

I personally would not make the interrupt work separate from the
projects as I believe that just leads to ambiguity.

I would put the projects on the board, then when interrupts come in,
they go up on the board and move across.

When projects aren't getting done because there is a big pile of
interrupts, I think it is easier to justify investing in fixing the
root causes of all the fire.

I like the way Arlo Belshee thinks about things. Check out this
podcast on Naked Planning:
http://cdn2.libsyn.com/agiletoolkit/Agile2008_Arlo_Belshee.mp3

Regards,
Andrew


On Oct 15, 3:32 am, Arne Roock <a...@homonyme.de> wrote:
> Hi,
> sysadmins are facing different problems than software development
> teams. Especially the frequent interrupts and the emergency-tasks
> (need to be done NOW) are a problem when using timeboxes. Here Kanban
> provides a set of useful techniques, most of all different levels of
> service agreement. These are trade-offs between customers and
> developers/sysadmins that establish certain rules: which tickets
> should be done in which order and on which priority?
> Kanban and Scrum are not really competing. A lot of teams keep the
> scrum practices that work and add the kanban practices they consider
> to be useful.
> Most important sites to read about kanban:
> http://www.limitedwipsociety.org/
> http://www.agilemanagement.net/

Stephen

Oct 16, 2009, 2:49:31 AM
to Agile System Administration
Hi,

On Oct 15, 11:34 pm, littleidea <and...@reductivelabs.com> wrote:

> For what it's worth, I think Kanban maps much better onto sysops work
> than Scrum. (we won't get into dev work...)

The concept of managing a constant stream of work seems absolutely
ideal for sysops work.

> Iterations are arbitrary. Manage flow.

For sysadmins I agree - we're not generally delivering product - we're
supporting those who do, addressing technical debt in the systems, and
looking for opportunities to improve resilience, reliability and
performance.

> I would put the projects on the board, then when interrupts come in,
> they go up on the board and move across.

Could you explain what you mean by that?

I have two different experiences of agile planning. The first is a
fairly traditional XP approach, where at the start of an iteration the
team estimates the stories that the business has said are the most
important, and then commits to work on as much as they think they can
deliver by the end of the iteration. Typically I've managed this on a
wiki, or in a tool like Mingle.

The second is traditional scrum, with a wadge of backlog somewhere on
the extreme left, a scrum planning meeting in which a number of cards
are picked up to the expected velocity of the team, and then during
the sprint people work on cards, sticking them in 'in progress', then
'review' then 'done'.

How does a Kanban board differ?

> When projects aren't getting done because there is a big pile of
> interrupts, I think it is easier to justify investing in fixing the
> root causes of all the fire.

Definitely.

> I like the way Arlo Belshee thinks about things. Check out this
> podcast on Naked Planning: http://cdn2.libsyn.com/agiletoolkit/Agile2008_Arlo_Belshee.mp3

Will check it out, thanks.

S.

littleidea

Oct 16, 2009, 4:22:31 AM
to Agile System Administration

>
> > Iterations are arbitrary. Manage flow.
>
> For sysadmins I agree - we're not generally delivering product - we're
> supporting those who do, addressing technical debt in the systems, and
> looking for opportunities to improve resilience, reliability and
> performance.

They are arbitrary for dev work too, but that's a potential
distraction.

> > I would put the projects on the board, then when interrupts come in,
> > they go up on the board and move across.
>
> Could you explain what you mean by that?
>

Sure, or at least I can try.

Here's a short video of Arlo explaining Naked Planning:
http://video.yahoo.com/watch/2150754/6801952

The interrupts are equivalent to what he's calling 'urgent'.

The key is if something needs to be worked on, it gets a card on the
board. This avoids the scenario where you have a span with lots of
fires that need attention and people are wondering why the projects
aren't moving forward.

Here's a nice cartoon explanation of Kanban.
http://blog.crisp.se/henrikkniberg/2009/06/26/1246053060000.html

If we extend this to devops ideas where people have adopted some
notion of continuous delivery, with everyone working off the same
board, things could get very interesting.

Might not be appropriate for every team or scenario, but I believe it
has potential.

> I have two different experiences of agile planning.  The first is a
> fairly traditional XP approach, where at the start of an iteration the
> team estimates the stories that the business have said are the most
> important, and  then commit to work on as much as they think they can
> deliver by the end of the iteration.  Typically I've managed this on a
> wiki, or in a tool like Mingle.
>

Kanban is basically dropping the idea of iterations. Just work on the
highest priority stuff and kick ass. If you have too many
interruptions, I'd consider that technical debt you are paying
interest on, and the priority should probably be to pay down the
principal.

> The second is traditional scrum, with a wadge of backlog somewhere on
> the extreme left, a scrum planning meeting in which a number of cards
> are picked up to the expected velocity of the team, and then during
> the sprint people work on cards, sticking them in 'in progress', then
> 'review' then 'done'.
>
> How does a Kanban board differ?
>

Kanban differs in the sense that there is no iteration or estimation.
(although some people have mixed a bit of that back in.) Velocity is
reflected in how fast cards move across the board. (if you listen to
Arlo on the podcast, he explains why estimation is an act of self
deception.)
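
One way to read "velocity is reflected in how fast cards move across the board" is to measure cycle time and throughput from the dates on the cards. The sketch below is only an illustration of that reading: the card data is invented, and the two helper functions are hypothetical, not part of any Kanban tool.

```python
from datetime import date
from statistics import mean

# Invented example cards: when work started and when it reached 'done'.
cards = [
    {"name": "card A", "started": date(2009, 10, 1), "done": date(2009, 10, 5)},
    {"name": "card B", "started": date(2009, 10, 2), "done": date(2009, 10, 4)},
    {"name": "card C", "started": date(2009, 10, 6), "done": date(2009, 10, 12)},
]


def cycle_times(cards):
    """Days each card spent between being started and being done."""
    return [(c["done"] - c["started"]).days for c in cards]


def throughput_per_week(cards, start, end):
    """Cards finished per week within the inclusive date window."""
    finished = [c for c in cards if start <= c["done"] <= end]
    weeks = ((end - start).days + 1) / 7
    return len(finished) / weeks


times = cycle_times(cards)
average = mean(times)
rate = throughput_per_week(cards, date(2009, 10, 1), date(2009, 10, 14))
print(f"cycle times: {times}, average {average} days, {rate} cards/week")
```

No estimates are involved: both numbers come from what actually moved across the board, which is the contrast with sprint velocity being drawn here.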

John Allspaw

Oct 16, 2009, 12:04:00 PM
to agile-system-...@googlegroups.com
FWIW, at our shop we don't have a name for whatever we're doing. I'm not sure if it's like Kanban or not. I'm going to call it MumbleMumble.

We have a 'dev' team, and an 'ops' team. Major overlaps in each, but each bears their own responsibilities.
As for the 'ops' group (my team) - there's basically two types of work:

- 'project' work is of two categories: infrastructure work that we have (or want) to do to support organic growth, and infrastructure work to support an upcoming feature.

- 'site up' work which involves maintenance/alerts/tweaking/stuff-you-should-do-but-isn't-tied-inherently-to-a-feature-or-organic-growth

As to the OP's topic, we don't actually have any regular timeframes for any of the project work done, except for:
- provisioning of new infrastructure (driven by capacity planning numbers)
- feature releases where there are external circumstances, like public announcements planned, etc. <- this doesn't happen often, but does happen

for the 'site-up' work, there are a few things that help (we hope) keep the interrupters/disruptors from poking in too much:
- (these things are basically reactions to either alerts or traffic-related issues that can't wait, nothing else)
- the guy who is on primary on-call rotation takes these interrupters for the most part, and it's baked into assumptions that any project work slated for that week for that dude takes that into consideration
- there is ongoing/neverending project work whose goal is to reduce those interrupters, fought mainly by constantly improving the metrics needed to track down and solve them automagically or consolidate them
- I also suspect that since traditional 'release management' isn't a burdensome part of either the dev or ops job, this unloads that historically scary/pressure-filled bit off of everyone's plate (devs deploy their own code, small changes, etc.)

I have no idea if this helps or confuses anyone at all. I'm hoping for the former. :)

--
John Allspaw
http://flickr.com/photos/allspaw

Ryan Dooley

Oct 16, 2009, 1:29:01 PM
to agile-system-...@googlegroups.com
Yeah this is pretty much the way it was for Yahoo Production
Engineering when I was there. Our group had one other division in
dealing with hardware 'flow' which was all about organic growth and
ongoing break/fix of systems.

Cheers,
Ryan

littleidea

Oct 16, 2009, 4:18:05 PM
to Agile System Administration

>
> > I have no idea if this helps or confuses anyone at all. I'm hoping for the
> > former. :)

Process is secondary to smart, committed people trying to solve
problems.

I believe the latter by itself can solve nearly any problem, while the
former by itself is next to useless.

Kanban is a form of information radiator to help make priorities and
progress visible.

I used to say 'there is no such thing as a free unicorn', but now I
can't...
http://unicorn.bogomips.org/

littleidea

Oct 16, 2009, 5:00:16 PM
to Agile System Administration

>
> > I have no idea if this helps or confuses anyone at all. I'm hoping for the
> > former. :)

Patrick Debois

Oct 17, 2009, 5:01:08 AM
to agile-system-...@googlegroups.com
Thanks, John, for sharing this with us.

To me the key phrase is: "the guy who is on primary on-call rotation
takes these interrupters for the most part, and it's baked into
assumptions that any project work slated for that week for that dude
takes that into consideration".
In other words, there is a company-wide consensus that these
interrupts take precedence over project work. This consensus is key:
everyone in the company is clearly reminded of the company goal,
making money.

Let's consider the following company growth scenario:

In a small team working on its own infrastructure and its own product,
communication is easy, and everyone can keep the goal in view and see
that time spent on ops is time well spent.
The sysadmin helps them and all devs also know how to handle things.

Eventually the company grows and more projects arrive. To gain
economies of scale, each project uses part of a shared infrastructure.
Often one group of sysadmins is dedicated to the operational
part, others work on improving the infrastructure, and still
others work for the projects.

Either because of their specialized skills or a cost-cutting mentality,
the number of sysadmins gets reduced, because staffing all these tasks
at 100% is not considered economical.
So these resources are shared and respond to the highest priority.

The problem arises when the project managers of the different projects
have no affinity with the other projects and their bonus depends solely
on their own project. What happens then is that they try to
create local optima.
In their view of reality, if their project is not advancing (in Scrum,
their burndown chart is struggling to move), they get a bad
feeling about it and still try to get hold of these precious shared
resources.
And it doesn't help if upper management tells each of them that theirs
is an important project.

I've seen this discussion way too often. When their project is not
advancing and their shared resource has been reassigned to production
work, they have to bear in mind that the shared resource may be better
used in another part of the company (probably fighting to avoid losing
money rather than to gain it).

It is probably because this rule, that production takes priority over
projects, is clear at Flickr that this actually works. But it is hard
if the company produces one-shot projects or has different divisions
which have no affinity with other projects.

Automation and procedures will help optimize things, but unless people
understand that a burndown chart that is not progressing might be a
good thing for the company, you will always have a
discussion around this.

John Allspaw

Oct 17, 2009, 9:02:29 AM
to agile-system-...@googlegroups.com
Yeah, absolutely, it helps that non-degraded site uptime is held as the clearly understood most-important thing.

At Yahoo, 'site-up' is a priority 'zero' thing, all else has less priority. What that does is place a globally-understood value on it, and it can help as a rallying point for both dev and ops to work together on processes and tools to reduce the number of interrupters. Which, of course, is where the gold is.

It also helps that ops, dev, and product all have organizational reporting structures all the way up to the CEO, which means that this understanding of the balance between site integrity and feature development is institutionalized. Which isn't to say that pushes and pulls don't happen, just that it doesn't allow for wildly varying local optima. :)

Sorry if this is off-topic.
-j

Mark Bainter

Nov 10, 2009, 2:32:31 AM
to Agile System Administration
On Oct 16, 2:22 am, littleidea <and...@reductivelabs.com> wrote:
> > deliver by the end of the iteration.  Typically I've managed this on a
> > wiki, or in a tool like Mingle.
>
> Kanban is basically dropping the idea of iterations. Just work on the
> highest priority stuff and kick ass. If you have too many
> interruptions, I'd consider that technical debt you are paying
> interest on, and the priority should probably be to pay down the
> principal.

I'm rather new to either concept, but from the reading I've done
tonight (ouch, my head) the ability to communicate the state of things
on the team to people outside the team is a critical part of this. It
looks like it's fairly common to use a big board, but if you're trying
to gain traction where there's nothing like that available, are there
any other technical options? (Besides mingle I mean - and perhaps in
particular any OSS options?)

Obviously, for within the team, there's all kinds of ways you could
accomplish this with existing wiki tools, or even perhaps something
like mantis with a little work. However, it seems to me that such
solutions are lacking the clarity of the physical board in
communicating to people outside of your team.
