The word triage looks as if it contains the prefix “tri”, meaning “three”, though it actually comes from the French trier, “to sort”. In medical triage, emergency patients are divided into three classes: those who will probably live, whether they receive treatment or not; those who will likely die, whether they receive treatment or not; and those for whom immediate care may make a difference. Triage is a quick rule of thumb for deciding how to allocate limited care resources when demand far exceeds supply.
The analogy, of course, would be to divide bugs into three classes (or some other number, I suppose) to decide which ones get attention.
But I’d like to stop us right here. They are not “bugs” that wander in through the cracks under the door. Everything in the product is something that the developers put in (or left out) by their own actions. They’re not bugs. They’re defects. They’re defective work. The work was not really “done”. We’ll come back to that. I find that it helps to say “defects” because it reminds us that they are up to us, not some random event like rain.
The notion of “Defect Triage” seems to be all over the map in the literature, and is often associated with complex procedures for deciding whether a support request reflects a defect or a need for help, and so on and so on. It is possible that a procedure would be useful, but it seems unlikely to me that a standard procedure will work in more than one situation. That is, the details of processing bugs will always need to be specialized to the situation at hand.
Generally speaking, I find that dealing with defects in Scrum can and should be much simpler than most triage procedures.
In Scrum, the Product Owner decides what the team will do (and the team decides how much work to take on and how to do it). Therefore, in Scrum, the Product Owner decides which defects to fix and which ones to defer. Having a separate stream of work, or a subset of the team that pulls defects from somewhere, essentially reduces the Product Owner’s ability to get the best possible result by the desired date. It undermines the authority of the Product Owner, making it harder for her to do her job.
This is really never ideal. Dedicating a person or a subset of the team to defects is demotivating, as Andy pointed out. In addition, it costs the team the talents of those individuals over that time period. It weakens the team. It also weakens self-organization: the team is supposed to decide how to do the work, not some outside resource allocator. Even if the team wanted to organize that way, however, I’d recommend against it. As Andy also pointed out, whatever you try should be assessed and improved in the retrospective.
But wait, there’s more.
At the beginning of some Sprint, there are N defects.
How many defects should be in the product at the end of the Sprint? Well, if the team is producing an increment of “done” software, there should be no more than N. Suppose there is one new defect, in item A. What can we say?
First of all, item A is not done. Yet somehow we thought it was. What caused that? One common cause is that the team has a separate testing process that happens after the Sprint. This notion is terribly dangerous. It means that the team can never know whether it is done or not. Therefore it can never ship an increment of done software. But Scrum says we must. Therefore the process is broken.
Another possible reason (the only real reason, actually) is that the software was not sufficiently tested during the Sprint. Downstream testing or not, if we test the software well enough, the downstream people won’t find any problems in our code. If they do find one, we’re not producing done software. Therefore the process is broken.
Now, I’m not one who says you have to do Scrum because you have to do Scrum. You do Scrum for what it provides, which is a visible flow of done software that enables the Product Owner to create the best possible product by the end date. If there are defects, the software’s not done, the information isn’t visible, and the Product Owner can’t do her job. You don’t get the benefit of Scrum. The benefit, the ability to actually steer your product to success, is what you should care about, not whether Scrum says so.
So we cannot tolerate the defect count increasing, even by one defect, at the end of any Sprint. Naturally, we are human, and it will happen from time to time. And every time that it happens, we need to do the same thing, roughly this:
- Give the new defect to the Product Owner to prioritize into future Sprints;
- In the Sprint Retrospective, figure out what part of our process allowed the defect to slip through our testing;
- Improve our process so that that defect, and defects like it, will not slip through.
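To make the rule concrete, here is a minimal sketch, assuming the team can list the IDs of open defects at the start and at the end of each Sprint. The `SprintRecord` type and `escaped_defects` function are hypothetical illustrations, not part of Scrum or of any tracking tool. Any defect open at the end but not at the start escaped this Sprint, and each one goes through the three steps above.

```python
from dataclasses import dataclass


@dataclass
class SprintRecord:
    """Hypothetical record of open defect IDs around one Sprint."""
    name: str
    open_at_start: set[str]  # defect IDs open when the Sprint began
    open_at_end: set[str]    # defect IDs open when the Sprint ended


def escaped_defects(record: SprintRecord) -> set[str]:
    """Return defects that appeared during the Sprint.

    Each one goes to the Product Owner to prioritize and to the
    Retrospective to find out how it slipped through our testing.
    """
    return record.open_at_end - record.open_at_start


if __name__ == "__main__":
    sprint = SprintRecord("Sprint 12", {"D-1", "D-2"}, {"D-2", "D-7"})
    print(sorted(escaped_defects(sprint)))  # ['D-7']
```

Note that the check looks at new defects, not just the total: fixing D-1 while letting D-7 escape keeps the count flat, yet D-7 still means some item was not done.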
Note that we improve our process. This is not a witch hunt for “who wrote that defect”; it is a process analysis for “how did we, the whole team, not catch this, and what can we do to catch similar things next time?”
Typical improvements are these:
- Ensure that every Sprint backlog item has concrete acceptance criteria;
- Have acceptance criteria up front: see “definition of ready”;
- Improve those criteria as indicated to ensure defects like this don’t get through;
- Slice stories smaller, so that acceptance criteria are easier to write and check;
- Improve programmer-level testing to provide a second net of tests to prevent problems like this;
- Use test-driven development to provide programmer-level testing more reliably.
There are many more possible improvements, and many details to these.
The fundamental points are:
- Scrum has a Product Owner who decides what we work on, and who therefore decides which defects will be worked on;
- Each defect escaping the Sprint should be addressed in the Retrospective.
Note two important things about thinking this way:
First, ideally, our defect list will not grow: it can only shrink. Even if things are not ideal, it will grow far more slowly.
Second, if we work this way from the beginning, we’ll have far fewer defects than we did in the old days. How many fewer? Teams working this way report from one tenth down to one one-hundredth of the defect rates they had in the past. Teams quite commonly report one or two defect escapes per year into the live product.
This process is more effective, because it actually reduces defects at the source, instead of providing a fake sense that something is being done by pushing bug reports around on the table.
At first this seems like it’ll be more difficult. Well, no. Fixing a thousand bugs is difficult. Frankly it’s impossible. Preventing them is easy. You just have to pay attention and improve your process.
Ron Jeffries
Perfectionism is the voice of the oppressor -- Anne Lamott