
Testing metrics - still a problem


Dima

Dec 30, 2008, 5:58:03 AM
Hi guys, I know that when you read the title you thought – another
one…

My problem is not with the actual metrics. I think I'll be able to
define the right ones for me, ones which answer my particular needs
rather than following some "proven"/"best practice" programme or
approach.
My concern is with the people and how they will react to the metrics
being applied. My biggest concern is that once metrics are applied,
the target shifts from finding software problems to satisfying the
metrics. The bigger the company, the bigger the issue.
I can't completely forget about metrics, though; they are needed in
order to move forward!

Please share your experience with introducing metrics in your
companies: how people reacted, what mistakes one should avoid, and
how to keep your belief in software quality rather than in metrics
graphs.

cheers, Dumitru.

H. S. Lahman

Dec 30, 2008, 1:44:03 PM
Responding to Dima...

> My problem is not with the actual metrics. I think I’ll be able to
> define the right ones for me, which will answer my particular needs
> and not some “proved”/”best practice” programme/approach.
> My concern is with the people, how they would react to applying the
> metrics. My biggest concern is that when metrics are applied target
> shifts from finding software problems to satisfying the metrics. The
> bigger the company – the bigger the issue.
> I can’t completely forget about the metrics, they are needed for going
> forward!

I am not sure what you mean by "applying metrics". Metrics just provide
data about processes. To be useful they need to be interpreted against
some set of goals. If the goals are poorly defined, then the metrics
will be of little use. Note that there is nothing to prevent you from
defining your own local goals.

In particular, if the metrics are to be used for process improvement,
then the shop needs some sort of infrastructure for process improvement
that can use the metrics data as input. The better that process, the
more benefit from the metrics.

--
Life is the only flaw in an otherwise perfect nonexistence
-- Schopenhauer

H. S. Lahman
H.la...@verizon.net

Shrinik

Dec 30, 2008, 4:18:16 PM
>>> My biggest concern is that when metrics are applied target
shifts from finding software problems to satisfying the metrics.

There is a name for this: "goal displacement" ....

http://shrinik.blogspot.com/2008/05/when-method-dictates-goal-see-goal.html


>>> Anyone please share your experience with applying the metrics in your
companies. How people reacted, what are mistakes one should avoid. And
how to “keep your believe” in software quality and not metrics graphs.

I must say - try to avoid the side effects of metrics. Humans have a
tendency to orient themselves to what is being measured, in such a way
that their work looks good in terms of what is being measured.

I wrote about it here....

http://shrinik.blogspot.com/2008/06/side-effects-of-metricsstatistics.html


A must-read for all metrics people is the paper by Dr. Kaner and Bond:

http://www.kaner.com/pdfs/metrics2004.pdf

Inquiry metrics are better than control metrics --- meaning, use
metrics to understand "what is happening" - first-order measurements.
Control metrics tend to take steps towards "controlling" or managing
things that you don't understand well. I believe software systems are
not yet ready for second-order measurements - for details, read Jerry
Weinberg's 3-volume series "Quality Software Management".

Shrini Kulkarni
Test consultant, India
http://shrinik.blogspot.com

Vladimir Trushkin

Dec 30, 2008, 4:30:14 PM
On 30 Dec, 12:58, Dima <dumitru.corobce...@gmail.com> wrote:
> [...]
> My concern is with the people, how they would react to applying the
> metrics. My biggest concern is that when metrics are applied target
> shifts from finding software problems to satisfying the metrics. The
> bigger the company - the bigger the issue.
> [...]

This is a great question, and a well-founded concern on your side.
Metrics have a drawback: they tempt people to satisfy the metrics
rather than to do the job. Because of that, you should not simply rely
on a metric as something that "is telling the truth" but use it _as
one of several sources for making a decision_. People will see that
metrics are used only indirectly and will stop trying to manipulate
them.

----
Best Wishes and Happy New Year,
Vladimir

dumitru.corobceanu

Dec 31, 2008, 6:16:23 AM
> I am not sure what you mean by "applying metrics". Metrics just provide

Sorry about that; English is not my strongest point. I meant
introducing the metrics - starting to take "readings" of our process.

> data about processes. To be useful they need to be interpreted against
> some set of goals. If the goals are poorly defined, then the metrics
> will be of little use. Note that there is nothing to prevent you from
> defining your own local goals.

I completely agree, and I think we are OK at defining our goals.

> In particular, if the metrics are to be used for process improvement,
> then the shop needs some sort of infrastructure for process improvement
> that can use the metrics data as input. The better that process, the
> more benefit from the metrics.

I know this works in theory, but I've seen it fail in practice. While
managers see the process improvement, actual code quality suffers.

dumitru.corobceanu

Dec 31, 2008, 6:27:53 AM

> Shrini Kulkarni
> Test consultant, India
> http://shrinik.blogspot.com

...lots to read, thanks.
I guess reading papers like these in the past is what brought this
problem to my attention. Then I started to see it all around me, not
just in software testing. As a result, my first answer to metrics is
always NO. But you need them; you need them to improve.

Some new terms (for me at least):
"Goal displacement" - I knew it was happening, almost inevitably; I
just didn't have a name for it.
"Inquiry metrics" and "control metrics" - in theory everything sounds
good, but what is to stop inquiry metrics from becoming control
metrics?

dumitru.corobceanu

Dec 31, 2008, 6:31:42 AM
> sources to take on a decision_. People will see that metrics are used
> indirectly an will stop trying to manipulate it.

This is my hope as well. Is it really possible? Do you have a
real-life example, even if it is only "I heard it from a friend who
heard it from his friend"?

Shrinik

Dec 31, 2008, 9:03:52 AM
>>>But you need them, you need them to improve.

I am afraid not ... You need them to tell a story about the software -
the story that you *want* to tell. With metrics that is possible - you
can make up just about any story you like ... with so many
parameters/attributes there is a lot of room for *making up* stuff.
For example, if a particular release of the software was bad, you can
say that people could not concentrate because the "football" season
was on, with key matches being played close to the release dates,
whereas the development manager might say "last-minute changes in the
software requirements caused lots of problems", and a developer might
say "it was a bad release because too little time was given to reviews
and they touched core functionality of the software".

One metric (say, a high number of post-production defects, or a high
defect leakage rate) and so many stories...

If one is looking to manipulate the events or results of software
projects (for good or bad reasons -- note that manipulation can be
done for good reasons too) ... your safest bet is metrics ... look no
further....

I see "improving by metrics" as a very remote possibility, and I see
it as another form of manipulation. You can take any metric, compare
it with another collected at a different time, and make a story of
improvement out of that .... You can present any "coincidental" effect
or improvement as something that happened due to the metrics ...
metrics allow that ... they are very flexible.

>>>but what is to stop the inquiry metrics to become control metrics?

In simple terms, inquiry metrics help us to "understand things" - as
in understanding, let us say, how the human brain works, how an
aeroplane flies, or how a plant produces its own food. Control
metrics, on the other hand, let you "modify", improve or optimize the
phenomenon for the better. In a group dynamics situation, control
metrics let you take control actions.

Control requires far deeper systems knowledge than inquiry....

Inquiry metrics cannot become control metrics, because the former are
first-order measurements and the latter are second-order measurements.
The software world has not reached the maturity level needed to create
and use second-order metrics.

Shrini

S Perryman

Dec 31, 2008, 11:13:29 AM
Shrinik wrote:

>>but what is to stop the inquiry metrics to become control metrics?

> In simple terms, inquiry metrics help us to "understand things" - as
> in understanding let us say "how human brain works or how aeroplane
> flies or how a plant produces its own food". Where as Control metrics
> let you to "modify" or improve or optimize the phenomenon for better.
> In group dynamics situation - control metrics let you to take control
> actions.

> Control requires far deeper systems knowledge than inquiry....

> Inquiry metrics cannot become control metrics because former is first
> order measurement and latter is second order measurement. Software
> world has not come to maturity level to create and use Second order
> metrics

You really aren't helping the OP here IMHO with the cryptic words.

There is a metric M which measures something.

The assumption is that M tells us something *useful* about something
(s/w, system, development process etc). If it doesn't, gathering M is
pointless.

There is also an assumption that you can state, for two values of M
(call them MX and MY), that MY is 'better' than MX.

The basic question is for those M that are being gathered, what is the
desired MY, and what is the current MX.

If we change activities in some way, and if MX is now moving towards
MY, we have evidence that the activity is improving those things that
M measures.


That aside, what M is the OP currently using or considering using, and
for what purpose?


Regards,
Steven Perryman

Shrinik

Jan 1, 2009, 7:55:08 AM
Hi Steven,

>>>You really aren't helping the OP here IMHO with the cryptic words.

I am willing to explain the cryptic words ... any word can be treated
as cryptic depending upon the level of understanding of the parties in
the conversation. BTW ... what is "OP" here ... is that not cryptic?
(It certainly is for me.)

>>> There is a metric M which measures something.

This is too generic .... the usefulness and utility of M depend upon
what M is measuring (a temperature, blood pressure, or just a count of
test cases, etc.) and how.

>>>The assumption is that M tells us something *useful* about something
(s/w, system, development process etc) .

From this point ... assuming that assumption holds good for a
situation, things branch out into "inquiry" and "control". So, what I
am saying here is that the "useful" information can be treated either
as answering a question (inquiry) or as providing hints or information
for taking actions (control). My opinion is that software metrics are
better used for inquiry purposes than for "control" purposes. Software
metrics models need to mature to the "control" level. They have not
yet.

>>>There is also an assumption that you can state that for two values of M

So you are talking about numbers like 56.78 and 45.910, right? The
problem with using mere numbers to represent rich, multidimensional
items like "test cases", "bugs", "code coverage" etc. is that they
horribly simplify the world behind them ... they are like squeezing an
ocean into a drop of water.

>>>There is also an assumption that you can state that for two values of
M (call them MX and MY) , that MY is 'better' than MX.

So taking the above example, MX=56.78 and MY=45.910 (assuming that
more is better), you can simply say that MX is better. You will say
that, right? I am sure you would. Here is where the problem comes
in ... simply comparing two numbers on a number scale (grade 1 or 2
arithmetic), you are dealing with mere numbers. These numbers could
represent complex, multidimensional things in software.

Saying one number is better than another is a simple thing, but there
is a danger that it tricks you into ignoring other parameters that
affect the numbers. Also, the word "better" is open to
interpretation .... better with respect to what? To whom? And when?
"Better" is always context-dependent.

>>> The basic question is for those M that are being gathered, what is the desired MY, and what is the current MX.

This simplified model of current and desired is precisely what I am
probing ... life is not that simple ... Question the model behind
"M" ... ask how solid it is .... don't jump the gun ...

>>>If we change activities in some way, and if MX is now moving towards
MY, we have evidence that the activity is improving those things that
M measures.

It is like experimenting with a phenomenon that is affected by, say,
10 variables while varying 5 parameters at a time. You can easily be
tricked into believing that a certain trend from MX to MY is an
improvement, when there could be other reasons making the trend go
that way, and the simple MX/MY model, or the metric M, cannot capture
those elements ...
In summary ... if you are serious about improvement, use broad-based
qualitative measures, lots of metrics, and some subjective information
as well.

If you want to manipulate - to show some "improvement" to your
managers - metrics are your best friend.

I hope you will not call what I have described here cryptic again ...
to the OP (I still do not know who this OP is).

Shrini Kulkarni

dumitru.corobceanu

Jan 1, 2009, 12:44:56 PM
On 31 Dec 2008, 14:03, Shrinik <Shri...@gmail.com> wrote:
> >>>But you need them, you need them to improve.
>
> I am afraid not ... You need them to tell a story of something about
> software the story that you *want* to tell. [...]

I'd like to disagree and agree at the same time :). They tell the
story, and based on that you can improve. I mean, this is the same as
software testing: testing in itself does not improve the software, it
just "tells the story" about the software.


> >>>but what is to stop the inquiry metrics to become control metrics?
>
> In simple terms, inquiry metrics help us to "understand things" - as
> in understanding let us say "how human brain works or how aeroplane

To avoid too much writing ... give me an inquiry metric and I'll
show you that it can be used as a control metric.

Michael Bolton

Jan 1, 2009, 1:31:50 PM
> >>but what is to stop the inquiry metrics to become control metrics?
> > In simple terms, inquiry metrics help us to "understand things" - as
> > in understanding let us say "how human brain works or how aeroplane
> > flies or how a plant produces its own food". Where as Control metrics
> > let you to "modify" or improve or optimize the phenomenon for better.
> > In group dynamics situation - control metrics let you to take control
> > actions.
> > Control requires far deeper systems knowledge than inquiry....
> > Inquiry metrics cannot become control metrics because former is first
> > order measurement and latter is second order measurement. Software
> > world has not come to maturity level to create and use Second order
> > metrics
>
> You really aren't helping the OP here IMHO with the cryptic words.

Which words do you find cryptic?

> There is a metric M which measures something.
>
> The assumption is that M tells us something *useful* about something
> (s/w, system, development process etc) . If it doesn't, gathering M is
> pointless.
>
> There is also an assumption that you can state that for two values of
> M (call them MX and MY) , that MY is 'better' than MX.
>
> The basic question is for those M that are being gathered, what is the
> desired MY, and what is the current MX.
>
> If we change activities in some way, and if MX is now moving towards
> MY, we have evidence that the activity is improving those things that
> M measures.
>
> That aside, what M is the OP currently using, considering using etc.
> And for what purpose.

There are several words that I find cryptic here. "Better" is one.
"Purpose" is another. "Improving". "We".

It's not that I don't understand the ordinary, dictionary definitions
of these words. What I don't understand--and, in my observation, what
many people fail to perceive--is the missing half of the relationships
in these words. "Better" by what form of measurement? "Better" for
whom? "Better" meaning faster? Cheaper? Of higher perceived value
to some person? Which person?

This is the trap that the Original Poster--the person who started
the thread, Shrini--seems to want to avoid.

To him, I would recommend the book "Measuring and Managing Performance
in Organizations", by Robert D. Austin, and "Quality Software
Management, Vol. 2: First Order Measurement" by Weinberg, in addition
to the paper by Kaner and Bond that Shrini cited. In addition, he
could check out the Wikipedia article on the Hawthorne Effect, and
follow the links off it.

In companies that I've visited that have dangerous metrics obsessions,
I've observed the following:

1) The first response to some kind of suggestion for making testing or
development more rapid is to stonewall because "the numbers won't come
out right" or "we won't be able to count X" where X is some goal
displaced from the goal of producing good software.
2) The more emphatic the response, the more I tend to hear,
concurrently or shortly afterwards, that the company is in some kind
of business or financial distress.

---Michael B.

Vladimir Trushkin

Jan 1, 2009, 4:56:02 PM
On 31 Dec 2008, 13:31, "dumitru.corobceanu" wrote:

No. I have had positive experience of my own. You just need to know
what the purpose behind the metric is, and never to use it as the only
source of information. What actually works is asking yourself why the
metric you have got is different; the cause may not be what you
expected it to be. For example, we used to measure unique failed tests
to identify whether the quality of the software delivered to testing
was getting better. We obtained a serious improvement in this metric
on one of our projects. However, that project was very small and we
discarded the result as non-representative. Our guess was proven right
later on, on the next full-sized project: we did not get the
improvement we had hoped for. This is a real-life story, but it gives
a good example of how careful one should be with "plain numbers". Do
not just measure. Think about what the difference you see actually
means.

----
Best Wishes,
Vladimir

Shrinik

Jan 1, 2009, 5:14:39 PM
Responding to Dumitru ....

>>>>They tell the story and based on that you can improve.

No .... they (I suppose you meant metrics) do not tell stories ... They
cannot ... they are simple, lifeless numbers ... But those lifeless
numbers have an amazing power. They let you (empower you), the human
tester (or, for that matter, anyone who wants to use them), tell a
story that you want to tell. So the focus is on you, not on the
metrics. Since you are "making up" the story, you can make up an
"improvement story". If you are careful and skeptical (like a tester)
you will soon notice the trap ... and avoid using metrics to take up
or suggest an improvement plan; instead you will use other, non-metric,
qualitative measures (the so-called not-so-useful subjective measures)
to set up a broad-based, well-grounded improvement plan. Will you?

Alternatively, you can keep your faith in metrics, spin a story of any
sort, and set someone on an improvement path. You can even succeed (in
most cases) in proving to them that they have improved ...

Can you see the difference between these two ways of understanding/
dealing with metrics?

>>>testing in it self does not improve the software it just "tells the story" about the software.

A small rephrase .. testing does not tell a story - the tester tells a
story (if he is allowed to tell one). In most cases where metrics are
worshipped, a tester has an easy exit ... just give the metrics that
the manager asks for, and let the manager tell whatever story he wants
to make out of them to his/her managers ....

>>> give me an inquiry metric and I'll show you that it may be used as a control metric.

I like the challenge ... before I get into it, let me say a few
words about what we are getting into ... As I have been repeating
(like a broken record), "metrics" allow you to tell *any*
story ... that applies even to the types of metrics we are talking
about, inquiry vs control ... metrics being metrics, you can flip
them any way you want. You might be looking to prove (or convince me)
that categories like inquiry and control metrics do not exist ... or
it might be your own curiosity to check the difference yourself. I
would say that you (anybody, for that matter) can present an inquiry
metric as a control metric (that is the beauty of metrics). But the
devil is in the details ... metrics being mere numbers, taking control
actions on the basis of them is like inviting danger. As I said, our
software measurement systems, our models of software, have not matured
to a degree where you can take measurements from them and use those
measurements for control actions.

Anyway, here is a metric (maybe not the best example) that I think of
as an inquiry metric ... go for it and flip it into a control metric
(all the best!!!):

Code coverage metric: given a set of tests (some unit tests in TDD
style and some manual test scripts), *this* code coverage tool,
"supercoverage", has produced a metric: 76.345 % code coverage
(consolidated - you can also ask for line coverage, function coverage,
branch coverage, cyclomatic complexity, etc.).

76.345 % of code covered is the metric in question - a number.
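
To make that number concrete, here is a rough sketch (in Python) of
how such a consolidated line-coverage figure is typically arrived at.
The per-file counts are invented, and "supercoverage" is made up, so
any real tool will differ in its details:

# Rough sketch: a consolidated line-coverage percentage is just
# executed lines divided by total lines, summed over the code base.
# All per-file counts below are invented for illustration.
executed = {"parser.py": 412, "engine.py": 1250, "ui.py": 310}
total    = {"parser.py": 500, "engine.py": 1600, "ui.py": 483}

covered_lines = sum(executed.values())
all_lines = sum(total.values())
coverage = 100.0 * covered_lines / all_lines
print(f"consolidated line coverage: {coverage:.3f} %")  # 76.345 % - one number, many stories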

I hope I have avoided "cryptic" terms ....

Shrini

Shrinik

Jan 1, 2009, 5:33:49 PM
>>> This is a real life store but it gives a good example of how careful one
should be about "plain numbers". Do not just measure. Think what the
difference you see actually means.

I agree with you, Vladimir .... Using metrics is somewhat like taking
action based on drug trials. Take, let us say, one trial experimenting
with the role of antioxidants in fighting cancer. The drug trial might
say that taking a specific dose of some vitamin reduced the risk of
developing cancer in a sample of 30 healthy men selected from
different ethnic (hereditary) backgrounds. Another example might be a
study that says the Asian-African population is prone to diabetes or
heart disease. Yet another example would be a drug trial or study that
says that men are more prone than women to developing kidney stones.

Like these studies .. metrics collect data about happenings in the
software and distill that information into a few numbers ... it is
like jumping from the moon to the earth (a high degree of
simplification). So be careful, doubt every metric that you are
presented with, and ask ... what could be the story (or stories)
behind the metric?

Software is a pretty complex thing in itself ... add humans to it and
there comes a huge set of variables like skill, culture, and social
and economic considerations, and that makes the whole scenario like an
ecosystem. One thing that you observe may be due to several parameters
and their combinations. One change that you make might lead to several
complex ramifications.

If you have this big picture of software (as a social-technical-
business-cultural system), you will take metrics with a "pinch of
salt" (meaning, be skeptical about them).

Shrini

dumitru.corobceanu

Jan 2, 2009, 8:46:12 AM
On 1 Jan, 22:14, Shrinik <Shri...@gmail.com> wrote:
[...]

> 76.345 % of code covered is the metric in question - a number.
>
> I hope I have avoided "cryptic" terms  ....
>
> Shrini

I guess I took the problem from the "wrong end of the stick" (I hope
there is such an expression).
"I read from the metrics" - that's why I was saying that metrics tell
the story. But it is actually me who "invents" the story, based on my
knowledge (or lack of knowledge) around those numbers.
Let me know if you agree with the following:
[Metrics don't tell a story. I read a story from the metrics.] And
that story, based on my personal knowledge, may be correct or wrong.

Code coverage metric: 76.345 %.
I want us to increase the coverage to 85% over the next few months.
A test pack would be considered "acceptable" based on how much
coverage it achieves (and not on WHAT it covers). This is the kind of
"outcome" I'm afraid of; that is what I'll have to fight to avoid. I'm
afraid that after defining a set of inquiry metrics, someone up there
will try to use them to control the work that is carried out.

tada...@yahoo.com

Jan 2, 2009, 9:15:57 AM

There are process metrics and there are product metrics. Product
metrics are measures of the product.

For instance, reliability measured by the failure rate under an
operational profile is a product metric.

If you develop a system incrementally, so that there is a growing
operational system available early in development, then you can
measure reliability during development.
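
As a rough sketch of what such a product metric can look like, with
all numbers invented and "operational hours" standing in for usage
weighted by the operational profile:

# Sketch: reliability as a failure rate under an operational profile.
# The numbers are invented for illustration.
operational_hours = 1200.0   # hours the incremental build ran under the profile
failures_observed = 3        # failures seen during that operation

failure_rate = failures_observed / operational_hours   # failures per hour
mtbf = operational_hours / failures_observed           # mean time between failures
print(f"failure rate: {failure_rate:.4f} failures/hour, MTBF: {mtbf:.0f} hours")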

I have seen no problem with guiding a project by focusing on
incremental milestone dates and reliability measures. But I would
think there might be a problem with focusing too much on process
metrics.

The other metrics I have used are early-project process metrics, like
the number of defects found during reviews, used to drive process
improvement. But those are not testing metrics.

Shrinik

Jan 2, 2009, 11:09:07 AM
>>>Metrics don’t tell a story. I read a story from the metrics- which based on my personal knowledge may be correct or
wrong

I think you are pretty close to what I am thinking .... Metrics CANNOT
(rather than "don't") tell a story. Instead, they allow anyone to
create any story they want to tell. You don't READ a story from a
metric (that would imply that there is a story, or stories, embedded
in the metric by default, "prepackaged" - which is not the case);
instead you CREATE a story to explain the metric. It is like
interpreting the results of an experiment (though interpreting
experimental results is a narrower and stricter regime). I also would
not use the terms "good" or "bad" ... there is nothing good or bad
here ... it is just the way you look at the world/numbers. In my
opinion there are no good or bad views ... they are just views ..
nothing more than that.


>>>>This is the kind of “outcome” I’m afraid of, that is want I’ll have to fight to avoid. I’m afraid that after defining a set of inquiry_metrics, someone up there will try to use them to control the work which is carried out.

You are spot on ... that is why it is important to restrict metrics to
the "inquiry" purpose rather than the "control" purpose. That is why
it is important to spread awareness of the difference between
"inquiry" and "control" metrics. That is why it is important to
prevent "someone up there" from flipping inquiry metrics into "control
metrics" ... In reality, control metrics are dangerous and to be
avoided ... How? One easy way is to talk about them, talk about the
difference, spread the awareness ... evangelize ...


Shrini

S Perryman

Jan 2, 2009, 1:08:43 PM
dumitru.corobceanu wrote:

> On 1 Jan, 22:14, Shrinik <Shri...@gmail.com> wrote:

> [...]

>>76.345 % of code covered is the metric in question - a number.

>>I hope I have avoided "cryptic" terms ....

> I guess I took the problem from the "wrong end of the stick" (I hope
> there is such expression).
> “I read from metrics” – that’s why I was saying that metrics tells the
> story. But it is actually me who “invents” the story based on my
> knowledge (or lack of the knowledge) around those numbers.
> Let me know if you agree with the fallowing:
> [Metrics don’t tell a story. I read a story from the metrics.] Which
> based on my personal knowledge may be correct or wrong.

> Code coverage metric: 76.345 %.
> I want that over the next few months we increased the coverage to 85%.
> A test pack would be considered “acceptable” based on how much
> coverage it does (And not on WHAT does it cover). This is the kind of
> “outcome” I’m afraid of, that is want I’ll have to fight to avoid.

Extending your example:

So we get a metric (M1) of 85%, which is better than the 76%.

Yet a metric (M2) defining the number of defects that escape into the
field beyond your test infrastructure has not changed from M2X to M2Y
(which is what someone wants for M2).

So the metrics in fact do "tell a story".
Improving M1 had no effect on M2.
Or, increasing code coverage did not decrease the escaped defects.

Which would prompt me to look at the infrastructure (inputs, output
tests etc) on which M1 is being measured.


> I’m afraid that after defining a set of inquiry_metrics, someone up there
> will try to use them to control the work which is carried out.

This is all about the utility of the metrics you define,
and the fact that many metrics are useless/meaningless in isolation
(there are relationships between some of them).

I suspect that I will soon discover how much this is the case (I am
about to begin an assignment with a client, building systems that
attempt to gather reams of different metrics that can be used via
data-analytics toolsets to provide useful information for customers to
control their telecoms networks) ...


Regards,
Steven Perryman

Shrinik

Jan 2, 2009, 2:53:54 PM
Responding to Perryman

>>.So we get a metric (M1) of 85% . Which is better than the 76%

That is the conclusion you make if you are only looking at numbers.
That is the power of metrics .. they let you say something without
probing the context ... it is like asking a kid which is better, 5
chocolates or 8 chocolates. The choice would then be obvious ...
(unless the kid hates chocolates or has recently undergone major
dental treatment). Metrics will make your life easier ... just know
some basic arithmetic .... and a high-level model ... if you are
talking about a defect density metric, a lower number is better,
whereas if you are talking about test coverage, a higher number is
better ....

>>> So the metrics in fact do "tell a story" .

A small correction ... you created the story and you are telling it. I
might come up with another story, or multiple stories, for the same
metric condition. Note that all of us could be RIGHT or WRONG .... we
are looking at one number from different angles and telling a story
based on the view that we see. It is like three people describing a
building while standing on three different sides of it. All are
right ... all are describing the view that they see and believe that
theirs is correct ... Metrics let you engage in such multidimensional
views.

>>> Improving M1 had no effect on M2. Or, increasing code coverage did not decrease the escaped defects.

Well, these two are stories that you have created out of the
metrics ... they might be right or they might not be ... until another
person comes up with a story that exactly contradicts yours ... That
is how people play with metrics ..

>>> Which would prompt me to look at the infrastructure (inputs, output tests etc) on which M1 is being measured.

You appear to be taking a control action based on the metrics that you
observe and the story that you make out of them ...

>>> This is all about the utility of the metrics you define.

Actually, that is true of literally every metric one can think
of ... Here is a heuristic: "that (the interpretation of metrics in
multiple ways) is a universal truth about metrics". Can you change
that?

I think we are all going in circles ... can we get out of them and
accept .. metrics are plain numbers ... We MAKE the stories, and a
single metric can have several hundred stories around it. Stories
around metrics are not embedded in the metrics ... they are created by
the people who define and use (or abuse?) them.

How many agree with me?

Shrini

Tom Royer

Jan 2, 2009, 6:38:50 PM
"Shrinik" <Shr...@gmail.com> wrote in message
news:26e1b9b0-1d75-4d49...@i18g2000prf.googlegroups.com...

I think that most are either coming to your view or already held it.

The key to metrics is to ensure that the metrics you choose are causally
related to the effect you really want to achieve. For example, most people
believe that simple code is better than complex code. The complexity of
module code can be measured by its cyclomatic complexity (the number of
independent paths through the code), but that's not the whole story: there
is empirical evidence that the highest-quality code (as measured by the
number of "escaped" errors) actually comes from code of medium complexity
(neither the simplest nor the most complex). That's why it's important to
carefully define the desired effects (end quality) and their relation to
the selected metrics.
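
For anyone who has not met the measure, a minimal illustration (the
function is invented; the complexity is counted with the usual
decision-points-plus-one rule):

# A toy function with two decision points (the if and the elif),
# so its cyclomatic complexity is 2 + 1 = 3: three independent
# paths exist (x > 0, x < 0, and x == 0).
def sign(x):
    if x > 0:
        return 1
    elif x < 0:
        return -1
    return 0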

A single number is like a speedometer reading in your car: what are you
trying to determine? If your goal is to minimize the accident potential of
your car, you probably want to look for lower (but not too low) values of
speed. If you're looking to minimize drive time, you might opt for higher
(but not too high) values. It depends on what you want.


Shrinik

Jan 3, 2009, 4:44:13 AM
Responding to Tom,

That is a great way to put it ... I especially liked the way you
compared a metric (a single number) to a speedometer reading ... I
will expand on that in a separate response ...

>>>The key to metrics is to ensure that the metrics you choose are causally related to the effect you really want to achieve.

I think there is more to it .... It is desirable for the relationship
that you would like to establish between a metric and an effect to be
more formal and direct (rather than casual). More direct is
better .... Also, notice that this mapping between a metric and the
effect one would like to relate it to is a FORMAL way of saying "Here
is a story that I want to tell using this metric".

Talking about the relationship between a metric and an effect ... what
comes to my mind is the model that relates the two. For example, say
we are talking about defect density. In general terms, a model of a
metric would consist of "what we want to measure", "what we actually
measure", and a link or relationship between the two. So we take a
count of the defects (the number, not the act of counting), and count
(or calculate) "lines of code" or "number of function points" or any
measure that indicates size (volume, precisely) - then divide the
first measure by the second. This is our defect density metric - so
many defects per line of code. How people would interpret the
metric ... a lower value means better quality ... but how low is low,
or how high is high ..... that is context-dependent.
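
A tiny worked example of that model, with invented counts and "per
KLOC" as one common way of normalising the size measure:

# Defect density: what we actually measure (a defect count and lines of
# code) standing in for what we want to measure (quality). Counts invented.
defects_counted = 37
lines_of_code = 24_500

defects_per_kloc = defects_counted / (lines_of_code / 1000)
print(f"defect density: {defects_per_kloc:.2f} defects per KLOC")
# Whether ~1.51 per KLOC is "good" depends on what was counted as a
# defect, how the code was sized, and the context - the model behind
# the number, not the number itself.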

Another example - as Tom pointed out ... people relate the cyclomatic
complexity metric to code quality: the lower the CC metric, the better
the code quality (by the way, that would be an example of an inquiry
metric getting used as a control metric).
Are they related? People measure the CC metric thinking that it will
give them a measure of code quality .. how far are they correct? When
are they likely to go totally wrong?

This model of defect density assumes that the count of defects is
related to the underlying "faults" or errors in the code. How far is
this true? Also, are lines of code or function points a reasonable
measure of the size or volume of the code?

I think the key to an effective metric is to understand the model
behind the metric and to test it (scrutinize it) to determine how far
the relationship between what is being measured and what people think
they are measuring is valid - and, most importantly, when that
relationship fails to be valid...

Shrini

Tom Royer

Jan 3, 2009, 6:22:26 AM
"Shrinik" <Shr...@gmail.com> wrote in message
news:242a100f-863e-4445...@p2g2000prf.googlegroups.com...

> Responding to Tom,
>
> That is a great way to put it ... I especially liked the way you
> compared to metric (of single number ) to speedometer reading ... I
> will expand on that in a separate response ...
>
>>>>The key to metrics is to ensure that the metrics you choose are causally
>>>>related to the effect you really want to achieve.
>
> I think there is more to it .... It is desirable to have relationship
> that you would like to establish between a metric and a effect as more
> formal and direct (than casual).
Perhaps my spelling betrayed me. When I said "causally", I meant "related
to the fundamental cause", not "informally."

Michael Bolton

Jan 3, 2009, 4:16:05 PM
On Jan 2, 8:46 am, "dumitru.corobceanu" <dumitru.corobce...@gmail.com>
wrote:

How do you measure code coverage? By executing the application under
a code coverage tool? If so, note

- the tool likely tells you how much of the code was executed; it
doesn't tell you whether the code you covered was the
important code;
- the tool does tell you that some code was not executed while you
were testing, but again doesn't tell you anything about whether the
unexecuted code was important or not;
- the tool (typically) doesn't tell you about code that was (or
wasn't) executed that isn't currently under scrutiny. That is, the
tool doesn't tell you about operating system code, third-party
libraries, or your own libraries that haven't been instrumented by the
tool;
- the tool tells you about the code that was executed while you were
testing; it doesn't tell you what was observed or what was sought--
that is, it tells you only that the code was executed, but not whether
it was /tested/ (see the sketch after this list);
- the tool doesn't tell you about which conditions or values weren't
tested (although it may give you hints about that);
- the tool doesn't tell you about code that /was/ tested, but that
wasn't executed while the tool was running (that is, the tool doesn't
tell you anything about whether the code was reviewed or unit tested).
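
To make the "executed is not the same as tested" point concrete, here
is a minimal sketch; it assumes nothing beyond plain Python functions,
no particular coverage tool or test framework:

# A "test" that merely executes the code gives 100% line coverage of
# discount() while asserting nothing, so a wrong rate would never be caught.
def discount(price, is_member):
    if is_member:
        return price * 0.9
    return price

def test_discount_touches_both_branches():
    discount(100, True)    # member branch executed
    discount(100, False)   # non-member branch executed
    # no assertions: every line above is "covered", but nothing was checked

def test_discount_actually_checks_something():
    assert discount(100, True) == 90
    assert discount(100, False) == 100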

In addition to all that, I'm curious as to why you would want the
coverage number to be 85%, instead of 100%. Is there something
magical about 85%?

This is an excellent example of what Shrini was talking about with
respect to goal displacement; notice how quickly and how easily you've
chosen to use a control metric here. A control metric says "We have
to get that 76.345% number up to 85%," and begs the questions that
would be prompted by an inquiry metric:

- "Why do our tests seem to miss fully one-quarter of the code we're
supposed to be testing?"
- "Are we happy with that number because we feel that our review
process and our unit tests cover the remaining stuff just fine?"
- "Do we need some kind of special equipment or artificial tools to
cover the code that appears to be uncovered by our tests?"
- "What would we have to do to test the other 23.655% of the code?"
- "Are we fooling ourselves into believing that we're doing a good job
of testing, just because that number is around 3/4?"
- "What might remain uncovered even if we were to hit 100% code
coverage?"

---Michael B.

Michael Bolton

Jan 3, 2009, 4:28:54 PM
On Jan 2, 1:08 pm, S Perryman <q...@q.net> wrote:

> Yet a metric (M2) defining the number of defects that escape into the field
> beyond your test infrastructure has not changed from M2X, to M2Y (which is
> what someone wants for M2) .
>
> So the metrics in fact do "tell a story" .
> Improving M1 had no effect on M2.
> Or, increasing code coverage did not decrease the escaped defects.

The numbers don't tell the story. Any story that they tell is
completely impoverished unless or until you choose to observe some
other factors. Maybe...

- the increased code coverage trapped more /important/ defects, such
that there were the same number of escaped defects, but they were less
serious
- the increased code coverage trapped /fewer/ important defects, such
that there were the same number of escaped defects, but they were more
serious
- the number of defects that /actually/ escaped this time was way up,
but the clients have given up on reporting them because our
responsiveness was so rotten
- the number of defects that /actually/ escaped this time was way
down, but the clients are reporting more less-serious problems because
our responsiveness was so great
- some M3, like the quality of work from the developers this time out,
cancelled out the effect of M2
- tech support is burying some of the incoming reports because they
found workarounds last time
- the tech support folk aren't even trying to solve most of the
problems, and they aren't burying reports like they used to
- it would be a good idea to ship with lots of problems because if we
don't ship at all, there's no revenue booked this quarter and the drop
in stock price sinks the company

(I'm only just getting started here.)

A handful of numbers can be used to launch a thousand stories if you
allow them to do what they want. Data is multivariate. If we're
looking to tell the truth, we need to recognize that the world is a
complicated place. Better to ask questions, many questions, before
drawing conclusions.

---Michael B.

Shrinik

Jan 3, 2009, 6:25:16 PM
Notice that the items Michael has listed - before, as he says, he has
even really got started - are the stories that can be created around
one metric (a low number of escaped defects). One metric - many
stories. Which one do you want to hear today?

- the increased code coverage trapped more /important/ defects, such
that there were the same number of escaped defects, but they were less
serious
- the increased code coverage trapped /fewer/ important defects, such
that there were the same number of escaped defects, but they were more
serious
- the number of defects that /actually/ escaped this time was way up,
but the clients have given up on reporting them because our
responsiveness was so rotten
- the number of defects that /actually/ escaped this time was way
down, but the clients are reporting more less-serious problems because
our responsiveness was so great
- some M3, like the quality of work from the developers this time out,
cancelled out the effect of M2
- tech support is burying some of the incoming reports because they
found workarounds last time
- the tech support folk aren't even trying to solve most of the
problems, and they aren't burying reports like they used to
- it would be a good idea to ship with lots of problems because if we
don't ship at all, there's no revenue booked this quarter and the drop
in stock price sinks the company

<temporary end of list ... >

This is an example to illustrate that a given number (such as 78% code
coverage), or a numerical comparison of two metrics (a still worse
thing to do: "in this cycle we had 4 defects escaping to production,
as compared to only 2 defects in the last release"), can be
manipulated in multiple ways.

So be CAREFUL .... if you are a METRIC CONSUMER (one who is required
to understand metrics and do something about them), take the inquiry
route ... listen to many stories, encourage people to tell many
stories ... ask each group (dev, test, PM, support, sales) to
interpret the same metric ... you will then have good data/information
about the thing you are trying to understand....

If you are a METRIC CREATOR ... (one who defines the formula for a
metric, who collects the data for it, etc.) then you don't have to
worry much ... just make sure your story is correct and sound. If
possible, imagine the stories other parties could come up with ... you
might have to defend your story if another party attacks it ... But at
the end of the day, both are stories ...

I wonder why no one has mentioned Victor Basili's GQM model
here ....

http://en.wikipedia.org/wiki/GQM
ftp://ftp.cs.umd.edu/pub/sel/papers/gqm.pdf


Shrini

S Perryman

Jan 4, 2009, 6:47:34 AM
Michael Bolton wrote:

> On Jan 2, 1:08 pm, S Perryman <q...@q.net> wrote:

>>Yet a metric (M2) defining the number of defects that escape into the field
>>beyond your test infrastructure has not changed from M2X, to M2Y (which is
>>what someone wants for M2) .

>>So the metrics in fact do "tell a story" .
>>Improving M1 had no effect on M2.
>>Or, increasing code coverage did not decrease the escaped defects.

> The numbers don't tell the story. Any story that they tell is
> completely impoverished unless or until you choose to observe some
> other factors. Maybe...

[ snipped ... but read and acknowledged ]


> A handful of numbers can be used to launch a thousand stories if you
> allow them to do what they want. Data is multivariate. If we're
> looking to tell the truth, we need to recognize that the world is a
> complicated place. Better to ask questions, many questions, before
> drawing conclusions.

No argument with that. This is merely common sense (well, it is to me).

As I stated at the outset, we gather metrics on the assumption that they
tell us something useful.

When a metric doesn't do so, cease collecting it.
When you suspect metrics might be telling you something, but are not
so sure what or why (relationships etc.), do due diligence where
possible.

But I am not going to stop gathering metrics solely because it is
possible for people to make "stories" out of them for their own
misguided or disingenuous ends.


That aside, I would be interested to know what metrics experienced test
teams are using to try to give some objective indication about the
effects of their work.


Regards,
Steven Perryman

dumitru.corobceanu

Jan 4, 2009, 8:29:43 AM

The point of the exercise was to prove that one can transform an
inquiry metric into a control one, and do it badly! The main idea was
to highlight that the focus shifts to "the number" instead of to code
quality. All the things you pointed out were omitted intentionally, to
highlight my concern with introducing metrics. As I said in that post:
"[...] This is the kind of "outcome" I'm afraid of [...]"

Shrinik

Jan 4, 2009, 9:13:30 AM
Responding to Perryman

>>>As I stated at the outset, we gather metrics on the assumption that they tell us something useful.

It would be more appropriate to say "We gather metrics on the
assumption that WE (not the metrics themselves) WILL BE ABLE TO TELL
some USEFUL STORY to others (stakeholders)". I think that is the key
difference that we should understand firmly if we want to define and
use metrics well. It is dangerous to believe that metrics tell a
story ... because 1) they cannot tell a story (there is no story
embedded in them at all) and 2) it places a blind belief in metrics,
and other people take advantage of that.

>>>When a metric doesn't do so, cease collecting it.

In other words ... stop collecting a metric if the story that you
would like to make out of it is not plausible, or the story cannot
stand the scrutiny of others, or others have (or can make) a better
story out of this metric .. so let them....
See, the moment you start thinking that *you* are at the center of the
whole thing, not the metrics - you create the metric (model, theory
and/or formula), you collect the data, you make a story, or you junk a
story, or you support a story that others have created from the
metrics that you have defined - the whole world changes.....

>>> But I am not going to not gather metrics solely because tis possible for people to make "stories" out of them for their own mis-guided or disingenuous means.

That is fair enough ... and it indicates your firm belief that, no
matter what happens, you will keep creating and using metrics. In your
context ... that might be a wise thing to do ... But I would like to
mention one thing .... because of social, cultural and group-dynamics
issues, no matter how good your metrics are (in your own belief
system), people will make stories about them that often go in opposite
directions ..... That is the power metrics give you (the user) ...

Shrini

Jorgen Grahn

Apr 16, 2009, 11:05:18 AM
On Thu, 1 Jan 2009 04:55:08 -0800 (PST), Shrinik <Shr...@gmail.com> wrote:
> Hi Steven,
>
>>>>You really aren't helping the OP here IMHO with the cryptic words.
> I am willing to explain the cryptic words ... any word can be treated
> as cryptic depending upon the level of understanding of the parties in
> the conversation.

> BTW ... what is OP here ...is not that cryptic? (it
> is certainly for me)

OP means "original poster": whoever asked the question and started the
thread. It is fairly well known. Slightly less known than "IMHO" and
"BTW", maybe.

/Jorgen

--
// Jorgen Grahn <grahn@ Ph'nglui mglw'nafh Cthulhu
\X/ snipabacken.se> R'lyeh wgah'nagl fhtagn!
