Test-driven Administration vs. Monitor-driven Administration?


Patrick Debois

Mar 17, 2009, 5:41:18 AM
to agile-system-...@googlegroups.com
While making an analogy between software development and system administration, I often hear people compare software testing with monitoring in the sysadmin world.

I agree that monitoring is a crucial part of being on top of the situation: typically we would have

Unit tests
  • thresholds on cpu, disk, network usage
  • check for processes
  • check validity of config files
  • check logfiles for warnings
  • monitor security ports, checksum of files
Functional tests
  • send a test mail
  • call a webpage and check the http status
  • send a query to the database
Integration tests
  • login, post a test entry, ...
  • login, send a test mail and see if it arrives
(do these categories seem right as a mapping?)
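
For illustration, the 'call a webpage and check the http status' item could be as small as the following sketch (the URL is a placeholder; exit codes follow the usual 0=OK / 1=WARNING / 2=CRITICAL monitoring convention):

    #!/bin/bash
    # Minimal functional check: fetch a page and verify the HTTP status.
    # The URL is a placeholder; exit codes follow the common monitoring
    # convention (0 = OK, 1 = WARNING, 2 = CRITICAL).
    URL="http://www.example.com/"

    STATUS=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$URL")

    if [ "$STATUS" = "200" ]; then
        echo "OK - $URL returned 200"; exit 0
    elif [ -z "$STATUS" ] || [ "$STATUS" = "000" ]; then
        echo "CRITICAL - $URL unreachable"; exit 2
    else
        echo "WARNING - $URL returned $STATUS"; exit 1
    fi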

In case of alarm they would page us, send a mail, or send an SNMP trap to tell us what is happening.
In production we can often only do read-only tests and scenarios that are non-destructive.
If this is the only thing you do as a sysadmin, it is a rather fatalistic approach.

Therefore IMHO an agile sysadmin should go beyond this 'monitor' approach:
within the test environment he should actively develop scenarios and test them.

Let's say you enable a RAID file system:
  • Behavior testing: As a user I want to read and write files from /data
    • Scenario: given a disk has crashed, I want to have no problem writing to /data
      • A Fixture would be: dd if=/dev/null of=/dev/hda1
    • Scenario: given a disk is full, I don't want to lose my data
      • A Fixture would be: write a large file to disk until it is full
    • Scenario: given a heavy load, I don't want to lose data
      • A Fixture could be: a lot of fork and exec calls simulating heavy load
    • Scenario: if the network connection is interrupted, no data must be lost in the application
      • A mock could be: using iptables to simulate the network failure
These scenarios tell a better truth than df -k or an SNMP trap, no?
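
As a very rough sketch of the first scenario as an automated test (it assumes /data lives on a Linux md RAID1 array /dev/md0 built from /dev/sda1 and /dev/sdb1 — device names invented — and that it only ever runs on a test box, never production):

    #!/bin/bash
    # Sketch: "given a disk has crashed, I can still write to /data".
    # Assumes /data is on the md RAID1 array /dev/md0 made of /dev/sda1
    # and /dev/sdb1 (hypothetical devices) on a *test* machine.

    # Fixture: simulate the crashed disk by failing one mirror member.
    mdadm --manage /dev/md0 --fail /dev/sdb1

    # Behaviour under test: writing to /data must still succeed.
    if echo "test entry $(date)" > /data/raid-test.txt && sync; then
        echo "PASS: write to /data succeeded with a failed member"
    else
        echo "FAIL: write to /data failed"; exit 1
    fi

    # Teardown: remove and re-add the member so the array rebuilds.
    mdadm --manage /dev/md0 --remove /dev/sdb1
    mdadm --manage /dev/md0 --add /dev/sdb1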

What are you people testing before you are confident that a system is 'under control'?
Or is this still ad-hoc testing after an automated install?



Gildas Le Nadan

Mar 17, 2009, 1:45:34 PM
to agile-system-...@googlegroups.com
Patrick Debois wrote:

Hi Patrick :)

> While making an analogy between software development and system
> administration, I often hear people compare software testing with
> monitoring in the sysadmin world.
>
> I agree that monitoring is a crucial part of being on top of the
> situation: typically we would have
>
> Unit tests
>
> * thresholds on cpu, disk, network usage
> * check for processes
> * check validity of config files
> * check logfiles for warnings
> * monitor security ports, checksum of files
>
> Functional tests
>
> * send a test mail
> * call a webpage and check the http status
> * send a query to the database
>
> Integration tests
>
> * login, post a test entry, ...
> * login, send a test mail and see if it arrives
>
> (do these categories seem right as a mapping?)

Yep for the integration test.

I'm not sure of the granularity for the unit tests vs functional tests.
Your implied rule seems to be system level vs. application level?

Some examples are definitely in the grey area...

> In case of alarm they would page us, send a mail, or send an SNMP trap
> to tell us what is happening.
> In production we can often only do read-only tests and scenarios that
> are non-destructive.

Agreed, but stay aware that non-destructive != read-only.

You can have scenarios that do changes (say, a transaction on a website
for a dummy customer).
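
Something along these lines, for instance (the host, credentials and response format are entirely made up), where the test entry is cleaned up again afterwards:

    #!/bin/bash
    # Sketch of a non-destructive "write" check: create an order for a dummy
    # customer and remove it again. Host, credentials and the JSON "id"
    # field are hypothetical.
    BASE="https://shop.example.com"
    AUTH="dummyuser:dummypass"

    ID=$(curl -s -u "$AUTH" -X POST -d 'item=dummy&qty=1' "$BASE/orders" \
         | sed -n 's/.*"id":\([0-9]*\).*/\1/p')

    if [ -n "$ID" ]; then
        echo "OK - created test order $ID, cleaning it up"
        curl -s -u "$AUTH" -X DELETE "$BASE/orders/$ID" > /dev/null
        exit 0
    else
        echo "CRITICAL - could not create a test order"
        exit 2
    fi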

> If this is the only thing you do as a sysadmin, it is a rather
> fatalistic approach.
>
> Therefore IMHO an agile sysadmin should go beyond this 'monitor' approach:
> within the test environment he should actively develop scenarios and
> test them.
>
> Let's say you enable a RAID file system:
>
> * Behavior testing: As a user I want to read and write files from /data
>   o Scenario: given a disk has crashed, I want to have no
>     problem writing to /data
>     + A Fixture would be: dd if=/dev/null of=/dev/hda1

I'm not sure what you mean in that section, but I think the example
should be
dd if=/dev/zero of=/dev/hda1
or
dd if=/dev/random of=/dev/hda1

>   o Scenario: given a disk is full, I don't want to lose my data
>     + A Fixture would be: write a large file to disk until it is full
>   o Scenario: given a heavy load, I don't want to lose data
>     + A Fixture could be: a lot of fork and exec calls simulating
>       heavy load
>   o Scenario: if the network connection is interrupted, no data must
>     be lost in the application
>     + A mock could be: using iptables to simulate the network failure
>
> These scenarios tell a better truth than df -k or an SNMP trap, no?
>
> What are you people testing before you are confident that a system is
> 'under control'?
> Or is this still ad-hoc testing after an automated install?

I would prefer doing automated testing.

On new hardware, I would run subsystem stability tests: continuous
bonnie++ to stress I/O, cpuburn to stress the CPU, memtest to stress memory,
smartctl to test hard drives, and so on, to allow early detection of
hardware issues.
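
A very rough unattended version of that could look like this (devices, users and durations are placeholders; check each tool's man page before trusting the exact flags):

    #!/bin/bash
    # Sketch of an unattended burn-in run; review each tool's options first.
    LOG=/var/log/burnin.log
    mkdir -p /tmp/burnin

    # SMART health check on each disk (device list is hypothetical).
    for disk in /dev/sda /dev/sdb; do
        smartctl -H "$disk" >> "$LOG" 2>&1
    done

    # Stress I/O in the background with bonnie++ (needs a non-root user).
    bonnie++ -d /tmp/burnin -u nobody -x 5 >> "$LOG" 2>&1 &
    BONNIE_PID=$!

    # Stress every CPU with a burner from the cpuburn package.
    for _ in $(seq 1 "$(getconf _NPROCESSORS_ONLN)"); do
        burnP6 &
    done

    sleep 3600                                  # let it cook for an hour
    kill "$BONNIE_PID" 2>/dev/null
    killall burnP6 2>/dev/null

    # Flag the run if any drive reports a failing SMART health status.
    grep -qi "overall-health.*FAILED" "$LOG" && exit 2
    exit 0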

On new production software, in a perfect world I'd prefer:
- load testing with either simple or advanced test scenarios
- optionally using fuzzing tools to test robustness.

I like your ideas of automating failures with iptables for instance if
you want to test configurations with failovers, but I think it's not
easy to configure so it can be used "unattended".
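
A self-restoring failure window might be a starting point for the unattended case — something like this sketch (the target address is a placeholder; the trap makes sure the DROP rule is removed even if the test script dies):

    #!/bin/bash
    # Sketch of an "unattended" network failure: drop traffic to a dependency
    # for a fixed window, then always restore it, even if the script aborts.
    TARGET=192.0.2.10        # placeholder: e.g. the database behind the failover
    WINDOW=60                # seconds of simulated outage

    restore() { iptables -D OUTPUT -d "$TARGET" -j DROP 2>/dev/null; }
    trap restore EXIT        # guarantee cleanup on any exit

    iptables -A OUTPUT -d "$TARGET" -j DROP
    echo "dropping traffic to $TARGET for ${WINDOW}s"
    sleep "$WINDOW"

    # Here the test would assert the behaviour expected during/after the
    # outage, e.g. that no writes were lost once connectivity returns.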

HTH
Gildas

Paul Nasrat

Mar 18, 2009, 6:10:22 AM
to agile-system-...@googlegroups.com
2009/3/17 Patrick Debois <patrick...@gmail.com>:

> I agree that monitoring is a crucial part of being on top of the situation:
> typically we would have

>
> Unit tests
>
> thresholds on cpu, disk, network usage
> check for processes
> check validity of config files
> check logfiles for warnings
> monitor security ports, checksum of files
>
> Functional tests
>
> send a test mail
> call a webpage and check the http status
> send a query to the database
>
> Integration tests
>
> login, post a test entry, ...
> login, send a test mail and see if it arrives
>
> (do these categories seem right as a mapping?)

I quite like thinking about tests in terms of the quality and
confidence they provide you with. From a development
perspective there is one set of definitions and a nice
diagram of quality/feedback here:

http://www.mockobjects.com/book/tdd-introduction.html#feedback-from-tests

I've seen a lot of people struggle with the naming of what Nat & Steve
call Integration Tests and Acceptance Tests (Functional Tests,
Customer Tests, System Tests).

To a development team Integration Tests would be tests that exercise
the integration with other systems, which map to your Functional Test
category and your Integration Tests would map to Acceptance Tests.

It's important to think about what the different test levels give you
and how that maps to operations/systems engineering. Also we're
hitting the fact you illustrated that testing and monitoring are
conflated.

For development, internal quality of code driven by unit tests enables
ease of understanding and ease of change.

Your BDD-style example captures intent much more and would allow you
to potentially change the implementation - e.g. I've seen cases where
things such as kernel parameters for tuning, say, nfs get baked into a
system and they calcify: no-one remembers the exact criteria for
success, and worse, they may vary between systems. Having a set of
tests focussed on that component allows you to say things like "I
expect a throughput of X so that ...". Then if you change your backend
storage to a filer or another distributed filesystem you can ensure
you meet the requirements, and remove legacy configuration settings.

However, in a lot of these cases we'll be testing around the
interfaces/edges of systems, and I'm not convinced they are unit
tests. Really they are the acceptance tests of a particular story
around system performance, which will usually have some business value
attached, as in your scenarios. To me we'd also probably be writing
unit tests at the implementation level, asserting that the right
configuration is there.
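
For example, such an implementation-level test might simply assert the tuning we believe we depend on (the parameter names and values below are invented for illustration, not recommendations):

    #!/bin/bash
    # Sketch of "unit tests" at the implementation level: assert that the
    # configuration we think we rely on is actually in place.
    fail=0

    expect_sysctl() {
        local key=$1 want=$2 got
        got=$(sysctl -n "$key")
        if [ "$got" != "$want" ]; then
            echo "FAIL: $key is $got, expected $want"
            fail=1
        fi
    }

    # Hypothetical tuning values.
    expect_sysctl net.core.rmem_max 8388608
    expect_sysctl net.core.wmem_max 8388608

    # Assert the nfs mount options the throughput figures were based on.
    grep -q " /data nfs.*rsize=32768" /proc/mounts \
        || { echo "FAIL: /data not mounted with the expected rsize"; fail=1; }

    exit $fail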

One thing that strikes me is that, unlike developing in OO, it isn't
always obvious what the owner of the behaviour is, and also how to
link the test to the implementation language (which will often end up
being config). If you're writing systems tests in this fashion, how are
you finding refactoring or dealing with changing requirements?

Paul

Patrick Debois

Mar 19, 2009, 7:49:21 AM
to agile-system-...@googlegroups.com

> I'm not sure what you mean in that section, but I think the example
> should be
> dd if=/dev/zero of=/dev/hda1
>
Spot on!

>> I would prefer doing automated testing.
>>

>> ...


>>
>> I like your ideas of automating failures with iptables for instance if
>> you want to test configurations with failovers, but I think it's not
>> easy to configure so it can be used "unattended"

Of course you can't test everything, and you have to strike a good
balance between the effort you spend and the benefit you get.

Unattended requires a good deal of automation, I agree. Virtualization
can help to get control over this, but it does not have to restrict you.
Rebooting or other stuff might be done via the Lights Out Management
interface, network interfaces by scripting the config of a Cisco router ...
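
For instance (hostnames, credentials and interface names are made up; the router commands are IOS-style and would need adapting to your gear, or an expect script if the device insists on an interactive session):

    #!/bin/bash
    # Sketch: driving failure fixtures from outside the box under test.

    # Power-cycle a server through its lights-out / IPMI interface.
    ipmitool -I lanplus -H bmc-web01.example.com -U admin -P "$IPMI_PASS" \
        chassis power cycle

    # Bounce a router port over SSH (IOS-style commands, placeholder host).
    printf 'configure terminal\ninterface GigabitEthernet0/1\nshutdown\n' \
        | ssh admin@router1.example.com
    sleep 60
    printf 'configure terminal\ninterface GigabitEthernet0/1\nno shutdown\n' \
        | ssh admin@router1.example.com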

Did you have a specific, difficult-to-configure scenario in mind?
I like a challenge ;-)

Gildas Le Nadan

Mar 19, 2009, 8:14:35 AM
to agile-system-...@googlegroups.com
Patrick Debois wrote:
>>> I would prefer doing automated testing.
>>>
>>> ...
>>>
>>> I like your ideas of automating failures with iptables for instance if
>>> you want to test configurations with failovers, but I think it's not
>>> easy to configure so it can be used "unattended"
> Of course you can't test everything, and you have to strike a good
> balance between the effort you spend and the benefit you get.
>
> Unattended requires a good deal of automation, I agree. Virtualization
> can help to get control over this, but it does not have to restrict you.
> Rebooting or other stuff might be done via the Lights Out Management
> interface, network interfaces by scripting the config of a Cisco router ...
>
> Did you have a specific, difficult-to-configure scenario in mind?
> I like a challenge ;-)

Sure, you can probably automate everything, but my point was that some
things might cost you so much it's not worth it.

For instance if you want to test all the possible failure cases that
might happen in a highly available distributed solution, you can spend
an awful lot of time. (Of course, the more complex it is the more you
have to test, hence we're back to the KISS paradigm.)

I think the low hanging fruit come from the automated integration tests:
checking that after the deployment of a new software version, it works
as intended. There, high level tests such as playing a set of scenarii
once, then in a load testing fashion is probably your best bet.

Hence we're back to the idea that continuous integration tests are the
next step for Ops if you want to gain in maturity, when considering an
environment where you integrate something that is developed in-house.
This is quite a restricted field of application...

Stuff like validation of Operating Systems, hardware and so on can
probably benefit from a subset of this (think performance tweaking of an
OS for instance as was mentioned earlier, checking that it's still valid
after a SP/new version of this OS, or with the new SAN/network
switches /etc is probably worth it) but are imho more often a one shot
manual operation. Checking the performances of your SAN storage when in
degraded rebuild mode make sense of course, but it's far from being easy
to automate.

And I noticed that even in the limited context of continuous
integration, you haven't mentioned the data migration part yet... ;)

Cheers,
Gildas

Patrick Debois

Mar 19, 2009, 8:36:33 AM
to agile-system-...@googlegroups.com

> Sure, you can probably automate everything, but my point was that some
> things might cost you so much it's not worth it.
>
> For instance if you want to test all the possible failure cases that
> might happen in an highly-available distributed solution, you can spend
> an awful lot of time. (Of course, the more complex it is the more you
> have to test, hence we're back to the KISS paradigm.)
>
>
Yep, I've seen clustering being removed and replaced by rapid
re-deployment.

> I think the low hanging fruit come from the automated integration tests:
> checking that after the deployment of a new software version, it works
> as intended. There, high level tests such as playing a set of scenarii
> once, then in a load testing fashion is probably your best bet.
>
> Hence we're back to the idea that continuous integration tests are the
> next step for Ops if you want to gain in maturity, when considering an
> environment where you integrate something that is developed in-house.
> This is quite a restricted field of application...
>
I don't think it has to do with whether it is developed in-house or out-of-house.

> Stuff like validation of Operating Systems, hardware and so on can
> probably benefit from a subset of this (think performance tweaking of an
> OS for instance as was mentioned earlier, checking that it's still valid
> after a SP/new version of this OS, or with the new SAN/network
> switches /etc is probably worth it) but are imho more often a one shot
> manual operation.
You might see it as a one-shot; still, when you want to keep up with
security patches for the OS, database, middleware and framework,
I would most certainly want to have more testing available, in- or out-of-house.

I feel one of the reasons we don't patch that often is that we are
afraid of the impact of the patches.
It used to be like that in development with deployments too: don't
deploy often because you break things.
And they have succeeded with deploying often now because they have been
investing in tests.
So maybe sysadmins should do the same?


> Checking the performances of your SAN storage when in
> degraded rebuild mode make sense of course, but it's far from being easy
> to automate.
>
> And I noticed that even in the limited context of continuous
> integration, you haven't mentioned the data migration part yet... ;)
>

Ah, you got me here ;-)

Patrick Debois

Mar 19, 2009, 9:17:11 AM
to agile-system-...@googlegroups.com
Paul,

thanks for the pointer, it really helped me visualize the difference
between these tests and that got me thinking:

Why did I say that df -k and cpu are more about internal quality than
external quality?
Well in our case we have a multiserver setup with loadbalancers in front
of it, so having high CPU and disk usage on one server
might not have an impact on the actual users, so I figure it has to do
with internal quality more.
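
So roughly, the two views could be checked separately, something like this sketch (hostnames and thresholds are invented):

    #!/bin/bash
    # Sketch: external quality = what the user sees through the loadbalancer,
    # internal quality = headroom on the individual servers behind it.

    # External: hit the site the way a user would.
    curl -sf --max-time 5 http://www.example.com/ > /dev/null \
        && echo "external: OK" || echo "external: FAILING"

    # Internal: per-node disk headroom; one node over the threshold does not
    # necessarily hurt users, but it is a warning sign worth tracking.
    for host in web01 web02 web03; do
        usage=$(ssh "$host" "df -kP /data | awk 'NR==2 {print \$5}'" | tr -d '%')
        [ -n "$usage" ] && [ "$usage" -gt 90 ] && echo "internal: $host /data at ${usage}%"
    done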

I agree that all changes should ideally happen based upon a user story.
But as you mentioned, who is our user?
Is it the application (requiring a place to be run), is it the end user
(having an infrastructure that he can reach), is it the developers (who
want to deploy their app)?
User stories don't always come from project mode; they might be
tickets coming in from the end users or security, or bugfix patches coming
in from vendors.


On your question on refactoring, would you consider the following cases
refactoring?

During Project Mode:

Say you want to start a new application; developers start right away.
Often you only have a small piece they can use to develop against, while
the actual hardware is being ordered.
I often have seen the Big Design Upfront syndrome within the sysadmin
group: we don't release the environment to development unless it is
completely finished.
This is totally wrong! Both sysadmin and developer can learn by doing
their first deployment in a not-so-finished environment.
So you start with one server (doing web, db, dns, firewall, ldap, ...,
backup, ..) and step by step, when the hardware becomes available, you do
the migrations.
You switch from host files to DNS, you set up a dedicated firewall
instead of iptables on the box, a dedicated router instead of vlans on
the linux box.
Two network interfaces instead of one for bonding, teaming.
So you change your environment iteratively instead of incrementally.

During Production Mode:

If you have a lot of shared services such as mail, imap, dns, firewall,
proxy, chances are that you have to refactor your environment to
accommodate changes.
Changing IP ranges, host names, routings and patches all introduce changes.
Let's say you have to move a website to a bigger server, or newer
hardware. You are actually changing the environment. And similar to TDD,
the important thing is that after the change things keep on working.
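
A behaviour-level check that should stay green across those changes might be as simple as the following sketch (the hostname is a placeholder; it deliberately does not care whether resolution comes from a hosts file or DNS, or which box answers):

    #!/bin/bash
    # Sketch: a check that only cares about behaviour, not implementation,
    # so it should keep passing across the refactorings described above.
    NAME=www.example.com          # hypothetical service name

    getent hosts "$NAME" > /dev/null \
        || { echo "FAIL: $NAME does not resolve"; exit 2; }

    curl -sf --max-time 10 "http://$NAME/" > /dev/null \
        || { echo "FAIL: $NAME resolves but does not serve pages"; exit 2; }

    echo "OK: $NAME still behaves the same after the change"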

Paul Nasrat

Mar 19, 2009, 9:17:22 AM
to agile-system-...@googlegroups.com
>> Hence we're back to the idea that continuous integration tests are the
>> next step for Ops if you want to gain in maturity, when considering an
>> environment where you integrate something that is developed in-house.
>> This is quite a restricted field of application...
>>
> I don't think it has to do with whether it is developed in-house or out-of-house.

Most of my experience has been supporting bespoke/in-house development
teams. Sadly a lot of operations teams I've seen don't even use source
control for configs/scripts, let alone config management. Simple
disciplined practices can help, but a lot of teams feel very swamped by
the day-to-day firefighting.

>> Stuff like validation of Operating Systems, hardware and so on can
>> probably benefit from a subset of this (think performance tweaking of an
>> OS for instance as was mentioned earlier, checking that it's still valid
>>   after a SP/new version of this OS, or with the new SAN/network
>> switches /etc is probably worth it) but are imho more often a one shot
>> manual operation.
> You might see it as a one-shot; still, when you want to keep up with
> security patches for the OS, database, middleware and framework,
> I would most certainly want to have more testing available, in- or out-of-house.

In some ways a lot of the fear and the walls that get put up between
operations and development teams, and between operations and vendor patches,
are due to being burnt in the past. If instead we embrace failure and
volatility in our processes and try to figure out smart practices
and principles that let us deal with the inherent complexity of
systems, I think that's going to drive out a set of good practices we
can apply.

On a side note do most people have a test environment for operations,
or a development one for that matter?


> I feel one of the reasons we don't patch that often is that we are
> afraid of the impact of the patches.
> It used to be like that in development with deployments too: don't
> deploy often because you break things.

Which really freezes the business's ability to get to market quickly.

Obviously there are business risks and costs/benefits involved with
patching systems. If understanding whether a system is working correctly is
an expensive, time-consuming process (e.g. a bank certifying a
particular stack) then you don't want to do it often. If we can
somehow continually measure and adjust a system at low cost, whilst
keeping it operational, then that concern goes away.

There are a lot of interesting solutions like degrading applications
gracefully (as John Allspaw mentions in his book and here
http://highscalability.com/how-succeed-capacity-planning-without-really-trying-interview-flickrs-john-allspaw-his-new-book),
A/B testing, incremental test deploys, etc. Is it feasible to use these
on our supporting systems? I think for some organisations operations
is a "secret sauce", but in others that might not be a suitable model.

> And they have succeeded with deploying often now because they have been
> investing in tests.

Indeed, if something is risky do it more :)

The key here is fail fast (and at the appropriate place) and have a
good feedback cycle. If you don't adopt that then you risk failures
too late which cost more (production is a hard place to fix things).

>> And I noticed that even in the limited context of continuous
>> integration, you haven't mentioned the data migration part yet... ;)
>>
> Ah, you got me here ;-)

When we think about deployment of development applications, we also
have configuration and data going alongside that. There are some
strategies for handling this - using things like dbdeploy or django
evolution to manage schema changes for an application. I'd like to
see data be more of a first-class citizen in a fully agile operations
team; from logfiles to application data, there is a startling amount
of information about systems that gets ignored, and that's wasteful.

Some problems are harder to test, but using strategies such as mocks
we can certainly test out exotic failure cases that are much harder to
setup normally.

Paul

Gildas Le Nadan

Mar 19, 2009, 9:39:55 AM
to agile-system-...@googlegroups.com
Patrick Debois wrote:
>
>> Sure, you can probably automate everything, but my point was that some
>> things might cost you so much it's not worth it.
>>
>> For instance if you want to test all the possible failure cases that
>> might happen in an highly-available distributed solution, you can spend
>> an awful lot of time. (Of course, the more complex it is the more you
>> have to test, hence we're back to the KISS paradigm.)
>>
>>
> Yep, I've seen clustering being removed and replaced by rapid
> re-deployment.

Lucky you :) This is what I've been preaching for a while now but
never succeeded in seeing actually implemented...

I agree there. Still, I am saying this is still sci-fi :) Think
incremental! Every single place I've worked at wasn't even monitoring
all their systems, for a start...

There's a lot to do and I think there's a need for a big paradigm shift
in the industry before we can get hardware/software that enables you to
build a higher level of maturity because it is properly tooled to
help you do so.

>> Checking the performances of your SAN storage when in
>> degraded rebuild mode make sense of course, but it's far from being easy
>> to automate.
>>
>> And I noticed that even in the limited context of continuous
>> integration, you haven't mentioned the data migration part yet... ;)
>>
> Ah, you got me here ;-)

This is maybe because Ops alone is not the right level there, and this
implies synergy between Ops and dev?

Cheers,
Gildas

Paul Nasrat

Mar 19, 2009, 9:47:23 AM
to agile-system-...@googlegroups.com
> Paul,
>
> thanks for the pointer, it really helped me visualize the difference
> between these tests and that  got me thinking:

Not a problem, it'd be nice to get a wiki or something for this group
so that we can throw up some pictures, more persistent examples, etc.

For kicks we could even try and do it as an exercise in
collaboratively trying to setup a service in an agile way :)

> Why did I say that df -k and cpu are more about internal quality than
> external quality?

> Well in our case we have a multiserver setup with loadbalancers in front
> of it, so having high CPU and disk usage on one server
> might not have an impact on the actual users, so I figure it has to do
> with internal quality more.

This is interesting, as just by choosing to think about these
qualities like this there is a requirement around user availability.
In some ways that story leads us in the direction of horizontal
scalability as a design decision (or even shared nothing) as we apply
those criteria to growing capacity.

Trying to think about the why of each of these things really helps me
think about the system in different ways.

> I agree that all changes should ideally happen based upon a user story.
> But as you mentioned, who is our user?
> Is it the application (requiring a place to be run), is it the end user
> (having an infrastructure that he can reach), is it the developers (who
> want to deploy their app)?

I think we have stories from all those users/stakeholders (and also we
generate stories for development too). I'd probably not say the
application directly as there are usually more human stakeholders who
gain benefit from it running.

> User stories don't always come from project mode; they might be
> tickets coming in from the end users or security, or bugfix patches coming
> in from vendors.

This is something I'm struggling with - trying to think of a better
way of managing the interrupt driven work. I need to look at some of
the Lean approaches to this some more.

> On your question on refactoring, would you consider the following cases
> refactoring?
>
> During Project Mode:
>
> Say you want to start a new application; developers start right away.
> Often you only have a small piece they can use to develop against, while
> the actual hardware is being ordered.

This is one of the key things that differentiates systems/operations
work from development: large lead times outside your control.
Obviously there are strategies to help deal with easier provisioning,
such as virtualisation, cloud computing, etc.

Again if we put the focus on failing fast and lowering the cost of
change we can and should do something pragmatic, but it might also
require saying there are some risky areas that we want to prioritise
first so that there is a greater chance of success.

> I often have seen the Big Design Upfront syndrome within the sysadmin
> group: we don't release the environment to development unless it is
> completely finished.

Yes that's not uncommon.

> This is totally wrong! Both sysadmin and developer can learn by doing
> their first deployment in a not-so-finished environment.
> So you start with one server (doing web, db, dns, firewall, ldap, ...,
> backup, ..) and step by step, when the hardware becomes available, you do
> the migrations.
> You switch from host files to DNS, you set up a dedicated firewall
> instead of iptables on the box, a dedicated router instead of vlans on
> the linux box.
> Two network interfaces instead of one for bonding, teaming.
> So you change your environment iteratively instead of incrementally.

To zoom out to your question: yes, any of these could be refactoring,
but without concrete practical details it's hard to say if they are. In
theory you could have tests for something and only change the
internals without changing functionality.

I'm growing more and more fond of the red green refactor process as a
set of continual small steps to make the system better. I was talking
with a colleague about this yesterday in the context of development,
but I think that we should be in some way striving for clean,
understandable systems constantly.

> During Production Mode:
>
> If you have a lot of shared services such as mail, imap, dns, firewall,
> proxy, chances are that you have to refactor your environment to
> accommodate changes.

> Changing IP ranges, host names, routings and patches all introduce changes.
> Let's say you have to move a website to a bigger server, or newer
> hardware. You are actually changing the environment. And similar to TDD,
> the important thing is that after the change things keep on working.

There are probably some common patterns here (migrate service, rename
system, etc.); possibly we often try to do too much in one step, rather
than a set of small changes. Given your example of moving to a new
server, we can probably break this down into small steps that bring us
one step closer to that without changing the system. There probably
needs to be some thinking about good common techniques that can be
shared for this among systems teams, a la Fowler's refactoring.

To me you can't be confident in making these changes without having
some way of testing that part of the system is operating.

Paul

Gildas Le Nadan

Mar 19, 2009, 9:59:46 AM
to agile-system-...@googlegroups.com

I so very much agree with you there!

I'm so glad I can meet yet another person that shares my point of view
on those matters.

>>> And I noticed that even in the limited context of continuous
>>> integration, you haven't mentioned the data migration part yet... ;)
>>>
>> Ah, you got me here ;-)
>
> When we think about deployment of development applications, we also
> have configuration and data going alongside that. There are some
> strategies for handling this - using things like dbdeploy or django
> evolution to manage schema changes for an application. I'd like to
> see data be more of a first-class citizen in a fully agile operations
> team; from logfiles to application data, there is a startling amount
> of information about systems that gets ignored, and that's wasteful.

My gut feeling there is that the data migration needs to be considered
very early in the application conception, which is why I mentioned that I
feel it is out of the scope of the Ops team alone.

I haven't thought of the problem of the logfiles and so on, but you're
right again, that's also a need. Maybe we can think of different
typologies of data sets and describe basic migration schemes for those
different typologies?

> Some problems are harder to test, but using strategies such as mocks
> we can certainly test out exotic failure cases that are much harder to
> setup normally.
>
> Paul

Ok, but how would you define that something is definitely not worth
testing? (i.e. the cost of testing is way too high when considering the risk)

Gildas

Paul Nasrat

Mar 19, 2009, 10:32:53 AM
to agile-system-...@googlegroups.com
>> Indeed, if something is risky do it more :)
>>
>> The key here is fail fast (and at the appropriate place) and have a
>> good feedback cycle. If you don't adopt that then you risk failures
>> too late which cost more (production is a hard place to fix things).
>>
>
> I so very much agree with you there!

> I'm so glad I can meet yet another person that shares my point of view
> on those matters.

I'm glad. One of the things I've found hard is finding people to
discuss the ideas surrounding this. It feels like we're
getting to a place where communities can form around this.

> My gut feeling there is that the data migration needs to be considered
> very early in the application conception, which is why I mentioned that I
> feel it is out of the scope of the Ops team alone.

That's possibly true, although as with a lot of things, having a
collaborative cross-functional team working on the issues, one that is
inclusive of operations, will be important.

> I haven't thought of the problem of the logfiles and so on but you're
> right again, that's also a need. Maybe we can think of different
> typology of data sets and describe basic migration schemes for those
> different typologies?
>
>> Some problems are harder to test, but using strategies such as mocks
>> we can certainly test out exotic failure cases that are much harder to
>> setup normally.

> Ok, but how would you define that something is definitely not worth
> testing? (i.e. the cost of testing is way too high when considering the risk)

Not wishing to sound flippant but I think this will end up being
something that is a pragmatic decision by the team.

This happens in developer testing too, particularly around integration
and acceptance testing. I often have conversations with pairs about a
particularly gnarly integration test, and it can be that sometimes it's
not needed: if you have a known interface (e.g. SMTP) and a way of
testing using a stub, then a full end-to-end test may not be
necessary.
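
For example, an SMTP check can walk the envelope and then RSET, so the interface gets exercised but nothing is actually delivered (host and addresses below are placeholders):

    #!/bin/bash
    # Sketch: exercising a known interface (SMTP) without a full end-to-end
    # test; we RSET before DATA so no mail is delivered.
    HOST=mail.example.com
    OUT=$(printf 'EHLO tester.example.com\r\nMAIL FROM:<probe@example.com>\r\nRCPT TO:<postmaster@example.com>\r\nRSET\r\nQUIT\r\n' \
          | nc -w 10 "$HOST" 25)

    # Any 4xx/5xx reply means the conversation was refused somewhere.
    if echo "$OUT" | grep -qE '^[45][0-9][0-9]'; then
        echo "CRITICAL - unexpected SMTP dialogue with $HOST"
        exit 2
    fi
    echo "OK - $HOST accepts the SMTP envelope"
    exit 0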

Discussing the value of a test and making a call should be part of the
process. A lot of these approaches applied to operations are just
emerging, or at least just starting to consolidate. As such, you'll
probably find teams over-testing as they adopt the practices (cf.
getter/setter testing when you are learning TDD as a developer); this is
natural and part of the learning process. If you're running a team this
way you'd probably start out with a set of rules, but part of agile
processes is figuring out what will work for you. I've just been reading
Pragmatic Thinking & Learning, which is probably influencing this answer,
but you should find that over time people's intuition for that call will
improve.

Paul
