Random test data


Imre Kószó

Sep 25, 2013, 12:52:21 PM
to clean-code...@googlegroups.com
Hi,

I would like to know your opinion about randomizing test data. I have been doing this for quite a long time now in most of the tests I write.

Let's see a silly example:
Say we have an Order entity with various details including a list of Item entities. We are writing a test for the "get all orders that have at least one item with a unit price greater than 100 (configurable)" use case interactor. Of course the interactor would not return the entities but some sort of DTO-ish representation of them.

What I usually see in something like this is:
- there is a configurable threshold where we divide orders
- orders not matching the criteria should not be returned
- all orders matching the criteria should be returned
- all properties of DTOs returned should match properties of their respective entities

Now when writing the test for this I would usually randomize (within limits) the following stuff:
- the threshold the interactor makes the decision based on
- number of orders the gateway returns to the interactor
- all the properties of orders
- number of items for each order
- all the properties of items

I would make sure that for each test case I'm writing, there is a certain set of test data that represents the 'valid' data for that case so that I can then compare their details against the returned values.
I would also make sure that the test is structured nicely and reads well.
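A minimal Java sketch of what such a fixture might look like. The class names (Item, Order, RandomOrderFixture, OrderFilter), the value ranges, and the seed parameter are all assumptions made for illustration, and OrderFilter is only a stand-in for the interactor's rule:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical stand-ins for the entities described in the post.
class Item {
    final String name;
    final int unitPrice;
    Item(String name, int unitPrice) { this.name = name; this.unitPrice = unitPrice; }
}

class Order {
    final int id;
    final List<Item> items;
    Order(int id, List<Item> items) { this.id = id; this.items = items; }
}

// Builds randomized orders and remembers which ones should be returned.
class RandomOrderFixture {
    final int threshold;
    final List<Order> matching = new ArrayList<>();
    final List<Order> nonMatching = new ArrayList<>();
    private final Random random;
    private int nextId = 1;

    RandomOrderFixture(long seed) {
        random = new Random(seed);
        threshold = 50 + random.nextInt(100);          // 50..149
        int matchingCount = 1 + random.nextInt(5);     // 1..5
        int nonMatchingCount = 1 + random.nextInt(5);  // 1..5
        for (int i = 0; i < matchingCount; i++) matching.add(newOrder(true));
        for (int i = 0; i < nonMatchingCount; i++) nonMatching.add(newOrder(false));
    }

    private Order newOrder(boolean shouldMatch) {
        List<Item> items = new ArrayList<>();
        int cheapCount = 1 + random.nextInt(4);        // 1..4 items at or below the threshold
        for (int i = 0; i < cheapCount; i++) {
            items.add(new Item("item-" + i, 1 + random.nextInt(threshold)));
        }
        if (shouldMatch) {
            // One item strictly above the threshold makes the order match.
            items.add(new Item("pricey", threshold + 1 + random.nextInt(100)));
        }
        return new Order(nextId++, items);
    }

    // Matching orders first, then the non-matching ones.
    List<Order> allOrders() {
        List<Order> all = new ArrayList<>(matching);
        all.addAll(nonMatching);
        return all;
    }
}

// Stand-in for the interactor's filtering rule (an assumption, not the real code).
class OrderFilter {
    static List<Order> withItemAbove(List<Order> orders, int threshold) {
        List<Order> result = new ArrayList<>();
        for (Order order : orders) {
            for (Item item : order.items) {
                if (item.unitPrice > threshold) { result.add(order); break; }
            }
        }
        return result;
    }
}
```

A test would then compare OrderFilter.withItemAbove(fixture.allOrders(), fixture.threshold) against fixture.matching, so the expected values come from the fixture rather than from literals.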

On the other hand, when I look at examples, books, training material, even the Clean Code videos, test data is usually static and in most cases, values are hardcoded. I tend to not like that but I might be wrong, which is why I'm asking the community.

What's your take on this? Is this overkill? Are there any guidelines?

Thanks,
Imre

Paolo Laurenti

Sep 25, 2013, 2:10:42 PM
to clean-code...@googlegroups.com
Hi,

I don't see what value you get from randomizing the data.
When I write use case tests, I use a set of data that represents a valid example of the scenario I am exercising.
I don't see any value in randomizing it. Am I wrong?

My 2 cents

Paolo

--
The only way to go fast is to go well.
---
You received this message because you are subscribed to the Google Groups "Clean Code Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to clean-code-discu...@googlegroups.com.
To post to this group, send email to clean-code...@googlegroups.com.
Visit this group at http://groups.google.com/group/clean-code-discussion.


--
Paolo Laurenti

e-mail: lauren...@gmail.com
Twitter: @paololaurenti
Skype: paololaurenti


swkane

Sep 25, 2013, 4:43:16 PM
to clean-code...@googlegroups.com
"On the other hand, when I look at examples, books, training material, even the Clean Code videos, test data is usually static and in most cases, values are hardcoded. I tend to not like that but I might be wrong, which is why I'm asking the community."

I think what you should look at is not 'random testing' but rather 'generative testing'. Generative testing is where you try to iterate in a repeatable fashion over a wide variety of inputs to a function to test it over a wider range of possibilities. This is a very viable and often desirable way of testing. This video has a short description: http://www.youtube.com/watch?v=phqdG74Em3A&noredirect=1
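As a rough Java illustration of "iterating in a repeatable fashion over a wide variety of inputs": the sketch below generates many small arrays from a fixed seed and checks one property of sorting. GenerativeSortCheck, the seed value, and the input ranges are assumptions chosen for the example, and Arrays.sort merely stands in for whatever function is under test.

```java
import java.util.Arrays;
import java.util.Random;

class GenerativeSortCheck {
    // A fixed seed makes every run generate exactly the same inputs.
    static final long SEED = 20130925L;

    // Arrays of length 0..19 with values in -100..100.
    static int[] randomArray(Random random) {
        int[] values = new int[random.nextInt(20)];
        for (int i = 0; i < values.length; i++) {
            values[i] = random.nextInt(201) - 100;
        }
        return values;
    }

    static boolean isOrdered(int[] values) {
        for (int i = 1; i < values.length; i++) {
            if (values[i - 1] > values[i]) return false;
        }
        return true;
    }

    // Runs the property over `cases` generated inputs and returns how many were checked.
    static int runCheck(int cases) {
        Random random = new Random(SEED);
        for (int i = 0; i < cases; i++) {
            int[] input = randomArray(random);
            int[] sorted = input.clone();
            Arrays.sort(sorted);  // stand-in for the function under test
            if (!isOrdered(sorted) || sorted.length != input.length) {
                throw new AssertionError("case " + i + " failed for input " + Arrays.toString(input));
            }
        }
        return cases;
    }
}
```

Because the seed is fixed, a failing case number always refers to the same input.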

Steven





Roberto Guerra

Sep 25, 2013, 5:46:52 PM
to clean-code...@googlegroups.com
Yep, there are some tools in functional languages that do this. This probably takes it much further, but Clojure has a library called 'Simulant' that allows you to put your system through some simulations.

Caio Fernando Bertoldi Paes de Andrade

Sep 25, 2013, 9:18:51 PM
to clean-code...@googlegroups.com
Imre,

Given that you already have a static test case that tests something, adding more random tests that test the exact same thing sounds like a lot of redundancy and duplication which can confuse your reader and slow down your suite of tests.

Caio

--------------------------
Q: Why is this email so short?

Imre Kószó

Sep 26, 2013, 3:16:23 AM
to clean-code...@googlegroups.com
Thanks for all the responses, guys. I might not have made my point completely clear though. I wasn't talking about creating random test cases, just about not having my test data static.

So for example, instead of setting up my test data with "3" valid items and then asserting that the method returns "3" items, I would store the count in a variable, possibly the Count property of the list holding the items, and assert that the source and result lists have the same length. Typing 3 twice would make me think the code is fragile. Likewise, instead of setting up the first item with the name "Item 1" and then asserting that the first result's name is "Item 1", I would set up the source object with a name and assert that the result object's name matches it.

This will mean that I'm only ever assigning stuff once. And while I'm there I don't want to come up with ids and names and all that when generating the test data so I'd write a method or a few methods to generate test data _fit for the current test case_.
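A small Java sketch of that style, where every expectation is read back from the source objects instead of being typed a second time. SourceItem, ItemDto, ItemMapper, and ItemFixture are hypothetical names, and ItemMapper is only a stand-in for the code under test:

```java
import java.util.ArrayList;
import java.util.List;

class SourceItem {
    final int id;
    final String name;
    SourceItem(int id, String name) { this.id = id; this.name = name; }
}

class ItemDto {
    final int id;
    final String name;
    ItemDto(int id, String name) { this.id = id; this.name = name; }
}

// Stand-in for the code under test (an assumption for this example).
class ItemMapper {
    static List<ItemDto> toDtos(List<SourceItem> items) {
        List<ItemDto> dtos = new ArrayList<>();
        for (SourceItem item : items) dtos.add(new ItemDto(item.id, item.name));
        return dtos;
    }
}

class ItemFixture {
    // Generates items for the current test case; ids and names are derived
    // from the index, so the test never has to invent them.
    static List<SourceItem> items(int count) {
        List<SourceItem> items = new ArrayList<>();
        for (int i = 1; i <= count; i++) items.add(new SourceItem(i, "Item " + i));
        return items;
    }

    // Every expectation is read from the source list; no literal is repeated.
    static void assertMirrorsSource(List<SourceItem> source, List<ItemDto> result) {
        if (result.size() != source.size()) {
            throw new AssertionError("expected " + source.size() + " items but got " + result.size());
        }
        for (int i = 0; i < source.size(); i++) {
            SourceItem expected = source.get(i);
            ItemDto actual = result.get(i);
            if (actual.id != expected.id || !actual.name.equals(expected.name)) {
                throw new AssertionError("item " + i + " does not match its source");
            }
        }
    }
}
```

The count passed to ItemFixture.items appears once in the test, and the assertion works for any count.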

James Green

Sep 26, 2013, 4:11:55 AM
to clean-code...@googlegroups.com
Interesting discussion. The only time I use "random" data is when I use a UUID generator for the IDs of entities. And that can cause confusion when reading output in which multiple entities are involved.

Consider a bug in your code that a different developer comes to fix, where the ticket contains the output from a test case that uses random data. Now that developer has to match the random data logged against the data held at run time. I think we're talking edge cases here, but I'm leaning towards caution as I like repeatability, even if that means writing several tests to cover a range of values.
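One common way to keep randomized data repeatable is to record the seed with every failure so the same values can be regenerated later. A minimal Java sketch, assuming a hypothetical ReplayableRandom helper (the name and the message format are made up for illustration):

```java
import java.util.Random;

class ReplayableRandom {
    final long seed;
    private final Random random;

    // Normal runs pick a fresh seed.
    ReplayableRandom() {
        this(System.nanoTime());
    }

    // A run can be replayed by passing the seed reported in a failure message.
    ReplayableRandom(long seed) {
        this.seed = seed;
        this.random = new Random(seed);
    }

    // Any value in [0, 10000).
    int anyInt() {
        return random.nextInt(10000);
    }

    // Appends the seed to a failure message so the run can be reproduced.
    String describe(String message) {
        return message + " (replay with seed " + seed + ")";
    }
}
```

A developer picking up the ticket can construct ReplayableRandom with the logged seed and get the same sequence of values the failing run used.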




Sebastian Gozin

Sep 26, 2013, 5:02:44 AM
to clean-code...@googlegroups.com
I have the same concern as James when it comes to confusion when comparing the results of a test.

Consider for example a test where you expect there to be some state like first name and last name.
I personally like my asserts to trip while telling me firstName != "last-name" because it's much more descriptive than firstName != "askufguwkg". Also, in the first case it is very clear to me that I put the last name into the first name field by accident, while in the latter case I have no idea unless I start the debugger and compare the values in my fixture with the ones being asserted.

I do understand the concern of "possibility to cheat in production code" but I find that isn't as much of a problem when people practice TDD seriously.
Otherwise I'd prefer random suffixes and when it comes to sizes you should already be testing sizes 0, 1 and 2 so is 3 really all that helpful?
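A short Java sketch of the "random suffix" idea: the value names the field it belongs to, and only the suffix varies. DescriptiveValues and its any method are hypothetical names for illustration.

```java
import java.util.Random;

class DescriptiveValues {
    private final Random random;

    DescriptiveValues(long seed) {
        this.random = new Random(seed);
    }

    // Returns e.g. "first-name-" followed by a short random hex suffix, so a
    // failed assert still shows which field the value was meant for.
    String any(String fieldName) {
        return fieldName + "-" + Integer.toHexString(random.nextInt(0x10000));
    }
}
```

With values like these, a failure such as firstName != "last-name-…" still points at the swapped field.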

Roberto Guerra

Sep 26, 2013, 9:36:28 AM
to clean-code...@googlegroups.com
You have to think a little differently about your tests when using an automated specification-based framework like 'QuickCheck'. For example, consider testing the substring method in a String class. It would be pretty cool to say:

forAll((String a, String b, String c) => (a+b+c).substring(a.length, a.length+b.length) == b)

How about testing names:

forAll((String firstName, String lastName) => {
   person = new Person(firstName, lastName)
   person.getFirstName() == firstName &&
   person.getLastName() == lastName
})
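Without a QuickCheck-style library, the substring property can be approximated by hand in plain Java. The sketch below is not QuickCheck's API: forAll, Property3, anyString, and the seed and case count are all assumptions made for this example, and unlike a real framework it does no shrinking of failing inputs.

```java
import java.util.Random;

class StringProperties {
    // A property over three strings; it holds if it returns true.
    interface Property3 {
        boolean holds(String a, String b, String c);
    }

    // Lowercase strings of length 0..7.
    static String anyString(Random random) {
        int length = random.nextInt(8);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < length; i++) {
            sb.append((char) ('a' + random.nextInt(26)));
        }
        return sb.toString();
    }

    // Checks the property against `cases` generated triples and reports the first counterexample.
    static void forAll(long seed, int cases, Property3 property) {
        Random random = new Random(seed);
        for (int i = 0; i < cases; i++) {
            String a = anyString(random);
            String b = anyString(random);
            String c = anyString(random);
            if (!property.holds(a, b, c)) {
                throw new AssertionError(
                    "Falsified by a=\"" + a + "\", b=\"" + b + "\", c=\"" + c + "\" (seed " + seed + ")");
            }
        }
    }

    // The substring property from the post, written against java.lang.String.
    static void substringRecoversMiddle() {
        forAll(1L, 200, (a, b, c) ->
            (a + b + c).substring(a.length(), a.length() + b.length()).equals(b));
    }
}
```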

Uncle Bob

Sep 26, 2013, 9:45:34 AM
to clean-code...@googlegroups.com
There are two conditions in which I have used random test data.

1. I have a test in which the data itself doesn't matter; but what happens to that data is predictable.  

Example:
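import static org.hamcrest.CoreMatchers.equalTo;
import static org.junit.Assert.assertThat;

import org.junit.Test;

// Stack is assumed to be the class under test (not shown here).
public class StackTest {
  @Test
  public void whenYouPushX_YouPopX() {
    Stack stack = new Stack();
    int X = anyInt();
    stack.push(X);
    assertThat(stack.pop(), equalTo(X));
  }

  // Any value in [0, 10000); the exact value does not matter to the test.
  private int anyInt() {
    return (int) (Math.random() * 10000);
  }
}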

This test would likely have evolved from the simpler test of 'whenYouPush1_youPop1()'. Once I got that to pass, I'd want to triangulate to get the more general case to pass. One way to triangulate is to add a second case to the test (e.g. push and pop 2). The other is to introduce the notion of anyInt(). I like the way anyInt() reads. It communicates the triangulation far better than 2 does.

2. Testing Random Behavior
Sometimes you want random behavior from your program.  In a game, for example, you might want monsters to randomly move about the map one cell per turn.  How do you test random movement?  The commonly accepted approach is to mock out the random number generator and feed it with values that the test expects.
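The mocking approach could look roughly like this in Java. Monster, ScriptedRandom, and the four-direction step table are hypothetical names invented for the sketch; the only real API used is java.util.Random, whose nextInt(int) is overridden to return scripted values.

```java
import java.util.Random;

class Monster {
    // {dx, dy} for north, east, south, west.
    private static final int[][] STEPS = {{0, -1}, {1, 0}, {0, 1}, {-1, 0}};

    private final Random random;
    int x;
    int y;

    // The random source is injected so a test can control it.
    Monster(Random random) {
        this.random = random;
    }

    // Moves one cell in a direction chosen by the random source.
    void moveRandomly() {
        int[] step = STEPS[random.nextInt(STEPS.length)];
        x += step[0];
        y += step[1];
    }
}

// Test double that replays a fixed script instead of generating random numbers.
class ScriptedRandom extends Random {
    private final int[] script;
    private int next = 0;

    ScriptedRandom(int... script) {
        this.script = script;
    }

    @Override
    public int nextInt(int bound) {
        // Ignores the bound and cycles through the scripted values.
        return script[next++ % script.length];
    }
}
```

With new Monster(new ScriptedRandom(1, 1, 2)), three moves go east, east, south, so the monster ends at x = 2, y = 1.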

Another technique is to move the monster 1000 times and ensure that the resultant behavior is statistically correct.  So, for example, you repeat the following 1000 times {put monster in cell 0; move monster randomly; increment counter of cell where monster landed}.
If there were four possible destinations, then you'd expect each counter to be within some delta of 250.  You choose that delta such that the odds against a false positive are so high as to be irrelevant.  
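A Java sketch of the statistical check, under the assumption that the monster picks one of four destinations uniformly. MovementStatistics and its method names are made up for the example. With 1000 moves and probability 1/4 per cell, each counter has a standard deviation of about 13.7 (the square root of 1000 × 0.25 × 0.75), so a delta of 75 is roughly five and a half standard deviations.

```java
import java.util.Random;

class MovementStatistics {
    // Repeats the move `moves` times and counts where the monster landed.
    // random.nextInt(destinations) stands in for "move monster randomly".
    static int[] countLandings(Random random, int moves, int destinations) {
        int[] counts = new int[destinations];
        for (int i = 0; i < moves; i++) {
            counts[random.nextInt(destinations)]++;
        }
        return counts;
    }

    // True when every counter is within `delta` of the expected value.
    static boolean withinDelta(int[] counts, int expected, int delta) {
        for (int count : counts) {
            if (Math.abs(count - expected) > delta) return false;
        }
        return true;
    }
}
```

A test would assert withinDelta(countLandings(random, 1000, 4), 250, 75); a narrower delta catches smaller biases but fails spuriously more often.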


Imre Kószó

Sep 26, 2013, 11:59:58 AM
to clean-code...@googlegroups.com
Interesting responses!

I agree that test error output might be a bit more obscure with random data as James and Sebastian pointed out. However in my opinion if those tests are well structured, identifying the offending test case should not be hard as failing test case names are usually logged. If a developer sets out to fix a failing test, they will (I would) run the reported case first to see it failing anyway.

Usually the tests I have to write fall into the category Uncle Bob marked with 1. Some data comes in, and I have to make sure that data goes out to somewhere: written to the db, retrieved from the db, etc. It doesn't matter what the forename of the customer is; what matters is that whatever is passed to my interactor as forename gets sent to the gateway. It doesn't matter how many items an order has; what matters is that the order DTO returned from my interactor contains the same number of items, with the other details matching of course.

Category 2 is interesting. I haven't had to use this technique before, but it is something to keep in mind.

I'm not sure I'm getting the "data itself doesn't matter; but what happens to that data is predictable" part right in the following example. In the validator here, the reference value is not hardcoded, let's say it's part of an "enterprise" software and it needs to be configurable. In the test I'm using a random reference value and another random value that suits the test case.

class Validator
{
  int _referenceValue;

  public Validator(int referenceValue)
  {
    _referenceValue = referenceValue;
  }

  public bool GreaterThanReference(int i)
  {
    return i > _referenceValue;
  }
}

class ValidatorTests
{
  [Test]
  public void GreaterThanReference_ReturnsTrue_WhenGreaterValueIsPassed()
  {
    int referenceValue = anyInt();
    var validator = new Validator(referenceValue);

    Assert.IsTrue(validator.GreaterThanReference(anyIntAbove(referenceValue)));
  }

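  // C# has no Math.random(); System.Random is used instead (needs "using System;").
  private static readonly Random _random = new Random();

  private int anyInt() {
    return _random.Next(10000);
  }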

  private int anyIntAbove(int minimum) {
    return minimum + 1 + anyInt();
  }
}

Roberto Guerra

Sep 26, 2013, 12:02:13 PM
to clean-code...@googlegroups.com
I would use QuickCheck to generate random values. That way it is kept out of the test code. If you are using Java you can have a look at this https://github.com/pholser/junit-quickcheck

Imre Kószó

Sep 26, 2013, 12:09:25 PM
to clean-code...@googlegroups.com
Will check that out, thanks. At first sight it looks like a random generator that can produce a lot of different things out of the box, so it saves you the wrapper code for translating the output of Math.random() into whatever you need.

Per Lundholm

Sep 29, 2013, 2:18:29 AM
to clean-code...@googlegroups.com

JUnit theories sounds interesting.

Regards
Per
