Randomizing tests' order

Josh Kelley

Mar 26, 2009, 11:57:28 PM
to Google C++ Testing Framework
The code base that I've inherited sometimes depends on uninitialized variables or on promiscuously global/shared state, and I don't always catch this in my unit tests because they run in a fixed order and so happen to set the stack or global state correctly to avoid the problem.

Would it be a useful feature to add a --gtest_shuffle flag that randomized both the order of test cases and the order of tests within each case, to try to expose problems like this?  I wrote a quick implementation of test order randomization and it failed to find a known bug in my code base, but it seems like it could be useful in other circumstances; I can clean up and submit the patch if there's interest.
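A minimal sketch of the proposed two-level shuffle, assuming a simplified in-memory model: the `TestCase` struct and `ShuffleTwoLevels` function are hypothetical, not part of gtest. It uses the standard library's `std::mt19937` and `std::shuffle`; a single engine seeded once drives both levels, so the same seed reproduces the same order in a given build.

```cpp
#include <algorithm>
#include <random>
#include <string>
#include <vector>

// Hypothetical, simplified model of a test program: each test case owns an
// ordered list of test names.
struct TestCase {
  std::string name;
  std::vector<std::string> tests;
};

// Shuffles the order of the test cases, then the order of the tests inside
// each case. One engine seeded once drives both levels, so the same seed
// reproduces the same overall order.
void ShuffleTwoLevels(std::vector<TestCase>& cases, unsigned seed) {
  std::mt19937 engine(seed);
  std::shuffle(cases.begin(), cases.end(), engine);
  for (TestCase& test_case : cases) {
    std::shuffle(test_case.tests.begin(), test_case.tests.end(), engine);
  }
}
```

Shuffling the cases first and then the tests inside each case keeps each case's tests contiguous, which matters because per-case setup and teardown run once around the whole group.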

Josh Kelley

Zhanyong Wan (λx.x x)

Mar 27, 2009, 1:11:25 AM
to Josh Kelley, Google C++ Testing Framework
Hi Josh,

You can reliably catch use of uninitialized variables using Valgrind
(http://valgrind.org).

Nonetheless, something like --gtest_shuffle can be quite useful.
Thanks for proposing it and offering to implement it!

If we decide to do it, we need to make sure we can control the random
number seed for debugging. The test output (text & XML) should
contain the random seed such that a person reading the report can
repro the failure. The shuffle flag should also work with
--gtest_filter and --gtest_also_run_disabled_tests. And it should
respect the rule that all death tests should run before the non-death
tests (see the documentation on death tests in the advanced guide wiki
for why). It's not trivial. :-)
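One way to keep the death-tests-first rule while shuffling is to partition first and shuffle each group separately. The sketch below assumes death test cases are recognized by the naming convention from the advanced guide (test case names ending in `DeathTest`); `IsDeathTestCase` and `ShuffleDeathTestsFirst` are hypothetical names.

```cpp
#include <algorithm>
#include <random>
#include <string>
#include <vector>

// True if the test case name follows the "...DeathTest" naming convention.
bool IsDeathTestCase(const std::string& name) {
  const std::string suffix = "DeathTest";
  return name.size() >= suffix.size() &&
         name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Shuffles test case names while keeping every death test case ahead of all
// other test cases: partition first, then shuffle each group independently.
void ShuffleDeathTestsFirst(std::vector<std::string>& case_names, unsigned seed) {
  std::mt19937 engine(seed);
  const auto boundary = std::stable_partition(case_names.begin(), case_names.end(),
                                              IsDeathTestCase);
  std::shuffle(case_names.begin(), boundary, engine);
  std::shuffle(boundary, case_names.end(), engine);
}
```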

One idea for controlling the seed:

--gtest_shuffle=0 do not shuffle.
--gtest_shuffle=-1 use today's Julian date as the random seed.
--gtest_shuffle=N use N as the random seed, where N is not 0.

Thoughts?
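The single-flag proposal could be read roughly as follows. This is a sketch under assumptions: `SeedFromShuffleFlag`, `JulianDayNumber`, and `TodayJulianDayNumber` are hypothetical helpers, the Julian Day Number uses the standard integer formula for Gregorian dates (UTC), and `std::optional` requires C++17.

```cpp
#include <cassert>
#include <ctime>
#include <optional>

// Julian Day Number of a Gregorian calendar date (integer arithmetic only).
long JulianDayNumber(int year, int month, int day) {
  assert(month >= 1 && month <= 12);
  const long a = (14 - month) / 12;  // 1 for January and February, else 0
  const long y = year + 4800 - a;
  const long m = month + 12 * a - 3;
  return day + (153 * m + 2) / 5 + 365 * y + y / 4 - y / 100 + y / 400 - 32045;
}

// Today's Julian Day Number, using the UTC calendar date.
long TodayJulianDayNumber() {
  const std::time_t now = std::time(nullptr);
  const std::tm* utc = std::gmtime(&now);
  return JulianDayNumber(utc->tm_year + 1900, utc->tm_mon + 1, utc->tm_mday);
}

// Hypothetical reading of the proposal: 0 = don't shuffle, -1 = seed with
// today's Julian Day Number, any other N = use N as the seed.
std::optional<unsigned> SeedFromShuffleFlag(int flag_value, long today_jdn) {
  if (flag_value == 0) return std::nullopt;
  if (flag_value == -1) return static_cast<unsigned>(today_jdn);
  return static_cast<unsigned>(flag_value);
}
```

For example, `JulianDayNumber(2009, 3, 27)` is 2454918, so every `--gtest_shuffle=-1` run on the day this proposal was posted would have shared that seed.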

--
Zhanyong

Zhanyong Wan (λx.x x)

Mar 27, 2009, 1:55:45 AM
to Josh Kelley, Google C++ Testing Framework
2009/3/26 Zhanyong Wan (λx.x x) <w...@google.com>:

> Hi Josh,
>
> On Thu, Mar 26, 2009 at 8:57 PM, Josh Kelley <jos...@gmail.com> wrote:
>> The code base that I've inherited sometimes depends on uninitialized
>> variables or on promiscuously global/shared state, and I don't always catch
>> this in my unit tests because they run in a fixed order and so happen to set
>> the stack or global state correctly to avoid the problem.
>> Would it be a useful feature to add a --gtest_shuffle flag that randomized
>> both the order of test cases and the order of tests within each case, to try
>> to expose problems like this?  I wrote a quick implementation of test order
>> randomization and it failed to find a known bug in my code base, but it
>> seems like it could be useful in other circumstances; I can clean up and
>> submit the patch if there's interest.
>
> You can reliably catch use of uninitialized variables using Valgrind
> (http://valgrind.org).
>
> Nonetheless, something like --gtest_shuffle can be quite useful.
> Thanks for proposing it and offering to implement it!
>
> If we decide to do it, we need to make sure we can control the random
> number seed for debugging.  The test output (text & XML) should
> contain the random seed such that a person reading the report can
> repro the failure.  The shuffle flag should also work with
> --gtest_filter and --gtest_also_run_disabled_tests.  And it should
> respect the rule that all death tests should run before the non-death
> tests (see the documentation on death tests in the advanced guide wiki
> for why).  It's not trivial. :-)

Another thing: --gtest_shuffle should work with test sharding
(http://code.google.com/p/googletest/wiki/GoogleTestAdvancedGuide#Distributing_Test_Functions_to_Multiple_Machines)
as well.

>
> One idea for controlling the seed:
>
> --gtest_shuffle=0  do not shuffle.
> --gtest_shuffle=-1  use today's Julian date as the random seed.
> --gtest_shuffle=N  use N as the random seed, where N is not 0.
>
> Thoughts?
>
> --
> Zhanyong
>

--
Zhanyong

David Saff

Mar 27, 2009, 11:58:26 AM
to Zhanyong Wan (λx.x x), Josh Kelley, Google C++ Testing Framework, roch...@google.com
Final attempt to push this through to the googlegroup...

David Saff

2009/3/27 David Saff <da...@saff.net>:
> [-drochberg]
> [+rochberg]
>
> Sorry, typo.
>
>   David Saff
>
> 2009/3/27 David Saff <da...@saff.net>:
>> 2009/3/27 Zhanyong Wan (λx.x x) <w...@google.com>:


>>> Nonetheless, something like --gtest_shuffle can be quite useful.
>>> Thanks for proposing it and offering to implement it!
>>

>> FYI, David Rochberg and I spiked a test-reordering version of Gtest a
>> couple weeks back, and would be happy to contribute that code.  Our
>> implementation tries to run likely-to-fail tests first, but pluggable
>> reordering strategies make sense.
>>
>>   David Saff
>>
>

Josh Kelley

Mar 27, 2009, 9:53:11 PM
to Zhanyong Wan (λx.x x), Google C++ Testing Framework
2009/3/27 Zhanyong Wan (λx.x x) <w...@google.com>
> You can reliably catch use of uninitialized variables using Valgrind
> (http://valgrind.org).

I should have thought to try that.  Thanks for the suggestion.

> Nonetheless, something like --gtest_shuffle can be quite useful.
> Thanks for proposing it and offering to implement it!
>
> If we decide to do it, we need to make sure we can control the random
> number seed for debugging.  The test output (text & XML) should
> contain the random seed such that a person reading the report can
> repro the failure.  The shuffle flag should also work with
> --gtest_filter and --gtest_also_run_disabled_tests.  And it should
> respect the rule that all death tests should run before the non-death
> tests (see the documentation on death tests in the advanced guide wiki
> for why).  It's not trivial. :-)
>
> One idea for controlling the seed:
>
> --gtest_shuffle=0  do not shuffle.
> --gtest_shuffle=-1  use today's Julian date as the random seed.
> --gtest_shuffle=N  use N as the random seed, where N is not 0.
>
> Thoughts?

I hadn't considered death tests and sharding.  Thanks.

Currently, I'm using rand() and srand() for simplicity and portability, even though other RNGs are probably better.  I'm using static_cast<unsigned>(GetTimeInMillis()) as the default random seed; it was easy to implement, and it seems to work well in my testing.  Is this okay?

I was considering --gtest_shuffle as a Boolean flag and --gtest_rng_seed=N to specify a seed (and use a time-based seed if N is 0).  This simplifies usage a bit for the default case of shuffling without specifying a seed.  Also, since srand() can take any unsigned int from 0 to UINT_MAX as a parameter, I guess it feels a bit wrong to special case (unsigned)-1.  But maybe it's better to let one flag control everything; just let me know which you prefer.

I haven't looked at the XML schema yet; before I start investigating, do you know offhand where the seed should be recorded?

How should shuffle + repeat + XML output work?  The RNG is only seeded at the beginning of the program, but it looks like each iteration's XML output overwrites the previous iteration's and does not record its own iteration number.  The recorded RNG seed wouldn't work without also including the number of repetitions.

Josh Kelley

Zhanyong Wan (λx.x x)

Mar 29, 2009, 12:01:02 AM
to David Saff, Josh Kelley, Google C++ Testing Framework, roch...@google.com
2009/3/27 David Saff <da...@saff.net>:

Thanks, David. IIRC, you determine which tests are more likely to
fail by looking at the test run history. Where do you store that
history?

--
Zhanyong

Zhanyong Wan (λx.x x)

Mar 29, 2009, 12:17:33 AM
to Josh Kelley, Google C++ Testing Framework, Peter Reilly
2009/3/27 Josh Kelley <jos...@gmail.com>:

> 2009/3/27 Zhanyong Wan (λx.x x) <w...@google.com>
>>
>> You can reliably catch use of uninitialized variables using Valgrind
>> (http://valgrind.org).
>
> I should have thought to try that.  Thanks for the suggestion.
>>
>> Nonetheless, something like --gtest_shuffle can be quite useful.
>> Thanks for proposing it and offering to implement it!
>>
>> If we decide to do it, we need to make sure we can control the random
>> number seed for debugging.  The test output (text & XML) should
>> contain the random seed such that a person reading the report can
>> repro the failure.  The shuffle flag should also work with
>> --gtest_filter and --gtest_also_run_disabled_tests.  And it should
>> respect the rule that all death tests should run before the non-death
>> tests (see the documentation on death tests in the advanced guide wiki
>> for why).  It's not trivial. :-)
>>
>> One idea for controlling the seed:
>>
>> --gtest_shuffle=0  do not shuffle.
>> --gtest_shuffle=-1  use today's Julian date as the random seed.
>> --gtest_shuffle=N  use N as the random seed, where N is not 0.
>>
>> Thoughts?
>
> I hadn't considered death tests and sharding.  Thanks.
> Currently, I'm using rand() and srand() for simplicity and portability, even
> though other RNGs are probably better.  I'm using

For our purpose, any reasonable pseudo random number generator will
do. Simplicity and portability are more important.

> static_cast<unsigned>(GetTimeInMillis()) as the default random seed; it was
> easy to implement, and it seems to work well in my testing.  Is this okay?

When you uncover a bug via --gtest_shuffle, often you may want to
rerun the test to repro it. Therefore it's important to be able to
rerun the tests with the same random seed as the previous run. The
reason I suggested to use the current Julian date
(http://en.wikipedia.org/wiki/Julian_day) or perhaps the last several
digits of it as the seed is that it's stable in the short term and
still varies over the long term. This allows you to get enough
randomization over time (assuming you run your tests regularly), while
making it easy to debug the tests.

> I was considering --gtest_shuffle as a Boolean flag and --gtest_rng_seed=N
> to specify a seed (and use a time-based seed if N is 0).  This simplifies
> usage a bit for the default case of shuffling without specifying a seed.

Sounds good. rng is not a well-known acronym, so I'd suggest
gtest_random_seed instead.

>  Also, since srand() can take any unsigned int from 0 to UINT_MAX as a
> parameter, I guess it feels a bit wrong to special case (unsigned)-1.  But

We already special case 0.

> maybe it's better to let one flag control everything; just let me know which
> you prefer.

I like gtest_shuffle + gtest_random_seed.

> I haven't looked at the XML schema yet; before I start investigating, do you
> know offhand where the seed should be recorded?

We'll probably have to add a new attribute to the top-level
<testsuites> element. Peter, what do you suggest?
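If the seed becomes an attribute of the top-level `<testsuites>` element, the writer side might look like this. The attribute name `random_seed` and the `TestsuitesOpenTag` helper are assumptions for illustration only; the XML format had not been settled at this point in the thread.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: renders the opening <testsuites> tag with the random
// seed as one extra attribute. The attribute name "random_seed" is assumed.
std::string TestsuitesOpenTag(int tests, int failures, int random_seed) {
  std::ostringstream out;
  out << "<testsuites tests=\"" << tests << "\" failures=\"" << failures
      << "\" random_seed=\"" << random_seed << "\">";
  return out.str();
}
```

A reader who sees a failure in such a report could then rerun with `--gtest_shuffle --gtest_random_seed=42` to get the same order.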

> How should shuffle + repeat + XML output work?  The RNG is only seeded at
> the beginning of the program, but it looks like each iteration's XML output
> overwrites the previous iteration's and does not record its own iteration
> number.  The recorded RNG seed wouldn't work without also including the
> number of repetitions.

The first question to ask is whether we should re-shuffle or use the
same order in each iteration. I think the former makes more sense
given the purpose of the shuffling. For simplicity, I suggest using
(default random seed + iteration index) as the random seed for each
iteration. In other words, we call srand(default_random_seed + i) at
the beginning of the i-th iteration.
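A sketch of the (default random seed + iteration index) scheme, assuming `std::mt19937` stands in for `srand()`/`rand()`; `OrderForIteration` is a hypothetical helper. Because each iteration is reseeded from scratch, iteration k of a run with base seed S uses exactly the same order as iteration 0 of a run with base seed S + k.

```cpp
#include <algorithm>
#include <numeric>
#include <random>
#include <vector>

// Returns the test order (as indices) for the 0-based `iteration` of a
// repeated run. The engine is reseeded with base_seed + iteration every time,
// so the order depends only on that sum, not on earlier iterations.
std::vector<int> OrderForIteration(int num_tests, unsigned base_seed, int iteration) {
  std::vector<int> order(num_tests);
  std::iota(order.begin(), order.end(), 0);  // 0, 1, ..., num_tests - 1
  std::mt19937 engine(base_seed + static_cast<unsigned>(iteration));
  std::shuffle(order.begin(), order.end(), engine);
  return order;
}
```

This property is what lets a single iteration be replayed on its own: pass S + k as the seed instead of repeating k earlier iterations.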

Let's give the group some time to comment before actually doing it. Thanks,

--
Zhanyong

Josh Kelley

Mar 29, 2009, 8:57:20 PM
to Zhanyong Wan (λx.x x), Google C++ Testing Framework
2009/3/29 Zhanyong Wan (λx.x x) <w...@google.com>

> When you uncover a bug via --gtest_shuffle, often you may want to
> rerun the test to repro it.  Therefore it's important to be able to
> rerun the tests with the same random seed as the previous run.  The
> reason I suggested to use the current Julian date
> (http://en.wikipedia.org/wiki/Julian_day) or perhaps the last several
> digits of it as the seed is that it's stable in the short term and
> still varies over the long term.  This allows you to get enough
> randomization over time (assuming you run your tests regularly), while
> making it easy to debug the tests.

Like Dean, I'd prefer to use a different seed on every iteration, to maximize randomness in the hope of catching a problem.  I'd even like to be able to manually run a test a few times in a row, with it shuffled differently each time, without having to resort to using --gtest_random_seed or --gtest_repeat.  If the user uncovers a bug via --gtest_shuffle, he can easily rerun the test using --gtest_random_seed; this should provide short term stability if needed.
 
(Using the Julian date as the default seed and calling srand(default_seed + iteration_number) with --gtest_repeat seems a particularly bad combination, since it means that repeating will only use seeds from future dates and you'll likely see fewer unique seeds.)
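The overlap can be checked with a little arithmetic: if day d's run uses base seed d (a Julian day number) and iteration i uses d + i, consecutive days reuse each other's seeds. The hypothetical helper below counts distinct seeds; with D daily runs of R iterations each, only D + R - 1 distinct seeds appear instead of D * R.

```cpp
#include <set>

// Counts the distinct seeds used when each daily run is based on that day's
// Julian Day Number and iteration i of the run is seeded with (base + i).
int DistinctSeeds(long first_day, int days, int repeats) {
  std::set<long> seeds;
  for (int d = 0; d < days; ++d) {
    for (int i = 0; i < repeats; ++i) {
      seeds.insert(first_day + d + i);
    }
  }
  return static_cast<int>(seeds.size());
}
```

For example, five daily runs with `--gtest_repeat=3` yield 7 distinct seeds rather than 15.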

>> Also, since srand() can take any unsigned int from 0 to UINT_MAX as a
>> parameter, I guess it feels a bit wrong to special case (unsigned)-1.  But

> We already special case 0.

Yeah, I know.  That doesn't bother me as much, since 0 is a common special case and it doesn't involve ugly typecasts.  I know my argument is a bit subjective, sorry.

>> maybe it's better to let one flag control everything; just let me know which
>> you prefer.

> I like gtest_shuffle + gtest_random_seed.
 
Sounds good to me.

>> How should shuffle + repeat + XML output work?  The RNG is only seeded at
>> the beginning of the program, but it looks like each iteration's XML output
>> overwrites the previous iteration's and does not record its own iteration
>> number.  The recorded RNG seed wouldn't work without also including the
>> number of repetitions.

> The first question to ask is whether we should re-shuffle or use the
> same order in each iteration.  I think the former makes more sense
> given the purpose of the shuffling.  For simplicity, I suggest using
> (default random seed + iteration index) as the random seed for each
> iteration.  In other words, we call srand(default_random_seed + i) at
> the beginning of the i-th iteration.

Reshuffling on every iteration is good.

I was under the impression that calling srand() multiple times reduces randomness, but I couldn't find definitive statements to that effect while Googling, and it may not matter for these purposes anyway.  Calling srand() each time has the advantage of letting you repeat a particular iteration and skip the prior iterations if you want to, so that's probably a good thing.

Given the purpose of shuffling, an iteration's results could depend on the order in which previous iterations' tests were run.  Therefore, either the XML output should contain both the random seed and the iteration number, or the XML outputter should be changed to store the results of every iteration, so that the user can predictably repeat a sequence of shuffled iterations if needed.

Josh Kelley

Josh Kelley

Mar 29, 2009, 8:58:11 PM
to Dean Sturtevant, Zhanyong Wan (λx.x x), Google C++ Testing Framework, Peter Reilly
2009/3/29 Dean Sturtevant <dstur...@google.com>
> Can someone provide a summary of the current proposal or proposals?
> It's hard to unravel this thread.

I'll try to summarize:

Add a flag, --gtest_shuffle, to randomize the order in which test cases and tests will be run.

Randomization must not interfere with other order-dependent functionality (sharding and death tests).  

To let the user reproduce a particular randomized ordering that's causing problems, the seed used will be included both in the text and XML output (although we don't know where in the XML output).  The default seed will come from somewhere (Julian date?  GetTimeInMillis()?  somewhere else?), or the user can specify a seed with the new --gtest_random_seed flag.  (--gtest_random_seed=0 would mean to use the default seed.)

If the user specifies both --gtest_shuffle and --gtest_repeat, then each iteration will be shuffled separately.  The random number generator will probably be reseeded on each iteration with a seed of (default_seed + iteration_number).  This should be recorded somehow in the XML output with enough detail for the user to repeat it.

All of this may be implemented by itself, or it may become part of generalized test-reordering functionality along with David Saff's and David Rochberg's feature that first runs tests likely to fail.

Josh Kelley

Zhanyong Wan (λx.x x)

Mar 30, 2009, 6:22:02 PM
to Dean Sturtevant, Josh Kelley, Google C++ Testing Framework, Peter Reilly
2009/3/29 Dean Sturtevant <dstur...@google.com>:

> If a test run involves running the test suite multiple times because
> the test may be flaky, using the same seed on every iteration might be
> a good idea in order to distinguish the cases where order makes a
> difference from cases where the test succeeds unreliably due to other
> factors.

OTOH, reshuffling in each iteration gives you a better chance to
reveal the bug if it's related to test ordering.

I suggest to keep it simple for now: when --gtest_repeat and
--gtest_shuffle are both specified, we always re-shuffle in each
iteration. If you don't think the flakiness is related to test
ordering, use --gtest_repeat alone; otherwise use both flags. If it
turns out that people really need the "shuffle once" behavior, we'll
then do something.

> Using the Julian date is not compelling to me. The person might test
> all day, check in, then have the test fail the next day. Why not use a
> different seed each run? As long as the seed is logged and there is an
> option to set the seed, this would be the most reliable way of quickly
> finding a problem that is due to test order.

Sounds good.

--
Zhanyong

Zhanyong Wan (λx.x x)

Mar 30, 2009, 6:38:30 PM
to Josh Kelley, Google C++ Testing Framework
2009/3/29 Josh Kelley <jos...@gmail.com>:

> Like Dean, I'd prefer to use a different seed on every iteration, to
> maximize randomness in the hope of catching a problem.  I'd even like to be
> able to manually run a test a few times in a row, with it shuffled
> differently each time, without having to resort to using --gtest_random_seed
> or --gtest_repeat.  If the user uncovers a bug via --gtest_shuffle, he can
> easily rerun the test using --gtest_random_seed; this should provide short
> term stability if needed.
>
> (Using the Julian date as the default seed and calling srand(default_seed +
> iteration_number) with --gtest_repeat seems a particularly bad combination,
> since it means that repeating will only use seeds from future dates and
> you'll likely see fewer unique seeds.)

OK. I'm convinced.

> Reshuffling on every iteration is good.
> I was under the impression that calling srand() multiple times reduces
> randomness, but I couldn't find definitive statements to that effect while
> Googling, and it may not matter for these purposes anyway.  Calling srand()
> each time has the advantage of letting you repeat a particular iteration and
> skip the prior iterations if you want to, so that's probably a good thing.
> Given the purpose of shuffling, an iteration's results could depend on the
> order in which previous iterations' tests were run.  Therefore, either the
> XML output should contain both the random seed and the iteration number, or
> the XML outputter should be changed to store the results of every iteration,
> so that the user can predictably repeat a sequence of shuffled iterations if
> needed.

The test result can potentially depend on many things that aren't
under our control (e.g. the current time, the machine load, the
network latency, etc). We cannot possibly record the entire
environment and replay the test. Instead, we try to find a sweet
spot that is practical and gives us enough benefit in the common
case. I think recording the results of all previous iterations is
overkill in most cases and makes the average user experience worse.
I'd like to start with recording only the random seed of the current
iteration.

Another thing: srand() is a global resource (a singleton). If we
modify its state, we can interfere with tests that also use random
numbers. Therefore I think we should implement our own pseudo random
number generator that has its own state. For our purpose, the
generator can be very simple.
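A generator with its own state could be as small as the one below. This is a sketch, not the implementation that was eventually adopted: `SimpleRandom` and `Shuffle` are hypothetical names, the multiplier and increment are the well-known 32-bit LCG constants from Numerical Recipes (1664525 and 1013904223), and `Shuffle` is a standard Fisher-Yates shuffle.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// A tiny linear congruential generator that keeps its own state, so shuffling
// never disturbs the global state behind rand()/srand() that tests may use.
class SimpleRandom {
 public:
  explicit SimpleRandom(std::uint32_t seed) : state_(seed) {}

  // Returns a pseudo random number in [0, range); requires 0 < range <= 2^16.
  std::uint32_t Generate(std::uint32_t range) {
    assert(range > 0 && range <= (1u << 16));
    state_ = 1664525u * state_ + 1013904223u;  // arithmetic wraps mod 2^32
    // The high bits of an LCG are far more random than the low bits.
    return (state_ >> 16) % range;
  }

 private:
  std::uint32_t state_;
};

// Fisher-Yates shuffle driven by a SimpleRandom instance (assumes at most
// 2^16 items, matching Generate's precondition).
template <typename T>
void Shuffle(SimpleRandom& random, std::vector<T>& items) {
  for (std::size_t i = items.size(); i > 1; --i) {
    const std::size_t j = random.Generate(static_cast<std::uint32_t>(i));  // j < i
    std::swap(items[i - 1], items[j]);
  }
}
```

Since `SimpleRandom` never calls `srand()`, a test that seeds `rand()` itself sees the same sequence whether or not shuffling is on, and the permutation for a given seed does not depend on the standard library's `std::shuffle` implementation.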

Finally, to set the expectation right: my experience suggests that in
most cases the bugs found by --gtest_shuffle will be in the test code
(e.g. you have one test that depends on another test running first, or
one test doesn't tear down properly) instead of the production code.
While it's good to improve your test code (and thus help to improve
your production code in the long run), to many people it may not be as
exciting as improving your production code immediately. To really
benefit from --gtest_shuffle, you need to be disciplined and make a
habit of making sure the tests pass when the flag is specified. I
believe this will pay off in the long run, but it does require more
effort in the short term. Just so that you are aware that it's not
magic. :-)

--
Zhanyong

Zhanyong Wan (λx.x x)

Mar 30, 2009, 6:43:23 PM
to Josh Kelley, Dean Sturtevant, Google C++ Testing Framework, Peter Reilly
On Sun, Mar 29, 2009 at 5:58 PM, Josh Kelley <jos...@gmail.com> wrote:
> 2009/3/29 Dean Sturtevant <dstur...@google.com>
>>
>> Can someone provide a summary of the current proposal or proposals?
>> It's hard to unravel this thread.
>
> I'll try to summarize:
> Add a flag, --gtest_shuffle, to randomize the order in which test cases and
> tests will be run.
> Randomization must not interfere with other order-dependent functionality
> (sharding and death tests).

We'll need to consider whether we want to distribute the tests
differently in each iteration. I think yes. We just need to make
sure that all shards make the same decision as to who gets to run
which tests.

> To let the user reproduce a particular randomized ordering that's causing
> problems, the seed used will be included both in the text and XML output

To ease debugging, I suggest printing the seed both at the beginning
of the test program (such that you still know what the seed is if the
program crashes) and at the end of the test program (such that you
don't have to scroll back many pages to see it).

> (although we don't know where in the XML output).  The default seed will
> come from somewhere (Julian date?  GetTimeInMillis()?  somewhere else?), or
> the user can specify a seed with the new --gtest_random_seed flag.
>  (--gtest_random_seed=0 would mean to use the default seed.)
> If the user specifies both --gtest_shuffle and --gtest_repeat, then each
> iteration will be shuffled separately.  The random number generator will
> probably be reseeded on each iteration with a seed of (default_seed +
> iteration_number).  This should be recorded somehow in the XML output with
> enough detail for the user to repeat it.
> All of this may be implemented by itself, or it may become part of
> generalized test-reordering functionality along with David Saff's and David
> Rochberg's feature that first runs tests likely to fail.
> Josh Kelley
>

--
Zhanyong

Josh Kelley

Mar 31, 2009, 12:01:03 AM
to Zhanyong Wan (λx.x x), Dean Sturtevant, Google C++ Testing Framework, Peter Reilly
2009/3/30 Zhanyong Wan (λx.x x) <w...@google.com>

> We'll need to consider whether we want to distribute the tests
> differently in each iteration.  I think yes.  We just need to make
> sure that all shards make the same decision as to who gets to run
> which tests.

If tests are distributed differently in each iteration, then the test runner would need to distribute an identical seed to each shard, correct?  (Or the shards would need to somehow synchronize start times and clocks if the default seed is based on time.)  That would seem to complicate the test runner; is that a problem?

Josh Kelley

Zhanyong Wan (λx.x x)

Mar 31, 2009, 2:13:57 AM
to Josh Kelley, Dean Sturtevant, Google C++ Testing Framework, Peter Reilly
2009/3/30 Josh Kelley <jos...@gmail.com>:

I try to make orthogonal designs. I want a user to be able to combine
individual features freely. Usually that makes the system easier to
learn and more powerful.

That's why I want to think through how test shuffling may interact
with other features and make sure such interaction doesn't produce
unwanted or surprising effects. I want to implement the most natural
and useful behavior for each combination.

However, this doesn't mean we have to implement everything up-front.
As long as we have a plan and don't paint ourselves into a corner,
it's perfectly fine (and actually a good thing) to prioritize the
subtasks and tackle them one at a time. Lower priority tasks may be
postponed unless there is a concrete need to justify the
implementation cost. Sorry I didn't make this clear.

You are right that any kind of coordination between the shards will
complicate the test runner, and I don't suggest going that route, at
least not initially. Instead, in the sharding mode, we can pick the
random seed deterministically. For example, we can use the hash value
of (the test names and the total shard number) as the initial seed.
That way, you'll get a different seed whenever you rename, add, or
remove a test, or change the total number of shards. That's not truly
random, but should be good enough for most cases.
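A sketch of the deterministic sharding-mode seed, assuming 32-bit FNV-1a as the hash; the thread only says "hash", so FNV-1a and the names `Fnv1a` and `ShardingModeSeed` are illustrative choices. Every shard sees the same test list and shard count, so every shard computes the same seed without coordination.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// 32-bit FNV-1a over the bytes of `data`, continuing from `hash`.
std::uint32_t Fnv1a(const std::string& data, std::uint32_t hash) {
  for (unsigned char c : data) {
    hash ^= c;
    hash *= 16777619u;  // FNV prime
  }
  return hash;
}

// Hypothetical: derives the shuffle seed from every test name plus the total
// shard count. All shards see the same inputs, so they agree on the seed;
// renaming, adding, or removing a test, or changing the shard count, changes
// it (barring hash collisions).
std::uint32_t ShardingModeSeed(const std::vector<std::string>& test_names,
                               int total_shards) {
  std::uint32_t hash = 2166136261u;  // FNV offset basis
  for (const std::string& name : test_names) {
    hash = Fnv1a(name, hash);
    hash = Fnv1a(std::string(1, '\0'), hash);  // separator between names
  }
  return Fnv1a(std::to_string(total_shards), hash);
}
```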

Initially, I'd like to keep it simple. Let's just ignore
--gtest_shuffle in the sharding mode for now. We may even ignore it
in the --gtest_repeat mode too if it's not easy to implement. Once
it's implemented for the simple case (no sharding or repeating), we'll
see how people like it and decide whether it's worthwhile to fully
implement it.

--
Zhanyong

Vlad Losev

Mar 31, 2009, 2:49:47 AM
to Zhanyong Wan (λx.x x), Josh Kelley, Dean Sturtevant, Google C++ Testing Framework, Peter Reilly

Why not figure out the subset of tests for a given shard and then shuffle only that subset?
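Vlad's alternative could be sketched like this, assuming round-robin assignment by position in the unshuffled test list; `ShardThenShuffle` is a hypothetical helper. Since the subset is fixed before shuffling, each shard is free to use its own seed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Picks this shard's tests by round-robin position in the unshuffled list,
// then shuffles only that subset. The seed affects order, never membership.
std::vector<std::string> ShardThenShuffle(const std::vector<std::string>& all_tests,
                                          int total_shards, int shard_index,
                                          unsigned seed) {
  assert(total_shards > 0 && shard_index >= 0 && shard_index < total_shards);
  std::vector<std::string> mine;
  for (std::size_t i = 0; i < all_tests.size(); ++i) {
    if (i % static_cast<std::size_t>(total_shards) ==
        static_cast<std::size_t>(shard_index)) {
      mine.push_back(all_tests[i]);
    }
  }
  std::mt19937 engine(seed);
  std::shuffle(mine.begin(), mine.end(), engine);
  return mine;
}
```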
 



Zhanyong Wan (λx.x x)

Mar 31, 2009, 3:04:00 AM
to Vlad Losev, Josh Kelley, Dean Sturtevant, Google C++ Testing Framework, Peter Reilly
2009/3/30 Vlad Losev <vlad...@gmail.com>:

That's one way to do it, but it's not as effective for finding bugs
compared with re-sharding in every iteration. For example, if a bug
manifests only when test A runs after test B on the same machine,
you'll never catch it if A and B always run on different machines.
Re-sharding combined with re-shuffling makes better use of your
machine time.
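The re-sharding variant swaps the order of the two steps: every shard shuffles the full list with a shared seed (for example, base seed + iteration index) and then takes its round-robin slice. `ShuffleThenShard` is hypothetical, and the sketch assumes all shards run the same binary, since the exact permutation produced by `std::shuffle` is implementation-defined.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Every shard shuffles the *full* list with the same shared seed, so all
// shards agree on one permutation without talking to each other, and then
// keeps its round-robin slice. A new shared seed per iteration therefore
// re-shards as well as re-shuffles.
std::vector<std::string> ShuffleThenShard(std::vector<std::string> all_tests,
                                          int total_shards, int shard_index,
                                          unsigned shared_seed) {
  assert(total_shards > 0 && shard_index >= 0 && shard_index < total_shards);
  std::mt19937 engine(shared_seed);
  std::shuffle(all_tests.begin(), all_tests.end(), engine);
  std::vector<std::string> mine;
  for (std::size_t i = 0; i < all_tests.size(); ++i) {
    if (i % static_cast<std::size_t>(total_shards) ==
        static_cast<std::size_t>(shard_index)) {
      mine.push_back(all_tests[i]);
    }
  }
  return mine;
}
```

With a different shared seed each iteration, tests A and B that landed on different shards in one iteration can share a shard (and run in either order) in a later one.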

--
Zhanyong

Zhanyong Wan (λx.x x)

Apr 9, 2009, 12:33:55 AM
to Vlad Losev, Josh Kelley, Dean Sturtevant, Google C++ Testing Framework, Peter Reilly

David Saff

Apr 9, 2009, 10:38:22 AM
to Zhanyong Wan (λx.x x), Vlad Losev, Josh Kelley, Dean Sturtevant, Google C++ Testing Framework, Peter Reilly
I assume you mean "This is _now_ tracked". :-)

David Saff

2009/4/9 Zhanyong Wan (λx.x x) <w...@google.com>:

Zhanyong Wan (λx.x x)

Apr 9, 2009, 10:51:41 AM
to David Saff, Vlad Losev, Josh Kelley, Dean Sturtevant, Google C++ Testing Framework, Peter Reilly
Doh!

2009/4/9 David Saff <da...@saff.net>:

--
Zhanyong

Josh Kelley

Apr 26, 2009, 11:53:15 PM
to Zhanyong Wan (λx.x x), Vlad Losev, Dean Sturtevant, Google C++ Testing Framework, Peter Reilly
I uploaded an initial patch submission to http://codereview.appspot.com/52057.

Comments, current issues and limitations:

When doing shuffling + sharding, the shuffling is done after the sharding subset is determined (as Vlad suggested), because that was simplest for now.  I can change it later (as part of this patch or a later patch?).  Honestly, I still like the simplicity of Vlad's suggestion.  Alternatively, I don't know how much this would complicate the test runner, but what about only shuffling before sharding if the test runner specifies a non-default random seed?

The seed is specified on the command line and in test outputs as a signed int, even though the random number generator internally takes an unsigned int.  I did it this way because the Google Style Guide discourages the use of unsigned ints and because I didn't really want to have to do UInt32 versions of the option parsing routines, but I don't know if this is best or not. Thoughts?

I tried to organize the code so that the user could turn shuffling on and off for specific test cases by setting GTEST_FLAG(shuffle) from within test cases' setup functions.  This complicated the design just a bit.  I don't know if it's a design goal to let users set flags anywhere (instead of, for example, only before calling RUN_ALL_TESTS, with the exception of death_test_style), and I thought that maybe the documentation should be updated to say when setting different flags is supported.

There should probably be a Python script gtest_shuffle_test.py that does the following:
1) Runs --gtest_shuffle --gtest_repeat=3 and verifies non-repeating seeds.
2) Runs --gtest_shuffle --gtest_random_seed=n and verifies that the order does in fact change.
3) Runs a test suite containing death tests 10 times or so and verifies that death tests always occur before non-death tests.

Does this sound right?  Any particular guidelines for writing Python test scripts?  Can the Python script reuse gtest-death-test_test (since it already has some appropriate tests), or should it have its own set of dummy tests?

Josh Kelley

2009/4/9 Zhanyong Wan (λx.x x) <w...@google.com>

Josh Kelley

Apr 26, 2009, 11:57:01 PM
to Zhanyong Wan (λx.x x), Vlad Losev, Dean Sturtevant, Google C++ Testing Framework, Peter Reilly
Gmail doesn't really like Opera, so that message ended up being too hard to read.  Let me try again.  Sorry.

Zhanyong Wan (λx.x x)

Apr 27, 2009, 8:18:37 PM
to Josh Kelley, Vlad Losev, Dean Sturtevant, Google C++ Testing Framework, Peter Reilly
Thanks for your effort, Josh! This is cool.

2009/4/26 Josh Kelley <jos...@gmail.com>:


> Gmail doesn't really like Opera, so that message ended up being too hard to
> read.  Let me try again.  Sorry.
>
> I uploaded an initial patch submission to
> http://codereview.appspot.com/52057.

I'll review the code later. Below are some general comments.

> Comments, current issues and limitations:
>
> When doing shuffling + sharding, the shuffling is done after the sharding
> subset is determined (as Vlad suggested), because that was simplest for now.

That's fine.

> I can change it later (as part of this patch or a later patch?). Honestly, I
> still like the simplicity of Vlad's suggestion. Alternatively, I don't know
> how much this would complicate the test runner, but what about only
> shuffling before sharding if the test runner specifies a non-default random
> seed?
>
> The seed is specified on the command line and in test outputs as a signed
> int, even though the random number generator internally takes an
> unsigned int.  I did it this way because the Google Style Guide discourages
> the use of unsigned ints and because I didn't really want to have to do
> UInt32 versions of the option parsing routines, but I don't know if this is
> best or not. Thoughts?

We don't need the full range of int32 for the random seed. It adds
little value and makes the command line hard to type. I think
limiting the seed to 5 digits (0~99999) should provide enough
randomness without making the syntax inconvenient.
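Under the five-digit proposal, resolving the seed could look like this. `ResolveRandomSeed` and `kMaxRandomSeed` are hypothetical; the sketch assumes 0 still means "pick a time-based default" and maps the clock into [1, 99999], so the chosen default is never 0 (which, copied back onto the command line, would mean "pick one for me" again).

```cpp
#include <cassert>
#include <cstdint>

constexpr int kMaxRandomSeed = 99999;  // five digits, per Zhanyong's suggestion

// Hypothetical resolution of a --gtest_random_seed value: 0 asks for a
// time-based default; anything else must already lie in [1, kMaxRandomSeed].
int ResolveRandomSeed(int flag_value, std::uint64_t now_millis) {
  assert(flag_value >= 0 && flag_value <= kMaxRandomSeed);
  if (flag_value != 0) return flag_value;
  // Map the clock into [1, kMaxRandomSeed] so the default is never 0.
  return static_cast<int>(now_millis % kMaxRandomSeed) + 1;
}
```

In a real flag parser, an out-of-range value would be reported as a user error rather than caught by `assert`, which disappears in release builds.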

> I tried to organize the code so that the user could turn shuffling on and
> off for specific test cases by setting GTEST_FLAG(shuffle) from within test
> cases' setup functions.  This complicated the design just a bit.  I don't

I'll look at the implementation to see if it's worth it.

> know if it's a design goal to let users set flags anywhere (instead of, for
> example, only before calling RUN_ALL_TESTS, with the exception of
> death_test_style), and I thought that maybe the documentation should be
> updated to say when setting different flags is supported.

It depends on the flag. Good idea to clarify it in the docs.

> There should probably be a Python script gtest_shuffle_test.py that does the
> following:
> 1) Runs --gtest_shuffle --gtest_repeat=3 and verifies non-repeating seeds.
> 2) Runs --gtest_shuffle --gtest_random_seed=n and verifies that the order
> does in fact change.
> 3) Runs a test suite containing death tests 10 times or so and verifies that
> death tests always occur before non-death tests.
> Does this sound right?

Sounds good.
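The core check of item 3 is small. The sketch below is in C++ (the language of gtest itself) rather than Python; it assumes the test program's output has already been parsed into `TestCase.TestName` strings and relies on the `DeathTest` suffix convention. `IsDeathTest` and `DeathTestsRunFirst` are hypothetical names.

```cpp
#include <string>
#include <vector>

// True if `full_name` ("TestCase.TestName") belongs to a test case whose name
// ends in "DeathTest".
bool IsDeathTest(const std::string& full_name) {
  const std::string case_name = full_name.substr(0, full_name.find('.'));
  const std::string suffix = "DeathTest";
  return case_name.size() >= suffix.size() &&
         case_name.compare(case_name.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Checks one observed run order: once a non-death test has run, no death test
// may follow.
bool DeathTestsRunFirst(const std::vector<std::string>& run_order) {
  bool seen_non_death = false;
  for (const std::string& name : run_order) {
    if (!IsDeathTest(name)) {
      seen_non_death = true;
    } else if (seen_non_death) {
      return false;
    }
  }
  return true;
}
```

Running this check over the orders from ten or so shuffled runs would cover item 3.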

> Any particular guidelines for writing Python test
> scripts?

We don't have a published Python style guide yet. Just try to mimic
the style of existing Python tests, and I'll catch style issues when
reviewing it.

> Can the Python script reuse gtest-death-test_test (since it
> already has some appropriate tests), or should it have its own set of dummy
> tests?

gtest_list_tests_unittest_.cc seems better for this purpose, as it
contains truly dummy tests. Can you add a couple of empty death tests
to it?

Thanks,

--
Zhanyong
