You can reliably catch use of uninitialized variables using Valgrind
(http://valgrind.org).
Nonetheless, something like --gtest_shuffle can be quite useful.
Thanks for proposing it and offering to implement it!
If we decide to do it, we need to make sure we can control the random
number seed for debugging. The test output (text & XML) should
contain the random seed so that a person reading the report can
reproduce the failure. The shuffle flag should also work with
--gtest_filter and --gtest_also_run_disabled_tests. And it should
respect the rule that all death tests should run before the non-death
tests (see the documentation on death tests in the advanced guide wiki
for why). It's not trivial. :-)
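A minimal sketch of one way a shuffle could respect the death-test rule: partition the tests into death and non-death groups, then shuffle each group on its own. The TestInfo struct, the function names, and the use of std::mt19937 are assumptions made for illustration, not Google Test internals.

```cpp
#include <algorithm>
#include <cassert>
#include <random>
#include <string>
#include <vector>

// Hypothetical test descriptor; not a Google Test type.
struct TestInfo {
  std::string name;
  bool is_death_test;
};

bool IsDeathTest(const TestInfo& test) { return test.is_death_test; }

// Randomizes the order of `tests` while keeping every death test ahead of
// every non-death test.
void ShuffleKeepingDeathTestsFirst(std::vector<TestInfo>& tests,
                                   unsigned seed) {
  std::mt19937 rng(seed);  // local generator with its own state
  // Move all death tests to the front of the vector.
  std::vector<TestInfo>::iterator first_non_death =
      std::partition(tests.begin(), tests.end(), IsDeathTest);
  // Shuffle each group independently so the boundary is preserved.
  std::shuffle(tests.begin(), first_non_death, rng);
  std::shuffle(first_non_death, tests.end(), rng);
  assert(std::is_partitioned(tests.begin(), tests.end(), IsDeathTest));
}
```

With this shape, the same seed gives the same order on the same build, which is what makes a failing order reproducible.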
One idea for controlling the seed:
  --gtest_shuffle=0    do not shuffle.
  --gtest_shuffle=-1   use today's Julian date as the random seed.
  --gtest_shuffle=N    use N as the random seed, where N is not 0.
Thoughts?
--
Zhanyong
Another thing: --gtest_shuffle should work with test sharding
(http://code.google.com/p/googletest/wiki/GoogleTestAdvancedGuide#Distributing_Test_Functions_to_Multiple_Machines)
as well.
--
Zhanyong
David Saff
2009/3/27 David Saff <da...@saff.net>:
> [-drochberg]
> [+rochberg]
>
> Sorry, typo.
>
> David Saff
>
> 2009/3/27 David Saff <da...@saff.net>:
>> 2009/3/27 Zhanyong Wan (λx.x x) <w...@google.com>:
>>> Nonetheless, something like --gtest_shuffle can be quite useful.
>>> Thanks for proposing it and offering to implement it!
>>
>> FYI, David Rochberg and I spiked a test-reordering version of Gtest a
>> couple weeks back, and would be happy to contribute that code. Our
>> implementation tries to run likely-to-fail tests first, but pluggable
>> reordering strategies make sense.
>>
>> David Saff
>>
>
Thanks, David. IIRC, you determine which tests are more likely to
fail by looking at the test run history. Where do you store that
history?
--
Zhanyong
For our purpose, any reasonable pseudo random number generator will
do. Simplicity and portability are more important.
> static_cast<unsigned>(GetTimeInMillis()) as the default random seed; it was
> easy to implement, and it seems to work well in my testing. Is this okay?
When you uncover a bug via --gtest_shuffle, often you may want to
rerun the test to repro it. Therefore it's important to be able to
rerun the tests with the same random seed as the previous run. The
reason I suggested using the current Julian date
(http://en.wikipedia.org/wiki/Julian_day) or perhaps the last several
digits of it as the seed is that it's stable in the short term and
still varies over the long term. This allows you to get enough
randomization over time (assuming you run your tests regularly), while
making it easy to debug the tests.
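A minimal sketch of the Julian-date idea, assuming the seed is the last five digits of the Julian Day Number for the current UTC day. The function names and the five-digit cutoff are illustrative assumptions. The only fixed fact used is that 1970-01-01 has Julian Day Number 2440588.

```cpp
#include <cassert>
#include <ctime>

// Returns the Julian Day Number of the UTC day containing `unix_seconds`.
// 1970-01-01 is Julian Day Number 2440588.
long JulianDayNumber(std::time_t unix_seconds) {
  assert(unix_seconds >= 0);
  const long kSecondsPerDay = 86400;
  const long kJulianDayOfUnixEpoch = 2440588;
  return static_cast<long>(unix_seconds / kSecondsPerDay) +
         kJulianDayOfUnixEpoch;
}

// Keeps the last five digits, so the seed is constant within a day and
// changes from one day to the next.
int SeedFromJulianDay(std::time_t unix_seconds) {
  return static_cast<int>(JulianDayNumber(unix_seconds) % 100000);
}
```

Calling SeedFromJulianDay(std::time(NULL)) twice on the same UTC day returns the same value, which is the short-term stability described above.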
> I was considering --gtest_shuffle as a Boolean flag and --gtest_rng_seed=N
> to specify a seed (and use a time-based seed if N is 0). This simplifies
> usage a bit for the default case of shuffling without specifying a seed.
Sounds good. rng is not a well-known acronym, so I'd suggest
gtest_random_seed instead.
> Also, since srand() can take any unsigned int from 0 to UINT_MAX as a
> parameter, I guess it feels a bit wrong to special case (unsigned)-1. But
We already special case 0.
> maybe it's better to let one flag control everything; just let me know which
> you prefer.
I like gtest_shuffle + gtest_random_seed.
> I haven't looked at the XML schema yet; before I start investigating, do you
> know offhand where the seed should be recorded?
We'll probably have to add a new attribute to the top-level
<testsuites> element. Peter, what do you suggest?
> How should shuffle + repeat + XML output work? The RNG is only seeded at
> the beginning of the program, but it looks like each iteration's XML output
> overwrites the previous iteration's and does not record its own iteration
> number. The recorded RNG seed wouldn't work without also including the
> number of repetitions.
The first question to ask is whether we should re-shuffle or use the
same order in each iteration. I think the former makes more sense
given the purpose of the shuffling. For simplicity, I suggest using
(default random seed + iteration index) as the random seed for each
iteration. In other words, we call srand(default_random_seed + i) at
the beginning of the i-th iteration.
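A rough sketch of the per-iteration reseeding, assuming a hand-written Fisher-Yates shuffle driven by rand(). ShuffledOrderForIteration is a hypothetical name. Note that this version changes the global rand() state.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <vector>

// Returns the run order for one iteration as indices into the original
// registration order. The seed is base_seed + iteration, so any single
// iteration can be replayed without running the earlier ones.
std::vector<int> ShuffledOrderForIteration(int test_count, unsigned base_seed,
                                           unsigned iteration) {
  assert(test_count >= 0);
  std::srand(base_seed + iteration);  // srand(default_random_seed + i)
  std::vector<int> order(test_count);
  for (int i = 0; i < test_count; ++i) order[i] = i;
  // Fisher-Yates: swap each slot with a randomly chosen slot at or before it.
  for (int i = test_count - 1; i > 0; --i) {
    const int j = std::rand() % (i + 1);  // slight modulo bias; fine for a sketch
    std::iter_swap(order.begin() + i, order.begin() + j);
  }
  return order;
}
```

To replay iteration 2 of a run that started from seed 1234, one would call ShuffledOrderForIteration(n, 1234, 2) again and get the same order on the same platform.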
Let's give the group some time to comment before actually doing it. Thanks,
--
Zhanyong
Can someone provide a summary of the current proposal or proposals?
It's hard to unravel this thread.
> If a test run involves running the test suite multiple times because
> the test may be flaky, using the same seed on every iteration might be
> a good idea in order to distinguish the cases where order makes a
> difference from cases where the test succeeds unreliably due to other
> factors.
OTOH, reshuffling in each iteration gives you a better chance to
reveal the bug if it's related to test ordering.
I suggest keeping it simple for now: when --gtest_repeat and
--gtest_shuffle are both specified, we always re-shuffle in each
iteration. If you don't think the flakiness is related to test
ordering, use --gtest_repeat alone; otherwise use both flags. If it
turns out that people really need the "shuffle once" behavior, we'll
then do something.
> Using the Julian date is not compelling to me. The person might test
> all day, check in, then have the test fail the next day. Why not use a
> different seed each run? As long as the seed is logged and there is an
> option to set the seed, this would be the most reliable way of quickly
> finding a problem that is due to test order.
Sounds good.
--
Zhanyong
> Like Dean, I'd prefer to use a different seed on every iteration, to
> maximize randomness in the hope of catching a problem. I'd even like to be
> able to manually run a test a few times in a row, with it shuffled
> differently each time, without having to resort to using --gtest_random_seed
> or --gtest_repeat. If the user uncovers a bug via --gtest_shuffle, he can
> easily rerun the test using --gtest_random_seed; this should provide short
> term stability if needed.
>
> (Using the Julian date as the default seed and calling srand(default_seed +
> iteration_number) with --gtest_repeat seems a particularly bad combination,
> since it means that repeating will only use seeds from future dates and
> you'll likely see fewer unique seeds.)
OK. I'm convinced.
> Reshuffling on every iteration is good.
> I was under the impression that calling srand() multiple times reduces
> randomness, but I couldn't find definitive statements to that effect while
> Googling, and it may not matter for these purposes anyway. Calling srand()
> each time has the advantage of letting you repeat a particular iteration and
> skip the prior iterations if you want to, so that's probably a good thing.
> Given the purpose of shuffling, an iteration's results could depend on the
> order in which previous iterations' tests were run. Therefore, either the
> XML output should contain both the random seed and the iteration number, or
> the XML outputter should be changed to store the results of every iteration,
> so that the user can predictably repeat a sequence of shuffled iterations if
> needed.
The test result can potentially depend on many things that aren't
under our control (e.g. the current time, the machine load, the
network latency, etc). We cannot possibly record the entire
environment and replay the test. Instead, we try to find a sweet
spot that is practical and gives us enough benefit in the common
case. I think recording the results of all previous iterations is
overkill in most cases and makes the average user experience worse.
I'd like to start with recording only the random seed of the current
iteration.
Another thing: the state behind srand() is a global resource (a
singleton). If we modify it, we can interfere with tests that also use
random numbers. Therefore I think we should implement our own pseudo random
number generator that has its own state. For our purpose, the
generator can be very simple.
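A very simple generator with its own state could look like the sketch below. It is a linear congruential generator using the Numerical Recipes constants, which is an assumption chosen for illustration and not necessarily what Google Test should use. SimpleRandom and ShuffleIndices are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical generator that never touches srand()/rand().
class SimpleRandom {
 public:
  explicit SimpleRandom(std::uint32_t seed) : state_(seed) {}

  // Returns a value in [0, range). Requires range > 0.
  std::uint32_t Generate(std::uint32_t range) {
    assert(range > 0);
    // LCG step; unsigned arithmetic wraps modulo 2^32.
    state_ = 1664525u * state_ + 1013904223u;
    // Scale by the high bits, which are better distributed than the low
    // bits of an LCG. The result is < range because state_ < 2^32.
    return static_cast<std::uint32_t>(
        (static_cast<std::uint64_t>(state_) * range) >> 32);
  }

 private:
  std::uint32_t state_;
};

// Fisher-Yates shuffle driven by the private generator.
void ShuffleIndices(std::vector<int>& order, SimpleRandom& random) {
  for (std::size_t i = order.size(); i > 1; --i) {
    const std::size_t j = random.Generate(static_cast<std::uint32_t>(i));
    std::swap(order[i - 1], order[j]);
  }
}
```

Because the state lives in the object, a test that calls srand() or rand() itself sees the same sequence whether or not shuffling is enabled.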
Finally, to set the expectation right: my experience suggests that in
most cases the bugs found by --gtest_shuffle will be in the test code
(e.g. you have one test that depends on another test running first, or
one test doesn't tear down properly) instead of the production code.
While it's good to improve your test code (and thus help to improve
your production code in the long run), to many people it may not be as
exciting as improving your production code immediately. To really
benefit from --gtest_shuffle, you need to be disciplined and make a
habit of making sure the tests pass when the flag is specified. I
believe this will pay off in the long run, but it does require more
effort in the short term. Just so that you are aware that it's not
magic. :-)
--
Zhanyong
We'll need to consider whether we want to distribute the tests
differently in each iteration. I think yes. We just need to make
sure that all shards make the same decision as to who gets to run
which tests.
> To let the user reproduce a particular randomized ordering that's causing
> problems, the seed used will be included both in the text and XML output
To ease debugging, I suggest printing the seed both at the beginning
of the test program (such that you still know what the seed is if the
program crashes) and at the end of the test program (such that you
don't have to scroll back many pages to see it).
> (although we don't know where in the XML output). The default seed will
> come from somewhere (Julian date? GetTimeInMillis()? somewhere else?), or
> the user can specify a seed with the new --gtest_random_seed flag.
> (--gtest_random_seed=0 would mean to use the default seed.)
> If the user specifies both --gtest_shuffle and --gtest_repeat, then each
> iteration will be shuffled separately. The random number generator will
> probably be reseeded on each iteration with a seed of (default_seed +
> iteration_number). This should be recorded somehow in the XML output with
> enough detail for the user to repeat it.
> All of this may be implemented by itself, or it may become part of
> generalized test-reordering functionality along with David Saff's and David
> Rochberg's feature that first runs tests likely to fail.
> Josh Kelley
>
--
Zhanyong
I try to make orthogonal designs. I want a user to be able to combine
individual features freely. Usually that makes the system easier to
learn and more powerful.
That's why I want to think through how test shuffling may interact
with other features and make sure such interaction doesn't produce
unwanted or surprising effects. I want to implement the most natural
and useful behavior for each combination.
However, this doesn't mean we have to implement everything up-front.
As long as we have a plan and don't paint ourselves into a corner,
it's perfectly fine (and actually a good thing) to prioritize the
subtasks and tackle them one at a time. Lower priority tasks may be
postponed unless there is a concrete need to justify the
implementation cost. Sorry I didn't make this clear.
You are right that any kind of coordination between the shards will
complicate the test runner, and I don't suggest going that route, at
least not initially. Instead, in the sharding mode, we can pick the
random seed deterministically. For example, we can use the hash value
of (the test names and the total shard number) as the initial seed.
That way, you'll get a different seed whenever you rename, add, or
remove a test, or change the total number of shards. That's not truly
random, but should be good enough for most cases.
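A sketch of the deterministic seed, assuming 32-bit FNV-1a as the hash. The thread does not name a hash function, so FNV-1a and the function names here are illustrative choices only.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Folds `bytes` into a running 32-bit FNV-1a hash.
std::uint32_t Fnv1aUpdate(std::uint32_t hash, const std::string& bytes) {
  for (unsigned char c : bytes) {
    hash ^= c;
    hash *= 16777619u;  // FNV prime
  }
  return hash;
}

// Every shard sees the same test names and the same shard count, so every
// shard computes the same seed without talking to the others.
std::uint32_t SeedForShardedRun(const std::vector<std::string>& test_names,
                                int total_shards) {
  assert(total_shards > 0);
  std::uint32_t hash = 2166136261u;  // FNV offset basis
  for (const std::string& name : test_names) {
    hash = Fnv1aUpdate(hash, name);
    // NUL separator so {"ab", "c"} and {"a", "bc"} hash differently.
    hash = Fnv1aUpdate(hash, std::string(1, '\0'));
  }
  return Fnv1aUpdate(hash, std::to_string(total_shards));
}
```

The seed changes when a test is renamed, added, or removed, or when the shard count changes, and stays fixed otherwise.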
Initially, I'd like to keep it simple. Let's just ignore
--gtest_shuffle in the sharding mode for now. We may even ignore it
in the --gtest_repeat mode too if it's not easy to implement. Once
it's implemented for the simple case (no sharding or repeating), we'll
see how people like it and decide whether it's worthwhile to fully
implement it.
--
Zhanyong
That's one way to do it, but it's not as effective for finding bugs
as re-sharding in every iteration. For example, if a bug
manifests only when test A runs after test B on the same machine,
you'll never catch it if A and B always run on different machines.
Re-sharding combined with re-shuffling makes better use of your
machine time.
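A sketch of re-sharding combined with re-shuffling, assuming every shard runs the same binary and is given the same base seed. Each shard shuffles the full test list with the iteration's seed and then keeps every total_shards-th position. TestsForShard is a hypothetical name, and std::mt19937 stands in for whatever generator is eventually used.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <random>
#include <vector>

// Returns, in execution order, the tests (as indices into registration
// order) that shard `shard_index` runs in the given iteration.
std::vector<int> TestsForShard(int test_count, int total_shards,
                               int shard_index, unsigned base_seed,
                               unsigned iteration) {
  assert(test_count >= 0 && total_shards > 0);
  assert(shard_index >= 0 && shard_index < total_shards);
  std::vector<int> order(test_count);
  std::iota(order.begin(), order.end(), 0);
  // Same seed on every shard, so all shards agree on the shuffled order.
  std::mt19937 rng(base_seed + iteration);
  std::shuffle(order.begin(), order.end(), rng);
  // Deal the shuffled list round-robin; this shard keeps its own positions.
  std::vector<int> mine;
  for (int pos = 0; pos < test_count; ++pos) {
    if (pos % total_shards == shard_index) mine.push_back(order[pos]);
  }
  return mine;
}
```

Since the shuffled list differs per iteration, tests A and B can land on the same shard in one iteration and on different shards in the next.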
--
Zhanyong
David Saff
2009/4/9 Zhanyong Wan (λx.x x) <w...@google.com>:
2009/4/26 Josh Kelley <jos...@gmail.com>:
> Gmail doesn't really like Opera, so that message ended up being too hard to
> read. Let me try again. Sorry.
>
> I uploaded an initial patch submission to
> http://codereview.appspot.com/52057.
I'll review the code later. Below are some general comments.
> Comments, current issues and limitations:
>
> When doing shuffling + sharding, the shuffling is done after the sharding
> subset is determined (as Vlad suggested), because that was simplest for now.
That's fine.
> I can change it later (as part of this patch or a later patch?). Honestly, I
> still like the simplicity of Vlad's suggestion. Alternatively, I don't know
> how much this would complicate the test runner, but what about only
> shuffling before sharding if the test runner specifies a non-default random
> seed?
>
> The seed is specified on the command line and in test outputs as a signed
> int, even though the random number generator internally takes an
> unsigned int. I did it this way because the Google Style Guide discourages
> the use of unsigned ints and because I didn't really want to have to do
> UInt32 versions of the option parsing routines, but I don't know if this is
> best or not. Thoughts?
We don't need the full range of int32 for the random seed. It adds
little value and makes the command line hard to type. I think
limiting the seed to 5 digits (0~99999) should provide enough
randomness without making the syntax inconvenient.
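A small sketch of what the five-digit limit could mean in code, assuming a time-based default seed. The names are hypothetical, and the sketch does not reserve 0 for any special meaning.

```cpp
#include <cassert>

const int kMaxSeed = 99999;  // upper bound of the suggested 0~99999 range

// Maps a millisecond timestamp onto the five-digit seed range.
int SeedFromMillis(long long now_ms) {
  assert(now_ms >= 0);
  return static_cast<int>(now_ms % (kMaxSeed + 1));
}

// Seed for the following iteration; wraps around to stay in range.
int NextSeed(int seed) {
  assert(seed >= 0 && seed <= kMaxSeed);
  return (seed + 1) % (kMaxSeed + 1);
}
```

A seed in this range is short enough to retype from a test log.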
> I tried to organize the code so that the user could turn shuffling on and
> off for specific test cases by setting GTEST_FLAG(shuffle) from within test
> cases' setup functions. This complicated the design just a bit. I don't
I'll look at the implementation to see if it's worth it.
> know if it's a design goal to let users set flags anywhere (instead of, for
> example, only before calling RUN_ALL_TESTS, with the exception of
> death_test_style), and I thought that maybe the documentation should be
> updated to say when setting different flags is supported.
It depends on the flag. Good idea to clarify it in the docs.
> There should probably be a Python script gtest_shuffle_test.py that does the
> following:
> 1) Runs --gtest_shuffle --gtest_repeat=3 and verifies non-repeating seeds.
> 2) Runs --gtest_shuffle --gtest_random_seed=n and verifies that the order
> does in fact change.
> 3) Runs a test suite containing death tests 10 times or so and verifies that
> death tests always occur before non-death tests.
> Does this sound right?
Sounds good.
> Any particular guidelines for writing Python test
> scripts?
We don't have a published Python style guide yet. Just try to mimic
the style of existing Python tests, and I'll catch style issues when
reviewing it.
> Can the Python script reuse gtest-death-test_test (since it
> already has some appropriate tests), or should it have its own set of dummy
> tests?
gtest_list_tests_unittest_.cc seems better for this purpose, as it
contains truly dummy tests. Can you add a couple of empty death tests
to it?
Thanks,
--
Zhanyong