Re: option to run the tests in a random order

zhanyo...@gmail.com

unread,

Jun 18, 2009, 5:10:15 PM6/18/09

to jos...@gmail.com, googletes...@googlegroups.com

One thing I learned from this is that big patches can take a long time
to be reviewed and get ready for check-in, as we are all multi-tasking
and have other important work to tend to. During the time it takes to
review and improve the patch, the code base is constantly changing,
often creating merge conflicts. This makes the experience suck.

One thing we can try to make our lives easier is to break down a big
patch into smaller, but still coherent ones. For example, your first
patch in this series can be adding the logic to parse the new flags (but
don't actually implement any behavior for these flags). I imagine that
can be reviewed and checked in quickly. Once that's done, you may want
to contribute the code to shuffle a list (but don't yet hook it up with
the rest of gtest), or maybe an easier-to-shuffle data structure to
replace List. Then we take yet another step...

http://codereview.appspot.com/52057

jos...@gmail.com

unread,

Jun 24, 2009, 10:07:02 PM6/24/09

to zhanyo...@gmail.com, googletes...@googlegroups.com

Reviewers: Zhanyong Wan,

Message:
I uploaded a reduced patch set, as you suggested, that only implements
parsing, managing, and displaying the flags. (I realize that a couple
of aspects of managing and displaying may change depending on your
review here.)

I tried to address your other comments, too, but feel free to ignore
those until those patches are ready.

I honestly didn't think that shuffling the list was that bad (although
the unit tests for shuffling were pretty ugly). To help me understand
better, could you explain your concerns? (Of course, if the event
listener implementation will be replacing List with something randomly
accessible anyway, then it's irrelevant.)

You mentioned that many more tests are needed. I know that I'd
originally suggested adding a Python script gtest_shuffle_test.py that
does the following:
1) Runs --gtest_shuffle --gtest_repeat=3 and verifies non-repeating
seeds.
2) Runs --gtest_shuffle --gtest_random_seed=n and verifies that the
order does in fact change.
3) Runs a test suite containing death tests 10 times or so and verifies
that death tests always occur before non-death tests.
What other tests would you like to see?

http://codereview.appspot.com/52057/diff/1/3
File include/gtest/internal/gtest-internal.h (right):

http://codereview.appspot.com/52057/diff/1/3#newcode761
Line 761: explicit Random(UInt32 state = 1) : state_(state) {}
On 2009/06/16 06:35:11, Zhanyong Wan wrote:
> Google's C++ style guide bans default arguments.

I saw that the guide also bans overloading used to simulate default
arguments, so I wasn't certain what to do.

Is
Random() : state_(1) {}
explicit Random(UInt32 state) : state_(state) {}
okay even though it's overloaded to (sort of) simulate default
arguments?

http://codereview.appspot.com/52057/diff/1/3#newcode763
Line 763: int Generate(int range);
On 2009/06/16 06:35:11, Zhanyong Wan wrote:
> Why aren't the range and return type UInt32?

I saw that the style guide strongly discourages the use of unsigned, and
so it seemed better to treat the unsignedness and bit width as
implementation details and use the default int for the public interface.

http://codereview.appspot.com/52057/diff/1/5
File src/gtest-internal-inl.h (right):

http://codereview.appspot.com/52057/diff/1/5#newcode1373
Line 1373: internal::Random* random_;
On 2009/06/16 06:35:11, Zhanyong Wan wrote:
> Why not use an object (instead of a pointer)?

I think that was left over from an earlier design I'd tried. Thanks for
catching this.

http://codereview.appspot.com/52057/diff/1/6
File src/gtest.cc (right):

http://codereview.appspot.com/52057/diff/1/6#newcode257
Line 257: const UInt32 kM = 0x7fffffffu; // analogous to RAND_MAX
On 2009/06/16 06:35:11, Zhanyong Wan wrote:
> Why isn't this 0xffffffff?

This (as well as kA and kC) came from glibc's implementation of rand().

After rereading the Wikipedia article on LCGs, it looks like other
numbers would give better results, but only if I use a 64-bit temporary.
I assume that's not worth doing.

http://codereview.appspot.com/52057/diff/1/6#newcode264
Line 264: return int(double(state_) / (double(kM) + 1) * range);
On 2009/06/16 06:35:11, Zhanyong Wan wrote:
> Is this any better than state_ % range?

state_ % range introduces a downward bias, since values for state_ in
the range of kM - kM % range through kM, inclusive, result in numbers in
the lower portion of [0, range).

Converting via double also introduces bias, since the number of values
of state_ that map to each value of [0, range) isn't consistent, but the
bias is at least spread throughout.

Converting via double should also help avoid the problem that some LCGs
have where the lower bits have relatively little randomness.

http://codereview.appspot.com/52057/diff/1/6#newcode2677
Line 2677: "Note: Randomizing tests' orders with a seed of %i\n",
On 2009/06/16 06:35:11, Zhanyong Wan wrote:
> This info is better printed in OnUnitTestStart() as we want it to be
easily
> findable.

Printing in OnUnitTestStart() would look nicer, but printing it here
reflects any changes that the user makes in a test environment.

Which do you think is better?

http://codereview.appspot.com/52057/diff/1/6#newcode4377
Line 4377: GTEST_FLAG(random_seed) =
static_cast<Int32>(GetTimeInMillis());
On 2009/06/16 06:35:11, Zhanyong Wan wrote:
> It's best to treat the random_seed flag as read-only by gtest. We
don't want to
> lose the information that the user set it to 0, for example.

I don't follow. If it defaults to 0, then how does treating it as
read-only preserve the information that the user set it to 0? If 0
means "use a default/time-based seed," then what's wrong with
overwriting 0, whether 0 came from the user or the default?

If it is tentatively a goal to let the user manipulate the
shuffling-related flags in global and test case setup and teardown, then
I thought that reusing or overwriting random_seed would be the best way
to do this.

Please review this at http://codereview.appspot.com/52057

Affected files:
M include/gtest/gtest.h
M src/gtest-internal-inl.h
M src/gtest.cc
M test/gtest_unittest.cc

Zhanyong Wan (λx.x x)

unread,

Jun 29, 2009, 5:32:12 PM6/29/09

to jos...@gmail.com, zhanyo...@gmail.com, googletes...@googlegroups.com

On Wed, Jun 24, 2009 at 7:07 PM, <jos...@gmail.com> wrote:
> Reviewers: Zhanyong Wan,
>
> Message:
> I uploaded a reduced patch set, as you suggested, that only implements
> parsing, managing, and displaying the flags. (I realize that a couple
> of aspects of managing and displaying may change depending on your
> review here.)

Thanks!

> I tried to address your other comments, too, but feel free to ignore
> those until those patches are ready.
>
> I honestly didn't think that shuffling the list was that bad (although
> the unit tests for shuffling were pretty ugly). To help me understand
> better, could you explain your concerns? (Of course, if the event
> listener implementation will be replacing List with something randomly
> accessible anyway, then it's irrelevant.)

My concern is that the code is not obviously correct. I believe the
implementation can be much more straightforward if we use a
random-access data structure to hold the tests.

I agree the shuffling logic is not that bad, so the concern is not a
big one. I just felt that it would've been nicer to do this after we
have this random-access data structure implemented.

We haven't determined the priority of the random-access data
structure. I'll give it more thoughts soon.

> You mentioned that many more tests are needed. I know that I'd
> originally suggested adding a Python script gtest_shuffle_test.py that
> does the following:
> 1) Runs --gtest_shuffle --gtest_repeat=3 and verifies non-repeating
> seeds.
> 2) Runs --gtest_shuffle --gtest_random_seed=n and verifies that the
> order does in fact change.
> 3) Runs a test suite containing death tests 10 times or so and verifies
> that death tests always occur before non-death tests.

10 times are probably too much as we want the test suite to be fast.
Once should enough.

> What other tests would you like to see?

- --gtest_shuffle leads to a random seed based on the clock.
- --gtest_shuffle --gtest_random_seed=0 does the same as above.
- shuffling doesn't interfere with --gtest_list_tests.
- shuffling works with sharding.
- shuffling works with --gtest_filter.
- shuffling works with --gtest_also_run_disabled_tests.
- the seed is printed in the output.
- the seed is in the XML report.

Thanks,

--
Zhanyong

Josh Kelley

unread,

Jul 9, 2009, 10:20:03 PM7/9/09

to Zhanyong Wan (λx.x x), zhanyo...@gmail.com, googletes...@googlegroups.com

I uploaded a new version of the patch to
http://codereview.appspot.com/52057. I apologize for the delay in
taking care of this and for the mistakes in my last patch.