Den 2011-09-28 21:30:29 skrev Ryan King <ryan...@gmail.com>:
> I keep seeing my checks fail due to to many tests being discarded. I
> use arbitrary[Long] with a uniqueness filter heavily for my
> checks[1], so I looked into whether it was generating random numbers.
> It isn't doing that well.
>
> I generated 100,000 random longs from this [2] and here's the top 10
> results (output from uniq -c):
>
> 10139 0
> 10111 -4611686018427387904
> 10006 1
> 9982 -1
> 9935 4611686018427387903
> 1 999408985813006357
> 1 999404514076553559
> 1 999243136190674913
> 1 999141327322374406
> 1 998717611447317269
>
> Is there something I can tune to get better random numbers?
Well, arbitrary[Long] does give higher weight to the edge
cases (-1, 0, 1, MIN_LONG/2, MAX_LONG/2). The sum of the
weights for the edge cases is equal to the weight of the
rest of the numbers, and this is completely in line with
your findings. There is no way to tweak the arbitrary generator.
Maybe the weights could be adjusted a bit, so we didn't get
quite as many edge cases generated.
However, if you want implement a generator that produces
unique Longs, it is probably better to start out with the
Gen.choose generator instead of Arbitrary.arbitrary:
Gen.choose(Long.MinValue/2, Long.MaxValue/2).filter(new UniqueLong())
(the reason you have do divide max and min by 2 is that
Gen.choose can't handle an interval that is larger than
Long.MaxValue.)
Would that help you?
Best regards,
/ Rickard
* There are lots of useful distributions for a given type. For
example, for number types like Long we might want:
- A uniform distribution across all long values, possibly with edge
cases boosted as Scalacheck does by default.
- What I call a "log uniform" distribution (not sure if this has
another name) - where a number between 1 - 10 is as likely as one
between 10-100, and as likely as one between 100-1000. Ie, the
distribution becomes sparser at larger values. This is what I use as
my default.
- A bounded distribution, such as positive numbers only, or 0-1 for
fractional numbers.
So I wondered whether I could write properties dependent upon values
(of type T) sourced from Arbritrary[T], and still be free to vary the
underlying generator/distribution. This can be done, with a little
care, by
- Keep definitions of generators separate from the definition of
Arbitrary implicit instances for them.
- Import the generator you wish to use into scope, overriding the
default (global) Scalacheck Arbitrary instance.
* I think quite alot about the idea of generator /composition/, ie,
composing complex objects automatically. For example, a random
triangle can be constructed from 3 random points. And a random point
can be constructed (in 2D) from 2 random numbers.
- When we do this kind of composition, its likely that the underlying
component generators get filtered, transformed and merged in
unpredictable ways. So its not safe to assume that a all of a
generator's values get used, or that they're used in the same order as
they are generated. For example, a integer generator that always emits
(0, 1, -1, +Max/2, -Max/2, <random>, <random> ...) is going to
generate a very poor and predictable first triangle if its leading 6
values are used to build 3 triangle coordinates.
My rule of thumb: generators should always be stochastic and be
capable of revisiting every possible emitted value.
* I think there is a fairly generic parameter to generators: "the
amount that boundary and special values frequency is boosted". For
example, Rickard said below that boundary values are boosted to 50% of
all Long values. When composing generators, it may not make sense to
use such a rich mix of boundary values in component generators. For
example, building triangles out of points: we probably dont want half
of all triangle coordinates to lie on boundaries. Triangles have
different special cases anyway (eg very thin and pointed, right
angle). It would be nice if Scalacheck allows the passing some some
optional parameter to generators that could specify how big the "boost
factor" for special case data items was.
-Ben
> --
> You received this message because you are subscribed to the Google Groups
> "scalacheck" group.
> To post to this group, send email to scala...@googlegroups.com.
> To unsubscribe from this group, send email to
> scalacheck+...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/scalacheck?hl=en.
>
>
When I first read this, I thought to myself "Surely a Gen[Double] for
the range [-1, 1], which boosted edge cases to eg 10% frequency, would
be simpler, and yield corner cases acceptably often?" But on further
reflection, I realised that the chance of this generating a square
with all 4 corners at the extremes is then 10E-8!
With edge cases 50% likely its P(1/256), which is acceptably often IMO.
Thats another use-case for giving the user a dial where they can turn
edge-case-frequency up and down.
-Ben