In order to seed a pseudo random number generator of type `EngineT` that
meets the requirements of *random number engine* (§ 20.5.1.4) with a
non-deterministic seed, most people seem to use code like in the
following snippet.
template <typename EngineT>
// requires RandomNumberEngine(EngineT)
void
seed_non_deterministically(EngineT& engine)
{
std::random_device rnddev {};
engine.seed(rnddev());
}
While apparently simple, this technique has the disadvantage that it can
only put the engine into one of
1ULL << CHAR_BIT * sizeof(std::random_device::result_type)
different states. If the state size of `EngineT` is less than or equal
to 32 bits, this is guaranteed to be fine. However, almost all random
number engines specified by the standard have considerably larger state
sizes. In order to seed such an engine in a way that each of its
possible states is about equally likely, I have resorted to using a
*seed sequence* (§ 26.5.1.2) created from an array of non-deterministic
random numbers. Baum mit Augen has independently discovered this
workaround and initiated a discussion [1] on *Code Review* where all
participants concluded that -- apart from minor tweaks -- there was no
better way to seed the engine. The code might look something like this.
template <typename EngineT, std::size_t StateSize = EngineT::state_size>
// requires RandomNumberEngine(EngineT)
void
seed_non_deterministically(EngineT& engine)
{
using result_type = typename EngineT::result_type;
std::array<result_type, StateSize> noise {};
std::random_device rnddev {};
std::generate(noise.begin(), noise.end(), std::ref(rnddev));
std::seed_seq seedseq(noise.cbegin(), noise.cend());
engine.seed(seedseq);
}
I have just realized that the above code is also flawed as it is not
guaranteed that
sizeof(std::random_device::result_type) >= sizeof(result_type)
In fact, on my implementation, this is not true. This defect can be
fixed but it shows how difficult it is to correctly seed the engine -- a
supposedly simple task.
The use of a seed sequence is also less than desirable in other
respects. It scrambles the numbers from its input sequence beyond
recognition in an attempt to remove any skew but `std::random_device` is
already supposed to produce highly uniform output so at best this
scrambling is wasting CPU cycles. At worst, it *introduces* skew. This
shortcoming could be avoided by implementing a custom seed sequence type
that takes the input sequence as given. A seed sequence also needs to
store its internal state in a dynamically allocated buffer which seems
wasteful and unnecessary.
I believe that the standard library should address this issue and
provide an easy-to-use and difficult-to-use-incorrectly way to seed a
random number engine with a non-deterministic seed such that each of its
possible states is entered with about equal chance.
The seemingly obvious solution of making `std::random_device` meet the
requirements of seed sequence unfortunately doesn't work because a seed
sequence must (1) have a finite state and (2) needs to be able to
serialize that state. This is contrary to the very idea of
`std::random_device`.
However, the only functionality of seed sequence that the `seed`
function of the random number engines *really* need is its `generate`
member function. Implementing this function would totally be in scope
for `std::random_device`.
As a matter of fact, it would also allow for a performance optimization.
Currently, my libstdc++ implementation makes a call to `read` (or even
`fread` on systems where `unistd.h` is not available) for each and every
call to `std::random_device::operator()` [*]. When generating more than
a few integers, this becomes incredibly slow. `generate` -- called with
appropriate iterators -- on the other hand, could be optimized to only
make a single system call that reads directly into the destination
buffer. This could also be useful in other contexts.
So my proposed change is to weaken the requirements for the `seed`
function taking a seed sequence (and the corresponding constructor) in
random number engine to not require a seed sequence but only a type with
the `generate` member function specified in the requirements for seed
sequence. In addition, such member function should be added to the
specification of `std::random_device`. I think this is a small
non-breaking change that would be highly useful in many programs.
I would be happy to hear feedback about the proposed change. I'd also
be interested in opinions about what procedure should I follow to push
this further. Should I write a proposal or file a defect report?
Thanks for reading
Moritz
Footnotes:
[*] It does this when the source of entropy is a file like
`/dev/urandom`. On platforms that support it, libstdc++ will use a
special hardware instruction to generate non-deterministic random
numbers extremely fast.
References:
[1]
http://codereview.stackexchange.com/q/109260