Allow Seeding Random Engines with `std::random_device`

117 views
Skip to first unread message

Moritz Klammler

unread,
Jan 2, 2016, 10:11:39 PM1/2/16
to std-pr...@isocpp.org
In order to seed a pseudo random number generator of type `EngineT` that
meets the requirements of *random number engine* (§ 20.5.1.4) with a
non-deterministic seed, most people seem to use code like in the
following snippet.

template <typename EngineT>
// requires RandomNumberEngine(EngineT)
void
seed_non_deterministically(EngineT& engine)
{
std::random_device rnddev {};
engine.seed(rnddev());
}

While apparently simple, this technique has the disadvantage that it can
only put the engine into one of

1ULL << CHAR_BIT * sizeof(std::random_device::result_type)

different states. If the state size of `EngineT` is less than or equal
to 32 bits, this is guaranteed to be fine. However, almost all random
number engines specified by the standard have considerably larger state
sizes. In order to seed such an engine in a way that each of its
possible states is about equally likely, I have resorted to using a
*seed sequence* (§ 26.5.1.2) created from an array of non-deterministic
random numbers. Baum mit Augen has independently discovered this
workaround and initiated a discussion [1] on *Code Review* where all
participants concluded that -- apart from minor tweaks -- there was no
better way to seed the engine. The code might look something like this.

template <typename EngineT, std::size_t StateSize = EngineT::state_size>
// requires RandomNumberEngine(EngineT)
void
seed_non_deterministically(EngineT& engine)
{
using result_type = typename EngineT::result_type;
std::array<result_type, StateSize> noise {};
std::random_device rnddev {};
std::generate(noise.begin(), noise.end(), std::ref(rnddev));
std::seed_seq seedseq(noise.cbegin(), noise.cend());
engine.seed(seedseq);
}

I have just realized that the above code is also flawed as it is not
guaranteed that

sizeof(std::random_device::result_type) >= sizeof(result_type)

In fact, on my implementation, this is not true. This defect can be
fixed but it shows how difficult it is to correctly seed the engine -- a
supposedly simple task.

The use of a seed sequence is also less than desirable in other
respects. It scrambles the numbers from its input sequence beyond
recognition in an attempt to remove any skew but `std::random_device` is
already supposed to produce highly uniform output so at best this
scrambling is wasting CPU cycles. At worst, it *introduces* skew. This
shortcoming could be avoided by implementing a custom seed sequence type
that takes the input sequence as given. A seed sequence also needs to
store its internal state in a dynamically allocated buffer which seems
wasteful and unnecessary.

I believe that the standard library should address this issue and
provide an easy-to-use and difficult-to-use-incorrectly way to seed a
random number engine with a non-deterministic seed such that each of its
possible states is entered with about equal chance.

The seemingly obvious solution of making `std::random_device` meet the
requirements of seed sequence unfortunately doesn't work because a seed
sequence must (1) have a finite state and (2) needs to be able to
serialize that state. This is contrary to the very idea of
`std::random_device`.

However, the only functionality of seed sequence that the `seed`
function of the random number engines *really* need is its `generate`
member function. Implementing this function would totally be in scope
for `std::random_device`.

As a matter of fact, it would also allow for a performance optimization.
Currently, my libstdc++ implementation makes a call to `read` (or even
`fread` on systems where `unistd.h` is not available) for each and every
call to `std::random_device::operator()` [*]. When generating more than
a few integers, this becomes incredibly slow. `generate` -- called with
appropriate iterators -- on the other hand, could be optimized to only
make a single system call that reads directly into the destination
buffer. This could also be useful in other contexts.

So my proposed change is to weaken the requirements for the `seed`
function taking a seed sequence (and the corresponding constructor) in
random number engine to not require a seed sequence but only a type with
the `generate` member function specified in the requirements for seed
sequence. In addition, such member function should be added to the
specification of `std::random_device`. I think this is a small
non-breaking change that would be highly useful in many programs.

I would be happy to hear feedback about the proposed change. I'd also
be interested in opinions about what procedure should I follow to push
this further. Should I write a proposal or file a defect report?


Thanks for reading

Moritz


Footnotes:

[*] It does this when the source of entropy is a file like
`/dev/urandom`. On platforms that support it, libstdc++ will use a
special hardware instruction to generate non-deterministic random
numbers extremely fast.

References:

[1] http://codereview.stackexchange.com/q/109260

Nicol Bolas

unread,
Jan 2, 2016, 11:08:11 PM1/2/16
to ISO C++ Standard - Future Proposals

Random-access iterators are not necessarily contiguous. So you aren't allowed to read into the iterators passed to `generate` like that. Of course, you can always create your own buffer, read into that, and then write each value out to the random-access iterators.

That being said, requiring that these be contiguous iterators (once we get those into the standard) would be entirely legitimate.
 
This could also be useful in other contexts.

So my proposed change is to weaken the requirements for the `seed`
function taking a seed sequence (and the corresponding constructor) in
random number engine to not require a seed sequence but only a type with
the `generate` member function specified in the requirements for seed
sequence.  In addition, such member function should be added to the
specification of `std::random_device`.  I think this is a small
non-breaking change that would be highly useful in many programs.

I would be happy to hear feedback about the proposed change.  I'd also
be interested in opinions about what procedure should I follow to push
this further.  Should I write a proposal or file a defect report?


Thanks for reading

Moritz


Footnotes:

[*] It does this when the source of entropy is a file like
    `/dev/urandom`.  On platforms that support it, libstdc++ will use a
    special hardware instruction to generate non-deterministic random
    numbers extremely fast.

References:

[1] http://codereview.stackexchange.com/q/109260

There may be something you have not considered here.

For all random number engines, the constructor that takes a seed sequence and the `seed` member function that takes such a sequence is required to:

1) Set up the initial state such that it "depends on a sequence produced by one call to `q.generate`" (emphasis added)

2) Have complexity that is the "same as complexity of `q.generate` called on a sequence whose length is size of state".

The quotes are taken from Table 117 in section 26.5.1.4, p4 (in N4140).

Given those complexity guarantees, there's not much a valid implementation of sequence-based initializers can do. They can't call `param`, since that breaks the complexity guarantee. They can't create a non-default sequence. And sequences aren't required to be copy/moveable. So the only valid operations they could do and still maintain their required complexity guarantees are default construction, calling `generate`, and calling `size`.

So I think your requirements are already satisfied in that regard.

Granted, what that will do once concepts come to random number engines, I can't say. But for the time being, there's no reason you couldn't get away with just providing `generate` and a constant `size`.

That's not to say that a specialized tool for doing this would not be unreasonable. I don't much care for having to use seed_seq's memory allocation just to seed an RNG, so a general solution would be good.

I think it would be best to do it with a template function based on the engine:

template<typename engine>
engine random_device_seed
();

The function would be able to use a compile-time determined size for the state of the seed sequence, since it knows exactly how much the `engine` will require. So it won't need to allocate any memory, outside of the stack.

Zhihao Yuan

unread,
Jan 2, 2016, 11:40:38 PM1/2/16
to std-pr...@isocpp.org
On Sat, Jan 2, 2016 at 9:11 PM, Moritz Klammler
<moritz....@gmail.com> wrote:
>
> So my proposed change is to weaken the requirements for the `seed`
> function taking a seed sequence (and the corresponding constructor) in
> random number engine to not require a seed sequence but only a type with
> the `generate` member function specified in the requirements for seed
> sequence. In addition, such member function should be added to the
> specification of `std::random_device`. I think this is a small
> non-breaking change that would be highly useful in many programs.

Same idea came up here:

"C++ Seeding Surprises"
http://www.pcg-random.org/posts/cpp-seeding-surprises.html

, but if you continue read all the articles posted later by
the same author, you may find that tweaking random_device
etc is not a preferable way; we need a guaranteed source
of entropy for seeding, and also better seed sequences.
Purely addition, doing better job.

> I would be happy to hear feedback about the proposed change. I'd also
> be interested in opinions about what procedure should I follow to push
> this further. Should I write a proposal or file a defect report?

That will be a proposal, maybe even multiple ones. You may
want to contact the author of the blog I mentioned above for
co-authoring.

--
Zhihao Yuan, ID lichray
The best way to predict the future is to invent it.
___________________________________________________
4BSD -- http://bit.ly/blog4bsd

Moritz Klammler

unread,
Jan 3, 2016, 9:48:56 PM1/3/16
to std-pr...@isocpp.org
>> As a matter of fact, it would also allow for a performance
>> optimization. Currently, my libstdc++ implementation makes a call to
>> `read` (or even `fread` on systems where `unistd.h` is not available)
>> for each and every call to `std::random_device::operator()` [*].
>> When generating more than a few integers, this becomes incredibly
>> slow. `generate` -- called with appropriate iterators -- on the
>> other hand, could be optimized to only make a single system call that
>> reads directly into the destination buffer.
>
>
> Random-access iterators are not necessarily contiguous. So you aren't
> allowed to read into the iterators passed to `generate` like that. Of
> course, you can always create your own buffer, read into that, and
> then write each value out to the random-access iterators.

My idea was that implementations could specialize the template for the
case where the iterators are raw pointers (or types from the standard
library that are known to wrap raw pointers like
`std::vector::iterator`). Of course, they'd still need a fallback for
when the iterators don't refer to contiguous memory. Anyway, this is
only a potential optimization that should be left to the
implementation. The standard should not mandate anything like this.

> There may be something you have not considered here.
>
> For all random number engines, the constructor that takes a seed
> sequence and the `seed` member function that takes such a sequence is
> required to:
>
> 1) Set up the initial state such that it "depends on a sequence
> produced by *one call* to `q.generate`" (emphasis added)
>
> 2) Have complexity that is the "same as complexity of `q.generate`
> called on a sequence whose length is size of state".
>
> The quotes are taken from Table 117 in section 26.5.1.4, p4 (in N4140
> <http://wg21.link/N4140>).
>
> Given those complexity guarantees, there's not much a valid
> implementation of sequence-based initializers can do. They can't call
> `param`, since that breaks the complexity guarantee. They can't create
> a non-default sequence. And sequences aren't required to be
> copy/moveable. So the only valid operations they could do and still
> maintain their required complexity guarantees are default
> construction, calling `generate`, and calling `size`.
>
> So I think your requirements are already satisfied in that regard.
>
> Granted, what that will do once concepts come to random number
> engines, I can't say. But for the time being, there's no reason you
> couldn't get away with just providing `generate` and a constant
> `size`.

I wasn't aware of those complexity constraints.

I have in fact tried out the following and it "works" with current
libstdc++ (GCC 5.3.0). (I have also verified this by reading the source
code.)

#include <algorithm>
#include <functional>
#include <iostream>
#include <random>

namespace /* anonymous */
{

class nd_seed_seq final
{

private:

std::random_device rnddev_ {};

public:

template <typename OutputIterT>
void
generate(const OutputIterT first, const OutputIterT last)
{
std::generate(first, last, std::ref(this->rnddev_));
}

};

}

int
main()
{
nd_seed_seq ndseed {};
std::mt19937 rndeng {ndseed};
std::cout << rndeng() << std::endl;
}

However, I believe that it formally invokes undefined behavior because
`nd_seed_seq` does not meet the requirements of *seed sequence*. Even
if an implementation might not be allowed to *call* those functions, it
certainly would be allowed to `static_assert` on their existence. After
thinking about your remarks, however, I think that I could get away with
implementing those member functions with the correct signature but
make their bodies simply unconditionally throw an exception or likewise.

Nevertheless, while this might be a feasible workaround for C++14, I
would very much welcome a standard library feature to do this in a
simple way without having to write my own adapters.

Moritz Klammler

unread,
Jan 3, 2016, 9:52:24 PM1/3/16
to std-pr...@isocpp.org
>> So my proposed change is to weaken the requirements for the `seed`
>> function taking a seed sequence (and the corresponding constructor)
>> in random number engine to not require a seed sequence but only a
>> type with the `generate` member function specified in the
>> requirements for seed sequence. In addition, such member function
>> should be added to the specification of `std::random_device`. I
>> think this is a small non-breaking change that would be highly useful
>> in many programs.
>
> Same idea came up here:
>
> "C++ Seeding Surprises"
> http://www.pcg-random.org/posts/cpp-seeding-surprises.html
>
> , but if you continue read all the articles posted later by the same
> author, you may find that tweaking random_device etc is not a
> preferable way; we need a guaranteed source of entropy for seeding,
> and also better seed sequences. Purely addition, doing better job.
>
>> I would be happy to hear feedback about the proposed change. I'd
>> also be interested in opinions about what procedure should I follow
>> to push this further. Should I write a proposal or file a defect
>> report?
>
> That will be a proposal, maybe even multiple ones. You may want to
> contact the author of the blog I mentioned above for co-authoring.

That's a great article, thank you. It's assuring to see that other
people already had the same idea. I have contacted the author and asked
her whether she would be interested in co-authoring a proposal.
Reply all
Reply to author
Forward
0 new messages