First, a little history: the provision of random numbers in
programming language standard libraries is typified by the rand() and
srand() functions present in C. rand() returns an unsigned integer from
0 to RAND_MAX and srand sets the global seed to a provided unsigned
integer. Traditionally these were implemented by a simple generator with
a single integer of internal state; an attacker who obtains a few (maybe
as few as one!) outputs from rand() can generally figure out the entire
state and, thus, predict the entire sequence. Users who don't want the
same sequence on every run of the program are encouraged to seed the RNG
from the system clock, as the initial state of the seed isn't
particularly specified, and users who do want a repeatable
sequence are able to specify a known seed - but as the algorithm used
isn't specified, the scope of that repeatability isn't very clearly
defined; repeated runs of the same compiled binary can probably be
assumed to produce the same results (although dynamic linking to libc
might even put that in jeopardy) but the sequences produced by different
C compilers or on different operating systems or architectures may vary.
Uses for random numbers are varied, and can include:
- Things where an attacker guessing future outputs from past outputs, or
being able to influence future outputs in some way, would be a disaster -
eg, picking short-term session keys, choosing hard-to-guess unique IDs,
or making long-term cryptographic keys.
- Statistical simulations, where you'll usually want random numbers with
good statistical properties (known distributions), and may or may not
need the ability to repeat a "run" with the same sequence.
- Testing applications such as property-based testing or fuzzing where
you want to apply a wide variety of inputs to a component under test,
and check that its output/behaviour are valid in all cases.
Repeatability may be desirable when a test run finds a problem, so that
a proposed fix for that problem can then be tested with exactly the same
sequence of inputs to confirm whether it worked or not, so it's not
uncommon to either always run with the same seed or to pick a random
seed and log it so that a run can be "re-run" by providing the logged seed.
- Computer games, where unpredictability is needed to model the game
world. Reproducibility might be desired if the random source is used
to drive a procedurally generated world that needs to look the same
every time the player returns to it, but also, unpredictability may be
desired when modelling the reactions of non-player characters and other
such events that are meant to surprise the player; multiplayer online
games have sometimes suffered from players being able to manipulate RNGs
to their benefit (eg, Minecraft - not just online gambling!). So
computer games may need either secure or repeatable random numbers,
depending on context!
- Toy programs, that demonstrate a principle but aren't necessarily
meant for "real-world" use, such as filling an array with random numbers
to demonstrate a sorting algorithm, making simple "guess the number!
Higher or lower!" games as a programming exercise, etc. I don't want to
underestimate this category - educational examples are important - but
it is a niche.
rand()/srand() provides a rather poor service to users - the sequences
are quite likely to be predictable from previous outputs, and the effect
of re-using a seed to repeat the sequence is also poorly specified.
Also, the fact that rand() returns a number from 0 to RAND_MAX has led
to the convention of using "rand() % max" to obtain an integer in the
range from 0..max-1, which won't produce a uniform distribution unless
max evenly divides RAND_MAX+1, the number of possible outputs (imagine
RAND_MAX is 2, so rand() can return 0, 1, or 2, and max is 2, so we want
0 or 1; the result will be 0 if rand() returns 0 or 2, and 1 if rand()
returns 1, so 0 is twice as likely as 1). However, this kind of
interface is
traditional and it's become what people expect from a default random
number source in a standard library - but I think this is unfortunate
and we should go back to the actual needs of random number users rather
than copying this pattern. I'm suspicious of a poorly specified "default
random number generator" - whose needs is it adequate for? Just the
writers of toy programs in education? But their needs are just as well
served by any specific RNG they could pick out of a hat.
The search for unpredictable "secure" random numbers is an interesting
one, because it presents a technical challenge in that computers are
largely deterministic devices. You can get dedicated hardware that
digitises quantum noise sources, but such things add cost and
potentially add vulnerabilities: such hardware is often hard to inspect
and makes an attractive target for attackers who can plant somebody
inside the chip design house or even fabrication plant. Another popular
source of nondeterminism is external events such as the contents or
arrival times of network packets, and detectable timing differences in
the behaviour of less-deterministic (eg, temperature-sensitive) hardware
in the system. Gaining reliable access to all this low-level hardware
timing information tends to be difficult outside of the kernel, so it
has become traditional for operating systems to offer a kernel-based
randomness source; this is usually a secure (ie, hard to predict) PRNG
whose internal state (known as the "pool") is constantly being "fed"
with nondeterministic information from hardware devices as it becomes
available. Outputs from this PRNG are made available via syscalls or
special devices like /dev/random.
Earlier implementations tried to track "how random" the pool was at any
point in time, increasing an entropy estimate as randomness was fed in
from hardware and reducing it as random bytes were pulled out by users.
/dev/random would block on the available entropy, only emitting a byte
every time the entropy counter exceeded eight bits (and reducing it by
eight each time), while /dev/urandom wouldn't block (but would still
reduce the counter). However, over time it was decided that this was a
meaningless distinction in practice, and possibly leaked information,
and opened an avenue for denial of service by using up all the entropy
so that things like TLS connection setups would then stall; so these
days /dev/random and /dev/urandom are either identical, or perhaps
/dev/random blocks only until the kernel entropy pool has first been
filled after boot; the manpages on my Linux system seem to record a
history of thinking on this changing over time, and the promises made by
random(7) and random(4) seem mildly contradictory if I read them correctly!
Which leads us onto the weaknesses of kernel entropy pools - any modern
OS will provide one, so they're largely ubiquitous, but they can't
always be relied upon to be safe. A freshly booted system will have the
pool initialised to some initial state. Modern Linux will, during shutdown,
save a poolful of randomness into a file that is read into the pool in
early boot to conserve entropy across reboots, but LiveCDs or virtual
machines whose disk state is reset will not have this luxury, and will
be reliant on entropy from some external source to provide
non-predictable randomness. Virtual machines (and a lot of software is
run in VMs now, eg in cloud providers) will often have limited access to
hardware-level timing randomness, and limited access to accurate clocks
to measure it with, so even kernel random pools may be predictable (or
manipulable) in some contexts. However, they're often the best we've
got, and as application developers, we have little control over (or
ability to detect) the circumstances under which they're not to be
trusted, so we tend to have to just use them and hope that whoever hosts
the application has made the correct security decisions
in doing so.
Access to in-kernel entropy pools can be slow, as even without the issue
of blocking on available entropy, it's still a syscall and can involve
gaining exclusive locks inside the kernel to access the entropy pool, a
limitation on highly multicore systems. As such, applications may wish
to operate their own entropy pools, perhaps even dedicated to each
thread to avoid locking and fighting over cache lines, initialising them
from the kernel randomness source at startup and periodically injecting
some entropy from the kernel. Between such kernel entropy injections,
this pool operates as a pseudorandom number generator - whose output is
repeatable if it's reset to a known seed! But it remains safe to use,
because it's only ever seeded from secure random sources, and the
algorithms chosen are deemed not to allow an attacker with access to
past outputs to predict future outputs - even an attacker who can
request outputs at will (imagine an algorithm with a small weakness such
that the next output has a known bias if the previous output falls into
a certain range; such an attacker could request outputs until they see
one in that range, then sit back and wait for the system they're
attacking to generate a biased output).
As such, there's a number of categories a random source might have, from
the perspective of a userland process:
1) Is it a fully deterministic algorithm, starting from a default or
provided seed?
2) Is it a hopefully-secure external entropy source, eg a kernel entropy
pool or a dedicated hardware randomness-source device accessed from
userland?
3) Is it a hybrid of the two - a fully deterministic type (1) algorithm,
but seeded from an external type (2) entropy source?
And then a separate property which is orthogonal to type (1) but implied
by types (2) and (3):
4) Is its output unpredictable, even by an adversarial user who can read
past outputs and stimulate the generation of outputs on demand?
A type (1) and type (4) algorithm is often known as a "CPRNG"
(cryptographic PRNG). If you just say "PRNG" you probably mean a type
(1) that isn't type (4).
Ideally, cryptographic applications need a type (4) source that's also
type (2) or (3). One might say that type (2) is better than type (3) as
the hybrid approach introduces the risk of vulnerabilities in the
underlying type (1) algorithm, but the in-kernel randomness source of
(2) is probably actually a type (3) system inside the kernel. A type (1)
system can still be suitable for cryptographic applications as long as
it is still type (4) and the attacker can't find the seed - a system
with no hardware randomness sources at all could still be provided an
initial random number by a user, or something like its own serial number
(as long as they're not predictable!), and generate fully deterministic
random numbers that an attacker can't exploit *as long as
property (4) holds reliably*.
Anyway, enough history and background. I have a few open questions for
the group to consider:
1) The source of random bits is distinct from converting those random
bits into useful things from the basic "evenly distributed integers in a
specified range" to exotic things like interesting distributions or
points on a sphere, and the algorithms for doing so aren't necessarily
trivial to get right, so I think it's uncontroversial that there need to
be low-level libraries for PRNGs and kernel or hardware entropy sources,
and high-level libraries to turn their outputs into useful values, with
perhaps only applications that literally require raw random bytes (for
UUIDs or cryptographic keys and so on) not needing any higher-level
processing. Is "I would like a random bytevector of a specified length"
the correct interface between the high and low level, so an RNG source
might be characterised as a function that accepts a natural number and
returns a bytevector of that length? (I suspect so).
1a) I presume we would want to make it easy for users to construct
compliant random sources of their own devising, eg their own RNG
algorithm, and use the same higher-level libraries with it; and also to
construct higher-level distribution libraries that can run from any
compliant random source.
1b) Converting raw bytes into useful distributions, even in such simple
cases as "integers or reals in a range", is surprisingly hard (see Zhu's
recent email) so users shouldn't be encouraged to try and roll their own
- we definitely need libraries. I think providing a rand()-style "Return
a random fixnum in some range" is too low-level and dangerous for users
- they should either get raw random bytes or use a library that provides
problem-domain random data. Which can include things like "pick N random
elements from this list/vector with/without replacement" and "randomly
permute this vector/list" as well as the obvious return-a-number procedures.
2) Applications that want secure randomness shouldn't be accidentally
given an insecure RNG, as that presents a possibly serious security
failure. Putting aside the issue of whether we can truly know that we
have secure randomness, given the system entropy pool initialisation
problems above (we can't do much about those), it seems like it would be
a good
idea if an RNG source can be asked if it's repeatable, or secure
(unpredictable, even to an attacker), or some combination of the two
(one can have an RNG that is entirely deterministic but whose outputs
can't be used to predict future outputs because they don't leak useful
information about the state, or an RNG that is insecure and also
unrepeatable because we don't have a way to manually seed it). How
should this information be exposed? Just predicates on an RNG source -
(random-source-repeatable?) (my type (1) above) and
(random-source-adversarially-unpredictable?) (my type (4) above) - and
users attempting to produce cryptographic keys should check they have an
unrepeatable and unpredictable source? As a convenience, should we
provide (assert-random-source-adversarially-unpredictable) and
(assert-random-source-unrepeatable) procedures that call (error) if the
source doesn't meet the requirements, to encourage users to check? Are
these distinctions too fine, or will stating them separately make users
more aware of the subtleties?
3) Going further, are we right to conflate cryptographic and statistical
uses of RNGs at all - should we have totally distinct libraries for the
two, each more highly specialised? The overlap between the two with a
type (3) hybrid approach might lead to some sharing of code *inside* the
implementation but that's not necessarily exposed to the users.
4) For applications that want repeatable randomness in a portable
manner, you really need to specify an algorithm as well as a seed. For
applications that want to do crypto things, you want to obtain a source
that has the appropriate cryptographic safety properties. Should we
still provide a weakly-specified "default random source", and who's it
for? If just toy programs, is there any benefit to leaving it weakly
specified, or do we expect toy program authors to import an xorshift
library and construct a random number source from (make-xorshift64-rng
(get-current-unix-time)) themselves? I'm nervous about the concept of a
"default" RNG and the implication that, as the default, it might be good
for any particular purpose. Perhaps I'd be happier if we provided a
means to obtain an *arbitrary* RNG rather than a *default* one - a user
who asks for "an arbitrary" RNG is making a more explicit statement
that they don't care what RNG they get, and the implementation is free
to provide whatever they feel is a generally good performance/quality
tradeoff.
5) How much do we want to provide in R7RS-large, and where? Access to
kernel or hardware RNGs is clearly an environments issue, but such
facilities are now so widespread that we might be able to assume they're
as available as memory that can be allocated or an ALU that can do
addition, with maybe a caveat that the RNG is reliably there but might
block arbitrarily or raise an error that needs to be handled at run
time. PRNG algorithms can be portably implemented but may benefit from
platform-specific implementations that use assembly language
bit-twiddling to go faster, and how much do we provide out of the box so
users can rely on them being present, versus letting a thousand portable
libraries bloom? Bearing in mind that cryptographic algorithms are a
continually evolving landscape, with once highly prized algorithms
falling into disrepute and being replaced by faster, better, more
trusted ones; but even despite all my pearl-clutching about default
RNGs, educators may curse us if we don't provide a simple reliable
one-line access to low-grade randomish numbers out of the box.
--
Alaric Snell-Pym (M0KTN née M7KIT)
http://www.snell-pym.org.uk/alaric/