random seeds

David Harvey

Oct 8, 2007, 9:31:39 AM

to sage-...@googlegroups.com

When I start up sage, I get different random number seeds every time,
e.g.

$ ./sage
sage: ZZ.random_element()
2

.....

$ ./sage
sage: ZZ.random_element()
-4

The seeding --- at least for this case --- seems to be happening in
random.pxi.

We *really* need a way of specifying a random seed at startup.

david

mabshoff

Oct 8, 2007, 9:35:39 AM

to sage-devel

On Oct 8, 3:31 pm, David Harvey <dmhar...@math.harvard.edu> wrote:

Hello David,

When we fixed all the leaks in the random seed code William wrote some
code to specify the random seed via an environment variable. I grepped
$SAGE_LOCAL/bin and couldn't find anything, so maybe it wasn't merged.

> david

Cheers,

Michael

William Stein

Oct 8, 2007, 10:36:48 AM

to sage-...@googlegroups.com

I removed it because the implementation was way too hack-ish. It was
mainly for experimenting.

I do encourage David to open a trac ticket about this. He's right that
seeding the random number generator should be possible via a command
line argument at startup.

One perhaps reasonable way to do this is by setting an environment variable
in local/bin/sage-sage if a certain command line option is set, then in
ext/random.pxi (and anywhere else), somehow using that environment variable
(if set) to seed the random number generator. The tricky part is one has
to also make sure in all cases that all seeding is done from one place, and
that the random seed is easily available from Sage on startup or in
crash messages.
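
A minimal sketch of that idea in plain Python, not the actual Sage startup
code. The variable name SAGE_RANDOM_SEED and both helper functions are
assumptions made for illustration; the point is only that the seed is chosen
in one place and kept around so it can be reported later.

```python
import os
import random

# Hypothetical environment variable; the thread does not name one.
SEED_ENV_VAR = "SAGE_RANDOM_SEED"


def initial_seed(environ=os.environ):
    """Return the seed from the environment, or a fresh random one."""
    raw = environ.get(SEED_ENV_VAR)
    if raw is not None:
        return int(raw)
    # No seed requested: pick one from the OS entropy source.
    return int.from_bytes(os.urandom(8), "big")


def seed_all(seed):
    """Single place where seeding happens; returns the seed for reporting."""
    random.seed(seed)  # other generators (gmp, NTL, ...) would be seeded here too
    return seed


current_seed = seed_all(initial_seed())
print("random seed:", current_seed)
```

Printing the seed at startup (or in a crash report) is what would make a
failing run reproducible afterwards.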

William

William Stein

Oct 8, 2007, 10:38:06 AM

to sage-...@googlegroups.com

I just looked and this is already
http://trac.sagemath.org/sage_trac/ticket/658

I added my comments to that ticket.

--
William Stein
Associate Professor of Mathematics
University of Washington
http://wstein.org

cwitty

Oct 8, 2007, 12:34:50 PM

to sage-devel

On Oct 8, 7:36 am, "William Stein" <wst...@gmail.com> wrote:
> I do encourage David to open a trac ticket about this. He's right that
> seeding the random number generator should be possible via a command
> line argument at startup.

But which random number generator? libc, Python, libpari, NTL,
libgmp... not counting the many subprocesses that have their own
random number generator.

I have a rough proposal to try to bring some order to this mess.
Randomness in SAGE should be centered in a "randgen" object. There is
a global default randgen, but others can be created. All algorithms
that use random numbers should take an optional randgen parameter, and
use the numbers from there. The numbers from a randgen would be
portable across architectures.

This would have several advantages. Randomized algorithms could be
run repeatably, for testing or debugging. All of our "random"
doctests could be tested, instead of ignored.

I would be willing to work on this, if people think it's a good idea.
What do you think?

Carl

Robert Bradshaw

Oct 8, 2007, 1:03:57 PM

to sage-...@googlegroups.com

On Oct 8, 2007, at 9:34 AM, cwitty wrote:

> On Oct 8, 7:36 am, "William Stein" <wst...@gmail.com> wrote:
>> I do encourage David to open a trac ticket about this. He's right
>> that
>> seeding the random number generator should be possible via a command
>> line argument at startup.
>
> But which random number generator? libc, Python, libpari, NTL,
> libgmp... not counting the many subprocesses that have their own
> random number generator.

I believe gmp, Python, and NTL can be seeded, perhaps others.

> I have a rough proposal to try to bring some order to this mess.
> Randomness in SAGE should be centered in a "randgen" object. There is
> a global default randgen, but others can be created. All algorithms
> that use random numbers should take an optional randgen parameter, and
> use the numbers from there. The numbers from a randgen would be
> portable across architectures.

I'm not sure what impact this would have on performance... both in
terms of our randgen object's efficiency and in passing this
(optional) parameter all over the place (and through anything that
may use a randomized subroutine). Random number sources may be one of
the things that is OK to have a global for. Not that I'm saying it's
a bad idea, but it should be looked at more.

There's also the question of specifying distributions, which perhaps
should be looked into at the same time, and other issues (e.g. what
is a random element of RR[x]?). I think one should always be able to
generate a random element without any parameters (e.g. so a generic
random matrix (not using the term in the theoretical sense) could
just fill itself with random entries).

>
> This would have several advantages. Randomized algorithms could be
> run repeatably, for testing or debugging. All of our "random"
> doctests could be tested, instead of ignored.

Some (many?) of the random doctests are the result of differing
inexact floating point calculations.

>
> I would be willing to work on this, if people think it's a good idea.
> What do you think?

I think it is a great idea whose time is way overdue.

>
> Carl
>
>
>

William Stein

Oct 8, 2007, 1:10:10 PM

to sage-...@googlegroups.com

On 10/8/07, Robert Bradshaw <robe...@math.washington.edu> wrote:
> > This would have several advantages. Randomized algorithms could be
> > run repeatably, for testing or debugging. All of our "random"
> > doctests could be tested, instead of ignored.
>
> Some (many?) of the random doctests are the result of differing
> inexact floating point calculations.

True. However, a few days ago I realized there is a much better
solution to this problem. Use ... in the output. E.g., instead
of something like

sage: sin(1.0) # random low-order bits
0.841470984807897

we do

sage: sin(1.0)
0.8414709848078...

and then all but the last 2 digits are checked, and it is reasonably
clear to the reader of the doctest that the elided digits may vary.
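
For comparison, plain Python's doctest module supports the same trick through
its ELLIPSIS option. The sketch below uses math.sin rather than Sage's sin,
and the function name sin_one is made up for the example; how Sage's own
doctest runner treats "..." is not shown here.

```python
import doctest
import math


def sin_one():
    """
    >>> sin_one()
    0.8414709848078...
    """
    return math.sin(1.0)


# Parse the docstring and run it with ELLIPSIS enabled, so "..." in the
# expected output matches whatever trailing digits are actually printed.
parser = doctest.DocTestParser()
test = parser.get_doctest(sin_one.__doc__, {"sin_one": sin_one}, "sin_one", None, 0)
runner = doctest.DocTestRunner(verbose=False, optionflags=doctest.ELLIPSIS)
results = runner.run(test)
print(results)
```

Only the digits written before the "..." are compared, so the test still
catches a wrong leading part of the result.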

Thoughts?

>
> >
> > I would be willing to work on this, if people think it's a good idea.
> > What do you think?
>
> I think it is a great idea whose time is way overdue.

I agree. However, I strongly encourage people to discuss
this a bit longer in sage-devel before implementing something.
Whatever we do it will likely be easy to implement but hard
to design.

-- William

Justin C. Walker

Oct 8, 2007, 1:32:42 PM

to sage-...@googlegroups.com

On Oct 8, 2007, at 9:34 AM, cwitty wrote:

>
> On Oct 8, 7:36 am, "William Stein" <wst...@gmail.com> wrote:
>> I do encourage David to open a trac ticket about this. He's right
>> that
>> seeding the random number generator should be possible via a command
>> line argument at startup.
>
> But which random number generator? libc, Python, libpari, NTL,
> libgmp... not counting the many subprocesses that have their own
> random number generator.
>
> I have a rough proposal to try to bring some order to this mess.
> Randomness in SAGE should be centered in a "randgen" object. There is
> a global default randgen, but others can be created. All algorithms
> that use random numbers should take an optional randgen parameter, and
> use the numbers from there. The numbers from a randgen would be
> portable across architectures.

On the face of it, I think it's a good idea. It's worth pursuing, at
least until we have an idea of what the impact is likely to be.

+1

On Oct 8, 2007, at 10:10 AM, William Stein wrote:

> True. However, a few days ago I realized there is a much better
> solution to this problem. Use ... in the output. E.g., instead
> of something like
>
> sage: sin(1.0) # random low-order bits
> 0.841470984807897
>
> we do
>
> sage: sin(1.0)
> 0.8414709848078...

This sounds good, but what about "piping" issues (take the print
output of one function and give it as input to another)? Currently,
we get

sage: xxx= 0.8414709848078...
------------------------------------------------------------
File "<ipython console>", line 1
xxx= RealNumber('0.8414709848078E')llipsis
^
<type 'exceptions.SyntaxError'>: invalid syntax

I often copy/paste between windows running different SAGEs, or even
print to files and read back, so this isn't entirely bogus.

Justin

--
Justin C. Walker, Curmudgeon-At-Large
Institute for the Absorption of Federal Funds
--------
If you're not confused,
You're not paying attention
--------

William Stein

Oct 8, 2007, 1:37:12 PM

to sage-...@googlegroups.com

I'm sorry for not being clearer -- you completely misunderstood
my proposal. I am *only* suggesting that the actual doctest
output be changed to have ...'s in cases where the lower order
bits are random. I am *NOT* suggesting changing the actual
output of Sage to have ...'s.

William

Justin C. Walker

Oct 8, 2007, 1:53:59 PM

to sage-...@googlegroups.com

I could have figured that out if I'd actually taken the time to read
your reply, instead of replying, answering the phone, getting the
dog's teeth out of my leg, and crossing things off my Honey-do list.

Never mind...

Justin

--
Justin C. Walker, Curmudgeon-At-Large, Director
Institute for the Enhancement of the Director's Income
--------
The path of least resistance:
it's not just for electricity any more.
--------

cwitty

Oct 8, 2007, 8:33:06 PM

to sage-devel

On Oct 8, 10:10 am, "William Stein" <wst...@gmail.com> wrote:
> I agree. However, I strongly encourage people to discuss
> this a bit longer in sage-devel before implementing something.
> Whatever we do it will likely be easy to implement but hard
> to design.

OK, here's my preliminary proposal for a class randgen, that manages
random number generators and seeds.

Note that the methods of randgen are intended to be used by library
authors (like the authors of ZZ.random_element() and
RR.random_element()), not directly by end-users; end-users may create
randgen objects and pass them around, but would probably never
directly call any methods on them.

randgen is a Cython class. The main state it holds is a gmp_randstate_t,
although it also has some other cached information.

randgen methods include:

python_random()
  Returns an instance of random.Random. The first time it is called
  on a given instance of randgen, a new random.Random is created and
  seeded from the gmp_randstate_t; this is saved, and subsequent calls
  return the same random.Random instance.

set_seed_libc()
set_seed_ntl()
set_seed_pari()
set_seed_magma()
set_seed_mathematica()
set_seed_...()
  Sets the seed of the specified random number generator, from a new
  random number from the gmp_randstate_t.

new_randgen()
  Creates a new randgen object, seeded from a random number from this
  object's gmp_randstate_t.

Also, Cython code can just access the gmp_randstate_t directly.

Constructor:

randgen()
  Create a new randgen, seeded randomly (from os.urandom() if available,
  from the system time otherwise).
randgen(n)
  Create a new randgen, seeded from n.
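
A rough pure-Python sketch of the constructor and python_random() as
described above. It is an illustration only: the proposal is a Cython class
around a gmp_randstate_t, whereas here a random.Random instance stands in for
that state, and the class name RandGen, the seed attribute, and the private
attribute names are all assumptions.

```python
import os
import random
import time


class RandGen:
    """Stand-in for the proposed randgen (assumption: not the real design)."""

    def __init__(self, seed=None):
        if seed is None:
            try:
                # Seed randomly from the OS if it provides an entropy source.
                seed = int.from_bytes(os.urandom(16), "big")
            except NotImplementedError:
                # Otherwise fall back to the system time.
                seed = time.time_ns()
        self.seed = seed
        self._master = random.Random(seed)  # plays the role of gmp_randstate_t
        self._python_random = None

    def python_random(self):
        """Create the random.Random on first use, then return the cached one."""
        if self._python_random is None:
            self._python_random = random.Random(self._master.getrandbits(64))
        return self._python_random


gen = RandGen(7)
print(gen.python_random() is gen.python_random())
```

Two generators built from the same seed hand out identically seeded
random.Random objects, which is what makes runs repeatable.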

One of my design goals is that if algorithm A calls algorithm B, where both
A and B use random numbers, it should be possible to change algorithm B
(for instance, to use a different number of random numbers) without
affecting the random numbers seen by algorithm A. The interface supports
that by having algorithm A call .new_randgen() on its randgen object, and
passing this new randgen to algorithm B. Algorithm A's isolation from
algorithm B is then perfect if B uses only the main gmp_randstate_t or
python_random(); if the algorithms use one of the other random number
generators, then isolation is achieved if both algorithms use
set_seed_...() before every use of the corresponding random number
generator.
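
The isolation property can be illustrated with a small pure-Python stand-in.
This is a sketch under assumptions, not the proposed Cython class:
random.Random replaces the gmp_randstate_t, and the names RandGen,
algorithm_a and algorithm_b are invented for the example.

```python
import random


class RandGen:
    """Minimal stand-in for the proposed randgen."""

    def __init__(self, seed):
        self._state = random.Random(seed)

    def randrange(self, stop):
        return self._state.randrange(stop)

    def new_randgen(self):
        # The parent spends a fixed amount of randomness (one 64-bit draw)
        # on the child, no matter how much the child later consumes.
        return RandGen(self._state.getrandbits(64))


def algorithm_b(rgen, draws):
    return [rgen.randrange(1000) for _ in range(draws)]


def algorithm_a(rgen, b_draws):
    before = rgen.randrange(1000)
    algorithm_b(rgen.new_randgen(), b_draws)  # B gets its own generator
    after = rgen.randrange(1000)
    return before, after


# A sees the same numbers whether B draws 1 value or 50.
same = algorithm_a(RandGen(1), 1) == algorithm_a(RandGen(1), 50)
print(same)
```

If A passed its own generator to B instead of a child, the value of "after"
would depend on how many numbers B happened to draw.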

There is a single global default randgen, named default_rgen.

Every function/method that uses random numbers has an optional argument
rgen, and is declared with rgen=default_rgen.

So ZZ.random_element() would use the gmp_randstate_t inside the
default randgen, but ZZ.random_element(rgen=randgen(3)) would create a
new randgen and use the gmp_randstate_t inside it. (So it would return
the same number every time.)

To make a sequence of doctests repeatable, any of the following would
work:

sage: sage.misc.random.default_rgen = randgen(1)
sage: ZZ.random_element()
sage: RR.random_element()

or

sage: rgen = randgen(1)
sage: ZZ.random_element(rgen=rgen)
sage: RR.random_element(rgen=rgen)

or

sage: ZZ.random_element(rgen=randgen(1))
sage: RR.random_element(rgen=randgen(1))

(The first two options would print the same random numbers as each
other. In the third option, the second line would print a different
random number.)
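
A plain-Python sketch of the three options, with invented stand-ins
(RandGen, random_integer, random_real) in place of randgen,
ZZ.random_element and RR.random_element. One caveat it illustrates: in
ordinary Python a default written as rgen=default_rgen is bound when the
function is defined, so reassigning the global later would have no effect.
The sketch therefore uses rgen=None and looks up the global at call time,
which is an assumption about how the first option could be made to work.

```python
import random


class RandGen:
    """Minimal stand-in for the proposed randgen."""

    def __init__(self, seed=None):
        self._state = random.Random(seed)

    def integer(self):
        return self._state.randrange(-10, 11)

    def real(self):
        return self._state.uniform(-1.0, 1.0)


default_rgen = RandGen()


def random_integer(rgen=None):
    # Resolve the global default at call time, not at definition time.
    return (rgen if rgen is not None else default_rgen).integer()


def random_real(rgen=None):
    return (rgen if rgen is not None else default_rgen).real()


# Option 1: replace the global default.
default_rgen = RandGen(1)
opt1 = (random_integer(), random_real())

# Option 2: pass one explicit generator to both calls.
rgen = RandGen(1)
opt2 = (random_integer(rgen=rgen), random_real(rgen=rgen))

# Option 3: a fresh generator per call; the second call restarts the stream.
opt3 = (random_integer(rgen=RandGen(1)), random_real(rgen=RandGen(1)))

print(opt1 == opt2, opt3[0] == opt1[0])
```

Options 1 and 2 agree because both calls share one stream; in option 3 only
the first value matches, since the second call starts again from the seed.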

The names "rgen" and "randgen" are carefully chosen not to include the
string "random", to avoid triggering the doctest feature "ignore
doctests that include the word random". But if people like this
approach, which allows "random" doctests to still give identical
results across runs and across machines, then maybe that doctest
feature should be disabled.

David Harvey

Oct 8, 2007, 8:47:13 PM

to sage-...@googlegroups.com

Hi Carl,

I haven't yet thought hard about the details of what you propose, but
I'm just curious why you are suggesting to use gmp_randstate_t as the
"most basic type".

One property I would like the system to have is: the most basic
random number generator should be insanely fast, even if the quality
isn't super-high; and then people should be able to select higher-
quality random number generators (at the expense of speed) if they like.

david

cwitty

Oct 8, 2007, 9:30:35 PM

to sage-devel

On Oct 8, 5:47 pm, David Harvey <dmhar...@math.harvard.edu> wrote:
> Hi Carl,
>
> I haven't yet thought hard about the details of what you propose, but
> I'm just curious why you are suggesting to use gmp_randstate_t as the
> "most basic type".

Well, it's not a very good reason. It's because I said "Cython code
can just access the gmp_randstate_t directly." So I always want the
gmp_randstate_t to be initialized (otherwise, Cython code has to call
some "ensure the gmp_randstate_t is initialized" method, and will
sometimes crash if it forgets). If gmp_randstate_t is the "most basic
type", then creating a new randgen only requires initializing one new
random number generator; if something else was basic (presumably
Python, since it's the only other one in the list that allows creation
of multiple simultaneous random number generators) then every randgen
creation would require initializing two random number generators.

> One property I would like the system to have is: the most basic
> random number generator should be insanely fast, even if the quality
> isn't super-high; and then people should be able to select higher-
> quality random number generators (at the expense of speed) if they like.

My proposal is not intended to allow for plugging different random
number algorithms. My thought was that some code (code that wants
random mpz_t's or mpfr_t's, for instance) would always use the
gmp_randstate_t (which ends up being the Mersenne Twister). Other
code (code that wants to use the methods of random.Random) would use
python_random() (which, again, ends up being the Mersenne Twister).
Code that wants to create random NTL objects would presumably call
set_seed_ntl() and then use the NTL calls.

I believe that actually allowing for pluggable random number
algorithms would be a lot more work. We would need to define some
generic interface to get random bits, and rewrite all the SAGE code
that uses random numbers to call that interface. This is not an idea
that particularly interests me, so I probably wouldn't work on it.

Carl
