long x = 362436069;
long crry = 1234567;
long mwc32() {
_asm {
mov eax,x
mov edx,2083801278
mul edx ; a*x in edx:eax
mov ecx,crry ; get crry
add eax,ecx ; ax+c in eax
adc edx,0 ; if carry, increment edx
mov crry,edx ; store new crry
mov x,eax ; store new x
}
return x;
}
In the non-debug version, the compiler only added one assembly language
instruction for the "return x".
Here are the first five numbers generated. Are they correct?
-817168771
958084061
37572461
-240094778
105007235
Pete
Put the following code in a ".c" file (not ".cpp"). Using a "C" file
keeps the compiler from generating a "mangled" name that encodes the
function's return type. The C code does not have an explicit
"return x" instruction, which saves one assembly language "mov". When
compiled without debug information, the code generated is exactly
the assembly language instructions that are wanted.
In the files that call this routine, don't declare it as "void", but as
extern "C" long MWC32();
^^^^^^^^^^^^^^^^^ MWC32.c ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
static long x = 362436069;
static long crry = 1234567;
void MWC32() {
_asm {
mov eax,x
mov edx,2083801278
mul edx ; a*x in edx:eax
mov ecx,crry ; get crry
add eax,ecx ; ax+c in eax
adc edx,0 ; if carry, increment edx
mov crry,edx ; store new crry
mov x,eax ; store new x
}
// return x; Implicit. The value of x is already in the return register.
}
The first five numbers generated are
-817168771
958084061
37572461
-240094778
105007235
I trust those are correct.
Pete
> This newsgroup has frequent queries about good random mumber generators.
> A particularly promising new method is multiply-with-carry, since it
> produces 32-bit random integers that seem to pass all tests of randomness
> (such as those in the Diehard battery of my random number CDROM), and is
> exceptionally fast because it exploits the way modern CPU's get a 64-bit
> product from two 32-bit integers. But it must do that via machine
> language. I offer a version below, for those who may be able to use
> it with Fortran on their Intel machines, or as an outline for those who
> may be able to provide versions for C or other compilers or other platforms.
It turns out that this algorithm can be coded in Java, since the
"long" type is 64 bits on all platforms. One implementation is shown
at the end of this message. These are the first five numbers, using
the same values for multiplier, x, and carry as in Marsaglia's post.
-817168771
958084061
37572461
-240094778
105007235
The Java code I wrote takes about 5.5 microseconds per number on a
70 MHz Sparc 5 workstation, running the kaffe just-in-time compiler
instead of Sun's java. This time includes the function call. Note
that this is more than an order of magnitude greater than the result
quoted for the Pentium 120 (namely 130 nanoseconds).
In a way, a portable implementation of this algorithm isn't too
interesting, since if you are really interested in speed you will
code in native assembly language.
BTW Sun's default Java RNG is the same as lrand48 in the Solaris
C library. This is a 48-bit linear congruential generator.
Many thanks to George Marsaglia.
--Robert Dodier
PS. Here's the code.
/** Marsaglia's multiply-with-carry random number generator.
* Robert Dodier, dod...@colorado.edu.
*/
public class MultiplywCarryRNG
{
static long a = 2083801278, x = 362436069, c = 1234567;
static int next32()
{
long axc = a*x + c;
x = axc & 0xFFFFFFFFL;
c = (axc >>> 32) & 0xFFFFFFFFL;
return (int) x;
}
}
--
``That's one of the things I like about being a Jain. After a few
billion cosmic cycles, the old jokes become new again.'' -- Gaurav Shah
Anyway, I'll just include the files below:
--- rnd1.s
.file "rnd1.s"
.data
.align 4
.type c,@object
.size c,4
c:
.long 1234567
.type n1,@object
.size n1,4
n1:
.long 362436069
.text
.align 4
.globl rndnseed
.type rndnseed,@function
rndnseed:
/ void rndnseed( unsigned long c, unsigned long n1 );
movl 4(%esp),%eax
movl %eax,c
movl 8(%esp),%eax
movl %eax,n1
ret
.Lfe1:
.size rndnseed,.Lfe1-rndnseed
.align 4
.globl rndn
.type rndn,@function
rndn:
/ unsigned long rndn( void );
/ George Marsaglia:
/ The multiply-with-carry generator for 32-bit integers:
/ x(n)=a*x(n-1) + carry mod 2^32
/ Choose multiplier a from this list:
/ 1791398085 1929682203 1683268614 1965537969 1675393560
/ 1967773755 1517746329 1447497129 1655692410 1606218150
/ 2051013963 1075433238 1557985959 1781943330 1893513180
/ 1631296680 2131995753 2083801278 1873196400 1554115554
/ ( or any 'a' for which both a*2^32-1 and a*2^31-1 are prime)
/ This version, as it stands, uses a=2083801278 as the multiplier.
/ Replace 2083801278 with your choice of 'a' in the line:
/ movl $2083801278,%eax
/ Forms a*x+c as 64 bits in the two 32-bit registers edx:eax
/ The new x is in eax, the new carry c is in edx.
/ The period is a*2^31-1 > 2^60.
/ Takes 130 nanoseconds on a Pentium 120
/ Theory is in the file mwc1.ps of the Marsaglia Random Number CDROM.
/ Compute a*n1 + c and store it in %edx:%eax
movl $2083801278,%eax
mull n1
addl c,%eax
adcl $0,%edx
/ New random number is in %eax, which we also save in n1
/ New carry (c) is in %edx
movl %eax,n1
movl %edx,c
ret
.Lfe2:
.size rndn,.Lfe2-rndn
--- rnd2.s
.file "rnd2.s"
.data
.align 4
.type c,@object
.size c,4
c:
.long 3471936680
.type n1,@object
.size n1,4
n1:
.long 883145452
.align 4
.type n2,@object
.size n2,4
n2:
.long 3102236716
.align 4
.type n3,@object
.size n3,4
n3:
.long 1053179685
.align 4
.type n4,@object
.size n4,4
n4:
.long 875202402
.text
.align 4
.globl rndmseed
.type rndmseed,@function
rndmseed:
/ void rndmseed( unsigned long c, unsigned long n1, unsigned long n2,
/ unsigned long n3, unsigned long n4 );
/ Called with rndmseed( c, n1, n2, n3, n4 );
movl 4(%esp),%eax
movl %eax,c
movl 8(%esp),%eax
movl %eax,n1
movl 12(%esp),%eax
movl %eax,n2
movl 16(%esp),%eax
movl %eax,n3
movl 20(%esp),%eax
movl %eax,n4
ret
.Lfe1:
.size rndmseed,.Lfe1-rndmseed
.align 4
.globl rndm
.type rndm,@function
rndm:
/ unsigned long rndm( void );
pushl %ebx
/ Next 4 sections compute a4*n4+a3*n3+a2*n2+a1*n1+c
/ where a4,a3,a2,a1 are the constants given by Marsaglia,
/ viz. 2111111111, 1492, 1776, 5115, respectively
/ Note that we keep running total in %ecx:%ebx
movl $2111111111,%eax
mull n4
movl %eax,%ebx
movl %edx,%ecx
movl $1492,%eax
movl n3,%edx
movl %edx,n4
mull %edx
addl %eax,%ebx
adcl %edx,%ecx
movl $1776,%eax
movl n2,%edx
movl %edx,n3
mull %edx
addl %eax,%ebx
adcl %edx,%ecx
movl $5115,%eax
movl n1,%edx
movl %edx,n2
mull %edx
addl %eax,%ebx
adcl %edx,%ecx
addl c,%ebx
adcl $0,%ecx
/ New random number is in %ebx, which we also save in n1
/ New carry (c) is in %ecx
movl %ebx,%eax
movl %eax,n1
movl %ecx,c
popl %ebx
ret
.Lfe2:
.size rndm,.Lfe2-rndm
--- rndtest.c
/*
To compile:
as -o rnd1.o rnd1.s
as -o rnd2.o rnd2.s
gcc -o rndtest rndtest.c rnd1.o rnd2.o
*/
#include <stdio.h>
typedef unsigned long ul;
void rndmseed( ul c, ul n1, ul n2, ul n3, ul n4 );
ul rndm( void );
void rndnseed( ul c, ul n1 );
ul rndn( void );
/*
#define RND rndn
Using default seed values:
3477798525
958084061
37572461
4054872518
105007235
*/
#define RND rndm
/*
Using default seed values:
451831326
2915878364
2924531694
4286387848
3190156635
*/
int main()
{
ul i, c, n1, n2, n3, n4;
/*
scanf( "%lu %lu", &c, &n1 );
rndnseed( c, n1 );
*/
/*
scanf( "%lu %lu %lu %lu %lu", &c, &n1, &n2, &n3, &n4 );
rndmseed( c, n1, n2, n3, n4 );
*/
for( i = 0; i < 5; i++ )
printf( "%lu\n", RND() );
exit(0);
}
I don't know if the _int64 type is implemented in VC++ 4.0, but this works
in version 5.0:
long x = 362436069;
long crry = 1234567;
long mwc32() {
_int64 axc = (_int64)x * 2083801278 + crry;
crry = axc>>32;
x = (long)axc;
return x;
}
Al Klassen
I saw other people AND-ing crry and the return value with 0xFFFFFFFF
Should you do that and what effect does it have?
Also, does using the _int64 type (or, say, GNU gcc's "long long")
provide the same level of efficiency as the assembly code posted by
others?
Yunzhi Zhu
You probably should use unsigned ints.
I have by now seen dozens of posts about this stuff, and I have not
seen one post among them that would state that the "very good" aspect
of the generator had been tested at all. Has anybody run a spectral
test on it? With what dimensions? What were the outcomes?
After all, the above is just a simple linear congruential generator
modulo 2^32-1. What is is the quality of the multiplier 2083801278?
There are quite tolerable generators of that kind, but is this one?
Was the multiplier, perhaps, taken from "The Art of Computer
Programming, Vol. 2, Seminumerical Algorithms" by Knuth?
--
David Kastrup Phone: +49-234-700-5570
Email: d...@neuroinformatik.ruhr-uni-bochum.de Fax: +49-234-709-4209
Institut für Neuroinformatik, Universitätsstr. 150, 44780 Bochum, Germany
Well Ok, gcc provides "unsigned long long" -- a 64 bits data type
(required for this generator to work).
But the original poster implied that in order to get a 64-bit product
from two 32-bit integers efficiently, one needs to resort to low level
machine instructions. I don't know assembly, but I know C, so I'd like
to know if using gcc native 64-bit integral types is equally
efficient, and if yes, whether I need to AND values of "crry" and "x"
with 0xFFFFFFFF.
> I have by now seen dozens of posts about this stuff, and I have not
> seen one post among them that would state that the "very good"
> aspect of the generator had been tested at all. Has anybody run a
> spectral test on it? With what dimensions? What were the outcomes?
I believe that George Marsaglia's word on it could be trusted :)
Yunzhi Zhu
In this case it doesn't matter because the value will never exceed
63 bits therefore the number will never overflow into the sign bit.
>
>I have by now seen dozens of posts about this stuff, and I have not
>seen one post among them that would state that the "very good" aspect
>of the generator had been tested at all. Has anybody run a spectral
>test on it? With what dimensions? What were the outcomes?
I believe there is a paper that was submitted by Raymond Couture and
Pierre L'Ecuyer to Mathematics of Computation. I am unsure if it has
been published yet. The paper is called 'Distribution Poperties of
Multiply-With-Carry Random Number Generators'. It was available on the
net but if it has been published it has probably been removed.
They studied the spectral properties of the Recursion-With-Carry
extension. The generator in question has a long period and passes
all of the DIEHARD tests. The recursion-with-carry extension provides
an exponentially longer period at a cost of some speed.
>
>After all, the above is just a simple linear congruential generator
>modulo 2^32-1. What is is the quality of the multiplier 2083801278?
>
Incorrect it is a Multiply with carry generator with modulo 2^32.
The carry portion changes with each iteration. The form is similar.
The key point is that the period is longer than a LCNG with the
same modulus. Also the arithmetic is quite a bit faster than
multiplication modulo a prime.
>There are quite tolerable generators of that kind, but is this one?
>Was the multiplier, perhaps, taken from "The Art of Computer
>Programming, Vol. 2, Seminumerical Algorithms" by Knuth?
>
The multiplier was taken from the DIEHARD package. Multipliers for
MWC and RWC random number generators are easy to calculate.
For MWC the multiplier is calculated as follows:
m=a*b-1 where:
m and (m-1)/2 are both prime.
a will be the multiplier for modulus b.
The period will be (m-1)/2.
For RWC the formula is extendend to higher dimensions.
A thread is currently running in the sci.crypt newsgroup
about the RWC generators with the special case
of all multipliers zero except A1 and AR where R is the
length.
>--
>David Kastrup Phone: +49-234-700-5570
>Email: d...@neuroinformatik.ruhr-uni-bochum.de Fax: +49-234-709-4209
>Institut für Neuroinformatik, Universitätsstr. 150, 44780 Bochum, Germany
>
--
Mack
See the three new X8 ciphers, including the technical paper X8.TXT
and the reference implementation (feedback requested):
http://www.users.zetnet.co.uk/hopwood/crypto/scott16/x8.zip
http://www.sni.com/~mpj/crypto.htm under Encryption Libraries as X8.ZIP.
There was a 3rd. ed. of "Seminumerical Algorithms" in 1997, and I would have
thought that it could be relied upon for "trustworthy" if not definitive guidance on
such issues for a few years at least. This assumes that it is up to the high standard
set by the earlier editions. Does anyone have an opinion on this? To my unskilled
eye, it looks "up to the previously set standards." [Here and there in the new ed.
in place of a thorough discussion, there is an "under construction" symbol, together
with an outline of the content earmarked for that location "in the NEXT edition."
Such passages obviously offer no claim to be definitive.]
--------------------------------------------------------------
"I would predict that there are far greater mistakes waiting
to be made by someone with your obvious talent for it."
Orac to Vila. [City at the Edge of the World.]
-----------------------------------------------
R.W. Hutchinson. | rwh...@nr.infi.net
>Well Ok, gcc provides "unsigned long long" -- a 64 bits data type
>(required for this generator to work).
>But the original poster implied that in order to get a 64-bit product
>from two 32-bit integers efficiently, one needs to resort to low level
>machine instructions. I don't know assembly, but I know C, so I'd like
>to know if using gcc native 64-bit integral types is equally
>efficient, and if yes, whether I need to AND values of "crry" and "x"
>with 0xFFFFFFFF.
It is quite possible to provide anything in a programming language.
There are procedures for dealing with integers of arbitrary length;
the hardware strongly affects their speed.
It is possible to emulate any machine-language combination on any
other with ONE program, provided the memory is enough.
In all cases, the higher level language operations have to be
translated into low level machine instructions. In the above
situation, someone who is knowledgeable about the capabilities
of the machine is likely to do better than the compiler. But
on most machines, it will still be slow.
--
This address is for information only. I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399
hru...@stat.purdue.edu Phone: (765)494-6054 FAX: (765)494-0558
> There was a 3rd. ed. of "Seminumerical Algorithms" in 1997, and I would have
>thought that it could be relied upon for "trustworthy" if not definitive guidance on
>such issues for a few years at least. This assumes that it is up to the high standard
>set by the earlier editions. Does anyone have an opinion on this? To my unskilled
>eye, it looks "up to the previously set standards." [Here and there in the new ed.
>in place of a thorough discussion, there is an "under construction" symbol, together
>with an outline of the content earmarked for that location "in the NEXT edition."
>Such passages obviously offer no claim to be definitive.]
I would not trust any of the current pseudo-random generators now in
common use. Some have been tested in some types of situations. Most
will do a good job in many situations, but anything slightly complicated
is questionable. I suggest that, whatever pseudo-random procedure is
used, that it be XORed with a fair physical generator.
>I would not trust any of the current pseudo-random generators now in
>common use. Some have been tested in some types of situations. Most
>will do a good job in many situations, but anything slightly complicated
>is questionable. I suggest that, whatever pseudo-random procedure is
>used, that it be XORed with a fair physical generator.
The point is that one should be wary of current (and future) pseudo-
random generators. This doesn't mean you shouldn't use them; in most
cases they work remarkably well. My general feeling has been that,
unless you are doing something very peculiar, and you are using less
than 1,000,000 numbers, then they are probably OK. Possibly this
number should be increased for more modern generators.
The problem with the current physical generators is that they are
really too slow. You want them for large simulations (> 1,000,000
numbers). The fastest commercial generator I know about can
be run at 25 microseconds per byte or 4 gigabytes per day. (Twice
as fast as the manufacturer recommends but OK if you are Xoring
with a pseudo-generator). I think it would be reasonable to have
a PC running continuously making a huge bank of random bytes on disk
that you could draw on as required. It would take quite a bit of
organising, but I think I would do this if I was in the business
of doing massive Monte Carlo calculations.
(For key-generation or lotteries where unpredictability is
really important, the slow speed of hardware generators is not a
problem. They still need to be fast enough to test, but 1 byte
per few milliseconds will be fast enough).
I am currently thinking (only thinking) about trying to get an
objective interpretation of "passes Marsaglia's Diehard tests".
Marsaglia doesn't give a summary result. So I run the test suite
on my set of random numbers and then look for extremely significant
results, and possibly clusters of moderately significant ones.
If I don't see any, I say the generator passes. Not a very
satisfactory approach. I have been wondering about calculating
some summary statistic and then finding its distribution
using simulation using a hybrid (pseudo xored with a hardware
rng).
It's going to take about 3 days to get the 10 gigabytes of random
numbers required.
Robert
>>I would not trust any of the current pseudo-random generators now in
>>common use. Some have been tested in some types of situations. Most
>>will do a good job in many situations, but anything slightly complicated
>>is questionable. I suggest that, whatever pseudo-random procedure is
>>used, that it be XORed with a fair physical generator.
>The point is that one should be wary of current (and future) pseudo-
>random generators. This doesn't mean you shouldn't use them; in most
>cases they work remarkably well. My general feeling has been that,
>unless you are doing something very peculiar, and you are using less
>than 1,000,000 numbers, then they are probably OK. Possibly this
>number should be increased for more modern generators.
The question is not generally how many are used, but how many are
used jointly. If one is using a large number jointly, the problems
increase. The problems of dependence can be very difficult to
test in these cases.
>The problem with the current physical generators is that they are
>really too slow. You want them for large simulations (> 1,000,000
>numbers). The fastest commercial generator I know about can
>be run at 25 microseconds per byte or 4 gigabytes per day. (Twice
>as fast as the manufacturer recommends but OK if you are Xoring
>with a pseudo-generator).
I am surprised at this. For absence of even moderate-term dependence
to fair accuracy, I think that radioactive counts are probably better
than electronic noise, and if the process is run at moderately high
speeds, the performance is actually considerably improved over the
case of no dead time. I would be surprised if we could not get better
than 1 microsecond per byte.
We can also reuse physical random numbers without losing too much.
Try to make sure that the reuse period is not close to a power of
2 times a rational number with a small numerator and denominator.