# generating non-IID random sequences

2 views

### Yaroslav Bulatov

Jul 18, 2007, 3:12:13â€¯AM7/18/07
to
I'm looking for a fast way to sample from a Markov-1 sequence of
random bits. The method below is 3600 times slower than built-in
RandomInteger function, can it be made much faster?

p = 0.9; (* the probability of encountering 00 or 11 *)
f = RandomChoice[{p^# (1 - p)^(1 - #), p^(1 - #) (1 - p)^#} -> {1, 0}]
&;
NestList[f, RandomChoice[{0, 1}], 100000]

### Peter Pein

Jul 19, 2007, 3:38:58â€¯AM7/19/07
to
Yaroslav Bulatov schrieb:

Hi Yaroslav,

isn't this formula a bit too complicated to be used in a fast generator?

Just return the same bit as the last with probability p:

SeedRandom[1];
Timing[With[{p=0.9},
rlist=NestList[If[Random[]<=p,#,1-#]&,Random[Integer],10^6];]][[1]]

Count[Partition[rlist,2,1],{x_,x_},{1}]/Length[rlist]//N
Out[32]=
0.300019 Second
Out[33]=
0.899887
, which is pretty near 0.9 :-)

Peter

### Darren Glosemeyer

Jul 19, 2007, 3:39:59â€¯AM7/19/07
to
Yaroslav Bulatov wrote:
> I'm looking for a fast way to sample from a Markov-1 sequence of
> random bits. The method below is 3600 times slower than built-in
> RandomInteger function, can it be made much faster?
>
> p = 0.9; (* the probability of encountering 00 or 11 *)
> f = RandomChoice[{p^# (1 - p)^(1 - #), p^(1 - #) (1 - p)^#} -> {1, 0}]
> &;
> NestList[f, RandomChoice[{0, 1}], 100000]
>
>

Here's some code that is roughly 25-30 times faster than the code sent
for a list of 10^6 bits. The code starts with a list containing a 0 or 1
(generated with equal probability) followed by n 0 or 1 values generated
with 1 having probability p. Those values are then looped through,
iteratively replacing any value that follows a 0 with the other value
(the probabilities flip-flop for values following a 0, so we can just
switch the value). In effect this trades the more costly n calls to
RandomChoice for a single call to RandomChoice and an update loop
through the list, and uses compilation to speed up the pass through the
loop.

In[1]:= markovBits[p_, n_] := Block[{not},
Compile[{{rr, _Integer, 1}},
not[1] = 0;
not[0] = 1;
Module[{rrnew = rr},
Do[If[rrnew[[i]] === 0,
rrnew[[i + 1]] = not[rrnew[[i + 1]]]], {i,
Length[rrnew] - 1}];
rrnew]
][Join[{RandomInteger[]}, RandomChoice[{p, 1 - p} -> {1, 0},
n]]]]

In[2]:= markovBits[.9, 10^6]; // Timing

Out[2]= {0.89, Null}

The following counts the values of absolute differences of adjacent
values to give a check of the adjacency probabilities, so the 0s are the
00 and 11 cases.

In[3]:= Tally[Abs[Most[#] - Rest[#]]] &[markovBits[.9, 10^6]]

Out[3]= {{0, 900210}, {1, 99790}}

Darren Glosemeyer
Wolfram Research

### Mark Fisher

Jul 20, 2007, 3:35:06â€¯AM7/20/07
to

Here are 3 functions. The first is your version; the second is your
version compiled; the third is a slightly different version also
compiled.

fun = Function[{n, p},
NestList[

RandomChoice[{p^# (1 - p)^(1 - #), p^(1 - #) (1 - p)^#} -> {1,

0}] &, RandomChoice[{0, 1}], n]];

fun1 = Compile[{{n, _Integer}, {p, _Real}},
NestList[

RandomChoice[{p^# (1 - p)^(1 - #), p^(1 - #) (1 - p)^#} -> {1,

0}] &, RandomChoice[{0, 1}], n]];

fun2 = Compile[{{n, _Integer}, {p, _Real}},
NestList[
If[# == 0, If[RandomReal[] < p, 0, 1],
If[RandomReal[] < 1 - p, 0, 1]] &, RandomChoice[{0, 1}], n]];

This gives the relative timings:

First[#]/#& @ Table[Timing[i[10^5, .9]][[1]], {i, {fun, fun1, fun2}}]

On my machine I get a factor of 5 speed up for fun1 and a factor of 25
speed up for fun2 relative to fun.

--Mark.

### Mark Fisher

Jul 21, 2007, 4:22:11â€¯AM7/21/07
to

Well, I like both Peter's and Darren's solutions. Peter's can be
speeded up just a touch by compiling, but it beats mine even without
compiling. Darren's can be speeded up by using Peter's idea and
rearranging a bit. It ends up being about the same speed as Peter's.

markovBits1 =

Compile[{{n, _Integer}, {p, _Real}},

Module[{r =
Join[{RandomInteger[]}, RandomChoice[{p, 1 - p} -> {1, 0}, n]]},
Do[If[r[[i]] === 0, r[[i + 1]] = 1 - r[[i + 1]]], {i, n}];
r]]

--Mark

### Yaroslav Bulatov

Jul 21, 2007, 4:49:49â€¯AM7/21/07
to
The Compiled one-liner (fun2) seems to be the fastest solution.

I also compared it with a simple java solution, and it was about 2.7
times faster (19 seconds vs 7 seconds). Using Java solution through
Running JVM with some optimization flags (server,compileThreshold)
took the time to 6 seconds. A simple C solution took 2 seconds, 1.5
seconds with gcc -O9 optimization.

The bottom line seems to be that efficient Mathematica solution can be
25 times faster than inefficient one, porting to C can give 10x speed-
up, and Java is somewhere in between.

fun2 = Compile[{{n, _Integer}, {p, _Real}},
NestList[
If[# == 0, If[RandomReal[] < p, 0, 1],
If[RandomReal[] < 1 - p, 0, 1]] &, RandomChoice[{0, 1}], n]];

Timing[fun2[4 10^7, .9]][[1]]

public class hmm {
public static int[] sample(double p, int length) {
int[] result = new int[length];
result[0]=(Math.random()>.5)?0:1;
for (int i = 1; i < length; ++i)
result[i]=(Math.random()>p)?(1-result[i-1]):result[i-1];
return result;
}

public static void main(String[] args) {
long start = System.currentTimeMillis();
sample(.9,4*10000000);
System.out.println((System.currentTimeMillis()-start)/1000.);
}
}

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#define n 40000000

main() {
int* result = malloc(4*n);
double start = clock();
int i;
result[0] = 0;
for (i = 1; i<n;++i)
result[i]=(rand()/((double)RAND_MAX + 1)>.9)?(1-result[i-1]):
(result[i-1]);
printf("%d\n",result[i-1]);
printf("%f\n",(clock()-start)/CLOCKS_PER_SEC);
return 0;

### Carl Woll

Jul 22, 2007, 4:34:35â€¯AM7/22/07
to
Yaroslav Bulatov wrote:

>The Compiled one-liner (fun2) seems to be the fastest solution.
>
>I also compared it with a simple java solution, and it was about 2.7
>times faster (19 seconds vs 7 seconds). Using Java solution through
>Running JVM with some optimization flags (server,compileThreshold)
>took the time to 6 seconds. A simple C solution took 2 seconds, 1.5
>seconds with gcc -O9 optimization.
>
>The bottom line seems to be that efficient Mathematica solution can be
>25 times faster than inefficient one, porting to C can give 10x speed-
>up, and Java is somewhere in between.
>
>

Here is a much faster solution:

markov[len_, p_] := BitAnd[
Accumulate @ Prepend[ UnitStep[RandomReal[{-p, 1 - p}, len]],
RandomInteger[]],
1
]

In[7]:= fun2[10^7, .9]; // Timing
Out[7]= {4.656,Null}

In[8]:= markov[10^7, .9]; // Timing
Out[8]= {0.766,Null}

A test that toggles occur about 10% of the time:

In[10]:= res = markov[10^7, .9];
N@Total[BitXor[Most[res], Rest[res]]]/(Length[res] - 1)

Out[11]= 0.0998825

A bit of explanation. In general, an efficient Mathematica algorithm
will use vector/array operations as much as possible, so that the kernel
function does the looping through a list. In this vein, both BitAnd[_,
1] and UnitStep[_] are vector operations relying on the listability of
BitAnd and UnitStep.

In fun2, the function NestList is slow because it has to loop through
the list, and doesn't take advantage of vector operations. It might seem
at first glance that a NestList type of approach is unavoidable, since
your application needs to repeatedly apply a custom function to an
initial value. However, there is at least one kernel function,
Accumulate, that very quickly applys a particular function (Plus) to an
initial value (I can't think of any others that are as quick).
Accumulate is a new version 6 function, and it can be about an order of
magnitude quicker than the old equivalent FoldList[Plus, 0, list]
approach. So, using Accumulate is the key to the speed improvement.

In order to apply Accumulate, I created a random vector having 0s with
probability p and 1s with probability 1-p. Then applying Accumulate will
create a list like:

{0,0,0,1,1,1,2,2,2, ...}

The even numbers correspond to one value and the odd numbers correspond
to the other value, since toggling twice brings us back to the same value.

One more comment. If your probability p is equal to a fraction with
small integers, say p=num/den, then it is possible to speed things up
further. For example, with p=.9, we have:

In[21]:= UnitStep[RandomReal[{-.9, .1}, 10^7]]; // Timing
UnitStep[RandomInteger[{-9, 0}, 10^7]]; // Timing

Out[21]= {0.562,Null}

Out[22]= {0.297,Null}

This improvement brings the timing down to about .5 seconds, which is
almost 10 times faster than your fun2 function.

Carl Woll
Wolfram Research

>fun2 = Compile[{{n, _Integer}, {p, _Real}},
> NestList[
> If[# == 0, If[RandomReal[] < p, 0, 1],
> If[RandomReal[] < 1 - p, 0, 1]] &, RandomChoice[{0, 1}], n]];

### Mark Fisher

Jul 23, 2007, 3:54:26â€¯AM7/23/07
to
> >>Here are 3 functions. The first is your version; the second is your
> >>version compiled; the third is a slightly different version also
> >>compiled.
>
> >>fun = Function[{n, p},
> >> NestList[
> >> RandomChoice[{p^# (1 - p)^(1 - #), p^(1 - #) (1 - p)^#} -> {1,
> >> 0}] &, RandomChoice[{0, 1}], n]];
>
> >>fun1 = Compile[{{n, _Integer}, {p, _Real}},
> >> NestList[
> >> RandomChoice[{p^# (1 - p)^(1 - #), p^(1 - #) (1 - p)^#} -> {1,
> >> 0}] &, RandomChoice[{0, 1}], n]];
>
> >>fun2 = Compile[{{n, _Integer}, {p, _Real}},
> >> NestList[
> >> If[# == 0, If[RandomReal[] < p, 0, 1],
> >> If[RandomReal[] < 1 - p, 0, 1]] &, RandomChoice[{0, 1}], n]];
>
> >>This gives the relative timings:
>
> >>First[#]/#& @ Table[Timing[i[10^5, .9]][[1]], {i, {fun, fun1, fun2}}]
>
> >>On my machine I get a factor of 5 speed up for fun1 and a factor of 25
> >>speed up for fun2 relative to fun.
>
> >>--Mark.

Carl makes the clever suggestion to speed things up further using
RandomInteger, but he seems to suggest this idea isn't generally
applicable by referring to "small integers". A small amount of
experimentation suggests it works well for arbitrary p (as long as p
is a machine number). One simply uses Rationalize[p,0] to get the
nearest fraction. Here's a modification of Carl's function:

markov1[len_, p_] :=
With[{r = Rationalize[p, 0]},
BitAnd[Accumulate@
Prepend[UnitStep[
RandomInteger[{-Numerator[r],
Denominator[r] - Numerator[r] - 1}, len]], RandomInteger[]],
1]]

--Mark