I was wondering if any of you can point me in the direction of a published
proof of the following:
Problem: Given an NxN correlation matrix A between random events, such that
the correlation between event i and event j occurring is A(i,j) = A(j,i),
generate a vector x of length N with correlated random numbers between 0 and 1,
such that if a large number of vectors x are generated, their correlation will
match that given in matrix A.
Solution: use the Cholesky factorization to find the NxN lower-triangular
matrix L, where L*L' = A, L' being the transpose of L. Generate a vector y of
random numbers uniformly distributed between 0 and 1. Use x = L*y as your
vector of correlated random numbers.
Many Thanks --
Brian
Check out the NORTA (Normal To Anything) idea. Various papers are:
http://critical.orie.cornell.edu/~shane/pubs/NORTAProperties.pdf
http://users.iems.nwu.edu/~nelsonb/norta4.ps
http://www.ist.ucf.edu/past_lectures01.htm
A Yahoo search for: norta normal distribution
turns up other interesting articles.
Enjoy!
Andrew Ross
Industrial and Systems Engineering
Lehigh University
Bethlehem, PA
1. Not enough information for a unique solution
2. Assume x(i) are to be uniformly distributed
3. Then the Pearson and Spearman correlation coefficients are identical
A{sij} = A{pij}
4. Assume y(j) and z(i) are Gaussian distributed
5. Then the Spearman and Pearson correlation coefficients obey the sinusoidal
relationship s = (6/pi)*arcsin(p/2), equivalently p = 2*sin(pi*s/6).
6. Convert A{sij} to B{ pij = 2*sin(pi*sij/6) }
> Solution: use the cholesky factorization to find NxN lower-triangular matrix L,
> where L*L' = A, L' being the transpose of L.
L*L' = B (factor B, not A)
>Generate a vector y of random numbers (evenly distributed) between 0 and 1.
generate iid Gaussian y (its correlation matrix is the identity matrix)
>Use L*y = x as your vector of correlated random numbers
z = L*y (gaussian with Pearson B and Spearman A)
x = GAUSSCDF(z), where GAUSSCDF is the Gaussian cumulative distribution
function, applied elementwise.
If any x(i) are not uniformly distributed, there may not be a solution.
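The recipe above can be sketched in Python with NumPy; the target correlation
value, sample size, and variable names here are our own illustration, not from
the thread:

```python
import math
import numpy as np

# Target Spearman correlation between the two uniforms (illustrative value).
A_s = np.array([[1.0, 0.5],
                [0.5, 1.0]])

# Step 6: Pearson correlation the underlying Gaussians need: p = 2*sin(pi*s/6).
B = 2.0 * np.sin(np.pi * A_s / 6.0)

L = np.linalg.cholesky(B)              # L*L' = B

rng = np.random.default_rng(0)
y = rng.standard_normal((2, 100_000))  # iid Gaussian, identity correlation
z = L @ y                              # Gaussian with Pearson B, Spearman A_s

# x = GAUSSCDF(z): uniform marginals, and the Spearman correlation survives
# because Spearman is invariant under monotone maps such as the Gaussian CDF.
phi = np.vectorize(lambda t: 0.5 * (1.0 + math.erf(t / math.sqrt(2.0))))
x = phi(z)
```

With enough samples, the rank (Spearman) correlation of the rows of x should
come out close to the 0.5 we asked for.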
Hope this helps.
Greg
In article <stya9.135334$vg.22...@twister.nyroc.rr.com>,
a...@c.com (Brian) writes:
|>
|> Solution: use the cholesky factorization to find NxN lower-triangular matrix L,
|> where L*L' = A, L' being the transpose of L. Generate a vector y of random
|> numbers (evenly distributed) between 0 and 1. Use L*y = x as your vector of
|> correlated random numbers
I realise that I may not have stated things explicitly, but this
isn't quite true. The Cholesky factorisation is unstable in the
presence of multiple zero eigenvalues - it is NOT true (though
it is often claimed) that it fails with all semi-definite ones.
There is a simple modification to allow it to handle up to two
zero eigenvalues, but beyond that madness lies. This, as far as
I know, is unpublished.
What is also not published as far as I know is the error analysis
of Gaussian decomposition and Cholesky factorisation for positive
SEMI-definite matrices. But, if you work through Wilkinson and
Reinsch, the changes to their analysis are trivial - a couple of
strict inequalities need modifying and that is about all. This
leads to a trivial way to handle semi-definite matrices, which is
what I discovered in 1973 and is in the NAG library.
Just add k*N*eps*max_element to the diagonals, and Bob's your
uncle! k is 5 for Gaussian decomposition and 13 for Cholesky
factorisation, if I recall, but it is a long time ago. This
won't handle grossly unscaled matrices, of course, but there
is an equivalent kludge for those that I never worked out
because it was a lot more complicated to analyse. eps is the
relative error in the matrix elements, not necessarily the
machine epsilon.
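A minimal sketch of that jitter, with k = 13 as recalled above and eps taken
as the machine epsilon (an assumption about the matrix's accuracy):

```python
import numpy as np

def jittered_cholesky(A, k=13, eps=np.finfo(float).eps):
    """Cholesky of a positive SEMI-definite A after adding
    k*N*eps*max_element to the diagonal, per the trick above."""
    N = A.shape[0]
    shift = k * N * eps * np.abs(A).max()
    return np.linalg.cholesky(A + shift * np.eye(N))

# A rank-1 (hence exactly singular) matrix: plain Cholesky can break down
# on it, but the shifted factorization goes through and reproduces A up to
# the tiny diagonal perturbation.
v = np.ones(3)
A = np.outer(v, v)
L = jittered_cholesky(A)
```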
An alternative approach (discovered later, I believe, by someone
else) is to do a stable LDL' decomposition and multiply the
L by the square root of the diagonal matrix, treating negative
elements as zero. I have never analysed this, but should expect
it to be equivalent.
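A sketch of that alternative. The post speaks of a stable (pivoted) LDL'
decomposition; the unpivoted pass below, which simply zeroes a column whenever
its pivot is non-positive, is our simplification and is only safe for matrices
that are numerically semi-definite (the tolerance is our own choice):

```python
import numpy as np

def semidefinite_factor(A, tol=1e-12):
    """Return T with T @ T.T approximately A, via an unpivoted LDL' pass,
    treating non-positive elements of D as zero."""
    n = A.shape[0]
    L = np.eye(n)
    d = np.zeros(n)
    for j in range(n):
        d[j] = A[j, j] - np.sum(L[j, :j] ** 2 * d[:j])
        if d[j] <= tol:      # treat pivot as zero; column below stays zero
            d[j] = 0.0
            continue
        for i in range(j + 1, n):
            L[i, j] = (A[i, j] - np.sum(L[i, :j] * L[j, :j] * d[:j])) / d[j]
    return L * np.sqrt(d)    # multiply L by the square root of D

# Rank-2 positive semi-definite example.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0]])
T = semidefinite_factor(A)
```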
And, of course, there are many solutions involving full matrices,
based on any one of several approaches. They are less efficient,
but equally good otherwise.
Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email: nm...@cam.ac.uk
Tel.: +44 1223 334761 Fax: +44 1223 334679
Most students in statistics courses that rise to the level of covariance
matrices and linear transformations should be able to verify that if
y = (y_1,y_2,...,y_n) is a 1xn vector of random variables with nxn
covariance matrix C, and if A is an nxm constant matrix, then the
covariance matrix of the 1xm vector yA is A'CA.
The y's need not be normal or uniform or anything else, except that their
covariance matrix must exist. It need not be invertible; for the case
where C is singular, see G. Marsaglia, ``Conditional means and covariances
of normal variables with singular covariance matrix'' (1964),
J. of the Amer. Statist. Assoc., v. 59, 1203-1204,
and for representing C = T'T with T triangular, before the term Cholesky was
in use,
see G. Marsaglia (1957), ``A note on the construction of a multivariate
normal sample,''
IRE Transactions on Information Theory, v. IT-3, 149.
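The identity above, cov(yA) = A'CA, is easy to check numerically; here is a
Monte Carlo sketch with uniform-driven (deliberately non-normal) y. The
dimensions, seed, and variable names are our own illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, N = 3, 2, 200_000

# Any n x n covariance matrix C (built as Bm @ Bm' so it is valid).
Bm = rng.standard_normal((n, n))
C = Bm @ Bm.T
Lc = np.linalg.cholesky(C)

# Non-normal rows y with covariance C: iid uniforms on (-sqrt(3), sqrt(3))
# have unit variance; mixing them through the Cholesky factor gives cov C.
u = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(N, n))
Y = u @ Lc.T

A = rng.standard_normal((n, m))
W = Y @ A                          # each row is a 1 x m vector yA

cov_emp = np.cov(W, rowvar=False)  # sample covariance of the rows
cov_thy = A.T @ C @ A              # the claimed A'CA
```

The sample covariance converges to A'CA even though the inputs are uniform,
illustrating that normality plays no role in the identity.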
George Marsaglia