Hey Brendan,
Sorry for confusing you. I hope this will clear up the ideas.
PRNGs and NORTA are completely independent. PRNGs just generate that random stream of bytes, as I'm sure you know. You are correct that most wind up providing those bytes as a uniform distribution of [0,1). Right there is where people could want to specify to use their own PRNG (I assume as a rand.Source?) adding a degree of freedom we could design the API for. Or we could marry math/rand directly and not allow people to use their own PRNG. Note that if you look at the rand.Float64 documentation it is not perfectly distributed in [0,1) theoretically (due to a bug in the initial shipment of Go1, see
http://golang.org/src/pkg/math/rand/rand.go?s=3018:3050#L94), so some people may want to be able to use their own PRNG. This is opportunity 1 for designing the distribution API.
So with these [0,1) numbers coming from a PRNG, the next step is to convert them from a uniform distribution to something more useful. The most useful is the normal distribution for a variety of reasons, one of which I'll come back to. Luckily, the math/rand package already has a NormFloat64 function. It uses the [0,1) distribution to generate normally distributed variates using a ziggarut-128 algorithm. There are other algorithms that are less efficient, but people may want to use those instead. Additionally, they may want to use custom distributions with our library, so that suggests using a Go interface for distributions. For arbitrary distributions, using the inverse transform method is an option of turning [0,1) into the PDF. Again, that is a detail that could be hidden behind an interface. This is opportunity 2 for designing the distribution API.
Finally, these single variate solution pathways are fine. NORTA really comes into play with correlated multivariate distributions, which I will get back to in a second. I see you are suggesting the function:
Rand([]float64) []float64
I am not sure if this is for every distribution struct or a free-floating function. I'll assume (uh oh) you mean to have it be
func (s SomeDistribution) Rand([]float64) []float64
I am also not sure what the []float64 parameter is for. The way I am (probably mistakenly) interpreting it is "give SomeDistribution a list of [0,1) for each variate, get back the PDF for each variate". In this case, it would be more difficult to use when the multivariates are sampled from different distributions, such as:
multiVars := []float64{0.2, 0.5, 0.44}
pdf1 := distX.Rand([]float64{multiVars[0]})[0]
pdf2 := distY.Rand([]float64{multiVars[0]})[0]
pdf3 := distZ.Rand([]float64{multiVars[0]})[0]
I'm pretty sure I am misinterpreting something here, so I'll leave that for you to help set me straight. Back to NORTA. When generating correlated multivariates, normal statistically independent variates are generated from the normal distribution (typically), and then Cholesky decomposition is applied to correlate them. This provides correlated multiple normally distributed variates. However, the masses may demand correlated variates between, say, a gamma and a linear distribution. NORTA is the process of taking these correlated multiple normally distributed variates (with an extra step before the Cholesky decomposition), running it through the inverse CDF function to get back [0,1), which are correlated this time, and then using these [0,1) to inverse transform sample back to any distributions the users may want. This is opportunity 3 in designing the API.
So putting these three opportunities together, what would an example look like?
type RandInterface interface {
Float64() float64 // Opportunity 1: Allow use of custom PRNGs
}
type DistX struct {
r RandInterface
// other specifics for dist X
}
func NewDistXSource(r RandInterface) *DistX {
return &DistX{r, /* etc */} // Opportunity 1: Use custom PRNG
}
func NewDistX(seed int64) *DistX {
return &DistX{math.New(math.NewSource(seed)), /* etc */} // Opportunity 1: Use std lib
}
// Repeat above declarations for DistY, DistZ
type Distribution interface {
Rand() float64 // Opportunity 2: Let client distributions figure out how to generate variate
}
func (x *DistX) Rand() float64 {
// Use x.r to get [0,1) and generate variate, Opportunity 2 says "I don't care how" here.
}
// Repeat above func declaration for DistY, DistZ
// Above code can be in dists. Below code in mv.
// Opportunity 3: Could put this function type under the Distribution interface mentioned above, hence "room for design"
type InverseTransformFn func(float64) float64
// Can use our own matrix package as arg below.
// Opportunity 3: Maybe simpler correlated multivariate function?
func CorrelatedMultivariates(corrMatrix [][]float64, dists []InverseTransformFn) (results []float64) {
// Step 1: Adjust Pearson -> Spearman correlations using a Hotelling paper, create var normalDist as normal distribution
// Step 2: Generate len(dists) std normally distributed variates
for i, dist := range dists {
temp[i] = normalDist.Rand()
}
// Step 3: Cholesky decomposition to correlate the temp[i] variates
// Step 4: Convert to targeted correlated distributions. Correlations no longer exact, only approximate
for i, dist := range dists {
results = append(results, dist(temp[i]))
}
}
// Opportunity 3: Leverage Go-idiom of reusing code for stat. independent multivariates
func Multivariates(dists []InverseTransformFn) []float64 {
// Step 1: Generate correlation matrix that is zero everywhere, with 1 diagonals
return CorrelatedMultivariates(corrMatrix, dists)
}
In the code snippet above, the first opportunity (PRNGs) is independent of the other two. Opportunities 2 and 3 if designed properly can leverage each other's interfaces if so desired, or stay separate completely. Really the multivariate code that this point is just the two functions "Multivariates" and "CorrelatedMultivariates" because they leverage the single variate algorithms, so less code needed.
Let me know if I am talking rubbish at this point.