Hi Arthur,
I think the amount of polymorphism is killing performance (e.g. all those overloaded functions). It is definitely possible to get such code to go fast, but it requires a bit more expertise. The main things I got from looking at ziggurat are:
- You're using boxed Vectors. Even with the SPECIALIZE pragma, those Vectors won't become unboxed. If you want the code to stay polymorphic, you can add an Unbox type class constraint to ziggurat and switch to unboxed Vectors.
- RandomGen is not specialized, so there will be some overhead each time a random number is required.
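To make the two points above concrete, here is a minimal sketch of what I mean. It is not your ziggurat: pickScaled is a hypothetical stand-in that just looks up a random table entry, and I'm assuming the vector and random packages are available. The shape is the interesting part: an Unbox constraint, Data.Vector.Unboxed instead of Data.Vector, and a SPECIALIZE pragma that pins both the element type and the generator.

```haskell
module Main where

import qualified Data.Vector.Unboxed as U
import System.Random (RandomGen, StdGen, mkStdGen, randomR)

-- Hypothetical stand-in for ziggurat: pick a random table entry and double it.
-- The Unbox constraint keeps the function polymorphic in the element type
-- while the table itself is stored as a flat, unboxed array.
pickScaled :: (U.Unbox a, Num a, RandomGen g) => U.Vector a -> g -> (a, g)
pickScaled table g =
  let (i, g') = randomR (0, U.length table - 1) g
  in ((table U.! i) * 2, g')
-- Ask GHC for a copy specialized to Double and StdGen, so that this copy
-- does not need to take the Num, Unbox and RandomGen dictionaries as arguments.
{-# SPECIALIZE pickScaled :: U.Vector Double -> StdGen -> (Double, StdGen) #-}

main :: IO ()
main = do
  let table  = U.fromList [0.5, 1.0, 1.5, 2.0 :: Double]
      (x, _) = pickScaled table (mkStdGen 42)
  print x
```

Whether the specialization actually fires in your code is something you'd have to confirm by looking at the Core again.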
I took a quick look at the Core. You can find the Core for ziggurat at
https://gist.github.com/4106836 . There are two interesting things to note. First, let's look at the type:
    $fUniformGaussianValuesDoubleDouble_ziggurat
      :: forall g_a2D9.
         RandomGen g_a2D9 =>
         (Vector Double, Vector Double, Vector Word)
         -> g_a2D9 -> (Double, g_a2D9)
Here we see that:
- RandomGen hasn't been specialized away.
- We're using boxed Vectors of boxed Doubles.
There's a lot of Core in the body, but note that this happens a lot:
    case $wrandomIvalInteger
           @ g_a2D9
           @ Word
           $dRandomGen_a2Da
           $fNumWord
           $fRandomCSize4
           $fRandomCSize3
           g1_X1UE
This is a call to randomIvalInteger that is still polymorphic in the RandomGen type. Note how we're passing a number of dictionary parameters here.
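If the dictionary passing is unfamiliar, the following toy example shows the same effect using only base. Everything in it is made up for illustration: Gen is a hypothetical one-method class standing in for RandomGen, and Lcg is a throwaway linear congruential generator, not something from the random package.

```haskell
module Main where

import Data.Word (Word64)

-- Hypothetical minimal generator class, standing in for RandomGen.
class Gen g where
  step :: g -> (Word64, g)

-- A toy linear congruential generator; arithmetic wraps modulo 2^64.
newtype Lcg = Lcg Word64

instance Gen Lcg where
  step (Lcg s) =
    let s' = s * 6364136223846793005 + 1442695040888963407
    in (s', Lcg s')

-- Polymorphic in g: unless it is specialized or inlined, this compiles to a
-- function that takes a Gen dictionary and calls step through it.
sumDraws :: Gen g => Int -> g -> Word64
sumDraws n g0 = go n g0 0
  where
    go 0 _ acc = acc
    go k g acc = let (w, g') = step g in go (k - 1) g' (acc + w)
-- Request a copy fixed to Lcg, where step is a known function.
{-# SPECIALIZE sumDraws :: Int -> Lcg -> Word64 #-}

main :: IO ()
main = print (sumDraws 3 (Lcg 0))
```

Compiling something like this with -O2 -ddump-simpl, with and without the pragma, is a quick way to get a feel for what a dictionary argument looks like in Core.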
I also recommend that you look at Bryan's statistics package, which tackles a similar domain and has good performance from what I understand.