Encouraged by this benchmark of matrix-matrix products with OpenMP in Eigen,
https://plafrim.bordeaux.inria.fr/doku.php?id=people:guenneba
I thought it would be nice to try this out in a Stan program. Does anyone have a compute-intensive Stan model around that would be suitable for testing this? It should be as easy as enabling -fopenmp, and the benchmark suggests a speedup roughly proportional to the number of cores on the machine. So this speedup comes almost for free, provided you have enough CPUs available and a Stan program that uses so-called GEMM (general matrix-matrix multiply) expressions.
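To make the "-fopenmp only" point concrete, here is a minimal sketch of a matrix-matrix product parallelized with a single OpenMP pragma. It is a hand-written stand-in and not Eigen's GEMM kernel; the function name matmul and the row-major layout are assumptions made for illustration only. Without -fopenmp the pragma is ignored and the loop runs serially, so the same source builds either way.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, NOT Eigen's implementation:
// computes C = A * B for row-major n x n matrices stored in flat vectors.
std::vector<double> matmul(const std::vector<double>& a,
                           const std::vector<double>& b,
                           std::size_t n) {
  std::vector<double> c(n * n, 0.0);
  // With -fopenmp, the rows of C are split across threads.
  // Without it, the pragma is ignored and the loop runs serially.
  #pragma omp parallel for
  for (long i = 0; i < static_cast<long>(n); ++i) {
    for (std::size_t k = 0; k < n; ++k) {
      const double aik = a[i * n + k];
      for (std::size_t j = 0; j < n; ++j) {
        // Each thread writes only to its own row i, so there is no data race.
        c[i * n + j] += aik * b[k * n + j];
      }
    }
  }
  return c;
}
```

Built with something like g++ -O2 -fopenmp, the outer loop is distributed over the available cores. How close the speedup gets to the core count depends on matrix size and memory bandwidth.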
Best,
Sebastian
Of course, none of this helps with ODE problems. For those I have a few ideas, namely:
1. one autodiff arena per thread, which would allow general "pfor" constructs in the Stan language.
2. enabling OpenMP in a natural way for a few commands in the language. For example, the integrate_ode_* functions have a "natural" generalization to multiple cases: introduce a variant that takes ragged arrays (for the time vectors) and an array of parameters. I guess other Stan functions could be generalized similarly.
Option 1 is best, I think, but could be hard to do. Option 2 is straightforward, but does not generalize easily (and requires ragged arrays to be available in the language).
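A rough sketch of what option 2 could look like on the C++ side, using plain doubles only. The names solve_one and solve_many are hypothetical, and the closed-form solution of y' = -theta * y stands in for a real integrate_ode_* call. The autodiff side, which is the hard part behind option 1, is deliberately left out.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Stand-in for a single ODE solve (assumption: y' = -theta * y, y(0) = y0).
// A real implementation would integrate numerically at the requested times.
std::vector<double> solve_one(double y0, double theta,
                              const std::vector<double>& ts) {
  std::vector<double> ys;
  ys.reserve(ts.size());
  for (double t : ts) {
    ys.push_back(y0 * std::exp(-theta * t));
  }
  return ys;
}

// Hypothetical batched variant: one parameter and one time vector per case.
// The time vectors may differ in length, i.e. ts is a ragged array.
std::vector<std::vector<double>> solve_many(
    double y0, const std::vector<double>& thetas,
    const std::vector<std::vector<double>>& ts) {
  std::vector<std::vector<double>> out(thetas.size());
  // The cases are independent, so they can be distributed across threads.
  #pragma omp parallel for
  for (long i = 0; i < static_cast<long>(thetas.size()); ++i) {
    out[i] = solve_one(y0, thetas[i], ts[i]);  // each case fills its own slot
  }
  return out;
}
```

Because every case writes only to its own output slot, no locking is needed here. With autodiff variables, the shared arena would be the point of contention, which is what option 1 is meant to address.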
Sebastian
> > On Aug 21, 2016, at 1:09 PM, Sebastian Weber: