I have written an algorithm to do any number monte carlo simulations for a given system.When I do 10,000 simulations each of length 254, my python+numpy routine takes about 51 seconds to run.When I test the same algorithm under my cython+numpy algorithm I get about 36 seconds. I realize that 15 seconds represents a 30% increase, but having typdef'd all variables I expceted to see more than that.The python+numpy algorithm is as follows:import numpy as npdef monteCarlo(sims, pers):"""static python implementation of monte carlo simulation""""g2 = np.array([0.01, -0.03])g3 = np.array([0.02, 0.01, -0.03])marcov2 = np.array([[0.9, 0.1],[0.5, 0.5]]).cumsum(1)marcov3 = np.array([[0.5, 0.45, 0.05],[0.05, 0.85, 0.10],[0.25, 0.25, 0.5]]).cumsum(1)y2 = np.zeros((sims, pers))y3 = np.zeros((sims, pers))for sim in range(sims):s2 = 0s3 = 0epsilon = np.random.normal(0, 0.02, pers) # generate random shocksfor t in range(1, pers):r_num = np.random.rand()y2[sim, t] = g2[s2] + y2[sim, t - 1] + epsilon[t]y3[sim, t] = g3[s3] + y3[sim, t - 1] + epsilon[t]s2 = np.where(marcov2[s2, :] > r_num)[0][0]s3 = np.where(marcov3[s3, :] > r_num)[0][0]return [y2, y3]The cython+numpy algorithm is like this:import numpy as npimport cythoncimport cythoncimport numpy as npDTYPE = np.floatctypedef np.float_t DTYPE_t@cython.boundscheck(False)def monteCarlo(unsigned int sims, unsigned int pers):"""Cython implementation of a monteCarlo simulation"""cdef np.ndarray[DTYPE_t, ndim=2] marco2, marco3, y2, y3cdef np.ndarray[DTYPE_t, ndim=1] g2, g3, epsilonmarco2 = np.array([[0.9, 0.1],[0.5, 0.5]], dtype=DTYPE).cumsum(1)marco3 = np.array([[0.5, 0.45, 0.05],[0.05, 0.85, 0.10],[0.25, 0.25, 0.5]]).cumsum(1)y2 = np.zeros([sims, pers], dtype=DTYPE)y3 = np.zeros([sims, pers], dtype=DTYPE)g2 = np.array([0.01, -0.03], dtype=DTYPE)g3 = np.array([0.02, 0.01, -0.03], dtype=DTYPE)cdef float sigma = 0.02cdef unsigned int sim, tcdef unsigned int s2, s3cdef DTYPE_t rand_numfor sim in range(sims):epsilon = np.random.normal(0, sigma, pers)s2 = 0s3 = 0
for t in range(1, pers):rand_num = np.random.rand()y2[sim, t] = g2[s2] + y2[sim, t - 1] + epsilon[t]y3[sim, t] = g3[s3] + y3[sim, t - 1] + epsilon[t]s2 = <unsigned int> np.where(marco2[<unsigned int> s2, :] > rand_num)[0][0]s3 = <unsigned int> np.where(marco3[<unsigned int> s3, :] > rand_num)[0][0]return [y2, y3]I am new to cython and recognize some of the cython code may be un-needed (perhaps the casing of s2 and s3 towards the bottom), but I tried to follow the cython and numpy tutorial in the cython docs.Any suggestions on how to get more of a speed up??(note I attached both of these files)
Hi Spencer,
I agree with what Jake pointed out in terms of making python calls inside of your cython code. An easy way to see where this is happening is to use cython -a <pyx_file> and look at the html that it outputs (see http://docs.cython.org/src/quickstart/cythonize.html#determining-where-to-add-types).
I also wanted to add that one such place that you'll want to replace a call to a numpy function is your call to np.random.rand. I've found that drawing random numbers directly from randomkit (which is what numpy.random calls under the hood), rather than going through the numpy interface can give some large performance increases. As an example, take a look at cIntegratorSimple.pyx in:
https://bitbucket.org/joshua.adelman/pylangevin-integrator/src
This has given me a 4x speed-up in some cases.
Hope that helps, Josh