The Sage matrices use unsigned long for the mod_int datatype. An unsigned long does not always fit into a (signed) Python int, so whenever a mod_int value reaches Python (by being assigned to a Python variable, or passed as an argument to a non-cdef function) it gets converted to a Python arbitrary-precision integer. Creating that object wastes quite a bit of time, and it gets worse once you start doing arithmetic with it:
--------------------------------------
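ctypedef unsigned long mod_int  # this is how we define it

cpdef caster_slow():
    cdef int i
    foo = None
    for i in range(10000000):
        # unsigned long -> Python object: becomes an arbitrary-precision integer
        foo = <mod_int>(1.0)

cpdef caster_fast():
    cdef int i
    foo = None
    for i in range(10000000):
        # signed int -> Python object: becomes a plain Python int
        foo = <int>(1.0)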
--------------------------------------
Then we get:
sage: timeit('caster_slow()')
5 loops, best of 3: 137 ms per loop
sage: timeit('caster_fast()')
25 loops, best of 3: 34.4 ms per loop
Should we just switch to (signed) long? This costs one bit (a factor of 2) in the maximum modulus size, but at least for now we define MAX_MODULUS = 2**23, so this wouldn't be an issue.
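
For anyone who wants to sanity-check the range argument, here is a rough sketch in plain Python (not Sage code). It uses ctypes to read the width of the platform's C long; the helper name fits_in_signed_long is made up for illustration, and MAX_MODULUS is simply copied from the value quoted above.

```python
import ctypes

# Width of the platform's C long in bits (32 or 64, depending on the platform).
LONG_BITS = 8 * ctypes.sizeof(ctypes.c_long)

ULONG_MAX = 2**LONG_BITS - 1       # largest value an unsigned long can hold
LONG_MAX = 2**(LONG_BITS - 1) - 1  # largest value a signed long can hold

MAX_MODULUS = 2**23  # copied from the post above, not read from Sage


def fits_in_signed_long(n):
    """Hypothetical helper: True if n is representable as a C signed long."""
    return -LONG_MAX - 1 <= n <= LONG_MAX


# The top half of the unsigned range does not fit in a signed long,
# which is why such values need an arbitrary-precision integer.
print(fits_in_signed_long(ULONG_MAX))

# The current maximum modulus is far below the signed limit,
# so giving up the top bit loses nothing at this setting.
print(fits_in_signed_long(MAX_MODULUS))
```

This only checks that the moduli themselves fit; it says nothing about intermediate products in the matrix arithmetic, which would need to be checked separately.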