Actually, for a while now Cython has translated "for i in range(...)"
into fast C code. In fact, I believe it's now the recommended syntax,
instead of the older "for 0 <= i < ..." form.
I believe that Cython 0.13 (just released, and making its way into Sage)
does even more to optimize loops intelligently.
> Note that you can get an annotated version of your code. In the
> notebook, this is available via some link; I don't remember what you
> need to do on the command line (try to search the manual).
I usually do something like:
sage -cython -a my-file.pyx
which creates the .html file in the current directory.
Thanks,
Jason
No, you still need to do "cdef int i", I believe. So:

cdef int i
for i in range(10):
    ...

should be fast.
Jason
This is what I would recommend. You can do something like
from sage.rings.real_mpfr cimport *

def square(RealNumber x):
    cdef RealNumber result = x._new()
    mpfr_mul(result.value, x.value, x.value, GMP_RNDN)
    return result
If you really need speed, you can declare, allocate, and free mpfr_t
variables themselves (declared via "cdef mpfr_t var") rather than
using the RealNumber Python objects as wrappers.
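For illustration, a sketch of that raw-mpfr_t approach might look like the
following (Cython; it needs Sage's build environment to compile, the
function name and loop are made up here, and the mpfr_* names and GMP_RNDN
are assumed to be cimported the same way real_mpfr.pyx does it):

```cython
from sage.rings.real_mpfr cimport RealNumber

def square_sum(RealNumber x, int n):
    """Add up x^2 n times, keeping the inner loop on raw mpfr_t values."""
    cdef RealNumber result = x._new()
    cdef mpfr_t acc, tmp
    cdef int i
    mpfr_init2(acc, mpfr_get_prec(x.value))   # allocate at x's precision
    mpfr_init2(tmp, mpfr_get_prec(x.value))
    mpfr_set_ui(acc, 0, GMP_RNDN)
    for i in range(n):
        mpfr_mul(tmp, x.value, x.value, GMP_RNDN)  # tmp = x*x
        mpfr_add(acc, acc, tmp, GMP_RNDN)          # acc += tmp
    mpfr_set(result.value, acc, GMP_RNDN)     # copy into the wrapper
    mpfr_clear(acc)                           # don't leak the raw values
    mpfr_clear(tmp)
    return result
```

The point is that no Python objects are created inside the loop; the
RealNumber wrapper only appears at the end.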
Just out of curiosity, what multi-precision root finding tools are you
using? Also, I don't think numpy supports high-precision floating
point arrays (except as arrays of generic objects).
- Robert
One of the things you might do is use RR (which is based on mpfr), and
when you need to do a sequence of calculations really fast, reach into
the RR number, do stuff using mpfr, and then create a new RR number with
the new multiprecision result. The RR _add method shows how to do this,
for example:
cdef RealNumber x
x = self._new()
mpfr_add(x.value, self.value,
         (<RealNumber>other).value, (<RealField_class>self._parent).rnd)
return x
Notice how it creates a new real number x, manipulates its mpfr value
directly with mpfr_add, and then returns the resulting RR number.
You can find lots more examples in devel/sage/sage/rings/real_mpfr.pyx
Thanks,
Jason
With the exception of lots of special functions or very large
precisions, this probably won't be a speed advantage over Sage's
native RealNumbers, given that they both have Python overhead. (As the
precision grows, the arithmetic itself will be comparable, as they are
both using MPIR/GMP under the hood.)
> It supports the features you need, is quite efficient, and also has
> multi precision vectors and matrices.
> And after the problematic steps, getting back to numpy arrays isn't
> really a problem.
>
> If you really want to squeeze every tiny bit of efficiency out of
> your code, then you have to stick to Cython and mpfr.
> But I'm not sure if this is really necessary.
>
> greez
> maldun
>
What do you mean by symbolic? Are you actually using maxima and the
various calculus modules?
> - fully optimize in Cython by turning to doubles everywhere, although
> it gets very fast here happens what I was already afraid of: the
> algorithm goes belly up after 15 steps :( (I would need a lot more
> steps). In step 14 values like 2.51885860e-34 appear and in
> combination with the numerical noise gathered this turns out to
> produce very wrong values at the next step.
>
> @Robert: I'm trying to implement an algorithm that computes a sequence
> of functions v_k according to the rule v_k(x) = \max \{ K-e^x, \int
> v_{k-1}(y) p(x+y) dy \}, v_0(x)=K-e^x, where p is a prob. density. The
> background is an optimal stopping problem in stochastics. So at each
> step I essentially first compute the integral and then determine where
> the integral and x -> K-e^x intersect, this just by the basic
> find_root function in Sage.
>
> It's not difficult (however very annoying...) to write down a formula
> for the integral (given the specific density p I'm using) and work out
> formulae for all the coefficients that appear in this formula for the
> integral in terms of those appearing in v_{k-1}. The number of
> coefficients (and hence computations) involved in the formula for v_k
> increases very quickly with k, that explains why the algorithm gets
> slow in 'symbolic mode'. However 'double mode' is not good enough as
> mentioned above.
It sounds like you're trying two extremes -- how about using your
"double mode" algorithm, but instead using elements of RealField(N)
where N > 53. This should be much faster than "symbolic" (if I
understand you right, calling find_root and integrate) but have higher
precision than using raw doubles. This may be enough, without getting
your hands on MPFR directly yourself.
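The kind of noise described above can be shown in two lines; here
Python's stdlib decimal stands in for Sage's RealField (the point is only
the working precision, not the library):

```python
from decimal import Decimal, getcontext

# With 53-bit doubles the small term vanishes completely:
x = (1.0 + 1e-20) - 1.0
print(x)          # 0.0 -- the 1e-20 is below double precision

# At 50 significant digits (roughly RealField(166) in Sage terms,
# an assumed conversion) the term survives:
getcontext().prec = 50
y = (Decimal(1) + Decimal("1e-20")) - Decimal(1)
print(y)          # 1E-20
```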
You might also consider looking at how the Dickman rho function is
implemented http://hg.sagemath.org/sage-main/file/b5dab6864f35/sage/functions/transcendental.py#l464
which sounds similar in spirit.
> I will try your suggestion for using mpfr, thanks Robert and Greg!
> (however it looks a bit scary to me ;)). I need to store the huge
> amounts of coefficients somewhere, I guess numpy is my best guess even
> though I have to declare the dtype as object then, no?
What do you mean by huge? Would a list suffice? Or if it's a
polynomial, what about a polynomial in RR? For high precision, the
per-element overhead is constant, so I don't see an advantage to using
NumPy here just as a storage device if the dtype is object.
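On storage: a tiny sketch of what "just a list" would look like, with
stdlib decimal coefficients (the shapes and K here are made up, not taken
from the thread):

```python
from decimal import Decimal, getcontext

getcontext().prec = 60
K = 15  # number of steps; purely illustrative

# coeffs[k][part][term][which] holds the (a, b, c) of a term a*x^b*exp(c*x).
# A plain nested list carries arbitrary-precision objects just as well as
# an object-dtype NumPy array when no vectorized math is done on them.
coeffs = [[[[Decimal(0)] * 3 for _ in range(k + 1)]
           for _ in range(k + 1)] for k in range(K)]

coeffs[3][1][0][2] = Decimal("2.5")   # set the c of one term
print(coeffs[3][1][0][2])             # 2.5
```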
- Robert
I believe the standard find_root uses scipy, which is limited to double
precision.
Jason
Shouldn't be too hard to follow up with Newton's method to increase
the precision if desired.
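A sketch of that two-stage approach, with Python's stdlib decimal standing
in for MPFR/RealField (the target function x^2 - 2, the starting value,
and the helper name are all illustrative):

```python
from decimal import Decimal, getcontext

def refine_root(x0, digits=50, steps=4):
    """Polish a double-precision root of f(x) = x^2 - 2 with
    Newton's method at higher working precision."""
    getcontext().prec = digits + 10          # keep some guard digits
    x = Decimal(x0)
    for _ in range(steps):
        # Newton step for f(x) = x^2 - 2: x <- x - f(x)/f'(x)
        x = (x + Decimal(2) / x) / 2
    return x

rough = 2 ** 0.5          # e.g. what a double-precision find_root returns
fine = refine_root(rough)
print(abs(fine * fine - 2))   # residual far below double precision
```

Each Newton step roughly doubles the number of correct digits, so a
handful of steps from a double-precision seed is enough for 50 digits.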
- Robert
Just out of curiosity, how much did it help?
> If I could further speed it up by a few
> factors that would be great. I will further try if I can figure out
> how to use Cython to speed up parts of the algorithm where a lot of
> looping and computing is done.
That should be feasible -- you probably don't need to change the whole
thing; profile to see which parts are the bottleneck. Excessive coercion
may also impact things.
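A minimal profiling sketch (pure Python, with made-up stand-ins for the
slow and fast parts of the algorithm):

```python
import cProfile
import io
import pstats

def slow_part():
    # Stand-in for the loop-heavy numerical step.
    return sum(i * i for i in range(200000))

def fast_part():
    return 42

def algorithm():
    slow_part()
    fast_part()

pr = cProfile.Profile()
pr.enable()
algorithm()
pr.disable()

buf = io.StringIO()
pstats.Stats(pr, stream=buf).sort_stats("cumulative").print_stats(8)
report = buf.getvalue()
print(report)   # slow_part dominates the cumulative-time column
```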
>> You might also consider looking at how the Dickman rho function is
>> implemented http://hg.sagemath.org/sage-main/file/b5dab6864f35/sage/functions/tra...
>> which sounds similar in spirit.
>>
>> > I will try your suggestion for using mpfr, thanks Robert and Greg!
>> > (however it looks a bit scary to me ;)). I need to store the huge
>> > amounts of coefficients somewhere, I guess numpy is my best guess even
>> > though I have to declare the dtype as object then, no?
>>
>> What do you mean by huge? Would a list suffice? Or if it's a
>> polynomial, what about a polynomial in RR? For high precision, the
>> per-element overhead is constant, so I don't see an advantage to using
>> NumPy here just as a storage device if the dtype is object.
>
> The function v_k is not a polynomial but a piecewise function
> consisting of linear combinations of terms a*x^b*exp(c*x) (*) in each
> part of its domain. 'Huge' means that the number of coefficients (i.e.
> all the a, b and c's in (*)) to be computed in the k-th step of the
> algorithm roughly grows like k^2. These coefficients are currently
> organized in 4-dimensional Numpy-arrays: the k runs along one axis and
> there are three other 'counters' (one to keep track of the part of the
> domain etc.). I would prefer not to give up this way of representing
> the coefficients, as my notes use this form as well. Would you say I
> would be better off using 4-d Python lists instead?
No, for multi-dimensional stuff, NumPy is better, even if you're using
the object dtype.
- Robert
Hello,
There is no way to work around this:
-> RDF (double-precision numbers) uses the processor's arithmetic.
-> RealField(x) uses the MPFR library.
In the first case, Sage (Python, Cython) passes the operations to the
processor, which performs each operation in 1 (or maybe 2) clock cycles.
In the second case, MPFR calls a (maybe small) routine to perform the
task. Even if MPFR is perfectly optimized, this is much more costly.
Use MPFR when you need high precision, whatever it costs, or a special
rounding mode.
Even worse (or better): the processor also implements pipelining,
parallelism, and so on.
Something I find very spectacular:
def prod(a, b):
    a * b

c = random_matrix(RDF, 1000)
d = random_matrix(RDF, 1000)
prod(c, d)
Try it; count the operations (10^9 multiplications). On my computer
(3 GHz), prod(c,d) takes 0.18 seconds: that is to say 5 gigaflops (or 10
if you count the additions as well): more than 1 operation per clock
cycle. This is because we use here: 1) the processor's FPU, 2) the ATLAS
BLAS, which lets the processor run at full speed (and minimizes cache
misses).
Now, change RDF to RR. It takes 421 seconds to perform prod(c,d) on my
computer: that is to say *2000* times more than with RDF numbers,
because 1) we do not use the processor's FPU, 2) there is no
optimization of the matrix product (and a lot of cache misses as well!).
For "number crunching", use the processor's FPU (RDF) directly!
Yours
t.d.
> Many thanks, Kees
>
> Alright, many thanks for the clear and extensive answer Thierry.
> Bottomline is thus that I'll have to live with it.
>
> On a sidenote, I must admit it surprises me. I'm only an amateur
> programmer, let alone that I know anything about the subtleties of how
> cpus interact exactly with code, but you would somehow expect that it
> should be possible when you add two arbitrary precision numbers say,
> you break them up in smaller chunks (like 2115+3135 =
> (21+31)*100+(15+35)) so that each of the smaller chunks are
> essentially doubles to be added and can hence be performed at optimal
> speed. You would get the overhead (breaking the numbers up in chunks
> and putting them together again) plus the cycles necessary to make the
> additions, but this would be a lot less than a factor 100 I'd have
> guessed.
>
> Well, anyhow, of course the above is layman talk and I will be missing
> essential points, but that's probably why I'd have expected arbitrary
> precision speed to be a lot closer to double speed.
It's more than that. You can see at
http://www.mpfr.org/mpfr-current/mpfr.html#Introduction-to-MPFR that
MPFR guarantees a lot of its behavior (modulo bugs, of course) in a
cross-platform manner -- correctly rounded results, in particular.
To see the sort of thing you mention above, you might look at the
quad-double library (this isn't arbitrary precision, though).
Jason
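For what it's worth, the chunking idea in the quoted message is exactly
how bignum libraries work internally; here is a toy sketch in Python
(base-100 "limbs", function names made up here; GMP/MPIR do the same
thing with base-2^64 limbs and carry propagation in assembly):

```python
def to_limbs(n, base=100):
    """Split a non-negative integer into little-endian base-100 chunks."""
    limbs = []
    while n:
        limbs.append(n % base)
        n //= base
    return limbs or [0]

def add_limbs(x, y, base=100):
    """Add two integers limb by limb, propagating carries."""
    a, b = to_limbs(x, base), to_limbs(y, base)
    out, carry = [], 0
    for i in range(max(len(a), len(b))):
        s = (a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0) + carry
        out.append(s % base)
        carry = s // base
    if carry:
        out.append(carry)
    # Reassemble the limbs into an ordinary integer.
    return sum(d * base**i for i, d in enumerate(out))

print(add_limbs(2115, 3135))  # 5250, matching the 2115+3135 example above
```

The per-limb work is cheap; the factor-of-hundreds slowdown for small
precisions comes from everything around it (allocation, dispatch, the
loss of pipelining) rather than the chunked additions themselves.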