On Tue, Sep 25, 2012 at 1:14 PM, William Stein <wst...@gmail.com> wrote:
> On Tue, Sep 25, 2012 at 1:10 PM, rickhg12hs <rickh...@gmail.com> wrote:
>> Is there a way to use R from Sage to speed things up? Is there something in
>> Sage that could be improved to approach R's speed?
>>
>> E.g., here's a toy demonstration of how slow R's summary is from Sage.
>>
>> sage: version()
>> 'Sage Version 5.3, Release Date: 2012-09-08'
>> sage: %timeit r.summary(range(1000))
>> 5 loops, best of 3: 17.5 s per loop
>
> I don't understand your questions above at all, but want to point out
> that in your example the time is completely dominated by converting
> the Python object "range(1000)" to R. Consider:
>
> sage: %timeit r.summary(range(1000))
> 5 loops, best of 3: 2.2 s per loop
> sage: s = r(range(1000))
> sage: %timeit r.summary(s)
> 125 loops, best of 3: 5.26 ms per loop
Here is something similar (but not identical) done in Sage (pure Python):
sage: v = range(1000)
sage: %timeit [min(v), median(v), mean(v), max(v)]
625 loops, best of 3: 121 µs per loop
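The Sage one-liner above skips the two quartiles that R's summary also reports. As a rough pure-Python sketch, assuming Python 3.8+ (the name `r_style_summary` is made up for illustration), all six numbers can be computed with the standard `statistics` module. Its `method="inclusive"` quantiles use the same linear interpolation as R's default quantile type 7:

```python
# Sketch: the six numbers R's summary() reports, computed in plain Python.
# Assumes Python 3.8+ for statistics.fmean and statistics.quantiles(method=...).
import statistics


def r_style_summary(values):
    """Return (min, 1st Qu., median, mean, 3rd Qu., max), like R's summary()."""
    data = list(values)
    # "inclusive" interpolates at p * (n - 1), matching R's default type 7.
    q1, med, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return (min(data), q1, med, statistics.fmean(data), q3, max(data))


print(r_style_summary(range(1000)))
# (0, 249.75, 499.5, 499.5, 749.25, 999)
```

This is meant to show what summary computes, not to be fast: `statistics.quantiles` sorts its input on every call, and older Python versions require at least two values.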
Here's something related, using Sage's TimeSeries class:
sage: v = stats.TimeSeries(range(1000))
sage: %timeit [v.min(), v.mean(), v.max()]
625 loops, best of 3: 3.67 µs per loop
I'm sure R's summary itself runs on the order of microseconds as well, and
that the 5.26 ms is almost all interface overhead. One can use the C
interface to R (rpy2) to get much better performance:
sage: import rpy2.robjects as robjects
sage: v = robjects.r(range(1000))
sage: summary = robjects.r['summary']
sage: %timeit summary(v)
625 loops, best of 3: 898 µs per loop
sage: print(summary(v))
Min. 1st Qu. Median Mean 3rd Qu. Max.
999 999 999 999 999 999
So with the C interface it takes 898 microseconds, which still seems
like a lot to me, given that I think I could write something in
Cython that does the same thing in about 10 microseconds.
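For a sense of what such a Cython routine would need to do, here is a plain-Python sketch of the algorithm. The name `six_number_summary` is illustrative, and matching R's type-7 quartiles is an assumption about which definition to reproduce. The approach is to sort once, read min and max off the ends, interpolate the quartiles by index, and accumulate the mean in a single loop. Porting it to Cython would mostly mean declaring `xs`, `total`, and the indices as C doubles and ints. No timing is claimed for this Python version.

```python
# Sketch of a summary routine structured the way one might write it before
# porting to Cython: one sort, then plain indexed arithmetic and one loop.
# The function name and the R type-7 quartile rule are illustrative choices.


def six_number_summary(values):
    """Return (min, 1st Qu., median, mean, 3rd Qu., max) like R's summary()."""
    xs = sorted(float(x) for x in values)  # the only O(n log n) step
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one value")

    def quantile(p):
        # R type 7: linear interpolation at 0-based position p * (n - 1).
        h = p * (n - 1)
        lo = int(h)              # floor, since h >= 0
        hi = min(lo + 1, n - 1)  # stay in bounds when n == 1 or p == 1
        return xs[lo] + (h - lo) * (xs[hi] - xs[lo])

    total = 0.0
    for x in xs:  # in a Cython port, x and total would be typed as double
        total += x

    return (xs[0], quantile(0.25), quantile(0.5), total / n,
            quantile(0.75), xs[-1])


print(six_number_summary(range(1000)))
# (0.0, 249.75, 499.5, 499.5, 749.25, 999.0)
```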
-- William