R interface performance

rickhg12hs

Sep 25, 2012, 4:10:01 PM
to sage-s...@googlegroups.com
Is there a way to use R from Sage to speed things up?  Is there something in Sage that could be improved to approach R's speed?

E.g., here's a toy demonstration of how slow R's summary is from Sage.

sage: version()
'Sage Version 5.3, Release Date: 2012-09-08'
sage: %timeit  r.summary(range(1000))
5 loops, best of 3: 17.5 s per loop


Regards.

William Stein

Sep 25, 2012, 4:14:42 PM
to sage-s...@googlegroups.com
I don't understand your questions above at all, but want to point out
that in your example the time is completely dominated by converting
the Python object "range(1000)" to R. Consider:

sage: %timeit r.summary(range(1000))
5 loops, best of 3: 2.2 s per loop
sage: s = r(range(1000))
sage: %timeit r.summary(s)
125 loops, best of 3: 5.26 ms per loop
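
The comparison above amounts to hoisting the conversion out of the timed call. Here is a minimal plain-Python sketch of that measurement pattern. Note that `to_r_source` and `fake_summary` are hypothetical stand-ins, not Sage or R functions, and the idea that the list travels to R as text is an assumption made only for illustration.

```python
import timeit

def to_r_source(values):
    """Hypothetical stand-in for the Python -> R conversion step:
    render the values as R source text, e.g. c(0,1,2)."""
    return "c(" + ",".join(str(v) for v in values) + ")"

def fake_summary(r_source):
    """Hypothetical stand-in for a cheap call on an already converted object."""
    return len(r_source)

data = list(range(1000))
LOOPS = 200

# Conversion inside the timed statement: every loop pays for it again.
t_both = min(timeit.repeat(lambda: fake_summary(to_r_source(data)),
                           number=LOOPS, repeat=3))

# Conversion hoisted out: only the call itself is timed.
converted = to_r_source(data)
t_call = min(timeit.repeat(lambda: fake_summary(converted),
                           number=LOOPS, repeat=3))

print("convert + call: %.1f us per loop" % (t_both / LOOPS * 1e6))
print("call only:      %.1f us per loop" % (t_call / LOOPS * 1e6))
```

The absolute numbers mean nothing here; the point is only that the two timings answer different questions.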

-- William


--
William Stein
Professor of Mathematics
University of Washington
http://wstein.org

William Stein

Sep 25, 2012, 4:21:20 PM
to sage-s...@googlegroups.com
On Tue, Sep 25, 2012 at 1:14 PM, William Stein <wst...@gmail.com> wrote:
> On Tue, Sep 25, 2012 at 1:10 PM, rickhg12hs <rickh...@gmail.com> wrote:
>> Is there a way to use R from Sage to speed things up? Is there something in
>> Sage that could be improved to approach R's speed?
>>
>> E.g., here's a toy demonstration of how slow R's summary is from Sage.
>>
>> sage: version()
>> 'Sage Version 5.3, Release Date: 2012-09-08'
>> sage: %timeit r.summary(range(1000))
>> 5 loops, best of 3: 17.5 s per loop
>
> I don't understand your questions above at all, but want to point out
> that in your example the time is completely dominated by converting
> the Python object "range(1000)" to R. Consider:
>
> sage: %timeit r.summary(range(1000))
> 5 loops, best of 3: 2.2 s per loop
> sage: s = r(range(1000))
> sage: %timeit r.summary(s)
> 125 loops, best of 3: 5.26 ms per loop

Here is doing something similar (but not identical) in Sage (pure Python):

sage: v = range(1000)
sage: %timeit [min(v), median(v), mean(v), max(v)]
625 loops, best of 3: 121 µs per loop

Here's something related:

sage: v = stats.TimeSeries(range(1000))
sage: %timeit [v.min(), v.mean(), v.max()]
625 loops, best of 3: 3.67 µs per loop

I'm sure R's summary is on the order of microseconds as well, and that
5.26 ms is almost all interface overhead. One can use the C interface
to R (rpy2) to get much better performance:

sage: import rpy2.robjects as robjects
sage: v = robjects.r(range(1000))
sage: summary = robjects.r['summary']
sage: %timeit summary(v)
625 loops, best of 3: 898 µs per loop
sage: print(summary(v))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    999     999     999     999     999     999


So with the C interface it is 898 microseconds, which to me still
seems like a lot, given that I feel like I could write something in
Cython that does the same thing in about 10 microseconds.

-- William

rickhg12hs

Sep 25, 2012, 4:37:58 PM
to sage-s...@googlegroups.com
You're right.  Converting the Python object to R does dominate.

I'm intrigued by your use of  rpy2.robjects and why the summary is surprising.

Regards.

kcrisman

Sep 25, 2012, 4:40:25 PM
to sage-s...@googlegroups.com

> So with the C interface it is 898 microseconds, which to me still
> seems like a lot, given that I feel like I could write something in
> Cython that does the same thing in about 10 microseconds.


Which is why things like Rcpp exist.

This is sort of like timeit in R:

> v <- c(1:1000)
> system.time(replicate(100,summary(v)))
   user  system elapsed 
  0.058   0.001   0.060 
> system.time(replicate(1000,summary(v)))
   user  system elapsed 
  0.565   0.002   0.567 
> system.time(replicate(1000000,summary(v)))
^C
Timing stopped at: 4.306 0.009 4.315 
> system.time(replicate(10000,summary(v)))
   user  system elapsed 
  5.655   0.003   5.658 

So about half a millisecond per call, even in R itself.
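
The per-call figure follows from dividing each elapsed time by its replication count. A short sketch of that arithmetic, using only the completed runs from the session above (the interrupted 1,000,000-replication run is left out; `per_call_ms` is just an illustrative helper name):

```python
# (replications, elapsed seconds) copied from the completed system.time runs above
runs = [(100, 0.060), (1000, 0.567), (10000, 5.658)]

def per_call_ms(replications, elapsed_s):
    """Average wall-clock time per summary() call, in milliseconds."""
    return elapsed_s / replications * 1000.0

for n, elapsed in runs:
    print("%6d calls: %.3f ms per call" % (n, per_call_ms(n, elapsed)))
```

All three runs land between 0.5 and 0.6 ms per call, so the cost scales linearly with the number of replications.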

rickhg12hs

Sep 25, 2012, 10:24:01 PM
to sage-s...@googlegroups.com

> sage: import rpy2.robjects as robjects
> sage: v = robjects.r(range(1000))
> sage: summary = robjects.r['summary']
> sage: %timeit summary(v)
> 625 loops, best of 3: 898 µs per loop
> sage: print(summary(v))
>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>     999     999     999     999     999     999

After changing v assignment to:

v = robjects.IntVector(range(1000))

... everything worked correctly and quickly (for my machine anyway).

sage: %timeit summary(v) 
125 loops, best of 3: 1.59 ms per loop
sage: print(summary(v))         
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    0.0   249.8   499.5   499.5   749.2   999.0 
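
As a cross-check that this second output is the right one for 0..999, the six numbers can be recomputed in plain Python. This is only a sketch: `quantile7` and `six_number_summary` are illustrative names, and the quartiles assume the linear-interpolation rule that R's quantile() uses by default (type 7).

```python
def quantile7(sorted_values, p):
    """Linear-interpolation quantile (assumed to match R's default, type 7)."""
    h = (len(sorted_values) - 1) * p
    lo = int(h)                                   # index just below position h
    hi = min(lo + 1, len(sorted_values) - 1)      # clamp at the last element
    return sorted_values[lo] + (h - lo) * (sorted_values[hi] - sorted_values[lo])

def six_number_summary(values):
    """Return (min, 1st quartile, median, mean, 3rd quartile, max)."""
    s = sorted(values)
    mean = sum(s) / float(len(s))
    return (float(s[0]), quantile7(s, 0.25), quantile7(s, 0.5),
            mean, quantile7(s, 0.75), float(s[-1]))

print(six_number_summary(range(1000)))
# (0.0, 249.75, 499.5, 499.5, 749.25, 999.0)
```

The quartiles 249.75 and 749.25 agree with the 249.8 and 749.2 printed above once the display rounding to four significant digits is taken into account, whereas the earlier all-999 output clearly does not describe this data.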


Thanks!
 