Does the R magic in ipython know how to take a pd.DataFrame as input?


Ariel Rokem

Sep 17, 2012, 7:41:51 PM
to pyd...@googlegroups.com
Hi everyone, 

I am creating a DataFrame object in one cell of an IPython notebook and would like to pass it into R for some statistical analysis (using the ezAnova library, or 'lm'). Is that supposed to work? That is, if I enter the following in an IPython cell:

%%R -i my_data_frame 

Should I then be able to just treat my_data_frame as an R dataframe? 

Thanks much, 

Ariel 

Wes McKinney

Sep 17, 2012, 7:46:43 PM
to pyd...@googlegroups.com

I don't believe it does at the moment. The %%R cell magic can send
vanilla NumPy arrays but not a pandas.DataFrame yet. The code to do
the transfer via rpy2 exists, but it hasn't been integrated into
IPython. I had hoped to have the time to do this by now, but I
haven't; I'm hopeful that some kind soul will take a look at it.

- Wes

Ariel Rokem

Sep 17, 2012, 11:40:53 PM
to pyd...@googlegroups.com
Hi Wes, 

OK - that makes sense. I get some things to appear on the R side, but not a proper data frame. My knowledge of R is pretty rudimentary, so I am afraid I can't be much help. 

Thanks! 
Ariel   

Fernando Perez

Sep 20, 2012, 8:13:29 PM
to pyd...@googlegroups.com
On Mon, Sep 17, 2012 at 8:40 PM, Ariel Rokem <aro...@gmail.com> wrote:
> OK - that makes sense. I get some things to appear on the R side, but not a
> proper data frame. My knowledge of R is pretty rudimentary, so I am afraid I
> can't be much help.

Don't forget the author of %%R is just down the street from you :) He
might be up to a little sprint on the matter if you ping him.

If not, we could try to make it happen in a couple of weeks, I suspect
I'll be paying you guys a visit soon.

f

Ariel Rokem

Sep 21, 2012, 11:15:54 PM
to pyd...@googlegroups.com
Hi Fernando, 

On Thu, Sep 20, 2012 at 5:13 PM, Fernando Perez <fpere...@gmail.com> wrote:
> On Mon, Sep 17, 2012 at 8:40 PM, Ariel Rokem <aro...@gmail.com> wrote:
>> OK - that makes sense. I get some things to appear on the R side, but not a
>> proper data frame. My knowledge of R is pretty rudimentary, so I am afraid I
>> can't be much help.
>
> Don't forget the author of %%R is just down the street from you :) He
> might be up to a little sprint on the matter if you ping him.

OK - following your prompting, I took another look at this. The truth is that we seem to be very close to having this working. Take a look here (cells 32 and onwards are changed relative to the example Jonathan wrote): 


The problem right now is that pandas sets the dtype for columns that are strings as 'object' and then R doesn't quite know what to do with that (or does something that I don't understand). Wes - is there a particular reason that pandas does that? 

I am actually pretty happy with the current state of affairs. By using this kind of hack, manually fixing the dtype of the string columns, I could make my own use case work beautifully. No more writing/reading to csv files...

But it would be nice to make this automatic. 
 
> If not, we could try to make it happen in a couple of weeks, I suspect
> I'll be paying you guys a visit soon.

If we don't find a solution for this by then, we can take a look together when you are around. It will be good to see you on the farm! :-) 

Cheers, 

Ariel 
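
[Editor's illustration] The notebook Ariel links to is not preserved in this archive, so the exact hack is unknown. The sketch below is only one plausible reading of "manually fixing the dtype for string columns": each non-numeric column is copied into a fixed-width NumPy string array before being handed on. The helper name `stringify_text_columns` and the sample column names are hypothetical; they do not come from pandas, IPython, or rpy2.

```python
import numpy as np
import pandas as pd


def stringify_text_columns(frame):
    """Return {column name: NumPy array}, with text columns as fixed-width strings.

    Hypothetical helper for illustration only; not part of any library.
    """
    converted = {}
    for name in frame.columns:
        column = frame[name]
        if pd.api.types.is_numeric_dtype(column):
            # Numeric columns already map onto plain NumPy dtypes.
            converted[name] = column.to_numpy()
        else:
            # Build a fixed-width unicode array (dtype kind "U") from the values.
            converted[name] = np.array(column.tolist(), dtype=str)
    return converted


# Made-up sample data: one text column, one numeric column.
frame = pd.DataFrame({"subject": ["s1", "s2", "s3"], "rt": [0.41, 0.38, 0.52]})
arrays = stringify_text_columns(frame)
for name, values in arrays.items():
    print(name, values.dtype)
```

Note that a conversion like this turns a missing value into ordinary text (for example the string "nan"), so it is only safe for columns without missing entries.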

Ariel Rokem

Sep 27, 2012, 8:00:46 PM
to pyd...@googlegroups.com
Hi everyone, 

Following up on this: 


Ariel Rokem

Oct 18, 2012, 4:41:34 PM
to pyd...@googlegroups.com
Hi Wes and all, 

We have written something like that (see the PR I linked to in an earlier email in this thread). In the discussion that ensued, someone suggested that the conversion of DataFrames belongs upstream and not in IPython itself. I am wondering whether it should actually go into pandas itself. The main implementation issue (and the reason we need to do anything at all to convert a pandas DataFrame to an R data frame) seems to be that strings are represented in the DataFrame with the 'object' dtype. Is there any particular reason for that, or could that be changed in pandas? I haven't really dug much into the pandas code base. Could you point me in the general direction of where I should look to try this out? 

Thanks! 

Wes McKinney

Oct 27, 2012, 11:15:54 AM
to pyd...@googlegroups.com

One main reason is to be able to support NA values in string columns;
another is to make working with variable-length strings more flexible
(the fixed-width string data type can be very restrictive). I have
some ideas about fixing this (that will also make R<->Python easier).
R implements string vectors with heap strings in a global hash table
and a special value for NAs. This actually isn't inherently very
different from what pandas does, except that there are all the
little PyObject boxes.

- Wes
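
[Editor's illustration] The two limitations Wes mentions can be seen with plain NumPy arrays. This is a minimal sketch with made-up values, not code from pandas: a fixed-width string array silently truncates anything longer than its width and has no slot for a missing value, while an object array holds references to arbitrary Python objects.

```python
import numpy as np

# Fixed-width unicode array: every element is stored in 3 characters.
fixed = np.array(["ab", "cde"])
fixed[0] = "abcdef"  # silently truncated to fit the 3-character width
print(fixed.dtype, fixed[0])

# Object array: each element is a reference to a Python object, so strings
# of any length and a missing-value marker (None here) can sit side by side.
flexible = np.array(["ab", "cde", None], dtype=object)
flexible[0] = "abcdef"  # stored in full
print(flexible.dtype, flexible[0], flexible[2])
```

The price of the object array is the per-element Python object, which is what the "little PyObject boxes" remark refers to.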

Ariel Rokem

Nov 1, 2012, 4:14:56 PM
to pyd...@googlegroups.com
Hi Wes,
OK - thanks. Most of that went over my head, but it sounds like
changing the 'object' designation of the strings is a non-starter.
Also sounds like you have a way forward. Is there anything I can do to
help?

Ariel