Are there reasons why rbind.fill would fail to grab unique columns from one dataset?


Eric Green

Nov 21, 2012, 9:41:50 AM
to manip...@googlegroups.com
Hi, 

Thanks for this great resource. I've solved many problems with your collective tips! 

Here is one I can't seem to figure out. I have several datasets with common and unique columns. I successfully combined df.A and df.B with rbind.fill (adds unique columns from df.A and df.B to df.AB). I then combined df.AB and df.C with the same function.

When I try to combine df.ABC with df.D, several columns are missing. If I change the order [e.g., from rbind.fill(df.ABC, df.D) to rbind.fill(df.D, df.ABC)], I get a different set of missing columns. Are there reasons why rbind.fill would fail to grab unique columns from one dataset?
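For context, here is a minimal sketch of the behaviour described above, using two toy data frames (the names `df.A` and `df.B` come from the post, but the columns and values are made up for illustration). It assumes the plyr package is installed. Columns that exist in only one input should be kept and padded with `NA`.

```r
library(plyr)

# Toy stand-ins for the real datasets (hypothetical columns and values)
df.A <- data.frame(id = 1:2, a = c("x", "y"))
df.B <- data.frame(id = 3:4, b = c(TRUE, FALSE))

# Stack the rows; columns missing from one input are filled with NA
df.AB <- rbind.fill(df.A, df.B)

names(df.AB)  # union of the column names from both inputs
df.AB         # rows from df.A have NA in column b, and vice versa
```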

Brandon Hurr

Nov 21, 2012, 9:59:44 AM
to Eric Green, manip...@googlegroups.com
Do you have a reproducible example? 

Also, what options are you using in rbind.fill?
--
You received this message because you are subscribed to the Google Groups "manipulatr" group.
To view this discussion on the web visit https://groups.google.com/d/msg/manipulatr/-/m2sosKJyKeIJ.
To post to this group, send email to manip...@googlegroups.com.
To unsubscribe from this group, send email to manipulatr+...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/manipulatr?hl=en.

Eric Green

Nov 21, 2012, 10:40:03 AM
to manip...@googlegroups.com, Eric Green
Hi, 

This generic example I made from my code works perfectly, so I think the error is on my side, in the data frames I am trying to bind. I just can't detect any problems. In my last step [the equivalent of ABCD <- rbind.fill(ABC,D)], I get a combined dataset with 106 columns whether I say rbind.fill(ABC,D) or rbind.fill(D,ABC), but the order determines which columns are dropped.

I don't think my example will be all that helpful since it works, but you can get a better sense of my approach. Again, my problem is that the last step fails to bind all columns. I am not using any options with rbind.fill.

I am curious to know if there are known reasons why rbind.fill would fail to bind unique columns.



library(plyr)

# Replace YOURPATH with the folder that contains the data directory
path <- file.path("YOURPATH", "data")

A <- read.csv(file.path(path, "A.csv")) # has columns A and B
B <- read.csv(file.path(path, "B.csv")) # has columns B and C
C <- read.csv(file.path(path, "C.csv")) # has columns C and D
D <- read.csv(file.path(path, "D.csv")) # has columns D and E

# bind step by step; each result should hold the union of the input columns
AB <- rbind.fill(A, B)
ABC <- rbind.fill(AB, C)
ABCD <- rbind.fill(ABC, D)

sessionInfo()

# R version 2.14.1 (2011-12-22)
# Platform: i686-pc-linux-gnu (32-bit)
# attached base packages:
#   [1] grid      stats     graphics  grDevices utils     datasets  methods   base     
# other attached packages:
#   [1] GFusionTables_1.0 gregmisc_2.1.2    gplots_2.11.0     KernSmooth_2.23-7 caTools_1.13      gtools_2.7.0      gmodels_2.15.3    gdata_2.11.0     
# [9] plyr_1.7.1        car_2.0-12        nnet_7.3-1        MASS_7.3-16       lubridate_1.1.0   RCurl_1.91-1      bitops_1.0-4.1    sendmailR_1.1-1  
# [17] base64_1.1       
# loaded via a namespace (and not attached):
#   [1] stringr_0.6  tools_2.14.1
D.csv
C.csv
B.csv
A.csv
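One way to check whether the last step really drops columns is to compare the union of the input column names with the names of the result. This is only a sketch: the small data frames below are hypothetical stand-ins for `ABC` and `D` (the real ones come from the CSV files), and `expected` and `missing_cols` are names introduced here for illustration.

```r
library(plyr)

# Hypothetical small inputs standing in for the real ABC and D
ABC <- data.frame(A = 1, B = 2, C = 3, D = 4)
D   <- data.frame(D = 5, E = 6)

ABCD <- rbind.fill(ABC, D)

# Every column name from either input should appear in the result
expected     <- union(names(ABC), names(D))
missing_cols <- setdiff(expected, names(ABCD))

missing_cols                    # empty if nothing was dropped
ncol(ABCD) == length(expected)  # column count matches the union
```

If `missing_cols` is empty, the columns are in the data frame and the problem lies elsewhere, for example in how the result is being displayed or subset.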

Hadley Wickham

Nov 21, 2012, 5:00:45 PM
to Eric Green, manip...@googlegroups.com
Works fine for me - I may have fixed it in the development version.
You can try it out with:

install.packages("devtools")
library(devtools)
# current devtools expects the "user/repo" form
install_github("hadley/plyr")



--
RStudio / Rice University
http://had.co.nz/

Eric Green

Nov 22, 2012, 8:08:04 AM
to manip...@googlegroups.com, Eric Green
Thanks for looking into this. It turns out that RStudio was not showing all of the columns in my dataframe. So when I got an error trying to use a vector of columns that should have existed in the combined dataframe, I viewed the dataframe in RStudio and noticed that I was "missing" several variables at the end of the dataframe. At this point it looked to me like rbind.fill was not working for the last step. I thought there was something about my variables that could be causing them to be dropped.

@Hadley, your note suggested to me that I must be doing something wrong. I exported the mega dataset to csv, confirmed that the "missing" variables were indeed there, and then searched my vector of column names until I found a problematic name that had nothing to do with rbind.fill. I was not really "missing" any variables; RStudio just was not showing them, which led me to think rbind.fill was failing.
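The search for a problematic name can be done in one step with `setdiff()`. The sketch below uses base R only; the data frame `combined`, the vector `wanted`, and the misspelled name are all hypothetical and stand in for the real combined dataset and column vector.

```r
# Hypothetical combined data frame and a vector of column names to select
combined <- data.frame(id = 1:3,
                       score_pre = c(1, 2, 3),
                       score_post = c(2, 3, 4))
wanted <- c("id", "score_pre", "score_psot")  # deliberate typo in the last name

# Names that were requested but do not exist in the data frame
bad_names <- setdiff(wanted, names(combined))
bad_names

# Subset using only the names that are actually present
ok <- intersect(wanted, names(combined))
subset_df <- combined[, ok, drop = FALSE]
```

Checking `bad_names` before subsetting separates a misspelled or stale name from a column that is genuinely absent.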

I don't know if there is a way to automatically view all columns in RStudio. This ticket from 2011 suggests not.
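Independent of what the RStudio viewer displays, the console can confirm which columns exist. A minimal base R sketch follows, where `wide` is a hypothetical data frame with 150 columns standing in for the combined dataset:

```r
# Hypothetical wide data frame with 150 columns (named V1 ... V150)
wide <- as.data.frame(matrix(0, nrow = 2, ncol = 150))

ncol(wide)                        # total number of columns
tail(names(wide), 5)              # names of the last few columns
str(wide, list.len = ncol(wide))  # raise the display limit so every column is listed
```

By default `str()` truncates long lists, so passing `list.len` explicitly avoids the same "missing columns" impression at the console.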

Thanks again.

Hadley Wickham

Nov 22, 2012, 8:28:28 AM
to Eric Green, manip...@googlegroups.com
> I don't know if there is a way to automatically view all columns in RStudio.
> This ticket from 2011 suggests not.

I'd recommend filing another ticket - the more people who report a
problem, the higher it moves up the priority list.

Hadley

Eric Green

Nov 22, 2012, 8:55:18 AM
to manip...@googlegroups.com, Eric Green