data.frame with columns of lists

Michael

Aug 10, 2015, 7:15:14 PM
to Davis R Users' Group

Hi all,

A few times recently, I’ve converted a matrix from sapply to a dataframe, and ended up with a dataframe where each column is a vector of lists. Why is this happening? Here is my most recent example:

> library(ggmap)
> workshops = paste(c("Yolo", "Merced", "Marin", "Tulare", "San Diego", "Siskiyou", "Imperial"), "County")
> wsLocs = sapply(workshops, geocode)
> locs = as.data.frame(t(wsLocs))
> str(locs)
'data.frame':    7 obs. of  2 variables:
 $ lon:List of 7
  ..$ Yolo County     : num -122
  ..$ Merced County   : num -121
  ..$ Marin County    : num -123
  ..$ Tulare County   : num -119
  ..$ San Diego County: num -117
  ..$ Siskiyou County : num -123
  ..$ Imperial County : num -115
 $ lat:List of 7
  ..$ Yolo County     : num 38.8
  ..$ Merced County   : num 37.2
  ..$ Marin County    : num 38.1
  ..$ Tulare County   : num 36.1
  ..$ San Diego County: num 32.7
  ..$ Siskiyou County : num 41.8
  ..$ Imperial County : num 33

Vince S. Buffalo

Aug 10, 2015, 7:32:19 PM
to davi...@googlegroups.com
Hi Michael,

Dataframes are just R lists with class "data.frame", so there's no restriction* on what these lists can contain (for better or worse).

This can lead to weird ragged arrays in dataframes; I recall seeing the example below somewhere. Normally as.data.frame() or data.frame() tries to split such objects into separate dataframe columns, but I() protects them from that (see the help page):

> d <- data.frame(x=1:4, y=I(list(1, 1:2, 1:3, 1:4)))
> d
  x          y
1 1          1
2 2       1, 2
3 3    1, 2, 3
4 4 1, 2, 3, 4
> d$y
[[1]]
[1] 1

[[2]]
[1] 1 2

[[3]]
[1] 1 2 3

[[4]]
[1] 1 2 3 4

Or oddly, matrices too:

> d <- data.frame(x=1:5, y=I(matrix(rnorm(10), ncol=2)))
> d
  x          y.1          y.2
1 1 -0.94178.... 0.350709....
2 2 1.849225.... -0.32549....
3 3 -1.22864.... 0.215633....
4 4 -0.14632.... -1.15937....
5 5 -0.37591.... 1.126598....
> colnames(d) # *only two columns!*
[1] "x" "y"
> d$y[,1]
[1] -0.9417866  1.8492254 -1.2286453 -0.1463300 -0.3759147

R even lets you put a list containing a function in a dataframe, but the print method fails on it:

> d <- data.frame(x=1, y=I(list(function(x) pi*x)))
> d
Error in paste(x, collapse = ", ") :
  cannot coerce type 'closure' to vector of type 'character'

But by some miracle, calling the stored function still works:

> d$y[[1]](2)
[1] 6.283185

Ok, enough oddities... here's what I would recommend:

> wslocs <- lapply(workshops, geocode)  # a list of one-row dataframes
> d <- do.call(rbind, wslocs)
> d
        lon      lat
1 -121.9018 38.76460
2 -120.7120 37.20098
3 -122.7633 38.08340
4 -118.8597 36.13417
5 -117.1611 32.71573
6 -122.5770 41.77433
7 -115.4734 33.01137
> class(d)
[1] "data.frame"

Then you can tack on other columns (e.g. the county names). I do this crufty stuff in local() to keep things clean.

I love do.call() with rbind() and lapply(). It's a go-to idiom every R user should master (Hadley and others call it the split-apply-combine pattern, and there's a section on it in my book's R chapter). do.call() passes the elements of a list to rbind() as separate arguments, so the whole list is bound in one call.
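
For readers without ggmap or network access, here is a minimal offline sketch of the same idiom. fake_geocode() is a hypothetical stand-in for geocode() that returns a one-row dataframe; its coordinates are made up for illustration.

```r
# Hypothetical stand-in for ggmap::geocode(): returns a one-row dataframe.
# The coordinates are invented so the example runs offline.
fake_geocode <- function(place) {
  data.frame(lon = -120 - nchar(place) / 10, lat = 35 + nchar(place) / 10)
}

places <- c("Yolo County", "Marin County", "Tulare County")

# lapply() keeps each result as its own dataframe in a list;
# do.call(rbind, ...) then stacks them row-wise into one dataframe.
pieces <- lapply(places, fake_geocode)
coords <- do.call(rbind, pieces)

# Tack the names on afterwards as an ordinary column.
coords$county <- places

str(coords)  # lon and lat are plain numeric columns, not lists
```

Because rbind() receives whole dataframes, the column types are preserved and no list columns appear.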

Anyways, hope this helps.

Vince

*I think.





--
Vince Buffalo
@vsbuffalo :: vincebuffalo.com
Coop Lab :: Population Biology Graduate Group
University of California, Davis

Michael Levy

Aug 10, 2015, 7:48:41 PM
to davi...@googlegroups.com

Thanks, Vince. do.call(rbind, lapply()) is a fine approach, but I’m still not sure why this is happening. wsLocs looks like an ordinary numeric matrix; why does as.data.frame make a list of lists rather than a list of numeric vectors?


-- 
Michael Levy
c: 304-376-4523

Vince S. Buffalo

Aug 10, 2015, 7:57:18 PM
to davi...@googlegroups.com
That's because your geocode() function returns a dataframe:

> tmp <- geocode('Yolo')
> class(tmp)
[1] "data.frame"

when you transpose it:

> str(t(tmp))
 num [1:2, 1] -121.8 38.7
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:2] "lon" "lat"
  ..$ : NULL
> class(t(tmp))
[1] "matrix"

sapply() over several such dataframes, though, makes a matrix of lists (so it's not an ordinary matrix), which is also possible (see below):

> matrix(list(list(2), list(1)), nrow=2)
     [,1]
[1,] List,1
[2,] List,1

I'd say t() and coercion are to blame here. This is why rbinding lists is better than transposing.
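
The behaviour can be reproduced offline with a small sketch. fake_geocode() is a hypothetical stand-in for geocode() with made-up coordinates; the only assumption is that, like geocode(), it returns a one-row dataframe.

```r
# Hypothetical stand-in for geocode(): a one-row dataframe with fixed values.
fake_geocode <- function(place) data.frame(lon = -120, lat = 38)

places <- c("A", "B", "C")

# Each result is a dataframe (a list of two columns), so sapply()
# simplifies to a 2 x 3 matrix whose cells are list elements.
m <- sapply(places, fake_geocode)
dim(m)         # 2 rows (lon, lat) by 3 columns
typeof(m)      # "list", even though each cell holds a number
class(m[[1, 1]])  # "numeric", which is what makes the matrix look ordinary

# Transposing keeps the list mode, and as.data.frame() turns
# each column of a list-matrix into a list column.
locs <- as.data.frame(t(m))
sapply(locs, is.list)
```

Checking typeof() or is.list() on the matrix itself, rather than on its elements, is what reveals the problem.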

HTH,
V

Michael Levy

Aug 10, 2015, 8:06:34 PM
to davi...@googlegroups.com

I see, thanks. I’m not in the habit of using str on matrices… I checked its class (matrix) and the class of each element in it (numeric) but missed that each element was a list. Also to blame: the geocode function is vectorized, so I shouldn’t have been using sapply on it in the first place. Thanks for the explanations,
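
As a rough sketch of how such list columns might be spotted and repaired after the fact, assuming every cell holds exactly one number (the values below are illustrative, not real geocoding results):

```r
# Build a small dataframe with list columns, using I() as earlier in the thread.
locs <- data.frame(
  lon = I(list(-121.9, -122.8)),
  lat = I(list(38.8, 38.1)),
  row.names = c("Yolo County", "Marin County")
)

# Flag the columns that are lists rather than atomic vectors.
is_list_col <- vapply(locs, is.list, logical(1))
is_list_col

# With exactly one value per cell, unlist() flattens each column
# into an ordinary numeric vector of the same length.
locs[is_list_col] <- lapply(locs[is_list_col], unlist)
str(locs)
```

If some cells held more than one value, unlist() would change the column length and the assignment would fail, so this only suits the one-value-per-cell case.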

Michael




Jaime Ashander

Aug 11, 2015, 1:56:54 AM
to davi...@googlegroups.com
I recently came across this SO answer on "Why is `vapply` safer than `sapply`?" that might add something here:

http://stackoverflow.com/questions/12339650/why-is-vapply-safer-than-sapply
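
A short sketch of how vapply() would have flagged the issue in this thread. fake_geocode() is again a hypothetical offline stand-in for geocode() with made-up coordinates.

```r
# Hypothetical stand-in for geocode(): a one-row dataframe with fixed values.
fake_geocode <- function(place) data.frame(lon = -120, lat = 38)

places <- c("A", "B", "C")

# vapply() checks every result against the template numeric(2).
# A dataframe is a list, not a double vector, so this call errors
# instead of silently building a matrix of lists.
res <- tryCatch(
  vapply(places, fake_geocode, numeric(2)),
  error = function(e) "error"
)
res

# Converting each result to a numeric vector first satisfies the template
# and yields a true numeric matrix.
m <- vapply(places, function(p) unlist(fake_geocode(p)), numeric(2))
typeof(m)  # "double"

locs <- as.data.frame(t(m))
str(locs)  # plain numeric columns
```

The declared template makes the expected type explicit, so a mismatch fails at the call site rather than surfacing later in str().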


- Jaime