Variables in One Column, Values in Another


Joseph Flanagan

Apr 18, 2015, 1:00:47 AM
to manip...@googlegroups.com
As a result of scraping a web page, I wound up with a data frame where the variables are in one column and their accompanying values are in another. A simple version looks like this:

    df <- data.frame(is_employed = c("Hobbies", "Has Previous Experience"), false = c("squash", "false"))

To get it into the form I wanted, I first converted it to a matrix and then back to a data frame, like this:

    mat <- as.matrix(df)
    mat <- rbind(colnames(mat), mat)
    colnames(mat) <- c("variable", "value")

    df2 <- as.data.frame(mat)

I can live with that, although I was wondering if there is a better method. Anyways, the problem occurs when I want to spread() the data. If I just call spread() on df2, it doesn't produce the desired result. Instead, I have to add a dummy column and then delete it at the end, like so: 
 
    library("dplyr")
    library("tidyr")

    df3 <- df2 %>%
       mutate(n = 1) %>%
       spread(variable, value) %>%
       select(-n)

Again, I can live with it, but I was wondering whether I missed something. Is it possible to call spread() on a two-column data frame?
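
For the first step (moving the header row back into the data), a base R sketch that skips the matrix round trip might look like the following. The name `header_row` is only illustrative, and `stringsAsFactors = FALSE` is set explicitly on the assumption that the values should stay as character strings.

```r
# Scraped table whose first record was mistaken for the header row
df <- data.frame(
  is_employed = c("Hobbies", "Has Previous Experience"),
  false = c("squash", "false"),
  stringsAsFactors = FALSE
)

# Turn the column names back into a one-row data frame
header_row <- data.frame(
  variable = names(df)[1],
  value = names(df)[2],
  stringsAsFactors = FALSE
)

# Rename the remaining rows to match, then stack the header row on top
df2 <- rbind(header_row, setNames(df, c("variable", "value")))
```

This keeps everything as a data frame, so the columns are never coerced through a character matrix.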


Hadley Wickham

Apr 20, 2015, 5:37:45 PM
to Joseph Flanagan, manipulatr
You should be able to use header = FALSE somewhere to get

df <- data.frame(
  X1 = c("Is employed", "Hobbies", "Has Previous Experience"),
  X2 = c("false", "squash", "false")
)

Then it's a fairly simple application of spread from tidyr:

tidyr::spread(df, X1, X2)

Unfortunately there's a small bug for this case, so you need to add a
dummy id variable:

df$id <- 1
tidyr::spread(df, X1, X2)
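
When the table holds only a single record, as it does here, a base R sketch that sidesteps spread() and the dummy id might look like this. It assumes every key in X1 is unique and that keeping all values as character strings is acceptable; `wide` is just an illustrative name.

```r
# Key/value table as read with header = FALSE
df <- data.frame(
  X1 = c("Is employed", "Hobbies", "Has Previous Experience"),
  X2 = c("false", "squash", "false"),
  stringsAsFactors = FALSE
)

# Transpose the value column into a one-row data frame
wide <- as.data.frame(t(df$X2), stringsAsFactors = FALSE)

# Use the key column as the column names
names(wide) <- df$X1
```

Unlike the tidyr route, this does not generalize to several records per table, so it is only a shortcut for the one-row case.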

Hadley



--
http://had.co.nz/

Joseph Flanagan

Apr 21, 2015, 1:21:37 AM
to manip...@googlegroups.com, jflan...@gmail.com
Thanks. I'm not sure whether the header = FALSE option can work here, since the table is in a list with 7 other tables that do have headers. Eventually, these need to be combined into one large table, so I could probably just join the table earlier rather than creating a dummy ID variable. It's good to know that I was correct in needing a dummy ID variable for `spread()`. I thought I was doing something wrong.

Thanks again,
Joe