Spread Multiple Columns

5,129 views
Skip to first unread message

Brandon Hurr

unread,
Oct 30, 2014, 3:58:42 PM10/30/14
to manipulatr
I'm trying to spread/cast multiple columns using tidyr. 

If I have a dataset like this:

asdf <- data.frame(Col1 = rep(letters[1:5], each=2), Col2 = rep(c("Day1", "Day7")), Col3 = runif(10), Col4 = runif(10), Col5 = runif(10))

#I want to end up with a final table that has names() like this

#Col1, Day1Col3, Day7Col3, Day1Col4, Day7Col4, Day1Col5, Day7Col5

#I can sort of do it over a single column, but it looks weird

spread(asdf, Col2, Col3)

#I'm looking for something like:
spread(asdf, Col2, 3:5) #doesn't work
#Error: Invalid column specification

Thanks,

Brandon

Brandon Hurr

unread,
Oct 30, 2014, 4:40:21 PM10/30/14
to manipulatr
I more or less got what I wanted by filtering and then joining. It'd be nice to be able to append names to the ends of the columns that have matching names though. x and y aren't descriptive enough. 

Day1 <- 
asdf %>%
filter(Col2 == "Day1") %>%
select(-Col2) 

Day7 <-
asdf %>%
filter(Col2 == "Day7") %>%
select(-Col2) 

asdfWide <-
inner_join(Day1, Day7, by="Col1")

Hadley Wickham

unread,
Oct 30, 2014, 4:51:31 PM10/30/14
to Brandon Hurr, manipulatr
You need to create a variable to spread by:

asdf %>%
gather(key, val, Col3:Col5) %>%
unite(key2, Col2, key, sep = "") %>%
spread(key2, val)

Hadley
> --
> You received this message because you are subscribed to the Google Groups
> "manipulatr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to manipulatr+...@googlegroups.com.
> To post to this group, send email to manip...@googlegroups.com.
> Visit this group at http://groups.google.com/group/manipulatr.
> For more options, visit https://groups.google.com/d/optout.



--
http://had.co.nz/

Brandon Hurr

unread,
Oct 30, 2014, 5:36:02 PM10/30/14
to Hadley Wickham, manipulatr
Indeed. I saw an answer here, but just didn't get it. 


I see that you have to gather it all up and then spread it, whilst creating a new column with a combined name that will become the new column header. I guess I thought since I had my data in "long"-ish form already I didn't need to melt/gather it more. :/ 

Thanks, 
B

 

Joshua Adams

unread,
Jul 9, 2017, 6:10:34 PM7/9/17
to manipulatr, brando...@gmail.com
I have been working on this type of question for several hours now. I am relatively new to data science. I am unable to figure this out even with the answer below. Would you be kind enough to walk through an example spreading multiple columns across a single key? Preferably adding the name of the spread column to the key. I have done the spread() exercises from your book "R for Data Science" with gather() but I cannot figure out this solution.

example current data frame columns:
  • Week (1, 2, 3, ...)
  • (key) product (UPCA, UPCB, UPCB, ....)
  • weekly sales $
  • weekly units
  • weekly features 

desired output spreading columns:
  • Week (1, 2, 3, ....)
  • Weekly Sales UPCA in $
  • Weekly Sales UPCB in $
  • Weekly Sales UPCC in $
  • ....
  • Weekly Units UPCA
  • Weekly Units UPCB
  • Weekly Units UPCC
  • ....
  • Weekly Features UPCA
  • ....

Thank you,

Brandon Hurr

unread,
Jul 9, 2017, 6:47:23 PM7/9/17
to Joshua Adams, manipulatr
Josh,

Could you provide a fake dataset or a small chunk of your working dataset using dput()

I can try and look at it later if you've not figured it out.

B

Joshua Adams

unread,
Jul 9, 2017, 7:20:22 PM7/9/17
to manipulatr, ada...@gmail.com
Sure, see below sample current data set. I would like to spread() this data set out with  ITM_SKU_CD as key, but across all of the values (weekly dollars, weekly units, etc....). I can do with spread() with one key and one value, but I cannot figure out how to do it with one key across multiple values. 

For example, I would like to see columns weekly_sales_xxxx, weekly_sales_zzzz...., weekly_units_xxxx, weekly_units_zzzz....

WEEK_NAME
ITM_SKU_CD ITM_SKU_DESC weekly_dollars weekly_units weekly_features weekly_displays weekly_distribution size_in_oz avg_unit_price avg_price_per_oz
2015 PD 06 WK 2 (22)
xxxx
product xxx
2502.39
600
0
2
99
12
4.17065
0.3475542
2015 PD 06 WK 3 (23)
xxxx
product xxx
2713.8
650
0
0
100
12
4.175077
0.3479231
2015 PD 06 WK 4 (24)
zzzz
product zzz
2392.6
570
0
1
100
12
4.197544
0.3497953
2015 PD 06 WK 2 (22)
zzzz
product zzz
1516.59
314
0
18
86
16
4.829904
0.301869
2015 PD 06 WK 3 (23)
zzzz
product zzz
1665.97
359
0
9
90
16
4.640585
0.2900366
2015 PD 06 WK 4 (24)
yyyy
product yyy
1256.29
237
0
3
92
16
5.300802
0.3313001
2015 PD 06 WK 2 (22)
yyyy
product yyy
5006.76
1012
0
25
100
16
4.947391
0.309212
2015 PD 06 WK 3 (23)
yyyy
product yyy
5319.57
1066
0
10
101
16
4.990216
0.3118885

Joshua Adams

unread,
Jul 9, 2017, 7:53:57 PM7/9/17
to manipulatr, ada...@gmail.com
Brandon, 

This link seems to have to answer I'm looking for but I can't get a good understanding of how to use the syntax for my data set. I'd like to use the tidyverse package if possible. 


On Sunday, July 9, 2017 at 6:47:23 PM UTC-4, Brandon Hurr wrote:

Brandon Hurr

unread,
Jul 10, 2017, 10:53:40 AM7/10/17
to Joshua Adams, manipulatr
Josh, 

Does this do what you want? 

library(tidyverse)

df1 <- read_tsv("~/Desktop/tidyrhelp.tsv")

df1 %>%
gather(thing, value, -WEEK_NAME, -ITM_SKU_CD) %>%
unite(united, ITM_SKU_CD, thing) %>%
spread(united, value)

HTH,
B


To unsubscribe from this group and stop receiving emails from it, send an email to manipulatr+unsubscribe@googlegroups.com.

To post to this group, send email to manip...@googlegroups.com.
tidyrhelp.tsv
Reply all
Reply to author
Forward
0 new messages