Spread Multiple Columns

Brandon Hurr

unread,

Oct 30, 2014, 3:58:42 PM10/30/14

to manipulatr

I'm trying to spread/cast multiple columns using tidyr.

If I have a dataset like this:

asdf <- data.frame(Col1 = rep(letters[1:5], each=2), Col2 = rep(c("Day1", "Day7")), Col3 = runif(10), Col4 = runif(10), Col5 = runif(10))

#I want to end up with a final table that has names() like this

#Col1, Day1Col3, Day7Col3, Day1Col4, Day7Col4, Day1Col5, Day7Col5

#I can sort of do it over a single column, but it looks weird

spread(asdf, Col2, Col3)

#I'm looking for something like:

spread(asdf, Col2, 3:5) #doesn't work

#Error: Invalid column specification

Thanks,

Brandon

Brandon Hurr

unread,

Oct 30, 2014, 4:40:21 PM10/30/14

to manipulatr

I more or less got what I wanted by filtering and then joining. It'd be nice to be able to append names to the ends of the columns that have matching names though. x and y aren't descriptive enough.

Day1 <-

asdf %>%

filter(Col2 == "Day1") %>%

select(-Col2)

Day7 <-

asdf %>%

filter(Col2 == "Day7") %>%

select(-Col2)

asdfWide <-

inner_join(Day1, Day7, by="Col1")

Hadley Wickham

unread,

Oct 30, 2014, 4:51:31 PM10/30/14

to Brandon Hurr, manipulatr

You need to create a variable to spread by:

asdf %>%
gather(key, val, Col3:Col5) %>%
unite(key2, Col2, key, sep = "") %>%
spread(key2, val)

Hadley

> --
> You received this message because you are subscribed to the Google Groups
> "manipulatr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to manipulatr+...@googlegroups.com.
> To post to this group, send email to manip...@googlegroups.com.
> Visit this group at http://groups.google.com/group/manipulatr.
> For more options, visit https://groups.google.com/d/optout.

--
http://had.co.nz/

Brandon Hurr

unread,

Oct 30, 2014, 5:36:02 PM10/30/14

to Hadley Wickham, manipulatr

Indeed. I saw an answer here, but just didn't get it.

https://stackoverflow.com/questions/24929954/is-it-possible-to-use-spread-on-multiple-columns-in-tidyr-similar-to-dcast

I see that you have to gather it all up and then spread it, whilst creating a new column with a combined name that will become the new column header. I guess I thought since I had my data in "long"-ish form already I didn't need to melt/gather it more. :/

Thanks,

B

Joshua Adams

unread,

Jul 9, 2017, 6:10:34 PM7/9/17

to manipulatr, brando...@gmail.com

I have been working on this type of question for several hours now. I am relatively new to data science. I am unable to figure this out even with the answer below. Would you be kind enough to walk through an example spreading multiple columns across a single key? Preferably adding the name of the spread column to the key. I have done the spread() exercises from your book "R for Data Science" with gather() but I cannot figure out this solution.

example current data frame columns:

Week (1, 2, 3, ...)
(key) product (UPCA, UPCB, UPCB, ....)
weekly sales $
weekly units
weekly features

desired output spreading columns:

Week (1, 2, 3, ....)
Weekly Sales UPCA in $
Weekly Sales UPCB in $
Weekly Sales UPCC in $
....
Weekly Units UPCA
Weekly Units UPCB
Weekly Units UPCC
....
Weekly Features UPCA
....

Thank you,

Brandon Hurr

unread,

Jul 9, 2017, 6:47:23 PM7/9/17

to Joshua Adams, manipulatr

Josh,

Could you provide a fake dataset or a small chunk of your working dataset using dput()

I can try and look at it later if you've not figured it out.

B

Joshua Adams

unread,

Jul 9, 2017, 7:20:22 PM7/9/17

to manipulatr, ada...@gmail.com

Sure, see below sample current data set. I would like to spread() this data set out with ITM_SKU_CD as key, but across all of the values (weekly dollars, weekly units, etc....). I can do with spread() with one key and one value, but I cannot figure out how to do it with one key across multiple values.

For example, I would like to see columns weekly_sales_xxxx, weekly_sales_zzzz...., weekly_units_xxxx, weekly_units_zzzz....

WEEK_NAME	ITM_SKU_CD	ITM_SKU_DESC	weekly_dollars	weekly_units	weekly_features	weekly_displays	weekly_distribution	size_in_oz	avg_unit_price	avg_price_per_oz

2015 PD 06 WK 2 (22)	xxxx	product xxx	2502.39	600	0	2	99	12	4.17065	0.3475542
2015 PD 06 WK 3 (23)	xxxx	product xxx	2713.8	650	0	0	100	12	4.175077	0.3479231
2015 PD 06 WK 4 (24)	zzzz	product zzz	2392.6	570	0	1	100	12	4.197544	0.3497953
2015 PD 06 WK 2 (22)	zzzz	product zzz	1516.59	314	0	18	86	16	4.829904	0.301869
2015 PD 06 WK 3 (23)	zzzz	product zzz	1665.97	359	0	9	90	16	4.640585	0.2900366
2015 PD 06 WK 4 (24)	yyyy	product yyy	1256.29	237	0	3	92	16	5.300802	0.3313001
2015 PD 06 WK 2 (22)	yyyy	product yyy	5006.76	1012	0	25	100	16	4.947391	0.309212
2015 PD 06 WK 3 (23)	yyyy	product yyy	5319.57	1066	0	10	101	16	4.990216	0.3118885

Joshua Adams

unread,

Jul 9, 2017, 7:53:57 PM7/9/17

to manipulatr, ada...@gmail.com

Brandon,

This link seems to have to answer I'm looking for but I can't get a good understanding of how to use the syntax for my data set. I'd like to use the tidyverse package if possible.

https://stackoverflow.com/questions/26019915/how-to-spread-or-cast-multiple-values-in-r

On Sunday, July 9, 2017 at 6:47:23 PM UTC-4, Brandon Hurr wrote:

Brandon Hurr

unread,

Jul 10, 2017, 10:53:40 AM7/10/17

to Joshua Adams, manipulatr

Josh,

Does this do what you want?

library(tidyverse)

df1 <- read_tsv("~/Desktop/tidyrhelp.tsv")

df1 %>%

gather(thing, value, -WEEK_NAME, -ITM_SKU_CD) %>%

unite(united, ITM_SKU_CD, thing) %>%

spread(united, value)

HTH,
B

To unsubscribe from this group and stop receiving emails from it, send an email to manipulatr+unsubscribe@googlegroups.com.

To post to this group, send email to manip...@googlegroups.com.

Visit this group at https://groups.google.com/group/manipulatr.

tidyrhelp.tsv

Reply all

Reply to author

Forward