Using separate() in tidyr on multiple columns?


John Mola

Mar 1, 2017, 8:08:39 PM
to davi...@googlegroups.com
Hi all,

Not a very good drug user, so please be gentle with me. 

Here is some dummy data:
library(dplyr)  # distinct()
library(tidyr)  # separate()

Ind = c("SNP1","SNP1","SNP2","SNP2","SNP3")
SNP1 = c("AA","TT","AT","TT","TT")
SNP2 = c("GC","GG","GG","CC","GC")
SNP3 = c("GG","GC","GG","CC","GC")
df = data.frame(Ind,SNP1,SNP2,SNP3)
df_filt= distinct(df, Ind, .keep_all = TRUE)
df_filt

> df_filt
   Ind SNP1 SNP2 SNP3
1 SNP1   AA   GC   GG
2 SNP2   AT   GG   GG
3 SNP3   TT   GC   GC

(The dummy data is set up a bit oddly because there's an intermediate step in there, where I keep only the unique "SNP" rows.)

So now, I'd like to split the AA's, GC's, etc. into separate columns. I can do this with separate() for one column:

> separate(data = df_filt, col = SNP1, into = c("SNP1.1","SNP1.2"), sep=c(1))
   Ind SNP1.1 SNP1.2 SNP2 SNP3
1 SNP1      A      A   GC   GG
2 SNP2      A      T   GG   GG
3 SNP3      T      T   GC   GC

But I can't figure out a way to automate this across many (eventually thousands of) columns at once. separate() does not accept a vector of column names, for instance.

Any thoughts?

Thanks!

John

--

John Mola

Mar 1, 2017, 8:12:08 PM
to davi...@googlegroups.com
Oh yeah. Apologies for the column and row names matching. I was messing around with my dummy data.

Here:

> Ind = c("Bee1","Bee1","Bee2","Bee2","Bee3")
> SNP1 = c("AA","TT","AT","TT","TT")
> SNP2 = c("GC","GG","GG","CC","GC")
> SNP3 = c("GG","GC","GG","CC","GC")
> df = data.frame(Ind,SNP1,SNP2,SNP3)
> df_filt= distinct(df, Ind, .keep_all = TRUE)
> df_filt
   Ind SNP1 SNP2 SNP3
1 Bee1   AA   GC   GG
2 Bee2   AT   GG   GG
3 Bee3   TT   GC   GC
> separate(data = df_filt, col = SNP1, into = c("SNP1.1","SNP1.2"), sep=c(1))
   Ind SNP1.1 SNP1.2 SNP2 SNP3
1 Bee1      A      A   GC   GG
2 Bee2      A      T   GG   GG
3 Bee3      T      T   GC   GC

--
Check out our R resources at http://d-rug.github.io/

Vince S. Buffalo

Mar 1, 2017, 8:22:52 PM
to davi...@googlegroups.com
So a few things —

First I'd use a tibble:

df <- tibble(Ind,SNP1,SNP2,SNP3)
df_filt <- distinct(df, Ind, .keep_all = TRUE)

and since you don't have a separator character, you can't split on a delimiter with separate(); instead, use extract() with a grouping regular expression. This regex is a bit permissive, but it works:

df_filt %>% gather(snp, value, -Ind) %>% extract(value, into=c('h1', 'h2'), '(.)(.)')

# A tibble: 9 × 4
    Ind   snp    h1    h2
* <chr> <chr> <chr> <chr>
1  Bee1  SNP1     A     A
2  Bee2  SNP1     A     T
3  Bee3  SNP1     T     T
4  Bee1  SNP2     G     C
5  Bee2  SNP2     G     G
6  Bee3  SNP2     G     C
7  Bee1  SNP3     G     G
8  Bee2  SNP3     G     G
9  Bee3  SNP3     G     C

Note how I use gather here to gather all SNP columns. It's easier to apply this operation to long data and then recast to wide data using spread(). 
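
For the round trip back to wide data, here is a minimal sketch of one way it could go, assuming the Bee dummy data from earlier in the thread. The intermediate column names (allele, base, snp_allele) and the sub() call that turns h1/h2 into 1/2 are my own choices for illustration, not something from the posts above.

```r
library(dplyr)
library(tidyr)

# Dummy data from the thread
df <- tibble(
  Ind  = c("Bee1", "Bee1", "Bee2", "Bee2", "Bee3"),
  SNP1 = c("AA", "TT", "AT", "TT", "TT"),
  SNP2 = c("GC", "GG", "GG", "CC", "GC"),
  SNP3 = c("GG", "GC", "GG", "CC", "GC")
)
df_filt <- distinct(df, Ind, .keep_all = TRUE)

wide <- df_filt %>%
  # long format: one row per individual and SNP
  gather(snp, value, -Ind) %>%
  # split each two-letter genotype into its two alleles
  extract(value, into = c("h1", "h2"), "(.)(.)") %>%
  # long again: one row per individual, SNP and allele
  gather(allele, base, -Ind, -snp) %>%
  # "h1"/"h2" -> "1"/"2" (illustrative naming choice)
  mutate(allele = sub("h", "", allele)) %>%
  # build the wide column names, e.g. "SNP1.1"
  unite(snp_allele, snp, allele, sep = ".") %>%
  # back to wide: one column per SNP allele
  spread(snp_allele, base)

wide
```

This should give one row per bee with columns SNP1.1, SNP1.2, SNP2.1 and so on, and it does not care how many SNP columns there are. In tidyr 1.0.0 and later, pivot_longer() and pivot_wider() supersede gather() and spread(), so the same idea could be written with those instead.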

HTH,
Vince
Vince Buffalo
@vsbuffalo :: vincebuffalo.com
Coop Lab :: Population Biology Graduate Group
University of California, Davis

Jaime Ashander

Mar 1, 2017, 8:55:52 PM
to davi...@googlegroups.com
As mentioned, the key thing is using gather() to make the data long. You could still use
separate() in place of Vince's extract(); you'd just need to pass it the new column
name you gave to gather() (value in this case), so this would work too:

df_filt %>% gather(snp, value, -Ind) %>% separate(col = value, into = c("SNP1.1","SNP1.2"), sep=1)

(when numeric, sep is interpreted as a position in the string)
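
A small sketch of how a numeric sep behaves, in case it helps. The toy data frames (geno, codes) and the output column names are made up for illustration.

```r
library(tidyr)

# Two-letter genotypes: sep = 1 splits after the first character
geno <- data.frame(value = c("AA", "AT", "GC"), stringsAsFactors = FALSE)
two_parts <- separate(geno, value, into = c("h1", "h2"), sep = 1)
two_parts

# Several positions at once: sep = c(1, 2) splits after the 1st and 2nd
# characters, so sep has one fewer element than into
codes <- data.frame(value = c("ABC", "XYZ"), stringsAsFactors = FALSE)
three_parts <- separate(codes, value, into = c("p1", "p2", "p3"), sep = c(1, 2))
three_parts
```

By contrast, a character sep is treated as a regular expression to split on, which is why it needs an actual delimiter in the data.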

John Mola

Mar 1, 2017, 9:39:58 PM
to davi...@googlegroups.com
By god. It works. Thank you very much y'all!

Cheers,

John