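library(dplyr)  # for distinct()
library(tidyr)  # for separate()

Ind = c("SNP1","SNP1","SNP2","SNP2","SNP3")
SNP1 = c("AA","TT","AT","TT","TT")
SNP2 = c("GC","GG","GG","CC","GC")
SNP3 = c("GG","GC","GG","CC","GC")
df = data.frame(Ind, SNP1, SNP2, SNP3)
# keep only the first row for each value of Ind
df_filt = distinct(df, Ind, .keep_all = TRUE)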
df_filt
> df_filt
   Ind SNP1 SNP2 SNP3
1 SNP1   AA   GC   GG
2 SNP2   AT   GG   GG
3 SNP3   TT   GC   GC
(The dummy data look a bit odd because they include an intermediate step: I only wanted the unique "SNP" rows, hence the distinct() call.)
Now I'd like to split each two-character value (AA, GC, etc.) into two separate columns. I can do this for a single column with separate():
> separate(data = df_filt, col = SNP1, into = c("SNP1.1","SNP1.2"), sep=c(1))
   Ind SNP1.1 SNP1.2 SNP2 SNP3
1 SNP1      A      A   GC   GG
2 SNP2      A      T   GG   GG
3 SNP3      T      T   GC   GC
But I can't figure out how to automate this across many (eventually thousands of) columns at once. separate() takes a single column in its col argument; it does not accept a vector of column names, for instance.
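To make the target concrete, here is a minimal base R sketch of the result I'm after. It does not use separate() at all; it just applies substr() to each column in turn. The helper name split_genotypes is my own invention, it assumes every value is exactly two characters long, and I have no idea whether it would scale well to thousands of columns.

```r
# Rebuild the filtered dummy data so this snippet runs on its own.
df_filt <- data.frame(
  Ind  = c("SNP1", "SNP2", "SNP3"),
  SNP1 = c("AA", "AT", "TT"),
  SNP2 = c("GC", "GG", "GC"),
  SNP3 = c("GG", "GG", "GC"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of any package): split every column named
# in `cols` into two one-character columns called <col>.1 and <col>.2.
# Assumes each value is exactly two characters long.
split_genotypes <- function(data, cols) {
  pieces <- lapply(cols, function(col) {
    g <- as.character(data[[col]])
    out <- data.frame(
      substr(g, 1, 1),  # first character
      substr(g, 2, 2),  # second character
      stringsAsFactors = FALSE
    )
    names(out) <- paste0(col, c(".1", ".2"))
    out
  })
  # Untouched columns first, then the split columns in their original order.
  cbind(data[setdiff(names(data), cols)], do.call(cbind, pieces))
}

# Split every column except the identifier column.
snp_cols <- setdiff(names(df_filt), "Ind")
df_split <- split_genotypes(df_filt, snp_cols)
df_split
```

This gives one row per Ind with columns SNP1.1, SNP1.2, SNP2.1, and so on, but I would still prefer a tidyr-based way of doing it if one exists.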
Any thoughts?
Thanks!
John