Reformatting Plink files for Tassel?

James Cricket

Oct 15, 2012, 9:21:35 AM
to tas...@googlegroups.com
The Tassel manual states that the ped file should be in Plink ped format, except that of the non-genotype columns, only Individual ID is required.

"TASSEL only requires that the Individual ID field be filled in."

Does this mean that the other non-genotype columns have to be removed? If so, what is a good way of removing them from a GB-sized file?

I'm currently trying to remove the unwanted columns with the Unix cut command, but I'm not really getting anywhere.
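A minimal sketch of the cut approach, assuming a single-space-delimited ped file such as the one `plink --recode` writes (the file names and sample line are hypothetical). cut splits fields on a tab unless told otherwise, and a line containing no delimiter is printed whole, which would make cut appear to do nothing; `-d ' '` sets a space as the delimiter. Field 2 is the Individual ID and fields 7 onward are genotypes. cut streams line by line, so a GB-sized file is not a problem. Note that Jon's reply in this thread suggests Tassel may fail to load the file without these columns, so this only illustrates the cut mechanics.

```shell
# Hypothetical one-line PED file: FID IID PAT MAT SEX PHENO, then allele pairs.
printf 'FAM1 IND1 0 0 1 -9 A G C C\n' > example.ped

# Keep column 2 (Individual ID) and columns 7 onward (genotypes).
# cut defaults to TAB as the delimiter, so a space must be given explicitly.
cut -d ' ' -f 2,7- example.ped > example_iid_only.ped

cat example_iid_only.ped
# IND1 A G C C
```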


Also, with the map file I notice:

"TASSEL does not require the Genetic distance field to be filled in." Does this mean that this column has to be removed, or is it optional?

Jon Zhang

Oct 15, 2012, 10:49:48 AM
to tas...@googlegroups.com
Hi James,

I hope you haven't spent too much time trying to remove the columns. No, you do not have to remove them. In fact, I believe the file will fail to load into Tassel if you do. Tassel accounts for these columns; it just doesn't use the information, if any, that they contain. Note that if you export the data from Tassel after importing it from the ped file, the information in those columns will be lost.

Jon

James Cricket

Oct 15, 2012, 11:31:03 AM
to tas...@googlegroups.com
Cheers Jon, I just worked out how to remove the columns :)

I also notice now that the ped file must be tab-delimited, which can be generated in Plink using --tab.

From the Plink documentation: "It is sometimes useful to have a PED file that is tab-delimited, except that between alleles of the same genotype a space instead of a tab is used. A file formatted in this way can load into Excel, for example, as a tab-delimited file, but with one genotype per column instead of one allele per column. Use the option --tab as well as --recode or --recode12 to achieve this effect."

./plink --bfile mydata --recode --tab --out mydata_tab_151012
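If re-running Plink is inconvenient, the layout the quoted passage describes (tabs between fields, a single space between the two alleles of one genotype) can be approximated with awk. This is a sketch, not Plink's own implementation; it assumes a whitespace-delimited ped file with the six standard leading columns followed by two allele columns per marker, and the file names are hypothetical.

```shell
# Hypothetical one-line PED file with one allele per column.
printf 'FAM1 IND1 0 0 1 -9 A G C C\n' > example.ped

awk '{
  # Six leading columns, joined with tabs.
  out = $1
  for (i = 2; i <= 6; i++) out = out "\t" $i
  # Allele pairs: a space inside each genotype, a tab between genotypes.
  for (i = 7; i < NF; i += 2) out = out "\t" $i " " $(i + 1)
  print out
}' example.ped > example_tab.ped
```

Each output line then has one genotype per tab-separated column, matching the "one genotype per column" description above.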

If only I had the patience to read things properly :)

James Cricket

Oct 15, 2012, 2:37:28 PM
to tas...@googlegroups.com
Actually Jon, my last script didn't work either.

I notice the genotypes are still separated only by spaces in my ped file. I'm assuming Tassel doesn't like that, so I was going to go back to Plink and try a different output setting, or try to replace the spaces with tabs in Unix. The genotypes are in ACGT format; I can't see anything in the manual suggesting that this isn't OK. I'm also assuming the ped file shouldn't be transposed.

Any suggestions? It seems like a fairly basic step, or am I doing something wrong from the start?
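For the "replace spaces with tabs in Unix" route, a minimal sketch using tr, assuming single spaces are the only separators in the file (the file names are hypothetical). Unlike the Plink --tab output, this also puts a tab between the two alleles of each genotype, giving one allele per column; whether Tassel accepts that layout is not confirmed in this thread, so it is worth comparing against the example ped file in the Tassel tutorial dataset.

```shell
# Hypothetical one-line PED file, space-delimited.
printf 'FAM1 IND1 0 0 1 -9 A G C C\n' > example.ped

# Translate every space into a tab. This separates the alleles of each
# genotype too, so the result has one allele per tab-delimited column.
tr ' ' '\t' < example.ped > example_alltabs.ped
```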

Jon Zhang

Oct 15, 2012, 3:55:20 PM
to tas...@googlegroups.com
Hi James, can you quickly run me through what you are trying to do? I'm a bit confused as to where the problem is. If you need an example of what Tassel expects from a ped file, the Tassel tutorial dataset has an example file. With regard to Tassel not liking spaces: I haven't worked on Plink import in Tassel for a while and I don't have the code in front of me, so I can't say for certain, but I can say that tab-delimited will definitely work. There's a chance it's actually coded to split on any whitespace character, but now that I think about it, this is unlikely.
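One way to check which delimiter a ped file actually uses before loading it into Tassel is to count the tab-separated fields on each line with awk and list the distinct counts. A consistently tab-delimited file reports a single count (in the --tab layout, six leading columns plus one per genotype), while a file with no tabs at all reports 1. The sample line and file name are hypothetical.

```shell
# Hypothetical PED line in the Plink --tab layout.
printf 'FAM1\tIND1\t0\t0\t1\t-9\tA G\tC C\n' > example_tab.ped

# Print each distinct number of tab-separated fields found in the file.
awk -F '\t' '{ print NF }' example_tab.ped | sort -u
```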

Oh, and I should say, I don't have any experience with Plink, so unfortunately I can't help you export files from it.