Converting genotype data


bleckie

Jul 23, 2020, 1:52:59 PM
to Core Hunter Users
All,

I'm having some trouble converting my TASSEL output to a format that Core Hunter can use. I have had success using the distance matrix, but I would like to try a genotype file. Any tips on how to easily convert between the two applications?

Thank you,

Brian

daveneti

Jul 24, 2020, 1:12:34 AM
to Core Hunter Users
Hi Brian,

Can you give me an example of the TASSEL output data? The first few columns and rows are enough; then I can advise you better.

Guy

bleckie

Jul 24, 2020, 9:24:58 AM
to Core Hunter Users
Hi Guy,

TASSEL can export a variety of formats, including Hapmap, Hapmap Diploid, HDF5, VCF, and some others. After stringent filtering, I am left with ~750 taxa and 17,000 SNPs.
Hapmap output looks like this:
rs# alleles chrom pos strand assembly# center protLSID assayLSID panelLSID QCcode 160696 160944 161113
S01_18244 A/G 1 18244 + NA NA NA NA NA NA A A N
S01_54667 C/T 1 54667 + NA NA NA NA NA NA C C C
S01_62404 T/G 1 62404 + NA NA NA NA NA NA G T G

Diploid hapmap looks like this:
rs# alleles chrom pos strand assembly# center protLSID assayLSID panelLSID QCcode 160696 160944 161113
S01_18244 A/G 1 18244 + NA NA NA NA NA NA AA AA NN
S01_54667 C/T 1 54667 + NA NA NA NA NA NA CC CC CC
S01_62404 T/G 1 62404 + NA NA NA NA NA NA GG TT GG

Any help would be appreciated.

Thanks,

Brian
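
For readers with the same diploid Hapmap export, one possible starting point is a short Python sketch that transposes the table to one row per taxon and codes each call as the number of copies (0, 1, or 2) of the first allele listed in the alleles column. This is similar in spirit to the allele-count output of PLINK's --recode A. The function name hapmap_diploid_to_counts, the "ID" header, the choice of which allele to count, and blanking any call that is not made of the two listed alleles (such as NN) are all assumptions for illustration, not something Core Hunter or TASSEL prescribes; check the result against the Core Hunter documentation before relying on it.

```python
import csv
import sys

# Number of leading metadata columns in a Hapmap file:
# rs#, alleles, chrom, pos, strand, assembly#, center,
# protLSID, assayLSID, panelLSID, QCcode
N_META = 11


def hapmap_diploid_to_counts(lines, id_header="ID"):
    """Convert diploid Hapmap lines to taxon-by-marker allele counts.

    Each call becomes the number of copies of the FIRST allele listed in
    the 'alleles' column (an arbitrary choice made for this sketch).
    Calls that are not two characters drawn from the listed alleles
    (for example "NN") become empty strings.
    """
    lines = [ln for ln in lines if ln.strip()]  # ignore blank lines
    taxa = lines[0].split()[N_META:]
    markers = []
    columns = []  # one list of coded calls per marker

    for line in lines[1:]:
        fields = line.split()
        marker = fields[0]
        alleles = fields[1].split("/")
        calls = fields[N_META:]
        if len(calls) != len(taxa):
            raise ValueError(f"{marker}: expected {len(taxa)} calls, got {len(calls)}")
        counted, valid = alleles[0], set(alleles)
        coded = []
        for call in calls:
            if len(call) == 2 and set(call) <= valid:
                coded.append(str(call.count(counted)))
            else:
                coded.append("")  # missing or unexpected call
        markers.append(marker)
        columns.append(coded)

    # Transpose: one output row per taxon, one column per marker.
    rows = [[id_header] + markers]
    for i, taxon in enumerate(taxa):
        rows.append([taxon] + [col[i] for col in columns])
    return rows


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python hapmap_to_counts.py input.hmp.txt output.csv
    with open(sys.argv[1]) as src:
        table = hapmap_diploid_to_counts(src)
    with open(sys.argv[2], "w", newline="") as dst:
        csv.writer(dst).writerows(table)
```

Applied to the three markers shown above, taxon 161113 would get a blank for S01_18244 (its call is NN), and taxon 160944 would get 2 for S01_62404 (TT, counting T).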

Ira Herniter

Apr 28, 2025, 1:51:36 PM
to Core Hunter Users
Hi, 
I'm just starting to work with CoreHunter3, and I have the same question. Is there an easy way to convert VCF or hapmap data for input into the CoreHunter3 pipeline?
Thanks,
Ira

Ira Herniter

Apr 29, 2025, 9:44:18 AM
to Core Hunter Users
Hi all, I figured this one out. The easiest way to format data for input into Core Hunter is to use VCFtools and then PLINK:

  1. vcftools --vcf [input VCF] --out [output prefix] --plink
  2. /programs/plink-1.9-x86_64-beta7/plink --file [prefix from VCFtools output] --recode A --out [output prefix]
Then I open the resulting .raw file in Excel and use the Text to Columns function with a space as the delimiter, remove the extraneous columns, and save as a CSV.
The CSV pops right into Core Hunter.
-Ira

Herman De Beukelaer

Apr 29, 2025, 10:26:23 AM
to Core Hunter Users
Hi Ira,

Thanks for sharing your approach, it might be very useful for others who want to use Core Hunter on VCF data. Great!

Herman

On Tuesday, April 29, 2025 at 15:44:18 UTC+2, iher...@gmail.com wrote:

Ira Herniter

Apr 29, 2025, 10:50:32 AM
to Core Hunter Users
One thing I forgot: while removing columns, etc. in the last step, you also need to clear out all the missing data.
Replace "NA" with nothing, so those cells are left empty.
-Ira
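
The manual Excel step can also be scripted. Below is a minimal Python sketch, assuming the PLINK 1.9 .raw file written by --recode A is whitespace-delimited and starts with the six columns FID, IID, PAT, MAT, SEX, and PHENOTYPE. Which columns count as extraneous is not spelled out in the thread, so keeping IID as the sample identifier and renaming its header to "ID" are assumptions, as are the function names raw_to_rows and convert_raw_to_csv.

```python
import csv
import sys

# Leading columns of a PLINK 1.9 .raw file (--recode A output).
PLINK_META = ["FID", "IID", "PAT", "MAT", "SEX", "PHENOTYPE"]


def raw_to_rows(lines, id_header="ID"):
    """Turn whitespace-delimited .raw lines into CSV-ready rows.

    Keeps IID as the sample identifier (an assumption), drops the other
    metadata columns, and replaces missing genotypes ("NA") with "".
    """
    rows = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if not rows:
            # First non-blank line is the header.
            if fields[:len(PLINK_META)] != PLINK_META:
                raise ValueError("unexpected .raw header: " + " ".join(fields[:6]))
            rows.append([id_header] + fields[len(PLINK_META):])
        else:
            genotypes = ["" if g == "NA" else g for g in fields[len(PLINK_META):]]
            rows.append([fields[1]] + genotypes)  # fields[1] is IID
    return rows


def convert_raw_to_csv(raw_path, csv_path, id_header="ID"):
    """Read a .raw file and write the cleaned table as a CSV file."""
    with open(raw_path) as src:
        rows = raw_to_rows(src, id_header)
    with open(csv_path, "w", newline="") as dst:
        csv.writer(dst).writerows(rows)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python raw_to_csv.py [output prefix].raw genotypes.csv
    convert_raw_to_csv(sys.argv[1], sys.argv[2])
```

If Core Hunter expects a different identifier header or a different set of columns, adjust id_header or the slicing accordingly.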
