How to get a summary of a genotype file using only the command line, without the GUI?

Elise Albert

Sep 19, 2017, 9:14:47 AM
to TASSEL - Trait Analysis by Association, Evolution and Linkage
Hello,

I am using TASSEL 5 on an HPC cluster. I am allowed to submit jobs to the queue but not to run the GUI (except for a very short time with a low memory allocation).

I successfully ran both steps of the FILLIN procedure to impute the missing values in a matrix of 820 genotypes x 3 million SNPs.

# step 1
run_pipeline.pl -Xmx80g -FILLINFindHaplotypesPlugin -hmp snp_matrix.vcf -o snp_matrix_haplotypes

# step 2
run_pipeline.pl -Xmx80g -FILLINImputationPlugin -hmp snp_matrix.vcf  -d snp_matrix_haplotypes -o snp_matrix_imputed.hmp.txt.gz


1/ I got many output files from both steps, and I do not really understand them.

Where can I find a description of the outputs produced by FILLIN steps 1 and 2 and their meaning? In particular, what is the file that looks like a map, and what do its columns mean?

2/ I am also missing information about how successful the procedure was. How much data is still missing? What is the new MAF across my SNP matrix after imputation? How can I get this information without running the GUI?

I tried the following:

run_pipeline.pl -Xmx80g -importGuess snp_matrix_imputed.hmp.txt.gz -genotypeSummary overall,site 

where snp_matrix_imputed.hmp.txt.gz is the file I got after FILLIN steps 1 and 2.

I got the result below, which gives the number of SNPs per chromosome but nothing about MAF, the rate of remaining missing data, etc.

#Genotype Table Name: snp_matrix
#Number of Taxa: 814
#Number of Sites: 3147262
#Sites x Taxa: 2561871268
#Chromosomes...
#1: start site: 0 (123) last site: 781595 (30427176) total: 781596
#2: start site: 781596 (30566) last site: 1312598 (19697549) total: 531003
#3: start site: 1312599 (192) last site: 1924382 (23459627) total: 611784
#4: start site: 1924383 (1016) last site: 2437538 (18584896) total: 513156
#5: start site: 2437539 (53) last site: 3147261 (26975377) total: 709723

What is the appropriate command line to get something equivalent to the "Geno Summary" in the GUI, as shown here: https://bitbucket.org/tasseladmin/tassel-5-source/wiki/UserManual/GenoSummary/GenoSummary, but using only the command-line interface?

I hope you can help me with that.

Thanks.

Best regards. 

Elise

Terry Casstevens

Sep 19, 2017, 10:11:20 AM
to Tassel User Group
run_pipeline.pl -Xmx80g -importGuess snp_matrix_imputed.hmp.txt.gz -genotypeSummary overall,site -export genotypeSummary

It will produce multiple files.
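
Once the summary files are exported, the remaining missing-data rate and the mean MAF across sites can be read from the per-site table with standard tools. The following is only a minimal sketch and not part of TASSEL: `column_mean` is a hypothetical helper, and the column names "Proportion Missing" and "Minor Allele Frequency", as well as the file name in the usage comments, are assumptions that should be checked against the header line of your own exported files.

```shell
# Hypothetical helper (not a TASSEL command): print the mean of a named
# column in a tab-delimited file whose first line is a header.
# Usage: column_mean FILE "Column Name"
column_mean() {
    awk -F'\t' -v name="$2" '
        NR == 1 {
            # Locate the requested column by its header text.
            for (i = 1; i <= NF; i++) if ($i == name) col = i
            if (!col) { print "column not found: " name > "/dev/stderr"; exit 1 }
            next
        }
        # Skip empty and NaN cells so they do not distort the mean.
        $col != "" && $col != "NaN" { sum += $col; n++ }
        END { if (n) printf "%.6f\n", sum / n }
    ' "$1"
}

# Example calls; the file name and column names are assumptions:
# column_mean genotypeSummary3.txt "Proportion Missing"
# column_mean genotypeSummary3.txt "Minor Allele Frequency"
```

Looking columns up by header name rather than by position keeps the helper usable even if the column order differs between TASSEL versions. Running `head -1` on each exported file shows which one holds the per-site table and what its columns are actually called.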