Hello,
I am using Tassel 5 on an HPCC. I am allowed to submit jobs to the queue, but not to run the GUI (or only for a very short time with a low memory allocation).
I successfully ran both steps of the FILLIN procedure to impute the missing values in a matrix of 820 genotypes x 3 million SNPs.
# step 1
run_pipeline.pl -Xmx80g -FILLINFindHaplotypesPlugin -hmp snp_matrix.vcf -o snp_matrix_haplotypes
# step 2
run_pipeline.pl -Xmx80g -FILLINImputationPlugin -hmp snp_matrix.vcf -d snp_matrix_haplotypes -o snp_matrix_imputed.hmp.txt.gz
1/ I got many output files from both of these steps, and I do not really understand them.
Where can I find a description of the outputs produced by FILLIN steps 1 and 2, and of their meaning? In particular, what is the file that looks like a map, and what do its columns mean?
Besides, I am missing information about how successful the procedure was. How much data is still missing? What is the new MAF after imputation across my SNP matrix? How can I get this information without running the GUI?
I tried the following:
run_pipeline.pl -Xmx80g -importGuess snp_matrix_imputed.hmp.txt.gz -genotypeSummary overall,site
where snp_matrix_imputed.hmp.txt.gz is the file I got after FILLIN steps 1 and 2.
I got the result below, which gives the number of SNPs per chromosome, but nothing about MAF, the rate of remaining missing data, etc.
#Genotype Table Name: snp_matrix
#Number of Taxa: 814
#Number of Sites: 3147262
#Sites x Taxa: 2561871268
#Chromosomes...
#1: start site: 0 (123) last site: 781595 (30427176) total: 781596
#2: start site: 781596 (30566) last site: 1312598 (19697549) total: 531003
#3: start site: 1312599 (192) last site: 1924382 (23459627) total: 611784
#4: start site: 1924383 (1016) last site: 2437538 (18584896) total: 513156
#5: start site: 2437539 (53) last site: 3147261 (26975377) total: 709723
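To make concrete what I am looking for, here is a minimal Python sketch of the per-site numbers I would like to obtain (missing rate and MAF), computed directly from the HapMap file rather than through Tassel. It is only an illustration under several assumptions: the file is tab-delimited HapMap with 11 metadata columns before the taxa, calls are single-letter IUPAC codes or two-letter genotypes, "N"/"NN" and anything else unrecognised (including indel codes) count as missing, and MAF is taken as the frequency of the second most common allele. The function names are my own and are not part of Tassel.

```python
import gzip
import sys
from collections import Counter

# Single-letter IUPAC codes mapped to their two alleles.
IUPAC = {
    "A": "AA", "C": "CC", "G": "GG", "T": "TT",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
}
N_META = 11  # assumption: 11 HapMap metadata columns precede the taxa columns


def site_stats(calls):
    """Return (missing_rate, maf) for one site's list of genotype calls."""
    counts = Counter()
    missing = 0
    for call in calls:
        call = call.strip().upper()
        # One-letter codes are expanded; two-letter calls are used as they are.
        alleles = IUPAC.get(call, call if len(call) == 2 else "")
        alleles = [a for a in alleles if a in "ACGT"]
        if len(alleles) != 2:
            missing += 1  # "N", "NN", indel codes, anything unrecognised
            continue
        counts.update(alleles)
    total = sum(counts.values())
    ranked = counts.most_common()
    # Assumption: MAF = frequency of the second most common allele,
    # and 0 when the site is monomorphic or entirely missing.
    maf = ranked[1][1] / total if len(ranked) > 1 else 0.0
    missing_rate = missing / len(calls) if calls else 0.0
    return missing_rate, maf


def summarize(path):
    """Yield (site_name, missing_rate, maf) for each site of a HapMap file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        next(handle, None)  # skip the header line
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            yield (fields[0],) + site_stats(fields[N_META:])


if __name__ == "__main__" and len(sys.argv) > 1:
    n_sites = 0
    sum_missing = 0.0
    for name, missing_rate, maf in summarize(sys.argv[1]):
        print(f"{name}\t{missing_rate:.4f}\t{maf:.4f}")
        n_sites += 1
        sum_missing += missing_rate
    if n_sites:
        # Every site has the same number of taxa, so the mean of the per-site
        # rates equals the overall proportion of missing calls.
        print(f"# mean missing rate: {sum_missing / n_sites:.4f}", file=sys.stderr)
```

Saved under a hypothetical name such as hapmap_stats.py, this could be submitted as a normal queue job with snp_matrix_imputed.hmp.txt.gz as its only argument. I would still prefer to get the same figures from Tassel itself if the pipeline can write them out.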
I hope you can help me with that.
Thanks.
Best regards.
Elise