Hi Julian,
Thanks a lot for your answer.
What I want to do is calculate the nucleotide diversity Pi for my data. I know that Stacks populations provides a value for it, but unfortunately this is not enough for me, because I want to calculate Pi on my data filtered by specific criteria.
For example, only sites that are present in at least 2 individuals, or only sites that are present in at least 50% of the individuals.
For this last case I cannot use the -r option of Stacks, because it filters on the percentage of individuals in the population in which the site is present, not on the final number of genotypes (SNP calls) made by the program. As a result, when I use this option I often end up with fewer individuals than the minimum I wanted, because for some individuals no statistically significant SNP call could be made at a given nucleotide position.
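To make the difference concrete, here is a minimal Python sketch of the kind of filter I mean: it counts the individuals that actually have a genotype call on one VCF data line and keeps the site only if that fraction reaches a threshold. The function names and the example line are hypothetical, and it assumes that GT is the first subfield of each sample column, as the VCF format requires when GT is present.

```python
def count_called(vcf_line):
    """Return (called, total) individuals for one tab-separated VCF data line."""
    fields = vcf_line.rstrip("\n").split("\t")
    samples = fields[9:]  # sample columns start at the 10th column
    called = 0
    for sample in samples:
        gt = sample.split(":")[0]  # assumption: GT is the first subfield
        alleles = gt.replace("|", "/").split("/")
        # A genotype counts as called only if no allele is missing.
        if all(a not in (".", "") for a in alleles):
            called += 1
    return called, len(samples)


def passes_filter(vcf_line, min_fraction=0.5):
    """Keep a site if at least min_fraction of individuals have a called genotype."""
    called, total = count_called(vcf_line)
    return total > 0 and called / total >= min_fraction


# Made-up example line: 4 individuals, 2 of them without a call.
line = "un\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0/1\t./.\t0/0\t./."
print(count_called(line))   # (2, 4)
print(passes_filter(line))  # True
```

With a stricter threshold such as min_fraction=0.75, the same site would be dropped, even if the locus itself is present in all 4 individuals.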
To solve that, I started filtering my VCF output directly with my own criteria and wrote a script to calculate Pi from the VCF. But for that to work I also need to know the total number of sites (fixed + variant) I had initially, which is needed for the Pi calculation. I have to be able to filter this total set of sites with the same criteria as my SNP VCF file (e.g. at least 50% of individuals present), to get the correct value of Pi for each set of parameters applied to the data.
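Roughly, the calculation looks like the Python sketch below. It is only an illustration under my own assumptions, not necessarily what Stacks does internally: per-site Pi is taken as the probability that two allele copies drawn without replacement differ, and the overall Pi is the sum over variant sites divided by the total number of sites (fixed + variant) that pass the same filter. The names site_pi and overall_pi and the example numbers are made up.

```python
from collections import Counter


def site_pi(genotypes):
    """Per-site Pi from genotype strings such as '0/1'; missing calls are skipped."""
    counts = Counter()
    for gt in genotypes:
        alleles = gt.replace("|", "/").split("/")
        if "." in alleles or "" in alleles:
            continue  # individual without a genotype call at this site
        counts.update(alleles)
    n = sum(counts.values())  # number of sampled allele copies
    if n < 2:
        return 0.0
    # Fraction of ordered pairs of copies that carry the same allele.
    same = sum(c * (c - 1) for c in counts.values())
    return 1.0 - same / (n * (n - 1))


def overall_pi(variant_sites, n_total_sites):
    """Sum per-site Pi over variant sites, divide by all filtered sites.

    Fixed sites contribute 0 to the sum but still count in n_total_sites.
    """
    return sum(site_pi(g) for g in variant_sites) / n_total_sites


# Made-up example: 2 variant sites among 100 sites that passed the filter.
variant_sites = [
    ["0/1", "0/0", "./.", "1/1"],
    ["0/0", "0/0", "0/1", "./."],
]
print(overall_pi(variant_sites, 100))
```

This is why the total number of filtered sites matters: if n_total_sites is taken before filtering while the variant sites are taken after, Pi is underestimated.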
That's why a VCF output of all sites, not just SNPs, would be very useful to me. I will look more closely at the 'genomic' output to see if there is a way to convert it into a VCF file, in which case my problem would be solved.
I hope I was clear enough.
Best
Julie