How are the average pi values considering all sites (variant and fixed) calculated?


Carol

Feb 10, 2021, 11:18:21 AM
to Stacks
Hi everyone,

I was checking the nucleotide diversity (pi) statistics in the file "populations.sumstats_summary.tsv", where pi is reported separately for all positions (variant and fixed) and for variant positions only.

I've worked out that the pi reported for each of my populations when considering only variant sites is simply the average of the per-site pi values for that population found in the file "populations.sumstats.tsv". However, I'm unsure how to interpret these pi values calculated on variant positions alone.

On the other hand, pi values calculated for all positions have an easier interpretation (i.e. how many SNPs per base pair I should expect). I want to plot these pi values for each of my populations, but I cannot figure out where I can get the pi estimates for each of my RAD loci.

I would greatly appreciate it if someone could tell me which file I can extract those per-RAD-locus pi values from.

Thanks in advance, 

Carol

Jul 19, 2021, 5:37:41 AM
to Stacks
Could someone please help me with this issue?
How can I plot nucleotide diversity considering "all positions" for each of my populations?
The information in "populations.sumstats.tsv" corresponds to nucleotide diversity considering variant sites only. Where or how can I retrieve the nucleotide diversity calculations for "all positions"? Also, where can I check the source code of the Stacks "populations" module?

Cheers,

CaffeSospeso

Jul 19, 2021, 5:53:05 AM
to Stacks
Hi Carol,

If I'm not mistaken, the 'populations.sumstats_summary.tsv' file should contain pi values for both 'only variant sites' and 'All positions'.

Bests,

Gabriele

Carol

Jul 19, 2021, 6:23:49 AM
to Stacks
Yes, I'm aware that "populations.sumstats_summary.tsv" contains the mean and variance of nucleotide diversity considering "all positions", but what I'm looking for is the estimate for each locus, as in the file "populations.sumstats.tsv". The only problem is that the nucleotide diversity estimates in that file are based on "variant sites" only.

If anyone knows how to get the nucleotide diversity based on "all positions" for each locus, I'd greatly appreciate it.

Cheers, 

Julian Catchen

Jul 19, 2021, 4:28:24 PM
to stacks...@googlegroups.com, Carol
As you note, the populations program does this calculation for you
across all sites in the dataset. If you only want to include certain
loci, you could supply a whitelist to populations; the calculation
will then include only those specific loci.

If you want a per-locus calculation of pi, you can do it by hand. Pi
will be 0 for the fixed sites at a locus. So, for each population at
each locus, you can take the measure of pi for each variant site (in the
sumstats file) and then add in X 0s, to calculate an average for the
locus, where X is the remaining sites at that locus.

So, if you have 145bp-long loci, and a locus has two SNPs, add your two
values of pi together, then divide by 145 to get an average for the locus.

julian
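
The by-hand calculation described above can be sketched in Python. This is only an illustration, not code from Stacks: the function name `per_locus_pi`, the `(locus_id, pop_id, pi)` record layout, and the example values are all hypothetical, and it assumes the per-site pi values have already been read out of the sumstats file and that the locus length (145 bp in the example above) is known.

```python
from collections import defaultdict


def per_locus_pi(site_records, locus_length):
    """Average pi over all sites of each locus, per population.

    site_records: iterable of (locus_id, pop_id, pi) tuples, one per
        variant site (hypothetical layout, assumed to be parsed from
        the sumstats file beforehand).
    locus_length: total number of sites per locus, either a single int
        applied to every locus or a dict mapping locus_id -> length.
    """
    # Sum pi over the variant sites of each (locus, population) pair.
    # Fixed sites contribute 0, so they only affect the denominator.
    sums = defaultdict(float)
    for locus_id, pop_id, pi in site_records:
        sums[(locus_id, pop_id)] += float(pi)

    result = {}
    for (locus_id, pop_id), total in sums.items():
        if isinstance(locus_length, dict):
            length = locus_length[locus_id]
        else:
            length = locus_length
        # Divide by the full locus length, not by the number of SNPs.
        result[(locus_id, pop_id)] = total / length
    return result


# Made-up example: one 145 bp locus with two SNPs, in two populations.
records = [
    (1, "popA", 0.29),
    (1, "popA", 0.145),
    (1, "popB", 0.0),
    (1, "popB", 0.435),
]
pi_by_locus = per_locus_pi(records, 145)
for key in sorted(pi_by_locus):
    print(key, pi_by_locus[key])
```

Loci with no variant sites have no rows in this kind of input and so do not appear in the result; by the same reasoning their per-locus pi is 0, which matters if the per-locus values are later averaged across loci.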



--
Julian M Catchen, Ph.D.
Assistant Professor
Department of Evolution, Ecology, and Behavior
Carl R. Woese Institute for Genomic Biology
University of Illinois, Urbana-Champaign
--
jcat...@illinois.edu; @jcatchen

Carol

Jul 23, 2021, 5:55:28 AM
to Stacks
Thanks a lot for your reply, Julian.

I proceeded as you instructed and managed to calculate the nucleotide diversity for each RAD locus. It was nice to see that the average of my newly calculated values was consistent with the values reported in the sumstats_summary.tsv file.

Cheers,
