Hi YK,
Thanks for your kind words and thoughtful question.
You're absolutely right that nucleotide diversity (pi) is a valuable metric, particularly because it can be compared consistently across studies and species.
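However, calculating pi accurately requires data that include both variant and invariant sites. Pi is the average number of pairwise differences per site, so its denominator is the total number of sites sequenced, not just the polymorphic ones. Such data are typically available from whole-genome sequencing, but not from most reduced-representation sequencing (RRS) methods such as ddRAD or DArTseq, where usually only the variable sites are retained in the final dataset and repetitive regions are often excluded. As a result, pi estimated from such datasets would be biased (for example, inflated if the denominator counts only SNPs) unless the invariant sites are reconstructed or otherwise accounted for.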
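To illustrate why the invariant sites matter, here is a minimal sketch (not dartR code) that computes pi from biallelic SNPs. It assumes diploid genotypes coded as counts of the alternate allele (0, 1, 2), and the number of invariant sites is supplied separately as a hypothetical `n_invariant_sites` argument. The SNPs contribute the same total either way; only the denominator changes.

```python
from typing import Sequence


def site_diversity(genotypes: Sequence[int]) -> float:
    """Proportion of chromosome pairs that differ at one biallelic site.

    genotypes: diploid alternate-allele counts (0, 1 or 2), one per individual.
    """
    n = 2 * len(genotypes)           # number of sampled chromosomes
    alt = sum(genotypes)             # copies of the alternate allele
    differing_pairs = alt * (n - alt)
    total_pairs = n * (n - 1) / 2
    return differing_pairs / total_pairs


def nucleotide_diversity(snp_genotypes: Sequence[Sequence[int]],
                         n_invariant_sites: int) -> float:
    """Average pairwise differences per site over ALL sequenced sites.

    Invariant sites add nothing to the numerator but do count in the denominator.
    """
    total = sum(site_diversity(g) for g in snp_genotypes)
    n_sites = len(snp_genotypes) + n_invariant_sites
    return total / n_sites


if __name__ == "__main__":
    snps = [[0, 1, 1, 2], [0, 0, 0, 1]]  # toy data: 2 SNPs, 4 individuals
    print("SNP-only denominator:", nucleotide_diversity(snps, 0))
    print("100 sequenced sites: ", nucleotide_diversity(snps, 98))
```

With this toy data the SNP-only estimate is 50 times larger than the estimate over 100 sequenced sites, which is the kind of bias described above.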
One way around this is to start from the raw sequencing data and retain invariant sites during variant calling, similar to the approach used in pixy (see ksamuk/pixy on GitHub), which works from a VCF that also contains the invariant sites. Because most users work with processed SNP datasets, we've held off on implementing pi in dartR to avoid misleading results.
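The idea behind the pixy-style approach can be sketched as follows. This is an illustration only, not pixy's actual code: pi is computed as the total number of pairwise differences divided by the total number of pairwise comparisons, summed over every site (variant and invariant), so missing calls reduce the comparison count rather than being treated as invariant. The input format here (one allele call per sampled chromosome, with `None` for missing data) is an assumption made for the example.

```python
from collections import Counter
from typing import Iterable, Optional, Sequence, Tuple

# One allele call per sampled chromosome at a site; None marks missing data.
Site = Sequence[Optional[str]]


def diffs_and_comparisons(site: Site) -> Tuple[int, int]:
    """Count differing pairs and total pairs among non-missing calls at one site."""
    calls = [a for a in site if a is not None]
    n = len(calls)
    comparisons = n * (n - 1) // 2
    # Pairs carrying the same allele; everything else is a difference.
    same = sum(c * (c - 1) // 2 for c in Counter(calls).values())
    return comparisons - same, comparisons


def pixy_style_pi(sites: Iterable[Site]) -> float:
    """Sum of differences over sum of comparisons across variant AND invariant sites."""
    total_diffs = 0
    total_comps = 0
    for site in sites:
        d, c = diffs_and_comparisons(site)
        total_diffs += d
        total_comps += c
    return total_diffs / total_comps if total_comps else float("nan")


if __name__ == "__main__":
    sites = [
        ["A", "A", "G", "G"],   # variant site
        ["C", "C", "C", None],  # invariant site with one missing call
        ["T", "T", "T", "T"],   # invariant site
    ]
    print(pixy_style_pi(sites))      # all sites retained
    print(pixy_style_pi(sites[:1]))  # SNP-only, as in a filtered dataset
```

Dropping the invariant sites, as happens in a typical filtered SNP dataset, raises the estimate because the differences stay the same while the comparisons shrink.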
That said, we're currently working on a method to estimate the number of invariant sites from RRS datasets, which would allow more robust pi calculations. We're hoping to have something ready by next week!
Cheers,
Luis
--
You received this message because you are subscribed to a topic in the Google Groups "dartR" group.