Description of bitwise.dist function


sarahj...@gmail.com

Dec 7, 2016, 11:19:55 PM
to poppr
Hi all,

I would like to use the bitwise.dist function in poppr to calculate a genetic distance matrix from SNP data for AMOVA in pegas. I have been using the function with differences_only = TRUE, which I assumed was correct for binary SNP data.
However, I am a little unclear on how this matrix is calculated. I read in the Frontiers paper that it "calculates the fraction of different sites between samples equivalent to Provesti's distance, counting missing data as equivalent in comparison (Prevosti et al., )." (From the poppr documentation, I assume this is otherwise known as the Hamming distance.)
However, not being a mathematician or a statistician, I was hoping someone could provide me with a simple explanation. Also, how does this calculation differ when I set differences_only = TRUE?

Many thanks!

Sarah

Zhian Kamvar

Dec 7, 2016, 11:43:15 PM
to sarahj...@gmail.com, poppr
Hi Sarah,

This is a really good question. The bitwise.dist() function calculates the fraction of alleles that differ between samples. Setting differences_only = TRUE tells the function to calculate the fraction of genotypes that differ between samples instead, which makes the distance calculation coarser. In most cases, differences_only = FALSE is the more appropriate setting.

For example, let's say you have four samples genotyped at one locus:

A/A
A/T
T/T
A/A

Your data is represented as the number of minor alleles (here, T) observed in each sample:

0
1
2
0

The distance matrix calculated by bitwise.dist would look like this (lower triangle, with rows for samples 2 through 4 and columns for samples 1 through 3):

0.5        
1.0 0.5    
0.0 0.5 1.0

Looking at the first column, this is telling you that, when you compare samples 1 and 2, half of the alleles are different. When comparing samples 1 and 3, all of the alleles are different, but samples 1 and 4 are identical.
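The arithmetic behind those numbers can be sketched in a few lines. This is only an illustration in plain Python, not poppr's actual implementation (which works on bit-packed data in R); the function name allele_fraction_distance and the variable names are made up for this example, and it assumes diploid samples at a single locus with no missing data.

```python
from itertools import combinations

PLOIDY = 2  # assumption: diploid samples, as in the A/A, A/T, T/T example


def allele_fraction_distance(count_a, count_b, ploidy=PLOIDY):
    """Fraction of alleles that differ at one locus, given minor-allele counts."""
    return abs(count_a - count_b) / ploidy


# Minor-allele counts for samples 1 to 4 (A/A, A/T, T/T, A/A)
minor_allele_counts = [0, 1, 2, 0]

# Pairwise distances keyed by 1-based sample numbers, e.g. (1, 2)
allele_distances = {
    (i + 1, j + 1): allele_fraction_distance(
        minor_allele_counts[i], minor_allele_counts[j]
    )
    for i, j in combinations(range(len(minor_allele_counts)), 2)
}

for pair, distance in sorted(allele_distances.items()):
    print(pair, distance)
```

Each value is the absolute difference in minor-allele counts divided by the ploidy, which is why a homozygote and a heterozygote end up half as far apart as the two opposite homozygotes.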

If you set differences_only = TRUE, the distance matrix turns out to be:

1    
1 1  
0 1 1

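Here the only question being asked is whether or not the genotypes are the same: any difference counts as 1, so a heterozygote is as far from a homozygote as the two homozygotes are from each other. You can think of setting differences_only = TRUE as increasing the distance because it removes granularity.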
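The same-or-different comparison can be sketched as follows. As before, this is a plain Python illustration rather than poppr's code; genotype_difference is a hypothetical name, and the sketch assumes a single locus with no missing data.

```python
from itertools import combinations


def genotype_difference(count_a, count_b):
    """Return 1 if the two genotypes differ at this locus, otherwise 0."""
    return 0 if count_a == count_b else 1


# Minor-allele counts for samples 1 to 4 (A/A, A/T, T/T, A/A)
minor_allele_counts = [0, 1, 2, 0]

# Pairwise distances keyed by 1-based sample numbers, e.g. (1, 2)
genotype_distances = {
    (i + 1, j + 1): genotype_difference(
        minor_allele_counts[i], minor_allele_counts[j]
    )
    for i, j in combinations(range(len(minor_allele_counts)), 2)
}

for pair, distance in sorted(genotype_distances.items()):
    print(pair, distance)
```

Only samples 1 and 4 share a genotype, so theirs is the only pair with a distance of 0; every other pair is at the maximum distance regardless of how many alleles actually differ.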

I hope that helps. Feel free to ask if you have any further questions!

Best,
Zhian


--
You received this message because you are subscribed to the Google Groups "poppr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to poppr+un...@googlegroups.com.
To post to this group, send email to po...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/poppr/602bfbf3-7983-49fd-9efa-de8094f52a8e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

sarahj...@gmail.com

Dec 8, 2016, 12:19:04 AM
to poppr
Awesome, thank you Zhian, that explains everything! I think I should now redo my analysis with differences_only = FALSE :)

Sarah

Zhian Kamvar

Dec 8, 2016, 12:25:33 AM
to sarahj...@gmail.com, poppr
I'll make sure to include this description in the next release. Documentation is always a bit of trial and error. :)

Zhian

