Best genetic distance calculation methods

genetist

Feb 14, 2018, 1:01:50 AM
to poppr
Hi all,
Good morning.

I have SNP genotyping data for 386 markers on 40 diploid female parents, 40 male parents, and their 40 offspring. The data set contains missing values, and I want to calculate genetic relatedness between these parents and offspring from the SNP genotypes. I have gone through the information at

https://grunwaldlab.github.io/Population_Genetics_in_R/Pop_Structure.html

but I am confused about whether to use rogers.dist or bitwise.dist. I decided to use bitwise.dist because I have missing data and bitwise.dist can handle it, whereas rogers.dist cannot. Could anyone share their expertise on whether I selected the right method?
Thanks in advance.
Regards,

Zhian Kamvar

Feb 15, 2018, 10:09:20 AM
to genetist, poppr
Hello,

rogers.dist is basically a scaled Euclidean distance (Equation 8 of Rogers 1972), whereas diss.dist/prevosti.dist/bitwise.dist* return the fraction of dissimilar sites, which is not Euclidean. The advantage of measuring the fraction of dissimilar sites is that the results are easy to interpret, but this comes at the cost of applicability to more geometric analyses.

Rogers, J. S. 1972. Measures of genetic similarity and genetic distances.
Pages 145–153 in: Studies in Genetics. University of Texas Publishers.
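
As a rough illustration of the difference described above, here is a minimal base R sketch on made-up data. It assumes genotypes coded as 0/1/2 copies of the alternate allele and computes the fraction of dissimilar alleles by hand, as allelic differences divided by (ploidy x loci). The `geno` matrix and this hand-rolled formula are assumptions for illustration only, not poppr's internal code.

```r
# Toy diploid SNP genotypes (made-up data): rows = individuals,
# columns = loci, values = copies of the alternate allele (0, 1 or 2).
geno <- matrix(
  c(0, 0, 0, 0,
    2, 0, 0, 0,
    1, 1, 0, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("A", "B", "C"), paste0("snp", 1:4))
)

# Fraction of dissimilar alleles, computed by hand:
# summed allelic differences / (ploidy * number of loci).
ploidy <- 2
frac_diss <- dist(geno, method = "manhattan") / (ploidy * ncol(geno))

# Plain Euclidean distance on the same allele counts.
eucl <- dist(geno, method = "euclidean")

print(round(as.matrix(frac_diss), 3))
print(round(as.matrix(eucl), 3))
```

In this toy case all three pairs have the same fraction of dissimilar alleles (0.25), while the Euclidean distance puts A and B farther apart (2) than either is from C (about 1.41), so the two measures need not rank pairs of individuals the same way.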

Hope that helps,
Zhian


*This comes up a lot, but bitwise.dist is no different from diss.dist or Prevosti's distance. It was designed simply to handle genlight objects more efficiently, and its results do not differ qualitatively from these other distances.

-----
Zhian N. Kamvar, Ph. D.
Postdoctoral Researcher (Everhart Lab)
Department of Plant Pathology
University of Nebraska-Lincoln
ORCID: 0000-0003-1458-7108

Shailee Shah

Apr 17, 2020, 9:59:18 PM
to po...@googlegroups.com
Hello!

I have a related question.

I have a dataset of 237 individuals and am trying to run a Mantel test of genetic distance against geographic distance. I have a SNP dataset of more than 6,000 SNPs, with missing data, for all the individuals. I'm trying to figure out which method to use to calculate genetic distances.

I used stats::dist() as well as poppr::diss.dist() and compared.

I understand that the two methods are different – dist() is Euclidean distance while diss.dist() is not. However, I am struggling to understand why they are giving me completely opposite results.

Here is my code and my results from the Mantel tests.

The geographic distance matrix is a Euclidean distance matrix calculated using lat-long of sampling sites.

# with dist() #################################
gen.dists <- dist(allsamples.gen, method = "euclidean", diag = FALSE, upper = FALSE, p = 2)

# with diss.dist() #################################
gen.dists.poppr <- diss.dist(allsamples.gen)

# check if they are correlated
mantel(gen.dists ~ gen.dists.poppr)

#   mantelr      pval1      pval2      pval3  llim.2.5% ulim.97.5%
# 0.6453154  0.0010000  1.0000000  0.0010000  0.6246595  0.6667839

# geo.dists.full = geographical distance matrix
mantel(gen.dists.poppr ~ geo.dists.full, nperm = 10000)

#     mantelr       pval1       pval2       pval3   llim.2.5%  ulim.97.5%
# -0.10971653  0.99510000  0.00500000  0.00670000 -0.13648637 -0.08876814

mantel(gen.dists ~ geo.dists.full, nperm = 10000)

#    mantelr      pval1      pval2      pval3  llim.2.5% ulim.97.5%
# 0.10018077 0.00050000 0.99960000 0.00320000 0.07750995 0.12707172

I also plotted the two genetic distance matrices against each other as well as against the same geographic distance matrix. It looks like diss.dist() overestimates the genetic distance for individuals that were sampled geographically close to each other, which is what is driving the opposing Mantel's r results.

[Attachments: dist.matrices.plot.png, image (1).png, image.png]

Thanks for the help!

Best,
Shailee



Zhian Kamvar

Apr 25, 2020, 11:01:14 AM
to Shailee Shah, poppr
Hello,

This is a bit strange. The sign of the correlation flips between the two distances, and I'm not sure why that would be. If you have a lot of missing data in your data set, that may be a factor, since the two functions handle it differently: diss.dist ignores missing data, while stats::dist drops the missing sites from each pairwise comparison and scales the result up in proportion to the number of sites actually used. Additionally, knowing which package you are using to calculate the Mantel statistic would be useful for diagnosing this problem.
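
To make the missing-data point concrete, here is a small base R sketch on made-up genotypes (0/1/2 alternate-allele counts, NA for a missing call). The stats::dist behaviour shown is the one documented in ?dist: columns with NA are excluded for that pair and the sum is scaled up proportionally to the number of columns used. The two "fraction of differences" variants below are hypothetical hand-written rules that differ only in their denominator; they are not taken from poppr's source, so check the diss.dist and bitwise.dist documentation for what those functions actually do.

```r
# Made-up genotypes: ind2 is missing calls at the last two loci.
geno <- matrix(
  c(0, 0, 2, 2,
    2, 2, NA, NA),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("ind1", "ind2"), paste0("snp", 1:4))
)

# stats::dist excludes the NA columns for this pair and rescales the
# sum of squares by (total columns / columns used) before the sqrt.
d_eucl <- dist(geno, method = "euclidean")

# The same value by hand: only 2 of the 4 columns are usable.
used <- !is.na(geno[1, ]) & !is.na(geno[2, ])
by_hand <- sqrt(sum((geno[1, used] - geno[2, used])^2) * ncol(geno) / sum(used))

# Hypothetical rule 1: a missing site adds no difference, but the
# denominator still counts every locus.
ploidy <- 2
diffs <- abs(geno[1, ] - geno[2, ])
d_ignore <- sum(diffs, na.rm = TRUE) / (ploidy * ncol(geno))

# Hypothetical rule 2: divide only by the loci observed in both.
d_observed <- sum(diffs, na.rm = TRUE) / (ploidy * sum(used))

print(c(euclidean = as.numeric(d_eucl), by_hand = by_hand,
        ignore_missing = d_ignore, observed_only = d_observed))
```

With these values the rescaled Euclidean distance is 4, while the two fraction-based rules give 0.5 and 1 for the same pair. When missingness is uneven across samples, differences like this can change how pairs are ranked, which is one way two distance matrices could end up correlating differently with geography.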

Best,
Zhian

