#Please how do I get amova to operate on microsat data when some loci have missing values

229 views
Skip to first unread message

Carlos P.E. Bedson

unread,
Jul 14, 2021, 4:50:36 AM7/14/21
to poppr
> # Dear Poppr people Please can you help with the following issue and error. I have a microsat dataset with many missing values which are coded as NAs. When i try to run these with poppr.amova I receive various error messages and it does not run. I have also tested this with the small platypus data set from popgenreport and receive the same messsages.
> # As my data set is patchy (7 strata, 97 samples, 11 loci, 30% missing data i.e. 4797 values), i really would like to run this and ensure that I do not cut off data.
# Unfortunately I do not understand any of the error messages and do not know how to fix this.  
> # Thank you for considering.  Best wishes from Carlos 
> ## THIS WORKS FINE 
> read.csv(paste(.libPaths()[1],"/PopGenReport/extdata/platypus1c.csv", sep="" )) # PLATYPUS DATA SET FROM POPGRENREPORT
    ind   pop       lat     long  group    age   loci1   loci2   loci3   loci4   loci5   loci6
1  T158 Black -40.86642 145.2836 Female    juv 141/145 243/243 159/173 263/265 215/215 192/192
2  T306 Black -40.85589 145.2764   Male     Ad 145/145 243/243 159/159 265/267 219/219 192/194
3  T305 Black -40.87889 145.2885 Female     Ad 143/145 243/243 159/159 265/265 215/215 194/194
4  T148 Black -40.99193 145.3757   Male     Ad 141/141 243/243 159/159 265/265 215/219 192/194
5  T149 Black -40.99193 145.3757 Female     Ad 141/145 243/245 159/171 265/267 215/215 194/196
6  T106  Brid -41.23205 147.4597   Male     Ad 145/147 243/245 159/171 267/267 215/215 192/192
7  T107  Brid -41.23205 147.4597   Male     Ad 145/145 243/243 159/171 267/267 215/215 192/192
8  T110  Brid -41.23205 147.4597 Female     Ad 145/145 243/245 171/173 263/265 215/215 192/192
9  T111  Brid -41.23205 147.4597 Female     Ad 141/145 245/245 159/159 267/267 215/215 192/192
10 T308   Cam -41.09567 145.7958   Male Sub-Ad 145/145 243/243 161/171 265/265 215/217 184/196
11 T307   Cam -41.06975 145.8152   Male     Ad 141/145 243/243 171/173 265/265 215/219 184/192
12 T302   Cam -41.05121 145.8280   Male     Ad 141/145 243/243 171/171 265/267 215/215 192/196
13 T303   Cam -41.04764 145.8230 Female    Juv 141/141 243/243 159/173 265/267 215/215 192/194
> PLATYPUS <- read.genetable( paste(.libPaths()[1],"/PopGenReport/extdata/platypus1c.csv"
+ , sep="" ), ind=1, pop=2, lat=3, long=4, other.min=5, other.max=6, oneColPerAll=FALSE, 
+ sep="/", )
> strata(PLATYPUS)<-data.frame(pop=PLATYPUS$pop)## SET STRATA 
> setPop(PLATYPUS)<- ~pop  ## ENSURE pop IS THE STRATA
> poppr.amova(PLATYPUS, ~pop)

 No missing values detected.

$call
ade4::amova(samples = xtab, distances = xdist, structures = xstruct)

$results
                           Df   Sum Sq  Mean Sq
Between pop                 2 14.82308 7.411538
Between samples Within pop 10 28.10000 2.810000
Within samples             13 34.00000 2.615385
Total                      25 76.92308 3.076923

$componentsofcovariance
                                            Sigma          %
Variations  Between pop                0.53410714  16.450266
Variations  Between samples Within pop 0.09730769   2.997034
Variations  Within samples             2.61538462  80.552700
Total variations                       3.24679945 100.000000

$statphi
                         Phi
Phi-samples-total 0.19447300
Phi-samples-pop   0.03587126
Phi-pop-total     0.16450266

> ## HERE IT STARTS TO FAIL...
> ## NOW REPLACE NA VALUE  SampleT303 LocI2 IS NOW NA value   (I DID THIS MANUALLY IN EXCEL) AND THEN RELOAD THE FILE AS PLATYPUS.ZERO.csv
> ## LOAD FILE WITH NA VALUE
> read.csv(paste(.libPaths()[1],"/PopGenReport/extdata/platypus.na.csv", sep="" ))
    ind   pop       lat     long  group    age   loci1   loci2   loci3   loci4   loci5   loci6
1  T158 Black -40.86642 145.2836 Female    juv 141/145 243/243 159/173 263/265 215/215 192/192
2  T306 Black -40.85589 145.2764   Male     Ad 145/145 243/243 159/159 265/267 219/219 192/194
3  T305 Black -40.87889 145.2885 Female     Ad 143/145 243/243 159/159 265/265 215/215 194/194
4  T148 Black -40.99193 145.3757   Male     Ad 141/141 243/243 159/159 265/265 215/219 192/194
5  T149 Black -40.99193 145.3757 Female     Ad 141/145 243/245 159/171 265/267 215/215 194/196
6  T106  Brid -41.23205 147.4597   Male     Ad 145/147 243/245 159/171 267/267 215/215 192/192
7  T107  Brid -41.23205 147.4597   Male     Ad 145/145 243/243 159/171 267/267 215/215 192/192
8  T110  Brid -41.23205 147.4597 Female     Ad 145/145 243/245 171/173 263/265 215/215 192/192
9  T111  Brid -41.23205 147.4597 Female     Ad 141/145 245/245 159/159 267/267 215/215 192/192
10 T308   Cam -41.09567 145.7958   Male Sub-Ad 145/145 243/243 161/171 265/265 215/217 184/196
11 T307   Cam -41.06975 145.8152   Male     Ad 141/145 243/243 171/173 265/265 215/219 184/192
12 T302   Cam -41.05121 145.8280   Male     Ad 141/145 243/243 171/171 265/267 215/215 192/196
13 T303   Cam -41.04764 145.8230 Female    Juv 141/141    <NA> 159/173 265/267 215/215 192/194
> PLATYPUS.NA <- read.genetable( paste(.libPaths()[1],"/PopGenReport/extdata/platypus.na.csv"
+ , sep="" ), ind=1, pop=2, lat=3, long=4, other.min=5, other.max=6, oneColPerAll=FALSE, 
+ sep="/", )
> strata(PLATYPUS.NA)<-data.frame(pop=PLATYPUS.NA$pop)## SET STRATA 
> setPop(PLATYPUS.NA)<- ~pop  ## ENSURE pop IS THE STRATA
> poppr.amova(PLATYPUS.NA, ~pop)# REMOVES 1 LOCUS BUT PROVIDES VALUES 

Found 2 missing values.

1 locus contained missing values greater than 5%

Removing 1 locus: , loci2
$call
ade4::amova(samples = xtab, distances = xdist, structures = xstruct)

$results
                           Df   Sum Sq  Mean Sq
Between pop                 2 12.54615 6.273077
Between samples Within pop 10 25.30000 2.530000
Within samples             13 31.00000 2.384615
Total                      25 68.84615 2.753846

$componentsofcovariance
                                            Sigma          %
Variations  Between pop                0.43446429  15.024154
Variations  Between samples Within pop 0.07269231   2.513763
Variations  Within samples             2.38461538  82.462082
Total variations                       2.89177198 100.000000

$statphi
                         Phi
Phi-samples-total 0.17537918
Phi-samples-pop   0.02958209
Phi-pop-total     0.15024154

> poppr.amova(PLATYPUS.NA, ~pop, cutoff = 0.15) # TRY TO STOP IT REMOVING LOCUS

 No loci with missing values above 15% found.

Distance matrix is non-euclidean.
Using quasieuclid correction method. See ?quasieuclid for details.
$call
ade4::amova(samples = xtab, distances = xdist, structures = xstruct)

$results
                           Df   Sum Sq  Mean Sq
Between pop                 2 14.11419 7.057097
Between samples Within pop 10 29.25315 2.925315
Within samples             13 34.55278 2.657906
Total                      25 77.92012 3.116805

$componentsofcovariance
                                           Sigma          %
Variations  Between pop                0.4795818  14.660765
Variations  Between samples Within pop 0.1337046   4.087335
Variations  Within samples             2.6579059  81.251901
Total variations                       3.2711923 100.000000

$statphi
                         Phi
Phi-samples-total 0.18748099
Phi-samples-pop   0.04789514
Phi-pop-total     0.14660765

Warning messages:
1: In is.euclid(xdist) : Zero distance(s)
2: In is.euclid(distmat) : Zero distance(s)
> PLATYPUS.ZERO<-missingno(PLATYPUS.NA, type = "zero") # MAKE NEW GENIND REPLACING NA WITH ZERO ## WARNING THIS ADDS TO DIVERSITY 

 Replaced 2 missing values.
> poppr.amova(PLATYPUS.ZERO, ~pop) # FAILS 

 No missing values detected.

Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE,  : 
  arguments imply differing number of rows: 2, 0
> ## THE OTHER OBJECT WHICH HAS PROVIDED A SLIGHTLY DIFFERENT ERROR MESSAGE IS THIS 
> poppr.amova(HARES.GENCLONE, ~pop, cutoff = 0.95)

 No loci with missing values above 95% found.

Error in eigen(delta, symmetric = TRUE, only.values = TRUE) : 
  infinite or missing values in 'x'
In addition: Warning message:
In is.euclid(xdist) : Zero distance(s)
> ## OR ON MY OWN DATA SET I ATTEMPT TO RESOLVE THE 4797 MISSING VALUES
> HARES.GENCLONE

This is a genclone object
-------------------------
Genotype information:

   96 original multilocus genotypes 
   96 diploid individuals
   11 codominant loci

Population information:

    1 stratum - pop
    7 populations defined - Snake, Holme_Moss, Derwent, Woodhead, Saddleworth, Bleaklow, Margery_Hill
> HARES.GENCLONE.MEANS<-missingno(HARES.GENCLONE, type = "mean" ) # SUSPECT THIS DOES NOT CREATE WHOLE INTEGERS

 Replaced 3190 missing values.
> poppr.amova(HARES.GENCLONE.MEANS, ~pop, cutoff=0.95)

 No missing values detected.

Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE,  : 
  arguments imply differing number of rows: 2, 0, 1
In addition: Warning messages:
1: In validityMethod(as(object, superClass)) :
  @tab does not contain integers; as of adegenet_2.0-0, numeric values are no longer used
2: In validityMethod(as(object, superClass)) :
  @tab does not contain integers; as of adegenet_2.0-0, numeric values are no longer used
> HARES.GENCLONE.ZERO<-missingno(HARES.GENCLONE, type = "zero" )# PRODUCES MORE DIVERSITY !!

 Replaced 3190 missing values.
> poppr.amova(HARES.GENCLONE.ZERO, ~pop)

 No missing values detected.

Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE,  : 
  arguments imply differing number of rows: 2, 0
> #### THANK YOU FOR LOOKING AT THIS FOR ME #### 
>

Zhian Kamvar

unread,
Jul 16, 2021, 4:39:02 PM7/16/21
to Carlos P.E. Bedson, poppr
Hello,

Instead of trying to adjust the cutoff, you could use `missing = "ignore"`, which will leave the missing data asis in the data and the distance function will take care of ignoring it.

When you ran this without rejecting any loci, you got this error:

> poppr.amova(HARES.GENCLONE, ~pop, cutoff = 0.95)

 No loci with missing values above 95% found.

Error in eigen(delta, symmetric = TRUE, only.values = TRUE) : 
  infinite or missing values in 'x'
In addition: Warning message:
In is.euclid(xdist) : Zero distance(s)

The error in the `eigen()` function: infinite or missing values in 'x' means that there are missing values in the distance matrix used in the AMOVA analysis. This is most likely caused by incomparable samples in your data due to the sparsity. Effectively, there are samples in your data that share no loci in common and there is going to be no way to compare them. My recommendation would be to set missing = "geno" and cutoff = 0.5 to remove all samples with less than half typed loci. 

Here's an example that reproduces the error you are getting:
library("poppr")
data("nancycats")

Two samples in population 1 of nancycats are missing data at locus 1 

Population 17 is missing all of locus 4

n117 <- nancycats[pop = c(1, 17), loc = c(1, 4)]
incomp(n117) # the first two samples are incomparable to the rest
#>      N215 N216 N282 N283 N288 N291 N292 N293 N294 N295 N296 N297 N281 N289 N290
#> N215    1    1    0    0    0    0    0    0    0    0    0    0    0    0    0
#> N216    1    1    0    0    0    0    0    0    0    0    0    0    0    0    0
#> N282    0    0    1    1    1    1    1    1    1    1    1    1    1    1    1
#> N283    0    0    1    1    1    1    1    1    1    1    1    1    1    1    1
#> N288    0    0    1    1    1    1    1    1    1    1    1    1    1    1    1
#> N291    0    0    1    1    1    1    1    1    1    1    1    1    1    1    1
#> N292    0    0    1    1    1    1    1    1    1    1    1    1    1    1    1
#> N293    0    0    1    1    1    1    1    1    1    1    1    1    1    1    1
#> N294    0    0    1    1    1    1    1    1    1    1    1    1    1    1    1
#> N295    0    0    1    1    1    1    1    1    1    1    1    1    1    1    1
#> N296    0    0    1    1    1    1    1    1    1    1    1    1    1    1    1
#> N297    0    0    1    1    1    1    1    1    1    1    1    1    1    1    1
#> N281    0    0    1    1    1    1    1    1    1    1    1    1    1    1    1
#> N289    0    0    1    1    1    1    1    1    1    1    1    1    1    1    1
#> N290    0    0    1    1    1    1    1    1    1    1    1    1    1    1    1
info_table(n117, plot = FALSE)
#>           Locus
#> Population  fca8 fca45  Mean
#>      P01   0.200     . 0.100
#>      P17       . 1.000 0.500
#>      Total 0.087 0.565 0.326
strata(n117) <- data.frame(pop = pop(n117))
poppr.amova(n117, ~pop, missing = "ignore")
#> Warning in is.euclid(xdist): Zero distance(s)
#> Error in eigen(delta, symmetric = TRUE, only.values = TRUE): infinite or missing values in 'x'

I hope that helps.

Best,
Zhian

--
You received this message because you are subscribed to the Google Groups "poppr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to poppr+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/poppr/3dba4258-e02b-41a2-b4bc-77d9510f3f85n%40googlegroups.com.

Carlos P.E. Bedson

unread,
Jul 19, 2021, 8:49:59 AM7/19/21
to poppr
Thank you Zhian.  This has worked.   I appreciate your taking the time to write this out and explain it to me.   Maybe include this situation in your next how-to guide.    Thank you again so much.   Best wishes, Carlos 

Zhian N. Kamvar

unread,
Jul 19, 2021, 10:12:50 AM7/19/21
to po...@googlegroups.com
I'm glad it worked. For what it's worth, the error message you saw was not from poppr itself, but from the eigen function. Now that I know this kind of error can occur, I will improve the poppr.amova() function to catch this situation early and report an error message that gives better guidance.

All the best,
Zhian

Reply all
Reply to author
Forward
0 new messages