Getting silly, cryptic errors, trying to start my amova (FUN.VALUE = integer(nInd(poly)), MARGIN = 1, : values must be type 'integer', but FUN(X[[1]]) result is type 'double')


Niklas

Jan 9, 2020, 12:34:32 PM
to poppr
Hi there !

I hope somebody can help me with an issue getting an AMOVA to run on my data.

I start with a VCF file that I turned into a genind object. After creating a subset of my samples,
I turned it into a genclone to write information about the populations into the strata.

########
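library(vcfR)
library(poppr)

# Read the VCF and convert it to a genind object
svcf <- read.vcfR("sali1.vcf")
s1 <- vcfR2genind(svcf)
m <- as.matrix(s1)
as.matrix(s1)[1:5, 1:3]

# Subsets
di.names <- read.csv("Namen diploid.txt", header = FALSE); di.names <- as.matrix(di.names)
indNames(s1)
s1dsub <- s1[c(di.names), ]

# Conversion to genclone (described above but not shown in the code as posted)
dclone <- as.genclone(s1dsub)

# Population information goes into the strata
pop.datad <- read.csv("pop.datad.csv", sep = ";", header = TRUE)
strata(dclone) <- data.frame(pop.datad$Location)
dclone@pop <- (pop.datad$Location)

damova <- poppr.amova(dclone, ~pop.datad.Location, missing = "mean")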

########
After running the command I get the following error message:

 Replaced 1190162 missing values.
Error in vapply(ploc, FUN = apply, FUN.VALUE = integer(nInd(poly)), MARGIN = 1,  : 
  values must be type 'integer',
 but FUN(X[[1]]) result is type 'double'
In addition: Warning messages:
1: In validityMethod(as(object, superClass)) :
  @tab does not contain integers; as of adegenet_2.0-0, numeric values are no longer used
2: In validityMethod(as(object, superClass)) :
  @tab does not contain integers; as of adegenet_2.0-0, numeric values are no longer used



Sadly, I am not much of a pro in R or in understanding these programs, and I just wish
to get some information out of this population comparison that I worked my butt off to get.

Maybe somebody has an idea of what the problem might be.
As I am not too familiar with this kind of troubleshooting, please tell me what further information
is required to understand the issue and I'll provide it ASAP !

Cheers !

Niklas

Zhian Kamvar

Jan 10, 2020, 4:45:26 AM
to Niklas, poppr
Forgot to send this to the group:

Hello,

AMOVA is failing because there is a ~slight~ bug when running AMOVA with missing data replaced by mean allele frequencies (missing = "mean").
AMOVA can handle missing data if you set missing = "asis", so use that instead of "mean".
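
The error message itself can be reproduced in base R, which may make it less cryptic. This is only a sketch of the mechanism, assuming that mean replacement turns the integer allele counts into fractional values (which is what the "@tab does not contain integers" warning suggests); `counts` and `impute_mean` are made-up names for illustration, not poppr internals.

```r
# Base R only: reproduce the same class of error outside of poppr.
# `counts` stands in for an allele-count table (hypothetical toy data).
counts <- list(locus1 = c(2L, 0L, NA), locus2 = c(1L, 1L, 2L))

# Replacing NA with the mean turns integer counts into doubles.
impute_mean <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}
imputed <- lapply(counts, impute_mean)
typeof(imputed$locus1)

# vapply() refuses a double result when the template (FUN.VALUE) is integer.
res <- tryCatch(
  vapply(imputed, function(x) x, FUN.VALUE = integer(3)),
  error = function(e) conditionMessage(e)
)
res
```

Here `res` holds the error text rather than a matrix, because the template asks for integers and the imputed values are doubles.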

A couple of side notes:

1. There is a known issue with handling genind data from vcfR2genind() from the current CRAN version of vcfR. When this happens, within-individual variance cannot be calculated.
2. When you set the strata, you need only use the data frame: strata(dclone) <- pop.datad
    from there, you can use poppr.amova(dclone, ~Location, missing = "asis"), which tells poppr.amova to take the Location column from the strata.
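
A minimal sketch of side note 2, using the nancycats data that ships with adegenet instead of the original files. The data frame `pop.data` and its `Location` column are stand-ins made up for this example; whether the AMOVA itself completes with missing = "asis" depends on how much data is missing.

```r
library(poppr)

data(nancycats)

# Stand-in metadata table: one row per sample, in the same order as the samples.
pop.data <- data.frame(Location = pop(nancycats))

# Assign the whole data frame as strata; its column names become strata names.
strata(nancycats) <- pop.data
names(strata(nancycats))

# The formula refers to the strata column by name.
res <- poppr.amova(nancycats, ~Location, missing = "asis")
res$statphi
```

Because the formula is looked up in the strata, there is no need to set the @pop slot by hand before calling poppr.amova().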

Hope that helps,
Zhian

--
You received this message because you are subscribed to the Google Groups "poppr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to poppr+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/poppr/54da3f65-fc29-42a2-bc67-0590b2f81bb5%40googlegroups.com.

Niklas

Jan 16, 2020, 11:55:03 AM
to poppr
Hi there.

Thanks for your quick reply back then !
I just had the time to try everything out.

THAT problem seems to be solved, though the next one came up.

##########

Error in eigen(delta, symmetric = TRUE, only.values = TRUE) : 
  infinite or missing values in 'x'
In addition: Warning messages:
1: In poppr.amova(dclone, ~pop.datad.Location, missing = "asis") :
  Data with mixed ploidy or ambiguous allele dosage cannot have within-individual variance calculated until the dosage is correctly estimated.

 This function will return the summary statistic, rho (Ronfort et al 1998) but be aware that this estimate will be skewed due to ambiguous dosage. If you have zeroes encoded in your data, you may wish to remove them.
 To remove this warning, use within = FALSE
2: In is.euclid(xdist) : Zero distance(s)

##########

But it seems that the amount of missing data in my data set is simply too high: I tried the other options for "missing"
and all gave different messages hinting at too much missing data.

Thanks though.

Cheers mate !

Niklas

Zhian Kamvar

Jan 16, 2020, 12:10:24 PM
to Niklas, poppr
I would suggest that you try using the `incomp()` function to find any genotypes that are incompatible with other genotypes (e.g. they share no alleles in common). This function returns a square matrix of your samples with a 1 if they are compatible and 0 if they are incompatible. This will give you a quick idea of which samples you need to remove.

Also, vcfR was just recently updated on CRAN, so I would highly recommend you use `vcfR2genind(svcf, return.alleles = TRUE)` to allow for within-sample variance to be calculated.
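
As a rough sketch of that conversion, the small `vcfR_test` data set bundled with vcfR can stand in for the original VCF file (an assumption made so the example is self-contained; with real data the object would come from `read.vcfR()` as earlier in the thread).

```r
library(vcfR)
library(adegenet)

# Small example VCF object shipped with vcfR; stands in for read.vcfR("sali1.vcf").
data(vcfR_test)

# Keep the actual allelic states instead of 0/1/2 codes.
gid <- vcfR2genind(vcfR_test, return.alleles = TRUE)
gid
nInd(gid)
```

The resulting genind object can then be converted with as.genclone() and given strata in the same way as before.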

Here is an example of using `incomp()` to find and remove incomparable samples.

library(poppr)
data(nancycats)
strata(nancycats) <- data.frame(p = pop(nancycats))
nan <- nancycats[pop = c(1, 17), loc = c(1, 4)]
poppr.amova(nan, ~p, missing = "asis")
#> Warning in is.euclid(xdist): Zero distance(s)
#> Error in eigen(delta, symmetric = TRUE, only.values = TRUE): infinite or missing values in 'x'

rowSums(incomp(nan))
#> N215 N216 N282 N283 N288 N291 N292 N293 N294 N295 N296 N297 N281 N289 N290 
#>    2    2   13   13   13   13   13   13   13   13   13   13   13   13   13

poppr.amova(nan[rowSums(incomp(nan)) > 2, ], ~p, missing = "asis")
#> Warning in is.euclid(xdist): Zero distance(s)
#> Distance matrix is non-euclidean.
#> Using quasieuclid correction method. See ?quasieuclid for details.
#> Warning in is.euclid(distmat): Zero distance(s)
#> $call
#> ade4::amova(samples = xtab, distances = xdist, structures = xstruct)
#> 
#> $results
#>                          Df    Sum Sq   Mean Sq
#> Between p                 1  3.238913 3.2389135
#> Between samples Within p 17 16.886126 0.9933015
#> Within samples           19 30.659856 1.6136766
#> Total                    37 50.784895 1.3725647
#> 
#> $componentsofcovariance
#>                                           Sigma          %
#> Variations  Between p                 0.1212120   8.507891
#> Variations  Between samples Within p -0.3101875 -21.772115
#> Variations  Within samples            1.6136766 113.264224
#> Total variations                      1.4247011 100.000000
#> 
#> $statphi
#>                           Phi
#> Phi-samples-total -0.13264224
#> Phi-samples-p     -0.23796713
#> Phi-p-total        0.08507891


Niklas

Jan 16, 2020, 1:16:29 PM
to poppr
Is this good ?

+++++

> rowSums(incomp(dclone))
numeric(0)

> samova<-poppr.amova(dclone, ~Location, missing = "asis")
Distance matrix is non-euclidean.
Using quasieuclid correction method. See ?quasieuclid for details.
Warning messages:
1: In is.euclid(xdist) : Zero distance(s)
2: In is.euclid(distmat) : Zero distance(s)


> samova
$call
ade4::amova(samples = xtab, distances = xdist, structures = xstruct)

$results
                                 Df    Sum Sq   Mean Sq
Between Location                 13  93027.11 7155.9316
Between samples Within Location 243 236804.08  974.5024
Within samples                  257 472072.13 1836.8565
Total                           513 801903.32 1563.1644

$componentsofcovariance
                                                Sigma         %
Variations  Between Location                 169.1136  10.73878
Variations  Between samples Within Location -431.1771 -27.37992
Variations  Within samples                  1836.8565 116.64114
Total variations                            1574.7931 100.00000

$statphi
                            Phi
Phi-samples-total    -0.1664114
Phi-samples-Location -0.3067392
Phi-Location-total    0.1073878


+++++

Zhian Kamvar

Jan 17, 2020, 4:44:22 AM
to Niklas, poppr
Hi Niklas,

This looks fine. The negative variance is a thing that does tend to happen and has been addressed on this board: https://groups.google.com/d/msg/poppr/NSag-55d6bs/cfvOSV-VAQAJ
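
To see how the negative variance component feeds into the Phi statistics, the values can be recomputed by hand from the Sigma column posted above. This sketch assumes the usual AMOVA definitions of Phi as ratios of variance components; the variable names are made up for the example.

```r
# Variance components (Sigma) copied from the $componentsofcovariance table above.
sigma_between_loc <- 169.1136   # Between Location
sigma_between_smp <- -431.1771  # Between samples Within Location
sigma_within_smp  <- 1836.8565  # Within samples
sigma_total <- sigma_between_loc + sigma_between_smp + sigma_within_smp

phi <- c(
  "Phi-samples-total"    = (sigma_between_loc + sigma_between_smp) / sigma_total,
  "Phi-samples-Location" = sigma_between_smp / (sigma_between_smp + sigma_within_smp),
  "Phi-Location-total"   = sigma_between_loc / sigma_total
)
round(phi, 4)
```

The recomputed values agree with the $statphi table to rounding, so the two negative Phi values trace back to the single negative "Between samples Within Location" component.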

The fact that your data set passed through poppr.amova() the second time without any errors about missing data means that using the return.alleles = TRUE option with vcfR2genind() was the right way to go. I'll make a separate post on the forum explaining why.

Hope that helps.


Best,
Zhian


Niklas

Jan 18, 2020, 4:58:59 AM
to poppr
Hi Zhian,

You helped me more than you can imagine and helped me a lot with one of my chapters.

Thanks for your direct answers and the helpful advice beyond that.
Otherwise I would have run into traps like the old vcfR2genind versions.

Thanks so much,
I am very grateful !

Cheers,
Niklas