AMOVA Error

517 views
Skip to first unread message

meharji....@helsinki.fi

unread,
Feb 7, 2019, 4:08:41 AM2/7/19
to po...@googlegroups.com

Dear Kamvar,

I have been trying to implement AMOVA on my dataset and constantly failing with the below error after running the command for 2-3hrs. Could you please help what has been wrong?

BTW i have created the GENIND object for my dataframe with df2genind function as shown below:

> snp.df[1:5,1:5]
        chr1_252419 chr1_363813 chr1_602567 chr1_924964 chr1_1079019
AC            0           0           3           0            0
AM            0           0           3           3            0
BB            0           0           1           0            0
BC            0           0           3           1            0
BC            1           0           1           0            0

> snp.genind<-df2genind(snp.df,ncode=1,strata=snp.strata)

But the poppr.amova function fails:


> snp.genind
/// GENIND OBJECT /////////

 // 142 individuals; 41,777 loci; 153,762 alleles; size: 116.5 Mb

 // Basic content
   @tab:  142 x 153762 matrix of allele counts
   @loc.n.all: number of alleles per locus (range: 1-5)
   @loc.fac: locus factor for the 153762 columns of @tab
   @all.names: list of allele names for each locus
   @ploidy: ploidy of each individual  (range: 2-2)
   @type:  codom
   @call: df2genind(X = snp.df, ncode = 1, strata = snp.strata)

 // Optional content
   @strata: a data frame with 2 columns ( Breed, Sample )

> poppr.amova(snp.genind, ~Breed/Sample)

 No missing values detected.

Error in `.rowNamesDF<-`(x, value = value) : invalid 'row.names' length


Regards

 

Zhian Kamvar

unread,
Feb 7, 2019, 6:15:09 AM2/7/19
to Arumilli, Meharji, poppr
Hi,

I think this may be due to the fact that your data appear haploid, but df2genind is treating it as diploid. Is this your intention? 

Do you get the same behavior with a small subset of your data?

Best,
Zhian

On Feb 7, 2019, at 18:08 , meharji....@helsinki.fi wrote:

I have been trying to implement AMOVA on my dataset and constantly failing with the below error after running the command for 2-3hrs. Could you please help what has been wrong?

I have created the GENIND object for my dataframe with df2genind function as shown below:

--
You received this message because you are subscribed to the Google Groups "poppr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to poppr+un...@googlegroups.com.
To post to this group, send email to po...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/poppr/58497e8d-34e5-4fc1-a1a8-1977f68de4f1%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

meharji....@helsinki.fi

unread,
Feb 7, 2019, 8:55:37 AM2/7/19
to po...@googlegroups.com
Hi,

The data is diploid SNP genotype data. However, the genotypes are coded as 0 (reference), 1 (heterozygous) , 2 (homozygous) and -1 (missing). The df2genind is used on this matrix.

Now, i have used df2genind with sep="" and got the below result on small dataset. Is this a valid approach? and if yes, why are the componentsofcovariance NaN.

 > snp.genind<-df2genind(test.snp.df,sep="",strata=snp.strata)
> poppr.amova(snp.genind, ~Breed/Sample)

 No missing values detected.

$call
ade4::amova(samples = xtab, distances = xdist, structures = xstruct)

$results
                              Df   Sum Sq  Mean Sq
Between Breed                 45 2163.562 48.07916
Between samples Within Breed  96 2574.924 26.82212
Within samples                 0    0.000      NaN
Total                        141 4738.486 33.60628

$componentsofcovariance
                                         Sigma   %
Variations  Between Breed                  NaN NaN
Variations  Between samples Within Breed   NaN NaN
Variations  Within samples                 NaN NaN
Total variations                           NaN NaN

$statphi
                  Phi
Phi-samples-total NaN
Phi-samples-Breed NaN
Phi-Breed-total   NaN

Regards

Zhian Kamvar

unread,
Feb 7, 2019, 10:35:31 PM2/7/19
to Arumilli, Meharji, poppr
Hi, 

I see where the confusion lies. You do not want a genind object. Instead you want to create a genlight object (for details, see the Genomics tutorial here: https://github.com/thibautjombart/adegenet/wiki/Tutorials). You can create a genlight object by using:

snp.genlight  <- new("genlight", test.snp.df, strata = snp.strata)
amova.results <- poppr.amova(snp.genlight, ~Breed/Sample)

A couple of things to note:

1. Change the -1 in your data to NA with test.snp.df[test.snp.df == -1] <- NA
2. Your previous output shows several "3"s in your snp data, so you may want to confirm that your data actually represent dosage. 

I would highly recommend for you to read the documentation for poppr.amova because there are many assumptions in the analysis: https://grunwaldlab.github.io/poppr/reference/poppr.amova.html 

Best,
Zhian

On Feb 7, 2019, at 22:55 , meharji....@helsinki.fi wrote:

Hi,

The data is diploid SNP genotype data. However, the genotype are coded as 0 (reference), 1 (heterozygous) , 2 (homozygous) and -1 (missing). The df2genind is used on this matrix.

meharji....@helsinki.fi

unread,
Feb 8, 2019, 7:54:00 AM2/8/19
to poppr
Hi,

I tried changing -1 to NA and used popp.amova with and without freq option as stated in the Ploidy section of the documentation. However, both the commands failed as shown below. I wonder whether my dataset could be used here.


>  test.snp.df[test.snp.df == -1] <- NA
> snp.genlight  <- new("genlight", test.snp.df, strata = snp.strata)

> amova.results <- poppr.amova(snp.genlight, ~Breed/Sample)
Error in DISTFUN(mpop, ...) : 
  min(ploidy(x)) == 2 || min(ploidy(x)) == 1 is not TRUE

> amova.results <- poppr.amova(snp.genlight, ~Breed/Sample,freq="TRUE")
Error in freq & codominant : 
  operations are possible only for numeric, logical or complex types

Zhian Kamvar

unread,
Feb 8, 2019, 8:47:56 PM2/8/19
to meharji....@helsinki.fi, poppr
Hi,

Your data do not appear to conform to the specifications you claim:

The data is diploid SNP genotype data. However, the genotype are coded as 0 (reference), 1 (heterozygous) , 2 (homozygous) and -1 (missing)

The example of the data you posted clearly had 3s represented. Genlight objects assume that the numbers represent counts of minor alleles, so the presence of a 3 would indicate triploid data, which is why you get this error:

> amova.results <- poppr.amova(snp.genlight, ~Breed/Sample)
Error in DISTFUN(mpop, ...) : 
  min(ploidy(x)) == 2 || min(ploidy(x)) == 1 is not TRUE

I would recommend inspecting your original data to see why you are getting apparently triploid samples in an otherwise diploid organism. If your data are from a VCF file, I would recommend using the vcfR package to read in the data and use the vcfR2genlight function to convert. 

Hope that helps,
Zhian



Sent from my iPhone

Arumilli, Meharji

unread,
Feb 9, 2019, 12:19:24 PM2/9/19
to Zhian Kamvar, poppr
Thank you. In summary, yes the data is diploid SNP genotype data. However, the recoded genotypes make it look like dosage data which is why we see 3 in the matrix. Also, in the data to be analyzed we have copy number states which are for eg: 4,5,6…. So i assume AMOVA doesn’t work in such cases. Correct me if iam wrong.

Zhian Kamvar

unread,
Feb 9, 2019, 7:49:03 PM2/9/19
to Arumilli, Meharji, poppr
If I am understanding you correctly, the numbers in your data represent genotypes and not alleles? 

The problem here is that it’s not clear to poppr how these numbers are related to each other, so it’s impossible to tell if two heterozygous genotypes share alleles, or even which genotypes are heterozygous. If you perform AMOVA on these data, it will give you incorrect results simply due to the fact that distances will be artificially inflated for each heterozygous genotype. 

I would highly recommend for you to record your data in a way that shows the alleles:

If your data are all biallelic (e.g. each locus has only two possible alleles), then you can use the genlight object and should recode like so:
homozygous genotypes (major allele) to 0
heterozygous genotypes to 1
homozygous genotypes (minor allele) to 2

If, however, you can have multiple alleles at a single locus, then you should use the genind object and record your data to the character strings the alleles represent. (e.g. if 3 represents homozygous C, then code it as “CC”).

From the output of your original data, I suspect that you have the second scenario. 

I hope that helps,
Zhian

Sent from my iPhone

meharji....@helsinki.fi

unread,
Mar 1, 2019, 9:19:47 AM3/1/19
to po...@googlegroups.com
Dear Zhian,

Based on your suggestion i have created a bi-allelic VCF file and loaded with vcfR and converted to genlight object


> vcf <- read.vcfR("SNP.syn.biallelic.recode.vcf", verbose = FALSE )
> snp.genlight<-vcfR2genlight(vcf)

> pop(snp.genlight) <- group   ##add population group

> head(group)
    Sample Group
1:   A065    AC
2: AM6B1    AM
3:   BB11    BB
4:  B1    BC
5:  BC1026    BC
6:  BC1028    BC

> snp.genlight
 /// GENLIGHT OBJECT /////////

 // 142 genotypes,  41,570 binary SNPs, size: 5.3 Mb
 131852 (2.23 %) missing data

 // Basic content
   @gen: list of 142 SNPbin

 // Optional content
   @ind.names:  142 individual labels
   @loc.names:  41570 locus labels
   @chromosome: factor storing chromosomes of the SNPs
   @position: integer storing positions of the SNPs
   @pop: population of each individual (group size range: 1-22)
   @other: a list containing: elements without names 

> amova.results <- poppr.amova(snp.genlight, ~Group/Sample)
Error: Number of rows in data frame not equal to number of individuals in object.
In addition: Warning messages:
1: Cannot set the population from an empty strata 
2: In poppr.amova(snp.genlight, ~Group/Sample) :
  Missing data are not filtered from genlight data.
3: In make_haplotypes_genlight(gid) :
  No strata found... creating one from the population
  
I have verified the sample names in the vcf and with sample names in groups dataframe. All the samples is VCF are in the groups data frame.
  
> all(sort(colnames(vcf@gt)[-1]) == sort(Group$Sample))
[1] TRUE

And the length of the samples is also same.
> length(colnames(vcf@gt)[-1])
[1] 142
> length(Group$Sample)
[1] 142

Could you hint what has been going wrong here while running 'poppr.amova'.


Reply all
Reply to author
Forward
0 new messages