Poppr: index of association


roguestorm78

Nov 13, 2014, 1:49:22 PM
to po...@googlegroups.com
Hi,

I am trying to perform an Index of Association test using poppr on SNP data that I've imported into a genind object.  The object contains 3 populations with multiple individuals in each population.  However, when I run the ia() test, I get the error "Error in .Call("pairwise_covar", vard.vector, PACKAGE = "poppr") : negative length vectors are not allowed".  What's going on here?  Are my variances coming up as negative values?

Thanks,

Zhian Kamvar

Nov 13, 2014, 2:08:06 PM
to po...@googlegroups.com
Hi Johanna,

Thanks for the bug report. A quick google search of the error message "negative length vectors are not allowed" turned this up: http://alyssafrazee.com/errors.html:

this is the error you get in R when you try to make a vector with a length of 2^31 or more.

Can you post the results of sessionInfo()? 

I have a feeling that you might be using the 32-bit version or an older version of R (< 3.0), both of which limit the size of vectors. I believe that in the 64-bit versions of R (3.0 and later), the size of a vector is limited only by the size of your RAM.
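For reference, you can check the classic limit from R itself (a minimal sketch, not poppr-specific):

```r
# The historical vector-length limit that triggers this error:
.Machine$integer.max   # 2^31 - 1 = 2147483647
# 64-bit builds of R (>= 3.0) support "long vectors" beyond this limit,
# so on such a build the practical bottleneck is usually RAM instead.
```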

Hope that helps,
Zhian

roguestorm78

Nov 13, 2014, 2:11:26 PM
to po...@googlegroups.com
Hi Zhian,

Thanks for your reply!  Here's the sessionInfo() output:

R version 3.1.1 (2014-07-10)
Platform: x86_64-apple-darwin13.1.0 (64-bit)

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] poppr_1.1.2    adegenet_1.4-2 ade4_1.6-2    

loaded via a namespace (and not attached):
 [1] ape_3.1-4        bitops_1.0-6     caTools_1.17.1   colorspace_1.2-4
 [5] digest_0.6.4     fastmatch_1.0-4  ggplot2_1.0.0    grid_3.1.1      
 [9] gtable_0.1.2     htmltools_0.2.6  httpuv_1.3.0     igraph_0.7.1    
[13] labeling_0.3     lattice_0.20-29  MASS_7.3-33      Matrix_1.1-4    
[17] munsell_0.4.2    nlme_3.1-117     parallel_3.1.1   pegas_0.6       
[21] permute_0.8-3    phangorn_1.99-11 plyr_1.8.1       proto_0.3-10    
[25] Rcpp_0.11.2      reshape2_1.4     rgl_0.93.1098    RJSONIO_1.3-0   
[29] scales_0.2.4     shiny_0.10.1     stringr_0.6.2    tcltk_3.1.1     
[33] tools_3.1.1      vegan_2.0-10     xtable_1.7-4  

So I'm definitely using 64-bit R and a version > 3.0, but I have just seen that there's a new version of R (3.1.2), so I'll try upgrading to that and see if this gets sorted.

Thanks!

Jo

Zhian Kamvar

Nov 13, 2014, 2:50:03 PM
to po...@googlegroups.com
Hi Jo,

This is odd indeed. I usually expect this kind of error on Windows systems. My gut feeling is that you've reached the limit of your computer's RAM.

How many SNPs do you have in your data? I ask because R stores everything in memory. That error came from an internal function that calculates a pairwise covariance vector of length n*(n-1)/2, where n is the number of SNPs in your data. This is what that looks like in terms of memory usage:


print(object.size(vector(length = choose(10, 2), mode = "numeric")), units = "auto")
# 400 bytes  (10 SNPs)
print(object.size(vector(length = choose(100, 2), mode = "numeric")), units = "auto")
# 38.7 Kb    (100 SNPs)
print(object.size(vector(length = choose(1000, 2), mode = "numeric")), units = "auto")
# 3.8 Mb     (1,000 SNPs)
print(object.size(vector(length = choose(10000, 2), mode = "numeric")), units = "auto")
# 381.4 Mb   (10,000 SNPs)
print(object.size(vector(length = choose(10000, 2), mode = "numeric")) * 100, units = "auto")
# 37.2 Gb    (~100,000 SNPs; choose(1e5, 2) is about 100x choose(1e4, 2))

What this is saying is that you need more than 381.4 Mb of RAM to calculate the index of association on 10,000 SNPs, but to go to 100,000 SNPs, you would need over 37 Gb for that one vector alone.
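To make the scaling concrete, here is a minimal sketch (the function name pairwise_mem_gb is mine, not poppr's) that computes the size of that vector analytically, assuming 8 bytes per double:

```r
# Approximate memory (in Gb) needed for the pairwise covariance vector of
# n SNPs: there are choose(n, 2) = n*(n-1)/2 elements of 8 bytes each.
pairwise_mem_gb <- function(n) 8 * choose(n, 2) / 2^30

pairwise_mem_gb(10000)    # ~0.37 Gb, matching the 381.4 Mb above
pairwise_mem_gb(100000)   # ~37 Gb
pairwise_mem_gb(200000)   # ~149 Gb
```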

Basically, while you can use as many markers as you desire, poppr's performance degrades once you get over a few hundred. We are working on alleviating that, but we are still in the development phase.

Another thing to consider is that calculation of the index of association is very sensitive to linkage, so you want to make sure that your SNPs are not physically close together, or you might get spurious results.

Hope that helps,
Zhian 

roguestorm78

Nov 13, 2014, 4:13:07 PM
to po...@googlegroups.com
Hi Zhian,

That's interesting, thanks for pointing that out. I'm working with nearly 200k SNPs! So it probably is a RAM problem then. I can run this analysis on a high-performance computer though, so hopefully that will alleviate it.

Thank you so much for your help!

Jo

Zhian Kamvar

Nov 13, 2014, 5:12:10 PM
to po...@googlegroups.com
Hi Jo,

Forgive me if this is a bit assuming, but I would guess that you will find physical linkage in 200k SNPs. While you can calculate the index of association on 200k SNPs, I don't know if it should be done, considering that some pairs or groups of these loci will not be independent. A better approach might be a sliding window, or computing it on small subsets of loci to get a distribution of the index of association. It's important to remember that this approach was developed for a handful of loci, and it's unknown how well it scales to hundreds of thousands.
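The subset idea can be hand-rolled with ia() itself. A hedged sketch, using Pinf (an example data set shipped with poppr) in place of your own object; the [loc = ] subsetting and locNames() assume recent adegenet/poppr versions, so check your installed documentation:

```r
library(poppr)
data(Pinf)                 # example genclone data set bundled with poppr
set.seed(999)
# Repeatedly compute rbarD on random subsets of loci to get a distribution:
res <- replicate(49, {
  loci <- sample(locNames(Pinf), 5)            # 5 random loci per replicate
  ia(Pinf[loc = loci], sample = 0)["rbarD"]
})
hist(res, main = "rbarD across random 5-locus subsets")
```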

Regards,
Zhian 

roguestorm78

Nov 14, 2014, 3:51:22 AM
to po...@googlegroups.com
Hi Zhian,

No, not assuming at all, I completely agree with you!  I think picking a few loci (sort of like an in silico MLST approach) would be a good thing to do here.

Thanks for your help, I appreciate it,

Jo

Brian Knaus

Nov 14, 2014, 12:34:28 PM
to roguestorm78, po...@googlegroups.com
Hi Jo,

I was listening in on the group conversation and had an idea. I don't know what you're working with upstream of poppr, but if you're using VCF files, I think you can thin them with VCFtools. I believe you could use the --thin option to make sure you do not have markers within some integer distance (e.g., 1 kbp) of one another. You could then bring this thinned data set into poppr and try the IA again.
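For example (a sketch — the file names are placeholders; --vcf, --thin, --recode, and --out are standard VCFtools options):

```shell
# Keep only sites at least 1,000 bp apart; writes thinned.recode.vcf
vcftools --vcf input.vcf --thin 1000 --recode --out thinned
```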

Good luck!
Brian


--
You received this message because you are subscribed to the Google Groups "poppr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to poppr+un...@googlegroups.com.
To post to this group, send email to po...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/poppr/c6e21cf2-7520-46da-be3f-d5ba9abb1029%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.



--
Brian J. Knaus, Ph.D.
Research Plant Pathologist
USDA Agricultural Research Service
Horticultural Crops Research Unit
Corvallis, Oregon, USA
brianknaus.com
http://grunwaldlab.cgrb.oregonstate.edu/

Melanie Montes

Jan 24, 2017, 5:57:44 AM
to poppr
I am running into the same "negative length vectors are not allowed" error when trying to use the poppr() function on a dataset of 56 individuals with 55,288 SNPs.
It works on smaller subsets of the same data, so I understand that it must be a memory issue as well, but I am on a supercomputer, so this is a bit surprising to me. Here is the info on the R session and the available memory on the system:

> sessionInfo()
R version 3.2.3 (2015-12-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.1 LTS

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] poppr_2.3.0    adegenet_2.0.1 ade4_1.7-4    

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.8      spdep_0.6-8      plyr_1.8.4       pegas_0.9       
 [5] LearnBayes_2.15  tools_3.2.3      boot_1.3-17      digest_0.6.10   
 [9] tibble_1.2       nlme_3.1-124     gtable_0.2.0     lattice_0.20-33 
[13] mgcv_1.8-11      fastmatch_1.0-4  Matrix_1.2-3     igraph_1.0.1    
[17] shiny_0.14.2     DBI_0.5-1        parallel_3.2.3   coda_0.18-1     
[21] cluster_2.0.3    dplyr_0.5.0      stringr_1.1.0    gtools_3.5.0    
[25] grid_3.2.3       R6_2.2.0         phangorn_2.0.4   sp_1.2-3        
[29] gdata_2.17.0     ggplot2_2.2.0    reshape2_1.4.2   seqinr_3.3-3    
[33] deldir_0.1-12    magrittr_1.5     nnls_1.4         scales_0.4.1    
[37] htmltools_0.3.5  MASS_7.3-45      splines_3.2.3    gmodels_2.16.2  
[41] assertthat_0.1   permute_0.9-4    mime_0.5         ape_4.0         
[45] colorspace_1.3-1 xtable_1.8-2     httpuv_1.3.3     quadprog_1.5-5  
[49] stringi_1.1.2    lazyeval_0.2.0   munsell_0.4.3    vegan_2.4-1  

kxb901@melanie1:~$ free -m
              total        used        free      shared  buff/cache   available
Mem:         483751        1222      482528         117           0      482528
Swap:          1376         654         722  


I have no problem running all other functions, such as the index of association, on the same dataset.
I am able to get some of the diversity measures by using mlg.table and diversity_stats instead, but I would really also like to have the eMLG and Hexp. Is there another way to get these? Or is there a way to get these stats from a snpclone object instead? When I run poppr() on a snpclone object, it says to "use diversity_table()", but that function doesn't seem to exist?

Thanks as always for dealing with my amateur questions!
Cheers, 
Melanie 

Zhian Kamvar

Jan 24, 2017, 2:16:57 PM
to Melanie Montes, poppr
Hi Melanie,

First: the error message from poppr should read "diversity_stats()". This will be updated in a future version.

As mentioned before, the "negative length vectors not allowed" is a bit enigmatic, so it's not clear where exactly it's coming from. 

To calculate eMLG, you can use the vegan function rarefy:

diversity_stats(my_data, eMLG = function(x) vegan::rarefy(t(as.matrix(x)), 10))

Regarding Hexp: I don't have an example on hand, but if you look at the equation for Hexp here: https://cran.r-project.org/web/packages/poppr/vignettes/algo.pdf, you can calculate it "by hand" by converting the genlight object to a matrix and extracting the minor allele frequencies by taking the colSums and dividing that by n*ploidy. From there, you can just plug those into the equation and you should have Hexp.
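Putting that description into code, a hedged sketch: the toy 0/1/2 genotype matrix below stands in for as.matrix(your_genlight), the data are assumed diploid and biallelic, and the n/(n-1) bias correction is my reading of the vignette's equation — double-check it against the PDF:

```r
# Hexp "by hand" from a genotype matrix of alternate-allele counts.
mat    <- matrix(c(0, 1, 2,
                   2, 1, 0), nrow = 3)        # 3 individuals x 2 SNPs
n      <- nrow(mat)
ploidy <- 2
p      <- colSums(mat, na.rm = TRUE) / (n * ploidy)  # alternate-allele frequencies
hexp   <- (n / (n - 1)) * (1 - (p^2 + (1 - p)^2))    # per-locus expected het.
mean(hexp)                                   # 0.75 for this toy data
```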

Best,
Zhian

