Poppr: index of association


roguestorm78

Nov 13, 2014, 1:49:22 PM
to po...@googlegroups.com
Hi,

I am trying to perform an Index of Association test using poppr on SNP data that I've imported into a genind object.  The object contains 3 populations with multiple individuals in each population.  However, when I run the ia() test, I get the error "Error in .Call("pairwise_covar", vard.vector, PACKAGE = "poppr") : negative length vectors are not allowed".  What's going on here?  Are my variances coming up as negative values?

Thanks,

Zhian Kamvar

Nov 13, 2014, 2:08:06 PM
to po...@googlegroups.com
Hi Johanna,

Thanks for the bug report. A quick google search of the error message "negative length vectors are not allowed" turned this up: http://alyssafrazee.com/errors.html:

this is the error you get in R when you try to make a vector with a length of 2^31 or more.

Can you post the results of sessionInfo()? 

I have a feeling that you might be using the 32-bit version or an older version of R (< 3.0), both of which limit the size of vectors. I believe that in the 64-bit versions of R (3.0 and later), the size of a vector is limited only by the size of your RAM.
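For reference, you can check the classic limit from R itself (a minimal sketch, not poppr-specific):

```r
# The historical vector-length limit that triggers this error:
.Machine$integer.max   # 2^31 - 1 = 2147483647
# 64-bit builds of R (>= 3.0) support "long vectors" beyond this limit,
# so on such a build the practical bottleneck is usually RAM instead.
```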

Hope that helps,
Zhian

roguestorm78

Nov 13, 2014, 2:11:26 PM
to po...@googlegroups.com
Hi Zhian,

Thanks for your reply!  Here's the sessionInfo() output:

R version 3.1.1 (2014-07-10)
Platform: x86_64-apple-darwin13.1.0 (64-bit)

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] poppr_1.1.2    adegenet_1.4-2 ade4_1.6-2    

loaded via a namespace (and not attached):
 [1] ape_3.1-4        bitops_1.0-6     caTools_1.17.1   colorspace_1.2-4
 [5] digest_0.6.4     fastmatch_1.0-4  ggplot2_1.0.0    grid_3.1.1      
 [9] gtable_0.1.2     htmltools_0.2.6  httpuv_1.3.0     igraph_0.7.1    
[13] labeling_0.3     lattice_0.20-29  MASS_7.3-33      Matrix_1.1-4    
[17] munsell_0.4.2    nlme_3.1-117     parallel_3.1.1   pegas_0.6       
[21] permute_0.8-3    phangorn_1.99-11 plyr_1.8.1       proto_0.3-10    
[25] Rcpp_0.11.2      reshape2_1.4     rgl_0.93.1098    RJSONIO_1.3-0   
[29] scales_0.2.4     shiny_0.10.1     stringr_0.6.2    tcltk_3.1.1     
[33] tools_3.1.1      vegan_2.0-10     xtable_1.7-4  

So I'm definitely using 64-bit R and a version > 3.0, but I have just seen that there's a new version of R (3.1.2), so I'll try upgrading to that and see if this gets sorted.

Thanks!

Jo

Zhian Kamvar

Nov 13, 2014, 2:50:03 PM
to po...@googlegroups.com
Hi Jo,

This is odd indeed. I usually expect this kind of error on Windows systems. My gut feeling is that you've reached the limit of your computer's RAM.

How many SNPs do you have in your data? I ask because R stores everything in memory. That error came from an internal function that calculates a pairwise covariance vector of length n*(n-1)/2, where n is the number of SNPs in your data. This is what that looks like in terms of memory usage:


print(object.size(vector(length = choose(10, 2), mode = "numeric")), units = "auto")
# 400 bytes  (10 SNPs)
print(object.size(vector(length = choose(100, 2), mode = "numeric")), units = "auto")
# 38.7 Kb    (100 SNPs)
print(object.size(vector(length = choose(1000, 2), mode = "numeric")), units = "auto")
# 3.8 Mb     (1,000 SNPs)
print(object.size(vector(length = choose(10000, 2), mode = "numeric")), units = "auto")
# 381.4 Mb   (10,000 SNPs)
print(object.size(vector(length = choose(10000, 2), mode = "numeric")) * 100, units = "auto")
# 37.2 Gb    (~100,000 SNPs; choose(1e5, 2) is about 100x choose(1e4, 2))

What this is saying is that you need more than 381.4 Mb of RAM to calculate the index of association on 10,000 SNPs, but to go to 100,000 SNPs, you would need over 37 Gb for that one vector alone.
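To make the scaling concrete, here is a minimal sketch (the function name pairwise_mem_gb is mine, not poppr's) that computes the size of that vector analytically, assuming 8 bytes per double:

```r
# Approximate memory (in Gb) needed for the pairwise covariance vector of
# n SNPs: there are choose(n, 2) = n*(n-1)/2 elements of 8 bytes each.
pairwise_mem_gb <- function(n) 8 * choose(n, 2) / 2^30

pairwise_mem_gb(10000)    # ~0.37 Gb, matching the 381.4 Mb above
pairwise_mem_gb(100000)   # ~37 Gb
pairwise_mem_gb(200000)   # ~149 Gb
```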

Basically, while you can use as many markers as you desire, poppr's performance degrades once you get over a few hundred. We are working on alleviating that, but we are still in the development phase.

Another thing to consider is that calculation of the index of association is very sensitive to linkage, so you want to make sure that your SNPs are not physically close together, or you might get spurious results.

Hope that helps,
Zhian 

roguestorm78

Nov 13, 2014, 4:13:07 PM
to po...@googlegroups.com
Hi Zhian,

That's interesting, thanks for pointing that out. I'm working with nearly 200k SNPs! So it probably is a RAM problem then. I can run this analysis on a high-performance computer though, so hopefully that will alleviate it.

Thank you so much for your help!

Jo

Zhian Kamvar

Nov 13, 2014, 5:12:10 PM
to po...@googlegroups.com
Hi Jo,

Forgive me if this is a bit assuming, but I would guess that you will find physical linkage in 200k SNPs. While you can calculate the index of association on 200k SNPs, I don't know if it should be done, considering that some pairs or groups of these loci will not be independent. A better approach might be a sliding window, or computing it on small subsets of loci to get a distribution of the index of association. It's important to remember that this approach was developed for a handful of loci, and it's unknown how well it scales to hundreds of thousands.
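The subset idea can be hand-rolled with ia() itself. A hedged sketch, using Pinf (an example data set shipped with poppr) in place of your own object; the [loc = ] subsetting and locNames() assume recent adegenet/poppr versions, so check your installed documentation:

```r
library(poppr)
data(Pinf)                 # example genclone data set bundled with poppr
set.seed(999)
# Repeatedly compute rbarD on random subsets of loci to get a distribution:
res <- replicate(49, {
  loci <- sample(locNames(Pinf), 5)            # 5 random loci per replicate
  ia(Pinf[loc = loci], sample = 0)["rbarD"]
})
hist(res, main = "rbarD across random 5-locus subsets")
```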

Regards,
Zhian 

roguestorm78

Nov 14, 2014, 3:51:22 AM
to po...@googlegroups.com
Hi Zhian,

No, not assuming at all, I completely agree with you!  I think picking a few loci (sort of like an in silico MLST approach) would be a good thing to do here.

Thanks for your help, I appreciate it,

Jo

Brian Knaus

Nov 14, 2014, 12:34:28 PM
to roguestorm78, po...@googlegroups.com
Hi Jo,

I was listening in on the group conversation and had an idea. I don't know what you're working with upstream of poppr, but if you're using VCF files, I think you can thin them with VCFtools. I believe you could use the --thin option to make sure you do not have markers within some integer distance (e.g., 1 kbp) of one another. You could then bring this thinned data set into poppr and try the IA again.
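For example (a sketch — the file names are placeholders; --vcf, --thin, --recode, and --out are standard VCFtools options):

```shell
# Keep only sites at least 1,000 bp apart; writes thinned.recode.vcf
vcftools --vcf input.vcf --thin 1000 --recode --out thinned
```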

Good luck!
Brian


--
You received this message because you are subscribed to the Google Groups "poppr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to poppr+un...@googlegroups.com.
To post to this group, send email to po...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/poppr/c6e21cf2-7520-46da-be3f-d5ba9abb1029%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.



--
Brian J. Knaus, Ph.D.
Research Plant Pathologist
USDA Agricultural Research Service
Horticultural Crops Research Unit
Corvallis, Oregon, USA
brianknaus.com
http://grunwaldlab.cgrb.oregonstate.edu/

Melanie Montes

Jan 24, 2017, 5:57:44 AM
to poppr
I am running into the same "negative length vectors are not allowed" error when trying to use the poppr() function on a dataset of 56 individuals with 55,288 SNPs.
It works on smaller subsets of the same data, so I understand that it must be a memory issue as well, but I am on a supercomputer, so this is a bit surprising to me. Here is the info on the R session and the available memory on the system:

> sessionInfo()
R version 3.2.3 (2015-12-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.1 LTS

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] poppr_2.3.0    adegenet_2.0.1 ade4_1.7-4    

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.8      spdep_0.6-8      plyr_1.8.4       pegas_0.9       
 [5] LearnBayes_2.15  tools_3.2.3      boot_1.3-17      digest_0.6.10   
 [9] tibble_1.2       nlme_3.1-124     gtable_0.2.0     lattice_0.20-33 
[13] mgcv_1.8-11      fastmatch_1.0-4  Matrix_1.2-3     igraph_1.0.1    
[17] shiny_0.14.2     DBI_0.5-1        parallel_3.2.3   coda_0.18-1     
[21] cluster_2.0.3    dplyr_0.5.0      stringr_1.1.0    gtools_3.5.0    
[25] grid_3.2.3       R6_2.2.0         phangorn_2.0.4   sp_1.2-3        
[29] gdata_2.17.0     ggplot2_2.2.0    reshape2_1.4.2   seqinr_3.3-3    
[33] deldir_0.1-12    magrittr_1.5     nnls_1.4         scales_0.4.1    
[37] htmltools_0.3.5  MASS_7.3-45      splines_3.2.3    gmodels_2.16.2  
[41] assertthat_0.1   permute_0.9-4    mime_0.5         ape_4.0         
[45] colorspace_1.3-1 xtable_1.8-2     httpuv_1.3.3     quadprog_1.5-5  
[49] stringi_1.1.2    lazyeval_0.2.0   munsell_0.4.3    vegan_2.4-1  

kxb901@melanie1:~$ free -m
              total        used        free      shared  buff/cache   available
Mem:         483751        1222      482528         117           0      482528
Swap:          1376         654         722  


I have no problem running all other functions, such as the index of association, on the same dataset.
I am able to get some of the diversity measures by using mlg.table and diversity_stats instead, but I would really also like to have the eMLG and Hexp. Is there another way to get these? Or is there a way to get these stats from a snpclone object instead? When I run poppr() on a snpclone object, it says to "use diversity_table()", but that function doesn't seem to exist?

Thanks as always for dealing with my amateur questions!
Cheers, 
Melanie 

Zhian Kamvar

Jan 24, 2017, 2:16:57 PM
to Melanie Montes, poppr
Hi Melanie,

First: the error message from poppr should read "diversity_stats()". This will be updated in a future version.

As mentioned before, the "negative length vectors not allowed" is a bit enigmatic, so it's not clear where exactly it's coming from. 

To calculate eMLG, you can use the vegan function rarefy:

diversity_stats(my_data, eMLG = function(x) vegan::rarefy(t(as.matrix(x)), 10))

Regarding Hexp: I don't have an example on hand, but if you look at the equation for Hexp here: https://cran.r-project.org/web/packages/poppr/vignettes/algo.pdf, you can calculate it "by hand" by converting the genlight object to a matrix and extracting the minor allele frequencies by taking the colSums and dividing that by n*ploidy. From there, you can just plug those into the equation and you should have Hexp.
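Putting that description into code, a hedged sketch: the toy 0/1/2 genotype matrix below stands in for as.matrix(your_genlight), the data are assumed diploid and biallelic, and the n/(n-1) bias correction is my reading of the vignette's equation — double-check it against the PDF:

```r
# Hexp "by hand" from a genotype matrix of alternate-allele counts.
mat    <- matrix(c(0, 1, 2,
                   2, 1, 0), nrow = 3)        # 3 individuals x 2 SNPs
n      <- nrow(mat)
ploidy <- 2
p      <- colSums(mat, na.rm = TRUE) / (n * ploidy)  # alternate-allele frequencies
hexp   <- (n / (n - 1)) * (1 - (p^2 + (1 - p)^2))    # per-locus expected het.
mean(hexp)                                   # 0.75 for this toy data
```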

Best,
Zhian

