Identical values in output of poppr

Vojtěch Zeisek

Apr 25, 2025, 3:57:12 AM
to poppr
Hello,
I have an input VCF processed by GATK (and loaded via vcfR) that looks like this:

allinds.vcf
***** Object of Class vcfR *****
2910 samples
1166 CHROMs
7,433 variants
Object size: 416.5 Mb
17.39 percent missing data
***** ***** *****

I converted it into a genind object and filtered out loci with too much missing
data:

allinds.genind <- vcfR2genind(x=allinds.vcf, ploidy=2, type="codom")

allinds.genind
/// GENIND OBJECT /////////

// 2,910 individuals; 7,433 loci; 14,810 alleles; size: 169.4 Mb

// Basic content
@tab: 2910 x 14810 matrix of allele counts
@loc.n.all: number of alleles per locus (range: 1-2)
@loc.fac: locus factor for the 14810 columns of @tab
@all.names: list of allele names for each locus
@ploidy: ploidy of each individual (range: 2-2)
@type: codom
@call: adegenet::df2genind(X = t(x), sep = sep, ploidy = 2, type = "codom")

// Optional content
@pop: population of each individual (group size range: 12-20)

allinds.genind.cor <- missingno(pop=allinds.genind, type="loci", cutoff=0.05,
quiet=FALSE)

allinds.genind.cor
/// GENIND OBJECT /////////

// 2,910 individuals; 2,330 loci; 4,621 alleles; size: 53 Mb

// Basic content
@tab: 2910 x 4621 matrix of allele counts
@loc.n.all: number of alleles per locus (range: 1-2)
@loc.fac: locus factor for the 4621 columns of @tab
@all.names: list of allele names for each locus
@ploidy: ploidy of each individual (range: 2-2)
@type: codom
@call: .local(x = x, i = i, j = j, drop = drop)

// Optional content
@pop: population of each individual (group size range: 12-20)

I then ran poppr like this:

allinds.popst <- poppr(dat=allinds.genind.cor, total=FALSE, sample=1000,
method=4, missing="geno", cutoff=0.1, quiet=FALSE, clonecorrect=FALSE,
plot=TRUE, index="rbarD", minsamp=1, legend=TRUE)

This is a large and diverse dataset covering 148 populations across several
European countries, so it is surprising that most of the populations have a
value of H of *exactly* 2.99573227355399, including populations from different
countries. Similarly, for most of the populations lambda is 0.95, E.5 is 1
and Hexp is 0. Finally, p.Ia and p.rD are mostly 0.000999000999000999. So the
only number that really varies is Ia. Not surprisingly, N is equal to MLG.
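
As a quick arithmetic check on the constants quoted above, here is a minimal
sketch in base R. It assumes a population of exactly 20 individuals that are
all distinct multilocus genotypes (N = 20 is an assumption taken from the upper
end of the reported group sizes), and it uses the textbook definitions of the
Shannon-Wiener index, Simpson's index and the E.5 evenness, which are assumed
here to match what poppr reports. The last line assumes a permutation p-value
of the form (b + 1) / (n + 1) with n = 1000 permutations and b = 0; that
convention is a guess that happens to reproduce the reported number, not
something confirmed from the poppr source.

```r
# Hypothetical population: N individuals, every one a unique MLG
N <- 20
p <- rep(1 / N, N)            # MLG frequencies when N equals MLG

H      <- -sum(p * log(p))    # Shannon-Wiener index; equals log(N) here
lambda <- 1 - sum(p^2)        # Simpson's index; equals 1 - 1/N here
G      <- 1 / sum(p^2)        # Stoddart and Taylor's G; equals N here
E5     <- (G - 1) / (exp(H) - 1)  # evenness; equals 1 when all p are equal

# Assumed convention: (b + 1) / (n + 1) with b = 0 hits out of n = 1000
p_min <- (0 + 1) / (1000 + 1)

print(c(H = H, lambda = lambda, E.5 = E5, p.min = p_min), digits = 15)
```

If these agree with the table, the repeated values would simply reflect
populations of 20 individuals with no repeated genotypes rather than a
numerical problem. Populations of other sizes should then show log(N) and
1 - 1/N for their own N.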

We really did not expect to get such identical numbers. I wonder what the
reason could be. Is it some numerical issue in how the indices are
calculated, and can we rely on the results?

Would anyone have any idea what's going on and how to check that the results
are correct?

My system is:

sessionInfo()
R version 4.5.0 (2025-04-11)
Platform: x86_64-suse-linux-gnu
Running under: openSUSE Tumbleweed

Matrix products: default
BLAS: /usr/lib64/R/lib/libRblas.so
LAPACK: /usr/lib64/R/lib/libRlapack.so; LAPACK version 3.12.1

locale:
[1] LC_CTYPE=cs_CZ.UTF-8 LC_NUMERIC=C
[3] LC_TIME=cs_CZ.UTF-8 LC_COLLATE=cs_CZ.UTF-8
[5] LC_MONETARY=cs_CZ.UTF-8 LC_MESSAGES=cs_CZ.UTF-8
[7] LC_PAPER=cs_CZ.UTF-8 LC_NAME=cs_CZ.UTF-8
[9] LC_ADDRESS=cs_CZ.UTF-8 LC_TELEPHONE=cs_CZ.UTF-8
[11] LC_MEASUREMENT=cs_CZ.UTF-8 LC_IDENTIFICATION=cs_CZ.UTF-8

time zone: Europe/Prague
tzcode source: system (glibc)

attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base

other attached packages:
[1] hierfstat_0.5-11 pegas_1.3 poppr_2.9.6 adegenet_2.1.11
[5] ade4_1.7-23 vcfR_1.15.0 ape_5.8-1 rkward_0.8.1

loaded via a namespace (and not attached):
[1] generics_0.1.3 stringi_1.8.7 lattice_0.22-7 digest_0.6.37
[5] magrittr_2.0.3 grid_4.5.0 fastmap_1.2.0 seqinr_4.2-36
[9] plyr_1.8.9 Matrix_1.7-3 promises_1.3.2 mgcv_1.9-3
[13] viridisLite_0.4.2 scales_1.3.0 permute_0.9-7 cli_3.6.4
[17] shiny_1.10.0 rlang_1.1.6 munsell_0.5.1 splines_4.5.0
[21] vegan_2.6-10 tools_4.5.0 reshape2_1.4.4 pinfsc50_1.3.0
[25] dplyr_1.1.4 colorspace_2.1-1 ggplot2_3.5.2 httpuv_1.6.16
[29] boot_1.3-31 vctrs_0.6.5 R6_2.6.1 mime_0.13
[33] lifecycle_1.0.4 stringr_1.5.1 MASS_7.3-65 cluster_2.1.8.1
[37] pkgconfig_2.0.3 pillar_1.10.2 later_1.4.2 gtable_0.3.6
[41] glue_1.8.0 Rcpp_1.0.14 tibble_3.2.1 tidyselect_1.2.1
[45] xtable_1.8-4 htmltools_0.5.8.1 nlme_3.1-168 igraph_2.1.4
[49] compiler_4.5.0 polysat_1.7-7

Sincerely,
Vojtěch

--
Vojtěch Zeisek
https://trapa.cz/en/

Department of Botany, Faculty of Science
Charles University, Prague, Czech Republic
https://botany.natur.cuni.cz/

Institute of Botany, Czech Academy of Sciences
Průhonice, Czech Republic
https://www.ibot.cas.cz/en/
Computing cluster
https://sorbus.ibot.cas.cz/en/start