odd results - any ideas why?


Linda Lait

Jul 8, 2025, 2:52:48 PM
to structure-software
I have been running STRUCTURE on a number of RADseq datasets, and one of them keeps giving me odd results (I have run 7 others with no problems). I think it must have something to do with the samples themselves, but I am not sure what or why. In every run, it pulls out one large group (around 0.8) that is present in every individual, and then, as K increases, it separates the populations into their own groups. So there is underlying population structure, but something seems to be masking it.

I have included a picture of K=8, but the same pattern appears from K=2 onward. This dataset has 7 populations. I have tried running with and without LOCPRIOR (the picture is without). I have increased the run length to 1,000,000 burn-in and 1,000,000 post-burn-in iterations (the picture uses 100,000 and 300,000). I have also run it after removing every individual with more than 10% missing data. Finally, I have tried removing certain populations, and it seems that 4 of the 7 populations are causing this pattern.

Any ideas on what might be causing this, or what I can do to fix it (if it is fixable)?

Thanks!
Linda

K=8.png

Mattia De Vivo

Jul 9, 2025, 7:18:38 AM
to structure-software
Hi Linda,
I am not sure if this helps, but I had similar issues with the ipyrad version of STRUCTURE (see the attached file, with K=2), and I had to use ADMIXTURE to check population structure instead (it split the individuals almost perfectly into 2 clusters).

I think this pattern might be caused by one of the three potential reasons listed below, but I am not sure whether they apply to you:

1) K=1, which, to the best of my knowledge, STRUCTURE has trouble detecting (but it seems that you have ruled this option out);
2) The amount of missing data: according to ipyrad's creator, STRUCTURE is quite sensitive to it (the developers of the software may be able to tell us more here; also, 10% missing data seems low enough to me to avoid issues);
3) Linkage disequilibrium (although I suppose we can exclude it in your case, since you are using RADseq data and LD should have less effect).
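For point 2, one way to double-check per-individual missingness is a quick script. This is a sketch that assumes the data are also available as a VCF and that vcftools is installed (the thread doesn't say which pipeline was used). `vcftools --missing-indv` writes a `.imiss` table whose fifth column, F_MISS, is each individual's fraction of missing genotypes. The helper name `high_missing` and all file names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the IDs of individuals whose missing-genotype
# fraction is above a threshold. Assumes a vcftools --missing-indv table
# (*.imiss) with one header line and columns:
#   INDV  N_DATA  N_GENOTYPES_FILTERED  N_MISS  F_MISS
high_missing() {
  local threshold="${1:-0.1}"   # default 0.1, i.e. more than 10% missing
  shift || true                 # remaining arguments are .imiss files (or stdin)
  # FNR > 1 skips the header of each input file; $5 is F_MISS.
  awk -v t="$threshold" 'FNR > 1 && $5 > t { print $1 }' "$@"
}

# Usage (hypothetical file names):
#   vcftools --vcf data.vcf --missing-indv --out data      # writes data.imiss
#   high_missing 0.1 data.imiss > drop.txt
#   vcftools --vcf data.vcf --remove drop.txt --recode --out filtered
```

The comparison is strict (`>`), matching "more than 10% missing data". An individual at exactly 0.1 is kept.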

Let's see whether other people have encountered something similar and have other potential explanations. It may also be an algorithmic issue, since ADMIXTURE worked fine for me.

Best wishes,
Mattia
structure_recheck.pdf

Linda Lait

Jul 9, 2025, 8:51:36 AM
to structure-software
Hi Mattia,

Thanks for your reply! I'm not sure these are the cause, although K=1 may in fact be the true answer.

I did try running the data through ADMIXTURE, but it did not separate anything (just a random mixture of samples). That may support the idea of K=1; it is just that past that point you can see perfect separation into the populations (both with and without LOCPRIOR). So it might be that the structure is too subtle to detect, even with LOCPRIOR.

I would love to hear if anyone else has experienced this, or has any other suggestions.

Cheers,
Linda

Vikram Chhatre

Jul 9, 2025, 8:53:02 AM
to structure...@googlegroups.com
What is your deltaK suggesting? In the ADMIXTURE results, which K had the lowest cross-validation error?
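For reference, deltaK is usually computed following Evanno et al. (2005) from the estimated Ln P(D) of replicate STRUCTURE runs at each K. Writing L(K) for Ln P(D) at a given K, with m(·) the mean over replicate runs and s[·] the standard deviation over replicate runs:

```latex
\begin{aligned}
L'(K)    &= L(K) - L(K-1) \\
L''(K)   &= L'(K+1) - L'(K) = L(K+1) - 2L(K) + L(K-1) \\
\Delta K &= \frac{m\left(\lvert L''(K) \rvert\right)}{s\left[L(K)\right]}
\end{aligned}
```

Because L''(K) needs both K-1 and K+1, deltaK is undefined for the smallest and largest K tested. In particular, it can never point to K=1, so the K=1 possibility raised above has to be judged from Ln P(D) itself and the bar plots rather than from deltaK.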



--
You received this message because you are subscribed to the Google Groups "structure-software" group.
To unsubscribe from this group and stop receiving emails from it, send an email to structure-softw...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/structure-software/a23e4825-71c7-4207-8f72-26d431e36855n%40googlegroups.com.

Mattia De Vivo

Jul 9, 2025, 9:05:43 AM
to structure...@googlegroups.com
Hi Linda,

As Vikram suggested, with ADMIXTURE you can cross-validate the results and see which K is most likely. From the manual:

"A good value of K will exhibit a low cross-validation error compared to other K values. Cross-validation is enabled by simply adding the --cv flag to the ADMIXTURE command line. In this default setting, the cross-validation procedure will perform 5-fold CV; you can get 10-fold CV, for example, using --cv=10. The cross-validation error is reported in the output. For example, if in our bash shell we ran:

for K in 1 2 3 4 5; \
do admixture --cv hapmap3.bed $K | tee log${K}.out; done

(i.e., ran ADMIXTURE with cross-validation for K values 1,2,3,4 and 5), then we could quickly view the CV errors:

grep -h CV log*.out
"
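With more K values or replicate runs, it can help to pick out the lowest CV error automatically instead of reading the grep output by eye. The sketch below assumes each log contains lines of the form `CV error (K=...): <error>`, as in the manual's example output. The function name `lowest_cv` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the ADMIXTURE "CV error" line with the lowest error.
# Assumes lines like "CV error (K=...): <error>" with a single ":" per line.
# Reads the given log files, or stdin if no files are given.
lowest_cv() {
  # -t: splits on the colon, so field 2 is the error value;
  # -n sorts it numerically (leading blanks are ignored), smallest first.
  grep -h 'CV error' "$@" | sort -t: -k2,2 -n | head -n 1
}

# Usage, after running the loop above:
#   lowest_cv log*.out
```

Sorting whole lines on the text after the colon keeps each K label attached to its error, so the printed line shows both the K and its CV error.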

I hope this helps.
