fastSTRUCTURE vs STRUCTURE


Bárbara Müller

Apr 3, 2018, 2:23:45 PM
to structure-software
Dear all, 

I ran fastSTRUCTURE and STRUCTURE on the same dataset for comparison, and I got odd results from fastSTRUCTURE.

I ran fastSTRUCTURE with the following script:

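# Draw one random seed and reuse it for every K so the runs can be repeated.
SEED=$RANDOM
echo "Using seed '${SEED}'"

# Run fastSTRUCTURE for K = 1..10 with the simple prior.
for i in {1..10}
do
    python structure.py \
        -K "$i" \
        --input=INPUT_FILE \
        --output="OUTPUT_FILE_simple_K$i" \
        --tol=10e-6 \
        --prior=simple \
        --cv=10 \
        --format=bed \
        --full \
        --seed="${SEED}"
done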

I ran STRUCTURE with the following settings:

I used the admixture model with correlated allele frequencies and no prior population information. The number of clusters tested (K) ranged from 1 to 10, with 10 replicates per K. The burn-in period and the number of MCMC iterations were 250,000 and 750,000, respectively.

For the STRUCTURE results, the number of genetic groups was K=2, as determined by the criterion proposed by Evanno et al. (2005).
I also ran a PCA on the same dataset, and the first two principal components explained 22% and 15.5% of the genetic variance, consistent with K=2.
The genetic divergence between these two subpopulations was high (FST=0.45).
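
For readers unfamiliar with it, the Evanno et al. (2005) criterion mentioned above picks the K with the largest ΔK, the absolute second-order difference of Ln P(D) across successive K divided by the standard deviation of Ln P(D) at that K. The following is a minimal sketch, not the code used in this analysis: the function name `evanno_delta_k` and the example values are hypothetical, and applying the formula to the replicate means is an assumption of this sketch.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """Return {K: Delta K} for every K whose neighbours K-1 and K+1 were also run.

    lnp_by_k maps K to the list of Ln P(D) values from its replicate runs
    (at least two replicates per K, so a standard deviation exists).
    """
    means = {k: mean(values) for k, values in lnp_by_k.items()}
    delta_k = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in means or k + 1 not in means:
            continue  # Delta K is undefined at the ends of the tested range
        sd = stdev(lnp_by_k[k])
        if sd == 0:
            raise ValueError(f"replicates for K={k} are identical; Delta K is undefined")
        # |L(K+1) - 2 L(K) + L(K-1)|, computed on the replicate means (assumption)
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        delta_k[k] = second_diff / sd
    return delta_k


# Hypothetical Ln P(D) values, three replicates per K (not data from this thread).
example = {
    1: [-1000.0, -1002.0, -998.0],
    2: [-800.0, -801.0, -799.0],
    3: [-790.0, -795.0, -785.0],
    4: [-785.0, -780.0, -790.0],
}
delta_k = evanno_delta_k(example)
best_k = max(delta_k, key=delta_k.get)
print(delta_k, best_k)
```

Because ΔK needs both neighbours, it cannot be evaluated at the smallest or largest K tested, so this criterion can never return K=1.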

However, the fastSTRUCTURE results show K=5 for the same dataset:
Model complexity that maximizes marginal likelihood = 5
Model components used to explain structure in data = 5

Could you please tell me why I am getting such different results from fastSTRUCTURE and STRUCTURE?

K=2 explains the population structure better and is consistent with the other results (PCA and FST).

I am also running fastSTRUCTURE with the logistic prior to see whether the results change, but this analysis is taking as long as standard STRUCTURE.

I have a large dataset (more than 1 million SNPs) that I plan to run only with fastSTRUCTURE. First, however, I want to know whether fastSTRUCTURE works well and whether I am running it properly.

I really appreciate your help!
Barbara.

f.pina...@gmail.com

Apr 20, 2018, 6:11:30 AM
to structure-software
I kind of gave up on fastStructure after obtaining **very** different results on different runs with the same dataset (this happened with at least 6 different datasets).
Of course, this may be a peculiarity of my data and might turn out fine for your case.
I would recommend STRUCTURE or MavericK for running your analyses.
If you need speed (and you will, since you mention 10⁶ SNPs), I recommend using a threading wrapper program such as Structure_threader, strauto or ParallelStructure.
I am biased towards Structure_threader, but then I am its developer.
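
One caveat when judging whether two runs really differ: cluster labels are arbitrary, so the same solution can come back with its columns in a different order. A rough sketch of a label-aware comparison follows. The function name `best_alignment` and the toy matrices are hypothetical, and it assumes each run's admixture proportions have already been loaded as a list of per-individual rows.

```python
from itertools import permutations


def best_alignment(q_a, q_b):
    """Find the column order of q_b that best matches q_a.

    q_a and q_b are lists of rows (one row per individual, one column per
    cluster) with the same shape. Returns (permutation, mean absolute
    difference), where permutation[j] is the column of q_b matched to
    column j of q_a.
    """
    n, k = len(q_a), len(q_a[0])
    best_perm, best_diff = None, float("inf")
    # Brute force over all K! column orders; only practical for small K.
    for perm in permutations(range(k)):
        total = sum(
            abs(row_a[j] - row_b[perm[j]])
            for row_a, row_b in zip(q_a, q_b)
            for j in range(k)
        )
        diff = total / (n * k)
        if diff < best_diff:
            best_perm, best_diff = perm, diff
    return best_perm, best_diff


# Toy example: the second run is the first with its two clusters swapped.
run_a = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
run_b = [[0.1, 0.9], [0.8, 0.2], [0.5, 0.5]]
perm, diff = best_alignment(run_a, run_b)
print(perm, diff)
```

A mean difference near zero after alignment suggests the runs agree up to label switching; a large value suggests the solutions genuinely differ.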

Francisco

Bárbara Müller

Apr 25, 2018, 11:52:17 AM
to structure-software
Dear Francisco,

Thank you for your answer; it is very helpful.
I will try one of the threading wrapper programs.
Also, thank you for developing Structure_threader.

Best regards, 

Barbara.

f.pina...@gmail.com

Apr 26, 2018, 10:14:33 AM
to structure-software
Happy to help.
If you run into any issues, feel free to post your problem here or open a GitHub issue.

Cheers,

Francisco