fastSTRUCTURE vs STRUCTURE

Bárbara Müller

Apr 3, 2018, 2:23:45 PM

to structure-software

Dear all,

I ran fastSTRUCTURE and STRUCTURE on the same dataset for comparison, and I got unexpected results from fastSTRUCTURE.

I ran fastSTRUCTURE with the following script:

SEED=$RANDOM
echo "Using seed '${SEED}'"
for i in {1..10}
do
    python structure.py \
        -K "$i" \
        --input=INPUT_FILE \
        --output="OUTPUT_FILE_simple_K$i" \
        --tol=10e-6 \
        --prior=simple \
        --cv=10 \
        --format=bed \
        --full \
        --seed="${SEED}"
done
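The script above reuses one seed for every K, so it cannot show how much the fastSTRUCTURE results vary from run to run. A minimal sketch of how replicate runs with different seeds could be set up is given here. It only builds the command lines, using the same flags as the script; the function name `build_commands`, the seed values, and the output naming scheme are hypothetical choices for illustration.

```python
import shlex


def build_commands(input_prefix, output_prefix, k_values, seeds, prior="simple"):
    """Return one structure.py argument list per (K, seed) pair."""
    commands = []
    for k in k_values:
        for seed in seeds:
            commands.append([
                "python", "structure.py",
                "-K", str(k),
                f"--input={input_prefix}",
                # Hypothetical naming: the seed is part of the output prefix
                # so that replicate runs do not share an output name.
                f"--output={output_prefix}_{prior}_seed{seed}",
                "--tol=10e-6",
                f"--prior={prior}",
                "--cv=10",
                "--format=bed",
                "--full",
                f"--seed={seed}",
            ])
    return commands


# Print the commands for K = 1..3 and three arbitrary example seeds.
for cmd in build_commands("INPUT_FILE", "OUTPUT_FILE", range(1, 4), [101, 202, 303]):
    print(shlex.join(cmd))
```

Each printed line can be pasted into a shell, or each list can be passed to `subprocess.run` to launch the runs directly.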

I ran STRUCTURE with the following settings:

The admixture model was applied, with correlated allele frequencies and no prior population information. The number of tested clusters (K) ranged from 1 to 10, with 10 replicates per K. The burn-in period and the number of MCMC iterations were 250,000 and 750,000, respectively.

For the STRUCTURE results, the number of genetic groups was K=2, as determined by the criterion proposed by Evanno et al. (2005).
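For reference, the Evanno criterion can be sketched as follows. This version computes ΔK from the per-K means of the replicate log-likelihoods, |L(K+1) − 2L(K) + L(K−1)| / sd(L(K)), which is the commonly used form. The function name `evanno_delta_k` and the log-likelihood values are made up for illustration and are not taken from the dataset discussed in this thread.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """Compute Evanno's delta K.

    lnp_by_k maps each K to a list of replicate ln P(D|K) values.
    Returns a dict {K: delta K} for every K that has both neighbours.
    """
    ks = sorted(lnp_by_k)
    means = {k: mean(lnp_by_k[k]) for k in ks}
    sds = {k: stdev(lnp_by_k[k]) for k in ks}
    delta = {}
    for k in ks:
        # Delta K needs K-1 and K+1, so the smallest and largest K are skipped.
        if k - 1 not in means or k + 1 not in means:
            continue
        # Skip K values whose replicates are identical (division by zero).
        if sds[k] == 0:
            continue
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        delta[k] = second_diff / sds[k]
    return delta


# Illustrative values only (three replicates per K).
example = {
    1: [-1000, -1002, -998],
    2: [-800, -801, -799],
    3: [-790, -795, -785],
    4: [-785, -790, -780],
}
delta = evanno_delta_k(example)
best_k = max(delta, key=delta.get)
print(delta, best_k)
```

Because ΔK needs both neighbours of K, it is undefined for the smallest tested K, so this criterion can never select K=1.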

I also ran a PCA on the same dataset, and the first two principal components explained 22% and 15.5% of the genetic variance, which is consistent with K=2.

The genetic divergence between these two subpopulations was high (FST=0.45).

However, the fastSTRUCTURE results show K=5 for the same dataset:

Model complexity that maximizes marginal likelihood = 5

Model components used to explain structure in data = 5

Could you please tell me why I am getting such different results from fastSTRUCTURE and STRUCTURE?

K=2 explains the population structure better and is consistent with the other results (PCA and FST).

I am also running fastSTRUCTURE with the logistic prior to see whether the results change, but this analysis is taking a long time, about as long as standard STRUCTURE.

I have a large dataset (more than 1 million SNPs) that I am planning to run only with fastSTRUCTURE. First, however, I want to know whether fastSTRUCTURE works well and whether I am running it properly.

I really appreciate your help!

Barbara.

f.pina...@gmail.com

Apr 20, 2018, 6:11:30 AM

to structure-software

I kind of gave up on fastStructure after obtaining **very** different results across runs on the same dataset (this happened with at least 6 different datasets).
Of course, this may be a peculiarity of my data and might turn out fine for your case.
I would recommend STRUCTURE or MavericK for running your analyses.
If you need speed (and you should, since you mention 10⁶ SNPs), I recommend using a threading wrapper program, such as Structure_threader, strauto or ParallelStructure.
I am biased towards Structure_threader, though, since I am its developer.
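One way to put a number on how different two runs are is to compare their ancestry (Q) matrices while allowing for label switching, since cluster 1 in one run may be cluster 2 in another. The sketch below is a brute-force illustration of that idea. The function name `q_matrix_distance` and the example matrices are hypothetical, and loading the matrices from the program output is left out.

```python
from itertools import permutations


def q_matrix_distance(q_a, q_b):
    """Smallest mean absolute difference between two Q matrices,
    taken over all relabelings (column permutations) of q_b.

    Both matrices are lists of rows (one row per individual) with the
    same number of individuals and the same K. Brute force over K!
    permutations, so only practical for small K.
    """
    n_ind = len(q_a)
    k = len(q_a[0])
    best = None
    for perm in permutations(range(k)):
        total = 0.0
        for row_a, row_b in zip(q_a, q_b):
            total += sum(abs(row_a[j] - row_b[perm[j]]) for j in range(k))
        score = total / (n_ind * k)
        if best is None or score < best:
            best = score
    return best


# Illustrative values: run_2 is run_1 with the two cluster labels swapped.
run_1 = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
run_2 = [[0.1, 0.9], [0.8, 0.2], [0.5, 0.5]]
print(q_matrix_distance(run_1, run_2))
```

A distance near zero means the two runs agree up to relabeling; values that stay large across many pairs of runs point to genuinely unstable solutions.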

Francisco

Bárbara Müller

Apr 25, 2018, 11:52:17 AM

to structure-software

Dear Francisco,

Thank you for your answer; it is very helpful.

I will try one of the threading wrapper programs.

Also, thank you for developing Structure_threader.

Best regards,

Barbara.

f.pina...@gmail.com

Apr 26, 2018, 10:14:33 AM

to structure-software

Happy to help.
If you run into any issues, feel free to post your problem here or open a GitHub issue.

Cheers,

Francisco
