Dear all,
I ran fastSTRUCTURE and STRUCTURE on the same dataset for comparison, and I got unexpected results from fastSTRUCTURE.
I ran fastSTRUCTURE with the following script:
SEED=$RANDOM
echo "Using seed '${SEED}'"
for i in {1..10}
do
    python structure.py \
        -K "$i" \
        --input=INPUT_FILE \
        --output="OUTPUT_FILE_simple_K$i" \
        --tol=10e-6 \
        --prior=simple \
        --cv=10 \
        --format=bed \
        --full \
        --seed="${SEED}"
done
I ran STRUCTURE with the following settings:
The admixture model was applied with correlated allele frequencies and without prior population information. The number of tested clusters (K) ranged from 1 to 10, with 10 replicates per K. The burn-in period and the number of MCMC iterations were 250,000 and 750,000, respectively.
For the STRUCTURE results, the number of genetic groups was K=2, as determined by the criterion proposed by Evanno et al. (2005).
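For anyone who wants to check the ΔK criterion by hand, here is a minimal sketch of the calculation. It is not necessarily the exact procedure used above. It assumes the replicate log-likelihoods L(K) have already been collected per K, and it takes the second difference of the replicate means, which is one common convention. The function name and the numbers are hypothetical. ΔK needs both L(K-1) and L(K+1), so it is undefined for the smallest and largest K tested; in particular, it can never select K=1.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Evanno-style delta K from replicate log-likelihoods.

    lnp_by_k maps K -> list of L(K) values, one per replicate run.
    Returns {K: deltaK} for every K that has both neighbours K-1 and K+1.
    """
    means = {k: mean(v) for k, v in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in means or k + 1 not in means:
            continue  # delta K is undefined at the ends of the tested range
        # Absolute second-order rate of change of the mean log-likelihood.
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        sd = stdev(lnp_by_k[k])  # spread of L(K) across replicates
        result[k] = second_diff / sd if sd > 0 else float("inf")
    return result


# Hypothetical values for illustration only, not taken from the real runs.
example = {
    1: [-1000.0, -1002.0, -998.0],
    2: [-800.0, -801.0, -799.0],
    3: [-790.0, -795.0, -785.0],
    4: [-785.0, -775.0, -795.0],
}
dk = delta_k(example)
best_k = max(dk, key=dk.get)
print(dk, best_k)
```

In this made-up example the likelihood jumps sharply from K=1 to K=2 and then flattens, so ΔK peaks at K=2.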
I also ran a PCA on the same dataset, and the first two principal components explained 22% and 15.5% of the genetic variance, consistent with K=2.
The genetic divergence between these two subpopulations was high (FST=0.45).
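To give a sense of scale for an FST value, here is a small sketch of the heterozygosity-based definition, FST = (HT - HS) / HT, for a single biallelic locus in two subpopulations. This is an illustration under assumptions only: it weights the subpopulations equally, the function name and allele frequencies are hypothetical, and genome-wide estimators (for example Weir and Cockerham's) will give different numbers.

```python
def fst_two_pops(p1, p2):
    """Wright's FST for one biallelic locus, two equally weighted subpopulations.

    p1, p2 are the frequencies of the same allele in each subpopulation.
    """
    # Mean expected heterozygosity within subpopulations.
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    # Expected heterozygosity of the pooled population.
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        return 0.0  # locus is monomorphic overall, no differentiation to measure
    return (h_t - h_s) / h_t


# Hypothetical allele frequencies, for illustration only.
print(fst_two_pops(0.9, 0.1))  # strongly differentiated locus
print(fst_two_pops(0.5, 0.5))  # identical frequencies
```

Identical frequencies give an FST of zero, and the value rises toward 1 as the two subpopulations approach fixation for different alleles.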
However, the fastSTRUCTURE results show K=5 for the same dataset:
Model complexity that maximizes marginal likelihood = 5
Model components used to explain structure in data = 5
Could you please tell me why I am getting such different results from fastSTRUCTURE and STRUCTURE?
K=2 explains the population structure better and agrees with the other results (PCA and FST).
I am also running fastSTRUCTURE with the logistic prior to see whether the results change, but this analysis is taking a long time, about as long as standard STRUCTURE.
I have a large dataset (more than 1 million SNPs) that I plan to analyze with fastSTRUCTURE only. First, however, I want to know whether fastSTRUCTURE works well and whether I am running it properly.
I really appreciate your help!
Barbara.