Choosing K in fastSTRUCTURE (isolation-by-distance population model, confusing output)

Shaghayegh Soudi

Aug 14, 2017, 5:24:15 PM

to structure-software

Hello everyone,

I ran fastSTRUCTURE on 27 populations (315 individuals) using the commands below:

for k in `seq 15`
do
    python structure.py -K $k --input=input_geno --output=output_geno --format=str --seed=4321
done

Then, to choose the appropriate number of model components that explain structure in the dataset, I used this command:

python chooseK.py --input=output_geno

My results are confusing:

Model complexity that maximizes marginal likelihood = 1
Model components used to explain structure in data = 10

What does that mean? Does it mean that the number of clusters K is between 1 and 10? Can someone help me figure out how to choose K? I feel quite stuck.

Also, my study system probably follows a general isolation-by-distance pattern. Could that be the reason for the confusing results?
Thanks

Vikram Chhatre

Aug 14, 2017, 5:30:47 PM

to structure-software

Yes, the chooseK script is suggesting that your optimal K is between 1 and 10, which, as you figured, is not very useful. If you have subtle population structure, you might have to use the logistic prior instead of the simple prior. That analysis will take substantially longer. At that point, you might want to subset your loci and run them with regular STRUCTURE.
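
For anyone following this suggestion, here is a minimal sketch of how the rerun with the logistic prior could be set up. It only builds and prints the command lines (a dry run) and does not launch anything. The input name and seed come from the original post, and `--prior=logistic` is the option listed in the fastSTRUCTURE usage message. The helper name `build_commands` and the `output_geno_logistic` prefix are made up for illustration.

```python
def build_commands(k_values, prior="logistic", seed=4321,
                   input_prefix="input_geno",
                   output_prefix="output_geno_logistic"):
    """Return one structure.py argument list per K; nothing is executed."""
    commands = []
    for k in k_values:
        commands.append([
            "python", "structure.py",
            "-K", str(k),
            f"--input={input_prefix}",
            f"--output={output_prefix}",   # hypothetical prefix, keeps the simple-prior results intact
            "--format=str",
            f"--prior={prior}",
            f"--seed={seed}",
        ])
    return commands


# K = 1..15, matching the range used in the original post
commands = build_commands(range(1, 16))
for argv in commands:
    print(" ".join(argv))
```

Each list could then be handed to `subprocess.run(argv, check=True)`, and `chooseK.py` pointed at the new output prefix once all runs finish.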

--
You received this message because you are subscribed to the Google Groups "structure-software" group.
To unsubscribe from this group and stop receiving emails from it, send an email to structure-software+unsub...@googlegroups.com.
To post to this group, send email to structure-software@googlegroups.com.
Visit this group at https://groups.google.com/group/structure-software.
For more options, visit https://groups.google.com/d/optout.

Shaghayegh Soudi

Aug 14, 2017, 6:03:49 PM

to structure-software

Many thanks, Vikram, for the advice, but I have run into something a bit strange. I ran structure.py again with a different seed (actually I tried multiple runs), and with a different seed I got different output:

Model complexity that maximizes marginal likelihood = 1

Model components used to explain structure in data = 7

Everything else is the same; only the seed differs. Do you have any clue about that? Could something be wrong?
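
One way to see exactly what changes between seeds is to put the chooseK.py reports side by side. The sketch below parses the two outputs quoted in this thread. The value of the second seed was not given, so it is labeled generically, and the function name `parse_choosek` is hypothetical.

```python
import re

# chooseK.py output as quoted in this thread.
# The second seed's value was not stated, so it gets a generic label.
reports = {
    "seed=4321": (
        "Model complexity that maximizes marginal likelihood = 1\n"
        "Model components used to explain structure in data = 10\n"
    ),
    "different seed": (
        "Model complexity that maximizes marginal likelihood = 1\n"
        "Model components used to explain structure in data = 7\n"
    ),
}

PATTERNS = {
    "max_likelihood_k": r"maximizes marginal likelihood\s*=\s*(\d+)",
    "components_k": r"used to explain structure in data\s*=\s*(\d+)",
}


def parse_choosek(text):
    """Extract the two K values from one chooseK.py report."""
    values = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        if match is None:
            raise ValueError(f"could not find {name} in report")
        values[name] = int(match.group(1))
    return values


parsed = {label: parse_choosek(text) for label, text in reports.items()}
for label, values in parsed.items():
    # Sort so the range reads low-high whichever metric is larger
    low, high = sorted(values.values())
    print(f"{label}: suggested K range {low}-{high}")
```

In both runs the marginal-likelihood value is 1 and only the component count differs (10 versus 7), so the seed is moving one end of the suggested range and leaving the other alone.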

Yifang Tan

May 13, 2020, 1:32:13 PM

to structure-software

This is the closest post to my case.

When I ran:

$ chooseK.py --input=test_structure

I got:
Model complexity that maximizes marginal likelihood = 6

Model components used to explain structure in data = 1

This is the opposite of what I could find elsewhere, where "Model complexity that maximizes marginal likelihood" is smaller than "Model components used to explain structure in data". By the way, my run tested K from 1 to 10.
Is this normal? And can I say the optimal K is still between 1 and 6?

Thanks!

Yifang

