How to determine the best K based on the fastStructure method?

Lin Shawn

Sep 8, 2016, 10:14:27 AM
to structure-software
Hi,

I have a question about how to determine the best K from the fastStructure results. After running fastStructure, I got the following result: Model complexity that maximizes marginal likelihood = 1; Model components used to explain structure in data = 4. Does this mean the best K is 4, or could the best K be 1, 2, 3, or 4?

Attached is the graph of the fastStructure results. Thank you very much.


Best 
Shawn

fastStructure.png

Vikram Chhatre

Sep 8, 2016, 10:20:44 AM
to structure-software
That means the optimal K is somewhere between 1 and 4. How many values of K have you tried so far? Are you using the same coloring scheme in each K plot? K=3 looks quite different from K=4. Make sure to assign the same color to each cluster across successive values of K.
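For context, both numbers are printed by fastStructure's chooseK.py. Roughly, the first is the K whose run has the highest marginal likelihood, and the second is how many model components are actually needed to account for nearly all of the ancestry in the meanQ output. The Python sketch below illustrates that logic; it is not the script's own code. The 0.9999 cutoff and the rule of combining runs by taking the most common component count are assumptions, and the function names are hypothetical.

```python
import numpy as np


def components_explaining_ancestry(mean_q, threshold=0.9999):
    """Fewest components whose combined average ancestry reaches `threshold`.

    mean_q: N samples x K matrix of admixture proportions, as in a .meanQ file.
    threshold: ASSUMED cutoff for "nearly all" of the ancestry.
    """
    mean_q = np.asarray(mean_q, dtype=float)
    # Average contribution of each component across all samples, as a share.
    share = mean_q.mean(axis=0)
    share = share / share.sum()
    # Accumulate from the largest contribution downwards.
    cumulative = np.cumsum(np.sort(share)[::-1])
    # First position where the cumulative share reaches the cutoff.
    needed = int(np.searchsorted(cumulative, threshold)) + 1
    return min(needed, len(share))


def suggest_k_range(marginal_likelihoods, mean_qs, threshold=0.9999):
    """marginal_likelihoods: dict K -> marginal likelihood of that run.
    mean_qs: dict K -> meanQ matrix of that run.
    Returns (low, high), the range of K worth inspecting."""
    # K whose run has the highest marginal likelihood.
    k_ml = max(marginal_likelihoods, key=marginal_likelihoods.get)
    # Components needed in each run, combined by taking the most common value
    # (ASSUMPTION: chooseK.py may aggregate the runs differently).
    counts = [components_explaining_ancestry(q, threshold) for q in mean_qs.values()]
    k_components = int(np.bincount(counts).argmax())
    return min(k_ml, k_components), max(k_ml, k_components)
```

Applied to results like the ones above (1 and 4), the range is (1, 4), so K=2 and K=3 are also candidates worth plotting and comparing. In practice each matrix could be loaded with np.loadtxt from that run's .meanQ file.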

--
You received this message because you are subscribed to the Google Groups "structure-software" group.
To unsubscribe from this group and stop receiving emails from it, send an email to structure-software+unsub...@googlegroups.com.
To post to this group, send email to structure-software@googlegroups.com.
Visit this group at https://groups.google.com/group/structure-software.
For more options, visit https://groups.google.com/d/optout.

Lin Shawn

Sep 8, 2016, 12:31:16 PM
to structure-software
Hi Vikram,

Thank you so much for the reply. 

When you said the optimal K is somewhere between 1 and 4, does that mean the optimal K could also be 2 or 3?

I also noticed that K=3 is quite different from K=4. How do I check in R that the same coloring scheme is used in each K plot?

So far, I have tried up to K=6, since I knew that K should be less than 6. All my data were collected from two populations and hybrids.

Thanks a lot
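Running fastStructure over a range of K and then summarizing the runs with chooseK.py can be scripted. The Python sketch below only builds the command lines for K = 1 to 6, using the structure.py and chooseK.py options shown in the fastStructure README. The prefixes genotypes and genotypes_output are hypothetical, and it assumes both scripts are in the working directory and that `python` points to the interpreter fastStructure was installed under (fastStructure itself targets Python 2, while this helper uses Python 3).

```python
import shlex


def faststructure_commands(input_prefix, output_prefix, k_values):
    """Build one structure.py command per K, followed by a chooseK.py command.

    input_prefix / output_prefix are file prefixes without extensions
    (the values used below are hypothetical).
    """
    commands = [
        ["python", "structure.py", "-K", str(k),
         f"--input={input_prefix}", f"--output={output_prefix}"]
        for k in k_values
    ]
    # chooseK.py summarizes the per-K result files that share output_prefix.
    commands.append(["python", "chooseK.py", f"--input={output_prefix}"])
    return commands


if __name__ == "__main__":
    for cmd in faststructure_commands("genotypes", "genotypes_output", range(1, 7)):
        # Print for review; each list could instead be passed to subprocess.run(cmd, check=True).
        print(shlex.join(cmd))
```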


Vikram Chhatre

Sep 8, 2016, 12:36:07 PM
to structure-software
The position of the clusters in the meanQ data frame can shift between successive Ks. You will need to go into distruct.py and manually change the color order to account for these shifts.

I have a modified distruct script that makes this easier and also lets you assign custom colors. See http://distruct2.popgen.org for details.
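Because the column order of the meanQ matrix is arbitrary in each run (the label-switching problem), another option is to reorder the columns of each K+1 result to match the K result before plotting, so that a fixed color order lines up with the same clusters. The sketch below is in Python rather than R, to match fastStructure's own scripts. It brute-forces the column order that minimizes the total absolute difference to the previous K; that distance measure and the function name are assumptions, and dedicated tools such as CLUMPP handle this matching more thoroughly. Brute force is only practical for the small K values discussed here.

```python
from itertools import permutations

import numpy as np


def align_to_previous(prev_q, next_q):
    """Reorder the columns of next_q (N x K2) to best match prev_q (N x K1), K1 <= K2.

    Returns the reordered matrix and the column order used. Matched columns come
    first, in prev_q's order; unmatched (new) clusters are appended at the end.
    """
    prev_q = np.asarray(prev_q, dtype=float)
    next_q = np.asarray(next_q, dtype=float)
    k_prev, k_next = prev_q.shape[1], next_q.shape[1]
    if k_prev > k_next:
        raise ValueError("next_q must have at least as many clusters as prev_q")
    best_order, best_cost = None, np.inf
    # Try every assignment of distinct next_q columns to prev_q's columns.
    for order in permutations(range(k_next), k_prev):
        cost = np.abs(next_q[:, list(order)] - prev_q).sum()
        if cost < best_cost:
            best_order, best_cost = list(order), cost
    # Clusters with no counterpart at the previous K go last.
    leftover = [c for c in range(k_next) if c not in best_order]
    new_order = best_order + leftover
    return next_q[:, new_order], new_order
```

The reordered matrix could then be written back (for example with np.savetxt) to the .meanQ file that distruct.py reads, after keeping a copy of the original; this assumes distruct assigns colors by column position, as the color-order edit above implies.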

V
