FastStructure output files - identity of data by column???

JJung

Apr 10, 2016, 9:51:27 PM
to structure-software
Hi,

I've been running FastStructure (FS) on K1-K10 for multiple datasets.

I tried the Python distruct script included with FastStructure, but the results confused me: although the order of individuals in the output image should match that in the FS input .fam files, the population colors/IDs and order seemed to vary wildly.

So I've switched to the standalone distruct1.1 so that I can change the colors associated with populations, as well as the order of individual IDs, sorting based on the .meanQ files.  (Creating input .popq files for so many runs is a whole other headache I won't get into here, probably because I don't know R well enough yet.)

My question is:  What is the column order of the data in the FS output files?

There are no column headings in any of the output files, and nothing about this in the manual, so I am clueless.

Logic says columns should be in increasing order of K with successive columns for each additional K, such that for a K.6.meanQ file, Col1=K1meanQvalues....Col6=K6meanQvalues -- is this correct?  

Thanks,
Janelle


Vikram Chhatre

Apr 10, 2016, 10:34:46 PM
to structure-software

Take a look at github.com/cryptic0/distruct2

--
You received this message because you are subscribed to the Google Groups "structure-software" group.
To unsubscribe from this group and stop receiving emails from it, send an email to structure-softw...@googlegroups.com.
To post to this group, send email to structure...@googlegroups.com.
Visit this group at https://groups.google.com/group/structure-software.
For more options, visit https://groups.google.com/d/optout.

JJung

Apr 10, 2016, 11:55:59 PM
to structure-software
Well, ok.  Thanks, I think.  

I had a quick look at distruct2, but I don't entirely understand how it helps.  I'd have to make two more sets of files, with pop IDs and pop ID designations, using some ML cutoff that I have yet to determine.  And then change the Python code to specify the colors I want?

If you could please address my original question, this would be most helpful: 

What is the data output order, by column for all FastStructure output files?

Logic says columns should be in increasing order of K with successive columns for each additional K, such that for a K.6.meanQ file, Col1=K1meanQvalues....Col6=K6meanQvalues -- is this correct?  

Thanks,
Janelle


Vikram Chhatre

Apr 11, 2016, 1:02:43 PM
to structure-software

In the fastStructure meanQ output, the columns do not keep a fixed position across the K values tested.  A cluster that appears in column 2 at K=3 may not be in column 2 at K=4 or higher.
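For reference, each .meanQ file is a plain whitespace-delimited matrix with one row per individual (in the order of the input file) and one column per cluster for that run's K, so a K=6 file holds six columns that all belong to the K=6 run. A minimal Python sketch of reading one, assuming that layout; the function names and the example values are illustrative only and not part of fastStructure:

```python
from typing import List


def parse_meanq(text: str) -> List[List[float]]:
    """Parse the contents of a .meanQ file into a list of rows of floats.

    Assumes one whitespace-delimited row per individual and one column
    per cluster, with no header line.
    """
    rows = [[float(v) for v in line.split()]
            for line in text.splitlines() if line.strip()]
    if len({len(r) for r in rows}) > 1:
        raise ValueError("rows have differing numbers of columns")
    return rows


def dominant_cluster(row: List[float]) -> int:
    """Return the 0-based column index with the largest membership value."""
    return max(range(len(row)), key=row.__getitem__)


# Made-up values standing in for a tiny K=3 file with two individuals.
example = """0.90 0.05 0.05
0.10 0.80 0.10
"""

q = parse_meanq(example)
print(len(q), "individuals x", len(q[0]), "clusters")
for i, row in enumerate(q):
    print("individual", i + 1, "-> column", dominant_cluster(row) + 1)
```

Column numbers reported this way are only meaningful within one file; the same cluster can land in a different column in the file for another K.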

If you'd like to perform cluster matching before plotting, CLUMPP will do that for you, though you will have to reformat the meanQ output by hand to match CLUMPP's input format.  This can be done easily with a good text editor.

The distruct2 script will allow you to arrange the pops arbitrarily.

V



JJung

Apr 11, 2016, 2:48:43 PM
to structure-software
Thanks, Vikram.  I see....

Yes, I'd like to do cluster matching before plotting, so I can track how Q values shift as different populations cluster out, as well as be able to visualize this in a series of Distruct-like graphical genotype images, with individuals in a specific order and designated colors for each pop.  And I'd also like to calculate best K, through max marginal likelihood, delta K, etc.

Any thoughts on running CLUMPP vs. CLUMPAK?  

I'm thinking of trying CLUMPAK, as it seems it will do everything I need and also accepts the simple Q-matrix output of FastStructure, along with the same optional input files I'm prepping for Distruct.  With so many files to run I'm a bit overwhelmed, and I'm trying to cut down on reformatting as much as possible.

Thanks,
Janelle


Vikram Chhatre

Apr 11, 2016, 4:12:59 PM
to structure-software
I may have misspoken a little - apologies.

Cluster matching, using CLUMPP for example, is performed on the different iterations of STRUCTURE results for the same K.  For instance, if you tested K=1 through K=8 with 10 iterations per K and found K=4 to be the optimal number, you can take the STRUCTURE output for the 10 iterations at that K and run them through CLUMPP to match the clusters before plotting.

In the case of FastStructure, the end user does not set the number of iterations to be performed for each K.  If you grep the log files, they will tell you how many iterations were performed at a given K.

The intended pipeline is:

1. Run FastStructure with the simple prior, and then with the logistic prior if desired
2. Use chooseK.py to infer optimal K
3. Prepare plots for a few values of K around your optimal K
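The first two steps can be sketched as a small Python helper that only builds the command lines. It relies on the `-K`, `--input`, `--output` and `--prior` options of structure.py and the `--input` option of chooseK.py as described in the fastStructure README; the file prefixes, the K range and the `python` interpreter name are assumptions to adapt to your own setup.

```python
from typing import List


def structure_cmd(k: int, input_prefix: str, output_prefix: str,
                  prior: str = "simple") -> List[str]:
    """Build the argument list for one structure.py run at a single K."""
    return ["python", "structure.py", "-K", str(k),
            f"--input={input_prefix}",
            f"--output={output_prefix}",
            f"--prior={prior}"]


def choosek_cmd(output_prefix: str) -> List[str]:
    """Build the argument list for chooseK.py over all runs sharing a prefix."""
    return ["python", "chooseK.py", f"--input={output_prefix}"]


# Hypothetical prefixes; replace them with your own dataset and output paths.
INPUT_PREFIX = "genotypes/mydata"
OUTPUT_PREFIX = "results/mydata_simple"

commands = [structure_cmd(k, INPUT_PREFIX, OUTPUT_PREFIX) for k in range(1, 11)]
commands.append(choosek_cmd(OUTPUT_PREFIX))

for cmd in commands:
    print(" ".join(cmd))
```

Nothing is executed here; each list could be passed to `subprocess.run(cmd, check=True)` once the paths are right. Passing `prior="logistic"` to `structure_cmd` with a different output prefix would cover the optional second pass.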

V

JJung

Apr 11, 2016, 4:56:59 PM
to structure-software
Right, ok.  I was wondering, since I thought FastStructure already ran multiple iterations....

I assume it's because FastStructure doesn't run iteratively from one K value to the next that the output files have no relational structure to each other.

Aside from having reference-type individuals that you know *should* belong to a certain subpop, is there a way to reorder the Q matrix so that membership coefficients from runs at different K can be compared?  Is such a comparison even possible/advisable?

As for visualization, I'll play with Distruct.

Thanks,
Janelle

Vikram Chhatre

Apr 11, 2016, 5:04:16 PM
to structure-software
I am not sure I understand your question completely, but the biology of the species and of the subpopulations you collected should provide clues for tracking a given dominant genetic cluster among results from various Ks.
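One rough way to track a cluster between two K values, offered only as a heuristic sketch and not as what CLUMPP or CLUMPAK do internally, is to pair each column of the higher-K matrix with the lower-K column it correlates with most strongly across individuals. The Q matrices below are made-up values and the function names are hypothetical; both matrices must list the same individuals in the same order.

```python
from math import sqrt
from typing import List

Matrix = List[List[float]]


def column(q: Matrix, j: int) -> List[float]:
    """Extract column j (one cluster) from a Q matrix stored as rows."""
    return [row[j] for row in q]


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def best_matches(q_low: Matrix, q_high: Matrix) -> List[int]:
    """For each column of q_high, return the index of the most correlated column of q_low."""
    k_low, k_high = len(q_low[0]), len(q_high[0])
    matches = []
    for j in range(k_high):
        scores = [pearson(column(q_high, j), column(q_low, i)) for i in range(k_low)]
        matches.append(max(range(k_low), key=scores.__getitem__))
    return matches


# Made-up membership coefficients for four individuals at K=2 and K=3.
q_k2 = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
q_k3 = [[0.05, 0.90, 0.05],
        [0.10, 0.85, 0.05],
        [0.90, 0.05, 0.05],
        [0.30, 0.10, 0.60]]

print(best_matches(q_k2, q_k3))  # 0-based K=2 column for each K=3 column
```

Two higher-K columns can map to the same lower-K column, which is what you would expect when a cluster splits as K increases. Known reference individuals remain the better sanity check on any such pairing.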

V

