What is Q

Bea Clack

Dec 6, 2013, 6:36:52 PM
to structure...@googlegroups.com
What is Q when the Structure program's bar plot option asks you to sort by Q? How is it obtained, and what does it measure?
Thanks

Vikram Chhatre

Dec 6, 2013, 6:44:17 PM
to structure...@googlegroups.com
Bea -

Q is the cluster membership coefficient. You can either sort the plot by Q or by POPDATA. 
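To make the two sorting options concrete, here is a minimal Python sketch on made-up data. It assumes that "sort by Q" means grouping individuals by the cluster in which they have their largest Q and then ordering them by that Q. This is an illustrative assumption, not necessarily how the STRUCTURE GUI orders the bars. The names (`individuals`, `sort_by_popdata`, `sort_by_q`) are hypothetical.

```python
# Hypothetical data for K=2: (individual id, POPDATA label, [Q1, Q2]).
individuals = [
    ("A", 1, [0.30, 0.70]),
    ("B", 2, [0.90, 0.10]),
    ("C", 1, [0.85, 0.15]),
    ("D", 2, [0.20, 0.80]),
]

def sort_by_popdata(rows):
    """Order bars by the user-supplied population label (stable sort)."""
    return sorted(rows, key=lambda r: r[1])

def sort_by_q(rows):
    """Order bars by the cluster with the largest Q, then by that Q (descending)."""
    def key(r):
        q = r[2]
        best = max(range(len(q)), key=lambda k: q[k])
        return (best, -q[best])
    return sorted(rows, key=key)

print([r[0] for r in sort_by_popdata(individuals)])  # → ['A', 'C', 'B', 'D']
print([r[0] for r in sort_by_q(individuals)])        # → ['B', 'C', 'D', 'A']
```

Sorting by POPDATA keeps your own sampling groups together, so admixed or "misplaced" individuals stand out within each group. Sorting by Q groups individuals by the cluster they mostly belong to, whatever their label.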

V


Bea Clack

Dec 7, 2013, 1:33:36 PM
to structure...@googlegroups.com
Ok, but what is the cluster membership coefficient? Is it a percentage or a frequency of something? Could you describe it in more layman's terms? And which is the best way to sort the data for a bar plot?

Vikram Chhatre

Dec 7, 2013, 2:04:04 PM
to structure-software
Hi Bea -

STRUCTURE probabilistically assigns individuals to one or more clusters (read: groupings). It uses various assumptions and models (chosen by the user) to estimate each individual's membership in the clusters being tested. Let us look at a simple example:

Highly simplified data set
---------------------------------
10 individuals (1-4 in the first population and 5-10 in the second, based on your prior assumption)
1 locus
Assume admixture
Assume HWE & LE
POPDATA=1

Results:
Let's assume that, based on Ln P(D) (the log probability of the data) and Evanno's Delta K method, your optimal number of clusters was 2 (K=2).
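For readers unfamiliar with Evanno's method, here is a minimal Python sketch of Delta K, assuming you have several replicate STRUCTURE runs at each K. Evanno et al. (2005) define Delta K = mean(|L(K+1) - 2 L(K) + L(K-1)|) / sd(L(K)), where L(K) is Ln P(D). As a common simplification, this sketch takes the second difference of the mean Ln P(D) at each K, because runs at different K are not paired. The Ln P(D) numbers and function name are hypothetical. Delta K cannot be computed at the smallest or largest K you tested.

```python
from statistics import mean, stdev

# Hypothetical Ln P(D) values from 3 replicate STRUCTURE runs at each K
# (made-up numbers for illustration only).
lnp_by_k = {
    1: [-1500.0, -1502.0, -1501.0],
    2: [-1200.0, -1210.0, -1205.0],
    3: [-1190.0, -1195.0, -1200.0],
    4: [-1185.0, -1195.0, -1190.0],
}

def evanno_delta_k(lnp):
    """Return {K: Delta K} for each K that has a tested K-1 and K+1.

    Assumes the tested K values are consecutive integers and that the
    replicate runs at each K do not all give identical Ln P(D) (sd > 0).
    """
    ks = sorted(lnp)
    means = {k: mean(values) for k, values in lnp.items()}
    delta = {}
    for k in ks[1:-1]:
        # Absolute second difference of mean Ln P(D) around K ...
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        # ... divided by the standard deviation of Ln P(D) across runs at K.
        delta[k] = second_diff / stdev(lnp[k])
    return delta

delta = evanno_delta_k(lnp_by_k)
best_k = max(delta, key=delta.get)
print(delta, best_k)
```

With these made-up values, Delta K peaks at K=2 (57.2, versus 1.0 at K=3), which is the kind of signal that would lead you to choose K=2.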

Simplified and hypothetical results for K=2 
(you could test any number between 1 and n)
--------------------------------
Individual#   Q1     Q2
1             0.8    0.2
2             0.75   0.25
3             0.82   0.18
4             0.7    0.3
5             0.2    0.8
6             0.3    0.7
7             0.35   0.65
8             0.1    0.9
9             0.15   0.85
10            0.25   0.75
--------------------------------

As you can see, each of your individuals has two Q values: Q1 is its probabilistic membership in genetic cluster #1, and Q2 its membership in cluster #2. All Q values for a given individual always add up to 1.
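Here is a minimal Python sketch for checking this on your own output, using the Q values from the hypothetical table above. Because STRUCTURE prints Q rounded to a few decimals, the check uses a tolerance instead of exact equality. The tolerance value and the function name are illustrative assumptions. Multiplying a Q value by 100 expresses it as a percentage.

```python
# Q values from the hypothetical K=2 table: individual -> (Q1, Q2).
q_table = {
    1: (0.8, 0.2), 2: (0.75, 0.25), 3: (0.82, 0.18), 4: (0.7, 0.3),
    5: (0.2, 0.8), 6: (0.3, 0.7), 7: (0.35, 0.65), 8: (0.1, 0.9),
    9: (0.15, 0.85), 10: (0.25, 0.75),
}

def rows_sum_to_one(q, tol=0.01):
    """True if each individual's Q values add up to 1, within a rounding tolerance."""
    return all(abs(sum(row) - 1.0) <= tol for row in q.values())

print(rows_sum_to_one(q_table))  # → True
# As percentages: individual 9 is 15% cluster #1 and 85% cluster #2.
print([round(100 * q) for q in q_table[9]])  # → [15, 85]
```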

These results largely confirm your assumption that, genetically, your individuals split into two clusters/populations. Each individual's membership also falls well within your expectations based on the starting data.
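One simple way to check that claim is to assign each individual to the cluster where its Q is largest, then compare that cluster with the population you assigned it to beforehand. Below is a minimal Python sketch using the hypothetical table above. Note that STRUCTURE's cluster numbers are arbitrary labels, so cluster #1 lining up with population 1 is a convenience of this example. Collapsing Q to its largest value also throws away the admixture information. The names are hypothetical.

```python
# Q values from the hypothetical K=2 table: individual -> (Q1, Q2).
q_table = {
    1: (0.8, 0.2), 2: (0.75, 0.25), 3: (0.82, 0.18), 4: (0.7, 0.3),
    5: (0.2, 0.8), 6: (0.3, 0.7), 7: (0.35, 0.65), 8: (0.1, 0.9),
    9: (0.15, 0.85), 10: (0.25, 0.75),
}

# Prior (POPDATA) labels from the example: 1-4 in population 1, 5-10 in population 2.
prior_pop = {i: (1 if i <= 4 else 2) for i in range(1, 11)}

def majority_cluster(q_row):
    """1-based index of the cluster with the largest Q for one individual."""
    return max(range(len(q_row)), key=lambda k: q_row[k]) + 1

assigned = {i: majority_cluster(row) for i, row in q_table.items()}
mismatches = [i for i in q_table if assigned[i] != prior_pop[i]]
print(mismatches)  # → []
```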

Does this make sense?  

While this isn't strictly a forum for lay people, I appreciate the importance of explaining complex scientific concepts to society at large, so I will take a shot. Someone please correct me if I get this partly or fully wrong.

Layman Explanation:
----------------------------------
Q represents the probability that an individual belongs, partially or fully, to one or more of the populations under investigation. Under the admixture model, it is more precisely the estimated fraction of the individual's ancestry that comes from each population. When a two-population hypothesis is being tested, Q1/Q2 values of 0.15 and 0.85, respectively, suggest that the individual draws most of its genetic ancestry from population #2.

I hope things are much clearer to you after this. In addition, reading the user manual and the various publications (the original Pritchard et al. paper (2000), its sequels (2003, 2007), and Evanno et al. (2005)) is *indispensable*.

All the best
Vikram


