Large numbers of genomic markers are now becoming available in most
study systems, and many of you have been running Structure analyses on
such genome-wide data. A question that invariably comes up on this list
is how linkage disequilibrium (LD) between markers affects the
inference of population structure.
On more than one occasion, I have suggested estimating the rate of
decay of linkage disequilibrium and, depending on the outcome, thinning
out tightly linked markers before further analysis. As I just learned
from a discussion with Dr. Pritchard, Structure can handle modest
amounts of LD between markers within a genomic region *if* the data
set also includes a large number of genomic regions that are not in LD
with each other.
In other words, if your markers are distributed throughout the genome,
the LD between markers in a particular region is not worrisome and
thus there is no need to thin out markers, except to decrease
computational time.
On the other hand, if most of your markers belong to a single genomic
region, LD within that region can distort the inferred structure, and
you will need to thin out markers.
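
For anyone who does need to thin markers (for the single-region case,
or just to cut computation time), here is a minimal Python sketch of
distance-based thinning. It is not part of Structure and is only one
of several possible approaches. The function name thin_markers, the
(marker_id, chrom, pos) tuple layout, and the 50 kb spacing are all
hypothetical; a sensible spacing should come from the LD decay you
estimate in your own data.

```python
from collections import defaultdict


def thin_markers(markers, min_dist):
    """Greedy distance-based thinning (illustrative sketch only).

    markers:  list of (marker_id, chrom, pos) tuples; this layout is an
              assumption, adapt it to your own marker map.
    min_dist: minimum spacing in bp between kept markers on the same
              chromosome; choose it from your own LD decay estimate.

    Returns the kept marker IDs, ordered by chromosome and position.
    """
    # Group markers by chromosome so spacing is only checked within one.
    by_chrom = defaultdict(list)
    for marker_id, chrom, pos in markers:
        by_chrom[chrom].append((pos, marker_id))

    kept = []
    for chrom in sorted(by_chrom):
        last_pos = None
        # Walk markers in positional order, keeping one only if it is
        # far enough from the last marker that was kept.
        for pos, marker_id in sorted(by_chrom[chrom]):
            if last_pos is None or pos - last_pos >= min_dist:
                kept.append(marker_id)
                last_pos = pos
    return kept


# Toy marker map (made-up IDs and positions).
markers = [
    ("snp1", "chr1", 1000),
    ("snp2", "chr1", 1500),
    ("snp3", "chr1", 60000),
    ("snp4", "chr2", 200),
    ("snp5", "chr2", 300),
]
print(thin_markers(markers, min_dist=50000))  # ['snp1', 'snp3', 'snp4']
```

Thinning by physical distance is only a stand-in for thinning by
measured LD; if LD decays at very different rates across your genome,
pruning on pairwise LD estimates would be the more careful choice.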
I apologize for any confusion my previous posts may have caused.
Please see below for an excerpt from a discussion with Dr. Pritchard.
---------------------------------------
I believe that Structure is robust to modest amounts of LD. To be
more precise, I think that if you have many markers spread across the
genome, or at least in a large number of genomic regions, then local
LD within regions does not seem to cause serious problems. This is
in sharp contrast to a situation where all the markers are in a single
genomic region, in which case the inferred structure may simply
reflect the topology of the local coalescent tree.
Here is an example where we looked at this using ~2500 snps from ~32
genomic regions:
http://www.nature.com/ng/journal/v38/n11/extref/ng1911-S7.pdf (see p6)
Another side note on LD is that if there is a modest amount of LD in
the data then Structure may overestimate its confidence in individual
assignments (ie the credible regions may be slightly more narrow than
they should be). But I don't think most people actually look closely
at the credible regions, so that's not much of a concern in practice.
---------------------------------------
Thanks, and feel free to add to this discussion with comments,
questions, or your experiences with a particular dataset.
Vikram