Effects of different population sample sizes or uneven sampling distribution

HallvardH

Oct 13, 2009, 8:39:06 AM

to structure-software

Hi

Has anyone assessed the effect on the STRUCTURE output (uniform
priors) of

1) large differences in size (number of individuals) among the
identified clusters

or

2) an uneven sampling distribution, i.e. geographically clustered
samples compared to a uniform geographic sampling distribution,

assuming in both cases that there is some genetic structure among the
sampled individuals?

I would be very glad to hear your findings or to get a pointer to any
publications addressing this question!

Kind regards
Hallvard

Steve82

Nov 12, 2009, 8:45:51 AM

to structure-software

Hi,

I'm sorry I don't have an answer for you. I'm trying to get the answer
to the same question and was wondering if you had found an answer yet?

Thanks

Steve

dani.elf

Nov 14, 2009, 9:25:17 AM

to structure-software

Hi,
Yes, this issue has come up in a number of contexts over the years.
STRUCTURE is definitely more likely to identify subdivisions that
contain a large number of individuals in each group than ones that
involve only a small number of individuals. The more informative the
data, the less important this effect is; the algorithm should find even
very small populations if they are sufficiently differentiated (which
is often not what the user wants if these populations consist of close
relatives).

A more controversial question is whether discrete sampling will tend
to create discrete STRUCTURE clusters. I don't think it is true, for
example, that the discrete continental clusters in the Human Genome
Diversity Panel are caused by the sampling strategy; there seem to be
definite continental-scale differences between populations that are
stronger than pure distance effects. I think the paper that shows this
fairly convincingly is one of a large number by Rosenberg, written as
a rebuttal to a paper by Serre and Pääbo. Balloux also has a strong
opinion on this kind of issue.

What I can't tell you is whether there has been a formal study of the
effect of sample size.

There are also people who have argued for spatially explicit inference
models, and they discuss these issues. However, I don't myself think
they have made qualitative progress on the issue.

Lenstra, J.A. (Hans)

Nov 14, 2009, 10:04:16 AM

to structure...@googlegroups.com

We did a lot of Structure runs on livestock breeds. Yes, Structure clusters often represent overrepresented and/or inbred breeds. (The effect of inbreeding was already shown by the Kalash cluster in the 2002 Rosenberg paper.) Isn't this justified by the original purpose of Structure, finding stratification that may confound association studies? We would advise (1) balancing the number of individuals per breed and (2) interpreting the clustering on the basis of historical and demographic information (for example, the clusters of neither the Pakistan Kalash tribe nor the sheep of the isle of Soay represent ancestral components).

Hans Lenstra, Utrecht University, Netherlands, J.A.L...@uu.nl
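
Advice (1) above, balancing the number of individuals per breed before running Structure, can be sketched as random subsampling of each group down to a common size. This is only a minimal Python sketch under stated assumptions, not the procedure used by anyone in this thread: the function name balance_groups, the (individual_id, group_label) input layout, and the choice of the smallest group as the default target size are all hypothetical.

```python
import random
from collections import defaultdict


def balance_groups(samples, max_per_group=None, seed=0):
    """Randomly subsample each group to a common size.

    samples: list of (individual_id, group_label) tuples (assumed layout).
    max_per_group: optional cap; the target is never larger than the
        smallest group, so every group ends up with the same count.
    seed: fixed seed so the subsample is reproducible.
    """
    rng = random.Random(seed)

    # Collect individual IDs under their group label.
    by_group = defaultdict(list)
    for individual, group in samples:
        by_group[group].append(individual)

    if not by_group:
        return []

    # Default target size: the size of the smallest group.
    target = min(len(members) for members in by_group.values())
    if max_per_group is not None:
        target = min(target, max_per_group)

    # Draw `target` individuals without replacement from each group.
    balanced = []
    for group in sorted(by_group):
        chosen = rng.sample(by_group[group], target)
        balanced.extend((individual, group) for individual in sorted(chosen))
    return balanced


if __name__ == "__main__":
    # Hypothetical, unevenly sampled breeds.
    demo = (
        [(f"A{i}", "breed_A") for i in range(10)]
        + [(f"B{i}", "breed_B") for i in range(3)]
        + [(f"C{i}", "breed_C") for i in range(5)]
    )
    for individual, group in balance_groups(demo):
        print(individual, group)
```

Subsampling discards data, so it may be worth repeating the draw with several seeds and checking whether the resulting clusters are stable across draws.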
