Model comparisons with different numbers of populations: approaches?


Sean Canfield

Dec 19, 2021, 9:17:40 PM
to migrate-support
Aloha Peter et al.,

A new day, a new question.

I'm running migration model comparisons, and in my initial set of runs I have 5 populations. I'd like to compare these to a model with 4 populations, where populations 2 and 3 are merged into a single putative population.

I am also performing subsampling of populations (~ 50 individuals per population, which seems to work well on the cluster).

In previous runs I've noticed that subsampling is applied to the locations defined in the infile, not to the reassigned populations. In other words, if I reassign populations {1 2 3 4 5} to {1 2 2 3 4}, the new combined population 2 (old pops 2+3) will contain 100 individuals, 50 from population 2 and 50 from population 3, while the new pops 1, 3, and 4 will each be represented by 50 individuals.
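The count arithmetic above can be sketched in a few lines. This is a hypothetical Python illustration of "subsample per infile location, then pool by reassignment", not MIGRATE's own code; the function names and the 80-individuals-per-location infile are assumptions made only to show where the 100 vs. 50 imbalance comes from.

```python
import random

def subsample_by_location(locations, n_per_location, seed=1):
    """Draw up to n_per_location individuals from each infile location.

    Hypothetical: mimics subsampling that is applied to locations,
    before any population reassignment happens.
    """
    rng = random.Random(seed)
    return {loc: rng.sample(inds, min(n_per_location, len(inds)))
            for loc, inds in locations.items()}

def apply_reassignment(sampled, mapping):
    """Pool subsampled individuals according to a location -> population map."""
    pooled = {}
    for loc, inds in sampled.items():
        pooled.setdefault(mapping[loc], []).extend(inds)
    return pooled

# Hypothetical infile: 5 locations with 80 individuals each.
locations = {loc: [f"L{loc}_{i}" for i in range(80)] for loc in range(1, 6)}
# Reassignment {1 2 3 4 5} -> {1 2 2 3 4}: locations 2 and 3 are merged.
mapping = {1: 1, 2: 2, 3: 2, 4: 3, 5: 4}

sampled = subsample_by_location(locations, 50)
pooled = apply_reassignment(sampled, mapping)
sizes = {pop: len(inds) for pop, inds in sorted(pooled.items())}
print(sizes)  # {1: 50, 2: 100, 3: 50, 4: 50}
```

Merging the locations in the infile instead would make the cap of 50 apply to the merged population, which is exactly the trade-off raised in the next paragraphs.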

Does this uneven sample size affect model estimates? Should I be concerned about this?

One alternative approach is to change the infile to combine populations 2 and 3 directly (rather than use the reassignment function), but I'm worried that doing so would make the models somehow less directly comparable.

I looked over Hotz et al. (2013) after sleuthing these boards; Peter, which approach did you take?

Many thanks,
Sean

Peter Beerli

Dec 20, 2021, 8:42:39 AM
to migrate...@googlegroups.com
Sean,

For model comparison, the data must be the same for all models! So there is no good way to circumvent the issue when combining populations.
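Why the data must be identical is easiest to see from how the comparison is done: a Bayes factor is the ratio p(D | M1) / p(D | M2), which is only meaningful if D is the same in numerator and denominator. Below is a generic Python sketch (not MIGRATE's code) that turns log marginal likelihood estimates from each run into a log Bayes factor and model probabilities, assuming equal prior model probabilities; the numbers are placeholders, not results.

```python
import math

def model_probabilities(log_ml):
    """Convert ln p(D | M_i) into model probabilities, assuming equal priors.

    Only meaningful if every ln p(D | M_i) was computed from the same data D.
    """
    best = max(log_ml.values())
    # Subtract the maximum before exponentiating to avoid underflow.
    weights = {m: math.exp(v - best) for m, v in log_ml.items()}
    total = sum(weights.values())
    return {m: w / total for m, w in weights.items()}

def log_bayes_factor(log_ml, m1, m2):
    """Natural-log Bayes factor of m1 over m2: ln p(D|m1) - ln p(D|m2)."""
    return log_ml[m1] - log_ml[m2]

# Placeholder values for illustration only; not from any real run.
log_ml = {"5 populations": -5210.0, "4 populations (2+3 merged)": -5213.0}
probs = model_probabilities(log_ml)
lbf = log_bayes_factor(log_ml, "5 populations", "4 populations (2+3 merged)")
print(lbf)  # 3.0
```

If the 4-population run used a different subsample than the 5-population run, the two log marginal likelihoods would describe different D, and their difference would no longer be a Bayes factor.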

The sampling issue often comes up with methods that use allele frequencies. In principle, for coalescent methods that use individuals, imbalances should not matter, except that the Markov chain Monte Carlo run will need to be much longer, because one needs to guarantee that the genealogy of the population with few samples also gets rearranged. Extreme imbalances, say 100 individuals in one population and 2 in the other, need careful evaluation and will indeed have different credibility intervals for the parameters, but not necessarily wrong parameter estimates.
In Hotz et al. 2013, we used all individuals at a particular location. We omitted a few locations from the analysis because they had no samples for one of the two data types [mtDNA or allozyme].

I would not worry,

Peter

