Testing migration models with an included "ghost" population


wapello65

May 20, 2014, 11:23:48 AM
to migrate...@googlegroups.com
Dear Migrate Supporters,

I have a dataset of 10 marine plant populations analyzed with a set of 8 microsatellite markers. A STRUCTURE analysis shows the strongest support for two population groups. However, while many of the populations fall cleanly into one group, the others are an admixture of the two, i.e. none of the remaining populations are made up entirely, or even primarily, of group two genotypes. This indicates to me that a second "source" population is contributing alleles to these populations but was not sampled itself. Furthermore, the admixed populations occur at the northern and southern boundaries of the sampled area, near passes to the open sea. The admixed populations are also closely related to one another (based on Fst values) even though they are the most geographically distant. I suspect a second "source" population feeds both the southern and northern boundary populations.

I would like to create a "ghost" population (one without data) to test this hypothesis. I read Peter's 2004 paper regarding the effect of "ghost" populations on migration estimates, but it's still unclear to me how to set up Migrate to include one. I also think I'm asking something different from what the paper did. I would like to test various migration models, with and without the ghost population, to see which one has the strongest support. I have experience using the thermodynamic method to test migration models.

Any suggestion as to how I could do this would be most welcome.

kind regards,

Patrick

Eric

May 21, 2014, 1:19:07 PM
to migrate...@googlegroups.com
Hi Patrick,

It's fairly easy to insert a ghost population. You simply add a line to the datafile with the sample size (0) and any name (e.g. "ghostpop"), and add 1 to the number of pops at the top of the file. Nothing comes after that line, because you don't have any data for it. 

After that you need to treat this population as if it had data when you set up the parmfile (e.g. you need to make an n+1 by n+1 migration matrix, where n is the number of populations that you have data for).
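
As a rough illustration of the two datafile changes described above, here is a minimal Python sketch. It assumes that the population count is the first whitespace-separated token on the first line of the datafile and that nothing else in the file needs to change. The function name `add_ghost_population` and the toy file contents are made up for this example and are not part of Migrate.

```python
def add_ghost_population(datafile_text, name="ghostpop"):
    """Return datafile text with one empty ('ghost') population appended.

    Assumption: the first token of the first line is the number of
    populations; everything else on that line is left untouched.
    """
    lines = datafile_text.rstrip("\n").split("\n")
    parts = lines[0].split(None, 1)
    rest = parts[1] if len(parts) > 1 else ""
    # Add 1 to the number of populations at the top of the file.
    lines[0] = f"{int(parts[0]) + 1} {rest}".rstrip()
    # Sample size 0 and a name; no data lines follow.
    lines.append(f"0 {name}")
    return "\n".join(lines) + "\n"


# Toy two-population file, for illustration only (not real data).
toy = "2 1 toy example\n2 popA\nind1 10.12\nind2 12.12\n1 popB\nind3 10.10\n"
print(add_ghost_population(toy))
```

Working on a copy of the datafile keeps the original available for the runs without a ghost.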

I think it might be interesting to test if Migrate finds support for a model with a ghost population vs. one without. However, because it is an equilibrium model, I am not sure it can assign genotypes to an unsampled source any more than structure can. 

Cheers,

Eric

Peter Beerli

May 21, 2014, 2:14:34 PM
to migrate...@googlegroups.com
Patrick,

follow Eric’s instructions :-)
Concerning similarities to structure: structure does not impose a migration model, whereas migrate does, e.g. unidirectional models etc.
I guess support for the ghost model may be difficult to obtain, because it adds many more parameters without adding any data.
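
To put rough numbers on this: a full migration model for n populations has n population sizes (theta) plus n(n-1) directional migration rates (M), i.e. n² parameters in total. A small Python sketch of that count follows; the function name is only for illustration.

```python
def full_model_parameter_count(n_populations):
    """Thetas plus directional migration rates for a full migration matrix."""
    thetas = n_populations
    migration_rates = n_populations * (n_populations - 1)
    return thetas + migration_rates


sampled = 10  # populations with data in Patrick's dataset
without_ghost = full_model_parameter_count(sampled)
with_ghost = full_model_parameter_count(sampled + 1)
print(without_ghost, with_ghost, with_ghost - without_ghost)
```

For the 10 sampled populations, a single ghost raises the full model from 100 to 121 parameters while the data stay the same.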

Peter 

--
You received this message because you are subscribed to the Google Groups "migrate-support" group.
To unsubscribe from this group and stop receiving emails from it, send an email to migrate-suppo...@googlegroups.com.
To post to this group, send email to migrate...@googlegroups.com.
Visit this group at http://groups.google.com/group/migrate-support.
For more options, visit https://groups.google.com/d/optout.

wapello65

May 21, 2014, 3:16:25 PM
to migrate...@googlegroups.com
Thank you Eric and Peter,

Because the ghost population contains no data, do I need to specify starting theta and M values for it? Right now I just use the default Fst options. If so, I guess that means I have to specify starting values for all the populations. There doesn't seem to be an option to specify values for only some of them.

Thanks again,

Patrick

Peter Beerli

May 21, 2014, 3:18:40 PM
to migrate...@googlegroups.com
I do not remember, but if you simply try the FST setting, it may fill in the defaults, which for msat data are theta=1 and M=1.
Otherwise, you will need to say something like this:
theta=OWN:1
migration=OWN:1
This is the shortest version; it sets all start values to the same value.

Peter

Eric Crandall

May 21, 2014, 5:01:52 PM
to migrate...@googlegroups.com
Right, I forgot to mention that you need to specify all parameters that involve the ghost population as constant (c) in your migration matrix, and they will stay fixed at their starting values. I use my own starting values, which are often based on a mean of what I’ve observed in non-ghost pop runs. This could probably use some sensitivity analysis if you really want to dig into it.

-Eric

You received this message because you are subscribed to a topic in the Google Groups "migrate-support" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/migrate-support/uC54-pKyLgs/unsubscribe.
To unsubscribe from this group and all its topics, send an email to migrate-suppo...@googlegroups.com.

Peter Beerli

May 21, 2014, 5:13:35 PM
to migrate...@googlegroups.com
Eric,

Setting the interactions with the ghost population to constant is a choice, not a necessity: you can set a start value and then let the prior decide.
In principle you can get estimates for the ghost population parameters, but you can certainly guess that those are not great. Settings where the ghost is uniquely defined by the sampled populations will be easier than others. For example, assume that we have a stepping-stone model
S<->G<->S<->S
Here the ghost G will be less difficult to estimate than in a scenario where all 3 S populations can connect to G.
But it certainly would be nice to hear some reports about the effect of the ghost and the model selection procedure with G versus without G.
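
The stepping-stone layout above can be written down as a connection matrix. The Python sketch below builds one and prints it in the style of a `custom-migration` setting, using `*` for a freely estimated parameter and `0` for a route that is not allowed (diagonal entries stand for the population sizes). The helper names are invented for this example, and the exact parmfile syntax is an assumption that should be checked against the manual for your Migrate version.

```python
def stepping_stone_matrix(n_populations):
    """Connection matrix where only neighbouring populations exchange migrants.

    '*' marks an estimated parameter, '0' a forbidden route. The diagonal
    (population sizes) is always estimated.
    """
    rows = []
    for i in range(n_populations):
        row = ""
        for j in range(n_populations):
            row += "*" if abs(i - j) <= 1 else "0"
        rows.append(row)
    return rows


def as_custom_migration(rows):
    """Join the rows into one brace-delimited string (assumed parmfile style)."""
    return "custom-migration={" + " ".join(rows) + "}"


# Order of populations: S1, G, S2, S3, i.e. S<->G<->S<->S
rows = stepping_stone_matrix(4)
print("\n".join(rows))
print(as_custom_migration(rows))
```

The row and column order presumably has to match the order of populations in the datafile, so a ghost appended at the end of the file would occupy the last row and column rather than the second.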

Peter

Eric Crandall

May 21, 2014, 6:10:53 PM
to migrate...@googlegroups.com
Thanks Peter, good to know! I thought they had to be constant.

I’ve had a question that is sort of along these lines. I have some models for which a few parameters appear to be non-identifiable (as they might be in the case of a ghost-population). That is, they are not converging, have low ESS, Gelman-Rubin diagnostic > 1.2 after very long runs. Can we still believe the thermodynamic estimates for the marginal likelihood of the overall model?

Thanks,

Eric

Peter Beerli

May 21, 2014, 10:54:44 PM
to migrate...@googlegroups.com
Eric,

I hope to release version 4.0 this summer, although (1) there is still much to do, (2) the manual needs revision, and, probably the most time-consuming part, (3) papers and a grant proposal need to be written, all before I go to the Molecular Evolution workshop, where I intend to use the new version in the hands-on part (I will also put that tutorial onto the migrate website, August 7). This new version will allow setting priors for each parameter, which may help in situations where some parameters are difficult to converge, although it may not cure the problem because
(a) the migration scenario is much more complex than migrate can resolve
(b) there is not enough information for particular migration directions (I guess this is common if sampling is not extensive). Even worse is when there is a lot of migration and all sampled individuals are recent migrants; this will then lead to problems with the estimates of population size, because you need coalescences within a population to estimate that with certainty.

I need to revise the Gelman-Rubin statistic because it often/always fails on parallel runs, and I find it not necessarily that informative. ESS is not that great as a diagnostic either: of course low numbers are bad, but high numbers are not always good. For me, the most consistent approach is assessing the posterior histograms together with the ESS: if the plots look good and the ESS is also great, then repeated runs will deliver similar results, but if either is off, then repeated runs may deliver surprises.
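
For reference, the textbook form of the Gelman-Rubin diagnostic compares the variance between replicate chains with the variance within them; values near 1 suggest that the chains agree. The Python sketch below implements that basic formula for a single parameter. It is not Migrate's implementation, and the function name and toy chains are made up for illustration.

```python
from statistics import mean, variance


def gelman_rubin(chains):
    """Basic potential scale reduction factor for equal-length chains."""
    n = len(chains[0])
    if any(len(chain) != n for chain in chains) or n < 2 or len(chains) < 2:
        raise ValueError("need at least two chains of equal length >= 2")
    within = mean(variance(chain) for chain in chains)          # W
    between = n * variance([mean(chain) for chain in chains])   # B
    pooled = (n - 1) / n * within + between / n
    return (pooled / within) ** 0.5


# Toy samples of one parameter from two replicates (illustration only).
agree = [[0.9, 1.1, 1.0, 1.2], [1.0, 1.2, 0.9, 1.1]]
disagree = [[0.9, 1.1, 1.0, 1.2], [2.0, 2.2, 1.9, 2.1]]
print(gelman_rubin(agree), gelman_rubin(disagree))
```

A value well above the 1.2 threshold mentioned earlier in the thread, as for the second pair of toy chains, indicates that the replicates are sampling different regions.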

Now to answer your question: my experience with marginal likelihoods is that they converge faster than parameters. Even when the histograms and ESS do not look great, the marginal likelihoods are often at values that are the ’same’ as when run 100x longer.
Therefore, I would believe the marginal likelihood results!

wapello65

Jun 2, 2014, 6:03:50 PM
to migrate...@googlegroups.com
Well, I've had a couple of weeks now to model my data with a ghost population, but I've been disappointed with the results. I've tried several different things, but the posterior distributions still look rather poor for the full migration matrix (pop268Xf). I've also tried a somewhat less complicated model, but the posterior distributions still look poor. My interpretation is that the model is so far from reality that it will take a very long time for convergence to be achieved. I had hoped to obtain some reasonable results before using the thermodynamic method to compare various migration models using Bayes Factors.

My question is, do you think there is anything I might have overlooked? Or should I just try to model migration without a ghost pop? Any advice would be appreciated.

Files:

Thanks,

Patrick


Peter Beerli

Jun 2, 2014, 6:22:36 PM
to migrate...@googlegroups.com
Patrick,

In situations where I suspect that the data are not all that informative, I usually try a complex model and then the smallest model that I think may remotely be interesting (this means turning on heating and replication as soon as possible, e.g. when I sample more than 1 million steps I usually use replicates). In addition, the marginal likelihoods help to get some idea about how much confidence you should have in different runs using the same population model but different random number seeds.

For testing:
1. Run 5 replicates (cut down the long-sample by a factor of 5) and turn on heating. [I see that your run already took a few days and that you ran the program on 4 cores. What machine is this? It looks relatively slow; did you use all the CPU cycles? Most CPUs these days, other than low-powered portables, have 4 real cores, meaning that to soak up all cycles you will use the hypercores (for example, my Mac has 8 usable (hyper-)cores and is much faster than when I simply use 4). Or is this a cluster? Then you will take what they give you. On your own machine, use one more “core” (the ghost core ;-)) to run migrate: because migrate uses a master and workers, the master will occupy a core but does not do much, so overcommitting the machine a little actually helps to speed up the run.]
2. Run a case that has no ghost.
3. Run a case that has a ghost but where most migration routes are set to zero, and do the same without a ghost [aka our smallest acceptable case].

I know that you said you want to solidify the runs first before you model test, but if your most complex model is not supported by the data then it will be very difficult to optimize that model.

Sorry for just suggesting more work,
Peter

wapello65

Jun 3, 2014, 5:31:34 PM
to migrate...@googlegroups.com
Hello Peter, thank you for your prompt reply and suggestions. I use Migrate on my desktop machine. Here are the specs:

iMac
2.7 GHz quad core Intel Core i5
4 GB 1333 MHz DDR3
OS X 10.7.5
Migrate 3.6.4

All the cores are being tasked and operating near 100% when Migrate is running, if that's what you mean when you ask if all the cpu cycles are being used. I'm afraid I don't know the difference between a regular core and a hypercore. 

When you say "use one more 'core' (the ghost core)" do you mean to tell migrate at the command line:

mpirun -np 5 --host localhost ./migrate-n-mpi parmfile (i.e. instead of -np 4 ?)

Also, if I understand you correctly, here's what you're suggesting in steps 1,2, and 3:

1. Rerun the full migration matrix, that includes the ghost pop, using 5000 recorded steps, 5 replicates, heating, and a burn-in of 50,000 trees (10%) per chain. 
2. Perform the same analysis, except this time without a ghost pop
3. Perform the analysis again, with the ghost, with perhaps only 1 or 2 migration routes
4. Repeat step 3 without the ghost

compare the log marginal likelihoods of the Bezier approximations. The model with the smallest value will have the strongest support.

Correct? What if none of the posterior distributions look all that good?

Thanks again,

Patrick

Peter Beerli

Jun 3, 2014, 11:03:06 PM
to migrate...@googlegroups.com
Your computer has 4 hypercores and two real cores (I guess).
Yes, run mpirun using 5 cores; it will speed things up a little, because the master is not doing much:
mpirun -np 5 --host localhost ./migrate-n-mpi parmfile (like this)
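
A small Python sketch of this overcommitting rule: start one more MPI process than there are cores, so that the mostly idle master does not take a core away from the workers. The command itself is the one quoted in this thread; the function name is hypothetical, and `os.cpu_count()` reports logical cores, which may include hypercores.

```python
import os


def migrate_mpi_command(cores, parmfile="parmfile"):
    """Build the mpirun call with one extra process for the master."""
    return ["mpirun", "-np", str(cores + 1), "--host", "localhost",
            "./migrate-n-mpi", parmfile]


cores = os.cpu_count() or 1  # cpu_count() can return None
print(" ".join(migrate_mpi_command(cores)))
```

The list form can be passed directly to `subprocess.run` if you want to launch the run from a script.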

> Also, if I understand you correctly, here's what you're suggesting in steps 1, 2, and 3:
>
> 1. Rerun the full migration matrix, that includes the ghost pop, using 5000 recorded steps, 5 replicates, heating, and a burn-in of 50,000 trees (10%) per chain.

Migrate 3.6.4 runs the burn-in similarly to the sampling: a burn-in=50000 and a long-sample=50000 means that 50% is burn-in per replicate (there is only “one” chain).

> 2. Perform the same analysis, except this time without a ghost pop

You may need to adjust the migration routes, so it may not be equivalent to “except the ghost pop”.

> 3. Perform the analysis again, with the ghost, with perhaps only 1 or 2 migration routes

Yes.

> 4. Repeat step 3 without the ghost

Yes.

> compare the log marginal likelihoods of the Bezier approximations. The model with the smallest value will have the strongest support.

NO, THE MODEL WITH THE HIGHEST VALUE WINS! E.g. log marginal likelihood -1000 is smaller than -10, so -10 wins.

> Correct? What if none of the posterior distributions look all that good?

Well, then you will need to run longer, but the simple models will look better than the complex ones. If they are also much better than the complex one, then I see no point in optimizing the bad models and would simply say so.
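
The comparison can be sketched in a few lines of Python: rank the models by log marginal likelihood (highest first), take log Bayes factors against the best model, and convert them to model probabilities. The model names and values below are invented purely to show the arithmetic; they are not results from this dataset.

```python
import math


def compare_models(log_marginal_likelihoods):
    """Rank models by log marginal likelihood; the highest value wins.

    Returns (name, log Bayes factor vs. best model, model probability) tuples.
    """
    best = max(log_marginal_likelihoods.values())
    # Subtract the best value before exponentiating to avoid underflow.
    weights = {name: math.exp(value - best)
               for name, value in log_marginal_likelihoods.items()}
    total = sum(weights.values())
    ranked = sorted(log_marginal_likelihoods.items(),
                    key=lambda item: item[1], reverse=True)
    return [(name, value - best, weights[name] / total)
            for name, value in ranked]


# Hypothetical Bezier-approximated log marginal likelihoods.
example = {"full+ghost": -1000.0, "full": -990.0,
           "simple+ghost": -15.0, "simple": -10.0}
for name, log_bf, prob in compare_models(example):
    print(f"{name:13s} logBF={log_bf:8.1f} prob={prob:.4f}")
```

The best model always gets a log Bayes factor of 0, and every other model gets a negative value that shows how far behind it is.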

Peter