compare migrate-n results to IM


robertkraus

May 2, 2011, 10:41:59 AM
to migrate-support
Dear colleagues,

I have a data set of mtDNA sequences for two populations. When I
started to analyse it I used IMa2 to infer theta and migration rates.
The output was (eventually) sensible. Later I played around with
migrate-n 3.2.7 and the exact same dataset. The funny thing now is
that the results are almost opposite when it comes to magnitude, but
luckily in line with each other when it comes to directionality. This
means:

With IM I get pretty large thetas for each pop. With migrate-n they
are tiny, almost zero (a triangle with the peak at zero...). With IM I
get almost no migration (the 95% HPD well includes zero), but the little
migration there may be is directional, with an almost 80% probability
that one rate is higher than the other. In migrate-n, however,
migration rates are numerically huge and do not include zero at all.
Luckily, one migration rate is an order of magnitude larger than
the other, which fits the IM analysis.

My question is in a sense pretty specific, i.e., why are the thetas so
tiny??? I am dealing with a species that is abundant, in both
populations, likely >1 million (yes, effective size), and genetically
not very impoverished. When I estimate theta with DnaSP or similar, I
get >11 for each pop. And with IM it goes up into the hundreds,
producing sensible numbers when it comes to population sizes
calculated with a known mutation rate.

But the question is also a bit general: What is the difference between
the estimates given by IM and migrate-n, especially in the case of only
two populations, in which the genealogy (used by IM) should not impact the
model that much? I would be happy about suggestions.

Cheers,
Robert

Peter Beerli

May 2, 2011, 10:51:35 AM
to migrate...@googlegroups.com
Robert
before we dive into analyzing differences, could you tell us about the priors for theta and M and about your data (size, variability)?
It could well be that your prior for theta is huge and the number of bins for the histogram is relatively small.

Peter


Robert Kraus

May 2, 2011, 11:05:13 AM
to migrate...@googlegroups.com
Hi!

Wow, thanks for the immediate reply. Here are the runtime parameters of migrate-n:

Analysis strategy is Bayesian

Proposal distribution:
Parameter group          Proposal type
-----------------------  -------------------
Population size (Theta)  Slice sampling
Migration rate (M)       Slice sampling


Prior distribution:
Parameter group          Prior type   Minimum   Mean(*)     Maximum     Delta
-----------------------  -----------  --------  ----------  ----------  --------
Population size (Theta)  Uniform      0.000000  0.500000    1.000000    0.100000
Migration rate (M)       Exponential  0.000000  500.000000  50000.0000  -

I used an exponential prior for the migration rate in this final run but also did runs with uniform ones. They all ended up with large numbers. Hence I played around with "forcing" the program to really consider small migration rates, but the data completely overran the exponential prior with its pretty low mean. The upper bound is that high because I raised it after every run that suggested it needed to be higher.
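To get a feeling for how strongly such a prior favours small M, the prior mass below a given value can be computed directly. This is only a sketch: it assumes that the "Mean" column acts as the scale of an exponential distribution truncated at the "Maximum" column, which may not match migrate-n's internal parameterisation exactly, and the function name is made up for illustration.

```python
import math


def truncated_exp_cdf(x, mean, upper, lower=0.0):
    """Prior mass below x for an exponential (scale = mean) truncated to [lower, upper].

    Assumption: 'mean' is used as the scale of the untruncated exponential.
    """
    if x <= lower:
        return 0.0
    if x >= upper:
        return 1.0
    # Normalise by the mass that falls inside the truncation window.
    total = 1.0 - math.exp(-(upper - lower) / mean)
    return (1.0 - math.exp(-(x - lower) / mean)) / total


# Values from the prior table above: mean 500, maximum 50000.
prior_mean, prior_max = 500.0, 50000.0
for m in (10.0, 100.0, 500.0, 5000.0):
    print(f"P(M < {m:g}) = {truncated_exp_cdf(m, prior_mean, prior_max):.3f}")
```

Under these assumptions roughly two thirds of the prior mass lies below the mean, so a posterior that sits far above it is driven by the data and not by the prior.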

It is also unlikely to be a problem of convergence:

Markov chain settings:
Long chains (long-chains): 1
Steps sampled (long-inc*samples): 500000000
Steps recorded (long-sample): 500000
Static heating scheme
10 chains with temperatures
1.00, 1.50, 3.00,10.00,50.00,100.00,500.00,1000.00,5000.00,10000.00
Swapping interval is 10
Number of discard trees per chain: 1000000000
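The chain settings above imply a sampling increment and a burn-in length that can be checked with simple arithmetic. A minimal sketch, using only the numbers from the listing (the variable names are illustrative):

```python
# Values copied from the Markov chain settings above.
steps_sampled = 500_000_000        # "Steps sampled (long-inc*samples)"
steps_recorded = 500_000           # "Steps recorded (long-sample)"
burn_in_per_chain = 1_000_000_000  # "Number of discard trees per chain"

# One recorded sample every `sampling_increment` steps.
sampling_increment = steps_sampled // steps_recorded
# How the discarded burn-in compares with the sampled part of the run.
burn_in_ratio = burn_in_per_chain / steps_sampled

print(f"sampling increment: every {sampling_increment} steps")
print(f"burn-in is {burn_in_ratio:.1f}x the number of sampled steps")
```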

Let me know if you need more info. Thanks already for having a look!

Cheers,
Robert



Peter Beerli

May 2, 2011, 11:58:41 AM
to migrate...@googlegroups.com
Robert,
sorry more questions but perhaps some guidance too.

Given the run time parameters convergence is not an issue [10^9 burn-in seems rather excessive :-)]

But I wonder about the theta priors; they seem very wide for DNA data (but very narrow for msats --> I assume it is DNA -- how many loci?).
If the data is from viruses that range may be OK, but for diploids it will be very large. Depending on data and species, a run with an upper bound for theta of 0.1, or of 0.01 if this is nuclear data and all populations are tiny, may be much better.

Migrate versus IM: migrate will attribute all ancestral polymorphism that is left at divergence to immigration; I have a note about that at
http://popgen.sc.fsu.edu/Migrate/Blog/Entries/2010/8/15_Violation_of_assumptions%2C_or_are_your_migration_estimates_wrong_when_the_populations_split_in_the_recent_past.html

Migrate reports Theta per site (e.g. the 11 of DnaSP is per locus --> how many sites?).

For a single locus IM may have some issues with co-estimation of divergence and immigration rates.

Peter

Robert Kraus

May 2, 2011, 12:37:51 PM
to migrate...@googlegroups.com
Dear Peter,

no worries about more questions! I am happy to answer any questions that solve the "mystery".

Yes, I did a loooong burn-in. I really wanted to exclude convergence issues ;-).

My data set is mtDNA, 600 bp, from a duck species. Good to know that migrate-n gives theta per site, this makes more sense now. I can try setting the theta prior smaller. But the ducks are pretty abundant, so I expect large pop sizes. Anyway, worth a try.
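The per-locus versus per-site difference is easy to make concrete with the numbers mentioned in this thread (a DnaSP theta of about 11 per locus and 600 bp). A minimal sketch; the function name is just for illustration:

```python
def theta_per_site(theta_per_locus: float, n_sites: int) -> float:
    """Convert a per-locus theta into a per-site theta."""
    if n_sites <= 0:
        raise ValueError("n_sites must be positive")
    return theta_per_locus / n_sites


dnasp_theta_per_locus = 11.0  # ">11" per locus, as reported earlier in the thread
sequence_length_bp = 600      # length of the mtDNA fragment
prior_upper_bound = 1.0       # upper bound of the uniform Theta prior used in the run

per_site = theta_per_site(dnasp_theta_per_locus, sequence_length_bp)
print(f"theta per site: {per_site:.4f}")
print(f"fraction of the prior window: {per_site / prior_upper_bound:.3f}")
```

On this scale the DnaSP estimate sits in the lowest two percent of a 0 to 1 prior window, which fits a posterior that looks like a peak near zero.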

Then there is still the interpretation of the large numbers for migration. How should I interpret numerically huge values? I know they are scaled to mutation rate, but in which sense would they differ from IM output? There, too, parameters are scaled to mutation rate.
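One way to put a mutation-scaled migration rate into perspective is to combine it with Theta. The sketch below assumes the migrate definitions M = m/mu and Theta = x * Ne * mu, where x is an inheritance multiplier (for example 4 for diploid nuclear data); the multiplier for mtDNA and the example values are assumptions for illustration, not results from this data set.

```python
def immigrants_per_generation(theta: float, m_scaled: float, x: float = 1.0) -> float:
    """Return Ne*m from Theta = x*Ne*mu and M = m/mu.

    The mutation rate cancels in the product Theta * M = x * Ne * m,
    provided both parameters are scaled by the same (per-site) rate.
    """
    if x <= 0:
        raise ValueError("x must be positive")
    return theta * m_scaled / x


# Hypothetical values, chosen only to show the arithmetic.
theta = 0.018     # per-site Theta
m_scaled = 1000.0 # M = m/mu, "numerically huge"
x_mtdna = 1.0     # assumed inheritance multiplier; check the manual for your data type

print(immigrants_per_generation(theta, m_scaled, x_mtdna))
```

A large M together with a small per-site Theta can therefore still correspond to a moderate number of immigrants per generation.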

But I will have a look at the document link, thanks! It reads as if that could help me ;-).

Thanks already,
Robert



Robert Kraus

May 3, 2011, 3:10:39 AM
to migrate...@googlegroups.com
Hi again, Peter,

I read your note on overestimation of migration rates (the link you sent me in this conversation). Just to make sure, do I understand correctly that with large populations and a possibly short divergence time, inferred migration rates can be wrong? I guess my effective pop sizes would be a million or more (each), but the divergence time is at most 100k generations, and there also seems to be very little migration. So this is likely what happens in my data. But it seems from reading your text that with longer divergence times the migration rates would be underestimated?? I seem to have overestimation...

What about the relative values of the rates? They are likely biased by the same magnitude in each direction. So if my result is that there is a lot of migration in one direction but not in the other, this is a valid observation, right? No matter what the magnitude of this measure.

Thanks again for your swift reply and explanation!

Cheers,
Robert



Peter Beerli

May 3, 2011, 8:17:48 AM
to migrate...@googlegroups.com
Robert,

(1) The ratio of Ne and divergence time is key (Edwards and Beerli 2001, in a slightly different wrinkle of this). For estimation of migration rates between two recently diverged populations, the similarity of the populations due to lineage sorting leads to an overestimation of the migration rate, because alleles land in both populations from the ancestor and not by migration. The longer ago the divergence event was, the less impact this will have.
In practice, when the individual populations coalesce before the divergence event, migrate will estimate the migration rate correctly, but with ~2Ne > divtime the overestimation is small. In practice this plays a role only with huge populations that had a constant size over time; I would think that the estimation of migration rates for a population that was growing after the divergence is less affected by the ancestral lineages.
But beware, this is Ne and not census size.

You say that Ne>10^6 and divtime~10^5. This certainly would put you into the "danger zone," but given that your theta estimates looked funny, it seems that your assertion about population size is premature, because migrate in the worst case should give you the size of the ancestor (if the split happened yesterday) and not zero-mode estimates.

(2) Migration rate estimates from migrate will be, in the worst case (huge population and very recent divergence), overestimated. The directionality does NOT get lost, so you should still see the direction. In fact, I have run Bayes factor analyses on parasites with presumptive transmission dates that were very recent (on the order of tens of generations), and the (known) direction was confirmed by my migrate run comparison [in this particular dataset IM did not converge].
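The comparison of Ne and divergence time in point (1) can be sketched as a rough check. This is only an illustration of the ~2Ne versus divtime yardstick used above, with the values quoted in this thread; it is not a test that migrate performs, and the function name is hypothetical.

```python
def ancestral_polymorphism_check(ne: float, div_time_generations: float):
    """Compare ~2Ne with the divergence time (both in generations).

    Returns the ratio 2Ne/divtime and a flag that is True when 2Ne exceeds
    the divergence time, i.e. when shared ancestral polymorphism may still
    be mistaken for migration.
    """
    if ne <= 0 or div_time_generations <= 0:
        raise ValueError("ne and div_time_generations must be positive")
    ratio = 2.0 * ne / div_time_generations
    return ratio, ratio > 1.0


# Values quoted in the thread: Ne > 10^6, divergence time ~10^5 generations.
ratio, in_danger_zone = ancestral_polymorphism_check(1e6, 1e5)
print(f"2Ne / divtime = {ratio:.0f}, danger zone: {in_danger_zone}")
```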

Peter

Robert Kraus

May 3, 2011, 9:20:16 AM
to migrate...@googlegroups.com
Thanks for this valuable information. I started another run with narrower priors on theta to assess the validity of my last run. However, given the excessive burn-in, will a (too) wide prior still harm the analysis?

Cheers,
Robert


Peter Beerli

May 3, 2011, 9:30:46 AM
to migrate...@googlegroups.com
Robert,
the burn-in has little to do with that; it has everything to do with the number of bins for the posteriors and the window for the prior.
Priors always matter, we just hope that the data overpowers the prior. In your case the answer simply seems coarse.
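How coarse the posterior histogram gets for a given prior window can be estimated with a little arithmetic. In this sketch the number of bins (1500) is an assumed value for illustration, not taken from the run above, and the theta of interest is the per-site DnaSP figure (11 per locus over 600 bp) from earlier in the thread.

```python
def bins_spanned(value: float, prior_min: float, prior_max: float, n_bins: int) -> float:
    """Number of histogram bins that lie between prior_min and `value`."""
    if n_bins <= 0 or prior_max <= prior_min:
        raise ValueError("need n_bins > 0 and prior_max > prior_min")
    bin_width = (prior_max - prior_min) / n_bins
    return (value - prior_min) / bin_width


theta_of_interest = 11.0 / 600  # per-site theta implied by the DnaSP estimate
n_bins = 1500                   # assumed number of posterior bins

for upper in (1.0, 0.1):
    n = bins_spanned(theta_of_interest, 0.0, upper, n_bins)
    print(f"prior 0..{upper:g}: {n:.1f} bins below theta = {theta_of_interest:.4f}")
```

Narrowing the window by a factor of ten gives the same region ten times as many bins, so the posterior is resolved much more finely.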

Peter

Swaraj Kunal

May 4, 2011, 1:26:00 AM
to migrate...@googlegroups.com
hello everyone,
 
I am very new to population genetics studies. I have collected samples from 2 different locations. I have done DNA-RFLP of a segment of mtDNA and also sequenced the same segment, but now I don't know how to interpret the data.
I would be grateful if anyone could help me in this regard.
 
Kunal
Warm Regards

Swaraj :-)


Robert Kraus

May 4, 2011, 8:55:44 AM
to migrate...@googlegroups.com
Dear Kunal,

my apologies if this may sound a bit rude. But please first ask a proper question. This concerns two levels:

i) Research question: We can't help you with suggestions if we do not know what you would like to find out.
ii) Questions in a discussion group: Without showing your results, and explaining which of those you don't understand, nobody will be able to help you.

Have you already run migrate-n and don't understand its output? Or do you seek general advice?

Cheers,
Robert


Swaraj Kunal

May 4, 2011, 11:10:00 AM
to migrate...@googlegroups.com
Dear Robert,
 
I beg your pardon for not putting my question properly. I am trying to assess the population genetic structure of a marine fish species. I collected around 100 samples from two localities (50 from each location), separated by around 600 miles. This is a coastal species, mostly found in these two localities only, with no information about its migration pattern.

The genetic variation between the two populations is very low, as confirmed by analysis of the RFLP and sequencing data sets of a region of mtDNA.

I have 3 questions.

1.) I want to use migrate-n for analysing effective population size as well as the migration rate between these 2 populations. What do I need to do in this regard?
2.) What other analyses can I make with my data set?
3.) Would analysing nuclear DNA (microsatellites) be helpful?

Please do let me know whether I have been able to put my question to you clearly.
I hope this time my question makes some sense.
 
Thanking you in anticipation
 
with warm regards
 
Kunal

Robert Kraus

May 4, 2011, 2:07:01 PM
to migrate...@googlegroups.com
Hi Kunal!

Much better ;-).

1. Even though I am not the expert (yet ;-)), I think migrate-n can surely help you with your data. If you are talking about two populations of the same fish species, you can just start a migrate-n run and see what the program does. Unfortunately this type of coalescent simulation program is not trivial to use, and you will have to invest some time in reading the manual and some of the papers cited in it. Then I suggest you install migrate-n and run the test files that come with it to find out whether it works properly and produces output. If it does, construct an input file of your sequences and do a short run to see that it is working. Then get back to the manual and carefully read the notes about runtime settings (how long the burn-in, how many chains, how long the run...). If all is fine, migrate-n will give you estimates of the effective population size and migration rates, both scaled to mutation rate. If you need real demographic units (i.e., number of individuals) then you might need to dig deeper and ask again at a later stage. This goes too far for now.

You say you have both RFLP and mtDNA sequences? Did I understand correctly that the RFLP and sequence data are from the same locus? In this case you should not use your RFLP data, because you have the whole sequence of that locus. But I may have misunderstood you.

2. You can do all or nothing; this largely depends on the questions you would like to answer with your research. If you are in an exploratory stage, you might want to read the manuals of some larger "regular" population genetic programs and get inspired. I am not trying to advertise, but these are packages that come to my mind right now: Arlequin, DnaSP, MEGA. Just google for "population genetics arlequin" or "population genetics dnasp" or "population genetics mega". Make sure you are reading about the latest version of these programs.

3. Certainly microsatellites or other nuclear data will broaden your possibilities dramatically. Since you wish to learn about the effective population size and migration rates, additional microsatellites will give you much better estimates and the complete picture for the whole species (not only the females, as with mtDNA). Of course, like everything else, this depends on your research questions.

I hope I could get you started. I am sure you will be busy for a while learning what migrate-n does, and how it does it.

Good luck!

Cheers,
Robert



Vikram Chhatre

May 4, 2011, 2:22:42 PM
to migrate...@googlegroups.com
Hi Kunal,

I am sure Robert's excellent discussion will get you going. I will
reiterate what he said: the analyses you choose to do
will depend upon the questions you are asking and the hypotheses you
are testing.

I recently saw an interesting paper that discusses the importance of
effective population size for understanding genetics of wild
populations. Here is the doi:
http://dx.doi.org/10.1111/j.1365-294X.2008.03842.x.

You may also wish to look at LAMARC, a companion program of Migrate-n,
and read some papers by M. Kuhner that describe that program. LAMARC
uses Bayesian methods and additional models for microsatellite
evolution (e.g. Brownian motion), so it would be a good comparison.

All the best
Vikram

Peter Beerli

May 4, 2011, 3:05:15 PM
to migrate...@googlegroups.com
Kunal and Vikram,
I want to clarify a few points about lamarc and migrate:
At one point in time while I was still in Seattle we worked on a package called lamarc
containing programs coalesce, fluctuate, recombine and migrate. In about 2000 Mary Kuhner
and collaborators (including me until 2003) started work on lamarc (the program) that would
incorporate all forces (growing population size, migration, recombination, and selection).
After starting my job here at FSU in 2003 I decided that my own program migrate has several
things that we did not incorporate into lamarc and that I wanted to improve on that.
Vikram mentions the Brownian mutation model; it is available in both programs because we
ported it from migrate to lamarc. Since then both programs do similar things and each does
some things the other does not do. Non-exhaustive lists follow:
Migrate:
- runs in parallel (on clusters or multi-core cpus)
- writes output into a PDF with histograms for Bayesian inference
- calculates Bayes factors (using thermodynamic integration)
- calculates approximate likelihood ratio tests and AIC
- is "fast"
- can show skyline plots (not as good as BEAST) and show distribution of migration events over time
Lamarc:
- takes into account exponential population growth
- takes into account recombination
- allows for physical distances between snps
- does haplotyping
[but in my opinion is very slow when you turn on any of the above]
Both do:
- Bayesian inference (I have the impression that we do this quite differently)
- multiple models for microsatellite evolution (but some of these models are really slow)

You may want to look also into IM (Jody Hey) and BEAST (Drummond, Rambaut, Suchard)
AND check out the two overviews:

Excoffier: http://www.nature.com/nrg/journal/v7/n10/full/nrg1904.html

Kuhner: http://www.sciencedirect.com/science?_ob=ArticleURL&_udi=B6VJ1-4V7CYVX-1&_user=2139768&_coverDate=02%2F28%2F2009&_rdoc=1&_fmt=high&_orig=search&_sort=d&_docanchor=&view=c&_acct=C000054272&_version=1&_urlVersion=0&_userid=2139768&md5=309e55c7fa1d8048eff7fc998a57c6ac

Peter

Vikram Chhatre

May 4, 2011, 3:12:05 PM
to migrate...@googlegroups.com
Dr. Beerli,

Thanks for clarifying the differences. I did not know that Migrate
also uses Bayesian methods.

My apologies for any misleading information.
Vikram
