Expansion/migration script and random seeding questions


Cameron Grey

Nov 4, 2025, 3:51:01 PM
to dadi-user

Hi Ryan,


I've been using dadi to model the initial expansion and gene flow of a recent invasion, and I've started on three populations. I have SNP data for the 3 populations and generation times for how long they have (probably) been there. I wanted a model that could examine likely initial expansion events as well as migration post founding of each population. I have scripts for each scenario, and have been comparing AIC values calculated from output LL. 


Specifically -  I tested different expansion scenarios first with the nuPOP parameters, and found the best fit expansion scenario. From there, I have added migration overtop of the expansion with the m_POP parameters in my scripts.


1. Question about migration -  I have included a script where I have both the expansion scenario and one of the migration scenarios I am testing overtop the expansion results. I was wondering if this was a reasonable/informative approach to expansion and migration. Because the data is from one time point, how does dadi differentiate initial expansion events from migration that occurs later on given that these shared allele frequency changes can be quite subtle for recent events? I'm curious because I just got a couple results suggesting migration in the same direction as my likely expansion scenario (I tested just expansion first).


2. Question about seeding - I realized that I was missing a random seeding for the different replicates of the model, and I am thinking of adding in np.random.seed() to the run_opt() portion. Any advice on this approach? I appreciate it.


Thank you for all of your help and for making such an interesting model!


-Cameron


scenario2_3popGOM_migBAtoBZ.py

Ryan Gutenkunst

Nov 7, 2025, 11:02:56 AM
to dadi...@googlegroups.com
Hello Cameron,

On Nov 4, 2025, at 1:47 PM, Cameron Grey <camero...@gmail.com> wrote:

Hi Ryan,

I've been using dadi to model the initial expansion and gene flow of a recent invasion, and I've started on three populations. I have SNP data for the 3 populations and generation times for how long they have (probably) been there. I wanted a model that could examine likely initial expansion events as well as migration post founding of each population. I have scripts for each scenario, and have been comparing AIC values calculated from output LL.

Do be careful with AIC. If your SNPs are linked, then dadi is really computing a composite likelihood, and the AIC will be anti-conservative (it will favor the more complex model too much).
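For reference, the AIC comparison described above can be sketched as follows. This is only an illustration and not part of dadi: the function name `aic`, the log-likelihoods, and the parameter counts are all made up. With linked SNPs the log-likelihood is a composite one, so a difference like the one printed here overstates the support for the more complex model.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2*ln(L). Lower is better."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical values for illustration only (not from any real fit).
ll_expansion, k_expansion = -1520.0, 5   # expansion-only model
ll_migration, k_migration = -1515.0, 6   # expansion + one migration rate

aic_expansion = aic(ll_expansion, k_expansion)
aic_migration = aic(ll_migration, k_migration)

# Positive delta means the migration model has the lower (better) AIC,
# but with composite likelihoods this difference is inflated.
delta = aic_expansion - aic_migration
print(aic_expansion, aic_migration, delta)
```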

Specifically -  I tested different expansion scenarios first with the nuPOP parameters, and found the best fit expansion scenario. From there, I have added migration overtop of the expansion with the m_POP parameters in my scripts.

1. Question about migration -  I have included a script where I have both the expansion scenario and one of the migration scenarios I am testing overtop the expansion results. I was wondering if this was a reasonable/informative approach to expansion and migration. Because the data is from one time point, how does dadi differentiate initial expansion events from migration that occurs later on given that these shared allele frequency changes can be quite subtle for recent events? I'm curious because I just got a couple results suggesting migration in the same direction as my likely expansion scenario (I tested just expansion first).

Be careful with migration directions. The m12 parameter in dadi is migration into population 1 from population 2, which isn’t obvious.
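To make the naming convention concrete, here is a small sketch that spells out what a parameter name like m12 means. The helper `describe_migration` and the population labels are hypothetical and not part of dadi; the only thing taken from dadi is the convention that mij is migration into population i from population j.

```python
import re

def describe_migration(param, pop_names):
    """Translate a dadi-style migration parameter name such as 'm12'
    into words: mij is migration INTO population i FROM population j."""
    match = re.fullmatch(r"m(\d)(\d)", param)
    if match is None:
        raise ValueError(f"not a migration parameter name: {param!r}")
    into_idx, from_idx = (int(g) for g in match.groups())
    return f"into {pop_names[into_idx - 1]} from {pop_names[from_idx - 1]}"

# Hypothetical labels, listed in the same order as the populations in the model.
pops = ["popA", "popB", "popC"]
print(describe_migration("m12", pops))  # into popA from popB
print(describe_migration("m31", pops))  # into popC from popA
```

Checking each m_POP parameter in a script this way is a quick guard against fitting the reverse of the intended direction.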

You may not have power to differentiate directional migration from other scenarios. As you said, it’s a subtle signal.

2. Question about seeding - I realized that I was missing a random seeding for the different replicates of the model, and I am thinking of adding in np.random.seed() to the run_opt() portion. Any advice on this approach? I appreciate it.

Dadi does the same seeding internally, so there’s no need to add it. One caveat is that the default np.random.seed(), which dadi also uses, is based on system clock time. If you start a number of jobs simultaneously on a cluster, they can inadvertently get the same seed. dadi-cli works around this by also using additional information about the machine when seeding.
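If simultaneous cluster jobs are a concern, one possible workaround is to build the seed from more than the clock. The sketch below is an assumption about how that could be done, not dadi-cli's actual implementation; the helper `make_seed` and the choice of inputs (process ID, host name, replicate index) are illustrative.

```python
import hashlib
import os
import socket
import time

def make_seed(replicate):
    """Build a 32-bit seed from clock time plus machine- and job-specific info."""
    parts = (time.time_ns(), os.getpid(), socket.gethostname(), replicate)
    digest = hashlib.sha256(repr(parts).encode()).digest()
    # Keep 4 bytes so the value fits the 0..2**32-1 range np.random.seed accepts.
    return int.from_bytes(digest[:4], "big")

# One seed per optimization replicate; print them so runs can be reproduced.
seeds = [make_seed(rep) for rep in range(5)]
print(seeds)
# Each replicate would then call np.random.seed(seed) before optimizing.
```

Writing the seed to the output alongside each replicate's log-likelihood makes it possible to rerun a specific replicate later.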

I encourage you to also explore dadi-cli, it makes most basic dadi analyses much easier.

Best,
Ryan

