bug in pyslim.recapitate


Peter Ralph

Jun 20, 2023, 7:43:08 PM
to slim-discuss
Dear pyslim users: I'm writing about a bug that we've just turned up in pyslim. It's potentially a bad one, so I'm trying to get word around about it. It's been in pyslim since 0.700, and probably affects you if you're using pyslim.recapitate( ) (although hopefully not seriously; see below). The fix is to upgrade pyslim (to 1.0.2, just released).

First, many sincere apologies. We spend a lot of time working to make our software reliable and well-tested, and in this case I messed up. I hope it doesn't mess you up.

What to do: Just install the most recent version of pyslim, e.g., by doing
 pip install --upgrade pyslim
and make sure any requirements.txt files have pyslim>=1.0.2 .

Description: Before this fix, using the `ancestral_Ne` parameter to `pyslim.recapitate( )` would introduce a bottleneck in each SLiM subpopulation down to diploid size Ne=1 for 1 or 2 generations in most situations. The bug would *not* occur if either (a) it was a WF simulation, with calls to addSubPop() in first() or early() and treeSeqOutput() in late(), or (b) it was a nonWF simulation, with calls to addSubPop() in first() and treeSeqOutput() in early() or late(). The fix correctly starts the msprime population with effective size `ancestral_Ne` at the time of the roots, which might be at the value of `ts.metadata['SLiM']['tick']`, this value minus 1, or this value minus 2. (Previously, the msprime population always started at `ts.metadata['SLiM']['tick']` ticks ago, with populations of size 1 for the intervening 0, 1, or 2 ticks.)

Note: Now, `recapitate` throws an error if any roots of any trees are not at the same time as the others. I cannot think of any use cases this will cause a problem for, but let me know if you encounter one.
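
To see where the roots of a tree sequence sit relative to `ts.metadata['SLiM']['tick']`, something like the following could be used. This is only an illustrative sketch, not pyslim's actual implementation; the helper names `collect_root_times` and `root_offset` are hypothetical, and the file name in the usage comment is a placeholder.

```python
def collect_root_times(ts):
    """Return the set of distinct times of every root of every tree.

    `ts` is assumed to be a tskit tree sequence loaded from SLiM output.
    """
    return {ts.node(root).time for tree in ts.trees() for root in tree.roots}


def root_offset(tick, root_times):
    """Return `tick` minus the common root time.

    Raises ValueError if the roots are not all at the same time, which
    mirrors the condition described in the note above.
    """
    distinct = set(root_times)
    if len(distinct) != 1:
        raise ValueError(f"expected one root time, found {sorted(distinct)}")
    (root_time,) = distinct
    return tick - root_time


# Usage with a real file (requires tskit; the file name is a placeholder):
#   import tskit
#   ts = tskit.load("example.trees")
#   offset = root_offset(ts.metadata["SLiM"]["tick"], collect_root_times(ts))
# According to the description above, the offset should be 0, 1, or 2.

# Toy example with made-up numbers: roots one tick more recent than `tick`.
print(root_offset(1000, [999.0, 999.0]))
```

An offset of 1 or 2 corresponds to the situations in which the earlier versions inserted the size-1 bottleneck.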

How might this affect you? The easy answer: if you used `pyslim.recapitate(..., ancestral_Ne=xxx)` on a SLiM simulation fitting the description above, then the result you got is probably not what you intended, and you should redo it. (Hopefully, you still have the pre-recapitation tree sequence around, and can just recapitate again.) What about published studies? Are they all wrong if they used recapitate? That's a harder question: probably not, and all models are approximations anyhow, but some situations are more affected than others. The answer depends on how important the diversity introduced by recapitate is to the results. Here are some examples:

Example: *I ran a SLiM simulation for 5N generations, and then recapitated. How would this affect me?* Happily, probably not much. Recapitation is providing any ancestral diversity present at the time; and most of the genome will have coalesced by 5N generations ago. So, the bug will have resulted in less genetic diversity only on the small pieces of genome that had not coalesced by that point. How much less? Skipping the details, something like 50% less diversity on the 8% of the genome that hasn't coalesced; so, reducing genetic diversity at the end by 4%. Of course, if your method is specifically looking for short haplotypes from a long time ago, there is more cause for concern.
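
The back-of-the-envelope arithmetic in this example can be written out as below. The two input numbers are the rough figures quoted above ("something like 50%" and "8%"), not derived values, and the variable names are only for illustration.

```python
# Rough figures quoted in the example above (approximations, not derived here).
fraction_uncoalesced = 0.08   # share of the genome not yet coalesced 5N generations ago
diversity_loss_there = 0.50   # relative loss of diversity on that share due to the bug

# Only the uncoalesced share is affected, so the overall loss is the product.
overall_loss = fraction_uncoalesced * diversity_loss_there

print(f"overall reduction in diversity: {overall_loss:.0%}")
```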

Example: *I ran a large spatial simulation for 100 generations, and then recapitated. How would this affect me?* Here I'd be more worried: the strong bottleneck 100 generations ago would reduce diversity by something like 50%, *and* specifically in very long haplotypes. As a result, the simulation would resemble a recently established population from a single migrant.
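
For anyone redoing recapitation from a saved pre-recapitation tree sequence, as suggested above, a minimal sketch might look like the following. It assumes an upgraded pyslim and tskit are installed; the function name `redo_recapitation`, the file paths, and the parameter values in the usage comment are placeholders, not recommendations.

```python
def redo_recapitation(in_path, out_path, ancestral_Ne, recombination_rate, seed=None):
    """Load a pre-recapitation tree sequence, recapitate it, and save the result.

    Imports are done inside the function so that defining it does not
    require pyslim or tskit to be installed.
    """
    import pyslim
    import tskit

    ts = tskit.load(in_path)
    # Extra keyword arguments to pyslim.recapitate are passed on to
    # msprime.sim_ancestry, which accepts recombination_rate and random_seed.
    recap = pyslim.recapitate(
        ts,
        ancestral_Ne=ancestral_Ne,
        recombination_rate=recombination_rate,
        random_seed=seed,
    )
    recap.dump(out_path)
    return recap


# Example call (all values are placeholders):
#   redo_recapitation("before_recap.trees", "after_recap.trees",
#                     ancestral_Ne=1000, recombination_rate=1e-8, seed=1)
```

Any downstream steps (adding mutations, computing statistics) would then need to be rerun on the new output.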

Please reply to this thread or to me separately if you have any further questions. And, please let others who might be affected by this know.

Sincerely,
 Peter Ralph

Peter Ralph

Jun 22, 2023, 12:44:44 AM
to slim-discuss
An important postscript: my first fix of the bug omitted the actual bugfix. :double facepalm: So, I've made another release (which also includes some better unit tests!).

You should make sure you update to:
  pyslim>=1.0.3
and NOT 1.0.2 as I said in the previous email.

It's available on pip now, and will be on conda-forge soon.
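
One way to confirm that an environment has the fixed release is to compare the installed version against 1.0.3. The snippet below is a small sketch using only the standard library; the helper names `parse_release` and `has_recapitate_fix` are hypothetical, and the parsing is deliberately simple (it ignores pre-release suffixes, which a dedicated version parser would handle more carefully).

```python
import re
from importlib import metadata

# First release that contains the actual bugfix, per the message above.
MIN_FIXED = (1, 0, 3)


def parse_release(version):
    """Turn a version string such as '1.0.3' into a tuple of integers."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def has_recapitate_fix(version):
    """Return True if `version` is at least 1.0.3."""
    return parse_release(version) >= MIN_FIXED


try:
    installed = metadata.version("pyslim")
    print(installed, "has the fix:", has_recapitate_fix(installed))
except metadata.PackageNotFoundError:
    print("pyslim is not installed in this environment")
```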

Again, apologies, and let me know if I can be of assistance.
  Peter.

From: Peter Ralph
Sent: Tuesday, June 20, 2023 4:43 PM
To: slim-discuss <slim-d...@googlegroups.com>
Subject: bug in pyslim.recapitate
 