> 2. Based on pyslim's documentation, it looks like the node indices of the ith individual should be (2*i, 2*i + 1). Please tell me if this isn't correct. I confirmed this by sampling a few individuals and checking using ts.individual.nodes.
This is correct as long as you aren't doing certain things in SLiM,
but it's not guaranteed, so it's good practice not to rely on it.
Instead, get the nodes by (a) pulling out the individuals, and (b)
getting their nodes. For instance (not tested):
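import numpy as np
# choose individual IDs at random, without replacement
ix = np.random.choice(slim_ts.num_individuals, size=args.nsamples, replace=False)
# look up each individual's nodes rather than assuming (2*i, 2*i + 1)
ix_nodes = [slim_ts.individual(i).nodes for i in ix]
longroh = [callroh(*n) for n in ix_nodes]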
> 3. For each individual, I subset the tree, selecting only their haplotypes (nodes) and then selected trees which span a region of > 3Mb (I'm interested only in long runs).
>
> 4. Output the length of these trees.
Hm: I don't see anything wrong with your code. Conceptually this is a
bit different than the output of germline because germline is looking
for long runs of IBS; so they'll differ in that (a) there will be some
IBS that extends outside of each of the long segments of recent shared
ancestry you have found, and (b) in principle, you might have a long
segment of shared ancestry that happens to have a bunch of mutations
on it anyhow (e.g., because it's older than usual for that length). I
don't know if these are enough to produce the differences you are
seeing.
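To make (a) and (b) concrete, here is a toy sketch in plain Python (no
tskit). It assumes you already have one segment of shared ancestry as
(left, right) and a list of positions where the two haplotypes differ.
The function name ibs_run_around and all the numbers are made up for
illustration; this is not how germline works internally.

```python
import bisect

def ibs_run_around(segment, diff_sites, seq_length):
    """Approximate bounds of the IBS run containing `segment`.

    `diff_sites` are positions where the two haplotypes differ.
    Returns None if a differing site falls inside the segment,
    i.e. the segment is not IBS along its whole length (case b).
    """
    left, right = segment
    sites = sorted(diff_sites)
    # index of the first differing site at or after `left`
    i = bisect.bisect_left(sites, left)
    if i < len(sites) and sites[i] < right:
        return None
    # the run extends out to the nearest flanking differing sites,
    # or to the ends of the sequence if there are none (case a)
    run_left = sites[i - 1] if i > 0 else 0
    run_right = sites[i] if i < len(sites) else seq_length
    return (run_left, run_right)

# hypothetical 4 Mb segment of shared ancestry on a 10 Mb chromosome
segment = (4_000_000, 8_000_000)
extended = ibs_run_around(segment, [1_000_000, 3_500_000, 9_200_000], 10_000_000)
broken = ibs_run_around(segment, [1_000_000, 5_000_000, 9_200_000], 10_000_000)
```

In the first call the IBS run is longer than the segment, since it
runs out to the nearest differing sites on either side; in the second,
a differing site inside the segment means an IBS-based caller would
not report it as one long run.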
You may also want to look at Georgia Tsambos' methods for doing this
sort of thing:
https://tskit.readthedocs.io/en/latest/python-api.html#tskit.TableCollection.link_ancestors
Also, if you've more questions here, perhaps we could move it to a discussion:
https://github.com/tskit-dev/msprime/discussions
--peter