pyslim - calculating mean tree height for subset of individuals

Greg Torda

unread,

Jul 8, 2021, 3:22:40 AM7/8/21

to slim-discuss

Hi list,

I was wondering if it was possible to calculate certain tree stats on just a subset of individuals. Specifically, I am trying to calculate the mean tree height for individuals in the last generation with a certain phenotype. Phenotype info is stored in a separate list, not the .trees file, with pedigree id serving as key. So basically I am hoping to create a new .trees file by filtering the original one down to the individuals that appear in the reference list.

Code to calculate mean tree heights from entire tree file:

    ts = pyslim.load(file).simplify()

    hfp = []
    for tree in ts.trees():
        mean_height = np.mean([tree.time(root) for root in tree.roots])
        left, right = map(int, tree.interval)
        height_for_pos[left: right] = mean_height
        hfp.append(mean_height)

    mh = np.mean(height_for_pos)

Many thanks in advance for any pointers. (Also feel free to let me know if the above code is totally wrong :) Python is not my first language...)

Cheers,

Greg

Peter Ralph

unread,

Jul 8, 2021, 5:41:08 PM7/8/21

to Greg Torda, slim-discuss

Hi, Greg! This is a job for the "samples" argument to simplify: see
the example here:
https://pyslim.readthedocs.io/en/latest/tutorial.html#simplification
Also see the next example, that discusses extracting individuals.

Your python code looks fine, but note that it's taking the mean height
of trees *not* weighted by how far along the genome they extend for.

-- peter

> --
> SLiM forward genetic simulation: http://messerlab.org/slim/
> ---
> You received this message because you are subscribed to the Google Groups "slim-discuss" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to slim-discuss...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/slim-discuss/8a929772-695e-4ba7-9f1d-3b0ed7d3e69cn%40googlegroups.com.

Greg Torda

unread,

Jul 10, 2021, 1:21:39 AM7/10/21

to Peter Ralph, slim-discuss

This is it, thanks heaps for the prompt reply, Peter!

Here is what I did (in case anyone was interested):
import numpy as np
import msprime, pyslim
ts = pyslim.load(file).simplify()

# extra simplification step based on a list of individuals
keeper = np.genfromtxt(csvfile, delimiter=',')
# these are pedigree_ids which I need to convert to ids
keeper = keeper.astype(int).tolist()

keep_indivs = []
for i in ts.individuals():
pid = i.metadata["pedigree_id"]
if pid in keeper:
keep_indivs.append(i.id)

keep_nodes = []
for i in keep_indivs:
keep_nodes.extend(ts.individual(i).nodes)
sts = ts.simplify(keep_nodes)

height_for_pos = np.zeros(int(sts.sequence_length))
hfp = []
for tree in sts.trees():

mean_height = np.mean([tree.time(root) for root in tree.roots])
left, right = map(int, tree.interval)
height_for_pos[left: right] = mean_height
hfp.append(mean_height)

mh = np.mean(height_for_pos)

In my slim simulation all basepairs are unlinked and so I calculate a tree for every single one and don't need to weight by span on genome (if I understood your comment right, Peter).

Many thanks again, Peter!
Cheers,
Greg

Reply all

Reply to author

Forward