Hi Jake,
Basically, this is done as follows:
#! /usr/bin/env python
import dendropy
from dendropy import treesim
# read the species tree
sp_tree = dendropy.Tree.get_from_string("[&R] (((A:2, B:2):1,(C:1,
D:1):2):1,E:4);",
"newick")
# set the number of individuals sampled from each species
for leaf in sp_tree.leaf_iter():
leaf.num_genes = 5 # can be any number you want
# if your branch lengths are NOT in coalescent units
# and are given in, e.g. generations, you will need
# to set the population sizes of the edges
#for edge in sp_tree.postorder_edge_iter():
# edge.pop_size = 1.0 # 1.0 => branch lengths in coalescent
units
# Simulate a gene tree within the species tree.
# `gene_tree` will be the constrained/censored/truncated gene tree
# `mapped_sp_tree` is a *clone* of the original input species
tree,
# but with the gene tree nodes as attributes of its
# nodes, so you can see where on the species tree they coalesce
etc.
# You can ignore it if you are not using this information.
gene_tree, mapped_sp_tree = treesim.constrained_kingman(sp_tree)
# show it!
print(gene_tree.as_string("newick"))
print(gene_tree.as_ascii_plot())
You do not mention what units your branch lengths are in. If
unspecified, then the population is taken as "1.0", which means that a
branch length of "1.0" corresponds to 1 (haploid) coalescent unit. If
this is not the case, you will have to either set the population size
on each edge correctly, as shown in the commented-out code above, or
adjust your branch lengths.
There are lots of other options, for using a particular set of taxa or
taxon labels for the gene trees, etc. etc. Let me know if you want to
do some of this.