Modeling migration between specific nodes & RAxML distances.

Nicholas Crawford

Apr 6, 2016, 4:03:07 PM
to SimPhy
Hello, 

I have two questions:

I'm interested in modeling migration between specific species within my species tree. To do this, I think I need to first generate a collection of locus trees (e.g., with SimPhy), annotate the nodes where I want migration to occur, and then run SimPhy a second time with the locus trees as input. Is there an easier way? I can definitely write some tree annotation code in dendropy, but I thought I'd ask here first. 

Will a gene tree generated with SimPhy be comparable to a tree created from sequence data with RAxML if I include the mutation rate, generation time, and population sizes? I'm specifically wondering about the branch length units. 

Thanks for your help.

Nick

Nicholas Crawford

Apr 11, 2016, 8:43:25 AM
to SimPhy
So I've been thinking a bit more about this, and I'm guessing the best solution for my second question is to use the INDELible script to create alignments. That way I can also model missing data and alignment size. I'd still love some input on my first question. 

Best,
Nick

PS - I might also add that a talk by Keith Crandall, when I was a master's student, was a big factor in convincing me to pursue computational biology. He explained that he had this PhD student named David Posada who was taking CompSci classes and that he had a brilliant future ahead of him. 

Diego M.

Apr 11, 2016, 1:08:40 PM
to SimPhy
Dear Nick,

Migration is not considered in SimPhy's model, and if it were, it should be included at the gene tree level (in the coalescent process). Therefore, I don't see an easy shortcut by running SimPhy two times. Could you elaborate further on your idea?
 
If you are interested in simulating gene trees within species trees with ILS and migration only (without GDL and HGT), you could use other simulation software, like CoalEvol (https://github.com/MiguelArenas/coalevol) or GUMS (https://www.cs.auckland.ac.nz/~yhel002/biopy/#). I don't know of any piece of software that implements a model with GDL or HGT, ILS and migration. I would recommend against using SimPhy's locus trees as input species trees for other coalescent simulators, since duplications and transfers generate bounded subtrees that those models can't handle.

SimPhy simulates gene trees with branch lengths specified in expected number of substitutions per site. Therefore, they are comparable with those estimated by RAxML.
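
As a rough illustration of what those units mean, the sketch below converts a branch duration into expected substitutions per site. It assumes that branch lengths in the -L tree are in the same time units as the generation time (-sg) and that the substitution rate (-su) is per site per generation. The function name is made up for this example, and the numbers are the ones from the parameter file posted later in this thread. It ignores coalescent waiting times and any rate heterogeneity, so actual gene tree branches will differ.

```python
def time_to_substitutions(branch_time, generation_time, subs_per_site_per_generation):
    """Convert a branch duration into expected substitutions per site.

    Assumption (not confirmed by the SimPhy docs quoted here): branch_time and
    generation_time share the same time unit, and the substitution rate is
    expressed per site per generation.
    """
    generations = branch_time / generation_time
    return generations * subs_per_site_per_generation


# Values taken from the parameter file in this thread:
# a terminal branch of 1000000 time units, -sg f:0.25, -su f:0.0000000029
length = time_to_substitutions(1_000_000, 0.25, 2.9e-9)
print(f"{length:.4f} expected substitutions per site")  # about 0.0116
```

A value on this scale can be compared directly with RAxML branch lengths, which are also reported in expected substitutions per site.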

I hope this helps,

           Diego M.

Diego M.

Apr 11, 2016, 1:11:21 PM
to SimPhy


On Monday, April 11, 2016 at 5:43:25 AM UTC-7, Nicholas Crawford wrote:
So I've been thinking a bit more about this, and I'm guessing the best solution for my second question is to use the INDELible script to create alignments. That way I can also model missing data and alignment size. I'd still love some input on my first question. 

I do agree. Using INDELible (or any other sequence simulator) is a good idea to model those interesting parameters. 

Nicholas Crawford

Apr 13, 2016, 11:59:12 AM
to SimPhy
Hi Diego,

My question wasn't worded very well. I was using migration and horizontal gene transfer interchangeably. So, to rephrase: how would I model the effects of HGT between specific species on my species tree? I think I just need to annotate the locus trees with the appropriate node kind parameters and set the horizontal gene transfer rate with the -LT flag. 

So my idea is to run SimPhy once without HGT, then annotate the locus trees with the appropriate donor / acceptor info and rerun SimPhy. This should produce two collections of trees: one with ILS only and one with ILS and HGT. 

Does that make more sense?

Nick

Nicholas Crawford

Apr 20, 2016, 12:08:21 PM
to SimPhy
Hi Diego,

I've made some progress with SimPhy. It looks like I figured out how to assign a single locus topology and generate a whole bunch of species trees. Yay! However, when I assign a donor species with the %4 branch parameter, it's then missing from the gene trees. It's almost like an extinction event occurred. I would have expected to see some gene trees that contained a clade with a mix of B/C labels and a separate clade with only C. Essentially, genes from C transferring into B. Any idea what's going on?

- Nick

My parameter file looks like this:

-rs 10 //Number of replicates
-rl f:1 //loci per replicate
-L (A:4000000.0/10#100000,(B:1000000/9%3#100000,C:1000000/10%4#100000):3000000.0,(D:3000000/16#100000,(E:2000000/4#100000,(F:1000000/4#100000,G:1000000/8#100000):1000000.0):1000000.0):1000000.0);
-sg f:0.25 //Generation time
-su f:0.0000000029 //Substitution rate
-sp f:2000000 //Population size
-cs 123 //Seed for the random number generator, in order to make the experiment repeatable.

Diego M.

Apr 20, 2016, 12:39:06 PM
to SimPhy
Hi Nick,

I would like to tell you that you are right and SimPhy allows you to simulate HGT between specific species, but it would not be true. HGT events happen at the locus tree level, and therefore are simulated during the locus tree simulation. Modifying the node kind a posteriori would only allow you to simulate coalescent bounds on those nodes, but this is not what you are trying to do.
The problem here is that you are thinking about migration and simulating HGT, and they are not equivalent in our model. In SimPhy, HGT is simulated as a copy of a locus into a different, contemporary species, with replacement and fixation. 
Consider the species tree ((A,B),(C,D)); and suppose that we are simulating an HGT event from B to C. The resulting locus tree would be ((A,(B,C)),D). The coalescent process is then simulated along those branches, with a bounded coalescent at the (B,C) node. Therefore, C leaves would always be clustered together, since only one individual received the transfer in the original branch C.
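
To make the topology change concrete, here is a minimal sketch of that replacement transfer as a prune-and-regraft on a tree written as nested tuples. It is only an illustration and not SimPhy's code: the function names are hypothetical, and it ignores branch lengths, the timing of the event, and any node tags.

```python
def prune(tree, leaf):
    """Return the tree without `leaf`, collapsing nodes left with one child."""
    if isinstance(tree, str):
        return None if tree == leaf else tree
    kept = [c for c in (prune(child, leaf) for child in tree) if c is not None]
    if not kept:
        return None
    return kept[0] if len(kept) == 1 else tuple(kept)


def regraft(tree, donor, receptor):
    """Attach `receptor` as the sister of `donor`."""
    if isinstance(tree, str):
        return (donor, receptor) if tree == donor else tree
    return tuple(regraft(child, donor, receptor) for child in tree)


def hgt_with_replacement(tree, donor, receptor):
    """Topology after the receptor lineage is replaced by a copy from the donor."""
    return regraft(prune(tree, receptor), donor, receptor)


def to_newick(tree):
    """Serialize nested tuples as a Newick topology (no branch lengths)."""
    if isinstance(tree, str):
        return tree
    return "(" + ",".join(to_newick(child) for child in tree) + ")"


species_tree = (("A", "B"), ("C", "D"))
locus_tree = hgt_with_replacement(species_tree, donor="B", receptor="C")
print(to_newick(locus_tree) + ";")  # ((A,(B,C)),D);
```

The original C lineage is removed and C reappears as the sister of B, which is why the C leaves end up clustered together below the (B,C) node.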

If you are still interested in simulating HGT between specific species, I would say we have two options:
a) Simulate locus trees with SimPhy (if you are interested in GDL) and generate HGT events between specific lineages by applying SPRs with another piece of software. Then tag the HGT nodes with the corresponding node kind and feed the trees back to SimPhy to simulate the gene trees.
b) Modify SimPhy to implement HGT events between specific lineages. It should be pretty easy and I could guide you.

I hope this helps,

                  Diego M.

Diego M.

Apr 20, 2016, 12:42:52 PM
to SimPhy
This is related to my previous answer (I'm sorry for the delay).

The tag %4 indicates the HGT receptor. The receptor branch disappears, since it does not lead to any leaf in the present.

My previous answer is more relevant, this is just a minor detail.

Best,
            Diego M.