Gene trees in unit of generations

40 views
Skip to first unread message

siavash mirarab

unread,
Oct 24, 2019, 1:49:07 AM10/24/19
to Diego Mallo, SimPhy, Uyen Mai
Hi All,

I could not find a function to output gene trees (not locus trees) in the unit of the number of generations. Gene trees, as far as I can see, are only outputted in the unit of substitution units. Turning on the most verbose logging (-v6) did not help either.

Is it possible to output ultrametric gene trees in the unit of the number of substitutions?

Thanks
--
Siavash Mirarab


siavash mirarab

unread,
Nov 13, 2019, 6:54:25 PM11/13/19
to Diego Mallo, SimPhy, Uyen Mai
OK, I hacked together a solution. I built a version of Simphy that outputs ultrametric trees. Here is what you could do.

1. Run this version with the link below with your parameters of interest. Make sure to give it a seed number.

2. Run the normal Simphy tool with the same exact command (making sure to provide the same seed) on the same exact directory. This will overwrite all the wrong things generated in step 1 and will only keep what is needed.

You will have all the correct gene trees but you will also have a file called ultrametric-genetrees.tre with ultrametric locus trees.

Something like `sed -i -e "s/_0_0//g" */ultrametric-genetrees.tre` will help you makes gene tree names consistent between locus trees and gene trees.


Code changes I made:

diff --git a/src/main.c b/src/main.c
index 4323e1b..8c2aa92 100644
--- a/src/main.c
+++ b/src/main.c
@@ -242,7 +242,7 @@ int main (int argc, char **argv)
     /// <dl><dt>I/O variables</dt><dd></dd></dl>
     char g_prefix[8]="g_trees", command_sufix[9]=".command", map_sufix2[8]="g.map", mapsl_sufix2[9]=".mapsl", maplg_sufix2[10]="g.maplg", db_sufix[4]=".db", params_sufix[8]=".params", tree_sufix[7]=".trees", weirdg_sufix[8]=".ralpha", s_outname[13]="s_tree.trees",l_outname[14]="l_trees.trees",d_outname[27]="bounded_locus_subtrees.out", *g_outname=NULL, stat_outname[10]="stats.txt",weirds_outname[14]="s_tree.ralpha",confile_sufix[6]=".conf",c=0;
     char  *map_outname=NULL, *mapsl_outname=NULL, *maplg_outname=NULL, *db_outname=NULL, *params_outname=NULL, *command_outname=NULL, *curr_outdir=NULL,  *weirdg_outname=NULL, *stree_iname=NULL, *ltree_iname=NULL, *confile_outname=NULL;
-    FILE *stree_ifile=NULL, *ltree_ifile=NULL, *s_outfile=NULL,*l_outfile=NULL,*d_outfile=NULL,*g_outfile=NULL,*stat_outfile=NULL, *params_outfile=NULL, *command_outfile=NULL, *weirds_outfile=NULL, *weirdg_outfile=NULL, *confile_infile=NULL, *confile_outfile=NULL;
+    FILE *stree_ifile=NULL, *ltree_ifile=NULL, *s_outfile=NULL,*l_outfile=NULL,*d_outfile=NULL,*gult_outfile=NULL, *g_outfile=NULL,*stat_outfile=NULL, *params_outfile=NULL, *command_outfile=NULL, *weirds_outfile=NULL, *weirdg_outfile=NULL, *confile_infile=NULL, *confile_outfile=NULL;
     sqlite3 *database;
     int n_sdigits=0, n_ldigits=0, n_gdigits=0, n_istrees=0, n_iltrees=0, error=0;

@@ -881,6 +881,12 @@ int main (int argc, char **argv)
         // ********
         /// <dl><dt> Main loop for each locus tree </dt><dd>
         prev_ltree_eq_stree=0;
+
+       if ((gult_outfile=fopen("ultrametric-genetrees.tre", "w"))==NULL)
+        {
+                perror("Error opening g_tree file: ");
+                ErrorReporter(IO_ERROR,NULL);
+        }

         for (curr_ltree=1; curr_ltree<=get_sampling(nl_trees);++curr_ltree)
         {
@@ -1211,6 +1217,8 @@ int main (int argc, char **argv)
                 else
                     ErrorReporter(SimMLCGTree(locus_tree,&gene_tree,names,epsilon_brent,r,&n_extralin,&max_extralin,verbosity,get_sampling(gen_time),map>0||out_labels==1?1:0),": simulating a gene tree");

+                 printf("\n\t\tUltrametric-GT: ");
+                WriteGTreeFile(gult_outfile,gene_tree,names,out_labels);
                 // ****
                 /// <dl><dt>Gene tree bl modifications</dt><dd>

@@ -1380,6 +1388,7 @@ int main (int argc, char **argv)

 #endif
         }
+        fclose(gult_outfile);

 #ifdef SORTHOLOGS
         //Sorthologs
diff --git a/src/trees.c b/src/trees.c
index 80c2e22..f29f169 100644
--- a/src/trees.c
+++ b/src/trees.c
@@ -5290,7 +5290,7 @@ long int SimMSCGTree(l_tree *wlocus_tree, g_tree **gene_tree, name_c * names, fl
         else
             p_Ne=wlocus_tree->Ne; //Global Ne.

-        p_mu=wlocus_tree->mu*w_lnode->mu_mult;
+        p_mu=1; //wlocus_tree->mu*w_lnode->mu_mult;

         // ** Constraining coalescent time and number of events ** //
         current_ngen=w_lnode->n_gen;


Thanks
Siavash
--
Siavash Mirarab


Diego M.

unread,
Nov 13, 2019, 9:47:32 PM11/13/19
to SimPhy
Dear Siavash,

It seems I am late for just a couple hours, but I have just pushed a new git branch on SimPhy's repository https://github.com/adamallo/SimPhy/tree/gtreengen that adds the functionality you guys were looking for. Now, you can choose to output gene trees with their branch lengths measured in number of generations using the -OG 1 argument. I will eventually merge this branch with the master (after documentation of the new option, etc.).

Best,

       Diego M.

siavash mirarab

unread,
Nov 13, 2019, 11:08:00 PM11/13/19
to Diego M., SimPhy
Thanks, Diego. 

--
You received this message because you are subscribed to the Google Groups "SimPhy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to simphy+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/simphy/11a9da44-3a22-468e-afda-4a219c207497%40googlegroups.com.
--
Sent from a phone. Excuse typos and the brevity.
Reply all
Reply to author
Forward
0 new messages