Transfers to the past

Alejandro Gil-Gómez

Jan 31, 2024, 4:34:15 PM
to GeneRax
Hello,
I am using ParGenes and GeneRax with a dated, rooted species tree to obtain the age of the MRCA node for different orthogroups. I am running into two main issues with this approach. First, I think GeneRax assumes that the model is undated, meaning the reconciliation only minimizes the number of events based on the tree topology, without taking the ages of the lineages into account. Second, in some cases, such as the one in the attached screenshot, a transfer goes back into a node that is the parent of the root of the gene tree, possibly indicating a problem with the rooting.
I am using the following commands:
pargenes.py -m -a 2.1.small_filtered -o 3.0.small_pargenes -c 96 -d aa

python3 GeneRax/scripts/pargenes_to_families_file.py 3.0.small_pargenes  family_file

generax --families family_file \
 --species-tree $treefile \
 --rec-model UndatedDTL \
 --prefix 4.0.generax_small \
 --max-spr-radius 5 \
 --strategy SPR

The species tree was fully dated, so I assume my problem is with the ParGenes/GeneRax steps. Maybe there is something I am missing.
Kindly advise,
Alejandro.
Screenshot 2024-01-31 160659.png

Alejandro Gil-Gómez

Feb 2, 2024, 12:06:45 AM
to GeneRax
I think one of the issues I am having probably comes from running ParGenes with default parameters, that is, with only one starting tree and no bootstrap replicates.
I am repeating these steps with 20 starting parsimony trees and 100 bootstrap replicates, as in the supplementary material of the ParGenes paper:
pargenes-hpc.py -m -a 2.1.filtered -o 2.3.pargenes -c $SLURM_NTASKS -d aa -p 20 -b 100 --autoMRE
I have been running on a smaller set of 100 genes (I have 10,000 gene families in total) on a single node with 96 cores, and it has taken almost 6 hours with -p 4 -b 100. Most of that time has been spent computing the bootstraps. How would the runtime scale to the full dataset? Should I start with other parameters?
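As a rough back-of-the-envelope estimate, the pilot numbers above can be extrapolated to the full dataset. This is only a sketch: it assumes that wall time grows linearly with the number of gene families on a fixed core count, which may not hold in practice (families differ in size, and --autoMRE can stop the bootstraps early). The function name is hypothetical.

```python
# Rough linear extrapolation of the ParGenes pilot run.
# Assumption (not a property of ParGenes): wall time grows linearly
# with the number of gene families on a fixed number of cores.

def extrapolate_hours(pilot_families: int, pilot_hours: float,
                      total_families: int) -> float:
    """Scale the pilot wall time by the ratio of family counts."""
    return pilot_hours * total_families / pilot_families


pilot_families = 100      # families in the pilot run
pilot_hours = 6.0         # "almost 6 hours" on one 96-core node
total_families = 10_000   # families in the full dataset

total_hours = extrapolate_hours(pilot_families, pilot_hours, total_families)
print(f"~{total_hours:.0f} node-hours, or ~{total_hours / 24:.0f} days on one node")
```

Under that assumption the full run would need on the order of 600 node-hours, so fewer bootstrap replicates or more nodes may be worth considering.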

I assume the next steps should be to get the optimal tree and then prepare the families file for GeneRax. Do the commands below look good? I am not sure whether the output of the export.py script would work well with the pargenes_to_families_file.py script.
Thank you in advance for your guidance.
# Export pargenes output:
#python3 /gpfs/projects/RestGroup/agilgomez/tools/ParGenes/pargenes/pargenes_src/export.py -i 2.3.pargenes -o 2.4.best_pargenes --best-ml-tree --best-ml-model --bootstrap-trees --support-values-tree

# Get pargenes output ready for generax
#python3 /gpfs/projects/RestGroup/agilgomez/tools/GeneRax/scripts/pargenes_to_families_file.py 2.3.pargenes  pg_family_file

Benoit Morel

Feb 2, 2024, 9:55:47 AM
to Alejandro Gil-Gómez, GeneRax
Hi Alejandro,
Sorry for the late reply, I've been sick the whole week and I haven't really recovered yet.
The ParGenes step should not affect the subsequent steps that much, because it only computes starting gene trees for the searches.
GeneRax is not able to take the dates in the tree into account: it does not forbid transfers to the past. However, it should at least forbid transfers to the parents, but those seem to happen in your case... I'll check that next week.
An alternative to GeneRax would be AleRax (the manuscript is under review, and the preprint is here: https://www.biorxiv.org/content/10.1101/2023.10.06.561091v2). It works a bit differently, but the underlying models are the same, with the difference that we support time constraints on highways.
Best,
Benoit


Alejandro Gil-Gómez

Feb 5, 2024, 2:37:19 PM
to Benoit Morel, GeneRax
Thank you for the information! I will try AleRax with the following flag, which should fix my issue:
--transfer-constraint RELDATED
Best, Alex

Alejandro Gil-Gómez

Feb 7, 2024, 11:18:36 AM
to GeneRax
Hello Benoit,
I ran AleRax with the RELDATED constraint, but I saw some unexpected output. Some transfers happen between parent nodes and their children within the same clade. Is that expected behavior? I attached the result I got as an SVG.
My guess as to why this is happening is that both nodes have wide confidence intervals that overlap in time, even if we know that the ancestral lineage should not be contemporary with the modern lineage. I am not sure whether using PARENT would fix this issue, or would simply forbid transfers in the opposite direction.
Another question: I need an NHX file to load into R in order to analyze the gene tree branch lengths along specific species tree branches. GeneRax produced this output, but I don't see an equivalent in AleRax. There are gene trees with node information for each of the 100 reconciliations in the ucl file, but in the summary the only things I find are a Newick gene tree with no node labels and summary tables with the events per node. Would it be possible to have a consensus tree that includes the per-node information? I am not sure whether making a consensus recphyloXML and then converting it into NHX would produce such a tree, or whether this even makes sense.
I guess that I could take the output consensus gene tree and run it through GeneRax without gene tree optimization to get the NHX file.
Thank you in advance for your guidance,
Best, Alejandro.
OG0000117_fa_0.svg

Benoit Morel

Feb 8, 2024, 6:49:58 AM
to Alejandro Gil-Gómez, GeneRax
Hi Alejandro,
The model does not forbid transfers from a parent P to a child C, because it could be the result of a transfer from an extinct lineage S such that P is the last common ancestor of C and S. Of course this is not a real transfer from P to C, but there is no better way to reconcile the gene tree with the species tree since S is not observed.
Now I'm not saying that this is what is really happening in your case: transfers are unfortunately really tricky to estimate accurately and with high confidence...
I hope this makes sense
Benoit
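
To make the distinction above concrete, here is a minimal sketch that classifies a transfer by the ancestry relationship between donor and recipient in the species tree. The toy tree, its labels (P, C, S, O) and the function names are hypothetical and are not taken from GeneRax or AleRax. A transfer from P to C falls in the "ancestor-to-descendant" class, which can be read as a transfer from the unobserved lineage S, whereas a transfer from C to P would be a transfer to the past.

```python
from typing import Dict, Optional

# Toy species tree stored as a child -> parent map (hypothetical labels).
# P is the last common ancestor of C and of the unobserved lineage S.
PARENT: Dict[str, Optional[str]] = {
    "root": None,
    "P": "root",
    "O": "root",
    "C": "P",
    "S": "P",
}


def is_ancestor(anc: str, node: str,
                parent: Dict[str, Optional[str]] = PARENT) -> bool:
    """Return True if `anc` is a strict ancestor of `node`."""
    cur = parent[node]
    while cur is not None:
        if cur == anc:
            return True
        cur = parent[cur]
    return False


def classify_transfer(donor: str, recipient: str,
                      parent: Dict[str, Optional[str]] = PARENT) -> str:
    """Classify a transfer by the donor/recipient ancestry relationship."""
    if donor == recipient:
        return "self"
    if is_ancestor(donor, recipient, parent):
        # Can be explained by a transfer from an extinct sister lineage.
        return "ancestor-to-descendant"
    if is_ancestor(recipient, donor, parent):
        # The recipient predates the donor: a transfer to the past.
        return "descendant-to-ancestor"
    return "between-unrelated-branches"


print(classify_transfer("P", "C"))
print(classify_transfer("C", "P"))
```

Only the second class is impossible in time under any dating of the tree, which is why it is the one a parent-based constraint is meant to exclude.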

Alejandro Gil-Gómez

Feb 8, 2024, 1:07:17 PM
to Benoit Morel, GeneRax
Thank you very much for your reply. That makes sense.
I think I was also able to find the file with everything that I needed: the rec_uml file has all the node information for the amalgamated tree that I was looking for. Is it possible to get a single reconciliation recphyloXML file from this final summary rec_uml tree? I tried using thirdkind to combine multiple outputs into one visualization, but it crashes if there are too many trees. The image below shows only 20 of the 100 reconciliations for a single gene family.
image.png
Incredible work, and congratulations on the program.
Best, Alejandro.


Alejandro Gil-Gómez

Feb 9, 2024, 4:38:07 AM
to GeneRax
One final question: the tips in the reconciliation files are labeled NULL. Is that expected?
In GeneRax the internal nodes were NULL, but the tips had the labels of the corresponding proteins.
Thanks again.
Best, Alex.
Screenshot 2024-02-08 165005.png

Benoit Morel

Feb 9, 2024, 5:33:22 AM
to Alejandro Gil-Gómez, GeneRax
Hi Alejandro,
Regarding null labels: you're right, I was not exporting them! Now it's fixed (you can run ./gitpull.sh and ./install.sh in the git repository to get the update). Thanks a lot for noticing.
Regarding Thirdkind and visualizing many reconciliations... I'm not sure how I would do it. If it crashes, I would encourage you to contact the developer (Simon Penel); he is very nice and responsive.
Best,
Benoit

Alejandro Gil-Gómez

Feb 9, 2024, 1:36:18 PM
to Benoit Morel, GeneRax
Thank you so much for all the help!


Stepan Puhov

May 25, 2024, 10:39:28 AM
to GeneRax
Hello Benoit,

I'd like to revive this thread, since the original issue raised by Alejandro seems to have remained unresolved.

As far as I understand from the source code, GeneRax runs by default with an implicit --transfer-constraint PARENTS option, so no HGTs from child to parent nodes are expected. However, in the picture from the first message in this thread one can clearly see such a forbidden transfer.

It seems to be a somewhat serious bug undermining the credibility of GeneRax reconciliation results. Moreover, since the two tools use similar reconciliation approaches, it may also be present in AleRax (though still unreported). It would be sad to have such powerful tools suffer from such weaknesses.

If you find some time, could you please try to fix this?


Best regards,
Stepan

Benoit Morel

May 27, 2024, 5:56:44 PM
to Stepan Puhov, GeneRax
Hi Stepan,
I'm not sure I'll find the time, but I'll try.
I don't think this happens with AleRax. The reconciliation code is different (although similar), and it seemed to work correctly when I tested it. I never found the time to get back to GeneRax to check the transfer constraints.

Stepan Puhov

May 30, 2024, 8:51:05 AM
to GeneRax
Hi Benoit,
Thank you for your commitment to help!

I've done a little digging myself.
As far as I can see, the transfer constraints are used most directly in the UndatedDTLModel.hpp file. There I found two suspicious points that might be relevant:
1) In the recomputeSpeciesProbabilities function, the same code is used for both the NONE and PARENTS constraints, with a comment stating that this is not the right thing to do (see line 223).
2) In the getBestTransferLoss function, the constraints are not used at all, so only self-transfers are forbidden (see line 663).
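
To illustrate why the second point could matter, here is a minimal sketch of choosing the best recipient branch for a transfer, with and without excluding the donor's ancestors. This is not the GeneRax code: the function names, the toy tree and the scores are hypothetical, and it assumes that a PARENTS-style constraint means "the recipient may not be an ancestor of the donor".

```python
from typing import Dict, Optional, Set


def ancestors(node: str, parent: Dict[str, Optional[str]]) -> Set[str]:
    """Collect all strict ancestors of `node` in a child -> parent map."""
    found = set()
    cur = parent[node]
    while cur is not None:
        found.add(cur)
        cur = parent[cur]
    return found


def best_recipient(donor: str, scores: Dict[str, float],
                   parent: Dict[str, Optional[str]],
                   forbid_ancestors: bool) -> str:
    """Pick the highest-scoring recipient branch for a transfer from `donor`.

    Self-transfers are always excluded; ancestors of the donor are
    excluded only when `forbid_ancestors` is True.
    """
    forbidden = {donor}
    if forbid_ancestors:
        forbidden |= ancestors(donor, parent)
    candidates = {s: p for s, p in scores.items() if s not in forbidden}
    return max(candidates, key=candidates.get)


# Hypothetical species tree and per-branch scores.
parent = {"root": None, "A": "root", "B": "root", "A1": "A", "A2": "A"}
scores = {"root": 0.1, "A": 0.5, "B": 0.2, "A1": 0.05, "A2": 0.3}

print(best_recipient("A1", scores, parent, forbid_ancestors=False))  # A
print(best_recipient("A1", scores, parent, forbid_ancestors=True))   # A2
```

In this toy case, forbidding only self-transfers selects the donor's own parent as the recipient, which is the kind of transfer reported in the first message.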

I have zero background in C++, so I apologize if any of the above is misleading.

Best regards,
Stepan

Benoit Morel

Jun 6, 2024, 1:34:32 PM
to Stepan Puhov, GeneRax
Hi Stepan,
Thank you for the investigation. I'll try to fix this, but can't guarantee when I'll do it. It's already difficult to find the time to answer the questions in the group :(
Best,
Benoit
