Order of leaves/nodes in stats file

Carolina Kurotusch Canettieri

Jul 10, 2025, 3:31:49 PM
to GeneRax
Dear Benoit,

Congrats on developing such an incredible tool! Having duplication and loss results based on gene-tree reconciliation, in addition to count-based ones, is a huge gain for the study of gene family evolution. Thank you for your efforts!

I'm going through the GeneRax results and I have a question about the stats.txt file generated for each family in the results folder. There are three rates per leaf and node of the tree, right? If so, in which order are the results listed with respect to the leaves and nodes?

I apologize in advance if I've missed an explanation of this in the Wiki.

Best,
Carolina

Benoit Morel

Jul 13, 2025, 3:42:23 PM
to Carolina Kurotusch Canettieri, GeneRax
Dear Carolina,
Thank you for your kind words :-)
As far as I remember, the stats file gives the per-family DTL rates, shared by all nodes (3 values on the second line). If you get more values, could you send me one stats file and the command line you used?
If you are interested in per-node rates, you can either:
- use per-species rates (but then you can't have per-family AND per-species rates, because the model would be overparameterized: there would be more parameters than data to estimate them)
- look at the frequency of the events in the event counts. You still get the advantage of probabilistic methods over "count" methods, because the inference is more reliable, but these are not rates.
- use AleRax: it uses the same model as GeneRax, but it infers distributions of reconciled gene trees (so you better account for the uncertainty in the results). It takes a bit more effort, though.
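To make the overparameterization point concrete, here is a rough count of free rates under each sharing scheme. This is a minimal sketch with illustrative assumptions, not GeneRax code: three rates (duplication, transfer, loss) per rate set, and one rate set per branch of a rooted binary species tree, counting the branch above the root (2n - 1 branches for n leaves). The helper name `dtl_parameter_counts` and the figure of 1,000 families are hypothetical.

```python
def dtl_parameter_counts(n_families: int, n_species_leaves: int) -> dict[str, int]:
    """Rough count of free DTL rates under different rate-sharing schemes.

    Illustrative assumptions (not taken from the GeneRax source):
    - 3 rates (duplication, transfer, loss) per rate set;
    - a rooted binary species tree with one branch above every node,
      including the root, i.e. 2n - 1 branches for n leaves.
    """
    n_branches = 2 * n_species_leaves - 1
    return {
        # one rate set per family, shared by all species branches
        "per_family": 3 * n_families,
        # one rate set per species branch, shared by all families
        "per_species": 3 * n_branches,
        # one rate set per (family, branch) pair: the overparameterized case
        "per_family_and_per_species": 3 * n_families * n_branches,
    }


# Hypothetical example: 1,000 families on a 24-leaf species tree.
counts = dtl_parameter_counts(n_families=1000, n_species_leaves=24)
print(counts)
# {'per_family': 3000, 'per_species': 141, 'per_family_and_per_species': 141000}
```

In the last scheme, each family would need 141 rates of its own, all estimated from that single family's data, which is the "more parameters than data" situation.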

I hope this helps,
Benoit


Carolina Kurotusch Canettieri

Jul 13, 2025, 11:19:27 PM
to Benoit Morel, GeneRax
Dear Benoit,

That's right! I just checked the stats.txt file of each gene family and they are all the same. I'm attaching one to this email.
The command I ran was:
mpiexec -np 40 generax -f map.txt -s tree-simplified.nwk -r UndatedDTL --prefix Output --seed 85315 --per-species-rates --prune-species-tree
Since I used per-species estimation, I should indeed expect the same rates for all families. Seeing one stats.txt per family, my first thought was that they would differ, even though I was aware of the overparameterization issue; it slipped my mind for a moment.

The species tree I used has 24 leaves and 22 internal branches, and it is rooted (so I should also count one extra branch above the root, right?). Here it is:
(Sp,(Sp,(Sp,(Sp,((Sp,Sp),((((Sp,Sp),(Sp,Sp)),(((((Sp,Sp),Sp),Sp),Sp),(Sp,(Sp,Sp)))),((Sp,Sp),((Sp,Sp),(Sp,Sp)))))))));
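The branch arithmetic can be checked directly on the Newick string. The sketch below uses a hypothetical helper, `count_nodes` (not part of GeneRax), that counts leaves and internal nodes in a simple Newick string without quoted labels: every "(" opens one internal node, and every leaf label directly follows "(" or ",". For a fully bifurcating rooted tree with n leaves there are n - 1 internal nodes, so 22 internal branches below the root, 24 leaf branches, and one branch above the root add up to 47, assuming the root branch is counted.

```python
import re

# Species tree from the message above (leaf labels anonymized as "Sp").
TREE = "(Sp,(Sp,(Sp,(Sp,((Sp,Sp),((((Sp,Sp),(Sp,Sp)),(((((Sp,Sp),Sp),Sp),Sp),(Sp,(Sp,Sp)))),((Sp,Sp),((Sp,Sp),(Sp,Sp)))))))));"


def count_nodes(newick: str) -> tuple[int, int]:
    """Return (leaves, internal_nodes) for a simple Newick string.

    Raises ValueError if the parentheses are unbalanced.
    """
    s = newick.strip().rstrip(";")
    depth = 0
    internal = 0
    for ch in s:
        if ch == "(":
            depth += 1
            internal += 1  # each "(" opens exactly one internal node
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced parentheses")
    if depth != 0:
        raise ValueError("unbalanced parentheses")
    # A leaf label is a token that directly follows "(" or ",".
    leaves = len(re.findall(r"[(,]([^(),:;\s]+)", s))
    return leaves, internal


leaves, internal = count_nodes(TREE)
internal_branches = internal - 1   # every internal node except the root
all_branches = leaves + internal   # one branch above every node, root included
print(leaves, internal, internal_branches, all_branches)  # 24 23 22 47
```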

The stats.txt file contains 141 values, that is, 3 values for each of 47 items. Since I used per-species estimation, I assumed it prints 3 rates for each of 47 branches. Is that right?
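If the 141 values are indeed 3 rates for each of 47 branches, a hypothetical helper such as `group_rates_by_branch` below can regroup them. This is only a sketch under unconfirmed assumptions: the thread does not say whether the values are stored branch by branch (consecutive triples) or rate by rate (all 47 values of one rate, then the next), nor how the branches map to species-tree nodes. Both layouts are therefore offered as options, and the result should be checked against the GeneRax documentation before interpreting any rate.

```python
def group_rates_by_branch(values: list[float], n_leaves: int,
                          layout: str = "by_branch") -> list[tuple[float, ...]]:
    """Regroup a flat list of per-species DTL rates into one 3-tuple per branch.

    ASSUMPTIONS (not confirmed in this thread):
    - 3 rates per branch and 2n - 1 branches (including the one above the root);
    - layout "by_branch": r0 r1 r2 | r0 r1 r2 | ... (consecutive triples);
    - layout "by_rate": all branches for rate 0, then rate 1, then rate 2.
    The mapping from tuple index to species-tree node is unknown.
    """
    n_branches = 2 * n_leaves - 1
    expected = 3 * n_branches
    if len(values) != expected:
        raise ValueError(f"expected {expected} values, got {len(values)}")
    if layout == "by_branch":
        return [tuple(values[3 * b:3 * b + 3]) for b in range(n_branches)]
    if layout == "by_rate":
        return [(values[b], values[n_branches + b], values[2 * n_branches + b])
                for b in range(n_branches)]
    raise ValueError(f"unknown layout: {layout!r}")


# Placeholder values only (0.0, 1.0, ..., 140.0), not real GeneRax output.
dummy = [float(i) for i in range(141)]
print(group_rates_by_branch(dummy, n_leaves=24)[0])                    # (0.0, 1.0, 2.0)
print(group_rates_by_branch(dummy, n_leaves=24, layout="by_rate")[0])  # (0.0, 47.0, 94.0)
```

A length check like the one at the top of the function is a cheap way to confirm the "3 values per branch" reading before relying on any particular ordering.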

I'll look into how to process the results in the eventcounts files for my data :)

Thank you for suggesting AleRax! I had seen some comments about it in the Google Group, but it didn't occur to me that it was a more recent tool. I quickly checked the article and its Wiki page. Very interesting! I'm not sure I'll be able to run it for my final analysis, but I'll keep it in mind for future projects!

Best,
Carolina
Attachment: stats.txt