Missing per-species and per-family rates

Jacopo Martelossi

Apr 11, 2022, 4:25:32 AM
to GeneRax
Hi GeneRax community,

I am trying to analyse a dataset of 210 bacterial genomes using SpeciesRax and GeneRax v2.0.4 (from Bioconda), but I am encountering several problems. Briefly, I inferred orthogroups (OGs) from predicted proteomes and used ParGenes + SpeciesRax to build a species tree and to correct and reconcile all gene trees (from OGs with more than 4 members) with the command line:

mpiexec -np 16 generax --families FamiliFile.txt --si-strategy HYBRID --species-tree MiniNJ --rec-model UndatedDTL --per-family-rates --prune-species-tree --si-estimate-bl --si-quartet-support

Everything works well: I got a very nice species tree and reasonable "per-species event counts". However, I wanted to analyze in depth the OGs with the highest loss and transfer rates, and I cannot find the "per-family rates" in the output folder. I have also tried to use the previously corrected gene trees (since the process takes 3 weeks for 10,564 OGs and a total of 591,169 sequences) to compute "per-species rates" with the "reconcile" strategy and the command line:

mpiexec -np 16 generax -f FamiliFile_Reconciled2.txt -s Enterobacteria_SpecieRax/species_trees/inferred_species_tree.newick --per-species-rates --strategy RECONCILE -p Enterobacteria_GeneRax_PerSpecieRate

but the "per-species rates" are missing as well. This is quite strange, since I have already used GeneRax successfully with "--per-species-rates" in another project with fewer gene trees and species. Do you have any hints about what is going on?

Cheers,

Jacopo M.

Benoit Morel

Apr 11, 2022, 5:20:16 AM
to GeneRax
Dear Jacopo,

thanks for your post. That's strange. I will give it a try tomorrow, to see why the rates might be missing. I'll come back to you asap ;-)

Cheers,
Benoit

Jacopo Martelossi

Apr 13, 2022, 4:23:54 AM
to GeneRax
Hi Benoit,

thank you for the reply. If you want, I can share the data with you. Yesterday I tried with a smaller number of OGs (10), but I got the same problems.

Jacopo M.

Benoit Morel

Apr 13, 2022, 5:33:33 AM
to GeneRax
Dear Jacopo,

First, with --per-family-rates: you should find one file per family:
your_generax_output/results/family_name/stats.txt
The second line should have 3 float numbers that should correspond to the DTL rates (I don't remember in which order, but it should be in the wiki). Do you obtain this file? If not, let me know, and you don't need to read the next line yet ;-)
If you have this file: apparently, if you run generax with --prune-species-tree (which is recommended to infer the species tree), we output 10^-7 for all three rates. I need to check whether there is a good reason for this... Meanwhile, you can run generax using your inferred species tree and without the --prune-species-tree option to get some meaningful rates (same as your second run, but with --per-family-rates instead of --per-species-rates).
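To collect those per-family numbers in one place, something like the Python sketch below might help. It only relies on what is described above (one results/family_name/stats.txt per family, with the numbers on the second line). The function names and the `column` argument are hypothetical, and since the order of the rates is not stated here, the column index has to be checked against the wiki before ranking families.

```python
from pathlib import Path


def read_family_rates(results_dir):
    """Map each family name to the floats found on line 2 of its stats.txt."""
    rates = {}
    for stats in sorted(Path(results_dir).glob("*/stats.txt")):
        lines = stats.read_text().splitlines()
        if len(lines) < 2:
            continue  # file too short: skip it rather than guess
        values = []
        for token in lines[1].split():
            try:
                values.append(float(token))
            except ValueError:
                pass  # ignore any non-numeric token on that line
        rates[stats.parent.name] = values
    return rates


def top_families(rates, column, n=10):
    """Return the n families with the largest value in the given column.

    ASSUMPTION: `column` is the position of the rate of interest on line 2;
    the order of the rates is not given in this thread, so check the wiki.
    """
    usable = {fam: vals for fam, vals in rates.items() if len(vals) > column}
    return sorted(usable, key=lambda fam: usable[fam][column], reverse=True)[:n]
```

For example, `top_families(read_family_rates("your_generax_output/results"), column=1)` would list the ten families with the largest second value. If every family shows the same numbers, the rates were probably not optimized in that run.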

Second, with --per-species-rates:
There is indeed a problem with GeneRax. By design, the per-species rates are computed after each SPR round. Since you are running with the RECONCILE strategy, we skip the SPR rounds and therefore never compute them. But we should... I will try to fix this, but I won't be able to do it before next week. A possible workaround: you could try to run it with --strategy SPR and --max-spr-radius 1 (which should be faster than the default radius), but I am afraid this would still take quite some time...
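The two workarounds above could be assembled as in the following Python sketch. It uses only flags that already appear in this thread; the helper name `generax_command` and the two output prefixes are hypothetical, and the input paths are the ones from the second run. It only prints the commands and does not launch anything.

```python
import shlex


def generax_command(families, species_tree, prefix, rates_flag,
                    strategy, max_spr_radius=None, ranks=16):
    """Build an mpiexec/generax argument list from flags used in this thread."""
    cmd = ["mpiexec", "-np", str(ranks), "generax",
           "-f", families, "-s", species_tree,
           rates_flag, "--strategy", strategy]
    if max_spr_radius is not None:
        cmd += ["--max-spr-radius", str(max_spr_radius)]
    cmd += ["-p", prefix]
    return cmd


if __name__ == "__main__":
    families = "FamiliFile_Reconciled2.txt"
    tree = "Enterobacteria_SpecieRax/species_trees/inferred_species_tree.newick"

    # Point 1: per-family rates on the inferred tree, without --prune-species-tree.
    # The output prefix is a hypothetical name.
    print(shlex.join(generax_command(
        families, tree, "PerFamilyRates_NoPrune",
        "--per-family-rates", "RECONCILE")))

    # Point 2: per-species rates via SPR rounds limited to radius 1.
    # The output prefix is a hypothetical name.
    print(shlex.join(generax_command(
        families, tree, "PerSpeciesRates_SPR1",
        "--per-species-rates", "SPR", max_spr_radius=1)))
```

Keep in mind that the SPR strategy may modify the gene trees, so a fresh output prefix avoids overwriting the earlier results.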

Benoit

Jacopo Martelossi

Apr 13, 2022, 6:19:21 AM
to GeneRax

For point 1: the files actually exist, but the DTL reconciliation rates for all families are equal to 0.2 (speciation rates are 1, as expected). As far as I understand, these should be the non-optimized rates, right?
For point 2: I have just rerun the analyses as you suggested, I'll let you know :-)

Thank you again for the help!

Benoit Morel

Apr 13, 2022, 7:01:30 AM
to Jacopo Martelossi, GeneRax
Yes, exactly. For point 1, I would try without the --prune-species-tree option, as explained in my previous message. But I guess you are more interested in the per-species rates ;-)
I will let you know when I manage to fix the second issue.
