Dear Benoit,
First, thank you for developing such a powerful and useful tool.
I am using AleRax for ancestral genome content reconstruction, and I would like to be certain about the correct way to interpret the results. My goal is to determine the support/probability for the presence of a gene family at each ancestral node.
I found a relevant discussion in this group from about 7 months ago (https://groups.google.com/u/1/g/generaxusers/c/tBZ_fDjL5Qk). In that thread, it is suggested that to judge the presence in a common ancestor, one should count the frequency across the 100 individual reconciliation samples.
However, when I inspect the output from my run (using the latest version), the [FAMILY]_meanSpeciesEventCounts.txt file seems to provide this exact information directly, even for internal nodes.
For example, I see lines like this:
species_label, ..., presence, ...
Node_A_B_0, ..., 0.96, ...
I interpret this to mean that for the ancestral node Node_A..., the gene family was present in 96 out of 100 samples, giving it a support of 96%.
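To make the question concrete, here is a minimal Python sketch of the reading I have in mind. It assumes the file is comma-separated with a header row that contains the species_label and presence columns shown above. The exact file layout, the helper name read_presence, and the inline excerpt are my own assumptions for illustration, not something I found documented.

```python
import csv
import io


def read_presence(handle, label_col="species_label", presence_col="presence"):
    """Return {species_label: presence} from a comma-separated table.

    Assumption: the table has a header row naming both columns.
    """
    # skipinitialspace drops the blank that follows each comma
    reader = csv.DictReader(handle, skipinitialspace=True)
    support = {}
    for row in reader:
        support[row[label_col].strip()] = float(row[presence_col])
    return support


# Hypothetical two-column excerpt standing in for a real
# [FAMILY]_meanSpeciesEventCounts.txt file (other columns omitted).
example = io.StringIO(
    "species_label, presence\n"
    "Node_A_B_0, 0.96\n"
)

support = read_presence(example)
for node, presence in support.items():
    # Only valid if presence really is the fraction of samples
    # in which the family is present at that node.
    print(f"{node}: {presence:.2f} (about {round(presence * 100)} of 100 samples)")
```

The conversion to "of 100 samples" only holds if presence is the mean of a 0/1 indicator over the samples, which is exactly the point I would like confirmed.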
My question is: Is my interpretation correct? Is using the presence column from the _meanSpeciesEventCounts.txt file the officially recommended method to get the support for a gene's presence at ancestral nodes? Or is there a subtle reason why parsing the 100 individual files is still the more accurate approach?
Clarity on this point would be extremely helpful for ensuring the accuracy of my analysis.
Thank you for your time and clarification.
Best regards,
Takahashi