Hi,
I have a large dataset of ~1000 isolates from several bacterial species of the same genus. I have two objectives: (i) build a species tree and (ii) infer the DTL events for some gene families of interest.
I see 2 possible strategies:
1) I first build a species tree with classic methods, using only core genes or ribosomal genes for example, and then, as a second step, I use a second program to reconcile my gene families of interest with that species tree.
2) I use AleRax to do both at once.
For strategy 1, would it be better to use GeneRax rather than AleRax?
For strategy 2, I am not sure which gene families I should use. I suppose I cannot use only my few gene families of interest, since there are not many of them and they may have undergone many DTL events (and thus have a history quite different from that of the species tree). So should I use all annotated genes in the entire dataset for better species tree inference? Or would a subset be enough (and in that case, how should I select it)? In the end, I could then focus only on the outputs for my gene families of interest.
I think I am confused because the example with primates uses only 8 gene families, whereas the one with Archaea in the paper uses 5379 gene families. In both cases, I did not understand how these genes were selected.
Thank you,
Héloïse