I am new to CAFE and am trying to learn how best to use it.
I ran CAFE with different values of k on three phylogenies built from the same pool of species. My species-of-interest has roughly half the average number of genes found in the other species in the phylogeny.
Results
Run_1: Started with a 20-species phylogeny. All ortholog families had failures at every value of k tested (1 to 5). My species-of-interest showed contractions in at least 2,000 gene families.
Run_2: Switched to a 9-species phylogeny. No failures at k = 1 to 6, but my species-of-interest showed contractions in only about 200 gene families.
Run_3: Redid the phylogeny using 14 species. No failures at k = 1 to 4; all families failed at k = 5 and 6. My species-of-interest again shows contractions in at least 2,000 gene families.
In all three runs, lambda is highest at k=2. For Run_2 and Run_3, the likelihood decreased as k increased; for Run_1, the likelihood values first decreased and then increased.
Please see attached files.
Queries
1. The family expansion/contraction output varies little across k values within a given 'Run'. If I had started with Run_2 and left it at that, the CAFE results would have had little utility in terms of gene family contractions. Is there a sweet spot for the number of species or the evolutionary time scale to use?
2. The family expansion/contraction output changes with the phylogeny. For Run_1 and Run_3, I get similar results in terms of gene family contractions, except I get failures for Run_1. What do these failures really mean if the contraction numbers are the same?
3. I am using the divergence time estimate between the outgroup and its closest branch to generate the ultrametric tree. Would a different choice of outgroup change the outcome?
4. Even though the family expansion/contraction output varies little across k values within a given 'Run', for Run_3, should I use k=2, which gives the maximum lambda, or k=4, which gives the minimum likelihood among the k values without any failures?
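
To make query 4 concrete, here is a minimal Python sketch of the selection rule I have in mind: discard any k that produced failed families, then pick the k with the best score among the rest. The `CafeRun` record, its field names, and the `pick_k` helper are all hypothetical and are not part of CAFE; the values would have to be copied by hand from each run's results. It also assumes the reported score behaves like a negative log-likelihood (lower is better), which I am not sure about, so the direction is left as a switch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CafeRun:
    """One CAFE run at a given k (hypothetical record, filled in by hand)."""
    k: int                # value of k used for the run
    lam: float            # lambda estimated for the run
    score: float          # final likelihood score reported for the run
    failed_families: int  # number of families reported as failed


def pick_k(runs: List[CafeRun],
           lower_score_is_better: bool = True) -> Optional[CafeRun]:
    """Return the best-scoring run among those with no failed families.

    Assumption: by default the score is treated like a negative
    log-likelihood, so the smallest value wins. Set
    lower_score_is_better=False if the score is a plain log-likelihood.
    Returns None when every run had failures.
    """
    usable = [r for r in runs if r.failed_families == 0]
    if not usable:
        return None
    if lower_score_is_better:
        return min(usable, key=lambda r: r.score)
    return max(usable, key=lambda r: r.score)
```

Under this rule lambda is only reported, not used for the choice. That is exactly the part I am unsure about, since lambda peaks at k=2 in all three runs.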
Hope this makes sense.
Thanks.