Hi all,
I have a couple of questions as someone new to PAML but not to phylogenetics. I'm trying to estimate dN/dS and site-specific rates for representative transcripts mined from a de novo RNA-seq assembly. Some of these transcripts are incomplete, and others are truly missing triplets in some species.
I apologize for beating a dead horse, but is the consensus that codeml simply will not run with any gaps in the alignment? I know what the manual says, but every one of my alignments that contains a gap (even a single missing triplet in one sequence) fails completely with cleandata=0.
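In case it helps narrow this down, below is a minimal Python sketch of a sanity check that could be run on a codon alignment before handing it to codeml. It is a hypothetical helper, not part of PAML, and it rests on assumptions: the alignment is already loaded as a dict of name to aligned nucleotide string, and gaps are written as "-" or "?". It only flags sequences of unequal length, lengths that are not a multiple of 3, and codons that are partially gapped. Whether any of these is the actual cause of the failures described above is not something the check can confirm.

```python
def codon_gap_report(seqs):
    """Return a list of human-readable problems found in a codon alignment.

    seqs: dict mapping sequence name -> aligned nucleotide string.
    Hypothetical helper for illustration; not a PAML function.
    """
    problems = []

    # All aligned sequences should have the same length.
    if len({len(s) for s in seqs.values()}) > 1:
        problems.append("sequences differ in length")

    for name, seq in seqs.items():
        # A codon alignment should be a whole number of triplets.
        if len(seq) % 3 != 0:
            problems.append(f"{name}: length {len(seq)} is not a multiple of 3")

        # Walk the complete triplets and flag codons with 1 or 2 gap characters.
        usable = len(seq) - len(seq) % 3
        for i in range(0, usable, 3):
            codon = seq[i:i + 3]
            n_gaps = sum(codon.count(ch) for ch in "-?")
            if 0 < n_gaps < 3:
                problems.append(
                    f"{name}: codon {i // 3 + 1} is partially gapped ({codon})"
                )
    return problems


# Toy alignment (made-up data): sp2 lacks a whole triplet, sp3 has a partial gap.
alignment = {
    "sp1": "ATGAAATTT",
    "sp2": "ATG---TTT",
    "sp3": "ATGA--TTT",
}
for line in codon_gap_report(alignment):
    print(line)
```

A whole missing triplet, as in sp2, passes this check, so an alignment that still fails after coming back clean would point away from frame or partial-gap problems.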
This may also be a can of worms, but I presume that most people estimate dN/dS from alignments forced onto the species tree. However, considering the work by Matt Hahn and others on mutations optimized on incongruent trees, and ignoring for now the logistics of getting an accurate gene tree, should dN/dS properly be estimated on the gene tree topology rather than the species tree topology (assuming the two differ)?
Thanks. Stu