CODEML analysis for genes with relatively high informative sites to check for positive selection at specific nodes

Молдир Ермагамбетова

Apr 30, 2024, 6:02:11 AM
to PAML discussion group
Dear PAML authors and group,

Thanks for developing the useful package!

I’m a young researcher in Biology. I have no clue how to use CODEML and PAML. I have some questions and need help.

I have 8 variable regions, and I would like to run a CODEML analysis on the genes with a relatively high number of informative sites to check for positive selection at specific nodes.

I prepared a Newick file, a FASTA file, and a PML file. Are the files correct?

However, these 8 variable regions have different sizes, so I can’t cut or align these genes.

What do I do?

Files are attached

all test.nwk
all test2.pml
all test.fas

Janet Young

Apr 30, 2024, 9:05:14 PM
to PAML discussion group
hi there,

It is worth spending some time learning the basic principles of molecular evolutionary analysis before trying to run PAML. I also recommend looking at some examples of how other people have successfully used PAML to analyze other genes. That will help you understand the types of questions PAML is able to answer.

This paper might be a good place to start - https://academic.oup.com/mbe/article/40/4/msad041/7140562

It looks like your alignment contains the sequences of 8 totally different regions that are unalignable.  We typically use PAML to analyze a single gene at one time, for example using several aligned orthologs from different species for a single gene.  Or perhaps you might include paralogs of genes in a gene family, but you would not analyze unrelated genes together with each other.  So maybe you want to collect orthologous sequences for each of your genes before you run PAML?
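
To illustrate the one-gene-at-a-time idea, here is a minimal Python sketch that splits a combined FASTA file into per-gene groups, so that each gene can then be aligned and analyzed separately. It assumes a hypothetical header convention of `>gene|species`; the function names and the separator are made up for this example and are not part of PAML, so adapt them to however your sequences are actually labelled.

```python
from collections import defaultdict


def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; store the previous one first.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def group_by_gene(records, sep="|"):
    """Group records by gene name.

    ASSUMPTION: headers look like 'gene|species' (hypothetical convention).
    Returns a dict mapping gene -> list of (species, sequence).
    """
    groups = defaultdict(list)
    for header, seq in records:
        gene, _, species = header.partition(sep)
        groups[gene].append((species, seq))
    return dict(groups)
```

Each resulting group would still need its own alignment and its own tree before it goes anywhere near codeml.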

For a group of sequences that you cannot align to each other, you will want to think hard about what sorts of evolutionary questions you want to ask - PAML is not an appropriate tool for unalignable sequences.

Whatever analysis you do with PAML, you always want to start with aligned sequences, not unaligned ones, and it should always be a meaningful alignment. Be careful, because alignment software always gives you some sort of alignment, even for unrelated sequences; it'll just be a bad alignment in that case. The same goes for tree-building tools.
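
As a rough first check before running codeml on a coding-sequence alignment, something like the Python sketch below can catch the most basic problems: sequences of different lengths (so not an alignment at all) or a length that is not a multiple of 3 (so not a whole number of codons). The function name is hypothetical and the checks are only mechanical; passing them says nothing about whether the alignment is biologically meaningful, which still needs to be judged by eye.

```python
def check_codon_alignment(seqs):
    """Run basic sanity checks on a codon alignment.

    seqs: dict mapping sequence name -> aligned sequence (gaps included).
    Returns a list of human-readable problems; an empty list means the
    mechanical checks passed (NOT that the alignment is meaningful).
    """
    problems = []
    if len(seqs) < 2:
        problems.append("need at least two sequences to form an alignment")
        return problems

    lengths = {name: len(seq) for name, seq in seqs.items()}
    distinct = sorted(set(lengths.values()))
    if len(distinct) > 1:
        # Unequal lengths mean the sequences have not been aligned.
        problems.append(
            "sequences differ in length (%s), so this is not an alignment"
            % ", ".join(str(n) for n in distinct)
        )
        return problems

    if distinct[0] % 3 != 0:
        # Codon models read the alignment three columns at a time.
        problems.append(
            "alignment length %d is not a multiple of 3" % distinct[0]
        )
    return problems
```

For example, `check_codon_alignment({"sp1": "ATGAAA", "sp2": "ATG"})` reports a length mismatch rather than letting unaligned sequences through.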

Good luck in your learning,

Janet