dS Values

Saam Hasan

Feb 15, 2024, 9:07:49 AM

to PAML discussion group

Hi,

I have been advised that when including divergent species in a PAML analysis, I should filter out anything that has a dS of over 1. For context, I have 3471 multiple sequence alignments for different single-copy orthologs and have run the branch-site model on all of them. Is that advice sound? And if so, where in the output can I actually find the dS values for each branch?

Thanks

Sandra AC

Feb 22, 2024, 5:23:42 AM

to PAML discussion group

Hi there,

The idea is that large dS values may indicate saturation, in which case the precise value is not reliable. A cut-off of 1 is perhaps too small, as such values are not yet large enough to be "worrying". You may want to try "dS > 3" as a better cut-off, although any threshold is arbitrary and hard to establish.

You can find the dS values in the main output file (i.e., the file whose name you specify with the variable `outfile` in the control file) and more information in the `rst` file that CODEML generates. You can find examples of output files in our GitHub repository `positive-selection`, which is meant to be used alongside our protocol paper (Álvarez-Carretero et al., 2023). Some links to specific output files below:

Homogeneous model, site models, branch model. When running CODEML under these models, look for the line "dN & dS for each branch" in the main output file. Below that line is a header row with several columns, one of which holds the "dS" value for each branch.
Branch-site models. You can find this information (and much more) in the `rst` file. I suggest you go through this file and decide what information you want to extract. The main output file has the MLEs for dN/dS.

You may want to write a script to extract the relevant information for your analyses :)
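
As a starting point, here is a minimal Python sketch of such a script for the case where the main output file contains the "dN & dS for each branch" table. It assumes the table has a header row that starts with `branch` and includes a `dS` column, followed by rows whose first field looks like `4..1`; check this against your own output files before relying on it. The function names, the `SAMPLE` text, and all numbers in it are made up for illustration, and the layout of the `rst` file is not covered here.

```python
import re

# Assumed row label format: two node numbers joined by "..", e.g. "4..1".
BRANCH_LABEL = re.compile(r"\d+\.\.\d+")


def parse_branch_ds(text):
    """Return {branch_label: dS} from the 'dN & dS for each branch' table.

    Returns an empty dict if the section marker is not found.
    """
    lines = text.splitlines()
    start = next(
        (i for i, ln in enumerate(lines) if "dN & dS for each branch" in ln),
        None,
    )
    if start is None:
        return {}

    ds_col = None  # index of the "dS" column, taken from the header row
    result = {}
    for ln in lines[start + 1:]:
        fields = ln.split()
        if not fields:
            if result:
                break  # blank line after the data rows ends the table
            continue
        if ds_col is None:
            # Still looking for the header row.
            if fields[0] == "branch" and "dS" in fields:
                ds_col = fields.index("dS")
            continue
        if BRANCH_LABEL.fullmatch(fields[0]) and len(fields) > ds_col:
            result[fields[0]] = float(fields[ds_col])
        else:
            break  # first non-row line ends the table
    return result


def branches_above(ds_by_branch, cutoff=3.0):
    """Return the sorted labels of branches whose dS exceeds the cutoff."""
    return sorted(b for b, ds in ds_by_branch.items() if ds > cutoff)


# Illustrative text only: invented values arranged in the assumed layout.
SAMPLE = """
dN & dS for each branch

 branch          t       N       S   dN/dS      dN      dS  N*dN  S*dS

   4..1      0.120   500.0   160.0  0.1000  0.0100  0.1000   5.0  16.0
   4..2      4.500   500.0   160.0  0.0500  0.1750  3.5000  87.5 560.0
   4..3      0.300   500.0   160.0  0.2000  0.0400  0.2000  20.0  32.0

"""

if __name__ == "__main__":
    ds_values = parse_branch_ds(SAMPLE)
    for branch in branches_above(ds_values, cutoff=3.0):
        print(branch, ds_values[branch])
```

To use it on real results, read each output file with `open(path).read()` and pass the text to `parse_branch_ds`; the cutoff is left as a parameter because, as noted above, any threshold is arbitrary.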