specifying aligner options through SATe

Rohan Maddamsetti

Dec 9, 2011, 2:40:00 PM

to SATe User

Hi everybody,

I'm running SATe as part of a pipeline, where I align orthologous
protein-coding sequences, and calculate dN/dS using the codeml program
in the PAML package. Because codeml cannot interpret gap characters,
it divides the alignment into triplets, and removes triplets that
contain gap characters. Since most DNA alignment programs don't
preserve reading frame, this means my alignment often translates into
a garbage protein sequence.

The prank alignment program that is bundled with SATe is supposed to
be able to handle this issue appropriately: it has options both to
align protein-coding sequences (-codon), and to align DNA sequences
based on their translation into protein (-translate).

My question is the following: is there a way to specify these options
to prank through SATe's interface? Or will I have to muck around with
the SATe source to get this to work?

Cheers,
Rohan

Mark Holder

Dec 9, 2011, 9:18:57 PM

to sate...@googlegroups.com

Hi Rohan,
Note: My comments below all relate to version 2.1.0 (available from http://phylo.bio.ku.edu/software/sate/sate.html ). This is the currently recommended version of the software for all users. Please update to that version if you are running an earlier version.

I'm afraid that there is not a great solution at this point for your codon-level analysis.

We haven't tested adding general user-controlled options to prank. We can add that to the next version. The changes that would need to be added to the code to tell Prank to use "-codon" or "-translate" are pretty straightforward (I have some notes below on how to modify the code, if you do want to try this).

Beyond invoking Prank in the correct way, there is one other issue. To produce an alignment in each iteration, SATé uses both alignment tools and merger tools (which merge separate alignments into a single alignment). The currently supported merger tools are Opal and Muscle. Unfortunately (as far as I know) neither of those tools supports the equivalent of the codon options that you want to use. The implication is that even if Prank forces gaps to occur in triples, the merger tools would probably insert gaps in ways that do not honor the reading frame.

I'll look into what it would take to modify Muscle to do this (but that won't be trivial), so that we can give you a more satisfying answer. In the meantime, there are a couple of things that you could try in the short term. Both have substantial downsides, and I don't recommend trying either of them. But just in case, I'll describe them below.

So in short: We will modify SATé to accept flexible user-specified options for Prank in the next version of SATé. This will be easy, but won't solve all of the issues that you face when it comes to using SATé for this analysis. I'll try to see if I can work a codon option into muscle (this won't be easy or a particularly quick fix, I'm afraid).

all the best,
Mark

#########################################
Notes on modifying the code if you want to add arbitrary options to prank
#########################################

In the source code file sate/tools.py line 338 you'll see a line that starts with:

invoc = [self.exe, '-once',

this is the line that creates the list of arguments that constitute the invocation of prank. If you add a line *after* this one that looks like this:

invoc.extend(['-codon'])

then Prank will be invoked with the -codon option. Note that the line has to start with the appropriate number of spaces (Python is whitespace-sensitive), so that it lines up with the preceding and following lines. Also make sure that you edit the code in a plain-text editor.
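
The general pattern can be sketched outside of SATé as follows. This is not the actual code in sate/tools.py: the function name build_invocation and its parameters are hypothetical, and only the '-once' and '-codon' flags come from the notes above. It just shows how a list of arguments is built up and then extended with optional user-supplied flags.

```python
def build_invocation(exe, base_args, extra_args=None):
    """Return the argument list for an external aligner call.

    exe        -- path to the aligner executable (for example, prank)
    base_args  -- arguments the wrapper always passes
    extra_args -- optional extra flags, such as ['-codon']

    Hypothetical helper for illustration; not part of SATe.
    """
    invoc = [exe] + list(base_args)
    if extra_args:
        # Same idea as adding invoc.extend(['-codon']) by hand.
        invoc.extend(extra_args)
    return invoc


invoc = build_invocation('prank', ['-once'], ['-codon'])
print(invoc)
```

A list built this way can be handed to something like subprocess.call without going through a shell, so each flag stays a separate argument.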

#########################################
Notes on dealing with the fact that the merger tools do not honor the reading frame
#########################################

A couple of possible workarounds (with substantial downsides).

In both cases you would have to modify the SATé code.
1. You could run SATé with options that tell it not to decompose the tree (in the command-line version, --max-subproblem-size=N where N is the number of sequences in your dataset). Running SATé without decomposition is untested, and would sacrifice important aspects of the algorithmic design. Basically, doing this would reduce SATé to an iterative approach of aligning with Prank and then inferring the tree. I don't really think that this would be helpful, but it might help you assess the guide-tree dependencies of Prank. It would not improve the running time of the alignment, because the decomposition into smaller subsets of the sequences is an important aspect of fast running times.

or

2. After modifying the code, you could run SATé and just check to see if the merger tool actually does break the reading frame with its alignments. If your sequences are not very divergent, then the combination of a codon-level alignment of Prank and easy alignments at deeper levels might result in alignments that could be analyzed with codeml. I'm afraid that, for more divergent sequences, there is a very good chance that this strategy will not work.
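
For the check in option 2, one possible criterion is sketched below. It assumes the alignment is already loaded as a dict mapping sequence names to aligned DNA strings, that '-' is the gap character, and that the first column is the first codon position. The function names are made up for this example. A sequence counts as "in frame" if every column triplet is either a complete codon or entirely gaps.

```python
def frame_preserved(aligned_seq, gap='-'):
    """True if every column triplet is a full codon or all gaps."""
    if len(aligned_seq) % 3 != 0:
        return False
    for i in range(0, len(aligned_seq), 3):
        n_gaps = aligned_seq[i:i + 3].count(gap)
        if n_gaps not in (0, 3):
            # A triplet with one or two gaps splits a codon.
            return False
    return True


def broken_sequences(alignment):
    """Return the names of sequences whose gaps break the reading frame.

    alignment -- dict mapping sequence name to aligned DNA string
    """
    return [name for name, seq in alignment.items()
            if not frame_preserved(seq)]


example = {'a': 'ATG---GCT', 'b': 'ATGG--CT-'}
print(broken_sequences(example))
```

If the returned list is empty, the merger step did not split any codons under this criterion. It says nothing about whether the alignment itself is biologically sensible.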

Rohan Maddamsetti

Dec 13, 2011, 8:59:12 PM

to SATe User

Thanks for your quick and thorough reply, Mark. I tried adding the
'-codon' option to prank, but you were right in predicting that the
merger tool would break the reading frame of the alignment. For the
time being, I'm going to write an ugly fix, where I align protein
sequences with SATe, and then back-translate the alignment into DNA,
for codeml to use. I wish I could hack muscle to handle protein-coding
sequences, but I don't have the algorithms experience to really do an
adequate job.

Cheers,
Rohan
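
The back-translation step mentioned above could look roughly like this. It is a minimal sketch, not the script actually used in the pipeline. It assumes each unaligned coding sequence contains exactly three nucleotides per residue of its protein (no trailing stop codon), and it does not verify that the DNA really translates to the protein. The function name back_translate is hypothetical.

```python
def back_translate(aligned_protein, cds, gap='-'):
    """Thread an unaligned coding sequence onto an aligned protein.

    aligned_protein -- one row of the protein alignment, with gaps
    cds             -- the matching unaligned DNA, 3 nt per residue
    Returns the codon-level aligned DNA string.
    """
    n_residues = len(aligned_protein) - aligned_protein.count(gap)
    if len(cds) != 3 * n_residues:
        raise ValueError('CDS length does not match the protein length')
    codons = []
    pos = 0
    for residue in aligned_protein:
        if residue == gap:
            # A protein gap becomes a whole-codon gap.
            codons.append(gap * 3)
        else:
            codons.append(cds[pos:pos + 3])
            pos += 3
    return ''.join(codons)


print(back_translate('M-A', 'ATGGCT'))
```

Because each protein gap becomes exactly three gap characters, the resulting DNA alignment keeps the reading frame by construction.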
