Use Prodigal to predict CDS sequences in *eukaryotic* transcriptomic sequences


Luca Venturini

Mar 10, 2017, 12:36:35 PM
to prodigal-discuss
Hello,
Would it be feasible to use Prodigal in "anonymous" mode to predict the CDS in eukaryotic transcripts? I am testing it for this purpose at the moment, as the alternatives I am aware of (TransDecoder, GeneMarkS) are far slower than Prodigal.

Kind regards

Luca Venturini

dhyatt1

Mar 10, 2017, 12:55:21 PM
to prodigal-discuss

You would have problems in anonymous/metagenomic mode because it will predict GTG and TTG starts, which you don't want for eukaryotic transcripts.

My recommended workflow for this would be to combine a bunch of eukaryotic transcripts from the same organism, with similar GC content, into a single file and then run Prodigal with "-c -g 1" to force closed ends and ATG-only starts. If the organism has a tight GC-content distribution, you may be able to put everything in one file.
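
The grouping step could be sketched in Python roughly as follows. This is a minimal illustration and not part of Prodigal: the 5% bin width, the function names, and the output file names are assumptions made for the example. Only "-c -g 1" comes from the recommendation above; -i, -a, -d, -o and -f are Prodigal's usual input/output options.

```python
from collections import defaultdict


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def gc_bin(seq, bin_width_pct=5):
    """Return the GC-content bin index of seq (integer arithmetic only)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    if total == 0:
        return 0  # no unambiguous bases; park the record in the lowest bin
    return (gc * 100) // (total * bin_width_pct)


def bin_by_gc(records, bin_width_pct=5):
    """Group (header, sequence) records by GC-content bin."""
    bins = defaultdict(list)
    for header, seq in records:
        bins[gc_bin(seq, bin_width_pct)].append((header, seq))
    return dict(bins)


def write_bins(bins, prefix="gc_bin"):
    """Write one FASTA file per bin and return the file paths (names are hypothetical)."""
    paths = []
    for idx in sorted(bins):
        path = f"{prefix}_{idx:02d}.fasta"
        with open(path, "w") as out:
            for header, seq in bins[idx]:
                out.write(f">{header}\n{seq}\n")
        paths.append(path)
    return paths


def prodigal_command(fasta_path):
    """Build a Prodigal command line with closed ends (-c) and translation table 1 (-g 1)."""
    stem = fasta_path.rsplit(".", 1)[0]
    return [
        "prodigal", "-i", fasta_path, "-c", "-g", "1",
        "-a", stem + ".proteins.faa",  # translated proteins
        "-d", stem + ".cds.fna",       # nucleotide CDS
        "-o", stem + ".gff", "-f", "gff",
    ]
```

Each list returned by prodigal_command can be handed to subprocess.run. Since normal mode trains on the input file itself, a sparse bin may not contain enough sequence to train on (about 20 kb is needed, as far as I recall), in which case merging it with a neighbouring bin is probably the simplest fix.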

You could also consider the new tool GeneMarkS-T (download here: http://topaz.gatech.edu/GeneMark/license_download.cgi). In the GeneMarkS-T paper, they compared Prodigal to other tools, and it did OK but not great (not really a fair comparison imo, since Prodigal was never designed for this purpose). Prodigal would struggle to recognize the Kozak sequence, since it has a lot of microbial-specific rules.

Honestly, I could probably write a simple Python script in a day that would do this better than existing tools: just a basic coding/noncoding Bayes classifier plus Kozak sequence / upstream sequence analysis. I'm not sure how much interest there is in this sort of thing, since I don't do much with eukaryotic gene prediction.
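
To make the idea concrete, here is one way such a script might look. It is only a sketch under stated assumptions, not the script described above: it scores forward-strand ATG-to-stop ORFs with a naive Bayes log-likelihood ratio over codons (equal priors assumed) and adds a crude Kozak bonus for a purine at -3 and a G at +4. The training sequences, the kozak_weight parameter, and all function names are hypothetical.

```python
import math
from collections import Counter
from itertools import product

STOPS = {"TAA", "TAG", "TGA"}
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]


def codon_log_probs(seqs, pseudocount=1.0):
    """Log codon frequencies (with pseudocount smoothing) over in-frame training sequences."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(0, len(seq) - 2, 3):
            counts[seq[i:i + 3]] += 1
    total = sum(counts[c] for c in CODONS) + pseudocount * len(CODONS)
    return {c: math.log((counts[c] + pseudocount) / total) for c in CODONS}


def find_orfs(seq, min_codons=30):
    """Yield (start, end) for complete ATG..stop ORFs on the forward strand (end exclusive)."""
    seq = seq.upper()
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG after the previous stop
            elif start is not None and codon in STOPS:
                if (i + 3 - start) // 3 >= min_codons:
                    yield start, i + 3
                start = None


def kozak_score(seq, start):
    """Count matches at two key Kozak positions: purine at -3 and G at +4."""
    score = 0
    if start >= 3 and seq[start - 3] in "AG":
        score += 1
    if start + 3 < len(seq) and seq[start + 3] == "G":
        score += 1
    return score


def coding_score(orf, coding_lp, background_lp):
    """Sum of per-codon log-likelihood ratios (coding vs. background)."""
    total = 0.0
    for i in range(0, len(orf) - 2, 3):
        codon = orf[i:i + 3]
        if codon in coding_lp:  # skips codons containing N or other ambiguity codes
            total += coding_lp[codon] - background_lp[codon]
    return total


def best_orf(seq, coding_lp, background_lp, kozak_weight=1.0, min_codons=30):
    """Return (score, start, end) of the top-scoring ORF, or None if there is none."""
    seq = seq.upper()
    best = None
    for start, end in find_orfs(seq, min_codons):
        score = coding_score(seq[start:end], coding_lp, background_lp)
        score += kozak_weight * kozak_score(seq, start)
        if best is None or score > best[0]:
            best = (score, start, end)
    return best
```

In practice coding_lp would be trained on known CDS from the organism (or a close relative) and background_lp on UTR or other noncoding sequence. The sketch ignores the reverse strand, partial ORFs, and frame shifts.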

The hardest part is dealing with frame shifts in transcripts/transcript assemblies, and I don't know of any tools that do a good job there. Having worked with millions of poplar transcripts, I can say they definitely happen a fair bit: you get a hit to a known protein in two different reading frames, etc.

I've used TransDecoder quite a bit and it's still my favorite, but yes, it's very slow.

regards,
doug