Use Prodigal to predict CDS sequences in eukaryotic transcriptomic sequences


Luca Venturini

Mar 10, 2017, 12:36:35 PM

to prodigal-discuss

Hello,
would it be feasible to use Prodigal in "anonymous" mode to predict the CDS in eukaryotic transcripts? I am currently testing it for this purpose, because the alternatives I am aware of (TransDecoder, GeneMarkS) are far slower than Prodigal.

Kind regards

Luca Venturini

dhyatt1

Mar 10, 2017, 12:55:21 PM

to prodigal-discuss

You would have problems in anonymous/metagenomic mode because it will predict GTG and TTG starts.

My recommended workflow for this would be to combine a bunch of eukaryotic transcripts from the same organism and with similar levels of GC content into a single file and then just run Prodigal with "-c -g 1" to force closed ends and ATG-only starts. If the organism has a tight GC-content distribution, you may be able to put everything in one file.
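
For anyone who wants to try this, here is a rough Python sketch of the binning step. It is only one way to do it, not something from Prodigal itself: the function names, the 10% bin width, and the output file names are all assumptions made for illustration. The only parts taken from the advice above are the "-c" and "-g 1" options; "-i", "-o", "-a", "-d" and "-f" are the usual Prodigal input/output options.

```python
from collections import defaultdict


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def gc_content(seq):
    """Fraction of G/C among unambiguous bases (0.0 if there are none)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def bin_by_gc(records, width=0.10):
    """Group records by GC content; the 0.10 bin width is an arbitrary choice."""
    bins = defaultdict(list)
    for header, seq in records:
        bins[int(gc_content(seq) / width)].append((header, seq))
    return dict(bins)


def write_fasta(records, path):
    """Write (header, sequence) pairs to a FASTA file, one line per sequence."""
    with open(path, "w") as handle:
        for header, seq in records:
            handle.write(f">{header}\n{seq}\n")


def prodigal_command(fasta_path, out_prefix):
    """Build the Prodigal call: -c forces closed ends, -g 1 picks table 1."""
    return [
        "prodigal", "-i", fasta_path, "-c", "-g", "1",
        "-f", "gff", "-o", out_prefix + ".gff",
        "-a", out_prefix + ".faa", "-d", out_prefix + ".fna",
    ]
```

To use it, write each bin out with write_fasta and pass the list from prodigal_command to subprocess.run, one call per bin. If the GC distribution is tight, everything ends up in one or two bins, which matches the single-file case.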

You could also consider the new tool GeneMarkS-T ( download here: http://topaz.gatech.edu/GeneMark/license_download.cgi ). In this paper, they compared Prodigal to other tools, and it did ok, but not amazing (not really a fair comparison imo, since Prodigal was never designed for this purpose). Prodigal would struggle to recognize the Kozak sequence, since it's got a lot of microbial-specific rules.

Honestly, I could probably write a simple Python script in a day that would do this better than existing tools: just a basic coding/noncoding Bayes classifier plus Kozak sequence / upstream sequence analysis. I'm not sure how much interest there is in this sort of thing, since I don't do much with eukaryotic gene prediction.
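
To make the idea concrete, here is a minimal sketch of what such a script might look like. This is one possible reading of the description, not the script the author had in mind. It assumes you supply your own sets of known coding and noncoding sequences for training, it scans the forward strand only, it scores codons with a naive Bayes log-odds table, and it adds a crude Kozak bonus for a purine at -3 and a G at +4. All names, the pseudocount, the Kozak weight and the minimum length are hypothetical.

```python
import math
from collections import Counter
from itertools import product

STOPS = {"TAA", "TAG", "TGA"}
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]


def codons(seq):
    """In-frame codons of seq from position 0, skipping any with non-ACGT bases."""
    seq = seq.upper()
    out = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    return [c for c in out if set(c) <= set("ACGT")]


def train_log_odds(coding, noncoding, pseudocount=1.0):
    """Per-codon log ratio of P(codon | coding) to P(codon | noncoding)."""
    def freqs(seqs):
        counts = Counter(c for s in seqs for c in codons(s))
        total = sum(counts.values()) + pseudocount * len(CODONS)
        return {c: (counts[c] + pseudocount) / total for c in CODONS}
    fc, fn = freqs(coding), freqs(noncoding)
    return {c: math.log(fc[c] / fn[c]) for c in CODONS}


def kozak_score(seq, atg):
    """Count matches at the two strongest Kozak positions (0, 1 or 2)."""
    score = 0
    if atg >= 3 and seq[atg - 3] in "AG":  # purine at -3
        score += 1
    if atg + 3 < len(seq) and seq[atg + 3] == "G":  # G at +4
        score += 1
    return score


def find_orfs(seq, min_codons=30):
    """Yield (start, end) for every ATG that reaches an in-frame stop.

    Candidates with no stop codon are dropped, mirroring closed ends.
    """
    seq = seq.upper()
    for frame in range(3):
        starts = []
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon == "ATG":
                starts.append(i)
            elif codon in STOPS:
                for s in starts:
                    if (i - s) // 3 >= min_codons:
                        yield s, i + 3
                starts = []


def best_cds(seq, log_odds, kozak_weight=1.0, min_codons=30):
    """Return (score, start, end) of the top-scoring candidate, or None."""
    seq = seq.upper()
    best = None
    for start, end in find_orfs(seq, min_codons):
        coding = sum(log_odds[c] for c in codons(seq[start:end - 3]))
        score = coding + kozak_weight * kozak_score(seq, start)
        if best is None or score > best[0]:
            best = (score, start, end)
    return best
```

Train once with train_log_odds, then call best_cds on each transcript. For unstranded assemblies you would also run it on the reverse complement, and this sketch makes no attempt to handle frame shifts.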

The hardest part is dealing with frame shifts in transcripts/transcript assemblies, but I don't know of any tools that do a good job there. Having worked with millions of poplar transcripts, I can say they definitely happen a fair bit: you get a hit to a known protein in two different reading frames, etc.

I've used TransDecoder quite a bit and it's still my favorite, but yes, it's very slow.

regards,

doug
