overlap correction


Intikhab

Mar 22, 2012, 10:02:31 PM
to prodigal-discuss
Hi There,

I am quite interested in using Prodigal for prokaryotic gene
prediction, but at times I get overlapping predictions, e.g. the 5' end
of a gene falls within the 3' end of the previous gene.

Is there a way to control this and avoid overlapping predictions, e.g.
by shifting the start position of the next gene where it overlaps the
3' end of the previous gene?

Furthermore, is it possible to have the output state explicitly whether
a predicted gene lacks its 5' or 3' end?

Many Thanks,

Intikhab

Torsten Seemann

Mar 24, 2012, 7:31:09 PM
to prodigal...@googlegroups.com, intikhab....@gmail.com
> I am quite interested in using Prodigal for prokaryotic gene
> predictions but at time I get overlapping prediction results e.g. 5'
> of a gene falls within 3' of previous gene.

Bacteria contain lots of these overlapping genes. Are you sure you want to ignore them?
They are valid, e.g. in operons.

> Is there a way to control it and avoid overlapping predictions e.g.
> shifting start position of the next gene where it overlaps with
> previous gene's 3'?

AFAIK, Prodigal does not do this, because you will miss real genes, as overlapping genes are real. You could write a Bio{Perl/Python} script to do this.
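
A post-processing script along those lines could start from something like the following plain-Python sketch. It is only an illustration: the (start, end, strand) tuples, the function names and the max_overlap threshold are assumptions made for the example and are not part of Prodigal or of BioPerl/Biopython. It reports neighbouring genes that share more than a chosen number of bases and does not modify any prediction.

```python
from typing import List, Tuple

# Hypothetical gene record: (start, end, strand), 1-based inclusive coordinates.
Gene = Tuple[int, int, str]


def overlap_length(a: Gene, b: Gene) -> int:
    """Number of bases shared by two genes (0 if they do not overlap)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def large_overlaps(genes: List[Gene], max_overlap: int) -> List[Tuple[Gene, Gene, int]]:
    """Return neighbouring gene pairs that share more than max_overlap bases."""
    ordered = sorted(genes)  # sort by start coordinate
    flagged = []
    for prev, cur in zip(ordered, ordered[1:]):
        shared = overlap_length(prev, cur)
        if shared > max_overlap:
            flagged.append((prev, cur, shared))
    return flagged


# Made-up coordinates, for illustration only.
genes = [(1, 300, "+"), (297, 900, "+"), (850, 1500, "-"), (1600, 2000, "+")]
for prev, cur, shared in large_overlaps(genes, max_overlap=10):
    print(prev, cur, shared)
```

Only adjacent genes (after sorting by start) are compared, so a long gene that spans several later genes would need a fuller comparison. What to do with a flagged pair (trim a start, drop a gene, or leave it alone) is a separate decision that the sketch deliberately leaves open.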
 
> Furthermore, is it possible to have explicitly whether the predicted
> gene lack 5' or 3' end etc?

The README talks about the /note field in the .gbk output:

The "partial=01", etc., field is used to indicate if genes continue off the 
edges of the contig.  A '0' indicates that the gene is contained within the
contig, and a '1' indicates the gene runs off that edge.  So '11' runs off both
edges of the contig, '10' runs off the left edge, '01' runs off the right edge,
and '00' is fully contained within the contig
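
To relate this field to the 5'/3' question, here is a small Python sketch that reads a "partial=XX" value and translates the left/right edge flags into missing 5'/3' ends. The function names and the sample note string are made up for the example. The strand handling rests on the usual convention that a reverse-strand gene has its 5' end at the right-hand edge of the contig.

```python
import re


def parse_partial(note):
    """Return (runs_off_left, runs_off_right), or None if no partial=XX field is found."""
    match = re.search(r"partial=([01])([01])", note)
    if match is None:
        return None
    return match.group(1) == "1", match.group(2) == "1"


def missing_ends(note, strand):
    """Translate left/right edge flags into missing 5'/3' ends for the given strand."""
    flags = parse_partial(note)
    if flags is None:
        return None
    left, right = flags
    # On the reverse strand the gene's 5' end lies at the right edge of the contig.
    five_prime, three_prime = (left, right) if strand == "+" else (right, left)
    return {"missing_5prime": five_prime, "missing_3prime": three_prime}


note = "ID=1_1;partial=10;start_type=Edge"  # illustrative note string
print(missing_ends(note, "+"))
print(missing_ends(note, "-"))
```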


--
--Dr Torsten Seemann
--Scientific Director : Victorian Bioinformatics Consortium, Monash University, AUSTRALIA
--Senior Researcher : VLSCI Life Sciences Computation Centre, Parkville, AUSTRALIA
--http://www.bioinformatics.net.au/



dhyatt1

Mar 26, 2012, 12:11:54 PM
to prodigal-discuss

This is an excellent response.

If you must eliminate all overlaps (not recommended), you can go into
the source code (the files node.h and dprog.h) and set MAX_SAM_OVLP and
MAX_OPP_OVLP to 0. I can't vouch for the program's behavior if you do
this, although I tried it on E. coli and it worked. Again, I agree with
Torsten that this is a bad idea.

In addition to 11, 10, or 01, Prodigal uses the normal GenBank
convention of < and > for genes that run off the edge in its GenBank
output, so for example <2..399 means the gene runs off the left edge,
and 50..>1399 means it runs off the right edge.
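
For anyone post-processing that output, here is a rough Python sketch of reading the < and > markers from a location string. It is an illustration only: parse_location is a made-up helper, and it handles just simple ranges, optionally wrapped in complement(...), not the full GenBank location syntax.

```python
import re


def parse_location(text):
    """Parse a simple location such as '<2..399' or 'complement(50..>1399)'."""
    text = text.strip()
    strand = "+"
    if text.startswith("complement(") and text.endswith(")"):
        strand = "-"
        text = text[len("complement("):-1]
    match = re.fullmatch(r"(<?)(\d+)\.\.(>?)(\d+)", text)
    if match is None:
        raise ValueError(f"unsupported location: {text!r}")
    left_open, start, right_open, end = match.groups()
    return {
        "start": int(start),
        "end": int(end),
        "strand": strand,
        "runs_off_left": left_open == "<",    # '<' before the first coordinate
        "runs_off_right": right_open == ">",  # '>' before the second coordinate
    }


print(parse_location("<2..399"))
print(parse_location("50..>1399"))
```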

regards,
doug


Dr. Intikhab Alam

Mar 26, 2012, 1:21:39 PM
to prodigal...@googlegroups.com
Dear Doug,

Thanks for your email and update. I do not want to discard overlaps entirely, but I would like to control them, e.g. allow small overlaps but not very large ones.

I also asked about the frame information in the GFF output I generate. It always shows as 0.

Another question was about frameshifts: does Prodigal assume any frameshifts when it produces gene predictions?

Many thanks for your help.

Intikhab