If this isn’t a bug please explain what is going on.
I am using prodigal 2.60 for CDS predictions on some Neisseria meningitidis isolates. While doing the genome comparisons I came across one particular gene which should be identical in all of the isolates, but there appeared to be two different versions in the group being studied. I eventually traced this back to the start codon chosen by prodigal. In one case an rbs_motif is detected, which places the start codon further into the ORF (I’m using the term ORF to mean stop to stop). In the other case no rbs_motif is detected and the start codon is the first Met in the ORF, which results in a larger protein. The confusing thing is that the sequences of the ORFs are identical. So why is prodigal picking up the rbs_motif in one case and not the other?
I’ve run a couple of these assemblies through multiple iterations of prodigal (20 repetitions each) and the results are identical for each isolate. So the behaviour is reproducible.
I’ve attached a bit of a summary if you can make sense out of it. I did find one commonality between the two groups, but it would seem unrelated. In the group with the larger CDS prediction, the assembly breaks the contig a few hundred bases upstream of the start of the ORF. In the other group the ORF is contiguous with the upstream region. I’ve been able to order the contigs in this region because the genes in question are part of the capsule synthesis pathway and the organization is highly conserved over 3 or 4 operons.
One final note. When I run prodigal with the -n switch, the CDS predictions are the same between the two groups, with the same rbs_motif being identified in all (but it appears to be a different motif from the one found when the -n switch is not applied).
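To make the comparison between the default run and the -n run concrete, here is a minimal Python sketch that reads the GFF output of both runs and lists the genes whose start coordinate differs. It assumes both files were written with -f gff from the same assembly and that the attribute column carries an rbs_motif field. The function names and the sample records are hypothetical, made up for illustration, and are not taken from my data.

```python
def parse_prodigal_gff(text):
    """Map (contig, stop coordinate, strand) -> (start coordinate, rbs_motif)."""
    genes = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 9 or cols[2] != "CDS":
            continue
        contig, left, right, strand = cols[0], int(cols[3]), int(cols[4]), cols[6]
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        # The stop codon is fixed for a given ORF, so key on it;
        # the start is what the two runs may disagree on.
        stop, start = (right, left) if strand == "+" else (left, right)
        genes[(contig, stop, strand)] = (start, attrs.get("rbs_motif", "None"))
    return genes


def differing_starts(gff_a, gff_b):
    """Genes present in both runs whose predicted start differs."""
    a, b = parse_prodigal_gff(gff_a), parse_prodigal_gff(gff_b)
    return {k: (a[k], b[k]) for k in a.keys() & b.keys() if a[k][0] != b[k][0]}


# Made-up records for one gene: default run versus a run with -n.
default_run = "\t".join([
    "contig_1", "Prodigal_v2.60", "CDS", "101", "400", "10.0", "+", "0",
    "ID=1_1;partial=00;start_type=ATG;rbs_motif=None;rbs_spacer=None;",
])
bypass_run = "\t".join([
    "contig_1", "Prodigal_v2.60", "CDS", "161", "400", "12.0", "+", "0",
    "ID=1_1;partial=00;start_type=ATG;rbs_motif=AGGA;rbs_spacer=5-10bp;",
])

for key, (with_trainer, without_trainer) in differing_starts(default_run, bypass_run).items():
    print(key, with_trainer, without_trainer)
```

Keying on the stop coordinate means a gene is matched between the two runs even when its start moves, which is the situation described above.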
If you agree this is odd behaviour, I would be happy to share the input data if you need it to investigate and correct.
Shaun
The ORFs are identical on the nucleotide level. There is the odd SNP amongst the entire group but nothing within the region that would be considered for rbs predictions. And no consistency between the two groups that I can see.
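For anyone who wants to check the same thing on their own assemblies, this is roughly how the upstream comparison can be done: a minimal Python sketch, assuming each contig is available as a plain string and the ORF boundaries are known as 0-based offsets. The helper names, the window width and the toy sequences are all hypothetical. This is not how prodigal itself scores RBS motifs; it only checks whether the bases in front of each candidate start are the same in two assemblies, and it shows that a window can come back shorter when the contig begins close to the gene.

```python
START_CODONS = ("ATG", "GTG", "TTG")


def candidate_starts(seq, orf_begin, orf_end):
    """0-based offsets of in-frame start codons inside seq[orf_begin:orf_end]."""
    return [i for i in range(orf_begin, orf_end - 2, 3) if seq[i:i + 3] in START_CODONS]


def upstream_window(seq, start, width=20):
    """Bases immediately upstream of a start; shorter if the contig edge is hit."""
    return seq[max(0, start - width):start]


def same_upstream(seq_a, start_a, seq_b, start_b, width=20):
    """True when both assemblies have the identical upstream window."""
    return upstream_window(seq_a, start_a, width) == upstream_window(seq_b, start_b, width)


# Toy sequences: the same ORF and near-upstream region, with and without extra flank.
flank = "GATTACAGATTACA"
shared_upstream = "CCAGGAGGTTAACC"
orf = "ATGAAAGTGCCCTAA"

contiguous = flank + shared_upstream + orf  # gene sits inside a longer contig
broken = shared_upstream + orf              # contig begins shortly before the gene

print(candidate_starts(contiguous, 28, 43))         # candidate starts in the contiguous contig
print(candidate_starts(broken, 14, 29))             # same ORF in the shorter contig
print(same_upstream(contiguous, 28, broken, 14, 14))  # window fits in both contigs
print(same_upstream(contiguous, 28, broken, 14, 20))  # window is truncated in the shorter contig
```

In my case the break is a few hundred bases upstream, so a window of this size would not be truncated in either group.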
I didn’t mention it, but I also tried a training set generated from one strain and that didn’t change anything. I should also mention that the collection of isolates I’m looking at is clonal. They are not just the same genus or species but virtually identical. Well, pretty close. These bug(ers) do a lot of recombination, which is proving to be a big pain. But the majority of the gene content, genome organisation, etc. is consistent with comparing virtually identical isolates.
So the only thing I can see that is consistent is that the one group contains contig breaks so the CDS predictions are at the start of the contigs (I’m not allowing for partials) and the others have a contiguous sequence through this region.
Shaun
Pretty good guess ;-) Yes, there is an IS element just upstream which causes the break. It is actually present in the GenBank sequence I've been using as a reference. Some of my strains have it and some don't. Obviously the ones that are contiguous are lacking the IS. However, that still doesn’t seem to explain the difference in start predictions and why prodigal finds an rbs motif in one case and not the other. Again, the sequences of the ORFs are identical, and when I bypass the Shine-Dalgarno trainer (-n) the motif is found in both versions.