Re: [maker-devel] Maker annotation result contain 10% of gene with incorrect start or stop codon

Jacques Dainat

Mar 25, 2021, 5:04:13 PM
to 廖家緯, maker...@yandell-lab.org
Hi,

I ran into this problem in some projects where the ORFs were not well defined. In those mRNAs, the chosen ORF was not the longest one. That is not necessarily wrong, but in those cases it was obviously the wrong choice, probably because my ab initio tools were poorly trained.
I ended up developing a script that fixes the predictions by using the longest ORF as the CDS. The script is called agat_sp_fix_longest_ORF.pl.
It is available within AGAT (https://github.com/NBISweden/AGAT)
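To make the "longest ORF" idea concrete, here is a minimal Python sketch of how one might locate the longest ATG-to-stop ORF on the forward strand of a transcript. It is an illustration only and is not the code used by agat_sp_fix_longest_ORF.pl. The function name `longest_orf` is hypothetical, and it assumes the standard genetic code, ATG as the only start codon, and forward-strand sequence only.

```python
# Hypothetical sketch, NOT AGAT's implementation: forward strand only,
# standard genetic code, ATG as the only start codon.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> tuple[int, int]:
    """Return 0-based, half-open (start, end) of the longest ORF that
    begins with ATG and ends with (and includes) an in-frame stop codon.
    Returns (-1, -1) if no complete ORF exists."""
    seq = seq.upper()
    best = (-1, -1)
    for frame in range(3):
        start = None  # first ATG of the current open stretch in this frame
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                end = i + 3  # include the stop codon
                if end - start > best[1] - best[0]:
                    best = (start, end)
                start = None  # look for the next ATG after this stop
    return best


if __name__ == "__main__":
    tx = "CCATGAAATAGGATGAAACCCGGGTTTTAA"
    print(longest_orf(tx))  # (12, 30)
```

In this toy transcript there is a short ORF at (2, 11) in frame 2 and a longer one at (12, 30) in frame 0. A predictor that picked the short one would be the kind of case described above, and switching to the longest ORF picks the second.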

Hoping it could help,

Best regards,

Jacques Dainat, Ph.D.


On 8 Mar 2021, at 11:25, 廖家緯 <jwli...@gmail.com> wrote:

Hi maker-devel group,

I used MAKER with SNAP and Augustus to annotate a green alga genome. I have set 'always_complete=1' since the first round of annotation.

After Augustus training, I still get around 1111 and 208 genes that lack a correct start codon and a correct stop codon, respectively (out of 12696 annotated genes in total).
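Counts like these can be reproduced with a simple check on the CDS sequences. The sketch below is a minimal illustration, assuming the CDS nucleotide sequences have already been extracted from the MAKER GFF3 into a dictionary keyed by gene ID (that extraction step is not shown), and assuming the standard nuclear genetic code with ATG as the only canonical start. The function name `count_noncanonical` is hypothetical.

```python
# Hypothetical sketch: input is a dict of gene ID -> CDS sequence,
# assumed to be extracted beforehand; standard genetic code assumed.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def count_noncanonical(cds_by_gene: dict[str, str]) -> tuple[int, int]:
    """Return (no_start, no_stop): the number of CDSs that do not begin
    with ATG and the number that do not end in a canonical stop codon.
    A gene missing both is counted in both totals."""
    no_start = 0
    no_stop = 0
    for seq in cds_by_gene.values():
        seq = seq.upper()
        if not seq.startswith("ATG"):
            no_start += 1
        if seq[-3:] not in STOP_CODONS:
            no_stop += 1
    return no_start, no_stop


if __name__ == "__main__":
    cds = {
        "g1": "ATGAAATAA",  # complete
        "g2": "GCGAAATAA",  # missing start
        "g3": "ATGAAAGCC",  # missing stop
    }
    print(count_noncanonical(cds))  # (1, 1)
```

Because a gene can lack both codons, the two totals should not simply be added to get the number of affected genes.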

I provided the proteome of a closely related species, and the alga's own RNA-seq data as EST evidence.

Does that make sense? Is there any way to improve the result, or to fix the incorrect start and stop codons in those genes?

best,
Chia-Wei Liao

--------------------------------------------------

Chia-Wei Liao 廖家緯

Research Assistant

Institute of Molecular Biology,

Academia Sinica, Taipei City, Taiwan

Phone: 886-2-2789-9216 (Lab)

_______________________________________________
maker-devel mailing list
maker...@yandell-lab.org
http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org

Carson Holt

Mar 26, 2021, 12:00:17 PM
to Jacques Dainat, 廖家緯, maker...@yandell-lab.org
Thanks for this. Here is some background on how MAKER sets the start and stop codons.

MAKER keeps the reading frame and codons chosen by the gene predictor, so if a longer ORF exists in another reading frame, MAKER will not switch to it (the gene predictor is effectively saying that the other reading frame is less probable).

MAKER can extend the ORF in the same reading frame, but only if the initial prediction is partial, mRNA alignments suggest an extension, and the extended ORF reaches a canonical start codon in the same frame as the prediction. MAKER can also truncate the ORF if the gene prediction is partial to begin with and there is mRNA evidence of UTR (this can indicate an assembly error in the gene that artificially splits the ORF, or a false merge of neighboring genes caused by a bad mRNA-seq assembly).

The always_complete option adds one extra step that is not necessarily biologically correct, but does help make gene models more canonical. When it is set, MAKER walks off the edges of the CDS in both directions, without requiring mRNA evidence, and extends to the first canonical start or stop codon it encounters. This is beneficial when using protein2genome alignments for homology-based annotation, since those alignments can be fuzzy near their edges.
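The always_complete behavior described above can be sketched roughly as follows. This is a hypothetical Python illustration, not MAKER's code. The function name `extend_to_canonical` is made up, it handles the forward strand only, assumes the standard genetic code, and keeps the predictor's reading frame fixed (so it never switches frames). One further assumption not stated in the message: when walking upstream, an in-frame stop codon ends the search, since an ORF cannot run through a stop.

```python
# Hypothetical sketch of an always_complete-style extension, NOT MAKER's code.
# Forward strand only; standard genetic code; the frame is fixed by cds_start.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def extend_to_canonical(genome: str, cds_start: int, cds_end: int) -> tuple[int, int]:
    """Extend a partial CDS (0-based, half-open, length a multiple of 3)
    in its own reading frame to the first ATG upstream and the first stop
    codon downstream. An edge is left unchanged if nothing is found."""
    genome = genome.upper()
    start, end = cds_start, cds_end

    # Upstream: if the CDS does not begin with ATG, walk back codon by codon.
    # Assumption: an in-frame stop codon halts the walk.
    if genome[start:start + 3] != "ATG":
        i = start - 3
        while i >= 0:
            codon = genome[i:i + 3]
            if codon == "ATG":
                start = i
                break
            if codon in STOP_CODONS:
                break
            i -= 3

    # Downstream: if the CDS does not end in a stop codon, walk forward in
    # the same frame and include the first stop codon encountered.
    if genome[end - 3:end] not in STOP_CODONS:
        j = end
        while j + 3 <= len(genome):
            if genome[j:j + 3] in STOP_CODONS:
                end = j + 3
                break
            j += 3

    return start, end


if __name__ == "__main__":
    g = "TTTATGCCCGGGAAACCCTAGTT"
    print(extend_to_canonical(g, 6, 15))  # (3, 21)
```

Note that this sketch only covers the evidence-free walk; the evidence-based extension and truncation rules in the previous paragraph would need mRNA alignments as additional input.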

Thanks,
Carson