[maker-devel] tbl2asn errors


Mack, Brian

Apr 17, 2014, 4:34:21 PM
to maker...@yandell-lab.org

Hi, I thought I would try asking my question here, as NCBI was not able to give me much assistance. In preparation for submitting to NCBI, I converted my MAKER GFF3 to NCBI tbl format using the gff32tbl script that Carson posted a link to in this thread (http://gmod.827538.n3.nabble.com/NCBI-feature-table-tt4040473.html#a4040475). It seemed to convert fine; however, when I run NCBI's tbl2asn program I get numerous errors in my errorsummary.val file:

 

     4 ERROR:   SEQ_FEAT.BadTrailingCharacter

   217 ERROR:   SEQ_FEAT.NoStop

   438 ERROR:   SEQ_FEAT.ShortIntron

   171 ERROR:   SEQ_FEAT.StartCodon

   171 ERROR:   SEQ_INST.BadProteinStart

   291 WARNING: SEQ_FEAT.NotSpliceConsensusAcceptor

   648 WARNING: SEQ_FEAT.NotSpliceConsensusDonor

   118 WARNING: SEQ_FEAT.ShortExon
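A summary like the one above is easier to triage once it is sorted by severity and count. The following is a minimal sketch that assumes every summary line has the shape shown here (a count, a severity followed by a colon, then the message code); the function names are made up for illustration and are not part of tbl2asn.

```python
import re

# Matches lines such as "   217 ERROR:   SEQ_FEAT.NoStop".
# Assumption: the summary file contains only lines of this shape.
SUMMARY_LINE = re.compile(r"^\s*(\d+)\s+([A-Z]+):\s+(\S+)\s*$")


def tally_summary(text):
    """Return {severity: {code: count}} for every line that matches."""
    totals = {}
    for line in text.splitlines():
        match = SUMMARY_LINE.match(line)
        if match:
            count, severity, code = match.groups()
            totals.setdefault(severity, {})[code] = int(count)
    return totals


def ranked(totals, severity):
    """List (code, count) pairs for one severity, largest count first."""
    return sorted(totals.get(severity, {}).items(), key=lambda kv: -kv[1])


# Hypothetical usage, with the file name taken from the post above:
# totals = tally_summary(open("errorsummary.val").read())
# for code, count in ranked(totals, "ERROR"):
#     print(count, code)
```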

 

In addition, all of the gene, CDS, and mRNA coordinates in the resulting sqn files are decreased by one. For example, my tbl file will have gene coordinates of 440869 – 441931, but the sqn file will have 440868 – 441930. Any ideas what might be causing this?

 

Thanks,

Brian






Carson Holt

Apr 17, 2014, 4:59:05 PM
to Mack, Brian, maker...@yandell-lab.org

The only one that may be a real error is the first one (I'm not sure what it means). You probably need to find the affected features and open them in a viewer like Apollo. The rest I would consider warnings (the NCBI tool doesn't like any weirdness or uncertainty). You often have to manually edit things to get NCBI to accept all models without complaining (sometimes even going against real biology). I know some groups use the always_complete=1 option in MAKER to force start and stop codons into every model, for example (even though those forced codons are probably false).


*Not sure about this one -->     4 ERROR:   SEQ_FEAT.BadTrailingCharacter

*These are partial genes with no stop (usually happen at the edge of contigs or near strings of NNNN) -->    217 ERROR:   SEQ_FEAT.NoStop

*These are just short introns (intron size is under control of the ab initio predictors) -->   438 ERROR:   SEQ_FEAT.ShortIntron

*These are partial genes with no start (usually happen at the edge of contigs or near strings of NNNN) --> 171 ERROR:   SEQ_FEAT.StartCodon

*These are partial genes with no start (usually happen at the edge of contigs or near strings of NNNN) -->    171 ERROR:   SEQ_INST.BadProteinStart

*Non-canonical splicing (can be produced by the ab initio predictor or suggested by EST evidence) -->  291 WARNING: SEQ_FEAT.NotSpliceConsensusAcceptor

*Non-canonical splicing (can be produced by the ab initio predictor or suggested by EST evidence) -->  648 WARNING: SEQ_FEAT.NotSpliceConsensusDonor

*These are just short exons (exon size is under control of the ab initio predictors) -->   118 WARNING: SEQ_FEAT.ShortExon
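One way to find the models behind the ShortIntron messages is to compute intron lengths from the exon coordinates in the GFF3. The sketch below assumes standard nine-column GFF3 with exon features linked to their mRNA through a Parent attribute. The 10 bp cutoff is only an assumed default and should be adjusted to whatever the validator actually reports; the function name is made up for illustration.

```python
from collections import defaultdict


def short_introns(gff3_lines, min_len=10):
    """Return (seqid, parent, intron_start, intron_end, length) tuples for
    introns shorter than min_len, using 1-based inclusive GFF3 coordinates."""
    exons = defaultdict(list)
    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue
        start, end = int(cols[3]), int(cols[4])
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        # An exon may list several parents separated by commas.
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                exons[(cols[0], parent)].append((start, end))

    hits = []
    for (seqid, parent), spans in exons.items():
        spans.sort()
        # Each intron is the gap between two consecutive exons of one mRNA.
        for (_, prev_end), (next_start, _) in zip(spans, spans[1:]):
            length = next_start - prev_end - 1
            if 0 < length < min_len:
                hits.append((seqid, parent, prev_end + 1, next_start - 1, length))
    return hits
```

The same grouping of exons by Parent could be reused to flag short exons by checking end - start + 1 instead of the gap between exons.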


You probably need to identify examples of models causing each issue, and then look at them in Apollo. Apollo lets you open tbl format and save back to it. I imagine the coordinate change is from NCBI using a 0-based coordinate system as opposed to a 1-based system (i.e. the first base is 0 rather than 1). Unfortunately, getting everything to go into NCBI is usually a grueling task.
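If the 0-based explanation is right, the shift in the sqn file is a change of numbering rather than a change of position. A minimal sketch of the arithmetic, using the gene coordinates from the original question (the function name is made up for illustration):

```python
def one_based_to_zero_based(start, end):
    """Shift a 1-based inclusive interval to 0-based inclusive numbering.

    Both ends move down by one, so the interval still covers the same bases.
    """
    return start - 1, end - 1


tbl_start, tbl_end = 440869, 441931
sqn_start, sqn_end = one_based_to_zero_based(tbl_start, tbl_end)
print(sqn_start, sqn_end)  # 440868 441930

# The feature length is identical under both numbering schemes.
assert tbl_end - tbl_start + 1 == sqn_end - sqn_start + 1
```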

--Carson



Geib, Scott

Apr 17, 2014, 4:59:22 PM
to Mack, Brian, maker...@yandell-lab.org, Brian Hall (bhall7@hawaii.edu)

Hi Brian,

We have a tool in development to deal with this. You should not upload your MAKER output directly to NCBI; you need to filter out genes, check that things are sane, etc.

http://brianreallymany.github.io/GAG/

It is still in active development; the first full release is planned for the end of this month (if you can wait 1.5 weeks). It has no dependencies and maintains parent/child relationships (for example, if you remove a gene, it will also remove the associated CDS/mRNA). In the release planned for the end of the month, you will be able to perform functions like removing short features, long features, flagging things for review, etc. It also generates an updated genome.fasta file, GFF3 file, and sequence files for CDS/mRNA/peptide based on the edits made. Hopefully this is helpful to you.
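To illustrate what maintaining parent/child relationships means in practice, here is a minimal sketch of removing a gene together with everything that descends from it in a GFF3 file. This is not GAG's code or interface; the function name is hypothetical, and it assumes features are linked through the standard ID and Parent attributes.

```python
def remove_with_children(gff3_lines, doomed_id):
    """Drop the feature with ID doomed_id plus all of its descendants.

    Comment lines and anything that is not a nine-column feature are kept.
    """
    records = []
    for line in gff3_lines:
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(cols) != 9:
            records.append((line, None, []))
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        parents = [p for p in attrs.get("Parent", "").split(",") if p]
        records.append((line, attrs.get("ID"), parents))

    # Grow the set of doomed IDs until no more children are found, so the
    # result does not depend on the order of lines in the file.
    doomed = {doomed_id}
    changed = True
    while changed:
        changed = False
        for _, feature_id, parents in records:
            if feature_id and feature_id not in doomed and any(p in doomed for p in parents):
                doomed.add(feature_id)
                changed = True

    # Features without an ID (often exons) are dropped through their Parent.
    return [
        line
        for line, feature_id, parents in records
        if feature_id not in doomed and not any(p in doomed for p in parents)
    ]
```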


Scott

Carson Holt

Apr 17, 2014, 5:27:53 PM
to Geib, Scott, Mack, Brian, maker...@yandell-lab.org, Brian Hall (bhall7@hawaii.edu)
Very cool.  I'll try it out as well.

--Carson

Geib, Scott

Apr 17, 2014, 6:37:49 PM
to Carson Holt, Mack, Brian, maker...@yandell-lab.org, Brian Hall (bhall7@hawaii.edu)

Just so you are not discouraged: the current version has limited functionality and is pretty much undocumented (although it will write a .tbl file). I will email the list when the first real release is complete and documented.

Scott

Shaun Jackman

Oct 6, 2014, 7:29:35 PM
to Geib, Scott, maker...@yandell-lab.org, Brian Hall (bhall7@hawaii.edu)
Hi, Scott, Carson. What's currently the best/easiest way to convert a MAKER GFF to GenBank TBL format, and what's the state of your GAG tool, Scott?

Cheers,
Shaun

Barry Moore

Oct 6, 2014, 11:53:35 PM
to Geib, Scott, maker...@yandell-lab.org, Brian Hall (bhall7@hawaii.edu)
Hi Scott, 

Just FYI, github is giving me a 404 error on the link below.  Were others able to follow the link successfully? 

B

Barry Moore
-------------------------------------------------
Director, Research & Science
USTAR Center for Genetic Discovery
Dept. of Human Genetics
University of Utah
Salt Lake City, UT

Brian Hall

Oct 7, 2014, 12:03:03 AM
to Barry Moore, Geib, Scott, maker...@yandell-lab.org
Hi Barry,

Try this one:

http://genomeannotation.github.io/GAG/

Sorry about that!

--Brian Hall

Barry Moore

Oct 7, 2014, 12:06:49 AM
to Brian Hall, maker...@yandell-lab.org, Geib, Scott, Barry Moore
Cool, thanks Brian,

B


Geib, Scott

Oct 8, 2014, 12:34:11 PM
to Shaun Jackman, maker...@yandell-lab.org, Brian Hall (bhall7@hawaii.edu)

Hi,
I know Carson previously posted a script that generates a tbl file. If you want to do more filtering, GAG should work. If you come across any issues, please post a bug on the GitHub page.

 

http://genomeannotation.github.io

 

Also, NCBI is a bit of a moving target regarding the format they currently accept. You should be able to supply a scaffold assembly, but they will have limits on how short your CDS can be, will question single-exon models, etc. Hopefully GAG can help you get to where they are happy.

 

If they want a contig + AGP file, you will also need to split your GFF file (we can do this, but I am not sure it is posted on the GitHub page).
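Splitting a GFF along an AGP file comes down to translating scaffold coordinates into contig coordinates. The sketch below is not the script mentioned above; it is a minimal illustration that assumes tab-separated AGP rows in the standard column order (object, object_beg, object_end, part_number, component_type, component_id, component_beg, component_end, orientation) and maps one position at a time. Features that span a gap or more than one contig would need extra handling that is not shown. The function names are made up for illustration.

```python
def load_agp(lines):
    """Collect sequence components per scaffold, skipping gap rows."""
    parts = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[4] in ("N", "U"):
            continue  # N and U rows describe gaps, not contigs
        obj, obj_beg, obj_end = cols[0], int(cols[1]), int(cols[2])
        comp_id, comp_beg, comp_end = cols[5], int(cols[6]), int(cols[7])
        parts.setdefault(obj, []).append(
            (obj_beg, obj_end, comp_id, comp_beg, comp_end, cols[8])
        )
    return parts


def to_contig(parts, scaffold, pos):
    """Map a 1-based scaffold position to (contig, position, strand).

    Returns None when the position falls in a gap or on an unknown scaffold.
    Assumption: any orientation other than "-" is treated as "+".
    """
    for obj_beg, obj_end, comp_id, comp_beg, comp_end, orient in parts.get(scaffold, []):
        if obj_beg <= pos <= obj_end:
            offset = pos - obj_beg
            if orient == "-":
                # Reverse-oriented contigs are counted back from their end.
                return comp_id, comp_end - offset, "-"
            return comp_id, comp_beg + offset, "+"
    return None
```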

 

Scott
