[maker-devel] est_gff input does not provide any gene model


Jacques Dainat

Oct 31, 2016, 1:24:56 PM
to maker...@yandell-lab.org
Hello,

I usually feed Cufflinks output to MAKER through the est_gff parameter; combined with est2genome=1, this gives me the expected output.
This time I fed MAKER StringTie output instead, but no gene models are predicted with the est2genome parameter.

Is there an explanation? Is it due to the GFF3 format differences between these two files?

Cufflinks output example:
Pnalgiovense_4592      Cufflinks       match   363     977     17.844829       -       .       ID=1:s3_c1_r1.4.2;Name=1:s3_c1_r1.4.2;
Pnalgiovense_4592      Cufflinks       match_part      363     666     17.844829       -       .       ID=1:s3_c1_r1.4.2:exon-1;Name=1:s3_c1_r1.4.2;Parent=1:s3_c1_r1.4.2;Target=1:s3_c1_r1.4.2 1 304 +;
Pnalgiovense_4592      Cufflinks       match_part      743     977     17.844829       -       .       ID=1:s3_c1_r1.4.2:exon-2;Name=1:s3_c1_r1.4.2;Parent=1:s3_c1_r1.4.2;Target=1:s3_c1_r1.4.2 305 539 +;

Stringtie output example:
Pnalgiovense_112      StringTie       gene    20      1256    1000    +       .       ID=HtMm_All.12253;cov=8.028295;fPKM=1.214491;gene_id=HtMm_All.12253;tPM=2.706611;transcript_id=HtMm_All.12253.1
Pnalgiovense_112      StringTie       mRNA    20      1256    1000    +       .       ID=HtMm_All.12253.1;Parent=HtMm_All.12253;cov=8.028295;fPKM=1.214491;gene_id=HtMm_All.12253;tPM=2.706611;transcript_id=HtMm_All.12253.1
Pnalgiovense_112      StringTie       exon    20      1256    1000    +       .       ID=HtMm_All.12253.1-exon-1;Parent=HtMm_All.12253.1;cov=8.028295;exon_number=1;gene_id=HtMm_All.12253;transcript_id=HtMm_All.12253.1


If the StringTie output is the problem, how can I fix it? Is it enough to remove the gene features, change mRNA to match, and change exon to match_part?

Best regards,


Jacques Dainat, PhD
NBIS (National Bioinformatics Infrastructure Sweden)
Genome Annotation Service

Address: (room E10:4204 - last floor)
Uppsala University, BMC
Department of Medical Biochemistry Microbiology, Genomics
Husargatan 3, box 582
S-75123 Uppsala Sweden
Phone: 01 84 71 46 25

Carson Holt

Oct 31, 2016, 11:31:26 PM
to Jacques Dainat, maker...@yandell-lab.org
Evidence such as est_gff has to follow the alignment format used by GFF3 (i.e. match/match_part), whereas you are providing gene models (i.e. gene/mRNA/exon/CDS). Note that match/match_part are two-level features, whereas gene models have three levels. You need to reformat to match/match_part.

—Carson
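
The reformatting described above can be sketched in Python. This is only a minimal sketch, not a MAKER utility: the function name `stringtie_to_match` is hypothetical, and it assumes tab-separated GFF3, a single Parent per exon, and Target coordinates numbered cumulatively in genomic order, modeled on the Cufflinks example earlier in the thread. gene lines are dropped, each mRNA (or transcript) becomes a match, and each exon becomes a match_part.

```python
import sys


def parse_attributes(column9):
    """Split a GFF3 attribute column (key=value;key=value) into a dict."""
    attrs = {}
    for field in column9.strip().strip(";").split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def stringtie_to_match(lines):
    """Convert gene/mRNA/exon records into two-level match/match_part records.

    Hypothetical helper: gene lines are dropped, mRNA/transcript -> match,
    exon -> match_part. Assumes each exon has exactly one Parent.
    """
    transcripts = {}  # transcript ID -> its nine GFF3 columns
    exons = {}        # transcript ID -> list of exon column lists
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # skip malformed lines
        attrs = parse_attributes(cols[8])
        if cols[2] in ("mRNA", "transcript") and "ID" in attrs:
            transcripts[attrs["ID"]] = cols
        elif cols[2] == "exon" and "Parent" in attrs:
            exons.setdefault(attrs["Parent"], []).append(cols)

    out = []
    for tid, cols in transcripts.items():
        match_attrs = f"ID={tid};Name={tid}"
        out.append("\t".join(cols[:2] + ["match"] + cols[3:8] + [match_attrs]))
        offset = 0  # running position along the transcript for Target
        parts = sorted(exons.get(tid, []), key=lambda c: int(c[3]))
        for n, ex in enumerate(parts, start=1):
            length = int(ex[4]) - int(ex[3]) + 1
            part_attrs = (
                f"ID={tid}:exon-{n};Name={tid};Parent={tid};"
                f"Target={tid} {offset + 1} {offset + length} +"
            )
            out.append("\t".join(ex[:2] + ["match_part"] + ex[3:8] + [part_attrs]))
            offset += length
    return out


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python this_script.py stringtie.gff3 > est_evidence.gff3
    with open(sys.argv[1]) as handle:
        print("\n".join(stringtie_to_match(handle)))
```

The score column is copied through unchanged; whether the Target attribute is strictly required by MAKER is not stated in this thread, so it is included here only to mirror the working Cufflinks-derived file.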



Jacques Dainat

Nov 1, 2016, 12:09:19 PM
to Carson Holt, maker...@yandell-lab.org
Thank you for the quick confirmation!

Just for clarification, what I provided to MAKER was a valid GFF3 file that does contain gene, mRNA, and exon features but no CDS features.

I haven’t seen any documentation of the GFF3 feature types expected in the est_gff files. I think this should be communicated more clearly (within maker_opts.ctl?).
It would be nice if the pipeline stopped when the provided file contains no usable information, and also when the file doesn’t exist. The warning is easy to miss when launching on a cluster.
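
Until such a check exists in the pipeline itself, a pre-flight check along these lines could be run before submitting the job. This is a hedged sketch: the function names `count_alignment_features` and `check_est_gff` are hypothetical, and it only looks for the match and match_part types named in this thread, assuming tab-separated GFF3.

```python
import os


def count_alignment_features(lines):
    """Count match and match_part records in an iterable of GFF3 lines."""
    counts = {"match": 0, "match_part": 0}
    for line in lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9 and cols[2] in counts:
            counts[cols[2]] += 1
    return counts


def check_est_gff(path):
    """Fail loudly if the file is missing or has no alignment features."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f"est_gff file not found: {path}")
    with open(path) as handle:
        counts = count_alignment_features(handle)
    if counts["match"] == 0 or counts["match_part"] == 0:
        raise ValueError(
            f"{path} has no match/match_part features "
            f"(found {counts}); it may contain gene models instead"
        )
    return counts
```

Calling `check_est_gff` on the path given to est_gff at the top of a submission script turns a silent empty result into an immediate error.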

One last question: does MAKER use the values in the score column of the est_gff file?

Jacques 

Carson Holt

Nov 1, 2016, 5:19:58 PM
to Jacques Dainat, maker...@yandell-lab.org
The score will be ignored. The format to be used for evidence alignments is specified in the GFF3 spec (https://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md). An EST alignment example is also given as part of the GFF3 Spec.

—Carson