[maker-devel] est_gff input does not provide any gene model


Jacques Dainat

Oct 31, 2016, 1:24:56 PM
to maker...@yandell-lab.org

I usually feed Cufflinks output to MAKER through the est_gff parameter; combined with est2genome=1, this gives me the output I want.
This time I fed MAKER StringTie output instead, but no gene models are predicted with the est2genome parameter.

Is there an explanation? Is it due to the GFF3 format differences between these two files?

Cufflinks output example:
Pnalgiovense_4592      Cufflinks       match   363     977     17.844829       -       .       ID=1:s3_c1_r1.4.2;Name=1:s3_c1_r1.4.2;
Pnalgiovense_4592      Cufflinks       match_part      363     666     17.844829       -       .       ID=1:s3_c1_r1.4.2:exon-1;Name=1:s3_c1_r1.4.2;Parent=1:s3_c1_r1.4.2;Target=1:s3_c1_r1.4.2 1 304 +;
Pnalgiovense_4592      Cufflinks       match_part      743     977     17.844829       -       .       ID=1:s3_c1_r1.4.2:exon-2;Name=1:s3_c1_r1.4.2;Parent=1:s3_c1_r1.4.2;Target=1:s3_c1_r1.4.2 305 539 +;

StringTie output example:
Pnalgiovense_112      StringTie       gene    20      1256    1000    +       .       ID=HtMm_All.12253;cov=8.028295;fPKM=1.214491;gene_id=HtMm_All.12253;tPM=2.706611;transcript_id=HtMm_All.12253.1
Pnalgiovense_112      StringTie       mRNA    20      1256    1000    +       .       ID=HtMm_All.12253.1;Parent=HtMm_All.12253;cov=8.028295;fPKM=1.214491;gene_id=HtMm_All.12253;tPM=2.706611;transcript_id=HtMm_All.12253.1
Pnalgiovense_112      StringTie       exon    20      1256    1000    +       .       ID=HtMm_All.12253.1-exon-1;Parent=HtMm_All.12253.1;cov=8.028295;exon_number=1;gene_id=HtMm_All.12253;transcript_id=HtMm_All.12253.1

If the StringTie output is the problem, how can I fix it? Is it enough to remove the gene features, change mRNA to match, and change exon to match_part?

Best regards,

Jacques Dainat, PhD
NBIS (National Bioinformatics Infrastructure Sweden)
Genome Annotation Service

Address: (room E10:4204 - last floor)
Uppsala University, BMC
Department of Medical Biochemistry Microbiology, Genomics
Husargatan 3, box 582
S-75123 Uppsala Sweden
Phone: 01 84 71 46 25

Carson Holt

Oct 31, 2016, 11:31:26 PM
to Jacques Dainat, maker...@yandell-lab.org
Evidence such as est_gff has to follow the alignment format used by GFF3 (i.e. match/match_part), whereas you are providing gene models (i.e. gene/mRNA/exon/CDS). Note that match/match_part is a two-level feature hierarchy, whereas gene models have three levels. You need to reformat to match/match_part.
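
The reformatting described above can be sketched in Python. This is only an illustration under stated assumptions, not a MAKER tool: the function name to_match_format is hypothetical, the input is assumed to be tab-separated GFF3 with an ID on each mRNA and a Parent on each exon (as in the StringTie example in this thread), and the Name and Target attributes simply mirror the Cufflinks example quoted earlier. Whether MAKER needs those two attributes is not confirmed in this thread.

```python
import sys
from collections import defaultdict


def parse_attrs(field):
    """Split a GFF3 attribute column into a dict."""
    attrs = {}
    for item in field.strip().strip(";").split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def to_match_format(lines):
    """Convert gene/mRNA/exon lines to match/match_part lines (hypothetical helper)."""
    transcripts = {}            # transcript ID -> columns of its mRNA line
    exons = defaultdict(list)   # transcript ID -> list of exon columns
    order = []                  # transcript IDs in input order
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        attrs = parse_attrs(cols[8])
        if cols[2] in ("mRNA", "transcript"):
            transcripts[attrs["ID"]] = cols
            order.append(attrs["ID"])
        elif cols[2] == "exon":
            exons[attrs["Parent"]].append(cols)
        # gene lines (and anything else) are dropped: match/match_part has two levels

    out = []
    for tid in order:
        cols = transcripts[tid]
        out.append("\t".join(cols[:2] + ["match"] + cols[3:8]
                             + [f"ID={tid};Name={tid};"]))
        offset = 0
        # Number exons in genomic order, as in the Cufflinks example above.
        for n, ex in enumerate(sorted(exons[tid], key=lambda c: int(c[3])), start=1):
            length = int(ex[4]) - int(ex[3]) + 1
            attr = (f"ID={tid}:exon-{n};Name={tid};Parent={tid};"
                    f"Target={tid} {offset + 1} {offset + length} +;")
            out.append("\t".join(ex[:2] + ["match_part"] + ex[3:8] + [attr]))
            offset += length
    return out


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        print("\n".join(to_match_format(handle)))
```

Run with the StringTie GFF3 file as the only argument, the script prints the converted lines to standard output. The score column is copied through unchanged.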



Jacques Dainat

Nov 1, 2016, 12:09:19 PM
to Carson Holt, maker...@yandell-lab.org
Thank you for the quick confirmation!

Just for clarification, what I provided to MAKER was a valid GFF3 file that does contain gene, mRNA, and exon features but no CDS.

I haven’t seen any documentation of the particular GFF3 feature types expected in the est_gff files supplied. I think this should be communicated more clearly (perhaps within maker_opt.ctl?).
It would be nice if the pipeline stopped when the provided file contains no usable information, and also when the provided file doesn’t exist. The warning is easy to miss when launching on a cluster.
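
Until something like that exists, a pre-flight check run before submitting the job is one workaround. The following is a minimal sketch, not part of MAKER: the function names are hypothetical, and it assumes a tab-separated GFF3 file in which usable evidence means at least one match or match_part feature.

```python
import os
import sys
from collections import Counter

ALIGNMENT_TYPES = {"match", "match_part"}


def count_feature_types(path):
    """Count the values of the type column (column 3) in a GFF3 file."""
    counts = Counter()
    with open(path) as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue
            cols = line.rstrip("\n").split("\t")
            if len(cols) == 9:
                counts[cols[2]] += 1
    return counts


def check_est_gff(path):
    """Return an error message, or None if the file looks usable."""
    if not os.path.isfile(path):
        return f"{path}: file does not exist"
    counts = count_feature_types(path)
    if not any(counts[t] for t in ALIGNMENT_TYPES):
        found = ", ".join(sorted(counts)) or "no features"
        return f"{path}: no match/match_part features (found: {found})"
    return None


if __name__ == "__main__" and len(sys.argv) > 1:
    error = check_est_gff(sys.argv[1])
    if error:
        sys.exit(error)  # prints the message to stderr and exits with status 1
```

Called with the est_gff path as its argument at the top of a cluster job script, it exits with a non-zero status before MAKER starts if the file is missing or holds only gene-model features.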

One last question: does MAKER use the values in the score column of the est_gff file?


Carson Holt

Nov 1, 2016, 5:19:58 PM
to Jacques Dainat, maker...@yandell-lab.org
The score will be ignored. The format to be used for evidence alignments is specified in the GFF3 spec (https://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md). An EST alignment example is also given as part of the GFF3 Spec.
