GFF3 compatibility error


Maria Stager

May 25, 2023, 12:00:08 PM
to pasapipeline-users
Hi Brian,

I apologize for the very basic nature of this question, but hopefully it is quick: I would like to update an existing annotation (Junco hyemalis), so I'm first validating the format of the GFF3 file. I receive the following error when I run the validator tool:

$ ~/bin/PASA/misc_utilities/pasa_gff3_validator.pl /Jhye_annotation.gff3

Fatal Error: cannot parse ID from entry

SclofgA_1__HRSCAF___1 GeMoMa CDS 5161 5233 . - 0 Parent=Jhye_g00001.1 at /bin/PASA/misc_utilities/pasa_gff3_validator.pl line 58, <$fh> line 6.

Is the problem simply that every feature needs an ID attribute? My understanding of the GFF3 format (see https://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md) is that this file is acceptable, but several features (the CDS rows) have only a Parent attribute and no ID of their own, so perhaps it is not.
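For context, the GFF3 specification requires an ID only for features that have children or that span multiple lines, whereas the error above suggests the PASA validator wants one on the CDS row as well. One way to see how many rows are affected is to list every feature that has no ID attribute. The following is a minimal sketch, not part of PASA or EVM: the function name `features_missing_id` is hypothetical, the sample rows are abbreviated from the file excerpt in this post, and it assumes 9-column rows separated by tabs (falling back to whitespace for text pasted without tabs).

```python
import re
from typing import Iterable, List, Tuple


def features_missing_id(lines: Iterable[str]) -> List[Tuple[int, str, str]]:
    """Return (line_number, feature_type, attributes) for rows with no ID attribute.

    Hypothetical helper for illustration; not part of PASA or EVM.
    """
    missing = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        # Skip blank lines, directives (##...) and comments (#...).
        if not line or line.startswith("#"):
            continue
        # GFF3 is tab-delimited; fall back to whitespace for pasted text.
        if "\t" in line:
            cols = line.split("\t")
        else:
            cols = re.split(r"\s+", line, maxsplit=8)
        if len(cols) != 9:
            continue  # not a feature row
        # Column 9 holds semicolon-separated key=value attributes.
        attrs = {}
        for item in cols[8].split(";"):
            if "=" in item:
                key, value = item.split("=", 1)
                attrs[key.strip()] = value.strip()
        if "ID" not in attrs:
            missing.append((lineno, cols[2], cols[8]))
    return missing


# Rows abbreviated from the excerpt in this post (attributes shortened).
sample = [
    "##gff-version 3",
    "SclofgA_1__HRSCAF___1 GAF gene 3273 5233 . - . ID=Jhye_g00001",
    "SclofgA_1__HRSCAF___1 GeMoMa mRNA 3273 5233 . - . ID=Jhye_g00001.1;Parent=Jhye_g00001",
    "SclofgA_1__HRSCAF___1 GeMoMa CDS 5161 5233 . - 0 Parent=Jhye_g00001.1",
]

for lineno, ftype, attrs in features_missing_id(sample):
    print(f"line {lineno}: {ftype} has no ID attribute ({attrs})")
```

On these rows only the CDS line is reported, which matches the entry named in the validator's error message. To scan a real file, pass an open file handle instead of `sample`.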

head -10 /Jhye_annotation.gff3

##gff-version 3

#SOFTWARE INFO: GeMoMaPipeline 1.7.1; SIMPLE PARAMETERS: species: pre-extracted; ID: zfinch; weight: 1.0; species: pre-extracted; ID: chicken; weight: 1.0; ID: braker; weight: 1.0; annotation evidence: true; tblastn: false; tag: mRNA; RNA-seq evidence: NO; denoise: DENOISE; DenoiseIntrons.maximum intron length: 15000; DenoiseIntrons.minimum expression: 0.01; DenoiseIntrons.context: 10; Extractor.upcase IDs: false; Extractor.repair: false; Extractor.Ambiguity: AMBIGUOUS; Extractor.discard pre-mature stop: true; Extractor.stop-codon excluded from CDS: false; Extractor.full-length: true; GeMoMa.reads: 1; GeMoMa.splice: true; GeMoMa.gap opening: 11; GeMoMa.gap extension: 1; GeMoMa.maximum intron length: 15000; GeMoMa.static intron length: true; GeMoMa.intron-loss-gain-penalty: 25; GeMoMa.e-value: 100.0; GeMoMa.contig threshold: 0.4; GeMoMa.region threshold: 0.9; GeMoMa.hit threshold: 0.9; GeMoMa.predictions: 10; GeMoMa.avoid stop: true; GeMoMa.approx: true; GeMoMa.protein alignment: true; GeMoMa.prefix: ; GeMoMa.timeout: 3600; GeMoMa.Score: ReAlign; GAF.common border filter: 0.75; GAF.maximal number of transcripts per gene: 2147483647; GAF.default attributes: tie,tde,tae,iAA,pAA,score; GAF.filter: start=='M' and stop=='*' and (isNaN(score) or score/aa>=0.75); GAF.sorting: evidence,score; GAF.alternative transcript filter: tie==1 or evidence>1; AnnotationFinalizer.UTR: NO; AnnotationFinalizer.rename: SIMPLE; AnnotationFinalizer.prefix: Jhye_g; AnnotationFinalizer.digits: 5; AnnotationFinalizer.name attribute: false; predicted proteins: true; predicted CDSs: true; predicted genomic regions: false; output individual predictions: false; debug: true; restart: false; BLAST_PATH: ; MMSEQS_PATH: 

##sequence-region SclofgA_1__HRSCAF___1 1 9096

SclofgA_1__HRSCAF___1 GAF gene 3273 5233 . - . ID=Jhye_g00001;transcripts=1;complete=1;maxEvidence=1;combinedEvidence=1

SclofgA_1__HRSCAF___1 GeMoMa mRNA 3273 5233 . - . ID=Jhye_g00001.1;ref-gene=chicken_gene-RPTC15L;aa=104;score=292;ce=4;rce=4;pAA=0.6769;iAA=0.5769;nps=0;start=M;stop=*;evidence=1;Parent=Jhye_g00001;sumWeight=1.0;

SclofgA_1__HRSCAF___1 GeMoMa CDS 5161 5233 . - 0 Parent=Jhye_g00001.1

SclofgA_1__HRSCAF___1 GeMoMa CDS 4602 4651 . - 2 Parent=Jhye_g00001.1

SclofgA_1__HRSCAF___1 GeMoMa CDS 4181 4289 . - 0 Parent=Jhye_g00001.1

SclofgA_1__HRSCAF___1 GeMoMa CDS 3273 3352 . - 2 Parent=Jhye_g00001.1

##sequence-region SclofgA_4__HRSCAF___4 1 33198 

Thank you for your help!

Maria

Brian Haas

May 30, 2023, 8:28:09 AM
to Maria Stager, pasapipeline-users
Hi Maria,

Our EVM software toolkit includes a converter for GeMoMa output that might give you GFF3 formatting that PASA accepts and that passes validation:

https://github.com/EVidenceModeler/EVidenceModeler/blob/master/EvmUtils/misc/GeMoMa_gff_to_gff3.pl

You could try installing EVM and then running the script above, which is included with it.

best,

~b


--
Brian J. Haas
The Broad Institute
http://broadinstitute.org/~bhaas

 

Maria Stager

May 30, 2023, 1:11:37 PM
to pasapipeline-users
Ah, amazing! Of course you had already designed a fix. That did the trick!

You're wonderful! 

Thank you,
Maria

Brian Haas

May 30, 2023, 1:23:36 PM
to Maria Stager, pasapipeline-users
Great to hear!  :-)
