Sorry, prediction transdecoder_NW_017388807.1_mrna_XM_018977167.1_649.p1 fails validation.
(CAT) GG 271739-272313 GT TGG
(GCC) AG 272816-272931 GC AGG
(GTT) AG 273047-273169 GT AGG
(GGT) AG 273271-273456 GT AGG
(GTC) AG 273716-273931 GT AGG
(TGT) AG 274970-275187 GT AGG
(TGA) AG 276219-276294 GT AGG
(CTA) AG 276379-276468 GT AGG
(GCT) AG 276829-278003 GT GTG
-recovered transdecoder_NW_017388807.1_mrna_XM_018977167.1_649.p1, internal, 272816, 272931
-recovered transdecoder_NW_017388807.1_mrna_XM_018977167.1_649.p1, internal, 273047, 273169
-recovered transdecoder_NW_017388807.1_mrna_XM_018977167.1_649.p1, internal, 273271, 273456
-recovered transdecoder_NW_017388807.1_mrna_XM_018977167.1_649.p1, internal, 273716, 273931
-recovered transdecoder_NW_017388807.1_mrna_XM_018977167.1_649.p1, internal, 274970, 275187
-recovered transdecoder_NW_017388807.1_mrna_XM_018977167.1_649.p1, internal, 276219, 276294
-recovered transdecoder_NW_017388807.1_mrna_XM_018977167.1_649.p1, internal, 276379, 276468
-recovered transdecoder_NW_017388807.1_mrna_XM_018977167.1_649.p1, internal, 276829, 278003
The above is a case where the coding prediction is not full-length: it doesn't start with a start codon, and it looks like it doesn't end at a stop codon either.
In these cases, EVM uses what it can and incorporates the exons that it can classify as internal, initial, or terminal exons, based on splice dinucleotides and start/stop codons. That's where the '-recovered ...' messages come from.
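To make the log above easier to read, here is a minimal Python sketch of the "internal exon" check, not EVM's actual code. It assumes each exon line has the layout `(first codon) acceptor start-end donor trailing-triplet`, and it assumes an exon counts as internal when the upstream dinucleotide is AG and the downstream dinucleotide is GT or GC. Both assumptions are inferred only from the log shown here; the function names are made up for illustration, and the trailing triplet is parsed but not interpreted.

```python
import re

# Assumed layout of one exon line in the log (inferred from the output above):
#   (CODON) ACCEPTOR START-END DONOR TRAILING
EXON_LINE = re.compile(
    r"^\((\w{3})\)\s+(\w{2})\s+(\d+)-(\d+)\s+(\w{2})\s+(\w{3})$"
)

# Assumed splice dinucleotides for an internal exon (hypothetical rule,
# chosen because it reproduces the '-recovered' lines in this log).
ACCEPTORS = {"AG"}
DONORS = {"GT", "GC"}


def parse_exon_line(line):
    """Parse one exon line into a dict, or return None if it does not match."""
    match = EXON_LINE.match(line.strip())
    if match is None:
        return None
    codon, acceptor, start, end, donor, trailing = match.groups()
    return {
        "first_codon": codon,
        "acceptor": acceptor,
        "start": int(start),
        "end": int(end),
        "donor": donor,
        "trailing": trailing,  # kept as-is; its meaning is not assumed here
    }


def is_internal(exon):
    """True if both flanking dinucleotides look like splice sites."""
    return exon["acceptor"] in ACCEPTORS and exon["donor"] in DONORS


def recover_internal(lines):
    """Return (start, end) for every exon line that passes the internal check."""
    recovered = []
    for line in lines:
        exon = parse_exon_line(line)
        if exon is not None and is_internal(exon):
            recovered.append((exon["start"], exon["end"]))
    return recovered


LOG = """\
(CAT) GG 271739-272313 GT TGG
(GCC) AG 272816-272931 GC AGG
(GTT) AG 273047-273169 GT AGG
(GGT) AG 273271-273456 GT AGG
(GTC) AG 273716-273931 GT AGG
(TGT) AG 274970-275187 GT AGG
(TGA) AG 276219-276294 GT AGG
(CTA) AG 276379-276468 GT AGG
(GCT) AG 276829-278003 GT GTG
"""

recovered = recover_internal(LOG.splitlines())
for start, end in recovered:
    print(f"internal, {start}, {end}")
```

Under these assumptions the first exon (271739-272313) is the only one left out, because its upstream dinucleotide is GG rather than AG and its first codon is not ATG.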
In regard to cases where there are multiple isoforms, EVM is only capable of modeling a single isoform structure, and only the coding regions for that isoform. In this case, it should pick out the 'best' one (as defined by the scoring system and weights), and the result will lack any UTR exon annotations.
What we would normally do is to run EVM, and then use the EVM predictions as an annotation input to PASA for adding on UTRs and modeling alt splicing isoforms that are well supported by the transcriptome data.
I hope this helps. I'm happy to continue to look into any issues. Hopefully my stderr/stdout files will provide a reference for hunting things down too.
best,
~brian
Hi Brian:

Thanks, this was very helpful. One more thing I just noticed with the EVM output ... when I generate the combined gff3 file with the command

    find . -regex ".*evm.out.gff3" -exec cat {} \; > EVM.all.gff3

there are some strange lines in the gff3 file. In addition to creating the gff3 lines for all the chromosomes and scaffolds that are in genome.fa, it also includes some lines with genome.fa as the Chr. (see attached) This causes an error when I run gff3_file_to_proteins.pl because it can't find a chromosome called "genome.fa". I haven't been able to figure out where these lines are coming from. I'm guessing something went wrong with one of my partitions, but I can't find "genome" in any of the partition-specific gff3 files.

Any suggestions on that? I don't want to just delete them if they are real genes.

Thanks,
Monica
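One way to narrow down where such lines come from is to compare column 1 of the GFF3 against the sequence IDs in the genome FASTA and report the line numbers that don't match. The following is only a sketch of that check in Python: the helper names are hypothetical, the sample records are invented for the demo, and it assumes the sequence ID is the first whitespace-delimited token of each FASTA header.

```python
def fasta_ids(lines):
    """Collect sequence IDs (first token after '>') from FASTA lines."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def unknown_seqids(gff_lines, known_ids):
    """Map each GFF3 seqid not in known_ids to the 1-based line numbers using it."""
    unknown = {}
    for number, line in enumerate(gff_lines, 1):
        # Skip blank lines and comment/directive lines.
        if not line.strip() or line.startswith("#"):
            continue
        seqid = line.split("\t", 1)[0]
        if seqid not in known_ids:
            unknown.setdefault(seqid, []).append(number)
    return unknown


# Tiny in-memory demo with invented records.
demo_fasta = [">NW_017388807.1 example scaffold", "ACGTACGT"]
demo_gff = [
    "##gff-version 3",
    "NW_017388807.1\tEVM\tgene\t272816\t278003\t.\t+\t.\tID=g1",
    "genome.fa\tEVM\tgene\t1\t100\t.\t+\t.\tID=g2",
]

unknown = unknown_seqids(demo_gff, fasta_ids(demo_fasta))
print(unknown)

# With real files (names taken from the thread), something like:
#   with open("genome.fa") as fa, open("EVM.all.gff3") as gff:
#       print(unknown_seqids(gff, fasta_ids(fa)))
```

Running the same check on each individual evm.out.gff3 file picked up by the find command, rather than on the combined file, should show which directory the stray records are read from.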