I have generated annotations with (comparative) AUGUSTUS in GFF3 format, but when I run rsem-prepare-reference, the rsem-gff3-to-gtf step produces a zero-size GTF file. There doesn't seem to be a verbose output option that would help me understand what is going on. Since the GTF is empty, it is not surprising that rsem-extract-reference-transcripts then fails with a "The reference contains no transcripts!" error.
For what it's worth, I assumed I would have to repair the parent and ID information in the last column of the transcript lines, as it didn't seem to conform to the specification. For example, this line in the original file:
DPSCF301198 AUGUSTUS transcript 1844 3933 . - . jg1.t1
needed to be converted to
DPSCF301198 AUGUSTUS transcript 1844 3933 . - . gene_id "jg1"; transcript_id "jg1.t1";
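A minimal Python sketch of this kind of repair, not necessarily the exact script used. It assumes tab-separated columns and that the gene ID is the transcript ID with its trailing ".t<N>" removed (true for the jg1.t1 example above, but an assumption for the file as a whole); the function name is purely illustrative.

```python
import re


def repair_transcript_attributes(line):
    """Rewrite column 9 of a transcript line from a bare ID to
    gene_id/transcript_id pairs; return every other line unchanged."""
    stripped = line.rstrip("\n")
    fields = stripped.split("\t")
    # Only touch well-formed transcript lines (9 tab-separated columns).
    if len(fields) != 9 or fields[2] != "transcript":
        return stripped
    transcript_id = fields[8].strip()
    # Skip lines whose last column is already key/value formatted.
    if not re.fullmatch(r'[^\s;="]+', transcript_id):
        return stripped
    # Assumption: gene ID = transcript ID minus the ".t<N>" suffix.
    gene_id = re.sub(r"\.t\d+$", "", transcript_id)
    fields[8] = f'gene_id "{gene_id}"; transcript_id "{transcript_id}";'
    return "\t".join(fields)


example = "DPSCF301198\tAUGUSTUS\ttranscript\t1844\t3933\t.\t-\t.\tjg1.t1"
repaired = repair_transcript_attributes(example)
print(repaired)
```

Applying the function a second time to an already repaired line leaves it untouched, so it is safe to run over a partially edited file.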
The same error occurs with and without this repair. I also tried sorting the resulting GFF3 with gff3sort, but I still get the same error.
Without any logging during the GFF3-to-GTF conversion step, I am at a loss as to how to troubleshoot this. I should also note that gffread does extract transcript sequences from the GFF3, but for consistency's sake (we are in the midst of a genome annotation methods comparison study), I'd rather use RSEM to extract the transcript sequences.
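In the absence of logging, one check that can be run independently of RSEM is to tally the feature types in column 3 and the style of the last column. The Python sketch below does that. The function name and category labels are hypothetical, and it makes no claim about what rsem-gff3-to-gtf actually requires; it only summarizes what the file contains. For reference, GFF3 attributes are written as key=value pairs (for example ID=...;Parent=...), whereas GTF uses key "value"; pairs.

```python
from collections import Counter


def summarize_annotation(lines):
    """Count feature types, attribute styles, and malformed lines."""
    feature_types = Counter()
    attribute_styles = Counter()
    malformed = 0
    for line in lines:
        # Skip blank lines and comment/directive lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            malformed += 1
            continue
        feature_types[fields[2]] += 1
        attributes = fields[8]
        if "=" in attributes:
            attribute_styles["key=value (GFF3-style)"] += 1
        elif '"' in attributes:
            attribute_styles['key "value" (GTF-style)'] += 1
        else:
            attribute_styles["bare value"] += 1
    return feature_types, attribute_styles, malformed


sample = [
    "DPSCF301198\tAUGUSTUS\ttranscript\t1844\t3933\t.\t-\t.\tjg1.t1",
    'DPSCF301198\tAUGUSTUS\ttranscript\t1844\t3933\t.\t-\t.\t'
    'gene_id "jg1"; transcript_id "jg1.t1";',
]
types, styles, bad = summarize_annotation(sample)
print(types)
print(styles)
print("malformed lines:", bad)
```

To run it on a real file, pass an open file handle in place of the sample list, since the function accepts any iterable of lines.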
Any help troubleshooting would be greatly appreciated.
Best,
Adam Freedman
FAS Informatics Group, Harvard Univ.