Hi, so I sort of resolved this issue using AGAT. It turned out that the GFF3 had orphaned Parent links and was missing UTRs (which MAJIQ shouldn't care about, but apparently did?).
So I used the GTF hosted on WormBase ParaSite, converted GFF3 → GTF → GFF3 with AGAT, and then validated that all exon features are present, that the parent-child structure is gene → mRNA → exon, and that there are no zero-length or malformed entries.
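For anyone wanting to repeat the first and last of those checks without AGAT, here is a minimal sketch in plain Python. It is not AGAT's validator; `check_gff3` is a hypothetical helper, and it assumes tab-separated GFF3 with 1-based inclusive coordinates, so `end < start` is treated as zero-length or malformed. It stops at a `##FASTA` section if the file embeds sequence.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def check_gff3(lines: Iterable[str]) -> Tuple[Counter, List[str]]:
    """Count feature types and report malformed lines (hypothetical helper).

    GFF3 coordinates are 1-based and inclusive, so a 1-bp feature has
    start == end; anything with end < start is zero-length or inverted.
    """
    feature_counts: Counter = Counter()
    problems: List[str] = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # embedded sequence follows; no more feature lines
        if not line.strip() or line.startswith("#"):
            continue  # blank lines, comments, ##gff-version and other directives
        cols = line.split("\t")
        if len(cols) != 9:
            problems.append(f"line {lineno}: expected 9 columns, got {len(cols)}")
            continue
        try:
            start, end = int(cols[3]), int(cols[4])
        except ValueError:
            problems.append(f"line {lineno}: non-integer coordinates")
            continue
        if end < start:
            problems.append(f"line {lineno}: end < start ({start}-{end})")
        # Column 3 is the feature type (gene, mRNA, exon, ...).
        feature_counts[cols[2]] += 1
    return feature_counts, problems
```

Passing an open file handle (e.g. `with open(path) as fh: counts, problems = check_gff3(fh)`) and checking that `counts["exon"]` is nonzero and `problems` is empty covers the "exons present" and "no malformed entries" checks.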
From this I was able to build my annotation splicegraph, but it ran suspiciously fast, and when I inspected it I got:
SpliceGraph[7 contigs, 46926 genes, 160350/1539/116630 exons/introns/junctions]
That's a very small number of introns (1,539), especially compared with 116,630 junctions.
From there I troubleshot:
Are there genes and features present in the gff3? Yes.
Can introns be inferred from the GFF3? Yes. Using a Python script to chain exons within each transcript, I found ~200,000 inferred introns.
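Exon chaining of this kind can be sketched as below. This is a guess at the general approach, not the script used here: group exon lines by `Parent`, sort each transcript's exons by start, and treat each gap between consecutive exons as an intron. It assumes `key=value;` attributes, allows comma-separated multiple parents, and counts unique `(seqid, strand, start, end)` tuples; whether the ~200,000 figure above was deduplicated across transcripts isn't stated. `parse_attributes` and `infer_introns` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Interval = Tuple[str, str, int, int]  # (seqid, strand, start, end), 1-based inclusive


def parse_attributes(field: str) -> Dict[str, str]:
    """Parse a GFF3 column-9 string like 'ID=e1;Parent=tx1' into a dict."""
    attrs: Dict[str, str] = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def infer_introns(lines: Iterable[str]) -> Set[Interval]:
    """Infer unique introns as the gaps between consecutive exons of each transcript."""
    exons_by_tx: Dict[str, List[Interval]] = defaultdict(list)
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("##FASTA"):
            break
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue
        # An exon shared by several transcripts may list them as Parent=tx1,tx2.
        for parent in parse_attributes(cols[8]).get("Parent", "").split(","):
            if parent:
                exons_by_tx[parent].append((cols[0], cols[6], int(cols[3]), int(cols[4])))

    introns: Set[Interval] = set()
    for exons in exons_by_tx.values():
        exons.sort(key=lambda exon: exon[2])
        for prev, nxt in zip(exons, exons[1:]):
            seqid, strand, _, prev_end = prev
            next_start = nxt[2]
            # Adjacent or overlapping exons leave no intron between them.
            if next_start > prev_end + 1:
                introns.add((seqid, strand, prev_end + 1, next_start - 1))
    return introns
```

Using a set means an intron shared by several isoforms is counted once, which is closer to what a splicegraph stores than a per-transcript total.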
Substituted transcript → mRNA. This had no effect.
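One way to make that substitution is sketched below, assuming it means changing the feature type in column 3 while leaving IDs and `Parent` attributes untouched, so existing `Parent=` links keep resolving. `retype_features` is a hypothetical helper, not part of AGAT or MAJIQ.

```python
from typing import Iterable, List


def retype_features(lines: Iterable[str], old: str = "transcript", new: str = "mRNA") -> List[str]:
    """Rename feature type `old` to `new` in column 3; every other line passes through unchanged."""
    out: List[str] = []
    for raw in lines:
        line = raw.rstrip("\n")
        cols = line.split("\t")
        # Only 9-column feature lines are touched; comments, directives and FASTA pass through.
        if not line.startswith("#") and len(cols) == 9 and cols[2] == old:
            cols[2] = new
            line = "\t".join(cols)
        out.append(line)
    return out
```

Matching on the whole column rather than doing a global text replace avoids rewriting IDs or attribute values that happen to contain the word "transcript".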
Validated the exon → mRNA → gene hierarchy with a script, and all IDs and Parents are linked.
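A hierarchy check like that can be sketched as follows (the helper names are hypothetical): collect every `ID` with its feature type, then confirm that each `Parent` resolves, and that exons point at an mRNA and mRNAs at a gene. Parents are only checked after the whole file is read, because GFF3 allows a child line to appear before its parent. Other child types (CDS, UTRs) are only checked for orphaned links.

```python
from typing import Dict, Iterable, List, Tuple

# Expected parent type for each child type in a gene -> mRNA -> exon model.
EXPECTED_PARENT = {"exon": "mRNA", "mRNA": "gene"}


def parse_attributes(field: str) -> Dict[str, str]:
    """Parse a GFF3 column-9 string like 'ID=e1;Parent=tx1' into a dict."""
    attrs: Dict[str, str] = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def check_hierarchy(lines: Iterable[str]) -> List[str]:
    """Report orphaned Parent links and children attached to the wrong feature type."""
    type_by_id: Dict[str, str] = {}
    links: List[Tuple[int, str, str]] = []  # (line number, child type, parent ID)
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if line.startswith("##FASTA"):
            break
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue
        attrs = parse_attributes(cols[8])
        if "ID" in attrs:
            type_by_id[attrs["ID"]] = cols[2]
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                links.append((lineno, cols[2], parent))

    problems: List[str] = []
    for lineno, child_type, parent in links:
        parent_type = type_by_id.get(parent)
        if parent_type is None:
            problems.append(f"line {lineno}: {child_type} has orphaned Parent={parent}")
        elif child_type in EXPECTED_PARENT and parent_type != EXPECTED_PARENT[child_type]:
            problems.append(
                f"line {lineno}: {child_type} Parent={parent} is a {parent_type}, "
                f"expected {EXPECTED_PARENT[child_type]}"
            )
    return problems
```

An empty list means every link resolves with the expected types; the orphan check is the same kind of problem AGAT fixed in the original file.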
Purged the old splicegraph and rebuilt, just in case. Still:
[7 contigs, 46926 genes, 160350/1539/116630 exons/introns/junctions]
Ran with build_debug.log, but there are no errors or warnings about skipped transcripts.
Here's a link to the GFF3 that I've been using. Let me know if I can provide further details, or if you have any suggestions for a different conversion.
Thanks very very much in advance!
https://www.dropbox.com/scl/fi/qo6knpyb0o4dg3gaagiex/majiq_ready.tar?rlkey=j2vart50imvabmvf8vhgl7nfd&st=rbm1wmaw&dl=0