Hi,
I'm trying to test out majiq 2.2, and unfortunately I can only get it to run with the hg19 .GFF provided in your documentation. It runs fine with that file, but I've built my pipeline around a different annotation file and would really rather not run the entire process again.
Ensembl and NCBI database .GFFs do not work and give thousands of errors such as:
(PID:1155) - WARNING - Error, incorrect gff. exon doesn't have valid mRNA b'rna-NR_106918.1
(PID:1155) - WARNING - Error, incorrect gff. exon doesn't have valid mRNA b'rna-MIR7846'
I'm wondering what's going on. NCBI's RefSeq FTP .GFF files use accession numbers instead of chr names as sequence IDs, but that shouldn't matter if I built and mapped my genome with the accessions, should it? Here's an example from the GRCh38.p13 RefSeq .GFF:
NC_000001.11 Gnomon exon 168100 168165 . - . ID=exon-XR_001737579.2-4;Parent=rna-XR_001737579.2;Dbxref=GeneID:100996442,Genbank:XR_001737579.2;gbkey=ncRNA;gene=LOC100996442;product=uncharacterized LOC100996442%2C transcript variant X5;transcript_id=XR_001737579.2
I've tried with Ensembl as well. Here's an example from the Ensembl GRCh37.87 .GFF (which seems to be formatted correctly, except for the missing "chr" prefix?):
1 ensembl_havana mRNA 47264718 47285085 . + . ID=transcript:ENST00000271153;Parent=gene:ENSG00000142973;Name=CYP4B1-001;biotype=protein_coding;ccdsid=CCDS542.1;havana_transcript=OTTHUMT00000021911;havana_version=1;tag=basic;transcript_id=ENST00000271153;version=4
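In case it helps with diagnosing this, here's a minimal Python sketch of how I could tally which feature types the exon records point to through Parent. It assumes tab-separated GFF3 with ID and Parent attributes. The function names and the toy records are made up for illustration, and my guess that majiq wants exon parents to be mRNA features is only inferred from the warning text above, not from the majiq source.

```python
from collections import Counter


def parse_attrs(field):
    """Parse a GFF3 attribute column (key=value;key=value) into a dict."""
    attrs = {}
    for part in field.strip().split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def exon_parent_types(lines):
    """Count the feature types that exon records reference via Parent."""
    id_to_type = {}    # feature ID -> feature type (column 3)
    exon_parents = []  # every Parent ID named by an exon record
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # skip anything that is not a 9-column GFF3 record
        attrs = parse_attrs(cols[8])
        if "ID" in attrs:
            id_to_type[attrs["ID"]] = cols[2]
        if cols[2] == "exon":
            # An exon may list several parents, separated by commas.
            exon_parents.extend(attrs.get("Parent", "").split(","))
    # Resolve parents after the full pass so record order does not matter.
    return Counter(id_to_type.get(p, "<missing>") for p in exon_parents)


# Toy records with hypothetical IDs and coordinates (not real annotation).
demo = [
    "\t".join(["chrA", "src", "mRNA", "1", "900", ".", "+", ".", "ID=rna-A;Parent=gene-A"]),
    "\t".join(["chrA", "src", "exon", "1", "100", ".", "+", ".", "ID=exon-A-1;Parent=rna-A"]),
    "\t".join(["chrA", "src", "lnc_RNA", "1", "900", ".", "-", ".", "ID=rna-B;Parent=gene-B"]),
    "\t".join(["chrA", "src", "exon", "1", "100", ".", "-", ".", "ID=exon-B-1;Parent=rna-B"]),
    "\t".join(["chrA", "src", "exon", "200", "300", ".", "-", ".", "ID=exon-C-1;Parent=rna-C"]),
]

for feature_type, count in sorted(exon_parent_types(demo).items()):
    print(feature_type, count)
```

If most exons in the RefSeq or Ensembl files resolve to something other than mRNA (or to nothing), that would at least tell me where the files differ from the hg19 .GFF that works.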
Is there a place to download .GFFs or .GTFs that are reliably formatted for this program? Also, if my genome was built and my .bam files were aligned without a "properly" formatted .GFF, will majiq fail to find any LSVs even if the .GFF I give it at the majiq step is correct? Any help would be appreciated. Thanks!
Patrick