I've tried MAJIQ with several different .gff3 files. The only one that works is the hg38 file downloaded from Ensembl, but I get LOADS of errors


William Wright

Jul 20, 2018, 4:29:56 PM
to majiq_voila
I have 3 replicates of condition 1 and 3 replicates of condition 2. 
I've tried running the MAJIQ builder with the hg19 .gff3 file linked in the MAJIQ documentation. It did not work: my jobs would run for a few days and then either fail with an error like "index is 0" or I would just cancel them. 

After playing around with everything I could think of, I had a successful run using the hg38 .gff3 file from Ensembl. However, my gene of interest (which I know is subject to lots of alternative splicing both within and across these samples) does not show up in the voila_psi or voila_dpsi outputs. I looked into this and noticed that when I ran the MAJIQ builder, I got a HUGE number of these warnings:
"(PID: 14174) - WARNING - Error, incorrect gff. exon doesn't have valid mRNA transcript: ENST0XXXXXX" 
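
A rough way to check whether the gene of interest is among the affected transcripts is to pull the transcript IDs out of the builder's warnings and look up that gene's transcripts. The sketch below assumes the warning text matches the two versions quoted in this thread (with or without a space after the colon); the function names are hypothetical and not part of MAJIQ.

```python
import re
from collections import Counter

# Matches the builder warning quoted in this thread. The exact wording may
# differ between MAJIQ versions, so treat the pattern as an assumption.
WARNING_RE = re.compile(
    r"incorrect gff\. exon doesn't have valid mRNA transcript:\s*([\w.]+)"
)


def skipped_transcripts(log_lines):
    """Count transcript IDs named in 'incorrect gff' warnings."""
    hits = Counter()
    for line in log_lines:
        match = WARNING_RE.search(line)
        if match:
            hits[match.group(1)] += 1
    return hits


def report(log_lines, transcripts_of_interest):
    """Map each transcript of interest to True if a warning named it."""
    hits = skipped_transcripts(log_lines)
    return {tx: tx in hits for tx in transcripts_of_interest}
```

Both functions accept any iterable of lines, so an open log file (or captured stderr) can be passed directly. Depending on the annotation, IDs may or may not carry a version suffix such as `.2`, so compare them in the same form the log prints.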

It's worth noting that this file, the only .gff3 I got to work, completed the MAJIQ builder in about 20 minutes (8 cores) and, as I mentioned, was the only one that gave me a completed result I could carry forward into the PSI and dPSI analyses. 

Interestingly, even the hg38 .gff3 file from GENCODE did not work (no progress after 3 days, even though my processors showed ~100% usage on and off). 


Which hg38 and hg19 .gff3 files are correct, and which should I be using? MAJIQ has produced some beautiful results so far, but I don't know whether to trust them when my gene of interest is missing (not to mention not knowing whether the 'incorrect gff' warnings have a big effect). 

In case it matters, in every attempt I have also seen the message "index file is older than bam file". 
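
The "index file is older than bam file" message most likely comes from htslib noticing that the .bai timestamp predates the BAM's, which can happen simply because files were copied. It is usually harmless if the BAM was not changed after indexing; otherwise re-running `samtools index` on the BAM clears it. A small sketch that lists BAMs whose index looks stale, assuming the index sits next to the BAM as `x.bam.bai` or `x.bai` (the helper name is hypothetical):

```python
import os


def stale_indexes(bam_paths):
    """Return BAMs whose .bai index is missing or older than the BAM itself.

    Looks for both 'x.bam.bai' (the samtools default) and 'x.bai'.
    """
    stale = []
    for bam in bam_paths:
        candidates = [bam + ".bai", os.path.splitext(bam)[0] + ".bai"]
        existing = [p for p in candidates if os.path.exists(p)]
        bam_mtime = os.path.getmtime(bam)
        # Stale if no index exists or the newest index predates the BAM.
        if not existing or max(os.path.getmtime(p) for p in existing) < bam_mtime:
            stale.append(bam)
    return stale
```

Only timestamps are compared, so a flagged BAM is not necessarily broken; it just means the index is worth regenerating before ruling it out.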

Thanks very much,
Charlie

Eric Marshall

Jul 23, 2018, 10:51:43 AM
to majiq_voila
Charlie,

Could you share the builder command you tried to use with the GENCODE gff? What genome did you align to? I'm a frequent user and GENCODE gffs have given me no trouble.

Jordi Vaquero

Jul 23, 2018, 11:09:33 AM
to majiq_voila
Hello William, 

I am sorry to hear about these issues. The annotation DB should be the same genome version you used to map the RNA-seq experiment in the first place: if you mapped to hg19, using an hg38 annotation will return misplaced splicing events. 
As for the errors with the hg19.gff3, we are going to fix that now. 
Can you send me one of your BAM files so I can check it more closely? 
Can you send the settings.ini and tell me how you mapped the BAM files and which version of the genome you used?
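
One way to check that the annotation matches the genome the reads were mapped to is to compare the sequence lengths declared in the .gff3 (`##sequence-region` pragmas, which Ensembl and GENCODE GFF3 files normally carry) with the `@SQ` lines of the BAM header printed by `samtools view -H`. Names present in only one file point to a naming mismatch (for example `chr1` versus `1`), and differing lengths point to different assemblies (hg19 chr1 is 249,250,621 bp, hg38 chr1 is 248,956,422 bp). A sketch with hypothetical helper names:

```python
def gff3_regions(gff_lines):
    """Map seqid -> declared end from '##sequence-region seqid start end' pragmas."""
    regions = {}
    for line in gff_lines:
        if line.startswith("##sequence-region"):
            parts = line.split()
            if len(parts) >= 4:
                regions[parts[1]] = int(parts[3])
    return regions


def bam_lengths(header_lines):
    """Map SN -> LN from the @SQ lines printed by `samtools view -H`."""
    lengths = {}
    for line in header_lines:
        if not line.startswith("@SQ"):
            continue
        # Header fields are tab-separated TAG:VALUE pairs after the record type.
        tags = dict(
            field.split(":", 1)
            for field in line.rstrip("\n").split("\t")[1:]
            if ":" in field
        )
        if "SN" in tags and "LN" in tags:
            lengths[tags["SN"]] = int(tags["LN"])
    return lengths


def build_mismatches(regions, lengths):
    """Return (seqids missing from the BAM, seqids whose lengths disagree)."""
    missing = sorted(set(regions) - set(lengths))
    differing = sorted(
        s for s in set(regions) & set(lengths) if regions[s] != lengths[s]
    )
    return missing, differing
```

An annotation may declare scaffolds or patches that the alignment reference lacks, so a few entries in the first list are expected; an empty overlap or a length mismatch on the main chromosomes is the real warning sign.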


The WARNING message appears only for genes whose transcripts are not of a valid mRNA type in the gff3. These are usually pseudo-transcripts and similar features that the GENCODE and Ensembl consortia include in their latest versions. The accepted transcript types are:

accepted_transcripts = ['mRNA', 'transcript', 'lnc_RNA', 'miRNA', 'ncRNA', 'rRNA', 'scRNA', 'snRNA', 'snoRNA', 'tRNA', 'pseudogenic_transcript']
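
To see which feature types trigger the warning before running the builder, a rough approximation of the check described above can be run on the .gff3 itself: map every feature ID to its type, then count exons whose Parent is not one of the accepted types. This is a hypothetical helper written from the description in this thread, not MAJIQ's own parser, so its counts may not match the log exactly.

```python
from collections import Counter

# Accepted transcript types as listed in this thread (MAJIQ 1.1.x).
ACCEPTED_TRANSCRIPTS = {
    "mRNA", "transcript", "lnc_RNA", "miRNA", "ncRNA", "rRNA", "scRNA",
    "snRNA", "snoRNA", "tRNA", "pseudogenic_transcript",
}


def parse_attributes(field):
    """Parse a GFF3 column-9 string like 'ID=x;Parent=y' into a dict."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = value
    return attrs


def rejected_parent_types(gff_lines):
    """Count the parent types of exons whose parent is not an accepted type."""
    id_to_type = {}
    exon_parents = []
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:  # skips FASTA sections and malformed lines
            continue
        attrs = parse_attributes(cols[8])
        if "ID" in attrs:
            id_to_type[attrs["ID"]] = cols[2]
        if cols[2] == "exon":
            # An exon may list several comma-separated parents.
            exon_parents.extend(attrs.get("Parent", "").split(","))
    counts = Counter()
    for parent in exon_parents:
        ptype = id_to_type.get(parent, "<missing parent>")
        if ptype not in ACCEPTED_TRANSCRIPTS:
            counts[ptype] += 1
    return counts
```

Running it on the Ensembl and GENCODE files side by side shows which feature types account for the skipped exons, which makes it easier to point the developers at the types worth adding.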

If any transcripts or genes use a type outside this list but you think they are relevant to your work, just point them out to us and we will include them. 

Thanks

Jordi



owens...@tamu.edu

Aug 13, 2018, 11:45:58 AM
to majiq_voila
Hello,
I am having very similar issues where the build step will not complete. 
I've attempted the majiq build step for my samples multiple times, and each time the program seems to "stop" even though I can see my processors working. I have used other .gff3 files (the mouse Ensembl file from MAJIQ and newer versions of GENCODE), and it is still extremely slow: I have also had it run for a few days with no output before quitting the job. 

The job did run with the Mus_musculus.GRCm38.93.gff3 from Ensembl but with the same errors Mr. Wright had.

 WARNING - Error, incorrect gff. exon doesn't have valid mRNA transcript:ENSMUST00000103420


I aligned these samples against GENCODE vM16 and would prefer to use that build. I have also run the vM16 .gtf file through your provided perl script, with no difference in output. 

Here is my input:

$ majiq build gencode.vM16.annotation.gff3 -c settings1.config -j 2 -o buildattempt0813182 --debug

/Users/thewatsonlab/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
2018-08-13 09:52:18,376 (PID:979) - INFO - Majiq Build v1.1.3a
2018-08-13 09:52:18,376 (PID:979) - INFO - Command: /Users/thewatsonlab/anaconda3/bin/majiq build gencode.vM16.annotation.gff3 -c settings1.config -j 2 -o buildattempt0813182 --debug
2018-08-13 09:52:18,414 (PID:979) - INFO - ... waiting gff3 parsing
2018-08-13 09:52:18,422 (PID:1007) - INFO - [Th 0]: START child,Process-2
2018-08-13 09:52:49,959 (PID:979) - INFO - Retrieve denovo features
2018-08-13 09:52:49,959 (PID:979) - INFO - Create 2 processes
2018-08-13 09:52:49,971 (PID:979) - DEBUG - Start Queue Manager
2018-08-13 09:52:49,973 (PID:1008) - INFO - Reading DB
2018-08-13 09:52:49,973 (PID:1009) - INFO - Reading DB
2018-08-13 09:52:56,076 (PID:1008) - INFO - READ JUNCS from BAM/CON1.bam
2018-08-13 09:52:56,077 (PID:1009) - INFO - READ JUNCS from BAM/KO1.bam



I would appreciate any help to solve this issue. Thank you so much!

owens...@tamu.edu

Aug 14, 2018, 1:01:16 PM
to majiq_voila
I have tried the settings.ini file as well. The above example was just one attempt of many to figure out what the issue was. 


