Advice about a warning


TM

Dec 5, 2020, 3:40:16 PM
to majiq_voila
Hello!
I am trying to run majiq build, but I have a TON of errors that look like this:
 WARNING - Error, incorrect gff. exon doesn't have valid mRNA b'transcript:ENST00000431853'
I have about a hundred screens of these. How can I fix it? Are there any requirements on the gff3 file (I downloaded it straight from Ensembl)? Or is it related to how the gff3 matches the bam files?
Thank you!

myhoan...@gmail.com

Dec 5, 2020, 7:57:00 PM
to majiq_voila
Hi TM,

(I'm not a majiq developer, just a user, to be clear.)
Regarding the error you mentioned, I wonder whether you used that Ensembl gff3 to create your bam files. If you used an Ensembl GTF to create the bam files but then feed those bam files together with the GFF3 from the Ensembl website to majiq, majiq won't run, since the information in your bam files and the gff3 doesn't match, for the reasons stated on the majiq website.

Jordi Vaquero

Dec 7, 2020, 4:04:05 AM
to myhoan...@gmail.com, majiq_voila

Hi,

The transcript errors you are seeing are due to the gff3 structure. Majiq assumes that the gff3 is a tree-like structure where

                      Gene
                  /    …    \
      Transcript 1            Transcript N
        /  …  \                 /  …  \
  Exon 1.1   Exon 1.m     Exon N.1   Exon N.p

 

This hierarchy is specified by the ID and Parent attributes in each line of the gff3.

If an exon row is found whose Parent ID has not been seen earlier in the file, that warning appears. It is not a big deal; it just means those exon definitions are discarded. Keep in mind that some gff3 files, like older versions from Ensembl, include the feature type in the ID, like transcript:ENST00000431853. Check that the transcript: prefix is also present in the ID of the transcript row itself. We have found some of these cases in Ensembl files, but they are few and will not affect the overall run.
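
For anyone who wants to count how many exon rows are affected before running the build, the check described above can be sketched in a few lines of Python. This is only an illustration of the ID/Parent lookup, not MAJIQ's own parser: the function names `parse_attributes` and `check_exon_parents` and the sample rows are made up for this example, and it assumes a standard tab-separated gff3 with `ID` and `Parent` attributes in column 9.

```python
from collections import Counter


def parse_attributes(column9):
    """Split a GFF3 attribute column such as 'ID=x;Parent=y' into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def check_exon_parents(lines):
    """Classify every exon Parent as ok, defined later in the file, or never defined.

    `lines` can be any iterable of GFF3 lines, for example an open file handle.
    An exon row without a Parent attribute is counted as never_defined.
    """
    first_seen = {}    # feature ID -> (row index, feature type)
    exon_parents = []  # (row index of the exon, parent ID)
    for index, line in enumerate(lines):
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a regular feature row
        attrs = parse_attributes(cols[8])
        if "ID" in attrs:
            first_seen.setdefault(attrs["ID"], (index, cols[2]))
        if cols[2] == "exon":
            for parent in attrs.get("Parent", "").split(","):
                exon_parents.append((index, parent))

    report = Counter()        # how many exon-parent links fall in each class
    parent_types = Counter()  # feature types of the parents that were found
    for index, parent in exon_parents:
        if parent not in first_seen:
            report["never_defined"] += 1
        elif first_seen[parent][0] > index:
            report["defined_later"] += 1
        else:
            report["ok"] += 1
            parent_types[first_seen[parent][1]] += 1
    return report, parent_types


# Made-up rows: E1 is fine, E2 points at a transcript that does not exist,
# and E3 drops the 'transcript:' prefix so its Parent matches no ID.
SAMPLE = "\n".join([
    "##gff-version 3",
    "1\tdemo\tgene\t1\t100\t.\t+\t.\tID=gene:G1",
    "1\tdemo\tmRNA\t1\t100\t.\t+\t.\tID=transcript:T1;Parent=gene:G1",
    "1\tdemo\texon\t1\t50\t.\t+\t.\tID=exon:E1;Parent=transcript:T1",
    "1\tdemo\texon\t60\t100\t.\t+\t.\tID=exon:E2;Parent=transcript:T2",
    "1\tdemo\texon\t60\t100\t.\t+\t.\tID=exon:E3;Parent=T1",
])
report, parent_types = check_exon_parents(SAMPLE.splitlines())
print(dict(report), dict(parent_types))
```

To run it on a real annotation, pass an open file handle instead of the sample, e.g. `check_exon_parents(open("path/to/annotation.gff3"))`. The `parent_types` counter only shows which feature types the matched parents have in your file; which of those types MAJIQ accepts as transcripts is not something this sketch knows.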

 

 

Jordi Vaquero


 

TM

Dec 7, 2020, 7:13:36 AM
to majiq_voila
Hi! Thank you for your answers.

Myhoan, it seems a little odd to me that the errors would come from a mismatch between the bam files and the gff3, because the errors appear almost immediately after I enter the command, and the message 'Reading bam files' only comes up after all the errors are done. If a mismatch is indeed the cause, then I need to convert the gtf I used with STAR to a gff3. I tried to do that with the gtf2gff3 program but failed because I couldn't find the configuration file it wanted. Is there any other way to convert the file?

Jordi, these errors are not few at all. As I said, there are so many that I wonder whether any exon definitions are left at all. I will try to run the majiq psi command and see if I get anything.
How do you think I can fix these issues? While we're at it, where can I get a good hg38 gff3 file that doesn't cause these errors?

Thank you again!

TM

Caleb Radens

Dec 7, 2020, 8:08:27 AM
to TM, majiq_voila

TM

Dec 7, 2020, 8:11:20 AM
to majiq_voila
Thanks Caleb, I will try these out.
TM

myhoan...@gmail.com

Dec 7, 2020, 10:44:16 AM
to majiq_voila
Hi TM,

If you haven't had any success yet, I think you should definitely try converting the gtf to gff3 with gtf2gff3 again. So far that is the only way that has worked in my case: I created the bam files using STAR and the Ensembl gtf, then converted that Ensembl gtf to gff3 with gtf2gff3, and fed that gff3 (not the gff3 from the Ensembl website) together with the bam files (generated with STAR and the gtf) to majiq build.

If you have trouble with gtf2gff3, maybe the following info will help:
+ command that I use (as suggested in the link from the majiq website, https://biociphers.bitbucket.io/majiq/quick.html):
gtf2gff3 --cfg gtf2gff3.cfg path/to/gtf > path/to/new/gff3
+ config file for gtf2gff3 (found via Google; not to be confused with the config file for majiq build):

TM

Dec 7, 2020, 10:48:15 AM
to majiq_voila

Thanks Myhoan! 
I will definitely try that config. Did you use the default settings there, or did you change something?
TM

TM

Dec 27, 2020, 10:19:38 AM
to majiq_voila
Hello Caleb and Jordi.

To get rid of the thousands of warnings I got while running majiq build, I used new gtf, gff3, and fa files from the first link Caleb gave me.
I used STAR to index the genome and created the bam files using the gtf file. I then attempted to run majiq build using the gff3 file. Unfortunately, it did not solve the problem.
I continue to get all the errors, even though I am positive the two files contain the same information, just in a different format. It seems a mismatch was not my only problem.
Can you think of any other reason why all my exon definitions are discarded?

Thank you,
TM

On Monday, 7 December 2020 at 15:08:27 UTC+2 cra...@biociphers.org wrote: