More exons than in annotation file


Justin Malin

Mar 9, 2021, 6:57:48 PM
to majiq_voila
Hi, 

Thank you for a very useful tool.

I am observing an unexpected discrepancy in the reported exon structure. According to the Ensembl and UCSC browsers (mm10), Klf6 has 4 exons.
Screen Shot 2021-03-09 at 6.44.41 PM.png

In MAJIQ/VOILA, however, there are 5 exons, and according to the legend, all 5 are supported by both the RNA-seq data and the db:
Screen Shot 2021-03-09 at 6.47.13 PM.png 

From the Ensembl annotation file that I used:

>grep Klf6  Mus_musculus.GRCm38.102.gff3 

chr13 ensembl_havana gene 5861482 5870394 . + . ID=gene:ENSMUSG00000000078;Name=Klf6;biotype=protein_coding;description=Kruppel-like factor 6 [Source:MGI Symbol%3BAcc:MGI:1346318];gene_id=ENSMUSG00000000078;havana_gene=OTTMUSG00000066476;havana_version=1;logic_name=ensembl_havana_gene_mus_musculus;version=7

chr13 ensembl_havana mRNA 5861482 5870394 . + . ID=transcript:ENSMUST00000000080;Parent=gene:ENSMUSG00000000078;Name=Klf6-201;biotype=protein_coding;ccdsid=CCDS26229.1;havana_transcript=OTTMUST00000161492;havana_version=1;tag=basic;transcript_id=ENSMUST00000000080;transcript_support_level=1;version=7

chr13 havana mRNA 5861541 5867275 . + . ID=transcript:ENSMUST00000222857;Parent=gene:ENSMUSG00000000078;Name=Klf6-203;biotype=nonsense_mediated_decay;havana_transcript=OTTMUST00000161493;havana_version=1;transcript_id=ENSMUST00000222857;transcript_support_level=5;version=1

chr13 havana lnc_RNA 5864678 5866770 . + . ID=transcript:ENSMUST00000221734;Parent=gene:ENSMUSG00000000078;Name=Klf6-202;biotype=retained_intron;havana_transcript=OTTMUST00000161494;havana_version=1;transcript_id=ENSMUST00000221734;transcript_support_level=2;version=1
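A side note on the grep above: in Ensembl GFF3 files, exon records usually point to their transcript through `Parent=transcript:...` and do not contain the gene name, so `grep Klf6` returns only the gene and transcript lines. The Python sketch that follows lists the exons of every transcript of one gene, which makes it easier to compare per-transcript exon counts with the splice graph. It assumes an uncompressed, tab-separated GFF3 with Ensembl-style `gene:`/`transcript:` ID prefixes; the helper names `parse_attributes` and `exons_by_transcript` are hypothetical.

```python
from collections import defaultdict


def parse_attributes(field: str) -> dict:
    """Split a GFF3 column-9 string such as 'ID=x;Parent=y' into a dict."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = value
    return attrs


def exons_by_transcript(lines, gene_id: str) -> dict:
    """Return {transcript ID: sorted [(start, end), ...]} for one gene.

    Assumes Ensembl-style IDs: transcripts carry Parent=gene:<gene_id>
    and exons carry Parent=transcript:<transcript_id>.
    """
    records = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments, directives and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a tab-separated feature line
        records.append((cols[2], int(cols[3]), int(cols[4]), parse_attributes(cols[8])))

    # Pass 1: every feature whose Parent is the gene is treated as a transcript.
    gene_ref = f"gene:{gene_id}"
    transcripts = {
        attrs["ID"]
        for _, _, _, attrs in records
        if "ID" in attrs and gene_ref in attrs.get("Parent", "").split(",")
    }

    # Pass 2: collect exons whose Parent is one of those transcripts.
    exons = defaultdict(list)
    for feature, start, end, attrs in records:
        if feature != "exon":
            continue
        for parent in attrs.get("Parent", "").split(","):
            if parent in transcripts:
                exons[parent].append((start, end))
    return {tx: sorted(coords) for tx, coords in exons.items()}
```

For example, passing `open("Mus_musculus.GRCm38.102.gff3")` and the gene ID `ENSMUSG00000000078` should return one entry per Klf6 transcript that has exon records in the file, with its exon coordinates.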


I don't know if this contributes, but the .bam files I'm using are single-cell data (10x Genomics Chromium).

Would you know the reason for the discrepancy in the number of exons?

Thank you,
Justin


Paul Jewell

Mar 15, 2021, 7:39:52 PM
to majiq_voila
Hi Justin, 

I checked this gene in a similar mouse run with our annotation, and I see only four exons (the "exon 2" from your splice graph appears to be missing in mine). Would you be able to make your GFF3 available so that I can determine whether it is the source of the issue?

Thanks, 
-Paul

Justin Malin

Mar 15, 2021, 8:04:07 PM
to majiq_voila

Hi Paul,

This is the link I downloaded the mm10 gff3 file from:

Matthew Gazzara

Mar 17, 2021, 9:13:08 AM
to majiq_voila
Hi Justin,

Thanks for your interest. 

The "extra" exon 2 you are seeing comes from the nonsense-mediated decay transcript Klf6-203 in your screen capture, which corresponds to ENSMUST00000222857 in the GFF3 lines you shared. When MAJIQ builds the splice graph, it collapses all the transcripts in the GFF3 file, so what you get is the union of all exons seen in the annotation (or detected de novo from your data), using the longest version of each exon seen across all transcripts. This is also why exon 3 in your splice graph is longer and extends upstream of the splice junction: note that the Klf6-202 transcript in your screen capture has an upstream transcription start site in this region.
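To illustrate the collapsing step described above, here is a toy Python sketch. It is not MAJIQ's implementation: it assumes that any overlapping exon intervals are versions of the same exon and keeps their widest span. The function name `collapse_exons` and all coordinates are hypothetical.

```python
def collapse_exons(transcripts: dict) -> list:
    """Toy union of exons across transcripts (not MAJIQ's algorithm).

    Overlapping intervals are assumed to be versions of the same exon,
    and the widest span (earliest start, latest end) is kept.
    """
    intervals = sorted(exon for exons in transcripts.values() for exon in exons)
    merged = []
    for start, end in intervals:
        if merged and start <= merged[-1][1]:
            # Overlaps the current exon: extend it to the longest version seen.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(exon) for exon in merged]


if __name__ == "__main__":
    # Hypothetical coordinates, loosely modelled on the Klf6 situation.
    annotation = {
        "A": [(100, 200), (400, 500), (800, 900)],  # main isoform
        "B": [(100, 200), (300, 350), (400, 500)],  # adds an extra exon
        "C": [(380, 500)],                          # starts upstream of exon (400, 500)
    }
    print(collapse_exons(annotation))
    # [(100, 200), (300, 350), (380, 500), (800, 900)]
```

Transcript B contributes an exon that A lacks, and C's upstream start lengthens the middle exon, mirroring the extra exon 2 and the extended exon 3 in the thread. A real splice graph also tracks junctions and intron retention separately, which this toy ignores.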

As for the legend saying that all 5 exons are supported by both RNA-seq and the DB, I will get back to you. I am fairly certain we do not consider exonic read coverage when determining this, so the legend may be a bit misleading. Exon 2 in your splice graph may be labeled this way because there are reads supporting one of its splice junctions or the green intron retention event.

Let me know if you have further questions.

-Matt

Justin Malin

Mar 17, 2021, 5:37:45 PM
to majiq_voila

Hi Matthew,

Great, that would explain it. Thanks for the explanation. 

I do have another question, unrelated. I'll open up a new thread for it.

Thanks again,
Justin