Voila view and intron retention

Seth Kelly

Apr 3, 2024, 2:55:56 PM

to Biociphers

Hello,

I've been using MAJIQ and Voila to view splicegraphs and have a quick question about the numbers over the lines connecting the exons. From a previous thread in this group, it seems that the number on a line connecting two exons represents the median number of reads supporting that junction among all samples within a given group.

I'm wondering though about what that number represents for intron retention since the reads wouldn't (most of the time) cover the entire distance across the whole intron. Does this number represent the median number of reads mapped to that intron? Is it normalized to intron length in some way?

This might be somewhere in the paper or the online documentation, but I couldn't seem to find it.

Thanks,

~Seth

Ry Guy

Apr 23, 2024, 1:00:06 PM

to Biociphers

The line that arcs over an intron shows the number of reads that bridge the gap between the two exons; the region under the arc is not covered by those reads. In other words, a read supporting that junction starts aligning in the first exon, skips over the intron, and continues aligning at the start of the next exon. The read is "bridging" the gap, and the intronic sequence is (normally, as you mentioned) not present in it. For example, if the line connecting the exons says 50, then 50 reads aligned to the end of the first exon and the start of the second exon without aligning to the intronic region in between. Does that make sense?

San Jewell

Mar 5, 2025, 12:45:13 PM

to Biociphers

Clarification from Joseph Aicher:

Yes, read coverage for introns is normalized as a function of intron length.

To explain MAJIQ read counts:

For junctions: MAJIQ counts the number of reads where the alignment has a split corresponding to the junction coordinates and there is >= 8nt of aligned overhang on both sides of the split.

For introns: MAJIQ counts the number of reads where the alignment overlaps >= 8nt into the intron without any split (i.e. junction) within the intronic portion of the alignment.
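The two counting rules above can be sketched in a few lines of Python. This is only an illustration, not MAJIQ's actual code: the function names, the representation of a read as a list of half-open aligned blocks, and the choice to measure overhang as the total aligned bases on each side of the split are all assumptions made for the example.

```python
from typing import List, Tuple

Block = Tuple[int, int]  # half-open [start, end) aligned segment of a read

MIN_OVERHANG = 8  # value quoted in the thread


def supports_junction(blocks: List[Block], gap: Block,
                      min_overhang: int = MIN_OVERHANG) -> bool:
    """True if the read is split exactly at `gap` with enough aligned
    bases on both sides (assumption: overhang = total aligned bases)."""
    for i in range(len(blocks) - 1):
        if blocks[i][1] == gap[0] and blocks[i + 1][0] == gap[1]:
            left = sum(end - start for start, end in blocks[: i + 1])
            right = sum(end - start for start, end in blocks[i + 1:])
            return left >= min_overhang and right >= min_overhang
    return False


def supports_intron(blocks: List[Block], intron: Block,
                    min_overhang: int = MIN_OVERHANG) -> bool:
    """True if the read overlaps the intron by >= min_overhang bases
    and has no split that intersects the intron."""
    i_start, i_end = intron
    for (_, prev_end), (next_start, _) in zip(blocks, blocks[1:]):
        has_gap = next_start > prev_end
        if has_gap and prev_end < i_end and next_start > i_start:
            return False  # a split falls inside the intronic portion
    overlap = sum(max(0, min(end, i_end) - max(start, i_start))
                  for start, end in blocks)
    return overlap >= min_overhang


# Intron / junction gap spanning [100, 200)
print(supports_junction([(92, 100), (200, 292)], (100, 200)))  # True: 8nt on the left
print(supports_junction([(94, 100), (200, 294)], (100, 200)))  # False: only 6nt on the left
print(supports_intron([(50, 108)], (100, 200)))                # True: 8nt into the intron
print(supports_intron([(50, 105)], (100, 200)))                # False: only 5nt into the intron
```

Under these assumptions, a read spliced exactly at the junction never counts toward the intron, because its split intersects the intron.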

If we say r is the maximum read length, then we normalize the read count for an intron with length L by multiplying by: (r - 15) / (r + L - 15).

Explanation of this normalization: since we require >= 8nt overhang, there are r - 15 possible ways for a read to overlap a junction. For an intron with length L, there are r + L - 15 ways the read can overlap the intron.
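As a worked example of the normalization above, here is a minimal Python sketch. The function names are hypothetical, and generalizing the constant 15 to `2 * min_overhang - 1` is an assumption that matches the thread only for the stated overhang of 8.

```python
def intron_scale(read_length: int, intron_length: int,
                 min_overhang: int = 8) -> float:
    """Scaling factor (r - 15) / (r + L - 15) for the default overhang of 8."""
    # Number of positions at which a read can overlap a junction: r - 15
    junction_positions = read_length - (2 * min_overhang - 1)
    if junction_positions <= 0:
        raise ValueError("read length too short for the required overhang")
    # Number of positions at which a read can overlap the intron: r + L - 15
    intron_positions = junction_positions + intron_length
    return junction_positions / intron_positions


def normalize_intron_reads(raw_reads: float, read_length: int,
                           intron_length: int) -> float:
    """Raw intron read count scaled to be comparable with junction counts."""
    return raw_reads * intron_scale(read_length, intron_length)


# 150nt reads, 1000nt intron: factor is 135 / 1135, roughly 0.119
print(intron_scale(150, 1000))
# 227 raw intronic reads scale to about 27 junction-equivalent reads
print(round(normalize_intron_reads(227, 150, 1000), 6))
```

The factor shrinks as the intron gets longer, so a long intron needs proportionally more raw reads to show the same number as a junction.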

Chris Khoury

Jun 16, 2025, 8:53:09 PM

to Biociphers

Hi @SanJewell

Hope you have been well.

Samantha here. Sorry to tag along on this thread after a few months, but I was hoping for some clarification on the above note from Joseph regarding 8nt overhangs.

I have listed some specific key parameters to aid with context for the question.

Experiment:
150bp RNA seq files from human blood
Total RNA
Ribo depleted
Sequenced ~ 200M

STAR Aligner:
--alignSJoverhangMin 6 --alignSJDBoverhangMin 6

MAJIQ BUILD: (please note it is one human sample per build)
--min-experiments 0.1 --minreads 1 --minpos 1 --markstacks=-1.0

MAJIQ PSI:
--min-experiments 1 --minreads 1 --minpos 1

Query:
Basically, I set an intentional overhang of 6 in STAR and would like those reads included in my junction counts.
Is the 8nt overhang hard-coded into MAJIQ (i.e., will MAJIQ count fewer junction reads than we intend)?
Besides minreads, minpos and markstacks, is there any other filtering/removal of junction reads that MAJIQ does in the back end that we cannot toggle off?

Thanks in advance,
Samantha.

bsl...@seas.upenn.edu

Jun 17, 2025, 2:39:53 PM

to Biociphers

Dear Samantha,

The minimum overhang of 8 is hard-coded in both MAJIQ V2 and our coming release, MAJIQ V3.

The direct answer to the filtering/removal question is no. However, there is more to say about retained intron detection in MAJIQ V2 (I am not sure whether retained introns are relevant to your concern; if so, read on!).

MAJIQ V2 parses “annotated introns” from the input gff3 as annotated exonic regions which connect two other exons. Then, by default, retained introns (annotated and denovo) must pass coverage thresholds to be included in LSV definitions (args: --annotated_ir_always, --disable-denovo-ir).

To determine whether an intron passes these thresholds, MAJIQ divides an intron into segments (“bins”) and checks that at least a certain proportion of bins (default 0.5, half of them; arg --irnbins) have a certain length-normalized read coverage level (default 0.01; arg --min-intronic-cov). So, it is possible to have some reads splitting an exon-intron boundary and yet the intron will not be included in LSV quantifications if detection coverage thresholds are not met.

Moreover, MAJIQ V2 uses a heuristic to evaluate the above coverage criteria which can result in stricter-than-documented thresholds applied for relatively short introns. MAJIQ V3 replaces this heuristic and also does not require annotated introns to pass coverage thresholds for inclusion in LSVs. Our coming MAJIQ V3 preprint update will include more information on this.
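The bin-based coverage check described above can be illustrated with a short Python sketch. It shows only the documented criterion, not the heuristic that MAJIQ V2 actually applies: the function name and the input format (one length-normalized coverage value per bin) are assumptions, and the two defaults are the values quoted for --irnbins and --min-intronic-cov.

```python
from typing import Sequence


def intron_passes_coverage(bin_coverage: Sequence[float],
                           min_bin_fraction: float = 0.5,
                           min_coverage: float = 0.01) -> bool:
    """True if at least `min_bin_fraction` of the bins reach `min_coverage`."""
    if not bin_coverage:
        return False  # no bins, nothing to support the intron
    covered = sum(1 for cov in bin_coverage if cov >= min_coverage)
    return covered / len(bin_coverage) >= min_bin_fraction


# Two of four bins reach the coverage level: passes at the 0.5 default
print(intron_passes_coverage([0.02, 0.0, 0.03, 0.0]))  # True
# Reads pile up at one exon-intron boundary only: fails
print(intron_passes_coverage([0.02, 0.0, 0.0, 0.0]))   # False
```

The second case mirrors the situation in the reply: reads cross an exon-intron boundary, yet the intron is left out because too few bins are covered.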

Please feel free to ask any additional questions.

Best Regards,

Barry

Chris Khoury

Jul 4, 2025, 2:31:26 AM

to Biociphers

Thanks, Barry, for the information on the hard-coded overhang.

Regarding intron retention: yes, it is of great interest.

We saw Dr Barash's post and look forward to playing with V3.

Our pipelines focus on comparing patient RNA splicing to a group of tissue and age matched controls (Rare Diseases).

Currently, we execute MAJIQ PSI for all samples where each sample is its own build group to include rare junctions. Then, we compare the values between the subject of interest and the control panel to detect splicing variations.

Assuming we transition to V3, is this approach currently available, or is it streamlined through the MAJIQ-CLIN module? (We are on the most recent 2.5 version.)

Many thanks,

Samantha.

bsl...@seas.upenn.edu

Sep 3, 2025, 4:20:09 PM

to Biociphers

Dear Samantha,

MAJIQ-CLIN is designed for the use-case you described. MAJIQ-CLIN is not yet generally available, however:

(1) If you are non-commercial (e.g. part of a university lab) and want to use MAJIQ-CLIN now, please have your lab’s PI contact Yoseph Barash at yos...@upenn.edu with a description of the use-case and request for access. We will provide early access for academic use-cases intended to identify aberrant splicing between patients and controls.

(2) MAJIQ-CLIN will be generally available for academic and commercial licensing after the paper is published, which we hope will be soon.