Voila Interpretation

Alex PG

Nov 12, 2025, 11:34:34 AM

to Biociphers

Dear BioCiphers community,

I am still new to majiq and I’m having some difficulty interpreting certain elements of the Voila visualization (screenshot attached). Do you mind helping me with the following questions?

1. The dotted splice junction lines are labeled as “DB only.” Do these refer to the Ensembl annotations provided in the GFF3 file?
2. What does the single short line extending after exon 13 represent?
3. What is the meaning of the small block shown below exon 11?
4. Why does the red junction line near exon 6 (classified as DB & RNASeq) not display any median read support?
5. Is it expected that the number of exons displayed in the Voila visualization often exceeds the number of exons in any individual transcript annotation?

Thank you so much for all your support! Would appreciate your help.

All the best,

Alex

voila.png

San Jewell

Nov 13, 2025, 10:47:07 AM

to Biociphers

Hi Alex,

Thanks for reaching out! I'll answer things in the number order you indicated:

1. You are correct. These junctions appear only in the annotation gff3 supplied during build.
2. Lines like this are called "half-exons", in which some evidence was found for one splice site of an exon but not the other. From the docs (https://biociphers.bitbucket.io/majiq-docs/getting-started-guide/faq.html): "MAJIQ detects de novo junctions that can create new exons. Sometimes the boundaries of those exons are not clear ( not present, to far away, or not well covered…), in those coses we define the exon as half-exon. The missing coordinate is specified as na"
3. This is actually a visual glitch in which a large number of annotated transcription end sites (usually indicated by the caret symbol, ^) have been rendered right on top of one another. It is not a situation I've noticed before; we may decide to represent things differently in the future to make this clearer.
4. Looking at your screenshot, the skip junction (from exon 5 to 7) looks to be db-only, so I assume you mean the red junction indicated by the highlighted LSV, which has 2 reads in group Polyp and none in group Normal. If this is correct, it is because the coloring of junctions is determined by all experiments together. That is, if a junction has reads in any experiment, all of the splicegraphs will show this fact (it is only grey if there are no reads from *any* experiment).
5. Yes. The structure of the splicegraph shown is a collapsed / combined version of all of the annotation transcripts, plus any additional de-novo junctions/exons/intron retention events. It will generally be shown the same way for each group, apart from read support and other minor coloring changes between groups.
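
To illustrate why a collapsed splicegraph can show more exons than any single transcript, here is a minimal sketch. It is not MAJIQ's actual exon-building code: the function name `collapse_exons`, the coordinates, and the simple "merge overlapping intervals" rule are all assumptions made for illustration.

```python
def collapse_exons(transcripts):
    """Merge exon intervals from all transcripts into one non-overlapping set.

    `transcripts` maps a transcript name to a list of (start, end) exons.
    This is a simplified stand-in for splicegraph construction, not MAJIQ's code.
    """
    intervals = sorted(exon for exons in transcripts.values() for exon in exons)
    merged = []
    for start, end in intervals:
        if merged and start <= merged[-1][1]:
            # Overlaps the previous exon: extend it instead of adding a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Two hypothetical transcripts with 3 exons each; they differ in the middle exon.
transcripts = {
    "tx1": [(100, 200), (300, 400), (700, 800)],
    "tx2": [(100, 200), (500, 600), (700, 800)],
}
collapsed = collapse_exons(transcripts)
print(len(collapsed))  # 4 exons in the combined view
```

Each transcript has three exons, but the combined view has four, because the two alternative middle exons both appear.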

Let me know if it makes sense when you have a chance. Or if you have suggestions for future improvements.

Thanks,

-San

Alex PG

Nov 13, 2025, 2:44:13 PM

to Biociphers

Hi San,

Thank you so much for your answer. It was extremely helpful and clarified a lot!

Regarding point 4: yes, I was referring to the highlighted LSV you mentioned. It makes sense that the coloring of junctions is determined by all experiments together.

I think MAJIQ is a fantastic resource, thank you for developing and maintaining it! This might be a time-consuming suggestion, but I thought it would be really interesting if transcripts could be linked to their organ of interest, allowing users to view the LSVs of transcripts most abundant in that organ (similar to GTEx: https://gtexportal.org/home/transcriptPage or like in the attached screenshot).

I came across another question: the list of genes generated by voila view using /build/splicegraph.sql and /heterogen/.het.voila* represents a set of LSVs that seems to depend on the chosen settings (e.g., threshold, test, etc.). How is this number calculated? I’m asking because it differs quite a bit from the list in modulalized/summary.tsv. For example, for this experiment with the default settings [abs(E(dPSI)) threshold 0.2, P-value threshold 0.5, and P-value stat TNOM], voila view shows 366 entries (filtered from 110,543 total), while modulalized/summary.tsv lists 280 entries.

Thank you so much for taking the time.

All the best,

Alex

transcriptofinterest.png

San Jewell

Nov 13, 2025, 5:04:42 PM

to Biociphers

Hi Alex,

No problem, and I appreciate you reaching out with comments and things to consider.

I'll first answer the question about the difference in filter criteria between the table in voila view and the files output by the tsv/modulizer modes. The reason is that it is not efficient enough to re-calculate all stat thresholds whenever the confidence slider is adjusted for the index, so the thresholds for the index page are actually binned into 10 confidence bins per row. The index filter therefore works as an approximate criterion rather than an exact one. Meanwhile, all values in the cells of voila tsv / modulizer are always calculated exactly, based on all of the threshold switches specified. Let me know if it makes sense and if you find anything amiss.
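
A toy sketch of how a binned filter can give a different count than an exact one is below. The only detail taken from this thread is the number of bins (10); the snapping rule, the function names, and the values are hypothetical and are not Voila's implementation.

```python
N_BINS = 10  # from the thread: 10 confidence bins per row


def exact_pass(value, threshold):
    """Exact filter: compare the stored value to the threshold directly."""
    return value >= threshold


def binned_pass(value, threshold, n_bins=N_BINS):
    """Approximate filter (assumed scheme): compare bin indices only."""
    value_bin = min(int(value * n_bins), n_bins - 1)
    threshold_bin = min(int(threshold * n_bins), n_bins - 1)
    return value_bin >= threshold_bin


# Hypothetical per-row confidence values and a slider threshold.
values = [0.91, 0.93, 0.96, 0.99]
threshold = 0.95

exact_count = sum(exact_pass(v, threshold) for v in values)
binned_count = sum(binned_pass(v, threshold) for v in values)
print(exact_count, binned_count)  # 2 4
```

Here all four values share a bin with the threshold, so the binned filter keeps every row while the exact comparison keeps only two.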

For the question about the transcripts enhancement, I'd like to know a little more about what you had in mind, and then I can put it to my labmates for additional input. In general, MAJIQ is not a transcript-centric tool, and all data saved is generally over the collapsed splicegraph. We do not do any quantification over transcripts, only LSVs. The list of annotation transcripts above the splicegraphs is unprocessed, just copied from the gff3 to serve as a convenient comparison. However, if there is some additional information/metadata about them that's easy for the user to provide, we could potentially save it for easy viewing in voila downstream, assuming the user has assembled a definitive source on transcripts-to-organs. As our tool is mainly LSV-centered, this is where most of the screen real estate goes. For example, you can look at our majiqlopedia GTEX run, which is located here (choose "normal tissues" or "cancer"): https://majiq.biociphers.org/majiqlopedia/

Thanks,

-San

Alex PG

Nov 17, 2025, 4:26:49 PM

to San Jewell, Biociphers

Hi San,

Thank you so much for your reply! It was super helpful, as always. Huge apologies for the delay!

Yes, Majiqlopedia is super helpful! What I was thinking is that it would be great to have an option to see transcripts or LSVs that are specific to the organ you’re interested in. Or maybe even be able to pick a cancer type in Majiqlopedia and compare it right there in an html file generated by voila view.

For example, I’m looking at splicing events in precancerous lesions, so I’ve been comparing the outputs with Majiqlopedia (normal and cancer) and GTEX. It might be beneficial for interpretation if the info from Majiqlopedia (normal and cancer) regarding a specific organ/cancer type is available as a feature in the output html file of the experiments. But I can imagine it might be a bit time consuming to build.

I also had two more questions:
Where in the output can we see de novo events? In the summary.tsv, I only find de novo introns. Figure 4c in Vaquero-Garcia et al. 2023 shows a nice visualization including the ratio of de novo events, so I was wondering where that information can be found.

Additionally, I noticed that some events in the summary file are associated with the same gene but appear across multiple rows, while others are grouped in a single row with increasing counts. What’s the algorithm behind that difference?

Thank you so much for your time! Really appreciate it.

All the best,

Alex

San Jewell

Nov 18, 2025, 2:05:21 PM

to Biociphers

Hi Alex,

Let me first answer the latter questions:

1) De novo events can be found in the tsv files. For example, summary.tsv, which you mention, has a column called "denovo_juncs" as well as the "denovo_introns" column you found. If you are interested in drilling down into a specific event/gene, I'd highly recommend using the voila view mode to get an idea of the visual splicegraph structure. In Figure 4 of Vaquero-Garcia et al. 2023, part "a" and the top half of "b" are rendered directly by voila. The bottom half of "b" and all of "c" were made specifically for the paper, I believe by Caleb Radens; I (or you) can reach out to him if you are interested in the source code for those figures. Figure 4c itself is derived almost directly from the modulizer summary.tsv, by counting up the number of each event type found.

2) I think the reason for the confusion here is that summary.tsv is broken down primarily by modules, not by genes, so one gene can span several rows. Please take a look at the modulizer section of the documentation to get an idea of how modules are defined: https://biociphers.bitbucket.io/majiq-docs/modulizer/how-modulizer-works.html. For less module-focused breakdowns, the other files that modulizer outputs beyond summary.tsv may be more relevant for your purposes.
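
The counting described in 1) could be sketched roughly as follows. The column names "denovo_juncs" and "denovo_introns" come from this thread; everything else is an assumption for illustration, including the "module_id" and "gene_name" columns, the True/False cell encoding, and the absence of comment lines before the header. Check your own summary.tsv before relying on it.

```python
import csv
import io


def count_denovo(tsv_handle, columns=("denovo_juncs", "denovo_introns")):
    """Count rows (modules) flagged as de novo in each of the given columns.

    Assumes one module per row and cells encoded as "True"/"False".
    """
    counts = {column: 0 for column in columns}
    total = 0
    for row in csv.DictReader(tsv_handle, delimiter="\t"):
        total += 1
        for column in columns:
            if (row.get(column) or "").strip().lower() == "true":
                counts[column] += 1
    return total, counts


# Made-up stand-in for a summary.tsv; with a real file, pass open(path, newline="").
sample = io.StringIO(
    "module_id\tgene_name\tdenovo_juncs\tdenovo_introns\n"
    "geneA_1\tGENEA\tTrue\tFalse\n"
    "geneA_2\tGENEA\tFalse\tFalse\n"
    "geneB_1\tGENEB\tTrue\tTrue\n"
)
total, counts = count_denovo(sample)
print(total, counts)
```

Note that GENEA occupies two rows in the sample, one per module, which mirrors the module-based layout described in 2).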

So, back to the questions about voila view improvements: I think I still need a little more information about what you are envisioning, though I will also pass this thread along in the main chat of my lab space to see if they want to comment here. First off, I don't think there will generally be a direct link between user majiq runs and majiqlopedia, because majiq is designed to be run over any annotation / species, so for many analyses there will be no link, and it feels inefficient to make a feature specific to one special case. You also mention "seeing transcripts or LSVs which are specific to an organ". Functionally, I think it breaks down into each of those questions separately. Say that instead of defining by "organ" we define by a group (like in majiqlopedia, where there is one group per organ):

I) Transcript to group mapping. We do not quantify groups based on transcripts, so in this case the mapping would have to come from an external source. I can imagine this being written to the gff3 itself as a feature, or as a separate file type that could be supplied to voila with a mapping of transcript_id to group/organ name. Functionally, it would then be a simple case of displaying an annotation organ association per listed transcript at the top of the splicegraph, for visual comparison. If this sounds like what you would like, then I'd basically like to know what file the user would get with this mapping, or how you expect you would supply it to voila if you wanted to use this feature.

II) LSV to group mapping. In this case it sounds like there would need to be some method to "call" a certain LSV as dominated by a certain group/organ in some situation, but it is less straightforward how to narrow a decision in that way. For example, if there is a junction in the LSV which is heavily used and shows high PSI in one organ, and all other organs show a high PSI in some other junction, what information would that provide? I suppose for each junction in each LSV you could check which group it might be dominated by, but this is already pretty much what the voila plots show in voila view, and you wouldn't be able to call the junction itself this way, as it might be included in multiple different LSVs which disagree with one another. I'm interested to hear your thoughts on this.
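
As a rough sketch of what option I) might look like from the user's side: the two-column TSV layout, the function names, and the transcript IDs below are all hypothetical. Voila does not currently accept such a file; this only shows how simple the mapping could be.

```python
import csv
import io


def load_transcript_groups(handle):
    """Read a hypothetical two-column TSV: transcript_id <tab> group/organ name."""
    mapping = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2 or row[0].startswith("#"):
            continue  # skip blank, malformed, and comment lines
        mapping.setdefault(row[0], []).append(row[1])
    return mapping


def label_transcripts(transcript_ids, mapping):
    """Append group names to each transcript label, if any are known."""
    labels = []
    for transcript_id in transcript_ids:
        groups = mapping.get(transcript_id)
        suffix = f" [{', '.join(groups)}]" if groups else ""
        labels.append(transcript_id + suffix)
    return labels


# Made-up mapping file content and transcript IDs.
mapping_file = io.StringIO(
    "# transcript_id\tgroup\n"
    "TX_EXAMPLE_1\tcolon\n"
    "TX_EXAMPLE_1\tliver\n"
    "TX_EXAMPLE_2\tbrain\n"
)
mapping = load_transcript_groups(mapping_file)
labels = label_transcripts(["TX_EXAMPLE_1", "TX_EXAMPLE_2", "TX_EXAMPLE_3"], mapping)
print(labels)
```

A transcript may map to several groups, and transcripts with no entry are simply shown without a label.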