help with interpreting dpsi output

Jee Min Kim

Feb 26, 2026, 1:04:36 PM

to Biociphers

Hello!

I have a question about how de novo flags are defined in MAJIQv3 dpsi output. To get the dpsi output, I ran the following commands:
$ majiq-v3 gff3 \
gencode.v44.nochr.annotation.gff3 \
majiq_v3_results/annotations/sg.zarr
$ majiq-v3 sj 07_REP1-nosgRNA_S7.sorted.bam majiq_v3_results/annotations/sg.zarr majiq_v3_results/sj/07_REP1-nosgRNA_S7.sj
$ majiq-v3 psi-coverage majiq_v3_results/build/sg.zarr majiq_v3_results/psi/nosgRNA.psicov \
majiq_v3_results/sj/07_REP1-nosgRNA_S7.sj \
majiq_v3_results/sj/13_REP2-nosgRNA_S12.sj \
majiq_v3_results/sj/19_REP3-nosgRNA_S16.sj
$ majiq-v3 quantify majiq_v3_results/psi/nosgRNA.psicov \
--min-experiments 0.5 --splicegraph majiq_v3_results/build/sg.zarr \
--output-tsv majiq_v3_results/psi/nosgRNA.tsv --overwrite
$ majiq-v3 deltapsi \
--splicegraph majiq_v3_results/build/sg.zarr \
--output-voila majiq_v3_results/dpsi/nosgRNA-vs-sgNEG.dpsicov \
--output-tsv majiq_v3_results/dpsi/nosgRNA-vs-sgNEG.tsv \
-psi1 majiq_v3_results/psi/nosgRNA.psicov \
-psi2 majiq_v3_results/psi/sgNEG.psicov

The output has four de novo-related columns: `is_denovo`, `ref_exon_denovo`, `other_exon_denovo`, and `event_denovo`. From examples I've looked at, I'm trying to understand exactly what triggers each flag.

For example, I have an event (gene ENSG00000286185, event type 's') where:
- ref_exon: 149543413–149543585
- other_exon: 149548150–149548322
- start-end: 149543434–149548172
- ref_exon_denovo = FALSE, other_exon_denovo = FALSE, but event_denovo = TRUE

The junction start/end fall inside the annotated exon boundaries rather than at their edges, and the event is correctly flagged de novo. This suggests event_denovo captures junction novelty rather than just exon novelty.

However, I have another event (SFI1, event type 's') with a similar coordinate pattern — junction start falls inside the ref exon boundary — but all de novo flags are FALSE. Is the de novo flag purely based on whether the junction was seen in the annotation, and if so, how exactly is a junction matched to annotated junctions given that the reported start/end coordinates don't always align exactly with exon boundaries in the output?

More specifically:
1. What exactly does each de novo flag (`ref_exon_denovo`, `other_exon_denovo`, `event_denovo`) capture?
2. How are the `start`/`end` junction coordinates in the dpsi output defined relative to ref/other exon boundaries — are they raw splice site positions from RNA-seq reads?
3. Is it possible for a junction connecting two annotated exons at a non-canonical splice site to not be flagged as de novo?

Thank you very much in advance for your help with this!

Best,

Jee Min

San Jewell

Feb 26, 2026, 2:26:53 PM

to Biociphers

Hi Jee,

Thank you for your questions!

I think you have understood most things well. The main point of confusion is that the reported exon boundaries are generally not trimmed to match the junction coordinates. They are the final exon boundaries (usually the maximum size) that majiq finds after (1) collapsing all annotated exons together into a splice graph and (2) adding completely de-novo exons, or de-novo extensions to annotated exons, based on RNA-seq data. The easiest way to get a feel for this is to run voila view on your data. As an example, you can look at majiqlopedia, which is based on voila: https://tools.biociphers.org/majiqlopedia_normal/gene/gene:ENSG00000094914/ Note that exons 17 and 18 can be spliced together in a number of different ways, with multiple possible annotated junctions and multiple possible de-novo junctions; in the output, however, the exon coordinates will always be 53307738-53307929. (Note: there is an exception in voila modulizer, where exon trimming is possible in order to exclude non-spliced exon parts which are only alternate start or end.)
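As a small illustration of the point above, here is a sketch in Python that checks where each end of a junction sits relative to the reported exon boundaries. It uses the coordinates of the ENSG00000286185 event from the question; the `Interval` type and the `junction_end_position` helper are made up for this example and are not part of majiq or voila.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    start: int
    end: int


def junction_end_position(exon: Interval, coord: int) -> str:
    """Describe where a junction coordinate sits relative to an exon."""
    if coord in (exon.start, exon.end):
        return "at exon edge"
    if exon.start < coord < exon.end:
        return "inside exon"
    return "outside exon"


# Coordinates copied from the event described in the question.
ref_exon = Interval(149543413, 149543585)
other_exon = Interval(149548150, 149548322)
junction = Interval(149543434, 149548172)

ref_position = junction_end_position(ref_exon, junction.start)
other_position = junction_end_position(other_exon, junction.end)
print(ref_position, "|", other_position)  # inside exon | inside exon
```

Both junction ends fall inside the reported exons, which is expected when the exon boundaries are the merged, maximum-extent ones rather than boundaries trimmed to a particular junction.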

Given this, to answer your numbered questions:

1) ref_exon_denovo is true if the reference exon is detected only from RNA-seq and not in the annotation. other_exon_denovo is the same, but for the exon at the other end of the junction defined by that output row. event_denovo checks whether a de-novo exon or junction is found in any junction/intron/exon of the entire event.

2) The start/end of a junction is defined by the annotated exon boundary of an annotated transcript (for an annotated junction) or by RNA-seq reads (for a de-novo junction). They may differ from the reported exon boundaries, as explained above.

3) A junction that connects two annotated exons will be marked as de-novo if either the 3' or 5' splice site is not also annotated (by way of a transcript exon/start existing as per (2)), so the answer to your question is no. If the splice site is not annotated, the junction should be de-novo.
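To make the three answers concrete, here is a toy Python model of the flag logic. It is only a sketch of the rules as described in this thread, not majiq's actual implementation. The `Exon` and `Row` classes, the helper functions, and the annotated junction (149543585, 149548150) are all hypothetical; the sketch also assumes a junction counts as annotated only when its exact (start, end) pair appears in the annotation, and it leaves out `is_denovo`, which is not described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exon:
    start: int
    end: int
    annotated: bool  # False: exon detected only from RNA-seq


@dataclass(frozen=True)
class Row:
    """One output row: a junction and the two exons it connects."""
    ref_exon: Exon
    other_exon: Exon
    start: int
    end: int


def junction_denovo(row, annotated_junctions):
    # Assumption: annotated means the exact (start, end) pair is in the annotation.
    return (row.start, row.end) not in annotated_junctions


def row_flags(row, event_rows, annotated_junctions):
    # event_denovo looks at every row of the event, not only the current one.
    event_denovo = any(
        junction_denovo(r, annotated_junctions)
        or not r.ref_exon.annotated
        or not r.other_exon.annotated
        for r in event_rows
    )
    return {
        "ref_exon_denovo": not row.ref_exon.annotated,
        "other_exon_denovo": not row.other_exon.annotated,
        "event_denovo": event_denovo,
    }


ref = Exon(149543413, 149543585, annotated=True)
other = Exon(149548150, 149548322, annotated=True)
annotated_junctions = {(149543585, 149548150)}  # hypothetical annotated junction

novel_row = Row(ref, other, 149543434, 149548172)      # junction from the question
annotated_row = Row(ref, other, 149543585, 149548150)  # hypothetical second row
event = [novel_row, annotated_row]

flags_novel = row_flags(novel_row, event, annotated_junctions)
flags_annotated = row_flags(annotated_row, event, annotated_junctions)
print(flags_novel)
print(flags_annotated)
```

In this toy event both exon flags are false on every row, yet event_denovo is true on both rows, including the one for the annotated junction, because one junction in the event is not in the annotation.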

Please let me know if it makes sense or if you have further questions!

Thanks,

-San

Jee Min Kim

Feb 27, 2026, 12:52:31 PM

to Biociphers

Hi San,

Thank you so much for your detailed and helpful explanation! It really clarified a lot about how the de novo flags and exon boundaries work!

I have a few follow-up questions if you don't mind:

1. I ran voila view on my deltapsi output and it works well. Is there a way to run voila view on a voila modulize output instead? (i.e., to look for genes with alt3ss by the voila modulize output)

2. When I looked at the alt3ss.tsv output from voila modulize, I found some genes with good deltapsi values that appear to have changing events. However, when I tried to find these genes in voila view, I couldn't locate them. Do you know why a gene might appear in the modulize TSV output but not be visible in voila view?

Thank you again for your time and help!

Best,
Jee Min

San Jewell

Mar 4, 2026, 12:54:30 PM

to Biociphers

Hi Jee,

I thought I responded to this but maybe groups ate it. >.<

1) There isn't a way to directly use modulizer output as an input to voila view at this time. It is possible to use the switch "--enable-type-indexing", which will build some of the types using a simpler algorithm that is older but faster than modulizer (you might also need --force-index if you already have a voila index generated from a prior run). This option creates checkboxes at the top of the LSV index page that let you limit the list to LSVs of a certain type. It is not as comprehensive, but it might still help in your case.

2) The LSV list is restricted by the p-value and psi/dpsi thresholds at the top of the index page table. Have you tried adjusting these to be as permissive as possible? If you cannot get this to work, it may be because of an edge case where the simpler dpsi index step-function calculation disagrees with the more accurate filter of the modulized table. It should still show in the table at maximally permissive filters, though. If you still cannot get it to show, it's also possible to bypass that table by manually typing the gene_id into the URL bar, for example http://voila_url:voila_port/gene/<gene_id>
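If you have several genes from the modulizer TSV to check, a short Python sketch like the one below can build those gene-page URLs for you. The host `localhost` and port `5000` are placeholders for wherever your voila view instance is actually running, and `voila_gene_url` is just a helper name made up for this example.

```python
from urllib.parse import quote


def voila_gene_url(host: str, port: int, gene_id: str) -> str:
    """Build a direct gene-page URL following the /gene/<gene_id> pattern."""
    # Keep ':' unescaped in case gene IDs carry a prefix such as "gene:".
    return f"http://{host}:{port}/gene/{quote(gene_id, safe=':')}"


# Placeholder host and port; replace with your own voila view address.
url = voila_gene_url("localhost", 5000, "ENSG00000286185")
print(url)  # http://localhost:5000/gene/ENSG00000286185
```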

Let me know if any of these options help!

Thanks,

-San
