Hello!
I have a question about how de novo flags are defined in the MAJIQv3 dpsi output. To get the dpsi output, I ran the following commands:
```bash
majiq-v3 gff3 \
    gencode.v44.nochr.annotation.gff3 \
    majiq_v3_results/annotations/sg.zarr

majiq-v3 sj 07_REP1-nosgRNA_S7.sorted.bam majiq_v3_results/annotations/sg.zarr majiq_v3_results/sj/07_REP1-nosgRNA_S7.sj

majiq-v3 psi-coverage majiq_v3_results/build/sg.zarr majiq_v3_results/psi/nosgRNA.psicov \
    majiq_v3_results/sj/07_REP1-nosgRNA_S7.sj \
    majiq_v3_results/sj/13_REP2-nosgRNA_S12.sj \
    majiq_v3_results/sj/19_REP3-nosgRNA_S16.sj

majiq-v3 quantify majiq_v3_results/psi/nosgRNA.psicov \
    --min-experiments 0.5 --splicegraph majiq_v3_results/build/sg.zarr \
    --output-tsv majiq_v3_results/psi/nosgRNA.tsv --overwrite

majiq-v3 deltapsi \
    --splicegraph majiq_v3_results/build/sg.zarr \
    --output-voila majiq_v3_results/dpsi/nosgRNA-vs-sgNEG.dpsicov \
    --output-tsv majiq_v3_results/dpsi/nosgRNA-vs-sgNEG.tsv \
    -psi1 majiq_v3_results/psi/nosgRNA.psicov \
    -psi2 majiq_v3_results/psi/sgNEG.psicov
```
The output has four de novo-related columns: `is_denovo`, `ref_exon_denovo`, `other_exon_denovo`, and `event_denovo`. From examples I've looked at, I'm trying to understand exactly what triggers each flag.
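For context, here is a minimal Python sketch of how the combinations of these four flags can be tallied from the dpsi TSV. It is not part of MAJIQ. The helper name `count_flag_combinations` is hypothetical, the sample rows and the `gene_id` column are made up for illustration, and I am assuming the file is tab-separated, may start with `#` comment lines, and stores the flags as plain text.

```python
import csv
import io
from collections import Counter

# The four de novo-related columns named in the dpsi output.
FLAG_COLUMNS = ["is_denovo", "ref_exon_denovo", "other_exon_denovo", "event_denovo"]


def count_flag_combinations(tsv_file):
    """Count how often each combination of the four flag values occurs.

    Assumption: lines starting with '#' are metadata and can be skipped.
    Flag values are compared as raw strings, so their encoding does not matter.
    """
    rows = csv.DictReader(
        (line for line in tsv_file if not line.startswith("#")),
        delimiter="\t",
    )
    return Counter(tuple(row[col] for col in FLAG_COLUMNS) for row in rows)


# Made-up rows, for illustration only (not real MAJIQ output).
sample = io.StringIO(
    "# hypothetical metadata line\n"
    "gene_id\tis_denovo\tref_exon_denovo\tother_exon_denovo\tevent_denovo\n"
    "geneA\tFALSE\tFALSE\tFALSE\tTRUE\n"
    "geneB\tFALSE\tFALSE\tFALSE\tTRUE\n"
    "geneC\tFALSE\tFALSE\tFALSE\tFALSE\n"
    "geneD\tTRUE\tTRUE\tFALSE\tTRUE\n"
)

counts = count_flag_combinations(sample)
for combination, n in counts.most_common():
    print(dict(zip(FLAG_COLUMNS, combination)), n)
```

On the real output, the same function could be called on an open handle to `majiq_v3_results/dpsi/nosgRNA-vs-sgNEG.tsv`, which would show which flag combinations actually occur together.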
For example, I have an event (gene ENSG00000286185, event type 's') where:
- ref_exon: 149543413–149543585
- other_exon: 149548150–149548322
- start-end: 149543434–149548172
- ref_exon_denovo = FALSE, other_exon_denovo = FALSE, but event_denovo = TRUE
The junction start/end fall inside the annotated exon boundaries rather than at their edges, and the event is correctly flagged de novo. This suggests event_denovo captures junction novelty rather than just exon novelty.
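To make the coordinate pattern concrete, this is roughly the check I am doing by hand, written as a small Python sketch. The function `junction_vs_exons` is hypothetical and not part of MAJIQ. It assumes inclusive coordinates and that the ref exon lies before the other exon in genomic order, as in this example.

```python
def junction_vs_exons(ref_exon, other_exon, junction):
    """Report where each junction end sits relative to the reported exons.

    All arguments are (start, end) tuples with inclusive coordinates.
    Assumption: ref_exon comes before other_exon in genomic order, so the
    junction start is compared with the ref exon's end and the junction end
    with the other exon's start.
    """
    ref_start, ref_end = ref_exon
    other_start, other_end = other_exon
    junction_start, junction_end = junction

    def classify(position, exon_start, exon_end, edge):
        if position == edge:
            return "at exon edge"
        if exon_start <= position <= exon_end:
            return "inside exon"
        return "outside exon"

    return {
        "junction_start": classify(junction_start, ref_start, ref_end, ref_end),
        "junction_end": classify(junction_end, other_start, other_end, other_start),
    }


# Coordinates of the ENSG00000286185 event described above.
result = junction_vs_exons(
    ref_exon=(149543413, 149543585),
    other_exon=(149548150, 149548322),
    junction=(149543434, 149548172),
)
print(result)
# {'junction_start': 'inside exon', 'junction_end': 'inside exon'}
```

Both junction ends land inside the reported exons rather than on their edges, which is the pattern I am asking about.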
However, I have another event (SFI1, event type 's') with a similar coordinate pattern, where the junction start falls inside the ref exon boundary, but all de novo flags are FALSE. Is the de novo flag based purely on whether the junction is present in the annotation? If so, how exactly is a junction matched to annotated junctions, given that the reported start/end coordinates don't always align exactly with the exon boundaries in the output?
More specifically:
1. What exactly does each de novo flag (`ref_exon_denovo`, `other_exon_denovo`, `event_denovo`) capture?
2. How are the `start`/`end` junction coordinates in the dpsi output defined relative to the ref/other exon boundaries? Are they raw splice site positions from the RNA-seq reads?
3. Is it possible for a junction connecting two annotated exons at a non-canonical splice site to not be flagged as de novo?
Thank you very much in advance for your help with this!
Best,
Jee Min