modulized output nomenclature unclear


Alex PG

Feb 12, 2026, 11:48:43 AM
to Biociphers
Hi MAJIQ community,

I am currently analyzing MAJIQ results from cancer RNA-seq data compared to normal tissue and have a few questions regarding the definitions of specific terms in some of the output TSV files.
I reviewed the documentation page. The output overview lists these terms, but I could not find detailed definitions. If they are defined elsewhere, I would greatly appreciate a pointer to the appropriate link.

For the cassette.tsv file, could you give the definitions of the following terms?

event_size, event_non_changing, event_changing, junction_changing

In one case, I observed that a cassette module reports different results for what appears to be the same splicing event, so I would like to better understand how these terms are defined and calculated.

For the alt3prime.tsv file, I have a question about how lsv_id is defined. In the cassette file, the lsv_id appears to correspond to the gene ID and the coordinates of the reference exon. However, in the alt3prime file, the coordinate included in the lsv_id does not seem to match any other coordinates listed for that event, so I am unsure how it is determined.

I have attached an Excel file with example events (each event type in a separate sheet) for reference.

Thank you very much for your help.

All the best,
Alex
examples_majiq.xlsx

San Jewell

Feb 20, 2026, 3:45:10 PM
to Biociphers
Hi Alex, 

I will start by describing the column names you were curious about:

event_size: the difference in coordinates between the start of the first exon and the end of the last exon in the event.
junction_changing: whether this junction is confidently changing, given the thresholds provided on the command line or their defaults (see $ voila modulize --help), when looking at the combination of DPSI and HET input files. To be marked true, at least one of these quantified inputs must pass the between-group-psi and pvalue thresholds.
event_non_changing/event_changing: similar to the above, but summarized over the whole event rather than for the specific junction/row. The logic is as follows:
    - to be marked event_changing = true, at least one junction in the event must be confidently changing according to the thresholds mentioned earlier
    - to be marked event_non_changing = true, ALL of the junctions in the event must be confidently non-changing according to the non-changing thresholds
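
As a rough illustration of the logic above (not the actual voila code), the flags could be combined like this. The JunctionCall class and its fields are hypothetical names made up for this sketch; it assumes the per-input threshold checks have already been reduced to booleans.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class JunctionCall:
    """Hypothetical per-junction summary; not a MAJIQ/voila data structure."""
    # One flag per DPSI/HET input: did that input pass the changing thresholds?
    changing_per_input: List[bool]
    # Did the junction pass the non-changing thresholds?
    non_changing: bool


def junction_changing(call: JunctionCall) -> bool:
    # True if at least one quantified input passes the changing thresholds.
    return any(call.changing_per_input)


def summarize_event(calls: Iterable[JunctionCall]) -> Tuple[bool, bool]:
    """Return (event_changing, event_non_changing) for one event."""
    calls = list(calls)
    # At least one junction must be confidently changing.
    event_changing = any(junction_changing(c) for c in calls)
    # ALL junctions must be confidently non-changing.
    # Guard against an empty event, since all() of nothing is True.
    event_non_changing = bool(calls) and all(c.non_changing for c in calls)
    return event_changing, event_non_changing


def event_size(first_exon_start: int, last_exon_end: int) -> int:
    # Coordinate difference between the start of the first exon
    # and the end of the last exon in the event.
    return last_exon_end - first_exon_start
```

Under this logic an event can have both flags false, for example when no junction is confidently changing but at least one junction is not confidently non-changing.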

For the second question, about the lsv_id: each row in these output files corresponds to one junction, from one "perspective", which should, in the majority of cases, limit the lsv_id column to a single LSV ID. For example, even though a cassette event has only three junctions, four rows are output, because the skipping junction is part of both a 5' source LSV and a 3' target LSV. In this way, one can see the quantifications for both sides of the junction.

The coordinate part of the lsv_id is the coordinates of the reference exon in the splicing event. For example, if you look at the reference exon coordinates in the file you attached, you can see that they match the latter part of the lsv_id. Does this answer your question? Let me know when you have a chance.
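
If it helps to check this across many rows, here is a minimal sketch that splits an lsv_id and compares its coordinate part to a reference exon coordinate string. It assumes the lsv_id has the layout gene_id:s-or-t:start-end and that the reference exon coordinate is written as start-end; the example IDs below are made up, and the function names are hypothetical.

```python
from typing import Tuple


def parse_lsv_id(lsv_id: str) -> Tuple[str, str, int, int]:
    """Split an lsv_id assumed to look like 'GENE:s:100-200'.

    rsplit is used so that a gene ID containing ':' stays intact.
    """
    gene_id, lsv_type, coords = lsv_id.rsplit(":", 2)
    start, end = coords.split("-")
    return gene_id, lsv_type, int(start), int(end)


def matches_reference_exon(lsv_id: str, reference_exon_coord: str) -> bool:
    """True if the coordinate part of lsv_id equals the reference exon coords."""
    _, _, start, end = parse_lsv_id(lsv_id)
    ref_start, ref_end = (int(x) for x in reference_exon_coord.split("-"))
    return (start, end) == (ref_start, ref_end)


# Made-up example values, not taken from the attached file.
print(parse_lsv_id("GENE1:s:100-200"))
print(matches_reference_exon("GENE1:t:500-650", "500-650"))
```

Rows where the comparison is false would be the ones worth a closer look.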

Thanks!
-San

