Hi Alex,
I will start by describing the column names you were curious about:
event_size: the difference in coordinates between the beginning of the first exon and the end of the last exon in the event
junction_changing: whether this junction is confidently changing across the combination of DPSI and HET input files, given the thresholds provided on the command line or their defaults (see $ voila modulize --help). At least one of these quantified inputs must pass the between-group-psi and pvalue thresholds for the junction to be marked as true
event_non_changing/event_changing: similar to above, except rather than for the specific junction/row, this is summarized over the event. The logic is as follows:
-to be marked as event changing = true, at least one junction in the event must be confidently changing according to the thresholds mentioned earlier
-to be marked as event non changing = true, ALL of the junctions in the event must be confidently non changing according to the non-changing thresholds
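As a rough illustration of the any/all logic above (this is only a sketch, not the actual voila code; the function name and the per-junction "changing" / "non_changing" keys are hypothetical and assume each junction has already been compared against the thresholds):

```python
def summarize_event(junctions):
    """Summarize per-junction flags over one event.

    `junctions` is a list of dicts with boolean keys "changing" and
    "non_changing" (hypothetical names, one dict per junction).
    """
    # event_changing: at least one junction is confidently changing
    event_changing = any(j["changing"] for j in junctions)
    # event_non_changing: ALL junctions are confidently non-changing
    # (an empty event is treated as not non-changing; this is an assumption)
    event_non_changing = bool(junctions) and all(j["non_changing"] for j in junctions)
    return event_changing, event_non_changing


# One changing junction is enough to mark the whole event as changing,
# and it also prevents the event from being marked non-changing.
event = [
    {"changing": True, "non_changing": False},
    {"changing": False, "non_changing": True},
    {"changing": False, "non_changing": True},
]
print(summarize_event(event))  # (True, False)
```

Note that an event can end up with both flags false, for example when no junction is confidently changing but at least one junction is not confidently non-changing either.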
For the second question, about the lsv_id: each row in these output files corresponds to one junction, from one "perspective", which should, in the majority of cases, limit the lsv_id column to a single LSV ID. For example, even though a cassette event has only three junctions, four rows are output, because the skipping junction is part of both a 5' source LSV and a 3' target LSV. In this way, one can see the quantifications for both sides of the junction.
The coordinate part of the lsv_id is the coordinates of the reference exon in the splicing event. For example, if you look at the reference exon coordinates in the file you attached, you can see that they match the latter part of the lsv_id.
Does this answer your question? Let me know when you have a chance.
Thanks!
-San