MAJIQ Build question

Maoting Chen

Jan 26, 2026, 11:59:39 AM

to Biociphers

Hi,

I have a sequencing dataset consisting of 4 conditions, each with 4 replicates. My goal is to compare alternative splicing among the 4 conditions.

I have a question about how the MAJIQ build function handles the dataset if I run the build command on all 16 samples together with grouping info specified (there will then be 4 build groups). According to the documentation, only one splicegraph will be generated. For an LSV that passes the filters in one build group but not in another, will this LSV still be in the splicegraph? Or will only the LSVs that pass the filters in all build groups be in the splicegraph?

If an LSV that passes in only one build group is in the splicegraph, then for the downstream dPSI calculation between different build groups, how is the splicing change calculated given that this LSV is omitted in the other conditions?

If I want to get reasonable PSIs and dPSIs for all LSVs from all four conditions, how should I perform the build command?

Really appreciate your help in advance!

Thank you,

Maoting

Matthew Gazzara

Jan 26, 2026, 12:45:07 PM

to Biociphers

Hi Maoting,

Thanks for your interest. Yes, building all 16 samples together with grouping according to experimental groups is the way to go in this case. This ensures the LSV definitions (including LSV ID strings, junctions in the splice graph, ordering of junctions in the output) will be the same in the downstream outputs (when they make it to the outputs, see below) and are thus more easily comparable.

To decide which LSVs are in the data and which junctions are associated with each LSV, filtering heuristics are applied at two stages. At the build stage: is the LSV/junction reliably detected in at least one group? At the quantification stage: is the LSV quantifiable in this group (for PSI) or in both groups (for dPSI or HET)? The reliability thresholds only have to be met in at least one group for that LSV or de novo junction to be part of the splicegraph. If the LSV was not reliably detected in another group, quantifications involving that group will not output the LSV, although it still exists in the splicegraph with the same junctions, exon boundaries, etc.
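
The build-stage rule described above can be sketched in a few lines of Python. This is only an illustration of the "reliably detected in at least one group" logic, not MAJIQ's actual code: the function names, the group labels, and the per-sample pass/fail flags are all hypothetical, and the per-sample read filters themselves are assumed to have been applied already.

```python
from typing import Dict, List


def passes_group(sample_flags: List[bool], min_experiments: int) -> bool:
    """True if enough samples in one build group pass the per-sample filters."""
    return sum(sample_flags) >= min_experiments


def in_splicegraph(flags_by_group: Dict[str, List[bool]], min_experiments: int) -> bool:
    """Keep the LSV if at least one build group reliably detects it."""
    return any(passes_group(flags, min_experiments) for flags in flags_by_group.values())


# Hypothetical LSV that is well covered in condition A only.
lsv_flags = {
    "A": [True, True, True, False],
    "B": [False, False, True, False],
    "C": [False, False, False, False],
    "D": [False, True, False, False],
}

# Groups in which the LSV itself meets the threshold (3 of 4 samples here).
groups_detecting = [g for g, flags in lsv_flags.items() if passes_group(flags, 3)]

print(in_splicegraph(lsv_flags, min_experiments=3))  # True
print(groups_detecting)                              # ['A']
```

In this toy case the LSV enters the shared splicegraph because of group A, while quantifications involving B, C, or D would be the ones expected to drop it.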

In the dPSI quantification step the quantifiable filters must be met in both groups being compared for the LSV to make it to the output. You could run PSI on each experimental group separately if you want quantifications for every LSV in that group or capture any that are unique to a specific group. In my experience the heuristics and filters help clean out low read coverage events, but if you are really interested in detecting and quantifying everything even with low read evidence you can consider altering these filters or changing the minimum number of experiments you need for the LSV to be reliably detected at the build stage. By default this is 51% of your samples (so it would be 3 out of 4), so you could consider changing this to 2 or even 1 sample if you want to be maximally permissive.
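
As a rough worked example of the numbers above, the sketch below turns a minimum-experiments setting into a sample count and applies the "both groups must be quantifiable" rule for dPSI. It rests on assumptions that are not confirmed in this thread: that a value below 1 is treated as a fraction of the group size and rounded up, that a value of 1 or more is an absolute count capped at the group size, and that the quantification stage uses the same kind of threshold. All function names are hypothetical.

```python
import math


def required_experiments(min_experiments: float, n_samples: int) -> int:
    """Convert a min-experiments setting into a number of samples.

    Assumption: values < 1 are fractions of the group size (rounded up);
    values >= 1 are absolute counts, capped at the group size.
    """
    if min_experiments < 1:
        return max(1, math.ceil(min_experiments * n_samples))
    return min(int(min_experiments), n_samples)


def dpsi_reported(passing_a: int, passing_b: int,
                  n_a: int = 4, n_b: int = 4,
                  min_experiments: float = 0.51) -> bool:
    """The LSV reaches the dPSI output only if both groups meet the threshold."""
    need_a = required_experiments(min_experiments, n_a)
    need_b = required_experiments(min_experiments, n_b)
    return passing_a >= need_a and passing_b >= need_b


print(required_experiments(0.51, 4))           # 3
print(required_experiments(2, 4))              # 2
print(dpsi_reported(4, 2))                     # False
print(dpsi_reported(4, 2, min_experiments=2))  # True
```

With 4 replicates per group, 51% rounds up to 3 samples; relaxing the setting to 2 lets an LSV seen in only 2 replicates of one group through to the comparison.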

Hope this helps and feel free to follow up if anything was unclear.

All the best,

Matt Gazzara

Chen, Maoting

Jan 26, 2026, 2:20:42 PM

to Matthew Gazzara, Biociphers

Hi Matthew,

Thank you for your prompt response.

I do want a systematic comparison of splicing profiles across all four conditions. It would therefore be nice to have a common splicegraph and to have the LSVs show up in all samples.

Do you think it is reasonable to build all 16 samples as 1 group and set the minimum number of experiments to 16, to make sure the detected LSVs exist in all samples, while lowering --minreads to 1 to loosen the filter? And then introduce the group information later, in the quantification step?

Also, if I want to get the PSI values of an LSV in each sample in addition to the mean PSI of a condition, how should I do that?

Thank you,

Maoting

San Jewell

Jan 28, 2026, 2:39:29 PM

to Biociphers

Hi Maoting,

Are you using v2 or v3? In general, most of the downstream outputs summarize over groups. This can be sidestepped to show per-sample PSI by putting each sample in its own group. Additionally, a one-off use case was added to voila tsv mode with the switch "--show-per-sample-psi", but it was only implemented for heterogen mode; it can be added to other modes in the future if that is considered useful. Finally, if you are using voila view mode, individual samples appear as a swarm plot on top of the violin plots, and hovering over each point in the swarm shows the per-sample PSI.
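
To make the "each sample in its own group" idea concrete, here is a small Python sketch that lays out the same 16 samples two ways: grouped by condition, and as 16 single-sample groups. The condition and replicate names are hypothetical placeholders, and the sketch only builds the group-to-sample mapping; how that mapping is passed to MAJIQ depends on the version in use and is not shown.

```python
# Hypothetical condition labels and replicate names for a 4 x 4 design.
conditions = ["condA", "condB", "condC", "condD"]

# Layout 1: one group per condition (outputs summarize over 4 replicates).
by_condition = {c: [f"{c}_rep{i}" for i in range(1, 5)] for c in conditions}

# Layout 2: one group per sample (each "group" summary is a single sample).
per_sample = {s: [s] for reps in by_condition.values() for s in reps}

for name, members in list(per_sample.items())[:2]:
    print(name, members)

print(len(by_condition), len(per_sample))  # 4 16
```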

Let me know if this helps.

Matt can perhaps say more about the --min-experiments 16 question; it sounds stricter than most filters I've seen used in practice, but perhaps it makes sense for some use cases.

Thanks,

-San

Maoting Chen

Jan 29, 2026, 12:00:02 PM

to Biociphers

Hi San,

Thank you for your explanation. It helps a lot. I am using v2.

I have a quick question: for an LSV that contains a junction annotated in the DB but with 0 reads detected in the sequencing data, while the other junction passes the filters, will this LSV still be reported and quantified in the downstream analysis?

I would also like to confirm that when an LSV is reported in one group/replicate but not in another, the only reason is that it passes the filters in the first group/replicate but not in the other. Is that correct?

And looking forward to more comments on the --min-experiments 16 question. Thank you!

Best,

Maoting

San Jewell

Jan 29, 2026, 4:11:33 PM

to Biociphers

Hi Maoting,

- If there is a zero-read junction and only one other annotated/de novo junction at a potential LSV location, this is still enough to create an LSV definition, because we treat the annotation as providing some initial prior evidence. However, the PSI value for the no-read junction will be so low that the default downstream filters will remove it in most cases. One notable change from v2 to v3 is that in v2 annotated introns are not given this default weight, whereas in v3 they are. The behavior of junctions, however, is the same in v2 and v3 in this regard.
- The LSV creation logic in general looks for the pattern of the LSV to exist in at least one group/replicate in order for it to appear in the list of LSVs. (The other groups may not have it quantified, but during the build the calculated LSVs are pulled into a consensus across all groups for downstream quantification.)
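
To see why a zero-read annotated junction ends up with a very low PSI, here is a deliberately simplified Python sketch that treats the annotation prior as a small pseudo-count added to each junction's reads. This is not MAJIQ's actual quantification model; the function name and the pseudo-count value of 0.5 are assumptions chosen only for illustration.

```python
from typing import List


def psi_with_prior(reads: List[float], prior: float = 0.5) -> List[float]:
    """Posterior-mean style PSI: (reads + pseudo-count) / total, per junction."""
    totals = [r + prior for r in reads]
    denom = sum(totals)
    return [t / denom for t in totals]


# Hypothetical two-junction LSV: annotated junction with 0 reads, other with 100.
psi = psi_with_prior([0, 100])
print([round(p, 4) for p in psi])
```

Under these assumptions the zero-read junction receives a PSI of 0.5 / 101, i.e. well under 1%, which is the kind of value a downstream minimum-PSI or dPSI filter would typically discard.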

Thanks,

-San

Maoting Chen

Feb 24, 2026, 9:16:48 PM

to Biociphers

Hi MAJIQ team,

Thank you for the detailed explanations. I have two more quick questions.

1. Related to my previous question about majiq build: for my 16 samples (4 conditions, 4 replicates per condition), is there any major difference between building by group (with MIN_EXP set to 0.5) and building all 16 samples as one group (with MIN_EXP set to 3 to include potential condition-specific LSVs), other than the latter approach having looser constraints? Which approach is more common for AS analysis?

2. Regarding the reproducibility of MAJIQ analysis: are the LSVs and the PSI for each LSV reproducible when the analysis is repeated? Or is there some randomness during PSI quantification?