voila modulize fails on heterogen output when using separate replicates (but works with combined psicov)

Andranik Ivanov

Sep 14, 2025, 7:13:27 AM

to Biociphers

Hi everyone,

I'm running into an issue when trying to run voila modulize on heterogen results from MAJIQ v3.

If I combine all my WT replicates and all my KO replicates into combined psicov files and then run majiq heterogen, voila modulize runs without any errors.

Example of the working script:

majiq-v3 sj star_2pass.KO1.Aligned.sortedByCoord.out.bam KO1.sj --prefix KO1 -- --overwrite
majiq-v3 sj star_2pass.KO2.Aligned.sortedByCoord.out.bam KO2.sj --prefix KO2 -- --overwrite
majiq-v3 sj star_2pass.KO3.Aligned.sortedByCoord.out.bam KO3.sj --prefix KO3 -- --overwrite

majiq-v3 sj star_2pass.WT1.Aligned.sortedByCoord.out.bam WT1.sj --prefix WT1 -- --overwrite
majiq-v3 sj star_2pass.WT2.Aligned.sortedByCoord.out.bam WT2.sj --prefix WT2 -- --overwrite
majiq-v3 sj star_2pass.WT3.Aligned.sortedByCoord.out.bam WT3.sj --prefix WT3 -- --overwrite

majiq-v3 psi-coverage sg.zarr WT.psicov WT1.sj WT2.sj WT3.sj --overwrite
majiq-v3 quantify WT.psicov --min-experiments 0.01 --splicegraph sg.zarr --output-tsv WT.tsv --overwrite

majiq-v3 psi-coverage sg.zarr KO.psicov KO1.sj KO2.sj KO3.sj --overwrite
majiq-v3 quantify KO.psicov --min-experiments 0.01 --splicegraph sg.zarr --output-tsv KO.tsv --overwrite

majiq-v3 heterogen --stats infoscore mannwhitneyu ttest tnom \
--splicegraph sg.zarr --output-voila KO_WT.hetcov --output-tsv KO_WT.tsv \
-psi1 KO.psicov -psi2 WT.psicov --overwrite

voila modulize sg.zarr KO_WT.hetcov KO1.sgc KO2.sgc KO3.sgc WT1.sgc WT2.sgc WT3.sgc \
-d voila_modulizer --debug --debug-num-genes 100

If I instead generate separate psicov and tsv files per replicate and run heterogen using them, the voila modulize step fails for all genes with the following error:

WARNING - Some error processing gene ENSMUSG00000102269.1 , turn on --debug for more info
Traceback (most recent call last):
...
ValueError: 'star_2pass.WT1.Aligned.sortedByCoord.out' is not in list

Example of the failing script:

majiq-v3 psi-coverage sg.zarr WT1.psicov WT1.sj --overwrite
majiq-v3 psi-coverage sg.zarr WT2.psicov WT2.sj --overwrite
majiq-v3 psi-coverage sg.zarr WT3.psicov WT3.sj --overwrite

majiq-v3 quantify WT1.psicov --min-experiments 0.01 --splicegraph sg.zarr --output-tsv WT1.tsv --overwrite
majiq-v3 quantify WT2.psicov --min-experiments 0.01 --splicegraph sg.zarr --output-tsv WT2.tsv --overwrite
majiq-v3 quantify WT3.psicov --min-experiments 0.01 --splicegraph sg.zarr --output-tsv WT3.tsv --overwrite

majiq-v3 heterogen --stats infoscore mannwhitneyu ttest tnom \
--splicegraph sg.zarr --output-voila KO_WT.hetcov --output-tsv KO_WT.tsv \
-psi1 KO1.psicov KO2.psicov KO3.psicov \
-psi2 WT1.psicov WT2.psicov WT3.psicov --overwrite

voila modulize sg.zarr KO_WT.hetcov KO1.sgc KO2.sgc KO3.sgc WT1.sgc WT2.sgc WT3.sgc \
-d voila_modulizer --debug --debug-num-genes 100

This gives the error shown above (ValueError: 'star_2pass.WT1.Aligned.sortedByCoord.out' is not in list).
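
For readers unfamiliar with this message: "... is not in list" is the text Python's built-in list.index() raises when the requested item is absent. The sketch below is only an illustration of that mechanism, assuming (hypothetically) that some step looks up an experiment name in a list of known names. It is not voila's actual code, and the list contents are just the --prefix values from the script above.

```python
# Hypothetical list of experiment names a downstream step might hold
# (here: the --prefix values passed to `majiq-v3 sj`).
known_experiments = ["KO1", "KO2", "KO3", "WT1", "WT2", "WT3"]

# A name derived from the BAM file rather than from the prefix.
bam_derived_name = "star_2pass.WT1.Aligned.sortedByCoord.out"

try:
    known_experiments.index(bam_derived_name)
except ValueError as err:
    # list.index() raises ValueError when the item is missing.
    message = str(err)
    print(f"ValueError: {message}")

# Looking up the prefix itself succeeds.
position = known_experiments.index("WT1")
print(f"WT1 found at position {position}")
```

If something like this is what happens internally, the error would point to a mismatch between the names stored in one file and the names stored in another, rather than to a problem with the data itself.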

Questions

What is causing voila modulize to fail when using separate per-replicate psicov files for heterogen, while it works fine when using combined psicov files?
Conceptually, what are the expected result differences (if any) between running heterogen on:
- combined psicov files per condition (WT.psicov vs KO.psicov)
- versus using each replicate’s psicov file separately (WT1, WT2, WT3 vs KO1, KO2, KO3)?

Thanks a lot for any help or clarification!

San Jewell

Sep 15, 2025, 11:23:31 AM

to Biociphers

Hi Andranik,

In general, there is no difference between running psi-coverage on one prefix and on multiple prefixes at a time, though separate runs would allow you to specify certain filters/options for one group of samples and not another. It looks like the way you have run it, the output psi should be identical for either the grouped or ungrouped run.

I have just tested two identical runs using groups of three sj files, combined and separate as you have, and then run modulizer on each. So far, I have not been able to reproduce the error message. Can you verify that you have the latest version of majiq v3 installed by running the pip install command once again? If the error still persists, I may need to attempt to reproduce it from your input data. Would you be comfortable sharing a minimal reproducible example with me?

Thanks,

-San

Andranik Ivanov

Sep 18, 2025, 11:48:49 AM

to Biociphers

Hi San,

Thanks a lot for looking into this and testing it out. I had a quick follow-up question to make sure I understand the implications correctly.

My current MAJIQ version is 3.0.7.dev1+g7ff3f711. I reran pip install to ensure it’s up to date, but this did not resolve the error when using separate replicate .psicov files: voila modulize still fails. If there’s no difference in how the downstream statistics are calculated, though, then I’m happy to just use the combined .psicov files (which do work with modulizer).

Specifically, could you confirm: when I combine all replicates from one condition into a single .psicov (e.g. WT.psicov and KO.psicov) and run heterogen on just those two files, am I losing replicate-level information for the statistical tests (Mann–Whitney U, t-test, etc.)? Or are those tests still able to use replicate variability somehow from the combined file? I just want to be sure that combining replicates doesn’t flatten the variability and thereby weaken or bias the statistical analysis.

Thanks again for your help and clarification!

Best
Andranik

San Jewell

Sep 18, 2025, 3:02:12 PM

to Biociphers

Hi Andranik,

For my first analysis I was only comparing computed psi values. It looks like I'm seeing some differences in statistics, specifically in ttest values, between the combined and separate psi coverage runs. As this is not an area of the software I'm as familiar with, I'm going to ask some members of my lab about the expected behavior and documentation for running the analysis in both ways you have mentioned.

Whether or not that is the case, though, I do believe there should be no error in running modulizer. I find the specific message you posted very odd, as the name star_2pass.WT1... should already have been overridden by your prefix argument and thus should not even be present in the downstream data. I do have one more thing for you to try before potentially sharing data, though. Can you try giving the groups of your heterogen run names? For example, $ majiq-v3 heterogen -n KO WT (with -psi1 being the KO files and -psi2 being the WT files). Can you check whether it makes any difference this way?
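
To make the suggestion concrete, here is a rough sketch of how the full command from the failing script might look with the group names added. It only assembles and prints the command line. Every flag is copied from this thread, but where -n sits relative to the other arguments is my assumption, so please check majiq-v3 heterogen --help before running it.

```python
import shlex

# Per-replicate psicov files from the failing script.
ko_files = ["KO1.psicov", "KO2.psicov", "KO3.psicov"]
wt_files = ["WT1.psicov", "WT2.psicov", "WT3.psicov"]

# Same heterogen call as before, plus explicit group names via -n.
# Assumption: -n takes the two group names in -psi1, -psi2 order.
cmd = [
    "majiq-v3", "heterogen",
    "-n", "KO", "WT",
    "--stats", "infoscore", "mannwhitneyu", "ttest", "tnom",
    "--splicegraph", "sg.zarr",
    "--output-voila", "KO_WT.hetcov",
    "--output-tsv", "KO_WT.tsv",
    "-psi1", *ko_files,
    "-psi2", *wt_files,
    "--overwrite",
]

# Print a copy-pasteable shell command instead of executing it.
command_line = shlex.join(cmd)
print(command_line)
```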

Thanks,

-San

bsl...@seas.upenn.edu

Oct 10, 2025, 2:39:49 PM

to Biociphers

To follow up, no, you do not lose replicate-level information for statistical tests by combining experiments into one psicov file. However, you would need to explicitly declare the groups to be compared using --select-grp1-prefixes and --select-grp2-prefixes (see majiq heterogen --help).
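
As a generic illustration of why per-experiment values matter to a rank-based test, here is a small, self-contained Mann-Whitney U sketch. This is not MAJIQ's implementation, and the PSI values are invented. It only shows that such a test works on one value per experiment in each group, which is the replicate-level information in question.

```python
from itertools import combinations

def mann_whitney_u(x, y):
    """U statistic for x: pairs (xi, yj) with xi > yj count 1, ties count 0.5."""
    return sum(
        1.0 if xi > yj else 0.5 if xi == yj else 0.0
        for xi in x
        for yj in y
    )

def exact_two_sided_p(x, y):
    """Exact p-value: share of group relabelings at least as extreme as observed."""
    pooled = list(x) + list(y)
    n, m = len(x), len(y)
    center = n * m / 2  # expected U under the null hypothesis
    observed = abs(mann_whitney_u(x, y) - center)
    extreme = total = 0
    for idx in combinations(range(n + m), n):
        chosen = set(idx)
        gx = [pooled[i] for i in idx]
        gy = [pooled[i] for i in range(n + m) if i not in chosen]
        total += 1
        if abs(mann_whitney_u(gx, gy) - center) >= observed:
            extreme += 1
    return extreme / total

# Invented PSI values for one junction, one value per experiment.
ko_psi = [0.82, 0.79, 0.85]
wt_psi = [0.41, 0.45, 0.38]

u_stat = mann_whitney_u(ko_psi, wt_psi)
p_value = exact_two_sided_p(ko_psi, wt_psi)
print(f"U = {u_stat}, exact two-sided p = {p_value}")
```

With three experiments per group there are only 20 possible relabelings, so even perfectly separated groups cannot give an exact two-sided p-value below 0.1 in this kind of test.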

In general, different runs of majiq on identical inputs can yield results which are numerically slightly different. The reason is that majiq intentionally uses random sampling in its execution. Moreover, when more than one thread is used, it is not currently possible to get numerically identical results even with the same random seed. However, the differences due to this random sampling are relatively small. For example, in my tests, the 95th percentile absolute change in delta-psi is 0.0044. On request, I can provide more stats and/or information about randomization in majiq.
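
For anyone who wants to compute a similar summary on their own results, the sketch below shows one way to get the 95th percentile of the absolute change in delta-psi between two runs. The numbers are invented and the nearest-rank percentile is just one reasonable definition. It is not necessarily how the 0.0044 figure above was obtained, and parsing the values out of the MAJIQ TSV files is left out.

```python
import math

def percentile_nearest_rank(values, q):
    """Nearest-rank percentile: smallest value with at least q% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]

# Invented delta-psi values for the same ten events from two runs
# on identical inputs.
run_a = [0.120, -0.305, 0.051, 0.410, -0.022, 0.000, 0.233, -0.148, 0.076, 0.312]
run_b = [0.121, -0.301, 0.050, 0.413, -0.020, 0.002, 0.230, -0.149, 0.079, 0.308]

# Per-event absolute difference between the two runs.
abs_change = [abs(a - b) for a, b in zip(run_a, run_b)]

p95 = percentile_nearest_rank(abs_change, 95)
print(f"95th percentile absolute change in delta-psi: {p95:.4f}")
```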

Best Regards,

Barry

Andranik Ivanov

Nov 10, 2025, 11:42:38 AM

to Biociphers

Dear Barry,

What happens if I don't specify --select-grp1-prefixes and --select-grp2-prefixes in the case of combined psicov files, that is, if I specify only -psi1 KO.psicov -psi2 WT.psicov? Would majiq treat this as one replicate per group?

Thanks for your help.

Best

Andranik

Barry Slaff

Nov 10, 2025, 11:47:52 AM

to Andranik Ivanov, Biociphers

In that case MAJIQ will treat all the experiments in KO as one group and all in WT as another group and compare the two groups. (No, not “one replicate per group.”)

Barry
