voila modulize fails on heterogen output when using separate replicates (but works with combined psicov)


Andranik Ivanov

Sep 14, 2025, 7:13:27 AM
to Biociphers

Hi everyone,

I'm running into an issue when trying to run voila modulize on heterogen results from MAJIQ v3.


If I combine all my WT replicates into one psicov file and all my KO replicates into another, and then run majiq heterogen, voila modulize runs without any errors.

Example of the working script:

majiq-v3 sj star_2pass.KO1.Aligned.sortedByCoord.out.bam KO1.sj --prefix KO1 -- --overwrite
majiq-v3 sj star_2pass.KO2.Aligned.sortedByCoord.out.bam KO2.sj --prefix KO2 -- --overwrite
majiq-v3 sj star_2pass.KO3.Aligned.sortedByCoord.out.bam KO3.sj --prefix KO3 -- --overwrite

majiq-v3 sj star_2pass.WT1.Aligned.sortedByCoord.out.bam WT1.sj --prefix WT1 -- --overwrite
majiq-v3 sj star_2pass.WT2.Aligned.sortedByCoord.out.bam WT2.sj --prefix WT2 -- --overwrite
majiq-v3 sj star_2pass.WT3.Aligned.sortedByCoord.out.bam WT3.sj --prefix WT3 -- --overwrite

majiq-v3 psi-coverage sg.zarr WT.psicov WT1.sj WT2.sj WT3.sj --overwrite
majiq-v3 quantify WT.psicov --min-experiments 0.01 --splicegraph sg.zarr --output-tsv WT.tsv --overwrite

majiq-v3 psi-coverage sg.zarr KO.psicov KO1.sj KO2.sj KO3.sj --overwrite
majiq-v3 quantify KO.psicov --min-experiments 0.01 --splicegraph sg.zarr --output-tsv KO.tsv --overwrite

majiq-v3 heterogen --stats infoscore mannwhitneyu ttest tnom \
  --splicegraph sg.zarr --output-voila KO_WT.hetcov --output-tsv KO_WT.tsv \
  -psi1 KO.psicov -psi2 WT.psicov --overwrite

voila modulize sg.zarr KO_WT.hetcov KO1.sgc KO2.sgc KO3.sgc WT1.sgc WT2.sgc WT3.sgc \
  -d voila_modulizer --debug --debug-num-genes 100
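For reference, the working pipeline above can be written as a loop over conditions and replicates. This is a minimal sketch that uses only the commands and flags already shown; run and DRY_RUN are hypothetical helpers of this script, not MAJIQ options (commands are printed by default and executed with DRY_RUN=0). The sg.zarr splicegraph is assumed to exist already, as in the original script.

```shell
#!/usr/bin/env bash
# Sketch only: the combined-psicov pipeline written as a loop.
# DRY_RUN is a hypothetical variable of this script (not a MAJIQ option):
# 1 (default) prints each command, 0 executes it.
set -euo pipefail

run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi
}

combined_pipeline() {
  local group rep s
  for group in KO WT; do
    local sj_files=()
    for rep in 1 2 3; do
      s="${group}${rep}"
      # Junction counts per BAM; --prefix sets the sample name stored in the .sj file.
      run majiq-v3 sj "star_2pass.${s}.Aligned.sortedByCoord.out.bam" "${s}.sj" \
        --prefix "${s}" -- --overwrite
      sj_files+=("${s}.sj")
    done
    # One psicov per condition containing all three replicates.
    run majiq-v3 psi-coverage sg.zarr "${group}.psicov" "${sj_files[@]}" --overwrite
    run majiq-v3 quantify "${group}.psicov" --min-experiments 0.01 \
      --splicegraph sg.zarr --output-tsv "${group}.tsv" --overwrite
  done
  run majiq-v3 heterogen --stats infoscore mannwhitneyu ttest tnom \
    --splicegraph sg.zarr --output-voila KO_WT.hetcov --output-tsv KO_WT.tsv \
    -psi1 KO.psicov -psi2 WT.psicov --overwrite
}

combined_pipeline
```

Printing by default makes it easy to compare the generated commands against the hand-written ones before running anything.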

If I instead generate separate psicov and tsv files per replicate and run heterogen using them, the voila modulize step fails for all genes with the following error:

WARNING - Some error processing gene ENSMUSG00000102269.1 , turn on --debug for more info
Traceback (most recent call last):
  ...
ValueError: 'star_2pass.WT1.Aligned.sortedByCoord.out' is not in list

Example of the failing script (the KO replicates are processed the same way):

majiq-v3 psi-coverage sg.zarr WT1.psicov WT1.sj --overwrite
majiq-v3 psi-coverage sg.zarr WT2.psicov WT2.sj --overwrite
majiq-v3 psi-coverage sg.zarr WT3.psicov WT3.sj --overwrite

majiq-v3 quantify WT1.psicov --min-experiments 0.01 --splicegraph sg.zarr --output-tsv WT1.tsv --overwrite
majiq-v3 quantify WT2.psicov --min-experiments 0.01 --splicegraph sg.zarr --output-tsv WT2.tsv --overwrite
majiq-v3 quantify WT3.psicov --min-experiments 0.01 --splicegraph sg.zarr --output-tsv WT3.tsv --overwrite

majiq-v3 heterogen --stats infoscore mannwhitneyu ttest tnom \
  --splicegraph sg.zarr --output-voila KO_WT.hetcov --output-tsv KO_WT.tsv \
  -psi1 KO1.psicov KO2.psicov KO3.psicov \
  -psi2 WT1.psicov WT2.psicov WT3.psicov --overwrite

voila modulize sg.zarr KO_WT.hetcov KO1.sgc KO2.sgc KO3.sgc WT1.sgc WT2.sgc WT3.sgc \
  -d voila_modulizer --debug --debug-num-genes 100
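The per-replicate variant above can also be written as a loop over the six samples. The KO psi-coverage and quantify commands are not shown in the post; this sketch assumes they mirror the WT ones, since heterogen reads KO1.psicov to KO3.psicov. run and DRY_RUN are hypothetical helpers of this script, not MAJIQ options (commands are printed by default and executed with DRY_RUN=0).

```shell
#!/usr/bin/env bash
# Sketch only: one psicov/tsv per replicate, then heterogen on the two lists.
# DRY_RUN is a hypothetical variable of this script (not a MAJIQ option):
# 1 (default) prints each command, 0 executes it.
set -euo pipefail

run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi
}

per_replicate_pipeline() {
  local s
  local ko=() wt=()
  for s in KO1 KO2 KO3 WT1 WT2 WT3; do
    # Each replicate gets its own psicov file and its own quantify TSV.
    run majiq-v3 psi-coverage sg.zarr "${s}.psicov" "${s}.sj" --overwrite
    run majiq-v3 quantify "${s}.psicov" --min-experiments 0.01 \
      --splicegraph sg.zarr --output-tsv "${s}.tsv" --overwrite
    # Collect the psicov files per condition for -psi1 / -psi2.
    case "$s" in
      KO*) ko+=("${s}.psicov") ;;
      WT*) wt+=("${s}.psicov") ;;
    esac
  done
  run majiq-v3 heterogen --stats infoscore mannwhitneyu ttest tnom \
    --splicegraph sg.zarr --output-voila KO_WT.hetcov --output-tsv KO_WT.tsv \
    -psi1 "${ko[@]}" -psi2 "${wt[@]}" --overwrite
}

per_replicate_pipeline
```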

This gives the error shown above (ValueError: 'star_2pass.WT1.Aligned.sortedByCoord.out' is not in list).

Questions
  1. What is causing voila modulize to fail when using separate per-replicate psicov files for heterogen, while it works fine when using combined psicov files?

  2. Conceptually, what differences in results (if any) should I expect between running heterogen on:

    • combined psicov files per condition (WT.psicov vs KO.psicov)

    • versus using each replicate’s psicov file separately (WT1, WT2, WT3 vs KO1, KO2, KO3)?

Thanks a lot for any help or clarification!


San Jewell

Sep 15, 2025, 11:23:31 AM
to Biociphers
Hi Andranik,

In general, there is no difference between running psi-coverage on one prefix or on multiple prefixes at a time, although running them separately would let you apply certain filters/options to one group of samples and not another. The way you have run it, the output PSI should be identical for the grouped and ungrouped runs.

I have just tested two otherwise identical runs, one with groups of three sj files combined and one with them separate, as you did, and then ran modulizer on each. So far, I have not been able to reproduce the error message. Can you verify that you have the latest version of majiq v3 installed by running the pip install command again? If the error persists, I may need to try to reproduce it from your input data; would you be comfortable sharing a minimal reproducible example with me?

Thanks,
-San

Andranik Ivanov

Sep 18, 2025, 11:48:49 AM
to Biociphers

Hi San,

Thanks a lot for looking into this and testing it out. My current MAJIQ version is 3.0.7.dev1+g7ff3f711. I reran pip install to make sure it is up to date, but that did not resolve the error: voila modulize still fails when I use separate per-replicate .psicov files.

I also have a quick follow-up question to make sure I understand the implications correctly. If there is no difference in how the downstream statistics are calculated, I'm happy to just use the combined .psicov files (which do work with modulizer). Specifically, could you confirm: when I combine all replicates from one condition into a single .psicov file (e.g. WT.psicov and KO.psicov) and run heterogen on just those two files, am I losing replicate-level information for the statistical tests (Mann–Whitney U, t-test, etc.)? Or can those tests still use replicate variability from the combined file? I want to be sure that combining replicates doesn't flatten the variability and thereby weaken or bias the statistical analysis.
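To make the concern concrete, here is a small illustration with made-up PSI values for a single junction. It computes Welch's t statistic by hand with awk; this is not how MAJIQ's heterogen computes its statistics, it only shows why a replicate-level test needs one value per replicate. With three values per condition the spread between replicates enters the statistic; with a single flattened value per condition the sample variance (which divides by n - 1) is undefined and no test is possible. The function name welch_t and the PSI values are hypothetical.

```shell
#!/usr/bin/env bash
# Illustration only (not MAJIQ's heterogen implementation).
# PSI values below are hypothetical.
set -euo pipefail

# welch_t "<group1 values>" "<group2 values>"
# Prints Welch's t statistic with two decimals, or "undefined" when either
# group has fewer than two values (no sample variance exists).
welch_t() {
  awk -v a="$1" -v b="$2" '
    # Splits a space-separated list, sets the globals mean and var
    # (var = -1 when there are fewer than two values), returns the count.
    function stats(str,    arr, n, i, s, ss) {
      n = split(str, arr, " ")
      s = 0
      for (i = 1; i <= n; i++) s += arr[i]
      mean = s / n
      ss = 0
      for (i = 1; i <= n; i++) ss += (arr[i] - mean) ^ 2
      var = (n > 1) ? ss / (n - 1) : -1
      return n
    }
    BEGIN {
      n1 = stats(a); m1 = mean; v1 = var
      n2 = stats(b); m2 = mean; v2 = var
      if (v1 < 0 || v2 < 0) { print "undefined"; exit }
      printf "%.2f\n", (m1 - m2) / sqrt(v1 / n1 + v2 / n2)
    }'
}

# Three replicates per condition (KO vs WT): replicate spread is used.
welch_t "0.62 0.70 0.66" "0.41 0.45 0.37"   # prints 7.65
# One value per condition (replicates flattened): no variance, no test.
welch_t "0.66" "0.41"                       # prints undefined
```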

Thanks again for your help and clarification!

Best
Andranik

San Jewell

Sep 18, 2025, 3:02:12 PM
to Biociphers
Hi Andranik,

In my first analysis I was only comparing computed PSI values. Looking at the statistics now, I do see some differences, specifically in the ttest values, between the combined and separate psi-coverage runs. As I'm less familiar with this area of the software, I'm going to ask some members of my lab about the expected behavior, and about documentation for running the analysis in both of the ways you mentioned.

Either way, I don't think modulizer should produce an error. The specific message you posted is very odd, because the name star_2pass.WT1... should already have been replaced by your --prefix argument and therefore should not appear anywhere downstream. I do have one more thing for you to try before sharing data, though: can you give the two groups in your heterogen run explicit names? For example, $ majiq-v3 heterogen -n KO WT (with -psi1 being the KO files and -psi2 the WT files). Can you check whether that makes any difference?
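Concretely, applied to the per-replicate run the commands would look roughly like this. This sketch only adds -n KO WT to the heterogen call from the first message (placed right after heterogen, as in the example above) and reruns modulize unchanged; run and DRY_RUN are hypothetical helpers of this script, not MAJIQ options (commands are printed by default and executed with DRY_RUN=0).

```shell
#!/usr/bin/env bash
# Sketch only: per-replicate heterogen with explicit group names.
# -n KO WT names the -psi1 group "KO" and the -psi2 group "WT".
# DRY_RUN is a hypothetical variable of this script (not a MAJIQ option):
# 1 (default) prints each command, 0 executes it.
set -euo pipefail

run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi
}

named_heterogen() {
  run majiq-v3 heterogen -n KO WT --stats infoscore mannwhitneyu ttest tnom \
    --splicegraph sg.zarr --output-voila KO_WT.hetcov --output-tsv KO_WT.tsv \
    -psi1 KO1.psicov KO2.psicov KO3.psicov \
    -psi2 WT1.psicov WT2.psicov WT3.psicov --overwrite
  # Modulize is rerun exactly as before on the new hetcov output.
  run voila modulize sg.zarr KO_WT.hetcov KO1.sgc KO2.sgc KO3.sgc WT1.sgc WT2.sgc WT3.sgc \
    -d voila_modulizer --debug --debug-num-genes 100
}

named_heterogen
```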

Thanks,
-San