voila tsv --lsv-types crashes on DeltaPsi zarr inputs due to mismatched LSV ID space in validate_filters()


Koustav Pal

Apr 9, 2026, 11:40:37 PM
to majiq...@googlegroups.com
voila tsv works for the main significant DeltaPsi export, but crashes when --lsv-types is used. Debugging shows the failure is caused by Voila validating LSV types by iterating m.lsvs(), where the generic ViewMatrixType.lsv_ids() yields IDs from Events.ec_idx, while _ViewDeltaPsiZarr.lsv(lsv_id) expects IDs valid in the DeltaPsiDataset index space. Those are different index universes.

As a result, validate_filters() eventually passes an out-of-range ID into _ViewDeltaPsiZarr.lsv(), causing an IndexError.

Observed error

INFO - Validating LSV types filter...
ERROR - index 114793 is out of bounds for axis 0 with size 114793

Environment
Runtime package path on cluster:

rna_voila: /home/svu/koustav.pal/miniconda3/envs/majiq/lib/python3.10/site-packages/rna_voila/__init__.py
view_matrix_zarr.py: /home/svu/koustav.pal/miniconda3/envs/majiq/lib/python3.10/site-packages/rna_voila/api/view_matrix_zarr.py
Tool version observed earlier in logs:

majiq-v3 v3.0.19.dev1+gde48918a
matching Voila dev build in same environment

Reproduction pattern
A plain significant export works:

voila tsv \
  --show-read-counts \
  --threshold 0.2 \
  --probability-threshold 0.95 \
  -f ASO4_KD_vs_NC.diff_splicing.significant.tsv \
  majiq_diff_out/build/sg.zarr \
  majiq_diff_out/deltapsi/ASO4_KD_vs_NC.voila.zarr \
  majiq_diff_out/build/ASO4_NC.sgc \
  majiq_diff_out/build/ASO4_KD.sgc

The per-type export crashes:

voila tsv \
  --show-read-counts \
  --threshold 0.2 \
  --probability-threshold 0.95 \
  --lsv-types a5ss \
  -f ASO4_KD_vs_NC.diff_splicing.A5SS.tsv \
  majiq_diff_out/build/sg.zarr \
  majiq_diff_out/deltapsi/ASO4_KD_vs_NC.voila.zarr \
  majiq_diff_out/build/ASO4_NC.sgc \
  majiq_diff_out/build/ASO4_KD.sgc

Key debug evidence
From rna_voila/tsv.py:

DEBUG rna_voila_file=/home/svu/koustav.pal/miniconda3/envs/majiq/lib/python3.10/site-packages/rna_voila/__init__.py
DEBUG vmz_file=/home/svu/koustav.pal/miniconda3/envs/majiq/lib/python3.10/site-packages/rna_voila/api/view_matrix_zarr.py
From validate_filters():

DEBUG lsv_ids_impl=ViewMatrixType.lsv_ids module=rna_voila.api.view_matrix_zarr
From _get_events():

DEBUG _get_events class=_ViewDeltaPsiZarr events_class=rna_majiq.core.Events.Events
From ViewMatrixType.lsv_ids() no-gene route:

DEBUG lsv_ids route=A class=_ViewDeltaPsiZarr ec_len=319581 ec_tail=[319576 319577 319578 319579 319580]
From ViewMatrix.lsvs() and _ViewDeltaPsiZarr.lsv():

DEBUG lsvs i=114792 lsv_id=114792
DEBUG deltapsi_concrete_lsv class=_ViewDeltaPsiZarr lsv_id=114792
DEBUG i=114792 lsv_id=114792
DEBUG lsvs i=114793 lsv_id=114793
DEBUG deltapsi_concrete_lsv class=_ViewDeltaPsiZarr lsv_id=114793
At this point the crash occurs before the next DEBUG i=114793 ... line, so the failure happens inside _ViewDeltaPsiZarr.lsv(114793).

From _ViewDeltaPsiZarr.lsv() backing object inspection:

DEBUG deltapsi_lsv_id=0 cov_type=DeltaPsiDataset attrs=['EVENTS_EXPECTED_VARIABLES', 'EXPECTED_VARIABLES', 'bootstrap_logposterior', 'bootstrap_posterior', 'comparison_experiments', 'comparisons', 'cov_events', 'df', 'discrete_logprior', 'discrete_prior', 'event_passed', 'event_size', 'events_df', 'events_to_zarr', 'from_deltapsi', 'from_zarr', 'get_events', 'groups', 'lsv_idx', 'lsv_offsets', 'num_comparisons', 'num_connections', 'num_events', 'num_groups', 'passed', 'prior', 'psibins', 'to_dataframe', 'to_zarr']
This shows _ViewDeltaPsiZarr.lsv() is backed by a DeltaPsiDataset, not the generic Events index space.

Root cause
The bug is in the --lsv-types validation path in rna_voila/tsv.py:

validate_filters() calls m.lsvs()
m.lsvs() uses ViewMatrixType.lsv_ids()
ViewMatrixType.lsv_ids() yields IDs from events.ec_idx
for DeltaPsi zarr views, _ViewDeltaPsiZarr.lsv(lsv_id) expects IDs in the DeltaPsiDataset LSV/event space
these spaces do not match
So validate_filters() is iterating a generic MAJIQ events ID space and feeding those IDs into a DeltaPsi-specific view object. Eventually it emits an ID that is valid in events.ec_idx but out of range for the DeltaPsi arrays, producing:

index 114793 is out of bounds for axis 0 with size 114793
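
To make the mismatch concrete, here is a toy model of it. It is only a sketch: ToyEvents, ToyDeltaPsiView and first_bad_id are hypothetical stand-ins rather than rna_voila classes, and the sizes are made up. The only behaviour it copies from the traces above is that lsv_ids() draws from ec_idx while lsv() indexes a shorter local array.

```python
class ToyEvents:
    """Stand-in for the generic events table; ec_idx spans the global ID space."""

    def __init__(self, n_global):
        self.ec_idx = list(range(n_global))


class ToyDeltaPsiView:
    """Stand-in for a DeltaPsi-backed view that holds fewer rows than ec_idx."""

    def __init__(self, events, n_local):
        self.events = events
        self.rows = [f"lsv-{i}" for i in range(n_local)]

    def lsv_ids(self):
        # Generic behaviour as described above: IDs come from ec_idx.
        yield from self.events.ec_idx

    def lsv(self, lsv_id):
        # Indexes the local array, so any ID >= len(self.rows) raises IndexError.
        return self.rows[lsv_id]


def first_bad_id(view):
    """Return the first ID from lsv_ids() that lsv() cannot resolve, or None."""
    for lsv_id in view.lsv_ids():
        try:
            view.lsv(lsv_id)
        except IndexError:
            return lsv_id
    return None


view = ToyDeltaPsiView(ToyEvents(n_global=10), n_local=4)
print(first_bad_id(view))  # first failing ID equals the local array size
```

In this toy the first failing ID is exactly the length of the local array, which is the same pattern as index 114793 failing against an axis of size 114793.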

Proposed fix:

ViewMatrixType.lsv_ids() should not yield events.ec_idx for DeltaPsi zarr views
DeltaPsi views need an override that yields IDs in the DeltaPsiDataset local LSV/event space
or validate_filters() should validate types from the already-materialized TSV/LSV objects without crossing index spaces
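
A rough sketch of the override idea, again using hypothetical toy classes (ToyViewMatrix, ToyDeltaPsiView, validate_types) rather than the real rna_voila code. It assumes the DeltaPsi view can enumerate its own local IDs as a simple range; how those local IDs map back to events.ec_idx in the real package is not something I have verified.

```python
class ToyViewMatrix:
    """Generic base (hypothetical): yields IDs from the global events index."""

    def __init__(self, ec_idx):
        self.ec_idx = list(ec_idx)

    def lsv_ids(self):
        yield from self.ec_idx

    def lsvs(self):
        # Iteration always goes through lsv_ids(), so an override changes it too.
        for lsv_id in self.lsv_ids():
            yield self.lsv(lsv_id)


class ToyDeltaPsiView(ToyViewMatrix):
    """DeltaPsi-like view whose per-LSV data lives in a shorter local array."""

    def __init__(self, ec_idx, local_types):
        super().__init__(ec_idx)
        self.local_types = list(local_types)

    def lsv_ids(self):
        # Override: only yield IDs that are valid for the local array.
        yield from range(len(self.local_types))

    def lsv(self, lsv_id):
        return self.local_types[lsv_id]


def validate_types(view, wanted):
    """Return the requested types that actually occur in the view, sorted."""
    present = set(view.lsvs())
    return sorted(set(wanted) & present)


view = ToyDeltaPsiView(range(10), ["a5ss", "a3ss", "es", "a5ss"])
print(validate_types(view, ["a5ss", "ir"]))
```

With the override in place, the validation loop never sees an ID outside the local array, so it completes instead of raising IndexError.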

San Jewell

Apr 10, 2026, 1:10:26 PM
to Biociphers
Hello,

Thank you for the thorough analysis. However, the --lsv-types option of voila tsv has been fully deprecated in favor of the more accurate and full-featured types output by voila modulize. I would highly recommend using voila modulize for your analysis, as this tsv switch is unmaintained and will likely be removed in a future version.

-San