voila tsv works for the main significant DeltaPsi export but crashes when --lsv-types is used. Debugging shows that Voila validates LSV types by iterating m.lsvs(). The generic ViewMatrixType.lsv_ids() yields IDs from Events.ec_idx, while _ViewDeltaPsiZarr.lsv(lsv_id) expects IDs that are valid in the DeltaPsiDataset index space. These are two different index spaces.
As a result, validate_filters() eventually passes an out-of-range ID into _ViewDeltaPsiZarr.lsv(), which raises an IndexError.
Observed error
INFO - Validating LSV types filter...
ERROR - index 114793 is out of bounds for axis 0 with size 114793
Environment
Runtime package path on cluster:
rna_voila: /home/svu/koustav.pal/miniconda3/envs/majiq/lib/python3.10/site-packages/rna_voila/__init__.py
view_matrix_zarr.py: /home/svu/koustav.pal/miniconda3/envs/majiq/lib/python3.10/site-packages/rna_voila/api/view_matrix_zarr.py
Tool version observed earlier in logs:
majiq-v3 v3.0.19.dev1+gde48918a
matching Voila dev build in same environment
Reproduction pattern
A plain significant export works:
voila tsv \
--show-read-counts \
--threshold 0.2 \
--probability-threshold 0.95 \
-f ASO4_KD_vs_NC.diff_splicing.significant.tsv \
majiq_diff_out/build/sg.zarr \
majiq_diff_out/deltapsi/ASO4_KD_vs_NC.voila.zarr \
majiq_diff_out/build/ASO4_NC.sgc \
majiq_diff_out/build/ASO4_KD.sgc
The per-type export crashes:
voila tsv \
--show-read-counts \
--threshold 0.2 \
--probability-threshold 0.95 \
--lsv-types a5ss \
-f ASO4_KD_vs_NC.diff_splicing.A5SS.tsv \
majiq_diff_out/build/sg.zarr \
majiq_diff_out/deltapsi/ASO4_KD_vs_NC.voila.zarr \
majiq_diff_out/build/ASO4_NC.sgc \
majiq_diff_out/build/ASO4_KD.sgc
Key debug evidence
From rna_voila/tsv.py:
DEBUG rna_voila_file=/home/svu/koustav.pal/miniconda3/envs/majiq/lib/python3.10/site-packages/rna_voila/__init__.py
DEBUG vmz_file=/home/svu/koustav.pal/miniconda3/envs/majiq/lib/python3.10/site-packages/rna_voila/api/view_matrix_zarr.py
From validate_filters():
DEBUG lsv_ids_impl=ViewMatrixType.lsv_ids module=rna_voila.api.view_matrix_zarr
From _get_events():
DEBUG _get_events class=_ViewDeltaPsiZarr events_class=rna_majiq.core.Events.Events
From ViewMatrixType.lsv_ids() no-gene route:
DEBUG lsv_ids route=A class=_ViewDeltaPsiZarr ec_len=319581 ec_tail=[319576 319577 319578 319579 319580]
From ViewMatrix.lsvs() and _ViewDeltaPsiZarr.lsv():
DEBUG lsvs i=114792 lsv_id=114792
DEBUG deltapsi_concrete_lsv class=_ViewDeltaPsiZarr lsv_id=114792
DEBUG i=114792 lsv_id=114792
DEBUG lsvs i=114793 lsv_id=114793
DEBUG deltapsi_concrete_lsv class=_ViewDeltaPsiZarr lsv_id=114793
At this point the crash occurs before the next DEBUG i=114793 ... line, so the failure happens inside _ViewDeltaPsiZarr.lsv(114793).
From _ViewDeltaPsiZarr.lsv() backing object inspection:
DEBUG deltapsi_lsv_id=0 cov_type=DeltaPsiDataset attrs=['EVENTS_EXPECTED_VARIABLES', 'EXPECTED_VARIABLES', 'bootstrap_logposterior', 'bootstrap_posterior', 'comparison_experiments', 'comparisons', 'cov_events', 'df', 'discrete_logprior', 'discrete_prior', 'event_passed', 'event_size', 'events_df', 'events_to_zarr', 'from_deltapsi', 'from_zarr', 'get_events', 'groups', 'lsv_idx', 'lsv_offsets', 'num_comparisons', 'num_connections', 'num_events', 'num_groups', 'passed', 'prior', 'psibins', 'to_dataframe', 'to_zarr']
This shows that _ViewDeltaPsiZarr.lsv() is backed by a DeltaPsiDataset rather than by the generic Events object, so the IDs it accepts belong to the dataset's own index space.
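The two sizes reported in the logs can be compared directly. The following is a minimal sketch, assuming the generic route yields consecutive IDs 0..ec_len-1 (consistent with the `lsvs i=... lsv_id=...` lines above, but not confirmed from the Voila source) and that the DeltaPsi arrays have the size given in the error message. The function name is hypothetical and not part of rna_voila.

```python
from typing import Optional


def first_invalid_id(ec_len: int, deltapsi_size: int) -> Optional[int]:
    """Return the first ID from the generic route that cannot index the
    DeltaPsi arrays, or None if every ID is in range.

    Assumption: IDs are yielded consecutively starting at 0.
    """
    return deltapsi_size if ec_len > deltapsi_size else None


# Values copied from the debug output and the error message in this report.
ec_len = 319581         # ec_len from the lsv_ids route=A line
deltapsi_size = 114793  # "axis 0 with size 114793"

bad_id = first_invalid_id(ec_len, deltapsi_size)
n_invalid = max(0, ec_len - deltapsi_size)
print(bad_id, n_invalid)
```

Under these assumptions the first failing ID matches the one in the error message, and every later ID from the generic route would also be out of range.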
Root cause
The bug is in the --lsv-types validation path in rna_voila/tsv.py:
validate_filters() calls m.lsvs()
m.lsvs() uses ViewMatrixType.lsv_ids()
ViewMatrixType.lsv_ids() yields IDs from events.ec_idx
for DeltaPsi zarr views, _ViewDeltaPsiZarr.lsv(lsv_id) expects IDs in the DeltaPsiDataset LSV/event space
these spaces do not match
So validate_filters() is iterating a generic MAJIQ events ID space and feeding those IDs into a DeltaPsi-specific view object. Eventually it emits an ID that is valid in events.ec_idx but out of range for the DeltaPsi arrays, producing:
index 114793 is out of bounds for axis 0 with size 114793
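The failure mode can be reproduced with a toy model. This is only an illustrative sketch: the classes below are hypothetical stand-ins, not the real rna_voila or rna_majiq classes, and the sizes (10 and 4) are made up. The point is that an ID generator tied to one index space drives a lookup in a smaller one.

```python
class ToyEvents:
    """Stand-in for the generic events object; ec_idx is just 0..n-1 here."""

    def __init__(self, n_entries):
        self.ec_idx = list(range(n_entries))


class ToyDeltaPsiView:
    """Stand-in for a DeltaPsi view backed by its own, smaller dataset."""

    def __init__(self, events, n_lsvs):
        self.events = events
        self._data = [f"lsv-{i}" for i in range(n_lsvs)]

    def lsv_ids(self):
        # Generic behaviour inherited by the view: IDs come from ec_idx.
        yield from self.events.ec_idx

    def lsv(self, lsv_id):
        # DeltaPsi-specific lookup: only valid for 0..n_lsvs-1.
        return self._data[lsv_id]

    def lsvs(self):
        for lsv_id in self.lsv_ids():
            yield self.lsv(lsv_id)


view = ToyDeltaPsiView(ToyEvents(10), n_lsvs=4)
seen = []
failed_at = None
try:
    for item in view.lsvs():
        seen.append(item)
except IndexError:
    # The first ID past the end of the smaller space triggers the error.
    failed_at = len(seen)

print(seen, failed_at)
```

In the toy model the iteration succeeds for every ID below the dataset size and fails on the first ID equal to it, which mirrors the pattern in the logs (114792 succeeds, 114793 fails).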
Proposed fix
ViewMatrixType.lsv_ids() should not yield events.ec_idx IDs for DeltaPsi zarr views. Either:
DeltaPsi views get an override that yields IDs in the DeltaPsiDataset local LSV/event space, or
validate_filters() validates types from the already-materialized TSV/LSV objects without crossing index spaces.
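The override option could look roughly like the sketch below. All class and function names are hypothetical toys rather than the actual Voila API, and the report does not show what validate_filters() checks internally, so the validation helper is an assumption about its intent (confirming that each requested LSV type occurs in the data).

```python
class ToyGenericView:
    """Stand-in for the generic view: IDs come from ec_idx."""

    def __init__(self, ec_idx):
        self.ec_idx = ec_idx

    def lsv_ids(self):
        yield from self.ec_idx


class ToyDeltaPsiView(ToyGenericView):
    """Stand-in for a DeltaPsi view with its own local index space."""

    def __init__(self, ec_idx, lsv_types):
        super().__init__(ec_idx)
        self._lsv_types = lsv_types  # one entry per LSV in the dataset

    def lsv_ids(self):
        # Override: yield only IDs that are valid for this view's arrays.
        yield from range(len(self._lsv_types))

    def lsv_type(self, lsv_id):
        return self._lsv_types[lsv_id]


def missing_lsv_types(view, requested):
    """Return the requested types that never occur in the view (toy logic)."""
    available = {view.lsv_type(i) for i in view.lsv_ids()}
    return sorted(set(requested) - available)


view = ToyDeltaPsiView(list(range(10)), ["a5ss", "a3ss", "es", "a5ss"])
ids = list(view.lsv_ids())
missing = missing_lsv_types(view, ["a5ss", "ir"])
print(ids, missing)
```

With the override in place, iteration stays inside the view's own index space, so the validation loop never hands the lookup an ID it cannot resolve. The generic view is left unchanged for other matrix types.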