Hi there,
I'm running into a memory issue with voila modulize.
- Running a fairly large dataset (~1000 samples)
- Allocating 250GB of memory to the job
- Running Voila v3.0.20.dev1+gcd116a77c
The command I'm running:
voila modulize \
-j 1 --changing-between-group-dpsi 0.1 \
--show-read-counts \
-d "$basedir/output/het_modulized_mem" \
/majiq/output/build/sg.zarr \
majiq/output/het/normal_tumour.hetcov \
/majiq/output/sgc/*.sgc
I get the following error:
2026-05-25 13:04:00,426 (PID:3977105) - ERROR - Unable to allocate 2.31 GiB for an array with shape (579, 1070386) and data type float32
Traceback (most recent call last):
File "/mnt/homes/hmg/envs/majiq/lib/python3.12/site-packages/rna_voila/run_voila.py", line 541, in main
args.func()
File "/mnt/homes/hmg/envs/majiq/lib/python3.12/site-packages/rna_voila/classify.py", line 29, in __init__
config = ClassifyConfig()
^^^^^^^^^^^^^^^^
File "/mnt/homes/hmg/envs/majiq/lib/python3.12/site-packages/rna_voila/config.py", line 749, in __new__
files, settings = _getInputFilesSet(config_parser, cov_multiarray=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/homes/hmg/envs/majiq/lib/python3.12/site-packages/rna_voila/config.py", line 531, in _getInputFilesSet
files['cov_zarr'][cov_file] = open_cov_wrapper(cov_file, preload=zarr_preload)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/homes/hmg/envs/majiq/lib/python3.12/site-packages/rna_voila/api/view_matrix.py", line 119, in open_cov_wrapper
cov = nm.HeterogenDataset.from_zarr(path, preload=preload)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/homes/hmg/envs/majiq/lib/python3.12/site-packages/rna_majiq/voila/HeterogenDataset.py", line 268, in from_zarr
return HeterogenDataset(df, events_df)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/homes/hmg/envs/majiq/lib/python3.12/site-packages/rna_majiq/voila/HeterogenDataset.py", line 202, in __init__
].sel(prefix=df["prefix_grp"] == grp),
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/homes/hmg/envs/majiq/lib/python3.12/site-packages/xarray/core/dataset.py", line 3140, in sel
result = self.isel(indexers=query_results.dim_indexers, drop=drop)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/homes/hmg/envs/majiq/lib/python3.12/site-packages/xarray/core/dataset.py", line 2973, in isel
return self._isel_fancy(indexers, drop=drop, missing_dims=missing_dims)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/homes/hmg/envs/majiq/lib/python3.12/site-packages/xarray/core/dataset.py", line 3029, in _isel_fancy
new_var = var.isel(indexers=var_indexers)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/homes/hmg/envs/majiq/lib/python3.12/site-packages/xarray/core/variable.py", line 1033, in isel
return self[key]
~~~~^^^^^
File "/mnt/homes/hmg/envs/majiq/lib/python3.12/site-packages/xarray/core/variable.py", line 800, in __getitem__
data = indexing.apply_indexer(indexable, indexer)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/homes/hmg/envs/majiq/lib/python3.12/site-packages/xarray/core/indexing.py", line 1027, in apply_indexer
return indexable.oindex[indexer]
~~~~~~~~~~~~~~~~^^^^^^^^^
File "/mnt/homes/hmg/envs/majiq/lib/python3.12/site-packages/xarray/core/indexing.py", line 367, in __getitem__
return self.getter(key)
^^^^^^^^^^^^^^^^
File "/mnt/homes/hmg/envs/majiq/lib/python3.12/site-packages/xarray/core/indexing.py", line 1504, in _oindex_get
return self.array[key]
~~~~~~~~~~^^^^^
numpy._core._exceptions._ArrayMemoryError: Unable to allocate 2.31 GiB for an array with shape (579, 1070386) and data type float32
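For reference, the size in the error message follows directly from the array shape and dtype, so the failing request really is only about 2.3 GiB. It is not a single oversized allocation; the job had presumably already used up most of its memory by that point. Below is a minimal sketch of that arithmetic. The helper name array_gib is just for illustration and is not part of Voila or MAJIQ. The first dimension (579) is presumably the number of samples selected for one group, judging by the .sel(prefix=...) call in the traceback, but that is my reading rather than something the error states.

```python
import math
import struct


def array_gib(shape, itemsize):
    """Size in GiB of a dense array with the given shape and bytes per element."""
    return math.prod(shape) * itemsize / 2**30


# Shape and dtype are taken from the error message; float32 is 4 bytes per element.
FLOAT32_BYTES = struct.calcsize("f")

size_gib = array_gib((579, 1070386), FLOAT32_BYTES)
print(f"{size_gib:.2f} GiB")  # 2.31 GiB
```

If several arrays of this shape are held in memory at once (for example one per group or per variable), the total grows in multiples of this figure.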
I have also tried running with the -lazy-load flag, but that run is still going and has already taken more than 2 days.
Is my dataset simply too large to run normally? I would have hoped that 250GB would be enough to handle it.
Thank you