Error Running Anisotropy correction


Rajiv Ranjan Singh

Feb 6, 2025, 6:55:58 PM
to spis...@googlegroups.com
Hi All,

I am getting the following error when trying to do the anisotropy correction.

"
Traceback (most recent call last):
  File "/home/spuser/miniconda3/envs/spisonet/bin/spisonet.py", line 8, in <module>
    sys.exit(main())
  File "/home/spuser/miniconda3/envs/spisonet/lib/python3.10/site-packages/spIsoNet/bin/spisonet.py", line 549, in main
    fire.Fire(ISONET)
  File "/home/spuser/miniconda3/envs/spisonet/lib/python3.10/site-packages/fire/core.py", line 143, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/spuser/miniconda3/envs/spisonet/lib/python3.10/site-packages/fire/core.py", line 477, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/spuser/miniconda3/envs/spisonet/lib/python3.10/site-packages/fire/core.py", line 693, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/home/spuser/miniconda3/envs/spisonet/lib/python3.10/site-packages/spIsoNet/bin/spisonet.py", line 182, in reconstruct
    map_refine_n2n(halfmap1,halfmap2, mask_vol, fsc3d, alpha = alpha,beta=beta,  voxel_size=voxel_size, output_dir=output_dir,
  File "/home/spuser/miniconda3/envs/spisonet/lib/python3.10/site-packages/spIsoNet/bin/map_refine.py", line 145, in map_refine_n2n
    network.train([data_dir_1,data_dir_2], output_dir, alpha=alpha,beta=beta, output_base=output_base0, batch_size=batch_size, epochs = epochs, steps_per_epoch = 1000,
  File "/home/spuser/miniconda3/envs/spisonet/lib/python3.10/site-packages/spIsoNet/models/network_n2n.py", line 265, in train
    mp.spawn(ddp_train, args=(self.world_size, self.port_number, self.model,alpha,beta,
  File "/home/spuser/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 241, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method="spawn")
  File "/home/spuser/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 197, in start_processes
    while not context.join():
  File "/home/spuser/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 158, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/home/spuser/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 68, in _wrap
    fn(i, *args)
  File "/home/spuser/miniconda3/envs/spisonet/lib/python3.10/site-packages/spIsoNet/models/network_n2n.py", line 151, in ddp_train
    pred_y = model(data_e)
  File "/home/spuser/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/spuser/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/spuser/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1523, in forward
    else self._run_ddp_forward(*inputs, **kwargs)
  File "/home/spuser/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1359, in _run_ddp_forward
    return self.module(*inputs, **kwargs)  # type: ignore[index]
  File "/home/spuser/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/spuser/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/spuser/miniconda3/envs/spisonet/lib/python3.10/site-packages/spIsoNet/models/unet.py", line 98, in forward
    x = self.decoder(x, down_sampling_features)
  File "/home/spuser/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/spuser/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/spuser/miniconda3/envs/spisonet/lib/python3.10/site-packages/spIsoNet/models/unet.py", line 66, in forward
    x=op(x)
  File "/home/spuser/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/spuser/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/spuser/miniconda3/envs/spisonet/lib/python3.10/site-packages/spIsoNet/models/unet.py", line 28, in forward
    return self.net(x)
  File "/home/spuser/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/spuser/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/spuser/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/nn/modules/container.py", line 217, in forward
    input = module(input)
  File "/home/spuser/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/spuser/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/spuser/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/nn/modules/batchnorm.py", line 767, in forward
    return sync_batch_norm.apply(
  File "/home/spuser/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/autograd/function.py", line 553, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/home/spuser/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/nn/modules/_functions.py", line 108, in forward
    return torch.batch_norm_elemt(input, weight, bias, mean, invstd, eps)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 32.00 MiB. GPU 0 has a total capacity of 10.91 GiB of which 13.06 MiB is free. Including non-PyTorch memory, this process has 10.42 GiB memory in use. Of the allocated memory 10.10 GiB is allocated by PyTorch, and 98.93 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
"
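The last line of the error suggests setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True. Note that this only addresses fragmentation, and here just 98.93 MiB is reported as reserved but unallocated, so it is unlikely to be enough on its own; the main issue is that about 10.1 GiB of the 10.91 GiB card is genuinely in use. If you still want to try it, the variable must be set before PyTorch initialises CUDA. A minimal sketch, assuming you launch spisonet.py from a Python wrapper; the helper name `with_expandable_segments` is hypothetical and not part of spIsoNet or PyTorch:

```python
import os

ALLOC_OPTION = "expandable_segments:True"


def with_expandable_segments(env=None):
    """Return a copy of `env` (default: os.environ) whose PYTORCH_CUDA_ALLOC_CONF
    includes expandable_segments:True, keeping any other allocator options.

    PyTorch reads this variable when the CUDA caching allocator starts, so it
    must be in the environment before the training process touches the GPU.
    """
    new_env = dict(os.environ if env is None else env)
    current = new_env.get("PYTORCH_CUDA_ALLOC_CONF", "")
    # Options are comma-separated "<name>:<value>" pairs; drop any old
    # expandable_segments setting so ours is the only one.
    options = [
        opt for opt in current.split(",")
        if opt and not opt.startswith("expandable_segments:")
    ]
    options.append(ALLOC_OPTION)
    new_env["PYTORCH_CUDA_ALLOC_CONF"] = ",".join(options)
    return new_env
```

Pass the result as `env=with_expandable_segments()` to `subprocess.run` when launching spisonet.py. Child processes, including the DDP workers started by `mp.spawn`, inherit the parent's environment. Exporting the same variable in the shell before running the command has the same effect.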


Any help to troubleshoot would be highly appreciated.

Best,
Rajiv Ranjan Singh

YUNTAO LIU

Feb 9, 2025, 1:15:29 AM
to Rajiv Ranjan Singh, spis...@googlegroups.com
Hi Rajiv,

It seems that you do not have enough GPU memory. Please consider setting --acc_batches to 2 or 4 to reduce VRAM consumption.
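Assuming --acc_batches implements standard gradient accumulation (the thread does not say so explicitly), each optimizer step processes the batch as acc_batches smaller micro-batches, so activation memory on the GPU scales with the micro-batch size rather than the full batch, while the averaged gradient stays the same. The pure-Python sketch below uses a hypothetical 1-D linear model with a mean-squared-error loss to show that averaging the gradients of equal-sized micro-batches reproduces the full-batch gradient:

```python
def mse_grad(w, xs, ys):
    """Gradient d/dw of mean((w*x - y)**2) for the toy model y_hat = w * x."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def accumulated_grad(w, xs, ys, acc_batches):
    """Process the batch as `acc_batches` equal micro-batches and average
    their gradients, as gradient accumulation does. Only one micro-batch
    would need to be on the GPU at a time."""
    size = len(xs) // acc_batches
    if size * acc_batches != len(xs):
        raise ValueError("batch size must be divisible by acc_batches")
    total = 0.0
    for k in range(acc_batches):
        chunk = slice(k * size, (k + 1) * size)
        total += mse_grad(w, xs[chunk], ys[chunk])
    return total / acc_batches
```

One consequence: if a single micro-batch still does not fit in memory, increasing acc_batches further cannot help, and reducing the batch size or the box size of the input maps is the remaining option.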



--
Best Regards,
Yuntao Liu, Postdoc

California NanoSystems Institute
University of California Los Angeles

francois hoh

Jul 11, 2025, 2:40:22 PM
to spIsoNet
Hi
I tried both values of acc_batches and got the same error as Rajiv.
Thanks,

François Hoh