SpatialCrossEntropyCriterion fails with a device-side assertion error

Morpheus

Mar 9, 2018, 10:58:25 AM
to torch7
I have a network that predicts segmentation labels. It takes a 256x256 RGB input and outputs a segmentation of size 256x256x16, where each of the 16 channels corresponds to one class label. I am using cudnn.SpatialCrossEntropyCriterion with a batch size of 4. The network's forward pass works, but the criterion's forward pass fails with the following:

/var/scratch/pdas/torch/extra/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [823,0,0] Assertion `t >= 0 && t < n_classes` failed.
/var/scratch/pdas/torch/extra/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [824,0,0] Assertion `t >= 0 && t < n_classes` failed.
/var/scratch/pdas/torch/extra/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [825,0,0] Assertion `t >= 0 && t < n_classes` failed.
/var/scratch/pdas/torch/extra/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [826,0,0] Assertion `t >= 0 && t < n_classes` failed.
/var/scratch/pdas/torch/extra/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [827,0,0] Assertion `t >= 0 && t < n_classes` failed.
/var/scratch/pdas/torch/extra/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [828,0,0] Assertion `t >= 0 && t < n_classes` failed.
/var/scratch/pdas/torch/extra/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [829,0,0] Assertion `t >= 0 && t < n_classes` failed.
/var/scratch/pdas/torch/extra/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [830,0,0] Assertion `t >= 0 && t < n_classes` failed.
/var/scratch/pdas/torch/extra/cunn/lib/THCUNN/SpatialClassNLLCriterion.cu:38: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int) [with T = float, AccumT = float]: block: [3,0,0], thread: [831,0,0] Assertion `t >= 0 && t < n_classes` failed.
THCudaCheck FAIL file=/var/scratch/pdas/torch/extra/cutorch/lib/THC/generic/THCStorage.c line=32 error=59 : device-side assert triggered
/var/scratch/pdas/torch/install/bin/luajit: cuda runtime error (59) : device-side assert triggered at /var/scratch/pdas/torch/extra/cutorch/lib/THC/generic/THCStorage.c:32
stack traceback:
        [C]: at 0x2aaab6564c20
        [C]: in function '__index'
        .../.luarocks/share/lua/5.1/nn/SpatialClassNLLCriterion.lua:51: in function 'updateOutput'
        /home/pdas/.luarocks/share/lua/5.1/nn/MultiCriterion.lua:21: in function 'forward'
        ./segCriterion.lua:156: in function 'forward'
        Train.lua:245: in main chunk
        [C]: in function 'dofile'
        ...pdas/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
        [C]: at 0x00406540

The input to the criterion is 4x16x256x256 and the target is 4x256x256, where 4 is the batch size. The spatial criterion isn't documented either, so I can't find any information about this error. Does anyone know how to fix this?

Thank you.

Morpheus

Mar 10, 2018, 1:54:24 PM
to torch7
I managed to fix it. For anyone else having similar issues: the problem was that my targets for the spatial loss had values in [0, 1]. The criterion expects class labels, so the targets must be integer class indices rather than fractional values.
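
For anyone who wants to catch this on the CPU before the CUDA assert fires, here is a minimal sketch of a target check. The helper name checkTargets is hypothetical (it is not part of nn or cudnn), and the accepted range of 1 to nClasses is my assumption, based on Torch's 1-based indexing and the `t >= 0 && t < n_classes` assertion in the log above.

```lua
require 'torch'

-- Hypothetical helper (not part of nn or cudnn): returns true when every
-- value in `target` is a whole number in [1, nClasses]; otherwise returns
-- false plus a short description of the problem.
-- Assumption: class indices are 1-based, as is usual in Torch.
local function checkTargets(target, nClasses)
   local t = target:double()   -- convert to double so floor() is available
   local lo, hi = t:min(), t:max()
   if lo < 1 or hi > nClasses then
      return false, string.format(
         'values span [%g, %g], expected [1, %d]', lo, hi, nClasses)
   end
   -- Count the entries that are not whole numbers.
   local nonInteger = t:ne(torch.floor(t)):sum()
   if nonInteger > 0 then
      return false, string.format('%d non-integer values', nonInteger)
   end
   return true
end

local nClasses = 16

-- Targets like the ones in my first post: floats in [0, 1).
local bad = torch.rand(2, 4, 4)
print(checkTargets(bad, nClasses))

-- Targets drawn as integer class indices in [1, nClasses].
local good = torch.Tensor(2, 4, 4):random(1, nClasses)
print(checkTargets(good, nClasses))
```

Calling something like this on the target right before the criterion's forward pass should give a readable message instead of a wall of device-side assertions. If your labels are stored as 0-based integers, adding 1 before passing them in should be enough, under the same 1-based assumption.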

On a related note, if you are struggling to import spatial labels efficiently, the best way I have found is to dump them in CSV format, use the script at https://github.com/locklin/torch-things/blob/master/csv2t7.sh to convert the CSV to Torch's serialized format, and then use Torch's DiskFile interface to read it directly into a tensor. This is the best solution I have found so far. Kudos to Scott (https://github.com/locklin) for the wonderful script and to http://fastml.com/loading-data-in-torch-is-a-mess/ for pointing it out. Truly, torch "is most suitable for brave adventurers".