I am using a Caffe CNN to segment CT images by labeling each individual pixel as foreground or background. My features are image patches around the target pixel, with multiple channels representing different scales. In addition, after the convolutional layers I add in a few more hand-tailored features (distance from a known object, deviation from some expected value, etc.) that may be used by the fully-connected layers.
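To make the setup concrete, here is a rough sketch of how one training example is assembled. The function and variable names are only illustrative, not my actual pipeline, and border handling is omitted:

```python
import numpy as np

def extract_example(ct_slice, row, col, patch_size=32, scales=(1, 2, 4)):
    """Build one multi-channel example: one channel per scale, each a patch
    of raw CT values centered on the target pixel (borders ignored for brevity)."""
    half = patch_size // 2
    channels = []
    for s in scales:
        # Take a (patch_size*s x patch_size*s) neighborhood and subsample it
        # back to patch_size x patch_size, so larger scales cover more context.
        r0, r1 = row - half * s, row + half * s
        c0, c1 = col - half * s, col + half * s
        channels.append(ct_slice[r0:r1, c0:c1][::s, ::s])
    return np.stack(channels, axis=0)  # shape: (len(scales), patch_size, patch_size)

def hand_tailored_features(row, col, object_center, expected_value, pixel_value):
    """Extra scalar features appended after the convolutional layers."""
    dist = np.hypot(row - object_center[0], col - object_center[1])
    deviation = pixel_value - expected_value
    return np.array([dist, deviation])
```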
I am trying to figure out what (if any) data normalization to use. The values of each feature have real physical meaning, since CT pixel values correspond to the density of the material being imaged (water is 0, air is -1000, etc.). I don't think I should zero-mean each example, because I would lose meaningful information about which patches are tissue, air, blood, tumor, etc. On the other hand, per-feature standardization doesn't make much sense either, because each feature is just the CT value at a fixed location relative to the target pixel.

In practice, when I try feature standardization, I get very different means and variances for different random choices of training set, despite using 500,000 examples to compute each estimate, and also quite different statistics across the patch features, even though they are all just CT values at locations relative to the target pixel. Worse, my prediction results on a test set end up depending noticeably on the exact means and variances computed from a particular training set.
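To be explicit about the two options I'm weighing, here is a rough NumPy sketch; `X_train` and `X_test` stand for hypothetical matrices of flattened patch features (raw CT values), one row per example:

```python
import numpy as np

# Option 1: per-example zero-mean (what I'm reluctant to do, since the mean
# CT value of a patch itself says something about the tissue it contains):
X_train_zm = X_train - X_train.mean(axis=1, keepdims=True)

# Option 2: per-feature standardization using statistics estimated on the
# training set (what I tried, with ~500,000 examples per estimate):
mu = X_train.mean(axis=0)
sigma = X_train.std(axis=0) + 1e-8          # epsilon to avoid division by zero
X_train_std = (X_train - mu) / sigma
X_test_std = (X_test - mu) / sigma          # the test set is normalized with the
                                            # *training* statistics, so predictions
                                            # depend on which training set was drawn
```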
Of course, the additional features I add in for the fully-connected layers are not CT values, and have typical scales of their own.
I am wondering how to do data normalization well for this kind of problem, or whether it is even needed with modern networks and optimizers and I can simply feed the raw CT values directly into my CNN.
Any advice?
Thank you!
Tiferet