I am using a Caffe CNN to segment CT images by labeling each individual pixel as foreground or background. My features are image patches around the target pixel, with multiple channels representing different scales. In addition, after the convolutional layers I add in a few more hand-tailored features (distance from a known object, deviation from some expected value, etc.) that may be used by the fully-connected layers.
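To make the setup concrete, here is a rough sketch of how one training example is assembled. The function and variable names are only illustrative, not my actual pipeline, and border handling is omitted:

```python
import numpy as np

def extract_example(ct_slice, row, col, patch_size=32, scales=(1, 2, 4)):
    """Build one multi-channel example: one channel per scale, each a patch
    of raw CT values centered on the target pixel (borders ignored for brevity)."""
    half = patch_size // 2
    channels = []
    for s in scales:
        # Take a (patch_size*s x patch_size*s) neighborhood and subsample it
        # back to patch_size x patch_size, so larger scales cover more context.
        r0, r1 = row - half * s, row + half * s
        c0, c1 = col - half * s, col + half * s
        channels.append(ct_slice[r0:r1, c0:c1][::s, ::s])
    return np.stack(channels, axis=0)  # shape: (len(scales), patch_size, patch_size)

def hand_tailored_features(row, col, object_center, expected_value, pixel_value):
    """Extra scalar features appended after the convolutional layers."""
    dist = np.hypot(row - object_center[0], col - object_center[1])
    deviation = pixel_value - expected_value
    return np.array([dist, deviation])
```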
I am trying to figure out what (if any) data normalization to use. The values of each feature have real physical meaning, since CT pixel values correspond to the density of the material being imaged (water is 0, air is -1000, etc.). I don't think I should zero-mean each example, because I would lose meaningful information about which patches are tissue, air, blood, tumor, etc. On the other hand, per-feature standardization doesn't make much sense either, because each feature is just the CT value at a fixed location relative to the target pixel.

In practice, when I try feature standardization, I get very different means and variances for different random choices of training set, despite using 500,000 examples to compute each estimate, and also quite different statistics across the patch features, even though they are all just CT values at locations relative to the target pixel. Worse, my prediction results on a test set end up depending noticeably on the exact means and variances computed from a particular training set.
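To be explicit about the two options I'm weighing, here is a rough NumPy sketch; `X_train` and `X_test` stand for hypothetical matrices of flattened patch features (raw CT values), one row per example:

```python
import numpy as np

# Option 1: per-example zero-mean (what I'm reluctant to do, since the mean
# CT value of a patch itself says something about the tissue it contains):
X_train_zm = X_train - X_train.mean(axis=1, keepdims=True)

# Option 2: per-feature standardization using statistics estimated on the
# training set (what I tried, with ~500,000 examples per estimate):
mu = X_train.mean(axis=0)
sigma = X_train.std(axis=0) + 1e-8          # epsilon to avoid division by zero
X_train_std = (X_train - mu) / sigma
X_test_std = (X_test - mu) / sigma          # the test set is normalized with the
                                            # *training* statistics, so predictions
                                            # depend on which training set was drawn
```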
Of course, the additional features I add in for the fully-connected layers are not CT values, and have typical scales of their own.
I am wondering how to do data normalization well for this kind of problem, or whether it is even needed with modern networks and optimizers and I can simply feed the raw CT values directly into my CNN.
Any advice?
Thank you!
Tiferet