I've noticed that people often use transform_param to scale the input to [0, 1].
# common data transformations
transform_param {
  # feature scaling coefficient: this maps the [0, 255] MNIST data to [0, 1]
  scale: 0.00390625  # = 1/256
}
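For reference, here is what that scale factor does to each pixel, expressed outside Caffe as a minimal numpy sketch (the batch shape and random uint8 data are just an illustrative MNIST-like stand-in, not anything from the Caffe source):

import numpy as np

# Hypothetical MNIST-like batch: 64 single-channel 28x28 images with uint8 pixels in [0, 255].
batch = np.random.randint(0, 256, size=(64, 1, 28, 28), dtype=np.uint8)

# scale: 0.00390625 is exactly 1/256, so pixel 255 maps to 0.99609375 (~1.0).
scaled = batch.astype(np.float32) * 0.00390625

print(scaled.min(), scaled.max())  # all values now lie in [0, ~1)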
Is there any advantage to scaling the input range to [0, 1]?
This scaling strategy doesn't seem limited to deep learning; it appears to apply across machine learning in general. But I can't find a convincing reason for it.
Thanks in advance for your replies.