HW1 data normalization

cs15...@gmail.com

Jan 31, 2013, 4:40:58 PM
Hello everybody,

There was some confusion about data normalization. Here are some points:
* only features should be normalized, not labels. Predictions made against normalized labels would be wrong (unless you de-normalize them after making predictions).
* testing data has to be normalized too, preferably with the same per-feature shift and scale coefficients used to normalize the training data. In practice, it's better to normalize all data at once.
* the normal equation error doesn't change with normalization. I get a mean squared error, 1/2 * sum((pred - label)^2), of about 11 or 12 for both training and testing.
* gradient descent and/or decision trees might benefit from normalization in terms of the time needed to converge/finish.
* two possible normalizations: (A) shift and scale, so that each feature has values between 0 and 1; (B) normalization by standard deviation, so that each feature's variance becomes 1.
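
To make the points above concrete, here is a minimal sketch in Python/NumPy of both normalizations, with the coefficients computed per feature on the training data and then reused on the testing data. The function names and the toy data are made up for illustration and are not part of the homework. For (B), the sketch also subtracts the per-feature mean; that is an assumption, since the point above only requires the variance to become 1.

```python
import numpy as np

def fit_minmax(X_train):
    """Option (A): per-feature shift and scale so training values land in [0, 1]."""
    X_train = np.asarray(X_train, dtype=float)
    shift = X_train.min(axis=0)
    scale = X_train.max(axis=0) - shift
    scale[scale == 0] = 1.0  # constant feature: avoid dividing by zero
    return shift, scale

def fit_standard(X_train):
    """Option (B): per-feature scale by standard deviation so the variance is 1.
    Subtracting the mean is an assumption of this sketch."""
    X_train = np.asarray(X_train, dtype=float)
    shift = X_train.mean(axis=0)
    scale = X_train.std(axis=0)  # population standard deviation (ddof=0)
    scale[scale == 0] = 1.0  # constant feature: avoid dividing by zero
    return shift, scale

def apply_normalization(X, shift, scale):
    """Apply previously fitted per-feature coefficients to any data set."""
    return (np.asarray(X, dtype=float) - shift) / scale

# Hypothetical toy data: rows are examples, columns are features. Labels are not touched.
X_train = np.array([[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]])
X_test = np.array([[2.5, 300.0], [4.0, 800.0]])

# (A) fit on training data, reuse the same coefficients for testing data
shift_a, scale_a = fit_minmax(X_train)
X_train_a = apply_normalization(X_train, shift_a, scale_a)
X_test_a = apply_normalization(X_test, shift_a, scale_a)

# (B) same pattern with standard-deviation scaling
shift_b, scale_b = fit_standard(X_train)
X_train_b = apply_normalization(X_train, shift_b, scale_b)
X_test_b = apply_normalization(X_test, shift_b, scale_b)
```

Note that when the coefficients come from the training data only, testing values can fall outside [0, 1] under (A), as the second test row does here.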

--virgil   