I want to use H2O autoencoders for anomaly detection. Initially I will not have labeled data, so it will run in unsupervised mode.
But I might have a few labeled records in the future, which would make it semi-supervised. Should I still use only autoencoders in that case? It looks like the autoencoder does not consider the response column.
It would be good if I could reuse the same algorithm.
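One way to reuse the same unsupervised autoencoder once a few labels exist (an assumption on my part, not an H2O feature): keep training it without a response column, and use the labeled records only to choose the cut-off on reconstruction error. A minimal sketch follows. The error values are hypothetical; in H2O they would come from the trained autoencoder's per-row reconstruction error (the Python API exposes this as `anomaly()` on an autoencoder model, as far as I know). The helper names are made up for illustration.

```python
# Minimal sketch (assumption): the autoencoder stays unsupervised; a few labeled
# rows are used only to pick the anomaly threshold on reconstruction error.
# Helper names are hypothetical; errors are plain Python lists here.

def f1_at_threshold(errors, labels, threshold):
    """F1 score when rows with error >= threshold are flagged as anomalies."""
    tp = fp = fn = 0
    for err, is_anomaly in zip(errors, labels):
        flagged = err >= threshold
        if flagged and is_anomaly:
            tp += 1
        elif flagged and not is_anomaly:
            fp += 1
        elif not flagged and is_anomaly:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def pick_threshold(errors, labels):
    """Try every observed error as a cut-off and keep the one with the best F1."""
    best_t, best_f1 = None, -1.0
    for t in sorted(set(errors)):
        score = f1_at_threshold(errors, labels, t)
        if score > best_f1:
            best_t, best_f1 = t, score
    return best_t, best_f1


if __name__ == "__main__":
    # Hypothetical reconstruction errors for a handful of labeled rows.
    errs = [0.01, 0.02, 0.03, 0.40, 0.05, 0.55, 0.04]
    labs = [False, False, False, True, False, True, False]
    print(pick_threshold(errs, labs))  # prints (0.4, 1.0)
```

The labels never touch training, so the same model works with zero labels (pick a percentile instead) and improves its threshold as labeled records arrive.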
Please suggest.
Thanks,
Mahesh
It looks like my data has a lot of noise too (some categorical and some continuous columns). A denoising feature for autoencoders is on the H2O roadmap; will that help?
For now, is there any other way to denoise the data?
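For context, a denoising autoencoder corrupts each training input (for example by zeroing a random fraction of its features, "masking noise") and asks the network to reconstruct the clean row. As far as I understand, H2O's `input_dropout_ratio` applies a similar random masking to the input layer, so it already gives some of this effect. A minimal sketch of how such training pairs are built; the function names are hypothetical.

```python
import random

# Minimal sketch (assumption): denoising-autoencoder style masking noise.
# Each input row is corrupted by zeroing a random fraction of its features,
# while the reconstruction target stays the clean row.

def mask_noise(row, ratio, rng):
    """Return a copy of row with each value zeroed with probability ratio."""
    return [0.0 if rng.random() < ratio else x for x in row]


def denoising_pairs(rows, ratio, seed=0):
    """Build (corrupted_input, clean_target) training pairs."""
    rng = random.Random(seed)
    return [(mask_noise(r, ratio, rng), list(r)) for r in rows]


if __name__ == "__main__":
    data = [[0.5, 1.2, -0.3, 2.0], [1.1, 0.0, 0.7, -1.4]]
    for noisy, clean in denoising_pairs(data, ratio=0.25, seed=42):
        print(noisy, "->", clean)
```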
Another concern: with high-dimensional multivariate data, the pairwise distances between points become nearly indistinguishable from one another, and H2O (in reproducible = false mode) uses Hogwild!-style multithreading, which deliberately allows race conditions between threads updating the weights. Will that cause problems with accuracy?
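The distance effect mentioned here is usually called concentration of distances. A minimal sketch that shows it on synthetic uniform data (not the poster's data): the relative gap between the nearest and farthest point from a query shrinks as the dimension grows, even though the distances themselves get larger. It does not model the Hogwild! part of the question.

```python
import math
import random

# Minimal sketch: concentration of distances on random points in the unit
# hypercube. relative_contrast is a hypothetical helper name.

def relative_contrast(d, n_points=200, seed=0):
    """(max_dist - min_dist) / min_dist from one random query to n_points."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(d)]
    dists = []
    for _ in range(n_points):
        p = [rng.random() for _ in range(d)]
        dists.append(math.dist(query, p))  # Euclidean distance (Python 3.8+)
    return (max(dists) - min(dists)) / min(dists)


if __name__ == "__main__":
    for d in (2, 10, 100, 1000):
        print(d, round(relative_contrast(d), 3))
```

This is why reconstruction error, rather than raw nearest-neighbour distance, is often preferred for anomaly scoring on wide data.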
Thanks,
Mahesh
My results vary wildly if I use "reproducible=true".
Do you think this is due to excessive noise in the data?
I am using the following settings:
_input_dropout_ratio = 0.2;
_activation = Tanh;
_l1 = 1e-4;
_l2 = 1e-5;
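To put a number on "vary wildly", one option (an assumption, not an H2O metric) is to compare which rows two runs rank as the k most anomalous. Each list below holds one reconstruction error per row from a separate run; the values and function names are hypothetical. An overlap near 1.0 means the runs flag the same rows even when the raw error values differ.

```python
# Minimal sketch (assumption): Jaccard overlap of the top-k anomalous rows
# between two training runs.

def top_k(scores, k):
    """Indices of the k rows with the highest anomaly score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(order[:k])


def top_k_jaccard(scores_a, scores_b, k):
    """|A & B| / |A | B| for the two runs' top-k row sets."""
    a, b = top_k(scores_a, k), top_k(scores_b, k)
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    run1 = [0.10, 0.90, 0.20, 0.80, 0.05]
    run2 = [0.12, 0.70, 0.30, 0.95, 0.04]
    print(top_k_jaccard(run1, run2, k=2))  # prints 1.0
```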
Also, if I use dimensionality-reduction techniques, does the noise in the data get cancelled out?
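A minimal sketch of what linear dimensionality reduction does to noise, on synthetic data (a 1-D signal embedded in 5 dimensions plus small Gaussian noise). Keeping only the first principal component removes the noise that lies outside that direction; noise along the retained direction is not removed, so noise is reduced rather than cancelled. The PCA here is a plain power-iteration implementation, and all names are hypothetical.

```python
import random

# Minimal sketch: PCA projection onto the top component as a denoiser.

def make_data(n=500, d=5, noise=0.3, seed=0):
    """Hypothetical data: 1-D signal along a fixed direction plus Gaussian noise."""
    rng = random.Random(seed)
    u = [1 / d ** 0.5] * d                        # unit direction of the signal
    clean = []
    for _ in range(n):
        t = rng.gauss(0, 3)
        clean.append([t * ui for ui in u])
    noisy = [[x + rng.gauss(0, noise) for x in r] for r in clean]
    return clean, noisy


def top_component(rows, iters=100):
    """Mean and leading eigenvector of the sample covariance (power iteration)."""
    n, d = len(rows), len(rows[0])
    mu = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[x - m for x, m in zip(r, mu)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(d)]
           for i in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return mu, v


def project(rows, mu, v):
    """Reconstruct each row from its coordinate along v only."""
    out = []
    for r in rows:
        t = sum((x - m) * vi for x, m, vi in zip(r, mu, v))
        out.append([m + t * vi for m, vi in zip(mu, v)])
    return out


def mse(a, b):
    """Mean squared error per row between two lists of rows."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / len(a)


if __name__ == "__main__":
    clean, noisy = make_data()
    mu, v = top_component(noisy)
    denoised = project(noisy, mu, v)
    print("noisy MSE:   ", round(mse(noisy, clean), 4))
    print("denoised MSE:", round(mse(denoised, clean), 4))
```

An autoencoder's bottleneck layer plays a similar, nonlinear role.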
Thanks,
Mahesh
Correction: "My results vary wildly if I use reproducible=false."
Setting reproducible=false is necessary in the real world with large data, for performance reasons.
I guess the suggestion above still holds.
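If runs cannot be made reproducible, one common way to damp run-to-run variation (a swapped-in technique, not an H2O feature) is to train several autoencoders and average each row's rank across runs. A minimal sketch with hypothetical names and values; ranks are used instead of raw errors so that a run with larger overall errors does not dominate.

```python
# Minimal sketch (assumption): rank-averaging anomaly scores from several runs.

def ranks(scores):
    """Rank of each row within one run: 0 = least anomalous."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    r = [0] * len(scores)
    for rank, i in enumerate(order):
        r[i] = rank
    return r


def ensemble_rank(runs):
    """Mean rank of each row across runs; higher = more anomalous."""
    all_ranks = [ranks(run) for run in runs]
    n_rows = len(runs[0])
    return [sum(rk[i] for rk in all_ranks) / len(runs) for i in range(n_rows)]


if __name__ == "__main__":
    runs = [[0.10, 0.90, 0.20], [0.20, 0.80, 0.10], [0.15, 0.95, 0.30]]
    print(ensemble_rank(runs))
```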
Thanks,
Mahesh