Hello.
I recently read some papers about activation functions:
1) Self-Normalizing Neural Networks
2) Investigative study of various activation functions for speech recognition
3) Searching for Activation Functions
Some newer activation functions (such as ELU, SELU, and Swish) are reported to perform better than ReLU. Has anyone tried them in speech recognition or in Kaldi? I implemented the SELU activation function in Kaldi, but the results on my own data set are quite bad, especially with the chain model:
1) nnet3: ReLU: 22.88%, SELU: 24.68%
2) chain: ReLU: 22.41%, SELU: 39.46%
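For reference, this is a minimal sketch (not my Kaldi component itself) of the SELU definition from paper 1), with the lambda and alpha constants given there. Note that the paper assumes inputs with roughly zero mean and unit variance and LeCun-normal weight initialization, which may matter if a Kaldi setup initializes weights differently:

```python
import math

# Constants from "Self-Normalizing Neural Networks" (paper 1 above)
ALPHA = 1.6732632423543772
LAMBDA = 1.0507009873554805

def selu(x: float) -> float:
    """SELU(x) = lambda * x            if x > 0
       SELU(x) = lambda * alpha * (exp(x) - 1)  otherwise."""
    if x > 0:
        return LAMBDA * x
    return LAMBDA * ALPHA * (math.exp(x) - 1.0)
```

The self-normalizing property the paper claims depends on the whole setup (initialization, no batch-norm between SELU layers), so a drop-in replacement of ReLU alone may not reproduce their results.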