Hi Torch 7 users/developers,
What kind of weight initialization should nn.LookupTable use? I know Gaussian initialization with a std based on fan-in/fan-out is popular for Linear and Convolutional layers, but I couldn't find a straightforward answer for nn.LookupTable with a quick search. Also, the default in the Lua code is a Gaussian with std = 1; does that make sense?
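For reference, a common workaround (not an official recommendation, just a pattern I've seen) is to construct the LookupTable and then overwrite its weights in place with whatever distribution you prefer; the dimensions here are arbitrary example values:

```lua
require 'nn'

local vocabSize, embDim = 10000, 128
local lut = nn.LookupTable(vocabSize, embDim)

-- The default init is N(0, 1); rescale to a smaller std, e.g. 0.1:
lut.weight:normal(0, 0.1)

-- Alternatively, a small uniform range is also widely used for embeddings:
-- lut.weight:uniform(-0.05, 0.05)
```

Since the embedding rows are looked up rather than multiplied through, fan-in/fan-out reasoning doesn't transfer directly, which is part of why I'm asking what people actually use in practice.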
I understand this topic might diverge slightly from pure Torch 7, but it would be great if anyone who has applied Torch 7 to NLP could share their experience :) Any links to Torch 7 code for NLP are welcome as well!