I implemented a simple RNN using the math operations provided by DyNet and compared its performance with SimpleRNNBuilder. The accuracy of the non-builder implementation is noticeably lower than that of the builder one on the same dataset, but I don't understand why.
Do you have any ideas why this could happen?
Below are the relevant pieces of code. I process the data in minibatches, updating the weights after each minibatch. After each minibatch I also call dy.renew_cg() to start a fresh computation graph. The full code of both models can be found on my GitHub (non-builder and builder implementations).
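For context, the training loop around these snippets looks roughly like this (a simplified sketch: the batch iteration, the dy.esum over per-example losses, and the loss.value() call are illustrative, the exact loop is in the repo):

for batch in train_batches:
    losses = []
    for input_vector, target_vector in batch:
        pred = dy.softmax(self._forward(input_vector))
        losses.append(-dy.log(dy.pick(pred, target_vector.index(1))))
    batch_loss = dy.esum(losses)   # sum of per-example losses in the minibatch
    batch_loss.value()             # run the forward pass
    batch_loss.backward()          # backpropagate
    self._trainer.update()         # update the weights after the minibatch
    dy.renew_cg()                  # start a fresh computation graph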
Builder implementation:

self._rnn = dy.SimpleRNNBuilder(self.LAYERS, self._input_dim, self._hidden_dim, self._model)
self._rnn.disable_dropout()
self._W = self._model.add_parameters((self._output_dim, self._hidden_dim), init=dy.GlorotInitializer())
self._trainer = dy.MomentumSGDTrainer(self._model, learning_rate=self._learning_rate)
...
def _forward(self, input_vector):
    # feed the current input into the builder state and read the new hidden state
    self._input_l = self._input_l.add_input(dy.inputVector(input_vector))
    hidden_layer_output = self._input_l.output()
    self._context_state_vector = hidden_layer_output.npvalue()
    # project the hidden state to output scores
    output_values = self._W * hidden_layer_output
    return output_values
...
pred = dy.softmax(self._forward(input_vector))
loss = -dy.log(dy.pick(pred, target_vector.index(1)))  # negative log-likelihood of the correct class
...
loss.backward()
self._trainer.update()
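One detail not shown above: self._input_l here is the builder's recurrent state object, which comes from the builder itself, roughly like this (the exact setup is in the repo):

self._input_l = self._rnn.initial_state()  # fresh RNN state at the start of a sequence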
Non-builder implementation:

self._V = self._model.add_parameters((self._hidden_dim, self._input_dim + self._hidden_dim), init=dy.GlorotInitializer())
self._b = self._model.add_parameters((self._hidden_dim), init=dy.ConstInitializer(0))
self._W = self._model.add_parameters((self._output_dim, self._hidden_dim), init=dy.GlorotInitializer())
self._trainer = dy.MomentumSGDTrainer(self._model, learning_rate=self._learning_rate)
...
# graph nodes for one RNN step: concatenated [input; previous hidden] -> hidden -> output
self._input_l = dy.vecInput(self._input_dim + self._hidden_dim)
self._context_l = dy.tanh(self._V * self._input_l + self._b)
self._output_l = self._W * self._context_l
...
def _forward(self, input_vector):
    # append the previous hidden state to the input and load it into the input node
    input_vector.extend(self._context_state_vector)
    self._input_l.set(input_vector)
    output_values = self._output_l
    self._context_state_vector = self._context_l.npvalue()
    return output_values
...
pred = dy.softmax(self._forward(input_vector))
loss = -dy.log(dy.pick(pred, target_vector.index(1)))  # negative log-likelihood of the correct class
...
loss.backward()
self._trainer.update()
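For reference, the recurrence the non-builder version implements by hand (and which SimpleRNNBuilder computes per layer, up to how the weights are split) is:

h_t = tanh(V * [x_t ; h_(t-1)] + b)   # hidden/context state
y_t = W * h_t                         # output scores, passed through softmax

where [x_t ; h_(t-1)] is the current input concatenated with the previous hidden state.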