Still, I'm puzzled by an issue that seems to be related to random initialisation.
I have two different NN implementations, one with Breeze and one with NumPy.
When using the exact same initialisation parameters I get the same cost after
each training iteration from both implementations, so based on this I'd infer
that the implementations work equivalently.
However, the results look very different when using random initialisation.
A difference in the exact cost is of course expected, but what I find troubling
is that after N training iterations the cost approaches zero with the NumPy
code (most of the time), whereas with the Breeze-based implementation the cost
fails to converge (most of the time).
With NumPy I'm simply using the following random initialisation code:
np.random.randn(n_h, n_x) * 0.01
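As a sanity check (a minimal sketch, not part of the original training code; the layer sizes are arbitrary examples), the empirical moments of this initialisation can be verified on the NumPy side; the Breeze sampler should produce values with the same distribution:

```python
import numpy as np

# Example layer sizes (hypothetical, just to get a large sample).
n_h, n_x = 100, 100

# Same initialisation as in the question: standard normal scaled by 0.01.
w = np.random.randn(n_h, n_x) * 0.01

# The values should be centred at 0 with standard deviation ~0.01.
print(w.mean())  # should be close to 0.0
print(w.std())   # should be close to 0.01
```

Dumping a matrix sampled by the Breeze code and checking it against the same moments would confirm (or rule out) a distributional difference between the two samplers.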
I'm trying to emulate the same behaviour in my Scala code by sampling from a
Gaussian distribution with mean = 0 and standard deviation = 1, then multiplying by 0.01, as follows:
val RandSampler = breeze.stats.distributions.Rand.gaussian
DenseMatrix.rand[Double](d1, d2, RandSampler) * 0.01
Any ideas why the initialisation seems to work differently?