Hi,
I have recently been training various MNIST architectures, and three things have come to my attention.
1.
The first is that when I want to train on the full MNIST dataset of 60,000 samples, the dataset loads at exactly that size, but the first training iteration reports a larger number of samples to be trained, such as 67,420.
I have been using the provided MNIST sample to test this.
I have tried the following lines to set the training size to 60,000, but neither has been able to explicitly keep the training size at the right amount.
epoch_size = 60000
train_size = 60000
I have also tried 50,000 samples, and it gives the same sort of result, such as 56,780.
I am not sure if I am doing something wrong, but these are the only parameters I can see that control the training size/epoch length.
That being said, would this affect my network, and is the extra size/length being added because of other things going on in the network?
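My own guess, and this is purely an assumption since I have not looked at the source, is that some per-sample augmentation (e.g., jittered or deformed copies) inflates the effective epoch length. A minimal Python sketch of that idea, with made-up numbers:

import random

# Hypothetical illustration only: if the trainer occasionally expands a
# sample into an extra distorted copy, the effective epoch length exceeds
# the nominal train_size (60,000 -> roughly 67,000).
train_size = 60000
random.seed(0)

presentations = 0
for _ in range(train_size):
    presentations += 1             # the original sample
    if random.random() < 0.12:     # assume ~12% of samples get one extra copy
        presentations += 1         # the jittered/deformed copy

print("nominal train_size:", train_size)
print("effective epoch length:", presentations)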
2.
Because of these extra training samples, where I have set a diaghessian period of 60,000, the recomputation happens in the middle of an epoch and not at the end, which is where I want it.
hessian_period = 60000
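For reference, here is a small sketch of how I understand a fixed period interacting with a longer effective epoch (plain Python; the names are my own, not the library's API):

hessian_period = 60000      # recompute the diagonal Hessian every N samples
effective_epoch = 67420     # what the trainer actually reports per epoch

# With a fixed period, the recomputation drifts away from epoch boundaries:
for sample_index in range(1, 3 * effective_epoch + 1):
    if sample_index % hessian_period == 0:
        print("hessian recomputed at epoch position",
              round(sample_index / effective_epoch, 2))
# prints positions 0.89, 1.78, 2.67 -- always mid-epoch

# Presumably setting the period to the effective epoch length (67,420 here)
# would align it with epoch ends, but I would rather fix the size itself.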
3.
Lastly, I have been playing around with various architectures, but I am still not sure about hidden units, specifically classifier_hidden.
In the sample, classifier_hidden is 16.
Does that mean that all layers have 16 hidden units, or only the layers indicated by that variable?
I would assume that each layer has its own input/output sizes based on the architecture.
I would like to use a different number of hidden units in each layer, along the lines of the sketch below.
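To be concrete about what I mean, here is a conceptual sketch in plain Python/NumPy (not the library's config syntax, just the shape of network I am after):

import numpy as np

# Each layer gets its own width; these numbers are only an example.
layer_sizes = [784, 128, 64, 16, 10]    # input, three hidden widths, output

rng = np.random.default_rng(0)
weights = [rng.standard_normal((n_in, n_out)) * 0.01
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(x):
    # x: batch of flattened 28x28 MNIST images, shape (batch, 784)
    for W in weights[:-1]:
        x = np.tanh(x @ W)              # hidden layers with per-layer widths
    return x @ weights[-1]              # 10-way classifier output

print(forward(rng.standard_normal((2, 784))).shape)   # (2, 10)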
....
Thank you; hopefully these are proper questions (not already asked). I did try to look through the forums for the answers.
Carson.