Using the BVLC reference AlexNet file, I have been training a CNN on a training set I created. To measure the progress of training, I have been using a rough method to approximate the accuracy on my validation set. The batch size of my test net is 256, and I have ~4500 images. I make 17 (4500/256, rounded down) calls to solver.test_nets[0].forward() and record the value of solver.test_nets[0].blobs['accuracy'].data (the accuracy of that forward pass) after each one. I then average these values. My thinking was that I was taking 17 random samples of 256 images from my validation set and measuring the accuracy on each sample, so I would expect the average to closely approximate the true accuracy over the entire set.
However, I later went back and wrote a script that goes through each item in my LMDB and generates a confusion matrix for my entire test set. I discovered that the true accuracy of my model was significantly lower than the estimated accuracy. For example, an estimated accuracy of ~75% dropped to a true accuracy of ~50%. This is a far worse result than I was expecting.
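To make the two measurements concrete, here is a minimal sketch of what I am computing. The helper names (mean_batch_accuracy, forward_once, confusion_matrix_accuracy) are hypothetical and only for illustration; forward_once stands in for one call to solver.test_nets[0].forward() followed by reading blobs['accuracy'].data, and the confusion matrix is assumed to be a square list of lists with true labels as rows.

```python
def mean_batch_accuracy(forward_once, n_batches):
    """Average the per-batch accuracy over n_batches forward passes.

    forward_once is a hypothetical callable that runs one forward pass
    of the test net and returns that batch's accuracy.
    """
    accuracies = []
    for _ in range(n_batches):
        accuracies.append(float(forward_once()))
    return sum(accuracies) / len(accuracies)


def confusion_matrix_accuracy(confusion):
    """Overall accuracy from a square confusion matrix (rows = true label)."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / float(total)


# Numbers from my setup: ~4500 images, test batch size 256.
N_IMAGES = 4500
BATCH_SIZE = 256
N_BATCHES = N_IMAGES // BATCH_SIZE      # 17 forward passes
IMAGES_SEEN = N_BATCHES * BATCH_SIZE    # 4352 images scored in total

# With pycaffe, forward_once would look roughly like this (not run here):
# def forward_once():
#     solver.test_nets[0].forward()
#     return solver.test_nets[0].blobs['accuracy'].data
```

Note that 17 batches of 256 score 4352 images in total, which is 148 fewer than the ~4500 in the set, so even in the best case the averaged estimate and the confusion-matrix accuracy are not computed over exactly the same images.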
Have I made an incorrect assumption somewhere? What could account for the difference? I had assumed that the forward() function gathers a random sample, but I'm no longer sure that is the case. blobs['accuracy'].data returned a different result every time (though usually within a small range), which is why I assumed this.