To answer the general question: we shuffle the data because Caffe reads from an LMDB sequentially, in the same linear order every epoch. If you don't shuffle, your LMDB will likely contain long runs of very similar images; during training, the first batches will contain only class 0, the next batches only class 1, and so on. The idea behind SGD is that examples are mixed across batches, so that the gradient computed on each mini-batch is an approximately unbiased estimate of the gradient over the whole dataset. A practical consequence of not shuffling is that your learning process may get stuck: it is quite possible that your network will "learn" to classify every image as the largest class in your dataset.
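
The simplest fix is to shuffle the list of examples before you build the LMDB. Here is a minimal sketch in Python; the file names (`train_list.txt`, `train_list_shuffled.txt`) are hypothetical, and it assumes the usual `<image_path> <label>` per-line format consumed by Caffe's `convert_imageset` tool:

```python
import random

# Read the original (class-ordered) image list, one "<path> <label>" per line.
with open("train_list.txt") as f:
    lines = f.readlines()

random.seed(42)        # optional: fix the seed for a reproducible order
random.shuffle(lines)  # in-place uniform shuffle of all examples

# Write the shuffled list; feed this file to convert_imageset to build the LMDB.
with open("train_list_shuffled.txt", "w") as f:
    f.writelines(lines)
```

Note that `convert_imageset` also has a `--shuffle` flag that randomizes the order for you when creating the LMDB, so you can skip the manual step entirely.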