I believe this relates to a blog post by fchollet. In that post he describes several ways to get a network working on a new dataset (different from the dataset the original model was trained on):
1. Train a new, simple network on the new dataset.
2. Use an existing model (VGG16) on the new dataset by training only the top layers. # This is where he gets the weights #
3. Fine-tune the last Conv block together with the top layers for a better result. # This is where he loads the weights #
You should read the whole blog post to fully understand the process, but I will summarize it as follows (your case corresponds to steps 2 and 3):
1. Use VGG16 (include_top=False) to transform all of your images into arrays of features, and save them to a file. (As explained in the blog, and as I mentioned before, this saves time because the convolutional base runs only once per image.)
2. Create a small top model (with the structure you mentioned). Its input is the features from step 1 and its output is the classes (8 in your case). After training finishes, save the model.
3. Load VGG16 (include_top=False) and your top model (with its trained weights), then stitch them together. Freeze the layers up to the last Conv block (roughly layers 1 to 15; re-check this). Run the training again, this time with the original images as input. The stitched model is the final model you need. Save it.
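The three steps above could be sketched roughly as follows. This is only a minimal sketch, not the code from the blog post. It assumes a recent TensorFlow/Keras install, 150x150 RGB images that are already loaded and preprocessed into NumPy arrays, and one-hot labels. The top-model layout, the file names, the optimizer settings, and the helper names (extract_features, build_top_model, build_stitched_model, run_pipeline) are all hypothetical choices for illustration, so substitute your own structure.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.applications import VGG16

NUM_CLASSES = 8            # from the question
IMG_SHAPE = (150, 150, 3)  # assumption: the image size is not stated in this answer


def extract_features(images, weights="imagenet"):
    """Step 1: run the images through the VGG16 convolutional base once."""
    base = VGG16(include_top=False, weights=weights, input_shape=images.shape[1:])
    return base.predict(images, verbose=0)


def build_top_model(feature_shape, num_classes=NUM_CLASSES):
    """Step 2: small classifier on the saved features (hypothetical layout)."""
    return keras.Sequential([
        keras.Input(shape=feature_shape),
        layers.Flatten(),
        layers.Dense(256, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ])


def build_stitched_model(top_model, input_shape=IMG_SHAPE, weights="imagenet"):
    """Step 3: VGG16 base plus the trained top model, frozen before block5."""
    base = VGG16(include_top=False, weights=weights, input_shape=input_shape)
    for layer in base.layers:
        # Freeze by layer name instead of by index, so the cut-off
        # does not depend on how the layers happen to be numbered.
        layer.trainable = layer.name.startswith("block5")
    outputs = top_model(base.output)
    return keras.Model(inputs=base.input, outputs=outputs)


def run_pipeline(x_train, y_train):
    """Chain the three steps; x_train must be preprocessed the same way throughout."""
    # Step 1: compute the features and save them to a file.
    features = extract_features(x_train)
    np.save("bottleneck_features.npy", features)

    # Step 2: train the top model on the features, then save it.
    top = build_top_model(features.shape[1:])
    top.compile(optimizer="rmsprop", loss="categorical_crossentropy",
                metrics=["accuracy"])
    top.fit(features, y_train, epochs=10, batch_size=16)
    top.save("top_model.keras")  # use an .h5 file name on older Keras versions

    # Step 3: stitch, freeze, and fine-tune on the original images
    # with a small learning rate so the pretrained weights change slowly.
    model = build_stitched_model(top)
    model.compile(optimizer=keras.optimizers.SGD(learning_rate=1e-4, momentum=0.9),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    model.fit(x_train, y_train, epochs=10, batch_size=16)
    model.save("final_model.keras")
    return model
```

Freezing by the "block5" name prefix is one way to handle the "re-check this" part of step 3: you can print the layer names of the VGG16 base and confirm which ones stay trainable instead of counting indices.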
Good luck!