If you are new, I would recommend skipping training altogether as a first step: run a forward pass, extract the pre-trained representations (features) from pool5, fc6 and fc7, and feed them to one-vs-all SVMs (use scikit-learn; this is a few lines of code). If the images are registered (parts of each object appear in the same spot in every image), pool5 will tend to do better; fc7 will do better where parts are not in the same place (e.g. pictures of cats where the cat's head can be anywhere in the image); fc6 falls in between. The effectiveness of this kind of transfer learning (training on one dataset and testing on another) takes many people by *surprise*, since at first it can seem counterintuitive.
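A minimal sketch of the SVM step with scikit-learn follows. It assumes the features have already been pulled out into NumPy arrays. With pycaffe, for example, that would be something like copying `net.blobs['fc7'].data` after `net.forward()`. The random arrays here are only a synthetic stand-in for real pool5/fc6/fc7 features so the snippet runs on its own. `evaluate_layers` and `fake_features` are hypothetical helper names, and the feature scaling is my own choice rather than part of the advice above. Cross-validating each layer is one simple way to find out which layer suits your images.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC


def evaluate_layers(features_by_layer, labels, folds=5):
    """Mean cross-validated accuracy of a one-vs-rest linear SVM for each layer."""
    scores = {}
    for layer, feats in features_by_layer.items():
        # Flatten conv outputs such as pool5 (N x C x H x W) into N x D vectors.
        X = feats.reshape(len(feats), -1)
        # LinearSVC trains one binary SVM per class (one-vs-rest) for multiclass labels.
        clf = make_pipeline(StandardScaler(), LinearSVC(C=1.0, max_iter=10000))
        scores[layer] = cross_val_score(clf, X, labels, cv=folds).mean()
    return scores


# --- Synthetic stand-in data (replace with real features from the network) ---
rng = np.random.default_rng(0)
n_classes, n_per_class = 3, 40
labels = np.repeat(np.arange(n_classes), n_per_class)


def fake_features(shape):
    # Hypothetical helper: each class gets its own random centroid,
    # and each sample is that centroid plus unit Gaussian noise.
    centroids = rng.normal(size=(n_classes,) + shape)
    return centroids[labels] + rng.normal(size=(len(labels),) + shape)


features = {
    "pool5": fake_features((8, 3, 3)),  # small conv-like shape, flattened above
    "fc6": fake_features((64,)),
    "fc7": fake_features((64,)),
}

scores = evaluate_layers(features, labels)
for layer, acc in scores.items():
    print(f"{layer}: {acc:.3f}")
```

With real data, you would pick the layer with the best cross-validated score. Keep in mind that real pool5 features are much larger than the toy shape used here (9216 values per image for the AlexNet/CaffeNet layout), which is one reason a linear SVM is a sensible default.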
After you have tried this, consider fine-tuning. It can improve results, though usually by less than you might expect; how much it helps depends heavily on how much training data you have.