Hello! I am trying to train a Caffe MobileNet-SSD network to detect three classes (objects), following this repo:
https://github.com/chuanqi305/MobileNet-SSD. Two of the classes are my own, and the third is the "person" class from the VOC dataset. Before I ask my questions, I would like to describe the dataset I'm using: it has 440 images, with each of my two objects appearing in 250-300 of them (a lot of the images contain both objects). All the images have similar surroundings, i.e. they were taken in the same corner of a room. I am not using any images of people, as I believe that passing in the pre-trained weights from the VOC model already trains the network to detect the person class. Given the nature of my dataset, here are my questions:
1) Is MobileNet a good fit for this kind of dataset, or am I likely to get better results with other networks such as AlexNet or VGG?
2) If I use MobileNet, what hyper-parameters (train and test batch size, number of iterations, etc.) should I use to get the best accuracy and avoid under- or over-fitting? 75% of the dataset (330 images) is being used for trainval.
3) Once I'm done training, what is the best way to test accuracy? I'm not getting much out of running the test.sh script in this repo.
4) Do I still have to include person images in my dataset even if I'm passing in, as weights, a caffemodel pre-trained to detect the person class?
I'm new to all this, so answers to any of these questions would be highly appreciated, particularly questions 1 and 2!
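
In case it helps, the 75/25 split mentioned in question 2 can be reproduced roughly like this. This is only a minimal sketch: the image IDs and the `split_trainval_test` helper are made-up placeholders for illustration, not anything taken from the repo.

```python
import random


def split_trainval_test(image_ids, trainval_fraction=0.75, seed=0):
    """Shuffle image IDs reproducibly and split them into trainval and test lists.

    Hypothetical helper for illustration only; it is not part of the MobileNet-SSD repo.
    """
    ids = sorted(image_ids)           # fixed starting order so the seed fully determines the split
    rng = random.Random(seed)         # local RNG, leaves the global random state untouched
    rng.shuffle(ids)
    n_trainval = int(round(len(ids) * trainval_fraction))
    return ids[:n_trainval], ids[n_trainval:]


# Placeholder IDs standing in for the 440 images described above.
image_ids = ["img_{:04d}".format(i) for i in range(440)]

trainval_ids, test_ids = split_trainval_test(image_ids)
print(len(trainval_ids), len(test_ids))  # 330 110
```

Each image ends up in exactly one of the two lists, so none of the test images are seen during training.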