Training SSD on synthetic images


Francisco Depascuali

Jun 16, 2017, 9:35:54 PM
to Caffe Users
Hello!
I'm writing my thesis on synthetic dataset generation for training object detectors. Right now I'm using SSD, which is implemented in a fork of Caffe.

First of all, I built a pipeline that generates an arbitrary number of images from a 3D model (a Peugeot 406 in this case), a ground plane and a sky.
The following images are examples of its output. I will call it v1.



With v1, I generated datasets of 50k to 100k images, and the network learned to detect cars in new synthetic images generated the same way (but not part of the training data). Unfortunately, it failed to detect cars in real images most of the time.
I don't think this is overfitting; I think the network is simply only able to handle synthetic images (and those images are far from real).

My next idea was that, since SSD generates negatives from the background, I could just place the 3D model of the car over a background image and train on that. That is what I did in v2:


The obvious problem here is that I cannot set the orientation or the position of the car correctly, since the images are generated automatically.
I thought that SSD would use the positive labels for the car and then sample negatives from the background. Unfortunately, it doesn't learn anything from this type of image.
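
For reference, this is roughly how I understand the negative sampling in SSD: default boxes that do not match a ground-truth box are sorted by confidence loss, and only the hardest ones are kept, so that negatives outnumber positives by at most 3:1. The sketch below is a simplified illustration under that assumption, not the actual Caffe layer; the function name and the loss values are made up.

```python
def select_hard_negatives(conf_losses, is_positive, neg_pos_ratio=3):
    """Return indices of the negative boxes kept for the loss, hardest first.

    conf_losses: per-default-box confidence loss (hypothetical values).
    is_positive: True where the default box matched a ground-truth box.
    neg_pos_ratio: maximum negatives kept per positive (3 in the SSD paper).
    """
    num_pos = sum(1 for pos in is_positive if pos)
    negatives = [i for i, pos in enumerate(is_positive) if not pos]
    # Never keep more negatives than exist or than the ratio allows.
    num_neg = min(neg_pos_ratio * num_pos, len(negatives))
    # Hardest negatives are the ones with the highest confidence loss.
    negatives.sort(key=lambda i: conf_losses[i], reverse=True)
    return negatives[:num_neg]


# Made-up example: eight default boxes, one of them matched to the car.
losses = [0.2, 1.5, 0.1, 0.9, 0.05, 0.7, 0.3, 0.6]
positive = [False, True, False, False, False, False, False, False]
print(select_hard_negatives(losses, positive))  # prints [3, 5, 7]
```

In this sketch the number of negatives is tied to the number of positives, so an image with no matched box contributes no negatives at all.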

So, my questions are:
  1. What I tried in v2 seems unfeasible. The explanation I can find is that training images should be as close to real images as possible, and these are not. However, I would like to understand the scientific reason for this (if it is right).
  2. Does having a plane and a sky mean that the network will always expect images with a plane and a sky?
  3. I'm pretty sure it isn't, but is there any problem with the car always being in the center?
Thank you for reading!

Some more details:
detection_eval = 0.0943377
a lot of iterations later...
detection_eval = 0.00699301
detection_eval = 0.00315732
detection_eval = 0.0909091
a lot of iterations later...
detection_eval = 0.010101
detection_eval = 0.0454545
detection_eval = 0.00304136
a lot of iterations later...
detection_eval = 0.00379705
detection_eval = 0.00699301
detection_eval = 0.00315732
detection_eval = 0.0909091
detection_eval = 0.00326311
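
To make it easier to see whether the metric moves at all, a small script like the following could pull the detection_eval values out of the training log. It is only a sketch: the sample text reuses the first values listed above, and the regular expression assumes every line has the "detection_eval = value" shape shown here.

```python
import re

# First few values copied from the log excerpt above.
LOG = """\
detection_eval = 0.0943377
lot of iterations later...
detection_eval = 0.00699301
detection_eval = 0.00315732
detection_eval = 0.0909091
"""

# Assumed line shape: "detection_eval = <number>".
PATTERN = re.compile(r"detection_eval\s*=\s*([0-9.eE+-]+)")


def parse_detection_eval(text):
    """Return every detection_eval value found in the text, in order."""
    return [float(match.group(1)) for match in PATTERN.finditer(text)]


values = parse_detection_eval(LOG)
print("evaluations:", len(values))
print("best:", max(values))
print("worst:", min(values))
```

Across all the values listed above, the metric never goes over 0.1 and shows no upward trend.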

That is 50k iterations for v2. The parameters are almost the same as in ssd_pascal.

It only learns one object class, the Peugeot 406 (two classes in total: background and peugeot).