Training SSD on synthetic images


Francisco Depascuali

Jun 16, 2017, 9:35:54 PM

to Caffe Users

Hello!
I'm writing my thesis on synthetic dataset generation for training object detectors. Right now I'm using SSD, which is built on a fork of Caffe.

First of all, I created a mechanism to generate an arbitrary number of images from a 3D model (a Peugeot 406 in this case), a ground plane and a sky.
Some of the following images are examples of it. I will call it v1.

With v1, I generated datasets of 50k to 100k images, and the network learned to detect cars in new synthetic images generated in the same way (but not part of the training data). Unfortunately, it failed to detect cars in real images most of the time.
I don't think this is overfitting; I think the network is simply only able to handle synthetic images (and those images are far from real).

My thinking then was: since SSD generates its negatives from the background, I could just place the 3D model of the car over a background image and train with that. That is what I did in v2:

The obvious limitation here is that I have no way of knowing the correct orientation or position of the car for a given background (the images are generated automatically).
I thought that SSD would use the positive labels for the car and then sample negatives from the background. Unfortunately, it doesn't learn anything from this type of image.
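For reference, here is a minimal sketch of the negative selection I had in mind. It assumes the hard negative mining described in the SSD paper, where unmatched default boxes are ranked by confidence loss and only the top ones are kept, so that negatives outnumber positives by at most 3:1. The function name `select_hard_negatives` and its arguments are hypothetical and are not taken from the Caffe SSD code.

```python
import numpy as np

def select_hard_negatives(conf_loss, is_positive, neg_pos_ratio=3):
    """Return a boolean mask of the negative priors kept for the loss.

    conf_loss     : (num_priors,) per-prior classification loss
    is_positive   : (num_priors,) bool, True where a prior matched a ground-truth box
    neg_pos_ratio : maximum number of negatives kept per positive (assumed 3)
    """
    conf_loss = np.asarray(conf_loss, dtype=float)
    is_positive = np.asarray(is_positive, dtype=bool)

    num_pos = int(is_positive.sum())
    num_neg = min(neg_pos_ratio * num_pos, int((~is_positive).sum()))

    # Positives must never be picked as negatives, so push them to the end of the ranking.
    neg_loss = np.where(is_positive, -np.inf, conf_loss)
    order = np.argsort(-neg_loss)  # indices sorted by descending loss

    keep = np.zeros(is_positive.shape, dtype=bool)
    keep[order[:num_neg]] = True
    return keep

# One positive prior (index 1), so at most three negatives are kept.
loss = [0.1, 2.0, 0.5, 0.9, 0.05, 1.5, 0.3, 0.7]
pos = [False, True, False, False, False, False, False, False]
kept = select_hard_negatives(loss, pos)
```

Under this scheme the negatives come from the same image as the positive, so in v2 every negative is a patch of a real background while every positive is a rendered car.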

So, my questions are:

What I tried in v2 seems unfeasible. The explanation I can find is that training images should be as close to real images as possible, and this type of image is not. However, I would like to understand the reason more rigorously: why is this the case (if it is right)?
Does having a plane and a sky mean that the network will always expect images with a plane and a sky?
I'm pretty sure there isn't, but is there any problem with the car always being in the center?
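If the centering does turn out to matter, one way to rule it out would be to paste the rendered car at a random offset and derive the box label from its alpha channel. The following is only a sketch under assumptions: it supposes the render is available as an RGBA array that is no larger than the background and has at least one non-transparent pixel, and `paste_car` is a hypothetical helper, not part of my generator or of SSD.

```python
import numpy as np

def paste_car(background, car_rgba, rng):
    """Alpha-blend car_rgba onto a copy of background at a random offset.

    background : (H, W, 3) uint8 image
    car_rgba   : (h, w, 4) uint8 render with h <= H and w <= W
    rng        : numpy.random.Generator
    Returns the composite and the box (xmin, ymin, xmax, ymax) of the visible car pixels.
    """
    H, W = background.shape[:2]
    h, w = car_rgba.shape[:2]

    # Random top-left corner such that the render stays fully inside the image.
    y0 = int(rng.integers(0, H - h + 1))
    x0 = int(rng.integers(0, W - w + 1))

    alpha = car_rgba[..., 3:4].astype(np.float32) / 255.0
    out = background.astype(np.float32)
    region = out[y0:y0 + h, x0:x0 + w]  # view into out, modified in place
    region[:] = alpha * car_rgba[..., :3] + (1.0 - alpha) * region

    # Tight box around the non-transparent pixels, in full-image coordinates.
    ys, xs = np.nonzero(car_rgba[..., 3] > 0)
    box = (x0 + xs.min(), y0 + ys.min(), x0 + xs.max() + 1, y0 + ys.max() + 1)
    return out.round().astype(np.uint8), tuple(int(v) for v in box)

# Toy data: a black background and a small red "car" with a transparent border.
bg = np.zeros((20, 30, 3), dtype=np.uint8)
car = np.zeros((4, 6, 4), dtype=np.uint8)
car[1:3, 2:5] = [255, 0, 0, 255]
image, box = paste_car(bg, car, np.random.default_rng(0))
```

The returned box uses exclusive maximum coordinates, so it would still need converting to whatever annotation format the training pipeline expects.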

Thank you for reading!

Some more details:

detection_eval = 0.0943377

a lot of iterations later...

detection_eval = 0.00699301

detection_eval = 0.00315732

detection_eval = 0.0909091

a lot of iterations later...

detection_eval = 0.010101

detection_eval = 0.0454545

detection_eval = 0.00304136

a lot of iterations later...

detection_eval = 0.00379705

detection_eval = 0.00699301

detection_eval = 0.00315732

detection_eval = 0.0909091

detection_eval = 0.00326311

50k iterations, for v2. The parameters are almost the same as ssd_pascal.

It is only trained on the Peugeot 406 (two classes: background and peugeot).
