Object Detection - Resizing, Padding and Labels


nerr...@gmail.com

Aug 3, 2016, 7:03:14 AM
to DIGITS Users
Hi, 
I am wondering whether the options to pad or resize images when creating a database for object detection also re-map the bounding box coordinates in the labels. I have managed to achieve decent accuracy using the KITTI training set described in the walkthrough (with the graph inconsistencies already flagged on GitHub), but I am having difficulty applying the same method to my own data. I have tried formatting the labels as:

(Car -1.00 -1 -1.00 433.00 273.00 613.00 383.00 -1.00 -1.00 -1.00 -1000.00 -1000.00 -1000.00 -10.00)

and

(Car 0.00 0 0.00 433.00 273.00 613.00 383.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00) .

Am I correct in thinking DIGITS only needs the object class and the four bounding box coordinates?
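For reference, here is roughly how I am generating each label line (a quick Python sketch; the zeroed fields just follow the KITTI devkit ordering):

def kitti_label_line(cls, left, top, right, bottom):
    # KITTI devkit field order: type, truncated, occluded, alpha,
    # bbox (left, top, right, bottom), dimensions (h, w, l),
    # location (x, y, z), rotation_y. Only 'type' and the bbox
    # matter here; the rest are zero-filled placeholders.
    values = [0.0, 0, 0.0,                  # truncated, occluded, alpha
              left, top, right, bottom,     # 2D bounding box in pixels
              0.0, 0.0, 0.0,                # 3D dimensions (unused)
              0.0, 0.0, 0.0,                # 3D location (unused)
              0.0]                          # rotation_y (unused)
    parts = [cls] + ['%d' % v if isinstance(v, int) else '%.2f' % v
                     for v in values]
    return ' '.join(parts)

print(kitti_label_line('Car', 433.0, 273.0, 613.0, 383.0))
# -> Car 0.00 0 0.00 433.00 273.00 613.00 383.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00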

My images are of size (750, 1000, 3), with all bounding box edge lengths in the range 80 < x < 200 pixels. So far I have not been able to achieve an mAP > 0 for anything other than the example set.


A nice addition in future versions would be a visualisation of the database within DIGITS, including the bounding boxes; this would help ensure the labels have been formatted correctly. Any help or pointers on where I may be going wrong would be greatly appreciated.

Thanks,

Nerraw

Greg Heinrich

Aug 3, 2016, 7:18:44 AM
to DIGITS Users
Hi,
Padding is added to the bottom and right borders of images, so it does not change the bounding box coordinates.
Resizing does affect the bounding box coordinates, and DIGITS takes care of updating them.
The following fields are ignored by DetectNet: alpha, rotation_y, occluded, dimensions, location, score. The 'truncated' field is not ignored; 0 seems like a decent default value. I am not sure how '-1' will be dealt with, so better not to try!
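Conceptually, the remapping under a resize is just a linear rescale of the box corners; a minimal sketch (not the actual DIGITS code):

def rescale_bbox(bbox, old_size, new_size):
    # bbox is (left, top, right, bottom) in pixels of the original image;
    # old_size and new_size are (width, height). The box corners scale
    # linearly with the image dimensions.
    sx = float(new_size[0]) / old_size[0]
    sy = float(new_size[1]) / old_size[1]
    left, top, right, bottom = bbox
    return (left * sx, top * sy, right * sx, bottom * sy)

# e.g. resizing a 1000x750 image to 1248x384:
print(rescale_bbox((433.0, 273.0, 613.0, 383.0), (1000, 750), (1248, 384)))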
Indeed it would be nice to visualize bounding boxes in the DB.
Check this article to see if you can find useful information: https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/digits-users/XZrBgI-CcEE/66sV71QoAwAJ

Regards.

nerr...@gmail.com

Aug 3, 2016, 9:33:05 AM
to DIGITS Users


Hi Greg,
Thanks for your quick response. From your reply I gather that the label format (Car 0.00 0 0.00 433.00 273.00 613.00 383.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00) is valid? I have tested using this format and cannot get an mAP > 0. I notice that, regardless of the image size I use, the values below in "train_val.prototxt" remain at their defaults:

layer {
  name: "train_transform"
  type: "DetectNetTransformation"
  bottom: "data"
  bottom: "label"
  top: "transformed_data"
  top: "transformed_label"
  include {
    phase: TRAIN
  }
  transform_param {
    mean_value: 127.0
  }
  detectnet_groundtruth_param {
    stride: 16
    scale_cvg: 0.40000000596
    gridbox_type: GRIDBOX_MIN
    min_cvg_len: 20
    coverage_type: RECTANGULAR
    image_size_x: 1248
    image_size_y: 384

    obj_norm: true
    crop_bboxes: true
    object_class {
      src: 1
      dst: 0
    }
  }
}

Should these be changed manually before training, and if so, are there any other variables that are image-size dependent? I also notice that if I use images of size (750, 1000, 3) I cannot get any activation for "Coverage" during inference, but if I resize the images to (1500, 2000, 3) I do get activations, although they are not correct (this may be due to the training set size?). Is there a reason images of size (750, 1000, 3) are incompatible? They should be large enough for the receptive field, and I have ensured the bounding boxes are of the correct size.
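My guess is that these should simply match the input dimensions (assuming image_size_x is the width and image_size_y the height), i.e. something like:

image_size_x: 1000  # image width in pixels
image_size_y: 750   # image height in pixels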

The link to the article you sent appears to point back to this thread?

Thanks,

Nerraw

Greg Heinrich

Aug 3, 2016, 10:08:03 AM
to DIGITS Users

nerr...@gmail.com

Aug 3, 2016, 10:54:56 AM
to DIGITS Users
That's great, thank you. Upon changing the crop size to suit the image, I am now at the same stage as this thread: https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!topic/digits-users/pJVTNJVrCpo

The network is most definitely learning to identify the correct object within the image, but it is not creating the bounding boxes. Should I find the reason for this, I shall post it; otherwise I will follow the above thread.

Thanks,

Nerraw