DIGITS Object Detection Labels and Dataset


Darren Eng

unread,
Jul 28, 2016, 4:13:02 PM7/28/16
to DIGITS Users
I was working on a trivial dataset and model for object detection to see if I could correctly prepare a dataset and model. The dataset I made just contains copies of the same image and the corresponding label. The label for the photo is written as shown below:

Car 0.00 0 0.0 470.00 199.00 877.00 414.00 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

In another post in this group, I read that only the label, occlusion, and 2D bounding boxes are necessary for DetectNet, so I supplied 0s for all the remaining values. I trained the model using DetectNet and the Adam solver; however, when I tested the model on the same image used during training, no bounding boxes were drawn. I know that this example is trivial, but I would have thought that if I gave it a single image and label, the model would at least be able to draw a bounding box for that one image. However, the model did not draw any boxes.
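For reference, a KITTI ground-truth line has 15 whitespace-separated fields: type, truncation, occlusion, alpha (observation angle), the 2D bounding box as left/top/right/bottom in pixels, the 3D dimensions, the 3D location, and rotation_y; some tools append a 16th score column. A minimal sketch that builds such a line (the helper name is my own, not part of DIGITS):

# Sketch: build a KITTI-style label line for one object. Only the type,
# truncation/occlusion and the 2D bbox matter to DetectNet, so the 3D fields
# can stay 0.0. (Helper name is mine, not part of DIGITS.)
def kitti_label(obj_type, left, top, right, bottom,
                truncated=0.0, occluded=0, alpha=0.0):
    fields = [
        obj_type,        # type:       e.g. 'Car', 'Pedestrian', 'dontcare'
        truncated,       # truncated:  0.0 (not truncated) .. 1.0
        occluded,        # occluded:   0 = fully visible
        alpha,           # alpha:      observation angle (unused here)
        left, top,       # bbox:       upper-left corner, in pixels
        right, bottom,   # bbox:       lower-right corner, in pixels
        0.0, 0.0, 0.0,   # dimensions: 3D height, width, length (unused)
        0.0, 0.0, 0.0,   # location:   3D x, y, z (unused)
        0.0,             # rotation_y  (unused)
    ]
    return ' '.join(str(f) for f in fields)

print(kitti_label('Car', 470.0, 199.0, 877.0, 414.0))
# -> Car 0.0 0 0.0 470.0 199.0 877.0 414.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0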

Is there any reason why this wouldn't work? And if not, is there something wrong with the label I provided?

Greg Heinrich

unread,
Jul 28, 2016, 4:23:51 PM7/28/16
to DIGITS Users
Indeed one would expect the model to do well on a single image training set.  Can you try with a slightly smaller object, say about 100 pixels in height and width? The receptive fields of DetectNet work best for objects of that size.
During training do you see mAP rise above 0? Can you post the dimensions of your labels on the dataset job page?

Darren Eng

unread,
Jul 28, 2016, 4:39:24 PM7/28/16
to DIGITS Users

Thanks, during training the mAP was flat at 0 the whole time, even through 100 epochs. Here's the label info from the dataset jobs page: There are 101 images (all copies) and the label shape is listed as (1,2,16).

I can also try this on a smaller object to see if it makes a difference

Greg Heinrich

unread,
Aug 1, 2016, 4:10:18 AM8/1/16
to DIGITS Users

Darren Eng

unread,
Aug 1, 2016, 12:51:13 PM8/1/16
to DIGITS Users
The article seems to describe a label format for DetectNet that is different from the one used by the KITTI dataset. According to the article, DIGITS overlays a grid on the image, and each row in a .txt file describes one grid square and whether or not it contains an object. The KITTI label format, on the other hand, describes the pixel coordinates of an object in a single row.

Should we use the KITTI label format or the format described in that article? Or is there a way to choose which format to use?

Greg Heinrich

unread,
Aug 1, 2016, 1:10:54 PM8/1/16
to Darren Eng, DIGITS Users
Hello, are you referring to Figure 2 in the article? It merely shows that DIGITS ingests data annotations in KITTI format and that, after various pre-processing steps (the last one being the DetectNetTransformation layer), the labels are converted into grids of values. When working with DIGITS you should provide data in KITTI format, as explained at https://github.com/NVIDIA/DIGITS/blob/v4.0.0/examples/object-detection/README.md

Regards.


Darren Eng

unread,
Aug 1, 2016, 2:03:02 PM8/1/16
to DIGITS Users, darr...@gmail.com
Yes, that was what I was referring to, and thank you for clearing it up. I thought that this was a new format to feed to DIGITS.

I tried using a smaller object as you suggested earlier, but unfortunately I did not get any bounding boxes from the new single-image model either. Do you have any other suggestions for generating a model that works?

Greg Heinrich

unread,
Aug 1, 2016, 2:35:36 PM8/1/16
to Darren Eng, DIGITS Users
Hi Darren, I am at a loss to understand why your model isn't able to train on the single-image dataset. Can you share that image (and the corresponding label you used) so I can try it myself? Did you try the example from the walk-through?

Darren Eng

unread,
Aug 1, 2016, 5:52:32 PM8/1/16
to DIGITS Users, darr...@gmail.com


Thanks, I did try the walkthrough and I was able to generate a model that detected the objects in the KITTI data. The two images I tried are shown above. I tried the bottom image first, and then I tried an image with a smaller object, as you suggested (the top image). The images are taken from a VR image generator. I used the Car class because that was easiest. The labels I tried are shown below:


Top image: 

Car 0.00 0 0.00 518.00 323.00 714.00 403.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

Bottom image:
Car 0.00 0 0.0 470.00 199.00 877.00 414.00 1.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0

Thanks for your help!

Greg Heinrich

unread,
Aug 3, 2016, 8:04:33 AM8/3/16
to DIGITS Users, darr...@gmail.com
Hi Darren,
Still having a look at this in between two things. I created a dataset of (1 image, 1 label) using your data (i.e. the smaller image). When I train DetectNet on this dataset I get pretty good loss curves; however, the mAP is stuck at 0:


When I perform inference on the image, the coverage map is actually pretty good; you can see that the network has spotted the object right in the middle of the image, as indicated by the red blobs. However, the bounding box regressor hasn't learnt anything: its weights are uniformly random (they should look like a Gaussian after successful training):


You could try providing more images to see if it gets any better. In the meantime I'll keep thinking about this.

Darren Eng

unread,
Aug 3, 2016, 5:56:21 PM8/3/16
to DIGITS Users, darr...@gmail.com
Thanks Greg, I'll keep looking through it too. 

I tried to train a single image dataset using an image from the example KITTI dataset and I wasn't able to get that to work either. Is a large dataset necessary for DetectNet to work?

nerr...@gmail.com

unread,
Aug 4, 2016, 4:00:18 AM8/4/16
to DIGITS Users, darr...@gmail.com
Same result here.

sz Mousavi

unread,
Aug 24, 2016, 9:47:11 PM8/24/16
to DIGITS Users
Hi, I have a question about the KITTI format: for the bounding boxes, what does "pixel coordinates" mean?

Peter Neher

unread,
Aug 25, 2016, 5:23:12 AM8/25/16
to DIGITS Users
The upper left corner (x1, y1) and the lower right corner (x2, y2) of the box, in pixels.

sz Mousavi

unread,
Aug 25, 2016, 12:46:08 PM8/25/16
to DIGITS Users
OK, then why is it fractional?! Greg Heinrich said (answering my post on the matter) that it might be because the images are resized, but if that's the case, why isn't it mentioned anywhere? And even if the images are resized, I think the pixel coordinates should still be integers!

Peter Neher

unread,
Aug 29, 2016, 4:53:38 AM8/29/16
to DIGITS Users
I agree with Greg. It's probably a result of resizing. They probably didn't want to round the coordinates after resizing, or simply didn't care. Why does it matter?
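To see the resizing explanation concretely, here is a minimal sketch (the image and network input sizes below are assumptions for illustration, not anything DIGITS does internally) of how integer box corners become fractional once the image is scaled:

# Sketch: why resized boxes end up fractional. The sizes below are assumptions
# for illustration, not DIGITS internals.
orig_w, orig_h = 1280, 720    # assumed original image size
new_w, new_h = 1248, 384      # assumed network input size

sx, sy = new_w / float(orig_w), new_h / float(orig_h)

left, top, right, bottom = 470, 199, 877, 414
scaled = (left * sx, top * sy, right * sx, bottom * sy)
print(scaled)
# -> roughly (458.25, 106.13, 855.08, 220.8) -- fractional pixel coordinates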

a.m.l...@gmail.com

unread,
Aug 31, 2016, 8:48:44 AM8/31/16
to DIGITS Users
Hi! I have some questions about this training:

1. You said the bounding-box coordinates in the label are the upper left corner (x, y) and the lower right corner, and the dataset guide says the same. However, in your label:

Car 0.00 0 0.0 470.00 199.00 877.00 414.00 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

the upper left corner is (x, y) = (470.00, 199.00) and the lower right corner is (x, y) = (877.00, 414.00)... how is this possible, when the upper corner has the smaller y value?

2. I have all my labels in .xml format. Does anybody know of a script to convert XML to KITTI format? Or is there a tool for labeling in KITTI format directly?

Greg Heinrich

unread,
Aug 31, 2016, 10:24:45 AM8/31/16
to DIGITS Users
Hello, in computer graphics the origin of the coordinate system is typically the top left corner, unlike in traditional geometry where it is the bottom left, so y increases downwards. I don't know of a script to convert XML to KITTI format. If you would like to contribute one, it would be much appreciated. Thanks.
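As a rough starting point for such a converter, here is a minimal sketch assuming Pascal VOC-style XML (an object element with a name and a bndbox holding xmin/ymin/xmax/ymax); adjust the tag names to whatever your annotation tool produces:

# Sketch: convert one Pascal VOC-style XML annotation file to KITTI label lines.
# Assumes <object><name>...</name><bndbox><xmin>..</xmin>...</bndbox></object>;
# adjust the tag names to your own annotation format.
import sys
import xml.etree.ElementTree as ET

def voc_xml_to_kitti(xml_path):
    lines = []
    root = ET.parse(xml_path).getroot()
    for obj in root.iter('object'):
        name = obj.find('name').text.strip().lower()
        box = obj.find('bndbox')
        xmin, ymin, xmax, ymax = (float(box.find(tag).text)
                                  for tag in ('xmin', 'ymin', 'xmax', 'ymax'))
        # type, truncated, occluded, alpha, bbox(4), dimensions(3), location(3), rotation_y
        lines.append('%s 0.0 0 0.0 %.2f %.2f %.2f %.2f 0.0 0.0 0.0 0.0 0.0 0.0 0.0'
                     % (name, xmin, ymin, xmax, ymax))
    return '\n'.join(lines)

if __name__ == '__main__':
    print(voc_xml_to_kitti(sys.argv[1]))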

a.m.l...@gmail.com

unread,
Aug 31, 2016, 1:43:57 PM8/31/16
to DIGITS Users
Thanks! I'm still working on that! Once I finish, if it works, I'll post it!

Gonçalo Cruz

unread,
Sep 6, 2016, 5:17:12 AM9/6/16
to DIGITS Users
Hi guys,

I have read your previous posts and I think that I have a similar problem.
My status is the following:
  1. I tried to train on the KITTI dataset and was successful.
  2. I tried to train on a subset of KITTI. The loss decreased, but precision, recall and mAP stayed at zero even after 300 epochs.
  3. I tried to train using my own dataset and the same thing happened as described in 2.

Has anyone been successful on training a subset of kitti for instance? If so, what was the number of images and the learning rate that you have used?


Best regards,

Gonçalo Cruz

Darren Eng

unread,
Sep 6, 2016, 1:18:27 PM9/6/16
to DIGITS Users
I've been able to train a subset of the KITTI dataset for pedestrian detection. I used about 390 images, but I've gotten it to work on as few as 21 images. I used the same learning rate as the tutorial (0.0001, stepping down every 33%), but I've also used exponential decay.

If you're trying to train it for something other than cars, then you need to make sure you write dontcare, classname under Custom classes on the New Object Detection Dataset page, where classname is the name of the class you are trying to detect (e.g. pedestrian). Also make sure that both dontcare and classname are lowercase. I ran into these pitfalls when I tried to train a subset of KITTI, and I was able to train it successfully once I fixed the class names.

Gonçalo Cruz

unread,
Sep 6, 2016, 5:37:21 PM9/6/16
to DIGITS Users
Thanks for the help.

I have started to train a subset and I am already seeing precision, recall and mAP values increase.
Out of curiosity, can I use some other class that is not listed in the KITTI dataset (and train from scratch)? If so, are there any special steps to be taken, or should I just list the classes and have the label files with the correct labels?

Greg Heinrich

unread,
Sep 6, 2016, 6:52:58 PM9/6/16
to DIGITS Users

Gonçalo Cruz

unread,
Sep 7, 2016, 6:00:32 AM9/7/16
to DIGITS Users
Thanks Greg,

Just to clarify, let's say that I wanted to detect cars, trees and dogs, and assume that the label files have these classes.
I have read that class 1 is treated as a special case, and let's assume that in my case I really want to detect dogs.
I guess that in this case, in the "Custom Classes" field I should enter:
dontcare, dog, car, tree
Am I right?
If, on the other hand, I have more classes than are listed in the documentation, will it still work? Or do I need to change the train_val.prototxt?

Thanks for the help.

Best regards,

Greg Heinrich

unread,
Sep 7, 2016, 6:14:17 AM9/7/16
to DIGITS Users
> I guess that in this case, in the "Custom Classes" field I should enter:
> dontcare, dog, car, tree

Yes, you can do this. Alternatively if you prefer, in the net description you can change the ID of the class you wish to detect - see these two lines (change 'src' and leave 'dst' unchanged):
https://github.com/NVIDIA/caffe/blob/aff10a010299caf2b7535abd827ad8b72d35a4f4/examples/kitti/detectnet_network.prototxt#L82
https://github.com/NVIDIA/caffe/blob/aff10a010299caf2b7535abd827ad8b72d35a4f4/examples/kitti/detectnet_network.prototxt#L121

If in your label files you have classes that are not listed in the custom classes then the corresponding objects will all be assigned to class 'dontcare' with ID=0, which I believe is what you want. Similarly if you don't specify custom classes then the classes that are not listed in the documentation will be assigned to 'dontcare'.
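To make that mapping concrete, here is a small illustration (my own sketch of the behaviour described above, not the actual DIGITS code) of how the Custom Classes string relates the names in your label files to numeric class IDs:

# Sketch: how the "Custom classes" field maps label names to class IDs
# (my illustration of the behaviour described above, not DIGITS code).
custom_classes = 'dontcare, dog, car, tree'

class_ids = {name.strip().lower(): idx
             for idx, name in enumerate(custom_classes.split(','))}
# -> {'dontcare': 0, 'dog': 1, 'car': 2, 'tree': 3}

def class_id(label_name):
    # Names not listed fall back to 'dontcare' (ID 0).
    return class_ids.get(label_name.strip().lower(), 0)

print(class_id('dog'))     # 1
print(class_id('person'))  # not listed -> 0 ('dontcare')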

Gonçalo Cruz

unread,
Sep 7, 2016, 5:06:13 PM9/7/16
to DIGITS Users
Thanks once again.
Completely clarified now.

Best regards

任佑顏

unread,
Oct 2, 2016, 3:01:00 AM10/2/16
to DIGITS Users
Hi,
I have a question: how do you use this line? Is it used with DIGITS?


Car 0.00 0 0.0 470.00 199.00 877.00 414.00 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

And how can I use the detection model downloaded from DIGITS to run detection on an image without using the DIGITS web GUI?
Thank you
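For the second question, one option is to download the model from DIGITS (the deploy.prototxt and the .caffemodel snapshot) and run it directly with pycaffe. A rough sketch under those assumptions is below; the file names, the blob names 'data' and 'bbox-list', and the preprocessing should be checked against your own deploy.prototxt and training settings:

# Sketch: running a DetectNet model downloaded from DIGITS with pycaffe,
# outside the web GUI. File names, blob names ('data', 'bbox-list') and the
# preprocessing below are assumptions -- check your own deploy.prototxt and
# training settings.
import caffe
import numpy as np

caffe.set_mode_gpu()
net = caffe.Net('deploy.prototxt', 'snapshot.caffemodel', caffe.TEST)

# Load the image and resize it to the network input size taken from the data blob.
img = caffe.io.load_image('test.png')            # float RGB in [0, 1], HxWxC
_, channels, height, width = net.blobs['data'].data.shape
img = caffe.io.resize_image(img, (height, width))

# HWC/RGB/[0,1] -> CHW/BGR/[0,255]; adjust if your training data differed.
data = (img.transpose(2, 0, 1)[::-1, :, :] * 255.0).astype(np.float32)

net.blobs['data'].data[...] = data
out = net.forward()

# With the standard DetectNet deploy network, each row of the output is
# typically [xmin, ymin, xmax, ymax, confidence]; zero rows are padding.
for box in out['bbox-list'][0]:
    if box[4] > 0:
        print(box)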

Greg Heinrich

unread,
Oct 3, 2016, 4:49:48 AM10/3/16
to DIGITS Users

Jacob Abraszek

unread,
Oct 4, 2016, 10:11:16 AM10/4/16
to DIGITS Users
Can you post one of your images and its label file? We are running into issues with mAP staying at 0. We have 1,322 images in our dataset and 180 in the validation set.