Pedestrian detection using KITTI Dataset

Jason Mecham

Aug 9, 2016, 4:26:50 PM

to DIGITS Users

I wanted to share my results with pedestrian detection using the KITTI dataset, because my initial attempt produced some lousy results: mAP (val), precision (val), and recall (val) all ended up at zero.

It turns out that prepare_kitti_data.py (at least the version I used) splits the data in a way that makes pedestrian detection impossible to validate. I looked through the validation images and couldn't find any that were really useful for pedestrian detection. When I tested the model on one of the training images, it would sometimes find the pedestrian, which confirmed that I had set the custom class mappings correctly in the dataset creation form.
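Rather than eyeballing the validation images, one way to check a split is to count labelled objects per class in each label folder. A minimal sketch, assuming KITTI-format label files (class name as the first field of each line) in per-split folders; the `kitti-data/train/labels` and `kitti-data/val/labels` paths and both helper names are hypothetical.

```python
# Sketch: count labelled objects per class in a KITTI-format label folder,
# to check whether a split actually contains the class you want to detect.
from collections import Counter
from pathlib import Path


def class_counts(label_dir: Path) -> Counter:
    counts: Counter = Counter()
    for path in sorted(label_dir.glob("*.txt")):
        for line in path.read_text().splitlines():
            fields = line.split()
            if fields:  # skip blank lines
                counts[fields[0]] += 1  # first field is the class name
    return counts


def images_with_class(label_dir: Path, cls: str) -> int:
    # Number of label files (i.e. images) with at least one `cls` object.
    return sum(
        any(line.split()[:1] == [cls] for line in p.read_text().splitlines())
        for p in label_dir.glob("*.txt")
    )


if __name__ == "__main__":
    # Hypothetical paths; point these at your own train/val label folders.
    for split in ("train", "val"):
        label_dir = Path("kitti-data") / split / "labels"
        if label_dir.is_dir():
            print(split, dict(class_counts(label_dir)),
                  "images with Pedestrian:", images_with_class(label_dir, "Pedestrian"))
```

A validation split that reports zero (or only a handful of) `Pedestrian` objects would explain validation metrics stuck at zero.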

So I went through the manual task of creating a Mini-KITTI dataset containing only the pedestrian images: 440 training images and 151 validation images.
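The manual filtering and splitting could also be scripted. This is a minimal sketch, not the procedure Jason actually used: it assumes the standard KITTI object layout (`image_2/*.png`, `label_2/*.txt`) and writes a DIGITS-style `train/` and `val/` tree with `images/` and `labels/` subfolders. The function names, the 25% validation fraction, and the fixed seed are all hypothetical choices.

```python
# Sketch: build a pedestrian-only KITTI subset and split it into train/val.
# Assumes the KITTI object layout <kitti_root>/image_2/*.png + label_2/*.txt
# and writes a DIGITS-style layout <out_root>/{train,val}/{images,labels}.
import random
import shutil
from pathlib import Path


def has_class(label_path: Path, cls: str = "Pedestrian") -> bool:
    # The first whitespace-separated field of each KITTI label line is the class.
    return any(line.split()[:1] == [cls]
               for line in label_path.read_text().splitlines())


def split_ids(ids, val_fraction=0.25, seed=0):
    # Deterministic shuffle, then carve off the validation fraction.
    ids = sorted(ids)
    random.Random(seed).shuffle(ids)
    n_val = round(len(ids) * val_fraction)
    return ids[n_val:], ids[:n_val]


def build_subset(kitti_root: Path, out_root: Path, cls="Pedestrian",
                 val_fraction=0.25, seed=0):
    label_dir, image_dir = kitti_root / "label_2", kitti_root / "image_2"
    ids = [p.stem for p in label_dir.glob("*.txt") if has_class(p, cls)]
    train, val = split_ids(ids, val_fraction, seed)
    for split, members in (("train", train), ("val", val)):
        for sub in ("images", "labels"):
            (out_root / split / sub).mkdir(parents=True, exist_ok=True)
        for i in members:
            shutil.copy(image_dir / f"{i}.png", out_root / split / "images")
            shutil.copy(label_dir / f"{i}.txt", out_root / split / "labels")
    return len(train), len(val)
```

One caveat: KITTI frames come from continuous drives, so a purely random split can place near-identical frames in both train and val, which may make validation numbers look better than they would on new footage.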

With this dataset I got the following results. I trained out to 120 epochs, but that wasn't really necessary.

mAP (val) = 52.1297

Precision (val) = 72.1355

Recall (val) = 67.6826

I'm not sure if those numbers are good or bad, but at least they're not zero. :-)

My next step is to shrink the dataset to find the minimum size that still works, so I can test it on my own dataset. I'm also curious whether it can detect the Stig (the dude with the helmet on) in the pedestrian images, though there probably aren't enough images for that one.

Is anyone using a special tool to create the label files? Ideally one where we can draw a box around the object we're interested in and then specify what class the box is?

For a few images I can just manually create the label file, but I can't imagine creating it for hundreds of images.
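No labeling tool is named in this thread, but whatever tool draws the boxes, its output has to end up as KITTI-format label lines. A minimal sketch of that last step, assuming the tool gives pixel corners for each box; `Box`, `kitti_line`, and `write_label_file` are hypothetical names, and zero-filling the 3D fields rests on the assumption that a 2D detector only needs the class and the 2D box.

```python
# Sketch: write hand-drawn 2D boxes as KITTI-format label lines.
# Each KITTI label line has 15 space-separated fields:
#   type truncated occluded alpha left top right bottom h w l x y z rotation_y
# For 2D-only annotations the 3D fields are set to 0 here (an assumption).
from dataclasses import dataclass
from pathlib import Path
from typing import Iterable


@dataclass
class Box:
    cls: str       # class name, e.g. "Pedestrian"
    left: float    # pixel coordinates of the box corners
    top: float
    right: float
    bottom: float


def kitti_line(box: Box) -> str:
    if box.right <= box.left or box.bottom <= box.top:
        raise ValueError(f"degenerate box: {box}")
    # truncated=0.00, occluded=0 (fully visible), alpha=0.00
    fields = [box.cls, "0.00", "0", "0.00",
              f"{box.left:.2f}", f"{box.top:.2f}",
              f"{box.right:.2f}", f"{box.bottom:.2f}"]
    fields += ["0.00"] * 7  # dimensions (3), location (3), rotation_y (1)
    return " ".join(fields)


def write_label_file(path: Path, boxes: Iterable[Box]) -> None:
    # One line per object; an image with no objects gets an empty file.
    path.write_text("".join(kitti_line(b) + "\n" for b in boxes))
```

For example, `kitti_line(Box("Pedestrian", 712.4, 143.0, 810.73, 307.92))` yields a 15-field line starting with `Pedestrian` and carrying the box as `712.40 143.00 810.73 307.92`.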

Jason Mecham

Aug 10, 2016, 8:56:29 PM

I reduced the dataset to just the 428 images of the pedestrian(s) in front of the blue building, but now I'm back to getting all zeros for mAP (val), precision (val), and recall (val).

Of the 428 images, 320 are for training and 108 for validation.

I used the same model settings as in my original post.

So I'm still at a loss as to how to use it on a small dataset. Maybe it just isn't meant for this. By removing the other pedestrian images, I significantly reduced the number of pedestrians present: in front of the blue building there is typically only one pedestrian, whereas the other pedestrian images typically contain a bunch of pedestrians. Nothing on the results page suggests that the problem is the sensitivity of the bbox; the network simply doesn't seem to be activating where the pedestrian is.
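Precision and recall both drop to exactly zero when no predicted box overlaps a ground-truth box well enough to count as a true positive, which is what "not activating where the pedestrian is" would produce. A simplified sketch of that bookkeeping, assuming greedy matching at an IoU threshold of 0.5; this is a generic illustration, not DIGITS's actual evaluation code, and mAP is left out.

```python
# Sketch: precision/recall for one image via greedy IoU matching.
# The 0.5 IoU threshold is an assumption, not DIGITS's documented value.
from typing import List, Optional, Tuple

BBox = Tuple[float, float, float, float]  # (left, top, right, bottom) in pixels


def iou(a: BBox, b: BBox) -> float:
    # Intersection-over-union of two axis-aligned boxes.
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def precision_recall(preds: List[BBox], truth: List[BBox], thresh: float = 0.5):
    # Each prediction (ideally sorted by confidence, highest first) claims
    # the best-overlapping ground-truth box that has not been matched yet.
    unmatched = list(truth)
    tp = 0
    for p in preds:
        best: Optional[BBox] = max(unmatched, key=lambda t: iou(p, t), default=None)
        if best is not None and iou(p, best) >= thresh:
            tp += 1
            unmatched.remove(best)
    precision = tp / len(preds) if preds else 0.0  # TP / (TP + FP)
    recall = tp / len(truth) if truth else 0.0     # TP / (TP + FN)
    return precision, recall
```

With no detections, or only detections far from the pedestrian, both values come out as 0.0; a loose box that overlaps the pedestrian but falls under the threshold also scores zero, which is the "bbox sensitivity" case.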

Jason Mecham

Aug 11, 2016, 4:27:09 PM

Update:

I ran it again, this time out to 120 epochs, and now I'm getting good numbers for mAP (val), precision (val), and recall (val).

Here are the values:

mAP (val) = 88.0556

Precision (val) = 92.5948

Recall (val) = 94.5216

Darren Eng

Aug 17, 2016, 4:25:06 PM

Would you mind sharing the list of KITTI images you used for this? I'm trying to do this myself but keep getting a NaN error. My original filter selected every image whose labels contained "Pedestrian"; that came to 1600+ images, which may have been too diverse to be an effective training set.

Darren Eng

Aug 18, 2016, 8:04:25 PM

Did you need to tweak the learning rate or anything else besides the epochs to get a nonzero mAP for the Pedestrian images in front of the building?

Jason Mecham

Aug 18, 2016, 10:50:39 PM

On mine, all I had to do was change the number of epochs to 120. But you should also try a really exaggerated number. That keeps the learning rate at 0.0001 for a while longer before bringing it down.
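The effect described here fits a schedule whose drop points scale with the length of the run. A minimal sketch, assuming a Caffe-style step-down policy with the step size given as a fraction of the whole run (how a percentage-based step size behaves; check your own solver settings). `step_down_lr` is a hypothetical helper, and `step_fraction=0.33` and `gamma=0.1` are illustrative values, not settings anyone in this thread reported.

```python
# Sketch: step-down learning rate with the step size tied to total epochs.
import math


def step_down_lr(epoch: float, total_epochs: float, base_lr: float = 1e-4,
                 step_fraction: float = 0.33, gamma: float = 0.1) -> float:
    # Caffe-style "step" policy: lr = base_lr * gamma ** floor(t / stepsize),
    # with the step size expressed as a fraction of the whole run.
    stepsize = step_fraction * total_epochs
    return base_lr * gamma ** math.floor(epoch / stepsize)


if __name__ == "__main__":
    for total in (30, 120):
        print(total, [step_down_lr(e, total) for e in (0, 20, 40, 60)])
```

Under these assumed values, a 30-epoch run leaves the base rate after about 10 epochs, while a 120-epoch run stays at 0.0001 for about 40 epochs before the first drop.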

As long as you've set the dataset's custom classes to "dontcare,pedestrian", it should work. My dataset only contains the pedestrians in front of the blue building.

Darren Eng

Aug 23, 2016, 7:30:54 PM

At which epoch did the mAP rise above 0? I'm at 60+ and still haven't seen it go past 0.

Darren Eng

Aug 29, 2016, 6:06:30 PM

For the longest time I've been trying to train a model to recognize pedestrians, with no luck. My mAP was 0 on every attempt, no matter what combinations of learning rate and epoch count I tried.

I created a new dataset with the custom classes set to "dontcare,pedestrian" instead of "DontCare,Pedestrian" as I had before, and suddenly my mAP is above 0.

So it looks like the DIGITS custom classes field is case sensitive: if you enter custom classes that include capital letters, they'll be registered differently and your detection won't work.
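The mismatch can be reproduced in miniature. This is an illustrative sketch, not DIGITS's actual code: it assumes label names read from the KITTI files are lowercased before lookup while the custom-classes string is used exactly as typed, which is one way "DontCare,Pedestrian" would match nothing. `parse_custom_classes` and `class_id` are hypothetical helpers.

```python
# Sketch: how a case-sensitive class mapping silently drops every object.
# KITTI label files spell classes as "Pedestrian" and "DontCare". If label
# names are lowercased on lookup but the custom-classes string is not
# (an assumption about the pre-fix behaviour), nothing matches.
from typing import Dict, Optional


def parse_custom_classes(spec: str, normalize: bool = True) -> Dict[str, int]:
    names = [s.strip() for s in spec.split(",") if s.strip()]
    if normalize:
        names = [n.lower() for n in names]  # the fix: normalise case
    # Each name gets the index of its position in the list.
    return {name: idx for idx, name in enumerate(names)}


def class_id(label_name: str, mapping: Dict[str, int]) -> Optional[int]:
    # Label names are lowercased before lookup; None means "unknown class".
    return mapping.get(label_name.lower())
```

With `normalize=False`, `class_id("Pedestrian", parse_custom_classes("DontCare,Pedestrian", normalize=False))` returns `None`, so every pedestrian would be discarded; normalising both sides makes "DontCare,Pedestrian" and "dontcare,pedestrian" equivalent.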

Luke Yeager

Aug 30, 2016, 1:44:42 PM

Fixed now. Thanks Darren!
https://github.com/NVIDIA/DIGITS/pull/1017

Ken M Erney

Sep 9, 2016, 8:21:12 AM

I loaded the KITTI dataset as described in the docs and was able to train a DetectNet model to detect the default classes (i.e., cars). After 30 epochs it has an mAP of about 65.5045, which stayed fairly constant from about epoch 15 to 30. It does okay on my own example images, detecting about 1/4 of the cars in an image.

I then loaded another copy of the KITTI dataset, but specified "dontcare,pedestrian" in the custom classes. I set up another DetectNet model with the same specs as before (i.e., learning rate 0.0001, Adam, etc.) and started training. At 5 epochs, it still shows an mAP, precision, and recall of 0. How many epochs did it take before your mAP started to rise?

Ken M Erney

Sep 9, 2016, 9:21:04 AM

Update: when starting the training run, I also jump-started the network with the pretrained GoogLeNet Caffe model mentioned here:

https://github.com/NVIDIA/DIGITS/blob/v4.0.0/examples/object-detection/README.md

I am assuming this would only help?

the quoc

May 10, 2018, 6:25:07 PM

Hi Jason Mecham,

I'm also trying to train a pedestrian detection network. Could you share your dataset of 420 images? I hope to see your reply.

On Thursday, August 11, 2016 at 07:56:29 UTC+7, Jason Mecham wrote:
