SVM & Object detection

Safeii

Apr 19, 2012, 3:51:04 PM
to accor...@googlegroups.com
Hello :)
First of all, I want to tell you that I'm a big fan of the AForge.NET and Accord.NET libraries, and I always use them in my projects.
Actually, I wanted to ask you about my graduation project. It's about "image understanding", and the first step is "object detection".
I took a look at the SURF sample and also at the SVMs sample.
I wonder how I can save the features I get from the SURF technique in an Excel file, because each feature (i.e. interest point) is a record that contains some fields plus an array of floats (the Descriptor),
whereas in the SVMs sample every column is just a single value; no record contains a list. For example, in the XOR classification the input (one feature) is (x = 0, y = 0, G = 1), but here I have a record that contains the following fields: (Laplacian: int, Orientation: float, Response: float, Scale: float, X: float, Y: float, Descriptor: float[]). I'm not sure, but if I want to set it up like the XOR problem, I suppose I should also add a field for the object name.
I found that the SVM takes a matrix, and I can't figure out how to make these two samples compatible. I hope my question is clear :$

Thank you in advance
Waiting for your feedback
Safeii

                   


César

May 5, 2012, 12:15:23 PM
to accor...@googlegroups.com
Hi Safeii,

I am glad you found the framework useful. I hope it can become increasingly useful in your projects. I didn't reply earlier because I am not sure myself what the best approach is for using interest points as features in classification algorithms. One approach I have heard of is a bag-of-visual-words, but I am not sure how to use the SURF descriptors with that.

If you find an answer, could you please share it with us? I am sure many people will also find it interesting.

Best regards,
Cesar


Safeii

May 6, 2012, 3:29:21 PM
to accor...@googlegroups.com
Hi Cesar
Sure, I will. Actually, I asked about this problem and was told that I have to run k-means clustering on the feature descriptors and then take the center of every cluster as a representative of the many points that belong to that cluster.
I'm stuck here now. I'm also reading about bag of words :)
Seems it will help.
Thank you so much :D

best regards
Safi ;)

César

May 6, 2012, 8:02:37 PM
to accor...@googlegroups.com
Hi Safeii,

Indeed, I found out that the bag-of-words is also typically computed using K-Means. I think that what you could do is:

1. Specify a number of possible clusters, let's say, 10.
 
2. Run K-Means with k=10 on all feature points collected from all images. You will obtain 10 centroids, or clusters. Let each of those clusters represent a label.
 
3. Now, for each image, assign each of its feature points to the nearest of those K-Means centroids. You will end up with a set of labels for each image. Use those labels to form a feature vector with 10 positions, each of them 1 or 0, in which 1 indicates the presence of a label in the image and 0 indicates its absence. This effectively maps each image to a feature vector of fixed length (by the way, instead of 0 and 1, you could perhaps use the number of occurrences of each label).
 
4. Now, run SVM on those fixed length vectors.
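
Steps 2 and 3 above can be sketched as follows. This is a minimal illustration in plain Python, not the Accord.NET API: it assumes the centroids have already been produced by K-Means, uses toy 2-D points in place of real SURF descriptors, and the function names are hypothetical.

```python
import math

def nearest_centroid(point, centroids):
    """Return the index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def bow_histogram(descriptors, centroids, binary=False):
    """Map a variable-length list of descriptors to a vector with one slot per centroid."""
    hist = [0] * len(centroids)
    for d in descriptors:
        hist[nearest_centroid(d, centroids)] += 1
    if binary:
        # Presence/absence variant: 1 if the label occurs in the image, else 0.
        hist = [1 if count > 0 else 0 for count in hist]
    return hist

# Assumed output of K-Means with k=3 (toy values).
centroids = [(0.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
# Toy "descriptors" of one image; the number of points varies per image.
image_descriptors = [(0.5, 0.2), (9.0, 11.0), (0.1, -0.3), (10.2, 9.9)]

counts = bow_histogram(image_descriptors, centroids)                  # [2, 2, 0]
presence = bow_histogram(image_descriptors, centroids, binary=True)   # [1, 1, 0]
```

Whatever the number of interest points in an image, the resulting vector always has k entries, which is what makes it usable as one row of the SVM input matrix.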


I think this should do it, but you may have to use a more specialized kernel, such as the chi-square kernel. It is available in the framework as well, but it hasn't been tested as thoroughly as the other kernels in real-world applications (yet). K-Means is also available; please check the documentation for some examples of how to use it.
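
To give an idea of what a chi-square kernel computes on histograms, here is a small plain-Python sketch. It uses one common form of the kernel, k(x, y) = 1 - sum((x_i - y_i)^2 / (0.5 * (x_i + y_i))); this is an assumption for illustration, and the framework's own definition may differ in details.

```python
def l1_normalize(hist):
    """Scale a histogram so its entries sum to 1 (left unchanged if all zero)."""
    total = sum(hist)
    return [h / total for h in hist] if total else list(hist)

def chi_square_kernel(x, y):
    """Chi-square similarity between two non-negative histograms.

    Bins that are empty in both histograms are skipped to avoid dividing by zero.
    """
    distance = 0.0
    for a, b in zip(x, y):
        if a + b > 0:
            distance += (a - b) ** 2 / (0.5 * (a + b))
    return 1.0 - distance

h1 = l1_normalize([2, 2, 0])   # [0.5, 0.5, 0.0]
h2 = l1_normalize([1, 3, 0])   # [0.25, 0.75, 0.0]

same = chi_square_kernel(h1, h1)       # identical histograms give 1.0
different = chi_square_kernel(h1, h2)  # smaller value for dissimilar histograms
```

Normalizing the histograms first keeps images with many interest points from dominating images with few.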

I hope it helps!

Best regards,
César

Safeii

May 9, 2012, 2:30:17 PM
to accor...@googlegroups.com

Hi Mr.César

I did what you told me with two classes (cat and flower), but I always get the same result after training the machine with the SVM.

I used 10 images for each category. When I test it on one of the training examples it gives the right result (0 or 1), but with an external image it gives a wrong result (always 0).

I don't know what's wrong !! :(

Best regards ,

Safii

César

May 9, 2012, 5:43:23 PM
to accor...@googlegroups.com
Hi Safeii,

What parameters are you using to train your machine? Could you send some example code of what you have done?

I am also working on an out-of-the-box Bag of Visual Words to incorporate in Accord.NET, but I cannot offer an estimate of when it will be available.

Best regards,
Cesar

Safeii

May 10, 2012, 9:25:36 AM
to accor...@googlegroups.com
Hi Cesar
I've written some notes in the text file; please read them before you run the code.
my email : safaa.a...@gmail.com
we can talk if you want
safii
Image Understanding---- for cezar .rar
notes to run it.txt

Safeii

May 16, 2012, 2:25:54 PM
to accor...@googlegroups.com
Hello again
It has been a long time since I sent you the files :$ Is there anything wrong?
Safii

César

May 16, 2012, 5:27:38 PM
to accor...@googlegroups.com
My apologies, I didn't notice your last post, with the attached files! I only saw it about 20 minutes ago.

I just took a quick look at your code and you are right, the code was a mess! :-D

But do not worry, I was able to get it working... It was just a matter of choosing better values for C and sigma. For those cases, the framework provides some heuristic estimates. I have replaced your line

            kernel2 = new Gaussian(6.22);

with

            DoubleRange r;
            kernel2 = Gaussian.Estimate(pixels, pixels.Length, out r);

This will ensure the sigma parameter of your Gaussian kernel starts at a nicer value. Typically, good values for sigma will lie within the range returned in the output parameter "r". Now, to get a suitable value for C, try using

                smo.Complexity = SequentialMinimalOptimization.EstimateComplexity(kernel2, pixels);

Using those, I was able to correctly classify your test images with different labels (unless I made some mistake in testing, which is also possible). I have updated your sources with those modifications and inserted some code to test your test images. I am sending the code attached.
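
For readers without the framework at hand, the idea behind such heuristics can be sketched in plain Python. This is not the implementation of Gaussian.Estimate or EstimateComplexity; it assumes two generic rules of thumb: sigma taken as the median pairwise distance between samples, and C taken as the inverse of the average k(x, x), which is the default documented for SVMlight. The Gaussian form exp(-d^2 / (2 * sigma^2)) and all function names are assumptions for illustration.

```python
import math
from itertools import combinations
from statistics import median

def median_sigma(samples):
    """Rule of thumb: use the median pairwise Euclidean distance as sigma."""
    return median(math.dist(a, b) for a, b in combinations(samples, 2))

def gaussian_kernel(x, y, sigma):
    """Gaussian (RBF) kernel, assumed form exp(-||x - y||^2 / (2 * sigma^2))."""
    return math.exp(-math.dist(x, y) ** 2 / (2.0 * sigma ** 2))

def linear_kernel(x, y):
    """Plain dot product."""
    return sum(a * b for a, b in zip(x, y))

def estimate_complexity(samples, kernel):
    """Rule of thumb: C = 1 / mean(k(x, x)) over the training samples."""
    mean_self_similarity = sum(kernel(x, x) for x in samples) / len(samples)
    return 1.0 / mean_self_similarity

samples = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]

sigma = median_sigma(samples)   # pairwise distances are 5, 10 and 5
c_linear = estimate_complexity(samples, linear_kernel)
c_gauss = estimate_complexity(samples, lambda a, b: gaussian_kernel(a, b, sigma))
```

Note that for a Gaussian kernel k(x, x) is always 1, so this particular rule gives C = 1 regardless of the data; it is more informative for kernels whose self-similarity depends on the scale of the inputs.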

Again, sorry for the late reply. I hope those suggestions can help!

Best regards,
Cesar
Image Understanding.rar

César

May 28, 2012, 10:57:07 PM
to accor...@googlegroups.com
Hi Safeii,

Did it work? In case you need further help, please ask.

By the way, I am planning on including a sample application in the framework to demonstrate how to use the Visual Bag of Words to achieve image classification. Would you mind if I use some of the flower and kitten images you have sent me as an example in the application?

Best regards,
César

Safeii

May 29, 2012, 7:03:27 AM
to accor...@googlegroups.com
Hello Mr.César
Thanks so much for the reply :)
Anyway, sure, you can use them. I also have image sets of other objects; if you want, I can send them to you.
I separated the object from the background in each image to make it easier for the SURF algorithm to detect features.

So far I have done the object detection with the following steps:

Outline of my model: Feature extraction -> Feature analysis or feature selection -> Machine learning
Train: Apply SURF on image sets of many objects -> PCA -> SVM
Test: SURF -> PCA -> trained SVM -> class number

I have trained on just 9 objects so far, and the error ratio is between 0.12 and 0.18,
but when I asked my professor, he said that I should use another feature selection method instead of PCA
to reduce the dimension of the features and make it uniform,
because its output is always just one vector and I can't handle all image sizes with just one vector.

That is my first problem. Now I'm facing another challenge: I want my project to detect more than one object in the image,
so I have to add an image segmentation step that puts every single object in the image into a separate image, and then apply the second step that I have worked on so far.
I'd like your help with this! Do you know an algorithm suited to what I need: I give it an image and it gives me back many images, each containing one object?

Waiting for your feedback soon; I have to deliver it in the middle of July :(
and I have my final exams now.
best regards
Safii

César

May 29, 2012, 9:13:52 AM
to accor...@googlegroups.com
Hi Safeii,

To achieve this kind of segmentation, I suppose it would be necessary to run a window of varying scale through the entire image and attempt detection on each of the sub-images. Basically, it requires considering one region of the image at a time and giving this region to the classifier. If the classifier detects something, then you have your object; otherwise you just keep moving and scaling the window until there is a match (or the whole image has been covered).
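
A minimal sketch of that scanning loop, in plain Python and independent of any imaging library. The window sizes, the stride and the `classify` callback are all hypothetical; in a real application `classify` would crop the region, compute its bag-of-words vector and ask the trained machine for a label.

```python
def sliding_windows(width, height, base=64, scale=2.0, stride_fraction=0.5):
    """Yield (x, y, size) square windows covering the image at growing scales."""
    size = base
    while size <= min(width, height):
        step = max(1, int(size * stride_fraction))
        for y in range(0, height - size + 1, step):
            for x in range(0, width - size + 1, step):
                yield (x, y, size)
        # Always grow by at least one pixel so the loop terminates.
        size = max(size + 1, int(size * scale))

def detect(width, height, classify):
    """Collect every window for which `classify` returns a label (not None)."""
    hits = []
    for x, y, size in sliding_windows(width, height):
        label = classify(x, y, size)
        if label is not None:
            hits.append((x, y, size, label))
    return hits

# Stand-in classifier that "finds" a cat in exactly one window.
def fake_classifier(x, y, size):
    return "cat" if (x, y, size) == (32, 32, 64) else None

windows = list(sliding_windows(128, 128))
hits = detect(128, 128, fake_classifier)
```

Collecting all hits instead of stopping at the first match is what allows more than one object per image to be reported; overlapping hits on the same object would still need to be merged.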

By the way, I noticed you didn't see my last reply. I had answered it about two weeks ago, but I know sometimes it is hard to notice those new posts on Google Groups. I myself didn't notice yours. I hope the code I sent you does work. I particularly found the Chi-Square kernel to perform well on the problem.

Safeii

Jun 22, 2012, 9:15:47 PM
to accor...@googlegroups.com
Hi Mr.César
Well, I'm so sorry for being late!
You know, exam time! o.O
Anyway, I've seen what you've done with my code. Thank you so much for your help.
I want to know if this code is suitable for more than two objects.
It has been a long time since I wrote it, and now I'm so lost :(
The number of clusters! Would it be fine with 15 objects?
ksvm = new MulticlassSupportVectorMachine(10, kernel2, 2);

and in this line : 
MulticlassSupportVectorLearning ml = new MulticlassSupportVectorLearning(ksvm, pixels, outputii);
What does outputii stand for? :S
I found that I had initialized it with the following:

            int[] outputii = new int[20];
            for (int i = 0; i < 10; i++)
            {
                outputii[i] = 0;
            }

            for (int i = 10; i < 20; i++)
            {
                outputii[i] = 1;
            }
WHY !!!? :$

Waiting for your reply :$
very best regards :))
Safii

César

Jun 22, 2012, 9:58:44 PM
to accor...@googlegroups.com
Hi Safeii!

Haha no problem, I know how tense those exams can be. Anyway, the outputii array stores the label for each of the images in your training set. If I remember correctly, you set the first 10 labels to 0 and the labels from 10 to 20 to 1, to indicate that the first 10 images belong to class 0 (the flowers) and that the last 10 images belong to class 1 (the kittens).

If you would like to add more possible classes (let's say, flowers, kittens and dragons), then you can increase the number of classes, which you had highlighted in green, from 2 to 3 and then initialize your outputii vector with values 0, 1, and 2. For example, if your first 10 images are from flowers, the next 10 are from kittens and the next 10 are from dragons, you must initialize the first 10 elements of your outputii vector with 0, the next 10 with 1 and the next 10 with 2.
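
The same labelling rule, written as a small plain-Python sketch (the helper name is hypothetical): given how many training images each class has, in the order the images are stored, it builds the label vector so that it never gets out of step with the number of classes.

```python
def make_labels(class_counts):
    """Build one label per training image: 0 for the first class, 1 for the next, etc."""
    labels = []
    for class_index, count in enumerate(class_counts):
        labels.extend([class_index] * count)
    return labels

# 10 flowers, then 10 kittens, then 10 dragons.
class_counts = [10, 10, 10]
labels = make_labels(class_counts)

number_of_classes = len(class_counts)   # the value to pass as the class count (3)
```

The label vector must have exactly one entry per training image, and every class index from 0 to number_of_classes - 1 should appear in it at least once.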

I hope I could answer your question!

Best regards,
Cesar

Safeii

Jun 23, 2012, 4:37:01 AM
to accor...@googlegroups.com
Hello again !!
I've never seen such a kind person as you :$ :D
I'm so thankful :))
I get what you said, and I tried to add a new object, "birds", but the output is 0, not 2! Even on the training samples it gives the wrong answer!
Could it be related to the number of clusters? Should I increase it when I increase the number of objects?
What do you suggest?

regards 
Safii :)

César

Jun 23, 2012, 5:09:08 AM
to accor...@googlegroups.com
Send me the code again so I can take a look.

Regards,
Cesar

Safeii

Jun 24, 2012, 8:48:15 AM
to accor...@googlegroups.com
Hello Mr.cezar 
I've tried increasing the training set for "cat" from 10 to 20 images, the same for "fish", and the clusters from 10 to 15. The error is still big and the result always fails :(
You can see that I've cleaned up the code a little! :$
regards 
Safii

project- 3objects.rar

Safeii

Jun 25, 2012, 8:30:51 PM
to accor...@googlegroups.com
Hello, I wonder if you saw my last reply :$ :$
I'm still waiting for your suggestions.
thanks

César

Jun 25, 2012, 10:15:14 PM
to accor...@googlegroups.com
Hi Safeii,

No, actually I didn't notice your reply. I am not sure why I am not getting those updates... Sorry about it.

Anyway, I have created a sample application which I am planning to include with the rest of the framework. This sample performs image classification as you are trying to do. I have loaded the sample images you sent me and it seems to be working rather well. Please take a look at it to see if it helps.

To test it, first click on the button "Compute bag-of-words". It should take a while. Then click on the button "Start training". After the training finishes, images which have been correctly recognized will be marked in green. Images marked in red denote misclassifications.

I hope it helps!

Best regards,
Cesar


Classification.rar

Safeii

Jun 26, 2012, 2:57:31 PM
to accor...@googlegroups.com
Wow!! Actually, I'm so impressed!! :D
This code is like magic! I mean, I've debugged the code and it's so close to the one I was working on (in the steps, not the mess :P), and with three lines everything is done!! :D
I tried adding one more category, "Birds", and it worked well. I added an error label to the GUI and got an error of 0.1 for 4 objects. Then I let it train on 12 categories and got an error value of 0.9!! All the objects were classified wrong :((
I have two questions. The first one is about my old code: remember, when I added the third category everything went wrong; it worked only for two categories. What was wrong with it?
The second question is about yours: it should handle any number of categories, shouldn't it? I don't know what's wrong or why it fails to classify well, but I think maybe it needs another kernel. Also, it classifies the training set images; what about classifying a new image from outside the training set? Have you tried it? I'll try it now :)
P.S.: I've uploaded the training sets of the categories; you can try them right away :))

thanks
best regards 
Safii
Birds.7z
computers.7z
trees.7z
plates.7z
men.7z
guns.7z
fork+spoon.7z
cups.7z
chairs.7z

Safeii

Jun 27, 2012, 5:48:13 AM
to accor...@googlegroups.com
Hello again :$
Another question: what does the number of clusters refer to? I mean, you set it to 8; why?
I hope you are getting these updates from this group :)
regards 
Safii

Safeii

Jun 27, 2012, 10:21:30 PM
to accor...@googlegroups.com
Hello, take a look at the executable: press Open, select a new image from outside the training set and see. The results are so bad for new images :(
my regards 
Debug1.rar

Safeii

Jun 28, 2012, 12:45:38 PM
to accor...@googlegroups.com
Hello
It's an over-fitting problem. I'm enlarging the training set now and I'll tell you if it works :)
regards

César

Jun 28, 2012, 12:49:56 PM
to accor...@googlegroups.com
Hi Safeii,

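I am receiving your updates, but I haven't been able to take a look at them yet... Sorry about that. I will most likely be able to test it over the weekend. If the problem is overfitting, try choosing a smaller value for C (a larger C penalizes training errors more heavily, which makes the machine fit the training set more tightly).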

Best regards,
Cesar

Safeii

Jun 28, 2012, 12:54:09 PM
to accor...@googlegroups.com
Never mind, it's okay :)) I will do that.
Thanks so much
I wish you the best of luck in your work :))
byeee

Safeii

Jun 30, 2012, 1:18:37 PM
to accor...@googlegroups.com

Hello :)
I tried enlarging the training set and changing the parameters of the Gaussian kernel, and neither of those solutions worked!! :(
It's an incredible over-fitting problem!!! I'm trying to solve it now!
Anyway, I just wanted to let you know my latest updates.
I found this awesome data set on the net :D Take a look:
http://pascallin.ecs.soton.ac.uk/challenges/VOC/download/101objects.tar.gz
regards 

César

Jun 30, 2012, 4:07:15 PM
to accor...@googlegroups.com
Hi Safeii,

I think you are putting too much hope in this classification method. What we are doing is very, very simple, and it may work for a few classes, but I do not think it will work reliably with a large number of classes. The problem is that the images need additional preprocessing, such as normalizing, cropping, centering, etc. For instance, the regions of interest in the images are not even cropped before we start the learning procedure. Some images even include text copyright notices!

If you are using this in academic research, you can just state that the method you are evaluating may need additional preprocessing before it can work properly, and that you are assuming the images are given to your system already preprocessed. This should make things far easier. There is nothing wrong in making some simplifying assumptions.

And by the way, you have too few samples for each of the classes. Try reducing the number of classes (for example, to only 3 or 4) and increasing the number of samples (for example, to more than 50 for each class).

Best regards,
Cesar

César

Jun 30, 2012, 4:13:48 PM
to accor...@googlegroups.com
By the way, I think I discovered a bug in the SURF detector... I am investigating it. In the meantime, you could use the OpenSURF C# implementation available at http://www.chrisevansdev.com/computer-vision-opensurf.html - it is very similar to the Accord.NET version and should not be too difficult to use instead.

Please see if you can obtain better results with it, while I work on fixing the one in Accord.NET.

Regards,
Cesar

Safeii

Jun 30, 2012, 5:28:16 PM
to accor...@googlegroups.com
I'm no longer using the data I attached here. Actually, I'm trying to apply this classification to 4 classes, each containing 50-67 images (from the link I sent you), and it's still not working. OK, I'll look at it now. Can you send me the papers or links you relied on in implementing BoW?
I couldn't find any paper that explains the mathematical details of the regions it takes before the SURF descriptor level; all I found was this presentation by the original author of the BoW!!
You know what? I'm so depressed :(
I don't know what to do, and my interview is on 11\7
By the way, this classification is just one part of my project, but it's the main core.
I'd like to thank you so much for your help :)
my best regards
Safii 

César

Jul 1, 2012, 12:13:53 AM
to accor...@googlegroups.com
Hi Safeii,

There was a bug, but it was not in the SURF implementation. The bug was in the sample application, the part in which it shows the feature points in the images. It didn't affect any of the computations.

Is your interview for admission to some academic program? If that is the case, I don't think you need to have everything up and running up front... Just some comprehension and understanding of the project.

I based the BoW implementation on what I remembered from my classes on Machine Learning. Wikipedia also has some explanation of the key ideas behind it, but I agree that it lacks some details. Perhaps you could try searching for "bag of visual words kmeans", instead of only bag of words, to avoid confusion with the BoW model commonly found in NLP. By the way, a very similar project can be found here; please take a look at it to see if it is useful to you. The author seems to be using a very large vocabulary (up to 1000 words). My sample application only uses 8 by default. You could try increasing this number to see if performance improves.

Another thing to try is to use extended SURF features. Those can be enabled by setting

bow.Surf.ComputeDescriptors = SpeededUpRobustFeatureDescriptorType.Extended;

on the sample application, right before the bow is created. Another thing is to use a Pyramid Match kernel, but I haven't implemented this one (yet). 


Best regards,
Cesar

César

Jul 1, 2012, 12:31:19 AM
to accor...@googlegroups.com
By the way again, if you use a large number of clusters, try to use the linear kernel first (set the kernel to polynomial and set the degree to 1). With a large number of words, sometimes a linear kernel gives surprisingly good results.
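
Why a polynomial kernel of degree 1 acts as a linear kernel can be seen in a short plain-Python sketch. It assumes the usual form k(x, y) = (x . y + c)^d; the function name and the default constant are illustrative and not taken from the framework.

```python
def polynomial_kernel(x, y, degree=1, constant=0.0):
    """Polynomial kernel, assumed form (x . y + constant) ** degree."""
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + constant) ** degree

# Two toy bag-of-words histograms.
x = [2, 0, 1]
y = [1, 3, 4]

linear = polynomial_kernel(x, y, degree=1)                   # plain dot product: 6.0
shifted = polynomial_kernel(x, y, degree=1, constant=1.0)    # dot product plus 1: 7.0
quadratic = polynomial_kernel(x, y, degree=2, constant=1.0)  # (6 + 1) squared: 49.0
```

With degree 1 the kernel is just the dot product, possibly shifted by a constant, so the decision boundary stays a hyperplane in the histogram space.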

César

Jul 1, 2012, 9:28:27 PM
to accor...@googlegroups.com
Hi Safeii,

I extended the sample application to work with the images you attached. I also created training and testing sets by taking a 70%/30% split of each image class. The performance wasn't perfect, but it seemed to work, considering the limited number of samples available. The settings which I found to work best are the defaults in the application.

However, it is not so easy to measure its performance because we have only a limited number of samples. Misclassifying just one single image would change the measured performance by a very large percentage. A possible solution would be to try cross-validation, bootstrap, etc., but I would really encourage getting more samples instead. Besides, some samples are also very different. I would be surprised if this simple method produced much better results.
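
A per-class 70%/30% split of this kind can be sketched in plain Python as follows. The helper name, the file names and the fixed random seed are assumptions for illustration; the final check confirms that no training sample also appears in the testing set.

```python
import random

def stratified_split(samples_by_class, train_fraction=0.7, seed=0):
    """Split each class separately so both sets keep the same class proportions."""
    rng = random.Random(seed)   # fixed seed so the split is reproducible
    train, test = [], []
    for label, samples in samples_by_class.items():
        shuffled = list(samples)
        rng.shuffle(shuffled)
        cut = int(round(len(shuffled) * train_fraction))
        train += [(sample, label) for sample in shuffled[:cut]]
        test += [(sample, label) for sample in shuffled[cut:]]
    return train, test

# Hypothetical file names standing in for the images of each class.
samples_by_class = {
    "cat": [f"cat_{i}.jpg" for i in range(10)],
    "flower": [f"flower_{i}.jpg" for i in range(10)],
}

train, test = stratified_split(samples_by_class)

# Leak check: the two sets must not share any sample.
leaked = set(train) & set(test)
```

With only 3 test images per class, a single mistake moves that class's accuracy by a third, which is why the measured performance is so unstable on small sets.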

I have uploaded the sample application to RapidShare (it was too large to attach to this thread). Please take a look to check that I haven't accidentally leaked samples from the training set into the testing set. If there is no such leak, then the method seems to work. And by the way, where did you find the "101objects.tar.gz" database (so I can credit the original source in the application)?

Best regards,
Cesar

Safeii

Jul 2, 2012, 7:04:34 AM
to accor...@googlegroups.com
Hello =)
Thanks for the BoW links, they were very helpful.
I'll look at what you've done.
Here's the link; it's from PASCAL VOC, if you know it.
I'll tell you soon if it works ;)
regards 