Similarity measure to make a match between an image and a set of other images


Marc Giombetti

Feb 2, 2015, 3:59:07 AM
to boo...@googlegroups.com
Dear BoofCV community,

I am new to BoofCV and I am intimidated and amazed at the same time by the sheer amount of functionality BoofCV offers.

I want to use BoofCV to compare images of documents (e.g. a scan of a document) with a set of template images I already have.
For every new image, I want to calculate a matching score from 0 to 1 (0 = no shared features, a different image; 1 = exactly the same image).

I am using the following code, adapted from ExampleAssociatePoints
(https://github.com/lessthanoptimal/BoofCV/blob/8a601eead55aa2e9461c7a15ece21e30b4515b42/examples/src/boofcv/examples/features/ExampleAssociatePoints.java)

        DetectDescribePoint detDesc = FactoryDetectDescribe.surfFast(configDetector, null, null, imageType);

        // Feature scoring: squared Euclidean is the default scorer for our description type
        ScoreAssociation scorer = FactoryAssociation.defaultScore(detDesc.getDescriptionType());

        // Greedy association with backwards validation and no upper limit on the match error
        AssociateDescription associate = FactoryAssociation.greedy(scorer, Double.MAX_VALUE, true);

        // Load and match the images
        AssociatePoints app = new AssociatePoints(detDesc, associate, imageType);

        BufferedImage imageA = UtilImageIO.loadImage(image1);
        BufferedImage imageB = UtilImageIO.loadImage(image2);

        // Do the actual association
        app.associate(imageA, imageB, display);

        // Number of associated feature pairs
        numberOfMatches = associate.getMatches().getSize();


For now I compare my image to every image in the templates list, count numberOfMatches for each, and take the template with the highest count as the best match. This obviously does not work when there is no matching image in the templates list, because some template always has the highest count.

Is there any way to get a measure of similarity, a matchValue, so that I could accept only images having a matchValue > 0.8?

Many thanks for your help and kind regards
Marc
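
One possible way to turn the match count above into a 0-1 value, sketched under assumptions: divide the number of associated pairs by the feature count of the smaller set. The class and method names below are hypothetical and not part of BoofCV, and the two feature counts are assumed to be available from the detector. With backwards validation enabled in the greedy(...) call, matches should be one-to-one, so the ratio stays at or below 1; the clamp is only a safeguard.

```java
// Hypothetical helper, not part of BoofCV: maps a raw match count to a score in [0, 1].
final class MatchScore {

    /**
     * @param matches   number of associated pairs, e.g. associate.getMatches().getSize()
     * @param featuresA number of features detected in image A (assumed to be available)
     * @param featuresB number of features detected in image B (assumed to be available)
     */
    static double normalized(int matches, int featuresA, int featuresB) {
        int smaller = Math.min(featuresA, featuresB);
        if (smaller <= 0) {
            return 0.0; // no features in one image, so nothing can match
        }
        double ratio = (double) matches / smaller;
        return Math.max(0.0, Math.min(1.0, ratio)); // clamp in case matches are not one-to-one
    }

    public static void main(String[] args) {
        double score = normalized(45, 60, 50);   // 45 matches, 50 features in the smaller set
        boolean accepted = score > 0.8;          // cutoff taken from the question
        System.out.println("score=" + score + " accepted=" + accepted);
    }
}
```

A ratio like this only says how many features found a partner. It does not check that the matched points are geometrically consistent, so a high value is a hint rather than proof that two pages share a template.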


Peter A

Feb 2, 2015, 9:48:39 AM
to boofcv
There really isn't any clean-cut "right" way to do what you want. Are the images composed purely of text, or are there pictures as well? Do you always see the entire page?

If it's pure text then converting the image to a text string using OCR might be the best way. Unfortunately OCR isn't included in BoofCV yet. The type of comparison you're doing tends to work best with environmental photos.

- Peter

--
You received this message because you are subscribed to the Google Groups "BoofCV" group.
To unsubscribe from this group and stop receiving emails from it, send an email to boofcv+un...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
"Now, now my good man, this is no time for making enemies."    — Voltaire (1694-1778), on his deathbed in response to a priest asking that he renounce Satan.

Marc Giombetti

Feb 2, 2015, 10:14:19 AM
to boo...@googlegroups.com
Dear Peter

Thanks for the fast reply.

I am always working on the image of a full page of a PDF document. I extract a JPEG image of a page of a PDF using pdfbox. I have over 100 template images of documents, and these are all forms, letters, and customer complaints containing text, tables, headers, footers (company logo, address, legal info), and so on. I first want to identify which form the current document is (so basically to recognize the image among my list of document templates).

In a second step I will need the OCR to extract information at the right place in the document (e.g. the address in the header), but I still first need to identify which is the best matching template in my library.

I have two more questions:
- Do you have any recommendation on the best approach for determining the similarity between a document (based on its image) and another image of a document?
- Is there a general way to compute a similarity between two images in BoofCV?

Thanks a lot
Marc

Swapnil Gandhi

Apr 15, 2016, 2:33:12 PM
to BoofCV
Marc, did you find a solution to this?

Bill Ross

Apr 15, 2016, 9:47:02 PM
to BoofCV
I'm after the same sort of thing. If you start with the demo ExampleClassifySceneKnn.java and trace through it, you wind up in the source tree at ClassifierKNearestNeighborsBow.java:

    public int classify(T image) {
        ...
        // Find the N most similar image histograms
        resultsNN.reset();
        nn.findNearest(hist,-1,numNeighbors,resultsNN);

It looks like the nearest neighbors are in resultsNN, and the following code in the class shows how to unpack them and the distances, which would be the closest you'd find to a similarity measure I think.


I started with your idea of getting the O(N^2) set of distances and sorting them, but just realized that the nearest neighbor algorithm seems to be just getting the ones you want - your task would be to pick the number of neighbors such that you'd get all the ones over your cutoff.
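
The cutoff idea can be sketched without BoofCV's nearest-neighbor classes. This is a brute-force scan over word histograms that keeps every entry whose squared Euclidean distance to the query is under a cutoff, closest first. The class name, method names, and the choice of squared Euclidean distance are assumptions made for illustration; this is not the code inside ClassifierKNearestNeighborsBow.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical brute-force search over bag-of-words histograms.
final class HistogramSearch {

    static double distanceSq(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Returns the indexes of all histograms within cutoffSq of the query, closest first. */
    static List<Integer> withinCutoff(double[] query, List<double[]> histograms, double cutoffSq) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < histograms.size(); i++) {
            if (distanceSq(query, histograms.get(i)) <= cutoffSq) {
                hits.add(i);
            }
        }
        // Sort the surviving indexes by their distance to the query
        hits.sort((x, y) -> Double.compare(
                distanceSq(query, histograms.get(x)),
                distanceSq(query, histograms.get(y))));
        return hits;
    }

    public static void main(String[] args) {
        List<double[]> templates = Arrays.asList(
                new double[]{0.0, 1.0},
                new double[]{0.9, 0.1},
                new double[]{1.0, 0.0});
        System.out.println(withinCutoff(new double[]{1.0, 0.0}, templates, 0.1));
    }
}
```

An empty result is the useful case for the original question: it means no template is close enough to the query under the chosen cutoff.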


Bill



Bill Ross

Apr 16, 2016, 1:45:26 PM
to BoofCV
With ExampleClassifySceneKnn.java adapted to run surfFast/cluster/histogram/match_N^2 with the default params on 8K images at ~700px on the longest dimension, 25G of memory is needed for loading the file set for clustering (laptop, 8G, Oracle JVM), while at 400px, only 8.5G is needed (amazon instance, 30G, openjdk).

Once clustering is done, the memory required is negligible. Multiplying the histogram distance-squared by 1000 to get into integer space, I'm posting the distribution of distances I'm seeing for the small pics, where x axis is the distance value, and y is the count. (The laptop is still clustering.)
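
A minimal sketch of how such a distribution could be tallied, assuming the squared distances are already in an array. The class name is hypothetical, and rounding to the nearest integer (rather than truncating) is an assumption.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: counts how many pair distances land on each integer value after scaling.
final class DistanceDistribution {

    static Map<Integer, Integer> counts(double[] distancesSq, double scale) {
        Map<Integer, Integer> counts = new TreeMap<>(); // sorted by scaled distance (the x axis)
        for (double d : distancesSq) {
            int bin = (int) Math.round(d * scale);      // rounding is an assumption
            counts.merge(bin, 1, Integer::sum);         // the count is the y axis
        }
        return counts;
    }

    public static void main(String[] args) {
        double[] distancesSq = {0.0, 0.0004, 0.0126, 0.0131, 0.0478};
        for (Map.Entry<Integer, Integer> e : counts(distancesSq, 1000).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```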

I'm wondering naively if I should increase the number of words from 100 for the big image space.
[Attachment: Screen Shot 2016-04-16 at 10.36.17 AM.png]

Bill Ross

Apr 16, 2016, 2:33:30 PM
to BoofCV
Looking at distances == 0, I found 5 dupes I didn't know I had, and one false match on night scenes, which is attached. The remaining 30  0-distance cases were versions of the same pics. Rerunning now with words=500.
[Attachments: img7074a.jpg, img7101a.jpg]

Bill Ross

Apr 16, 2016, 4:33:53 PM
to BoofCV
Come to think of it, as long as all the descriptions are loaded to do the clustering and I'm doing O(N^2) anyway, if I can get a description/description match (vs. histogram/histogram derived from the clusters), I can save forever on the clustering step, and get finer-grain data. Parallel clustering would be nice too :-).

Peter A

Apr 17, 2016, 11:32:43 AM
to boofcv
You might want to take a look at this deep learning class:


The techniques you are using are out of date, and you can probably get much better results. I'm working on code right now which can read NNs generated in Torch or Caffe and run them in BoofCV's environment.


Bill Ross

Apr 17, 2016, 2:54:32 PM
to BoofCV
SurfFast w/ 500 words gets the same matches as w/ 100 words, including the false one. Hog/500 doesn't get the false match and says only 10 version pairs are the same vs. 30. Distributions of distances look the same as posted above. 12 and 16 hours of CPU respectively.

Bill Ross

Apr 20, 2016, 2:52:01 AM
to BoofCV
I'm reading the class notes, very interesting. A statistics class in June will help. Meanwhile, I'm up for applying anything available to my data set. I should have a few demos of chaining-best-pics with the old code up this week.

Bill


Bill Ross

Apr 24, 2016, 12:24:20 AM
to BoofCV
I have added 8 image match options in a hidden array next to the Home button on the top of my slide show page. You can see them by the tool tips when you hover. Left to right they are

  - color histogram HS 12x12
  - Hog 500 words @300px
  - Hog 500 words @400px
  - Sift 500 words @300px
  - Sift 100 words @300px
  - SurfFast 500 words @400px
  - SurfFast 100 words @300px
  - Hog 100 words @300px

Each gives the best (unseen) match for the algorithm and set of pictures (view). 
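
For readers unfamiliar with the first option, here is a sketch of what a 12x12 hue/saturation histogram could look like, using only the JDK. It assumes "HS 12x12" means 12 hue bins by 12 saturation bins, normalized by pixel count; the class name is hypothetical and this is not the code behind the slide show.

```java
import java.awt.Color;
import java.awt.image.BufferedImage;

// Hypothetical sketch of a hue/saturation color histogram (brightness is ignored).
final class HueSatHistogram {

    /** Returns a bins*bins histogram, indexed as hueBin * bins + satBin, that sums to 1. */
    static double[] compute(BufferedImage image, int bins) {
        double[] hist = new double[bins * bins];
        float[] hsb = new float[3];
        for (int y = 0; y < image.getHeight(); y++) {
            for (int x = 0; x < image.getWidth(); x++) {
                int rgb = image.getRGB(x, y);
                Color.RGBtoHSB((rgb >> 16) & 0xFF, (rgb >> 8) & 0xFF, rgb & 0xFF, hsb);
                // hue and saturation are in [0, 1]; clamp so that 1.0 falls in the last bin
                int h = Math.min(bins - 1, (int) (hsb[0] * bins));
                int s = Math.min(bins - 1, (int) (hsb[1] * bins));
                hist[h * bins + s] += 1.0;
            }
        }
        double total = (double) image.getWidth() * image.getHeight();
        for (int i = 0; i < hist.length; i++) {
            hist[i] /= total;
        }
        return hist;
    }

    public static void main(String[] args) {
        BufferedImage red = new BufferedImage(2, 2, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < 2; y++) {
            for (int x = 0; x < 2; x++) {
                red.setRGB(x, y, 0xFF0000);
            }
        }
        double[] hist = compute(red, 12);
        System.out.println("hue bin 0, saturation bin 11: " + hist[11]);
    }
}
```

Two such histograms can then be compared with any vector distance, which is what makes this option comparable to the word-histogram options in the list.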

Everything but the experimental options selects from 1400 favorites for the 1st 100 pics:

  - Clicking the picture itself gives a relaxed selection from among the top color matches.
  - Clicking the red '-' gives a selection from among the furthest-distance pics in color space (my own metric).
  - Clicking the blue bar gives a random pick.
  - Clicking the green '+' gives a keyword-based selection.


Next will try hybridizing the experimental options.

Bill

Bill Ross

Apr 24, 2016, 5:52:06 AM
to BoofCV
Hybridization is the unexpected cherry on the week's effort. I keep puzzling over why the picture choices seem so interesting when all 8 permutations of algorithms get to vote. Maybe just knowing that there is a consensus biases one toward finding meaning, even when the voters are dumb, but I'm finding a level of match that I didn't expect to find after playing with the individual algorithms and with just 3 of the 8 algorithms together - throwing them all together was the last thing to try. 

'Hybrid' is a new hidden option, to the right of the others on the top of the screen. (In small formats, these options may wrap. "Use the toolTip, Luke.") 

Trying it freshly now that it is up on the site, I see how deterministic the sequence is compared to the unhidden options. 

Using the best 30 matches of each of the 8 algorithms, it groups the candidate pictures by the number of algorithms matched. If there is nothing to differentiate based on all the best 30, I go back and throw in the next 30 for each algorithm. Then I work down the groups, checking each group against each favorite and so on across all the algorithms (in left-to-right order, i.e. color match first). If all that fails, I go to brute round-robin exploration of the total lists from each algorithm, one level at a time.
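
The grouping step alone can be sketched as follows. This is an illustration under assumptions, not the site's code: picture ids are plain strings, each algorithm contributes one list of best matches, and the fallback steps (next 30, favorites, round robin) are left out. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: group candidate pictures by how many algorithms listed them.
final class HybridVote {

    /** Key = number of algorithms that listed a picture, highest first; value = those pictures. */
    static TreeMap<Integer, List<String>> groupByVotes(List<List<String>> topMatchesPerAlgorithm) {
        Map<String, Integer> votes = new HashMap<>();
        for (List<String> matches : topMatchesPerAlgorithm) {
            // the set guards against one algorithm voting twice for the same picture
            for (String pic : new LinkedHashSet<>(matches)) {
                votes.merge(pic, 1, Integer::sum);
            }
        }
        TreeMap<Integer, List<String>> groups = new TreeMap<>(Comparator.<Integer>reverseOrder());
        for (Map.Entry<String, Integer> e : votes.entrySet()) {
            groups.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }
        return groups;
    }

    /** Formats the groups as "Sets: votes: pictureCount ...". */
    static String summary(TreeMap<Integer, List<String>> groups) {
        StringBuilder sb = new StringBuilder("Sets:");
        for (Map.Entry<Integer, List<String>> e : groups.entrySet()) {
            sb.append(' ').append(e.getKey()).append(": ").append(e.getValue().size());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<List<String>> lists = Arrays.asList(
                Arrays.asList("a", "b", "c"),
                Arrays.asList("a", "b", "d"),
                Arrays.asList("a", "e", "f"));
        System.out.println(summary(groupByVotes(lists)));
    }
}
```

For the three toy lists in main, picture "a" is listed by all three algorithms, "b" by two, and the other four by one each, so the summary reads `Sets: 3: 1 2: 1 1: 4`.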

A lot of this backing up and falling back is to keep from seeing repeats and falling into cycles. Best-to-best algorithms would dearly like you to see two pictures forever. 

Here are lists of per-set [picCount: nPics] for a series of hybrid associations. The value of hurting your eyes with this is you can see how often the algorithms overlap on picture choices.

E.g. in the first list of Sets, the group of pics with the most algorithms matching matched on 5 algorithms, and it consisted of 1 picture. The next group matched on 4 algorithms, and also had 1 picture. The next group, of 3-algorithm matches, had 2 pictures in it. A promiscuous 23 pictures matched on 2 algorithms, and the remaining 179 pictures were unique.

Sets: 5: 1 4: 1 3: 2 2: 23 1: 179 

Sets: 8: 1 7: 1 6: 1 5: 8 4: 6 3: 10 2: 23 1: 79 

Sets: 5: 1 4: 3 3: 9 2: 30 1: 136 

Sets: 8: 1 7: 3 6: 3 5: 1 4: 2 3: 9 2: 34 1: 85 

Sets: 8: 1 7: 2 6: 3 5: 3 4: 7 3: 12 2: 26 1: 69 

Sets: 8: 1 7: 2 6: 6 5: 1 4: 4 3: 15 2: 24 1: 68 

Sets: 5: 1 3: 5 2: 25 1: 170 

Sets: 7: 3 6: 4 5: 4 4: 7 3: 12 2: 26 1: 59 

Sets: 5: 3 4: 1 3: 9 2: 25 1: 144 

Sets: 7: 2 6: 1 5: 4 4: 9 3: 5 2: 31 1: 87 

Sets: 3: 13 2: 20 1: 161 

Sets: 7: 3 6: 2 5: 4 4: 8 3: 13 2: 22 1: 72 


Not bad for about $5 of active Amazon server cost (plus the inevitable unused time). I will recommend you to Amazon if you are the first person to add up all the numbers on this page.


Bill


"Even as a student, I had mocked the intellectual tumors grown by philosophers." -- Paul Feyerabend (philosopher)

Bill Ross

Apr 25, 2016, 12:06:08 AM
to BoofCV
It feels like there is a core set of images that the algorithms want to show. (This is the case with keyword-based matching too.) That stats class may help with figuring it out, but at 1060/6500 pics seen in Bill's View, the best 26 matches for a given picture (using the 8 algorithms) have already been seen. With an even distribution, it seems like it would be more like the best 6 matches already seen. On the next, at least the 55 best matches have been seen, and an unseen one is found in the final, single-algorithm group holding 72 pictures. 

I wonder if this is a function of the image analysis parameters. Next is to figure out how to best characterize the clustering, so I can see if parameter changes make a difference and compare metrics.

Bill

Bill Ross

Apr 25, 2016, 5:38:49 PM
to BoofCV
Looking at the view that includes two photographers with the hybrid matching scheme, the results are not so impressive to me. Variations on the same picture are detected nicely, but most transitions seem random. It seems the pictures in my own view are highly-enough correlated compositionally and in my mind so that it's not that hard to get an interesting juxtaposition, and/or I just like them enough that seeing them in a different order than usual is sufficient to get me excited. 

Bill Ross

Apr 26, 2016, 3:24:17 AM
to BoofCV
Now there is a button for a 100-word-based hybrid, one for a 500-word-based hybrid, and the previous one for all algorithms (including color histogram).
Plus there is now a RESET button at the rightmost point, which allows for erasing history, although you start with a random picture so there isn't full repeatability. With this, I am seeing the better matches again, and my faith is somewhat restored.

Peter A

Apr 26, 2016, 8:55:41 AM
to boofcv
Are these new buttons hidden?  I can't see them.

- Peter


Bill Ross

Apr 26, 2016, 2:00:33 PM
to BoofCV
"I have added 8 image match options in a hidden array next to the Home button on the top of my slide show page. You can see them by the tool tips when you hover."

So now there are 11, including reset.

Bill

Bill Ross

May 22, 2016, 5:12:54 AM
to BoofCV
I just added Gist results calculated on a scaled 256x256 central square of each image (OpenIMAJ implementation). It only took my laptop 35 minutes to calculate all the pair distances for the 8000 pics. Hacking their test case to do what I wanted was comparable to hacking the BoofCV examples.

As before, it finds close matches. Likewise, I find the results more interesting with the bigger set of pics (6500) than with the smaller one (1400), a limitation of the set sizes and selection.

There's now a 'gist'-tooltipped hidden button next to the reset one.

Bill

Bill Ross

May 23, 2016, 4:49:59 AM
to BoofCV
Here are successive best-gist-distances for my first Gist session, showing some sort of hyperstructure within the set of pictures as viewed through the algorithm. I figure that as the number of pics went to infinity, this structure would even out. I wonder what scientific visualization technique would let one explore the space of .5 * N^2 pic-pic distances under a given metric? 
[Attachment: graph_first_gist_session_stripped.jpg]

Bill Ross

May 23, 2016, 4:56:58 AM
to BoofCV
It would be interesting to figure the number of dimensions required to accommodate the set of distances.

Bill Ross

May 30, 2016, 3:36:46 AM
to BoofCV
New results exploring the different algorithms, showing degree of overlap between sequences of 500 pics starting from different pictures and doing nonrepeating best picks, using Lab/CIE94, Hue+Sat, Sift, Hog, and Surf.


Graphs plus a poem with a nice link.

Bill

Bill Ross

Jun 20, 2016, 5:32:04 PM
to BoofCV
I've been trying to figure the number of dimensions required to hold the sets of image-image distances generated by various methods, and here is my first tentative result:


E.g. the 'dimension' goes up as the number of bins in HS is increased, and as words are added with Hog and SurfFast.

Bill

Bill Ross

Jun 23, 2016, 1:57:29 AM
to BoofCV
I just updated that post with fractal dimensions for sorted distances for each metric, which also track expectations somewhat.

For convenience, here are the latest numbers (more will be added at stackoverflow).

Method         Frac_d  Dim              stress(100)                 stress(1)
Lab_CIE94      1.1458    3  2114107376961504.750000  33238672000252052.000000
Greyscale      1.0490    8       42238951082.465477      1454262245593.781250
HS_12x12       1.0889   19       33661589105.972816      3616806311396.510254
HS_24x24       1.1298   35       16070009781.315575      4349496176228.410645
HS_48x48       1.1854   64        7231079366.861403      4836919775090.241211
GIST           1.2312    9       28786830336.332951       997666139720.167114
HOG_250_words  1.3114   10       10120761644.659481       150327274044.045624
HOG_500_words  1.3543   13        4740814068.779779        70999988871.696045
HOG_1k_words   1.3805   15        2364984044.641845        38619752999.224922
SIFT_1k_words  1.5706   11        1930289338.112194        18095265606.237080
SURFFAST_200w  1.3829    8        2778256463.307569        40011821579.313110
SRFFAST_250_w  1.3754    8        2591204993.421285        35829689692.319153
SRFFAST_500_w  1.4551   10        1620830296.777577        21609765416.960484
SURFFAST_1k_w  1.5023   14         949543059.290031        13039001089.887533
SURFFAST_4k_w  1.5690   19         582893432.960562         5016304129.389058
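
The thread does not say how the stress columns were computed. As a rough illustration of the general idea only, the sketch below uses raw stress from multidimensional scaling: the sum of squared differences between the target distances and the distances of points placed in d dimensions. That definition, the class name, and the toy data are all assumptions.

```java
// Hypothetical sketch: raw stress of an embedding against a target distance matrix.
final class RawStress {

    static double euclidean(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Sums (target distance - embedded distance)^2 over all pairs i < j. */
    static double rawStress(double[][] target, double[][] points) {
        double sum = 0;
        for (int i = 0; i < points.length; i++) {
            for (int j = i + 1; j < points.length; j++) {
                double diff = target[i][j] - euclidean(points[i], points[j]);
                sum += diff * diff;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        // Three items that are all at distance 1 from each other
        double[][] target = {{0, 1, 1}, {1, 0, 1}, {1, 1, 0}};
        double[][] onALine = {{0}, {1}, {2}};                              // 1 dimension
        double[][] triangle = {{0, 0}, {1, 0}, {0.5, Math.sqrt(3) / 2}};   // 2 dimensions
        System.out.println("1-D stress: " + rawStress(target, onALine));
        System.out.println("2-D stress: " + rawStress(target, triangle));
    }
}
```

Three mutually equidistant items cannot sit on a line without distorting one distance, but they fit in a plane as an equilateral triangle. The smallest number of dimensions at which the stress becomes negligible is one way to read a 'dimension' off a set of distances.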

Bill