Locating subimage in an image is slow.

Zippoxer

unread,

Jul 11, 2012, 5:39:44 PM7/11/12

to golan...@googlegroups.com

func Find(img, subimg image.Image) (int, int) {

rgba := img.(*image.RGBA)

rgba2 := subimg.(*image.RGBA)

for x := rgba.Rect.Min.X; x < rgba.Rect.Max.X - rgba2.Rect.Max.X + 1; x++ {

for y := rgba.Rect.Min.Y; y < rgba.Rect.Max.Y - rgba2.Rect.Max.Y + 1; y++ {

for x2 := rgba2.Rect.Min.X; x2 < rgba2.Rect.Max.X; x2++ {

for y2 := rgba2.Rect.Min.Y; y2 < rgba2.Rect.Max.Y; y2++ {

i := rgba.PixOffset(x + x2, y + y2)

r, g, b := rgba.Pix[i], rgba.Pix[i+1], rgba.Pix[i+2]

i = rgba2.PixOffset(x2, y2)

r2, g2, b2 := rgba2.Pix[i], rgba2.Pix[i+1], rgba2.Pix[i+2]

if r != r2 || g != g2 || b != b2 {

goto next

}

return x, y

}

return -1, -1

}

Find goes through each point of img and tries to fit subimg there. Metaphorically, it crops a temporary subimage out of img at each point of img which has the same size as subimg and compares it to subimg. It stops when the temporary subimage matches subimg and returns the point in img at which the temporary subimage starts. That's the algorithm I instinctively came up with once I thought of how machines find subimages. And it's Slow.

In a screenshot of my desktop (windows, resolution 1920x1080) it takes 1ms to find the Start button (at the bottom left of the screen) and 116ms to find the Show Desktop button (at the bottom right of the screen). I understand, that happens since Find starts from top left to bottom right, so the closer the subimage you wanna find is to the bottom right, the longer it will take for it to be found. However, a macro tool like SCAR finds the Show Desktop button within 15ms. So either my implementation of the algorithm is the problem or the algorithm is. I don't believe SCAR uses a different algorithm that makes it faster, but even if it does, I believe my algorithm can go faster than it does now with a better implementation. Any ideas?

(below are my motives, not relevant to the problem)

I do that because I want Go to replace my need of SCAR, so I'm writing a macro library that I hope to release somewhen. SCAR is for windows only, let's you write macros only in PASCAL and only executes source code (you post your macro, you post your code). A go macro library would not do any of these to you.

Zippoxer

unread,

Jul 11, 2012, 5:52:03 PM7/11/12

to golan...@googlegroups.com

That is how I test Find (images and test code attached), in case someone is interested. I just go run find_subimage.go and this is the output I currently get:

found start button at 4/1040 in 1ms

found show desktop button at 1905/1051 in 114.0065ms

find_subimage.zip

Kyle Lemons

unread,

Jul 11, 2012, 5:55:58 PM7/11/12

to Zippoxer, golan...@googlegroups.com

On Wed, Jul 11, 2012 at 2:39 PM, Zippoxer <zipp...@gmail.com> wrote:

func Find(img, subimg image.Image) (int, int) {
rgba := img.(*image.RGBA)
rgba2 := subimg.(*image.RGBA)

for x := rgba.Rect.Min.X; x < rgba.Rect.Max.X - rgba2.Rect.Max.X + 1; x++ {
for y := rgba.Rect.Min.Y; y < rgba.Rect.Max.Y - rgba2.Rect.Max.Y + 1; y++ {

for x2 := rgba2.Rect.Min.X; x2 < rgba2.Rect.Max.X; x2++ {
for y2 := rgba2.Rect.Min.Y; y2 < rgba2.Rect.Max.Y; y2++ {

i := rgba.PixOffset(x + x2, y + y2)
r, g, b := rgba.Pix[i], rgba.Pix[i+1], rgba.Pix[i+2]
i = rgba2.PixOffset(x2, y2)

r2, g2, b2 := rgba2.Pix[i], rgba2.Pix[i+1], rgba2.Pix[i+2]
if r != r2 || g != g2 || b != b2 {
goto next

}
}
}
return x, y

next:
}
}
return -1, -1

}

For an image NxM and subimage PxQ, that algorithm is O(n*m*p*q). You can probably speed it up dramatically by cutting the resolution down and/or doing row-by-row candidate selection with a linear algorithm for each row ( O(n*m) ), and then going through each candidate one subsequent row at a time (O(q)). If you know the subimage exists, you can make it even faster by stopping when you have one candidate left.

Erwin

unread,

Jul 11, 2012, 5:56:16 PM7/11/12

to Zippoxer, golan...@googlegroups.com

Find goes through each point of img and tries to fit subimg there. Metaphorically, it crops a temporary subimage out of img at each point of img which has the same size as subimg and compares it to subimg. It stops when the temporary subimage matches subimg and returns the point in img at which the temporary subimage starts. That's the algorithm I instinctively came up with once I thought of how machines find subimages. And it's Slow.

I suppose a multi-resolution approach would be faster. Imagine making smaller interpolated versions of your images first (like mip-maps), and start searching in the lowest resolution you have. Once you got a match, check the corresponding area in the next higher resolution, and so on.

Zippoxer

unread,

Jul 11, 2012, 6:26:27 PM7/11/12

to golan...@googlegroups.com, Zippoxer

I can't cut the resolution down since I don't know whether the subimage is in the bottom right of the image or top right. I don't think Find's algorithm is O(n*m*p*q), I think it's O(n*m*log(p*q)) since the comparison between the temporary subimage to the subimage stops when one of the pixels doesn't match. If I'm right, could row-by-row candidate selection still be faster?

I know row-by-row candidate selection sounds memory consuming, but I shouldn't find subimages concurrently either way so I don't expect significant memory consumption.

Jonathan Wills

unread,

Jul 11, 2012, 6:38:10 PM7/11/12

to Zippoxer, golan...@googlegroups.com

On Wed, Jul 11, 2012 at 3:26 PM, Zippoxer <zipp...@gmail.com> wrote:

I can't cut the resolution down since I don't know whether the subimage is in the bottom right of the image or top right. I don't think Find's algorithm is O(n*m*p*q), I think it's O(n*m*log(p*q)) since the comparison between the temporary subimage to the subimage stops when one of the pixels doesn't match. If I'm right, could row-by-row candidate selection still be faster?

Any linear time find will take some extra storage but should be able to run faster than what you're doing now. If you use Boyer-Moore you'll need M*N storage, but you will probably do your finds in sublinear (that is, sub O(M*N)) time. If you use Aho-Corasick then you'll only need P*Q storage, but you'll only be able to do this in linear (O(M*N)) time. Aho-Corasick might be your best bet if you go this route, since it also lets you do all of these searches simultaneously.

I've made a simple interface to both of these algorithms, it's available here if you'd like to use it:

https://github.com/runningwild/stringz

Job van der Zwan

unread,

Jul 11, 2012, 6:55:35 PM7/11/12

to golan...@googlegroups.com, Zippoxer

Regardless of what you end up using, it's usually faster to make the outer loop scan over Y, and the enclosed loop scan over X, because it reduces cache misses:

http://www.1024cores.net/home/parallel-computing/cache-oblivious-algorithms

... although I'm not sure if that still works if you use PixOffset to determine i - you might need to inline and rewrite that logic as well.

Rob 'Commander' Pike

unread,

Jul 11, 2012, 6:58:33 PM7/11/12

to Job van der Zwan, golan...@googlegroups.com, Zippoxer

Since you know it's an RGBA, don't bother with the 2d loop. Be smarter about unrolling. All that PixOffset stuff is crazy. Also, if you don't want to do full Boyer-Moore, it's probably worth a fast check for some particular pixel in the image. The first is the easiest but the middle pixel might be the best.

-rob

Glenn Brown

unread,

Jul 12, 2012, 12:34:05 AM7/12/12

to Zippoxer, Glenn Brown, golan...@googlegroups.com

Subimage location is "convolution" in Digital Signal Processing lingo. The fast algorithm is
max(idct(mul(dct(image),dct(subimage)))), where dct() is the Discrete Cosine Transform in 2 Dimensions, mul() multiplies each element in each image, idct is the inverse of dct(), and max() locates the maxumum. I believe it's O(n*m log(n*m)) where n and m are the dimension of the image.

--Glenn

Message has been deleted

Jonathan Wills

unread,

Jul 12, 2012, 1:58:08 PM7/12/12

to Glenn Brown, Zippoxer, golan...@googlegroups.com

You mean correlation, which would be max(idct(mul(dct(image),conjugate(dct(subimage))))), not convolution. And this won't work for this use case, at least not like that. Looking for the max works when you look for a signal in noise because noise typically integrates to roughly zero over any interval. If you were looking for a grey block in an image that included a grey block and a white block, for example, you'd get a max indicating a match in the white block just because the values for white pixels are larger than they are for grey.

Nigel Tao

unread,

Jul 12, 2012, 11:43:27 PM7/12/12

to Zippoxer, golan...@googlegroups.com, Job van der Zwan

If you haven't already read
http://golang.org/doc/articles/image_package.html, please do so. It
describes how an Image's bounds do not necessarily contain the origin
(0, 0).

On 12 July 2012 22:08, Zippoxer <zipp...@gmail.com> wrote:
> for i := 0; i <= len(rgba.Pix)-rgba2.Stride; i += 4 {

I think that this can give you a false positive if rgba's Stride is
larger than rgba.Rect's width. You may be searching in the Pix slice
outside of an image bounds. Suppose that your rgba variable is the
blue image in
http://golang.org/doc/articles/image-package-05.png
which is found in the article at
http://golang.org/doc/articles/image_package.html

> for row := 0; row < rgba2.Rect.Max.Y; row++ {

I think that the upper bound is wrong if rgab2's Rect.Min is not the
origin. It should be rgba2.Rect.Max.Y - rgba2.Rect.Min.Y.

> for j := 0; j <= rgba2.Stride-4; j += 4 {
> if rgba.Pix[n+j] != rgba2.Pix[n2+j] {
> goto next
> }
> }

I think you're only examining the red values of each RGBA pixel, and
ignoring the blue, green and alpha.

I haven't tested this, but try something like

// contains returns whether the image m0 contains m1. If it returns true, it
// also returns the point in m0 at which m1 appears.
//
// TODO: Boyer-Moore search could be much faster.
func contains(m0, m1 *image.RGBA) (image.Point, bool) {
s1 := m1.Rect.Size()
s0 := m0.Rect.Size().Sub(s1)
for y0 := 0; y0 < s0.Y; y0++ {
loopx0:
for x0 := 0; x0 < s0.X; x0++ {
i0 := y0*m0.Stride + x0*4
j0 := i0 + s1.X*4
i1 := 0
j1 := s1.X * 4
for y1 := 0; y1 < s1.Y; y1++ {
if !bytes.Equal(m0.Pix[i0:j0], m1.Pix[i1:j1]) {
continue loopx0
}
i0 += m0.Stride
j0 += m0.Stride
i1 += m1.Stride
j1 += m1.Stride
}
return m0.Rect.Min.Add(image.Pt(x0, y0)), true
}
}
return image.Point{}, false
}

Reply all

Reply to author

Forward