Lowe Scale-space Local Extrema Detection

Ben Plotnick

Dec 1, 2011, 6:41:33 PM

to cis58...@googlegroups.com

(Lowe 2004) breaks each octave (a doubling of sigma) into a number of scales, convolves the image with a Gaussian at each scale, and takes the difference of adjacent Gaussian-blurred images. The image is then downsampled and the process repeats. This seems similar to what is explained in the slides [i.e. D(x,y,sigma) = (G(x,y,k*sigma) - G(x,y,sigma)) * I(x,y), where the last * is convolution], but with k = 2^(1/s), where s is the number of scales per octave.

My questions are: 1) Can we simply set s = 1 (or should we make this a parameter)? 2) How many octaves should we span (or should we make this a parameter too)? And most importantly, 3) How do we do this without making num_octaves*s copies of the image?(!)

Thanks,

-Ben

--
Ben Plotnick
University of Pennsylvania
M.S.E Electrical Engineering | May 2012
bplo...@seas.upenn.edu

Andrew Yeager

Dec 2, 2011, 1:19:33 PM

to cis58...@googlegroups.com

1) The downsampling at each octave is just an optimization to save time and space. Within an octave you increase sigma at each scale; once sigma has doubled, you can instead convolve with half that sigma on a downsampled image, where all distances are cut in half. You could interpolate the smaller image back up to reconstruct what you would have gotten from the full-size image with double the sigma, so nothing is really lost in the downsizing.

Setting s = 1 then basically means that, if you worked without downsizing, you would double sigma at every step. There's nothing necessarily wrong with that, except that the paper found experimentally that 3 scales per octave produced better results.
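
To make the role of s concrete, here is a small sketch of the sigma values inside one octave for k = 2^(1/s). The function name octave_sigmas is made up for illustration, and the base sigma of 1.6 is just an assumed starting value, so treat this as a sketch rather than what any particular implementation does.

```python
def octave_sigmas(sigma0, s):
    """Sigma values from sigma0 up to 2*sigma0, spaced by k = 2**(1/s).

    sigma0 is an assumed base blur; s is the number of scales per octave.
    """
    k = 2.0 ** (1.0 / s)
    # s steps of multiplying by k take sigma0 to exactly 2*sigma0.
    return [sigma0 * k ** i for i in range(s + 1)]


# With s = 1 sigma simply doubles; with s = 3 there are two intermediate scales.
print([round(v, 3) for v in octave_sigmas(1.6, 1)])
print([round(v, 3) for v in octave_sigmas(1.6, 3)])
```

Adjacent pairs in this list are the (sigma, k*sigma) pairs whose blurred images get subtracted to form D.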

2) I suppose a parameter would be good for this as well. The scale invariance arises because you're finding local maxima across both space (x and y locations) and scale. So if you have the same logo at 2 different sizes, the idea is that while the signal will look different at each scale, the signal of the smaller logo should look like the signal of the larger logo taken at a larger value of sigma. Because of this, the hope is that the local maxima across scales will find the same locations in the images, just located at a different sigma to account for the size change. So the octaves you use basically determine how much larger or smaller a logo you'd be trying to match.
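
A rough sketch of what "local extrema across space and scale" can look like in code, assuming the DoG responses for one octave are already stacked as dog[scale][y][x] with every level the same size. The name local_extrema is hypothetical. Lowe's paper compares each sample to its 26 neighbours (8 in the same level, 9 in the level above, 9 below) and keeps both maxima and minima, which is what this does; it skips the thresholding and sub-pixel refinement steps entirely.

```python
def local_extrema(dog):
    """Return (scale_index, y, x) for samples that are strictly larger or
    strictly smaller than all 26 neighbours in a DoG stack dog[i][y][x]."""
    found = []
    # Skip the first/last level and the image border: they lack full neighbourhoods.
    for i in range(1, len(dog) - 1):
        for y in range(1, len(dog[i]) - 1):
            for x in range(1, len(dog[i][y]) - 1):
                v = dog[i][y][x]
                neighbours = [
                    dog[i + di][y + dy][x + dx]
                    for di in (-1, 0, 1)
                    for dy in (-1, 0, 1)
                    for dx in (-1, 0, 1)
                    if (di, dy, dx) != (0, 0, 0)
                ]
                if v > max(neighbours) or v < min(neighbours):
                    found.append((i, y, x))
    return found


# Toy 3x3x3 stack with a single peak in the middle.
stack = [[[0.0] * 3 for _ in range(3)] for _ in range(3)]
stack[1][1][1] = 1.0
print(local_extrema(stack))  # [(1, 1, 1)]
```

The scale index of each hit is what tells you "which sigma" the feature was found at, which is where the size of the logo shows up.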

3) Because of the downsampling, you do have many copies, but they shrink: with s = 3, after every third image the storage per image drops by a factor of 4. For an image of N pixels, the image sizes across octaves form a geometric series N + N/4 + N/16 + ... < (4/3)N, and with s copies per octave the total pyramid should use less than (4/3)N*s pixels.
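
As a quick sanity check on that bound, here is a sketch that just adds up the pixels, assuming s images per octave and a quarter of the pixels at each successive octave. The function name pyramid_pixels and the 1024x1024 example size are made up for illustration.

```python
def pyramid_pixels(n, s, num_octaves):
    """Total pixels stored when each octave keeps s images and each new
    octave has a quarter of the pixels of the previous one."""
    total = 0
    pixels = n
    for _ in range(num_octaves):
        total += s * pixels
        pixels //= 4  # halving width and height quarters the pixel count
    return total


n = 1024 * 1024  # pixels in a hypothetical 1024x1024 image
s = 3
print(pyramid_pixels(n, s, 4))  # 4177920
print(4 * n * s // 3)           # 4194304, the (4/3)*N*s bound
```

So even with more octaves the total stays under (4/3)N*s, compared with num_octaves*s*N if every copy were kept at full size.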

4) You can use an existing implementation of SIFT for this project. The link on the course webpage is supposed to link here:

http://people.csail.mit.edu/albert/ladypack/wiki/index.php?title=Known_implementations_of_SIFT

Though it has been broken until now.

Andrew
