'Magic' kernel for image zoom resampling?

792 views
Skip to first unread message

jpcos...@hotmail.com

unread,
Jun 7, 2006, 5:37:04 AM6/7/06
to
Hi folks,

I've recently generated a solution to an image zooming problem, and
can't find anything quite like it on our good old web or usenet. I'm
trying to figure out if I've stumbled on something that might actually
be useful to other people. :)

My need was to downsample an image by a factor of 8 in each direction
(i.e. 1 downsampled pixel for every 64 original image pixels), and then
upsample the result back to the original resolution. I understand all
the information theoretical aspects of this process; I was trying to
figure out if there was a kernel that did both tasks well (i.e., the
final result was smooth and without stellations or other strange
artifacts), and also did them fast.

After some truly terrible attempts (even though I thought I had sinc
functions well and truly under my belt 20 years ago), I found that the
following recursive algorithm works amazingly well:

1. To downsample or upsample by a factor of 2, use a kernel

1 3 3 1
3 9 9 3
3 9 9 3
1 3 3 1

placed over every second pixel in each direction. (Total normalization
of 64 is relevant for downsampling. For upsampling, only divide by 16,
because every upsampled pixel gets a contribution from four downsampled
pixels' kernels).

2. To downsample or upsample by a factor of 2^k, perform the above
resampling recursively, k times.

The results for upsampling are astoundingly good, really astoundingly
good, and better than what I could get Photoshop to do. (Some years ago
I did my fair share of 'extreme zooming' in some work on the
photographic evidence of the JFK assassination, so I'm very familiar
with all the strange artifacts that can pop out, and the disappointment
that accompanies it.)

I can upload a few images to my website if need be, to demonstrate what
I am talking about (or, equivalently, point me to some test images and
I'll show you what comes out).

For my particular purpose (8:8 downsampling and upsampling), applying
this 'magic' kernel three times yields a kernel that is only 22 x 22 in
size (if you want to precompute the kernel and apply it once, as I
ultimately want to, rather than actually perfoming the doubling process
three times). (Every time you apply it, double the kernel width or
height and add 2, so 2 x 4 + 2 = 10, and 2 x 10 + 2 = 22.) That's
pretty good, considering that, for 800% zooming, it's already 16 pixels
from nearest pixel to nearest pixel, in each dimension.

Of course, if you wanted to resample by something that is not a power
of 2, then you'd need to use the 'magic' kernel for the nearest power
of 2, and then use something more traditional for the final small
adjustment in resolution. That's no major problem, because getting to
the final result from the nearest power of 2 is never worse than
between a 71% and 141% zoom, and just about any resampling method does
a decent job in this range.

My question is: Is this 'magic kernel' something new, or is this trick
known?

The closest I could find on the net is the 'stair interpolation' trick,
which uses Photoshop's bicubic for successive 110% increases, which is
sort of, kind of, the same idea, but not quite. The other resampling
kernels I could find on the net look much more like what I was trying
to do in the first place, but nothing like what I ended up with.

The 'magic' kernel sort of reminds me of the Fast Fourier Transform,
which also gets all sorts of amazing efficiencies with powers of 2, and
then needs a bit of a (non-critical) fudge if you don't quite have a
power of 2.

Oh, and by the way, I know how I arrived at the 'magic' kernel above
(for another aspect of the project that needed just 2:2 downsampling
and upsampling), and it has some nice mathematical properties (it is
'separable' in the sense that it is the product of an 'x' 1-3-3-1 and a
'y' 1-3-3-1, and its normalization is always a power of 2, which means
everything is integer look-ups with bit-shifts, which makes me
extremely happy), but I have no mathematical proof at all of why it
works so damn well when applied to itself recursively.

Anyhow, thanks in advance for any words of advice. And apologies in
advance if I've rediscovered someone's magic kernel, as is bound to be
the case. :)

John Costella

_______________________________________________
Dr. John P. Costella
BE(Elec)(Hons) BSc(Hons) PhD(Physics) GradDipEd
Melbourne, Victoria, Australia
jpcos...@hotmail.com
cost...@bigpond.com
assassinationscience.com/johncostella

John Costella

unread,
Jun 7, 2006, 8:14:41 AM6/7/06
to
Sorry, it was recommended that I cross-post here, but that means that
replies in the original group do not appear here.

Here is some clarifying information on the algorithm:
----------------------

> "For upsampling, only divide by 16,
> because every upsampled pixel gets a contribution from four downsampled
> pixels' kernels)."

> Huh? Both kernels should be normalized. Also, for enlargement, I expect
> that you are interlacing this kernel with identity so all of the original
> pixels are retained.


Both are normalized, but I was probably too obscure in my description.
And no, I am not interlacing this kernel with identity for enlargement.


I'd better explain it more clearly. Start with downsampling. Label the
pixels of the original image (x,y). For a box filter, the top-left
downsampled pixel averages (0,0), (1,0), (0,1) and (1,1), and the
result is notionally centered at (0.5,0.5). I am saying that, instead
of this, compute it as

(-1,-1) + 3(0,-1) + 3(1,-1) + (2,-1)
+ 3(-1, 0) + 9(0, 0) + 9(1, 0) + 3(2, 0)
+ 3(-1, 1) + 9(0, 1) + 9(1, 1) + 3(2, 1)
+ (-1, 2) + 3(0, 2) + 3(1, 2) + (2, 2)

and divide the result by 64. OK, ignore the fact that we have gone one
pixel outside the image; that's an edge effect that we can deal with.
You can see the original 2 x 2 'tile' is in the middle of this kernel.

Now, for the next downsampled value to the right, shift everything
right by two pixels. The box filter would just average out (2,0),
(3,0), (2,1) and (3,1), and the result is notionally centered at
(2.5,0.5). I'm saying to compute

(1,-1) + 3(2,-1) + 3(3,-1) + (4,-1)
+ 3(1, 0) + 9(2, 0) + 9(3, 0) + 3(4, 0)
+ 3(1, 1) + 9(2, 1) + 9(3, 1) + 3(4, 1)
+ (1, 2) + 3(2, 2) + 3(3, 2) + (4, 2)

and again divide the result by 64. Again you can see the original 2 x 2

tile in the middle of the calculation. And so on.

For upsampling, it's simplest to imagine that we're reversing the above

process. Label the downsampled pixels [x,y], where x and y are the
'notional' centered positions listed above, i.e., we calculated
[0.5,0.5] and [2.5][0.5] above. For each such downsampled pixel, place
a copy of the kernel, multplied by the weight of the pixel, and divide
by 16. If you work it through, you'll see that each upsampled pixel
receives a contribution of 9 times the nearest downsampled pixel (the
one whose corner it's 'in'), 3 times the next two nearest ones (other
adjacent corners of the 2 x 2 tile), and 1 times the opposite corner.
That's a total of 16, so you divide by 16. For example,

(0,0) = { 9[0.5,0.5] + 3[0.5,-1.5] + 3[-1.5,0.5] + [-1.5,-1.5] } / 16

Again, we have edge effects in that we've gone outside the image, but
we can deal with that.

> Your 1D separated kernel, 0.125*[1 3 3 1], seems like an approximate
> Gaussian and this has been covered in literature, but maybe not for your
> particular stand. dev..

Well, yeah, [1 3 3 1] is an approximation to just about anything if you

try hard enough. :)

You can make the two terms (0.5,3) and (1.5,1) fit a Gaussian by making

its standard deviation 1 / sqrt( ln( 3 ) ) = 0.954.... The next two
terms would then be 1/3 and 1/81, so what you're telling me is that
it's a 'truncated Gaussian'? I suppose so, but that doesn't really tell

me why it works so well in two dimensions. (More on this below.)

I looked at the site you recommended in your next reply. I know all
about the information theoretical aspects of 1-dimensional resampling,
and have played with those things in many different forms for many
years (and continue to, in other areas). The problems come with a
two-dimensional grid, as far as I can tell.

None of the Gaussians listed on that site have coefficients that match
the 'magic' kernel. That's not to say that this isn't a fruitful way to

look at it, but at this stage I'm not quite sure.

For interest, I've uploaded a simple graph of the 800% kernel to

http://www.assassinationscience.com/johncostella/kernel800percent.jpg

Looking at the numbers, the kernel is still the product of two 1-d 'x'
and 'y' kernels. The 1-d kernel is actually three parabolas:

[0] 1 3 6 10 15 21 28 [36] 42 46 48 48 46 42 [36] 28 21 15 10 6 3 1 [0]


where the joins between the parabolas are marked by the [36] terms.
That it is three parabolas can be proved from second differences:

[0] 1 2 3 4 5 6 7 [8] 6 4 2 0 -2 -4 -6 [-8] -7 -6 -5 -4 -3 -2 -1 [0]
1 1 1 1 1 1 1 1 | -2 -2 -2 -2 -2 -2 -2 -2 | 1 1 1 1 1 1 1 1

The 400% kernel follows the same pattern: it is the product of

[0] 1 3 6 [10] 12 12 [10] 6 3 1 [0]

the differences of which give

[0] 1 2 3 [4] 2 0 -2 [-4] -3 -2 -1 [0]
1 1 1 1 | -2 -2 -2 -2 | 1 1 1 1

I guess then the 200% one can be written

[0] 1 [3] [3] 1 [0]

which on differencing gives

[0] 1 [2] 0 [-2] -1 [0]
1 1 | -2 -2 | 1 1

And I suppose that's the algorithm for generating the 2^k kernel:

A. Form a list of k 1's followed by k -2's followed by k 1's.
B. Starting at 0, form the cumulative sum of the first list.
C. Starting at 0, form the cumulative sum of the second list.
D. Use the 3rd list as the 1-d kernel.
E. Form the 2-d kernel as the product of the two 1-d 'x' and 'y'
kernels.

So for 1600% magnification, the steps would be:

A. 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2
-2 -2 -2 -2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
B. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 14 12 10 8 6 4 2 0 -2 -4 -6
-8 -10 -12 -14 -16 -15 -14 -13 -12 -11 -10 -9 -8 -7 -6 -5 -4 -3 -2 -1
C. 1 3 6 10 15 21 28 36 45 55 66 78 91 105 120 136 150 162 172 180 186
190 192 192 190 186 180 172 162 150 136 120 105 91 78 66 55 45 36 28 21

15 10 6 3 1

and so on.

This makes for a pretty efficient implementation. The kernel dimensions

are 3(2^k)-2 by 3(2^k)-2 (i.e. 4 by 4 for 2^1, 10 by 10 for 2^2, 22 by
22 for 2^3, 46 by 46 for 2^4, etc.). The 1-d kernel has 3(2^(k-1))-1
unique elements (i.e. 2 for 2^1, 5 for 2^2, 11 for 2^3, 23 for 2^4),
one of which is always 1, so you only need 3(2^(k-1))-2 integer
multiplication lookup tables (e.g. for 2^k=16, you only need 22 lookup
tables), because at any position multiplying by the kernel coefficient
is the same as multiplying by the 'x' 1-d kernel element and then the
'y' 1-d kernel element. The above algorithm gives you them all. The
normalization is 2^(6k).

> Regarding size reductions, the most commonly used COMMERCIAL method is to
> use a discrete kernel derived from an expanded version of the same
> continuous kernel, e.g., bicubic, used for enlargement. If the size
> factor is 1/N, then the continuous kernel is expanded by N. Of course,
> the compact support of the discrete reduction kernel is expanded. Your
> use of the same kernels for both operations violates this.

Yeah, I know I'm not doing the normal thing, but it seems to work. :)

John

aruzinsky

unread,
Jun 7, 2006, 10:07:51 AM6/7/06
to
"For upsampling, it's simplest to imagine that we're reversing the above

process. Label the downsampled pixels [x,y], where x and y are the
'notional' centered positions listed above, i.e., we calculated
[0.5,0.5] and [2.5][0.5] above. For each such downsampled pixel, place
a copy of the kernel, multplied by the weight of the pixel, and divide
by 16. If you work it through, you'll see that each upsampled pixel
receives a contribution of 9 times the nearest downsampled pixel (the
one whose corner it's 'in'), 3 times the next two nearest ones (other
adjacent corners of the 2 x 2 tile), and 1 times the opposite corner.
That's a total of 16, so you divide by 16. For example,

(0,0) = { 9[0.5,0.5] + 3[0.5,-1.5] + 3[-1.5,0.5] + [-1.5,-1.5] } / 16"

Again, you lost me here. Please, stop belaboring everything else and
explain only the above. Since your kernel is separable, you should be able
to give a 1D example to explain your enlargement process.

John Costella

unread,
Jun 8, 2006, 7:08:03 AM6/8/06
to
aruzinsky wrote:
> Again, you lost me here. Please, stop belaboring everything else and
> explain only the above. Since your kernel is separable, you should be able
> to give a 1D example to explain your enlargement process.

Good idea. Sorry.

Take the original 1-d 'image' to be

a b c d e ...

and the enlarged image to be

A B C D E F G ...

Ignore edge effects, and put b half-way between A and B, c half-way
between C and D, d half-way between D and E, etc. Then

A = ( a + 3b ) / 4
B = ( 3b + c ) / 4
C = ( b + 3c ) / 4
D = ( 3c + d ) / 4
E = ( c + 3d ) / 4
F = ( 3d + e ) / 4
G = ( d + 3e ) / 4

etc. You can see, for example, that 'c' contributes {1,3,3,1} / 4 to
{B,C,D,E}; 'd' contributes {1,3,3,1} / 4 to {D,E,F,G}; and so on.

To downsample from the A B C ... back to the a b c ... you just apply
this in reverse:

c = ( B + 3C + 3D + E ) / 8
d = ( D + 3E + 3F + G ) / 8

etc.

In the general case of m=2^k, where the block filter would just choose
the nearest tile of m pixels, the algorithm actually chooses the center
tile of m pixels, plus the tile of m pixels to the left, plus the tile
of m pixels to the right. The left and right tiles grow parabolically
from 0 at the edges (that's why the kernel is of width 3m-2 rather than
3m, because the ends always vanish), and the center tile is an inverted
parabola of twice the curvature. Every pixel in the enlarged image
receives contributions from three original pixels; the two end
parabolas plus the middle parabola add up to unity, for every position.

For example, for m=4, the kernel is

0 1 3 6|10 12 12 10|6 3 1 0

and if you add up the three parabolas, you get
{0,1,3,6}
+{10,12,12,10}
+{6,3,1,0}
={16,16,16,16}.

In this case (again ignoring edge problems) you get

A = ( 6a + 10b + 0c ) / 16
B = ( 3a + 12b + 1c ) / 16
C = ( 1a + 12b + 3c ) / 16
D = ( 0a + 10b + 6c ) / 16

E = ( 6b + 10c + 0d ) / 16
F = ( 3b + 12c + 1d ) / 16
G = ( 1b + 12c + 3d ) / 16
H = ( 0b + 10c + 6d ) / 16

etc.

To downsample, you reverse the process, so that

c = ( 0A + 1B + 3C + 6D + 10E + 12F + 12G + 10H + 6I + 3J + 1K +0L ) /
64

etc.

Does this make more sense now?

Apologies again for being unclear,
John

jim

unread,
Jun 8, 2006, 9:08:39 AM6/8/06
to

Hello John,
I am pretty sure you are not the first to use this kernel, altho many
people avoid even lehgth kernels so they would never use it.

In an ideal world to downsample by 2 you would first filter by a
perfect halfband filter and then discard every other sample. It appears
that you have settled on a filter with 1D length 4 and comprised of
integer coefficients. Given those 2 constraints I would say there is a
reasonable argument that the coefficients you have selected are as close
to the perfect halfband filter as you can possibly get. They may even be
as close as you can get even without the constraint that the
coefficients be integers.

If you were to allow larger kernels then length 4 I think you could do
better. Then a 1D kernel of [-1 0 3 4 3 0 -1]/4 would be considerably
closer to the perfect halfband response then the one you have chosen.
Notice that only has one additional coefficient to multiply and the
normalizing factor is still a power of 2. Of course some folks will not
like the effects created by negative coefficients, but you can't please
everyone.

By the way, I don't fully understand the explanation of the upsampling
technique. Is it equivalent to inserting a zero at every other sample
and then filtering with your magic kernel or are you doing something
completely different?

-jim

----== Posted via Newsfeeds.Com - Unlimited-Unrestricted-Secure Usenet News==----
http://www.newsfeeds.com The #1 Newsgroup Service in the World! 120,000+ Newsgroups
----= East and West-Coast Server Farms - Total Privacy via Encryption =----

aruzinsky

unread,
Jun 8, 2006, 10:33:52 AM6/8/06
to
"Take the original 1-d 'image' to be

a b c d e ...

and the enlarged image to be

A B C D E F G ...

Ignore edge effects, and put b half-way between A and B, c half-way
between C and D, d half-way between D and E, etc. Then

A = ( a + 3b ) / 4
B = ( 3b + c ) / 4
C = ( b + 3c ) / 4
D = ( 3c + d ) / 4
E = ( c + 3d ) / 4
F = ( 3d + e ) / 4
G = ( d + 3e ) / 4

etc. You can see, for example, that 'c' contributes {1,3,3,1} / 4 to
{B,C,D,E}; 'd' contributes {1,3,3,1} / 4 to {D,E,F,G}; and so on.

To downsample from the A B C ... back to the a b c ... you just apply
this in reverse:

c = ( B + 3C + 3D + E ) / 8
d = ( D + 3E + 3F + G ) / 8"

You should have stopped here. Except for a scalar multiplier and border
effects, your enlargement and reductions operations are matrix transposes
of each other.

Let x = [A B C D E F G ...]T, y = [a b c d e ...]T and y = Ax where

A = (1/8)*

1 3 3 1
0 0 1 3 3 1
0 0 0 0 1 3 3 1
....

AT = (1/8)*

1 0 0 .
3 0 0 .
3 1 0 .
1 3 0 .
0 3 1 .
0 1 3 .
0 0 3 .
0 0 1 .

(leading zeros omitted)

Your description of enlargement with the same "kernel", 0.125*[1 3 3 1],
through me off because in the case of your enlargement, the
quasi-convolution kernels are 0.25*[1 3] and 0.25*[3 1]. Also,
conceptualizing AT as a "reversal" of A seems inappropriate. A more
appropriate "reversal" would be a regularized pseudoinverse, (ATA +
C)-1AT, where C incorporates regularization constraints.

As I previously mentioned to you, I have observed that when discrete
reduction kernels are derived from expanded versions of the same
continuous kernel from which an enlargement kernel is derived, the two
operations are represented by the matrix transpose. This suggests to me
that your method has an underlying continuous kernel. Once this
continuous kernel is found, you can use it to derive kernels for arbitrary
scales. I suggest you look at the kernels embedded in A^k to see if the
points all fall on the same continuous function.

aruzinsky

unread,
Jun 8, 2006, 11:11:04 AM6/8/06
to
"I suggest you look at the kernels embedded in A^k to see if the points all
fall on the same continuous function."

Since A is not conformable with itself, I meant A1*A2*...*Ak. Also, I add
that it is not necessary to completely characterize the underlying
continuous function. Once enough discrete points are determined, you can
interpolate between these points.

|||new...@nezumi.demon.co.uk

unread,
Jun 8, 2006, 1:15:41 PM6/8/06
to

John Costella wrote:
> aruzinsky wrote:
> > Again, you lost me here. Please, stop belaboring everything else and
> > explain only the above. Since your kernel is separable, you should be able
> > to give a 1D example to explain your enlargement process.
>
> Good idea. Sorry.
>
> Take the original 1-d 'image' to be
>
> a b c d e ...
>
> and the enlarged image to be
>
> A B C D E F G ...
>
> Ignore edge effects, and put b half-way between A and B, c half-way
> between C and D, d half-way between D and E, etc. Then
>
> A = ( a + 3b ) / 4
> B = ( 3b + c ) / 4
> C = ( b + 3c ) / 4
> D = ( 3c + d ) / 4
> E = ( c + 3d ) / 4
> F = ( 3d + e ) / 4
> G = ( d + 3e ) / 4
>
> etc. You can see, for example, that 'c' contributes {1,3,3,1} / 4 to
> {B,C,D,E}; 'd' contributes {1,3,3,1} / 4 to {D,E,F,G}; and so on.

This is equivalent to the kernel 1 3 3 1 convolved with the original
image interpersed with alternate zeroes. Subject to normalisation this
is the result of using a 4 point sinc interpolation evaluated at
-3pi/4, -pi/4, pi/4, 3pi/4 on the original data to interpolate into
pixel values at -pi/4 pi/4 relative to the original pixels.


>
> To downsample from the A B C ... back to the a b c ... you just apply
> this in reverse:
>
> c = ( B + 3C + 3D + E ) / 8
> d = ( D + 3E + 3F + G ) / 8
>
> etc.

ITYM c' = ( -B + 3C +3D - E) /8
viz
c' = ( - ( 3b + c ) + 3 ( b + 3c ) + 3( 3c + d ) - ( c + 3d
)) / 4
c' = ( -3b +3b -c + 9c + 9c -c + 3d -3d)/4

-1 3 3 -1

which is the sinc interpolation kernel for -3pi/2, -pi/2, pi/2, 3pi/2
Reflecting the fact that we have doubled the linear scaling.
(and ignoring normalisation factors)

> Does this make more sense now?

It is a 4 point sinc interpolation and its inverse.

Regards,
Martin Brown

John Costella

unread,
Jun 9, 2006, 7:20:20 AM6/9/06
to
Thanks all -- it is making much more sense to me now. I started off
with filters and sinc functions, didn't do a good job of it, and went
back to something simple, that worked better that I expected it to. The
explanation that it is a truncated sinc makes it make a helluva lot
more sense -- many thanks.

I've also followed up on the excellent suggestion to determine the
continuous version of the discrete case, to figure out what I'm
actually using. It's becoming a lot clearer now, although I'm still
trying to figure out the best way to express it mathematically.

The key seems to be that, in one dimension, for upsampling by a factor
of m = 2^k, each pixel of the original image contributes to three
pixels in the upsampled image, or in other words that the kernel, from
the upsampled image's point of view, is 3m pixels wide. The kernel is
made up of three parabolas, each of width m pixels, which smoothly join
together. The parabolas on either end reduce to zero at the extremes.
The curvature of the center parabola is twice the magnitude, and of
opposite sign, of the other two.

The key net effect of these qualities seems to be the property

k(x-m) + k(x) + k(x+m) = const.

for -m <= x <= m. This is what ensures that the convolution of each
pixel with the kernel "joins together" with all the other pixels, like
jigsaw pieces, without any systematic artifacts.

Put another way, if you place a copy of the kernel every m pixels
apart, and add them together, you get a constant function across all
pixels.

It is not clear to me whether it is possible to do this for an
arbitrary type of function. In fact, it's not even clear to me that it
is easy to do for functions of finite support. In any case, I believe
that this is why the kernel does such a good job of avoiding visually
obvious artifacts.

And, of course, the best thing about the kernel is that it is so
lightning fast to implement, not to mention positive definite. :)

John

Reply all
Reply to author
Forward
0 new messages