'Magic' kernel for image zoom resampling?

jpcos...@hotmail.com

Jun 6, 2006, 8:38:15 AM
Hi folks,

I've recently generated a solution to an image zooming problem, and
can't find anything quite like it on our good old web or usenet. I'm
trying to figure out if I've stumbled on something that might actually
be useful to other people. :)

My need was to downsample an image by a factor of 8 in each direction
(i.e. 1 downsampled pixel for every 64 original image pixels), and then
upsample the result back to the original resolution. I understand all
the information theoretical aspects of this process; I was trying to
figure out if there was a kernel that did both tasks well (i.e., the
final result was smooth and without stellations or other strange
artifacts), and also did them fast.

After some truly terrible attempts (even though I thought I had sinc
functions well and truly under my belt 20 years ago), I found that the
following recursive algorithm works amazingly well:

1. To downsample or upsample by a factor of 2, use a kernel

1 3 3 1
3 9 9 3
3 9 9 3
1 3 3 1

placed over every second pixel in each direction. (Total normalization
of 64 is relevant for downsampling. For upsampling, only divide by 16,
because every upsampled pixel gets a contribution from four downsampled
pixels' kernels).

2. To downsample or upsample by a factor of 2^k, perform the above
resampling recursively, k times.
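
A minimal NumPy sketch of step 1 for downsampling (my illustration, not the original poster's code; edge handling is left open in the post, so clamped edges are assumed here):

```python
import numpy as np

# The 'magic' kernel: separable outer product of [1, 3, 3, 1] with itself.
k1d = np.array([1., 3., 3., 1.])
kernel = np.outer(k1d, k1d)          # 4 x 4, entries sum to 64

def downsample2(img):
    """Halve each dimension: place the kernel over every second pixel."""
    h, w = img.shape
    padded = np.pad(img, 1, mode='edge')     # clamp edges (one possible choice)
    out = np.empty((h // 2, w // 2))
    for j in range(h // 2):
        for i in range(w // 2):
            tile = padded[2*j : 2*j + 4, 2*i : 2*i + 4]
            out[j, i] = np.sum(kernel * tile) / 64.0
    return out

img = np.arange(64.0).reshape(8, 8)
small = downsample2(img)
assert small.shape == (4, 4)
# A constant image stays constant: the weights sum to 64.
assert np.allclose(downsample2(np.ones((8, 8))), 1.0)
```

The original 2 x 2 tile sits in the middle of each 4 x 4 window, as described below in the thread.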

The results for upsampling are astoundingly good, really astoundingly
good, and better than what I could get Photoshop to do. (Some years ago
I did my fair share of 'extreme zooming' in some work on the
photographic evidence of the JFK assassination, so I'm very familiar
with all the strange artifacts that can pop out, and the disappointment
that accompanies it.)

I can upload a few images to my website if need be, to demonstrate what
I am talking about (or, equivalently, point me to some test images and
I'll show you what comes out).

For my particular purpose (8:8 downsampling and upsampling), applying
this 'magic' kernel three times yields a kernel that is only 22 x 22 in
size (if you want to precompute the kernel and apply it once, as I
ultimately want to, rather than actually performing the doubling process
three times). (Every time you apply it, double the kernel width or
height and add 2, so 2 x 4 + 2 = 10, and 2 x 10 + 2 = 22.) That's
pretty good, considering that, for 800% zooming, it's already 16 pixels
from nearest pixel to nearest pixel, in each dimension.

Of course, if you wanted to resample by something that is not a power
of 2, then you'd need to use the 'magic' kernel for the nearest power
of 2, and then use something more traditional for the final small
adjustment in resolution. That's no major problem, because getting to
the final result from the nearest power of 2 is never worse than
between a 71% and 141% zoom, and just about any resampling method does
a decent job in this range.

My question is: Is this 'magic kernel' something new, or is this trick
known?

The closest I could find on the net is the 'stair interpolation' trick,
which uses Photoshop's bicubic for successive 110% increases, which is
sort of, kind of, the same idea, but not quite. The other resampling
kernels I could find on the net look much more like what I was trying
to do in the first place, but nothing like what I ended up with.

The 'magic' kernel sort of reminds me of the Fast Fourier Transform,
which also gets all sorts of amazing efficiencies with powers of 2, and
then needs a bit of a (non-critical) fudge if you don't quite have a
power of 2.

Oh, and by the way, I know how I arrived at the 'magic' kernel above
(for another aspect of the project that needed just 2:2 downsampling
and upsampling), and it has some nice mathematical properties (it is
'separable' in the sense that it is the product of an 'x' 1-3-3-1 and a
'y' 1-3-3-1, and its normalization is always a power of 2, which means
everything is integer look-ups with bit-shifts, which makes me
extremely happy), but I have no mathematical proof at all of why it
works so damn well when applied to itself recursively.
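
The separability and power-of-two normalization mentioned here are easy to check directly (a sketch of mine; the right-shift by 6 is the bit-shift that replaces the divide by 64):

```python
import numpy as np

k1d = np.array([1, 3, 3, 1], dtype=np.int64)
kernel = np.outer(k1d, k1d)              # separable: kernel[j, i] = k1d[j] * k1d[i]
assert kernel.sum() == 64 == 1 << 6      # normalization is a power of 2

# So a weighted sum over a 4x4 tile can be kept entirely in integers and
# finished with a bit-shift instead of a division:
tile = np.arange(16, dtype=np.int64).reshape(4, 4)
weighted = int(np.sum(kernel * tile))
assert weighted >> 6 == weighted // 64
```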

Anyhow, thanks in advance for any words of advice. And apologies in
advance if I've rediscovered someone's magic kernel, as is bound to be
the case. :)

John Costella

_______________________________________________
Dr. John P. Costella
BE(Elec)(Hons) BSc(Hons) PhD(Physics) GradDipEd
Melbourne, Victoria, Australia
jpcos...@hotmail.com
cost...@bigpond.com
assassinationscience.com/johncostella

aruzinsky

Jun 6, 2006, 11:58:52 AM
For some test images with a large number of enlargement results, see
http://www.general-cathexis.com/interpolation.html

"For upsampling, only divide by 16,
because every upsampled pixel gets a contribution from four downsampled
pixels' kernels)."

Huh? Both kernels should be normalized. Also, for enlargement, I expect
that you are interlacing this kernel with identity so all of the original
pixels are retained.

Your 1D separated kernel, 0.125*[1 3 3 1], seems like an approximate
Gaussian and this has been covered in literature, but maybe not for your
particular stand. dev..

Regarding size reductions, the most commonly used COMMERCIAL method is to
use a discrete kernel derived from an expanded version of the same
continuous kernel, e.g., bicubic, used for enlargement. If the size
factor is 1/N, then the continuous kernel is expanded by N. Of course,
the compact support of the discrete reduction kernel is expanded. Your
use of the same kernels for both operations violates this.

I have not seen any theoretical treatment of why this is done, but it is
my personal observation that if a linear enlargement operation is put in
matrix form A, then the reduction is 1/(N*N) * A^T (A transposed), except near
the image borders. This provides a way to derive discrete reduction
kernels from discrete enlargement kernels and vice versa.
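
[Editor's note: for what it's worth, the 1D analogue of this transpose relation (factor 1/N in 1D rather than 1/(N*N)) can be checked numerically for the 1-3-3-1 kernel discussed in this thread. The sketch below is an illustration, not code from the thread; out-of-range taps are simply dropped at the borders, in which case the relation happens to hold exactly here.]

```python
import numpy as np

n = 6                                    # coarse length; fine length is 2n

# Enlargement matrix A (2n x n): column c scatters weights [1,3,3,1]/4
# onto fine rows 2c-1 .. 2c+2, i.e. each fine pixel gathers 3/4 from its
# nearest coarse pixel and 1/4 from the next nearest.
A = np.zeros((2 * n, n))
for c in range(n):
    for w, f in zip([1, 3, 3, 1], range(2 * c - 1, 2 * c + 3)):
        if 0 <= f < 2 * n:
            A[f, c] = w / 4.0

# Reduction matrix D (n x 2n): coarse pixel c averages fine pixels
# 2c-1 .. 2c+2 with weights [1,3,3,1]/8.
D = np.zeros((n, 2 * n))
for c in range(n):
    for w, f in zip([1, 3, 3, 1], range(2 * c - 1, 2 * c + 3)):
        if 0 <= f < 2 * n:
            D[c, f] = w / 8.0

# In 1D the reduction is (1/2) * A-transpose (N = 2 here).
assert np.allclose(D, 0.5 * A.T)
# Interior rows of A form a partition of unity.
assert np.allclose(A.sum(axis=1)[1:-1], 1.0)
```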

aruzinsky

Jun 6, 2006, 12:41:47 PM
To elaborate on, "Your 1D separated kernel, 0.125*[1 3 3 1], seems like an
approximate Gaussian and this has been covered in literature, but maybe
not for your particular stand. dev..", see:

http://www.worldserver.com/turk/computergraphics/ResamplingFilters.pdf

Bart van der Wolf

Jun 6, 2006, 10:06:40 PM

<jpcos...@hotmail.com> wrote in message
news:1149597495.8...@y43g2000cwc.googlegroups.com...
SNIP

> Anyhow, thanks in advance for any words of advice. And apologies
> in advance if I've rediscovered someone's magic kernel, as is bound
> to be the case. :)

John, you might want to also post your question to
<news://sci.image.processing>, some knowledgeable folks there and the
tone of voice is reasonable. I've yet to test your kernel on e.g. a
zone-plate type of target, like I used for my informal/empirical
verdict on proper down-sampling:
<http://www.xs4all.nl/~bvdwolf/main/foto/down_sample/down_sample.htm>.
Objectively quantifying the resulting quality is probably beyond my
capabilities, unless you have a suggestion for further research.

Obviously this was intended to inform about down-sampling pitfalls
only, since down-sampling is becoming more of a common process as the
average sensor array increases in size/sampling density.

As an aside, last night I saw a feature on TV following a new,
approximately 3-hour documentary arguing that the JFK assassination was
carried out by James Files rather than Lee Harvey Oswald (who is assumed
in the documentary to have been an undercover CIA operative). It is
based on the extensive research of fellow Dutchman Wim Dankbaar, a
researcher fascinated by the conspiracy, murder, and cover-up of this
factual coup d'etat. Lots of details that were new to me were uncovered,
including an interview with James Files, who revealed a lot of
verifiable specifics about how he, as a backup Mob hitman, fired the
lethal head shot with a mercury-filled bullet from a "Remington Arms
Fireball XP-100" from behind the fence at the grassy knoll
(http://www.jfkmurdersolved.com/filestruth.htm).

Bart

Kevin McMurtrie

Jun 7, 2006, 12:02:07 AM
Factor of two resampling is trivial. It's the other ratios that tend to
produce beats where pixels go in and out of phase with the original.
Various algorithms attempt to reduce the beats without blurring the
image too much. One trick is to find image edges and nudge them into
alignment with the destination pixels, but that only creates a different
kind of distortion.

jpcos...@hotmail.com

Jun 7, 2006, 7:33:05 AM
aruzinsky wrote:
> For some test images with a large number of enlargement results, see
> http://www.general-cathexis.com/interpolation.html

Thanks for the link -- an interesting comparison. However, I don't
agree with the first step applied in the process, which was to reduce
the original image using a box filter. That's not valid, in my opinion.

I tried to download the original, undownsampled image, but the link did
not work. Do you know where I could get it?

I must emphasize that my goal was to get a smooth upsampling free of
artifacts, not to attempt to give the upsampled image an appearance of
more 'detail' than information theory tells us that it really has. Some
of the other algorithms have other goals.

> "For upsampling, only divide by 16,
> because every upsampled pixel gets a contribution from four downsampled
> pixels' kernels)."
>
> Huh? Both kernels should be normalized. Also, for enlargement, I expect
> that you are interlacing this kernel with identity so all of the original
> pixels are retained.

Both are normalized, but I was probably too obscure in my description.
And no, I am not interlacing this kernel with identity for enlargement.


I'd better explain it more clearly. Start with downsampling. Label the
pixels of the original image (x,y). For a box filter, the top-left
downsampled pixel averages (0,0), (1,0), (0,1) and (1,1), and the
result is notionally centered at (0.5,0.5). I am saying that, instead
of this, compute it as

(-1,-1) + 3(0,-1) + 3(1,-1) + (2,-1)
+ 3(-1, 0) + 9(0, 0) + 9(1, 0) + 3(2, 0)
+ 3(-1, 1) + 9(0, 1) + 9(1, 1) + 3(2, 1)
+ (-1, 2) + 3(0, 2) + 3(1, 2) + (2, 2)

and divide the result by 64. OK, ignore the fact that we have gone one
pixel outside the image; that's an edge effect that we can deal with.
You can see the original 2 x 2 'tile' is in the middle of this kernel.

Now, for the next downsampled value to the right, shift everything
right by two pixels. The box filter would just average out (2,0),
(3,0), (2,1) and (3,1), and the result is notionally centered at
(2.5,0.5). I'm saying to compute

(1,-1) + 3(2,-1) + 3(3,-1) + (4,-1)
+ 3(1, 0) + 9(2, 0) + 9(3, 0) + 3(4, 0)
+ 3(1, 1) + 9(2, 1) + 9(3, 1) + 3(4, 1)
+ (1, 2) + 3(2, 2) + 3(3, 2) + (4, 2)

and again divide the result by 64. Again you can see the original 2 x 2
tile in the middle of the calculation. And so on.

For upsampling, it's simplest to imagine that we're reversing the above
process. Label the downsampled pixels [x,y], where x and y are the
'notional' centered positions listed above, i.e., we calculated
[0.5,0.5] and [2.5,0.5] above. For each such downsampled pixel, place
a copy of the kernel, multiplied by the weight of the pixel, and divide
by 16. If you work it through, you'll see that each upsampled pixel
receives a contribution of 9 times the nearest downsampled pixel (the
one whose corner it's 'in'), 3 times the next two nearest ones (other
adjacent corners of the 2 x 2 tile), and 1 times the opposite corner.
That's a total of 16, so you divide by 16. For example,

(0,0) = { 9[0.5,0.5] + 3[0.5,-1.5] + 3[-1.5,0.5] + [-1.5,-1.5] } / 16

Again, we have edge effects in that we've gone outside the image, but
we can deal with that.
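
This scatter-style upsampling can be sketched in NumPy as follows (my illustration, with the kernel simply truncated where it spills past the image edge, so only interior pixels see the full 9 + 3 + 3 + 1 = 16):

```python
import numpy as np

k1d = np.array([1., 3., 3., 1.])
kernel = np.outer(k1d, k1d)

def upsample2(small):
    """Double each dimension: scatter each coarse pixel's kernel, divide by 16."""
    h, w = small.shape
    # Work on a padded fine grid so the kernel can spill one pixel outside.
    fine = np.zeros((2 * h + 2, 2 * w + 2))
    for j in range(h):
        for i in range(w):
            fine[2*j : 2*j + 4, 2*i : 2*i + 4] += small[j, i] * kernel
    return fine[1:-1, 1:-1] / 16.0

small = np.array([[1., 2.], [3., 4.]])
up = upsample2(small)
assert up.shape == (4, 4)
# The 9/3/3/1 gather described above, for an interior fine pixel:
assert np.isclose(up[1, 1], (9*1 + 3*2 + 3*3 + 1*4) / 16)
# Interior pixels of a constant image stay constant (weights sum to 16).
assert np.allclose(upsample2(np.ones((4, 4)))[1:-1, 1:-1], 1.0)
```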

> Your 1D separated kernel, 0.125*[1 3 3 1], seems like an approximate
> Gaussian and this has been covered in literature, but maybe not for your
> particular stand. dev..

Well, yeah, [1 3 3 1] is an approximation to just about anything if you
try hard enough. :)

You can make the two terms (0.5,3) and (1.5,1) fit a Gaussian by making
its standard deviation 1 / sqrt( ln( 3 ) ) = 0.954.... The next two
terms would then be 1/9 and 1/243, so what you're telling me is that
it's a 'truncated Gaussian'? I suppose so, but that doesn't really tell
me why it works so well in two dimensions. (More on this below.)

I looked at the site you recommended in your next reply. I know all
about the information theoretical aspects of 1-dimensional resampling,
and have played with those things in many different forms for many
years (and continue to, in other areas). The problems come with a
two-dimensional grid, as far as I can tell.

None of the Gaussians listed on that site have coefficients that match
the 'magic' kernel. That's not to say that this isn't a fruitful way to
look at it, but at this stage I'm not quite sure.

For interest, I've uploaded a simple graph of the 800% kernel to

http://www.assassinationscience.com/johncostella/kernel800percent.jpg

Looking at the numbers, the kernel is still the product of two 1-d 'x'
and 'y' kernels. The 1-d kernel is actually three parabolas:

[0] 1 3 6 10 15 21 28 [36] 42 46 48 48 46 42 [36] 28 21 15 10 6 3 1 [0]

where the joins between the parabolas are marked by the [36] terms.
That it is three parabolas can be proved from second differences:

[0] 1 2 3 4 5 6 7 [8] 6 4 2 0 -2 -4 -6 [-8] -7 -6 -5 -4 -3 -2 -1 [0]
1 1 1 1 1 1 1 1 | -2 -2 -2 -2 -2 -2 -2 -2 | 1 1 1 1 1 1 1 1

The 400% kernel follows the same pattern: it is the product of

[0] 1 3 6 [10] 12 12 [10] 6 3 1 [0]

the differences of which give

[0] 1 2 3 [4] 2 0 -2 [-4] -3 -2 -1 [0]
1 1 1 1 | -2 -2 -2 -2 | 1 1 1 1

I guess then the 200% one can be written

[0] 1 [3] [3] 1 [0]

which on differencing gives

[0] 1 [2] 0 [-2] -1 [0]
1 1 | -2 -2 | 1 1

And I suppose that's the algorithm for generating the 2^k kernel:

A. Form a list of k 1's followed by k -2's followed by k 1's.
B. Starting at 0, form the cumulative sum of the first list.
C. Starting at 0, form the cumulative sum of the second list.
D. Use the 3rd list as the 1-d kernel.
E. Form the 2-d kernel as the product of the two 1-d 'x' and 'y'
kernels.

So for 1600% magnification, the steps would be:

A. 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2
-2 -2 -2 -2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
B. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 14 12 10 8 6 4 2 0 -2 -4 -6
-8 -10 -12 -14 -16 -15 -14 -13 -12 -11 -10 -9 -8 -7 -6 -5 -4 -3 -2 -1
C. 1 3 6 10 15 21 28 36 45 55 66 78 91 105 120 136 150 162 172 180 186
190 192 192 190 186 180 172 162 150 136 120 105 91 78 66 55 45 36 28 21
15 10 6 3 1

and so on.

This makes for a pretty efficient implementation. The kernel dimensions
are 3(2^k)-2 by 3(2^k)-2 (i.e. 4 by 4 for 2^1, 10 by 10 for 2^2, 22 by
22 for 2^3, 46 by 46 for 2^4, etc.). The 1-d kernel has 3(2^(k-1))-1
unique elements (i.e. 2 for 2^1, 5 for 2^2, 11 for 2^3, 23 for 2^4),
one of which is always 1, so you only need 3(2^(k-1))-2 integer
multiplication lookup tables (e.g. for 2^k=16, you only need 22 lookup
tables), because at any position multiplying by the kernel coefficient
is the same as multiplying by the 'x' 1-d kernel element and then the
'y' 1-d kernel element. The above algorithm gives you them all. The
normalization is 2^(6k).
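
The steps above can be sketched in a few lines of NumPy (my rendering of steps A-E; note that the run length in step A is taken to be the zoom factor 2^k itself, which is what the worked examples use):

```python
import numpy as np

def magic_kernel_1d(zoom):
    """1-d 'magic' kernel for zoom = 2^k, via steps A-D of the recipe."""
    # Step A: zoom 1's, zoom -2's, zoom 1's (the second differences).
    seconds = np.concatenate([np.ones(zoom), -2 * np.ones(zoom), np.ones(zoom)])
    firsts = np.cumsum(seconds)        # step B: first differences of the kernel
    kernel = np.cumsum(firsts)         # step C: the kernel itself
    return kernel[:-2].astype(int)     # trailing entries are the implied [0]s

# Check against the kernels listed in the thread.
assert list(magic_kernel_1d(2)) == [1, 3, 3, 1]
assert list(magic_kernel_1d(4)) == [1, 3, 6, 10, 12, 12, 10, 6, 3, 1]

k8 = magic_kernel_1d(8)
assert len(k8) == 22                   # kernel dimension 3(2^k)-2 for k = 3
assert k8.sum() == 2 ** 9              # 1-d normalization is 2^(3k)

# Step E: the 2-d kernel is the outer product; its normalization is 2^(6k).
kernel2d = np.outer(k8, k8)
assert kernel2d.sum() == 2 ** 18
```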

> Regarding size reductions, the most commonly used COMMERCIAL method is to
> use a discrete kernel derived from an expanded version of the same
> continuous kernel, e.g., bicubic, used for enlargement. If the size
> factor is 1/N, then the continuous kernel is expanded by N. Of course,
> the compact support of the discrete reduction kernel is expanded. Your
> use of the same kernels for both operations violates this.

Yeah, I know I'm not doing the normal thing, but it seems to work. :)

John

jpcos...@hotmail.com

Jun 7, 2006, 8:00:31 AM
Bart van der Wolf wrote:
[...]

> I've yet to test your kernel on e.g. a
> zone-plate type of target, like I used for my informal/empirical
> verdict on proper down-sampling:
> <http://www.xs4all.nl/~bvdwolf/main/foto/down_sample/down_sample.htm>.

Bart,

That's an interesting test you have applied, and you're right in that
it highlights obvious pitfalls. However, distinguishing the 'best' ones
is tricky. I note that your original GIF image has faint Moire rings
away from the center itself, as can be seen if you zoom it and look
carefully. I guess what you're really testing is 'relative badness'.
(And I guess the way that you generated your 'antialiased' original
image presupposes that you know how to do this in the first place!)

It's also tricky looking at your site, because the results are
presented at actual size, and my LCD screen is producing Moire rings on
some of the ones you claim to be the 'best'.

But in any case, I used the 'magic' kernel on the original image, to
downsample it to 25%. I've posted the result to

http://www.assassinationscience.com/johncostella/rings1_on_4.gif

To me, it looks comparable to the best results you obtained. Given that
it's remarkably simple to implement (see my additional posting today,
above, outlining a very simple algorithm for generating an arbitrary
2^k kernel), I'm happy with the result. However, again, I have some
reservations about your original image, and also about whether a zone
plate is necessarily the best test of downsampling, given the
difficulties in generating it in the first place.

John

jpcos...@hotmail.com

Jun 7, 2006, 8:09:05 AM

Kevin,

You may be right about a factor of 2, but what about a factor of 2^k? I
know that when I ask Photoshop to zoom something 800%, it doesn't do a
very good job. I am really looking at the case of high zoom (like 800%
or 1600% or 25600%). If you are telling me that 2^k is trivial, then I
do not agree.

I'm not really touching on the issue of resampling by a factor between
70% and 141%. You're right that frequencies in the original image will
beat in a resampling, but there is nothing much you can do about that
(unless, as you say, you distort the original). I'm taking it for
granted that resampling in this range has its own issues, but basically
gives acceptable results except in singular cases. It's the poor
performance for large factors that I'm targeting.

John

Jani Huhtanen

Jun 7, 2006, 9:02:28 AM
jpcos...@hotmail.com wrote:

>
> Kevin McMurtrie wrote:
>> Factor of two resampling is trivial. It's the other ratios that tend to
>> produce beats where pixels go in and out of phase with the original.
>> Various algorithms attempt to reduce the beats without blurring the
>> image too much. One trick is to find image edges and nudge them into
>> alignment with the destination pixels, but that only creates a different
>> kind of distortion.
>
> Kevin,
>
> You may be right about a factor of 2, but what about a factor of 2^k?

2^k is as trivial as 2. Actually, any ratio (n/m)^k where n and m are
small integers is quite trivial. Just look up polyphase filtering.
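
[Editor's note: a minimal polyphase sketch, using the thread's 1-3-3-1 kernel as the 2x interpolation filter. Zero-stuffing followed by filtering gives the same output as filtering the original signal with the two 'phases' of the kernel and interleaving, at half the multiply count.]

```python
import numpy as np

x = np.array([5., 2., 8., 1., 7.])
h = np.array([1., 3., 3., 1.]) / 4.0     # this thread's kernel, as a 2x interpolator

# Direct upsampling: insert a zero after every sample, then convolve with h.
u = np.zeros(2 * len(x))
u[0::2] = x
direct = np.convolve(u, h)

# Polyphase: filter the ORIGINAL signal with the two phases of h and
# interleave the results -- same output, with no multiplications by zero.
h0, h1 = h[0::2], h[1::2]                # phases [1, 3]/4 and [3, 1]/4
y0 = np.convolve(x, h0)                  # even-indexed output samples
y1 = np.convolve(x, h1)                  # odd-indexed output samples
poly = np.zeros(2 * len(y0))
poly[0::2] = y0
poly[1::2] = y1

assert np.allclose(direct[:len(poly)], poly)
assert np.allclose(direct[len(poly):], 0.0)   # only trailing zeros differ
```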

> John

--
Jani Huhtanen
Tampere University of Technology, Pori

aruzinsky

Jun 7, 2006, 11:04:58 AM
"Thanks for the link -- an interesting comparison. However, I don't
agree with the first step applied in the process, which was to reduce
the original image using a box filter. That's not valid, in my opinion."

There are two reduced images, clown.png and PDI-Target574x900.PNG. Only
PDI-Target574x900.PNG was produced with a box kernel. A 3rd order Lanczos
kernel was used to produce clown.png and this is the image I would like to
see your enlargement process applied to.

"I tried to download the original, undownsampled image, but the link did
not work. Do you know where I could get it?"

Sorry about that. I must have accidentally deleted it, and I have
problems uploading > 1 MB, so it will be a while before I can replace it.
Meanwhile, if you google "pdi target," you will find sources all over the
internet, e.g., http://www.pbase.com/npb/tools. Supposedly there is a
TIFF version, but I can't find it.

"For upsampling, it's simplest to imagine that we're reversing the above
process. Label the downsampled pixels [x,y], where x and y are the
'notional' centered positions listed above, i.e., we calculated
[0.5,0.5] and [2.5,0.5] above. For each such downsampled pixel, place
a copy of the kernel, multiplied by the weight of the pixel, and divide
by 16. If you work it through, you'll see that each upsampled pixel
receives a contribution of 9 times the nearest downsampled pixel (the
one whose corner it's 'in'), 3 times the next two nearest ones (other
adjacent corners of the 2 x 2 tile), and 1 times the opposite corner.
That's a total of 16, so you divide by 16. For example,

(0,0) = { 9[0.5,0.5] + 3[0.5,-1.5] + 3[-1.5,0.5] + [-1.5,-1.5] } / 16"

Again, you lost me here. Please, stop belaboring everything else and
explain only the above. Since your kernel is separable, you should be
able to give a simple 1D example to explain your enlargement process.


aruzinsky

Jun 7, 2006, 12:21:48 PM
"It's the poor
performance for large factors that I'm targeting."

In repetitive enlargements, the accuracy of the earliest steps matters
most, because the enlargements become progressively smoother (with
fewer high frequencies). If the first enlargement is 2X, it matters very
little what methods are used next. Practically, you can use your method
for 2X followed by bicubic or even bilinear to arbitrary size. The final
result will be practically the same as that produced by your method alone.

jpcos...@hotmail.com

Jun 8, 2006, 7:26:34 AM
aruzinsky wrote:
> There are two reduced images, clown.png and PDI-Target574x900.PNG. Only
> PDI-Target574x900.PNG was produced with a box kernel. A 3rd order Lanczos
> kernel was used to produce clown.png and this is the image I would like to
> see your enlargement process applied to.

Sorry, I misunderstood the text on the page. I'd still be interested to
have the original clown image, so that I can assess both the
downsampling and upsampling aspects. Nevertheless, I have run the
enlarger on the clown.png on your page.

At the moment I only have a 24-bit implementation of the image doubler,
and I have to apply it twice to get 4X magnification. This means that
the dynamic range won't be as good as if it was done in one step in
48-bit. Nevertheless, here is what comes out:

http://www.assassinationscience.com/johncostella/clown_by_4.png

For the test implementation, I avoided the edge problems by trimming
off the pixels that would require extension of the original image. This
will of course be fixed when I actually use it for my original purpose.

> Again, you lost me here. Please, stop belaboring everything else and
> explain only the above. Since your kernel is separable, you should be
> able
> to give a simple 1D example to explain your enlargement process.

Sorry, I saw this question of yours on the re-post on
sci.image.processing, and answered it there. Hopefully it is now much
clearer.

Apologies for my confusion and lack of clarity,
John

jpcos...@hotmail.com

Jun 8, 2006, 7:34:21 AM
aruzinsky wrote:
> In repetitive enlargements, the accuracies of earliest are the most
> important because the enlargements become progressively smoother (with
> less high frequencies). If the first enlargement is 2X, it matters very
> little what methods are used next. Practically, you can use your method
> for 2X followed by bicubic or even bilinear to arbitrary size. The final
> result will be practically the same as that produced by your method alone.

Not true. I used Photoshop to do bicubic enlargement by 400% on an
image that had been enlarged 200% using the 'magic' kernel. Whilst it
is better than 800% bicubic of the original, it still has visible
artifacts that are not present on the enlargement obtained from using
the 'magic' kernel three times.

If you doubt this, I can post some of the images in question.

John

jpcos...@hotmail.com

Jun 8, 2006, 7:42:33 AM
Jani Huhtanen wrote:
> 2^k is as trivial as 2. Actually any ratio (n/m)^k where n and m are small
> integers are quite trivial. Just look for polyphase filtering.

Jani, I looked up polyphase filtering, but it's not jumping out at me.
Could you provide a simple example, say for 2 or 4?

Thanks
John
