
Jun 7, 2006, 5:37:04 AM

Hi folks,

I've recently generated a solution to an image zooming problem, and

can't find anything quite like it on our good old web or usenet. I'm

trying to figure out if I've stumbled on something that might actually

be useful to other people. :)

My need was to downsample an image by a factor of 8 in each direction

(i.e. 1 downsampled pixel for every 64 original image pixels), and then

upsample the result back to the original resolution. I understand all

the information theoretical aspects of this process; I was trying to

figure out if there was a kernel that did both tasks well (i.e., the

final result was smooth and without stellations or other strange

artifacts), and also did them fast.

After some truly terrible attempts (even though I thought I had sinc

functions well and truly under my belt 20 years ago), I found that the

following recursive algorithm works amazingly well:

1. To downsample or upsample by a factor of 2, use a kernel

1 3 3 1

3 9 9 3

3 9 9 3

1 3 3 1

placed over every second pixel in each direction. (Total normalization

of 64 is relevant for downsampling. For upsampling, only divide by 16,

because every upsampled pixel gets a contribution from four downsampled

pixels' kernels).

2. To downsample or upsample by a factor of 2^k, perform the above

resampling recursively, k times.
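The factor-of-2 scheme can be sketched in a few lines of NumPy (a minimal illustration, not the original implementation; function names are invented here, and edge effects are handled by simple replication for the downsample and left partial for the upsample):

```python
import numpy as np

# 1-d 'magic' kernel; the 2-d kernel is its separable outer product:
# 1 3 3 1 / 3 9 9 3 / 3 9 9 3 / 1 3 3 1
K1 = np.array([1.0, 3.0, 3.0, 1.0])
K2 = np.outer(K1, K1)

def downsample2(img):
    """Halve each dimension: the 4 x 4 kernel is placed over every
    second pixel, with total normalization 64."""
    h, w = img.shape
    pad = np.pad(img, 1, mode='edge')   # crude handling of edge effects
    out = np.empty((h // 2, w // 2))
    for i in range(h // 2):
        for j in range(w // 2):
            out[i, j] = np.sum(K2 * pad[2*i:2*i+4, 2*j:2*j+4]) / 64.0
    return out

def upsample2(img):
    """Double each dimension: each source pixel deposits a weighted copy
    of the kernel, with normalization 16 (each interior output pixel
    receives 9 + 3 + 3 + 1 from its four nearest source pixels)."""
    h, w = img.shape
    acc = np.zeros((2 * h + 2, 2 * w + 2))
    for i in range(h):
        for j in range(w):
            acc[2*i:2*i+4, 2*j:2*j+4] += img[i, j] * K2
    return acc[1:-1, 1:-1] / 16.0
```

Applying either function k times gives the factor-2^k resample; border pixels of the upsample are left deficient rather than renormalized.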

The results for upsampling are astoundingly good, really astoundingly

good, and better than what I could get Photoshop to do. (Some years ago

I did my fair share of 'extreme zooming' in some work on the

photographic evidence of the JFK assassination, so I'm very familiar

with all the strange artifacts that can pop out, and the disappointment

that accompanies it.)

I can upload a few images to my website if need be, to demonstrate what

I am talking about (or, equivalently, point me to some test images and

I'll show you what comes out).

For my particular purpose (8:8 downsampling and upsampling), applying

this 'magic' kernel three times yields a kernel that is only 22 x 22 in

size (if you want to precompute the kernel and apply it once, as I

ultimately want to, rather than actually performing the doubling process

three times). (Every time you apply it, double the kernel width or

height and add 2, so 2 x 4 + 2 = 10, and 2 x 10 + 2 = 22.) That's

pretty good, considering that, for 800% zooming, it's already 16 pixels

from nearest pixel to nearest pixel, in each dimension.

Of course, if you wanted to resample by something that is not a power

of 2, then you'd need to use the 'magic' kernel for the nearest power

of 2, and then use something more traditional for the final small

adjustment in resolution. That's no major problem, because getting to

the final result from the nearest power of 2 is never worse than

between a 71% and 141% zoom, and just about any resampling method does

a decent job in this range.

My question is: Is this 'magic kernel' something new, or is this trick

known?

The closest I could find on the net is the 'stair interpolation' trick,

which uses Photoshop's bicubic for successive 110% increases, which is

sort of, kind of, the same idea, but not quite. The other resampling

kernels I could find on the net look much more like what I was trying

to do in the first place, but nothing like what I ended up with.

The 'magic' kernel sort of reminds me of the Fast Fourier Transform,

which also gets all sorts of amazing efficiencies with powers of 2, and

then needs a bit of a (non-critical) fudge if you don't quite have a

power of 2.

Oh, and by the way, I know how I arrived at the 'magic' kernel above

(for another aspect of the project that needed just 2:2 downsampling

and upsampling), and it has some nice mathematical properties (it is

'separable' in the sense that it is the product of an 'x' 1-3-3-1 and a

'y' 1-3-3-1, and its normalization is always a power of 2, which means

everything is integer look-ups with bit-shifts, which makes me

extremely happy), but I have no mathematical proof at all of why it

works so damn well when applied to itself recursively.

Anyhow, thanks in advance for any words of advice. And apologies in

advance if I've rediscovered someone's magic kernel, as is bound to be

the case. :)

John Costella

_______________________________________________

Dr. John P. Costella

BE(Elec)(Hons) BSc(Hons) PhD(Physics) GradDipEd

Melbourne, Victoria, Australia

jpcos...@hotmail.com

cost...@bigpond.com

assassinationscience.com/johncostella

Jun 7, 2006, 8:14:41 AM

Sorry, it was recommended that I cross-post here, but that means that

replies in the original group do not appear here.

Here is some clarifying information on the algorithm:

----------------------

> "For upsampling, only divide by 16,

> because every upsampled pixel gets a contribution from four downsampled

> pixels' kernels)."

> Huh? Both kernels should be normalized. Also, for enlargement, I expect

> that you are interlacing this kernel with identity so all of the original

> pixels are retained.

Both are normalized, but I was probably too obscure in my description.

And no, I am not interlacing this kernel with identity for enlargement.

I'd better explain it more clearly. Start with downsampling. Label the

pixels of the original image (x,y). For a box filter, the top-left

downsampled pixel averages (0,0), (1,0), (0,1) and (1,1), and the

result is notionally centered at (0.5,0.5). I am saying that, instead

of this, compute it as

(-1,-1) + 3(0,-1) + 3(1,-1) + (2,-1)

+ 3(-1, 0) + 9(0, 0) + 9(1, 0) + 3(2, 0)

+ 3(-1, 1) + 9(0, 1) + 9(1, 1) + 3(2, 1)

+ (-1, 2) + 3(0, 2) + 3(1, 2) + (2, 2)

and divide the result by 64. OK, ignore the fact that we have gone one

pixel outside the image; that's an edge effect that we can deal with.

You can see the original 2 x 2 'tile' is in the middle of this kernel.

Now, for the next downsampled value to the right, shift everything

right by two pixels. The box filter would just average out (2,0),

(3,0), (2,1) and (3,1), and the result is notionally centered at

(2.5,0.5). I'm saying to compute

(1,-1) + 3(2,-1) + 3(3,-1) + (4,-1)

+ 3(1, 0) + 9(2, 0) + 9(3, 0) + 3(4, 0)

+ 3(1, 1) + 9(2, 1) + 9(3, 1) + 3(4, 1)

+ (1, 2) + 3(2, 2) + 3(3, 2) + (4, 2)

and again divide the result by 64. Again you can see the original 2 x 2

tile in the middle of the calculation. And so on.

For upsampling, it's simplest to imagine that we're reversing the above

process. Label the downsampled pixels [x,y], where x and y are the

'notional' centered positions listed above, i.e., we calculated

[0.5,0.5] and [2.5,0.5] above. For each such downsampled pixel, place

a copy of the kernel, multiplied by the weight of the pixel, and divide

by 16. If you work it through, you'll see that each upsampled pixel

receives a contribution of 9 times the nearest downsampled pixel (the

one whose corner it's 'in'), 3 times the next two nearest ones (other

adjacent corners of the 2 x 2 tile), and 1 times the opposite corner.

That's a total of 16, so you divide by 16. For example,

(0,0) = { 9[0.5,0.5] + 3[0.5,-1.5] + 3[-1.5,0.5] + [-1.5,-1.5] } / 16

Again, we have edge effects in that we've gone outside the image, but

we can deal with that.

> Your 1D separated kernel, 0.125*[1 3 3 1], seems like an approximate

> Gaussian and this has been covered in literature, but maybe not for your

> particular stand. dev..

Well, yeah, [1 3 3 1] is an approximation to just about anything if you

try hard enough. :)

You can make the two terms (0.5,3) and (1.5,1) fit a Gaussian by making

its standard deviation 1 / sqrt( ln( 3 ) ) = 0.954.... The next two

terms would then be 1/3 and 1/81, so what you're telling me is that

it's a 'truncated Gaussian'? I suppose so, but that doesn't really tell

me why it works so well in two dimensions. (More on this below.)

I looked at the site you recommended in your next reply. I know all

about the information theoretical aspects of 1-dimensional resampling,

and have played with those things in many different forms for many

years (and continue to, in other areas). The problems come with a

two-dimensional grid, as far as I can tell.

None of the Gaussians listed on that site have coefficients that match

the 'magic' kernel. That's not to say that this isn't a fruitful way to

look at it, but at this stage I'm not quite sure.

For interest, I've uploaded a simple graph of the 800% kernel to

http://www.assassinationscience.com/johncostella/kernel800percent.jpg

Looking at the numbers, the kernel is still the product of two 1-d 'x'

and 'y' kernels. The 1-d kernel is actually three parabolas:

[0] 1 3 6 10 15 21 28 [36] 42 46 48 48 46 42 [36] 28 21 15 10 6 3 1 [0]

where the joins between the parabolas are marked by the [36] terms.

That it is three parabolas can be proved from second differences:

[0] 1 2 3 4 5 6 7 [8] 6 4 2 0 -2 -4 -6 [-8] -7 -6 -5 -4 -3 -2 -1 [0]

1 1 1 1 1 1 1 1 | -2 -2 -2 -2 -2 -2 -2 -2 | 1 1 1 1 1 1 1 1

The 400% kernel follows the same pattern: it is the product of

[0] 1 3 6 [10] 12 12 [10] 6 3 1 [0]

the differences of which give

[0] 1 2 3 [4] 2 0 -2 [-4] -3 -2 -1 [0]

1 1 1 1 | -2 -2 -2 -2 | 1 1 1 1

I guess then the 200% one can be written

[0] 1 [3] [3] 1 [0]

which on differencing gives

[0] 1 [2] 0 [-2] -1 [0]

1 1 | -2 -2 | 1 1

And I suppose that's the algorithm for generating the 2^k kernel:

A. Form a list of m = 2^k 1's, followed by m -2's, followed by m 1's.

B. Starting at 0, form the cumulative sum of the first list.

C. Starting at 0, form the cumulative sum of the second list.

D. Use the 3rd list as the 1-d kernel.

E. Form the 2-d kernel as the product of the two 1-d 'x' and 'y'

kernels.

So for 1600% magnification, the steps would be:

A. 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2

-2 -2 -2 -2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

B. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 14 12 10 8 6 4 2 0 -2 -4 -6

-8 -10 -12 -14 -16 -15 -14 -13 -12 -11 -10 -9 -8 -7 -6 -5 -4 -3 -2 -1

C. 1 3 6 10 15 21 28 36 45 55 66 78 91 105 120 136 150 162 172 180 186

190 192 192 190 186 180 172 162 150 136 120 105 91 78 66 55 45 36 28 21

15 10 6 3 1

and so on.
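Steps A-C reduce to two cumulative sums, which is easy to put in code (a sketch; `magic_kernel_1d` is an invented name, and the list lengths follow the worked examples, i.e. m = 2^k of each value):

```python
import numpy as np

def magic_kernel_1d(m):
    """1-d magic kernel for resampling by m = 2**k: cumulative-sum the
    list (m ones, m minus-twos, m ones) twice, then drop the two
    trailing zeros so that the width is 3m - 2."""
    second_diff = np.concatenate([np.ones(m), -2.0 * np.ones(m), np.ones(m)])
    first_diff = np.cumsum(second_diff)
    kernel = np.cumsum(first_diff)
    return kernel[:-2].astype(int)
```

The 1-d kernel sums to m^3, so the separable 2-d normalization is m^6 = 2^(6k), consistent with the figures quoted in this post.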

This makes for a pretty efficient implementation. The kernel dimensions

are 3(2^k)-2 by 3(2^k)-2 (i.e. 4 by 4 for 2^1, 10 by 10 for 2^2, 22 by

22 for 2^3, 46 by 46 for 2^4, etc.). The 1-d kernel has 3(2^(k-1))-1

unique elements (i.e. 2 for 2^1, 5 for 2^2, 11 for 2^3, 23 for 2^4),

one of which is always 1, so you only need 3(2^(k-1))-2 integer

multiplication lookup tables (e.g. for 2^k=16, you only need 22 lookup

tables), because at any position multiplying by the kernel coefficient

is the same as multiplying by the 'x' 1-d kernel element and then the

'y' 1-d kernel element. The above algorithm gives you them all. The

normalization is 2^(6k).

> Regarding size reductions, the most commonly used COMMERCIAL method is to

> use a discrete kernel derived from an expanded version of the same

> continuous kernel, e.g., bicubic, used for enlargement. If the size

> factor is 1/N, then the continuous kernel is expanded by N. Of course,

> the compact support of the discrete reduction kernel is expanded. Your

> use of the same kernels for both operations violates this.

Yeah, I know I'm not doing the normal thing, but it seems to work. :)

John

Jun 7, 2006, 10:07:51 AM

"For upsampling, it's simplest to imagine that we're reversing the above

process. Label the downsampled pixels [x,y], where x and y are the

'notional' centered positions listed above, i.e., we calculated

[0.5,0.5] and [2.5,0.5] above. For each such downsampled pixel, place

a copy of the kernel, multiplied by the weight of the pixel, and divide

by 16. If you work it through, you'll see that each upsampled pixel

receives a contribution of 9 times the nearest downsampled pixel (the

one whose corner it's 'in'), 3 times the next two nearest ones (other

adjacent corners of the 2 x 2 tile), and 1 times the opposite corner.

That's a total of 16, so you divide by 16. For example,

(0,0) = { 9[0.5,0.5] + 3[0.5,-1.5] + 3[-1.5,0.5] + [-1.5,-1.5] } / 16"

Again, you lost me here. Please, stop belaboring everything else and

explain only the above. Since your kernel is separable, you should be able

to give a 1D example to explain your enlargement process.

Jun 8, 2006, 7:08:03 AM

aruzinsky wrote:

> Again, you lost me here. Please, stop belaboring everything else and

> explain only the above. Since your kernel is separable, you should be able

> to give a 1D example to explain your enlargement process.


Good idea. Sorry.

Take the original 1-d 'image' to be

a b c d e ...

and the enlarged image to be

A B C D E F G ...

Ignore edge effects, and put b half-way between A and B, c half-way

between C and D, d half-way between D and E, etc. Then

A = ( a + 3b ) / 4

B = ( 3b + c ) / 4

C = ( b + 3c ) / 4

D = ( 3c + d ) / 4

E = ( c + 3d ) / 4

F = ( 3d + e ) / 4

G = ( d + 3e ) / 4

etc. You can see, for example, that 'c' contributes {1,3,3,1} / 4 to

{B,C,D,E}; 'd' contributes {1,3,3,1} / 4 to {D,E,F,G}; and so on.

To downsample from the A B C ... back to the a b c ... you just apply

this in reverse:

c = ( B + 3C + 3D + E ) / 8

d = ( D + 3E + 3F + G ) / 8

etc.
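The 1-d example above can be transcribed directly (a sketch in plain Python; names are invented, and edges are left partial rather than renormalized):

```python
WEIGHTS = (1.0, 3.0, 3.0, 1.0)

def enlarge2(src):
    """Each source sample j contributes {1,3,3,1}/4 to the four output
    samples 2j-3 .. 2j (so b, say, straddles A and B)."""
    out = [0.0] * (2 * len(src))
    for j, v in enumerate(src):
        for t, w in enumerate(WEIGHTS):
            i = 2 * j - 3 + t
            if 0 <= i < len(out):
                out[i] += w * v / 4.0
    return out

def reduce2(fine):
    """The reverse: c = ( B + 3C + 3D + E ) / 8, etc., with the same
    {1,3,3,1} weights over the same support."""
    out = [0.0] * (len(fine) // 2)
    for j in range(len(out)):
        for t, w in enumerate(WEIGHTS):
            i = 2 * j - 3 + t
            if 0 <= i < len(fine):
                out[j] += w * fine[i] / 8.0
    return out
```

On a linear ramp the enlarge-then-reduce round trip reproduces the interior samples exactly.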

In the general case of m=2^k, where the box filter would just choose

the nearest tile of m pixels, the algorithm actually chooses the center

tile of m pixels, plus the tile of m pixels to the left, plus the tile

of m pixels to the right. The left and right tiles grow parabolically

from 0 at the edges (that's why the kernel is of width 3m-2 rather than

3m, because the ends always vanish), and the center tile is an inverted

parabola of twice the curvature. Every pixel in the enlarged image

receives contributions from three original pixels; the two end

parabolas plus the middle parabola add up to unity, for every position.

For example, for m=4, the kernel is

0 1 3 6|10 12 12 10|6 3 1 0

and if you add up the three parabolas, you get

{0,1,3,6}

+{10,12,12,10}

+{6,3,1,0}

={16,16,16,16}.

In this case (again ignoring edge problems) you get

A = ( 6a + 10b + 0c ) / 16

B = ( 3a + 12b + 1c ) / 16

C = ( 1a + 12b + 3c ) / 16

D = ( 0a + 10b + 6c ) / 16

E = ( 6b + 10c + 0d ) / 16

F = ( 3b + 12c + 1d ) / 16

G = ( 1b + 12c + 3d ) / 16

H = ( 0b + 10c + 6d ) / 16

etc.

To downsample, you reverse the process, so that

c = ( 0A + 1B + 3C + 6D + 10E + 12F + 12G + 10H + 6I + 3J + 1K + 0L ) / 64

etc.
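The m=4 case is equivalent to zero-stuffing the coarse samples at stride 4 and convolving with the 12-wide kernel (a sketch; with the alignment used above, index 8 of the full convolution is where the first output pixel A lands):

```python
import numpy as np

# m = 4 magic kernel, including the vanishing end points
K4 = np.array([0, 1, 3, 6, 10, 12, 12, 10, 6, 3, 1, 0], dtype=float)

def enlarge4(coarse):
    """4x enlargement: place the coarse samples every 4th slot, convolve
    with K4, and normalize by 16.  In the full convolution, index 8 is
    the first output pixel ('A' in the worked example)."""
    up = np.zeros(4 * len(coarse))
    up[::4] = coarse
    return np.convolve(up, K4) / 16.0
```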

Does this make more sense now?

Apologies again for being unclear,

John

Jun 8, 2006, 9:08:39 AM

Hello John,

I am pretty sure you are not the first to use this kernel, although many

people avoid even-length kernels, so they would never use it.

In an ideal world to downsample by 2 you would first filter by a

perfect halfband filter and then discard every other sample. It appears

that you have settled on a filter with 1D length 4 and comprised of

integer coefficients. Given those 2 constraints I would say there is a

reasonable argument that the coefficients you have selected are as close

to the perfect halfband filter as you can possibly get. They may even be

as close as you can get even without the constraint that the

coefficients be integers.

If you were to allow kernels larger than length 4, I think you could do

better. Then a 1D kernel of [-1 0 3 4 3 0 -1]/4 would be considerably

closer to the perfect halfband response than the one you have chosen.

Notice that it has only one additional coefficient to multiply, and the

normalizing factor is still a power of 2. Of course some folks will not

like the effects created by negative coefficients, but you can't please

everyone.
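The defining property of a halfband filter is that H(w) + H(pi - w) is constant (equivalently, every second tap away from the center is zero), and the suggested 7-tap kernel can be checked against it numerically (a sketch; the kernel is scaled by 1/8 here so that H(0) = 1, rather than the gain-2 interpolation scaling of 1/4):

```python
import numpy as np

# Suggested kernel [-1 0 3 4 3 0 -1], scaled for unit DC gain
h = np.array([-1.0, 0.0, 3.0, 4.0, 3.0, 0.0, -1.0]) / 8.0

def freq_response(taps, w):
    """Real (zero-phase) frequency response of a symmetric kernel
    centered on its middle tap."""
    n = np.arange(len(taps)) - (len(taps) - 1) // 2
    return float(np.real(np.sum(taps * np.exp(-1j * w * n))))
```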

By the way, I don't fully understand the explanation of the upsampling

technique. Is it equivalent to inserting a zero at every other sample

and then filtering with your magic kernel or are you doing something

completely different?

-jim


Jun 8, 2006, 10:33:52 AM

"Take the original 1-d 'image' to be

a b c d e ...

and the enlarged image to be

A B C D E F G ...

Ignore edge effects, and put b half-way between A and B, c half-way

between C and D, d half-way between D and E, etc. Then

A = ( a + 3b ) / 4

B = ( 3b + c ) / 4

C = ( b + 3c ) / 4

D = ( 3c + d ) / 4

E = ( c + 3d ) / 4

F = ( 3d + e ) / 4

G = ( d + 3e ) / 4

etc. You can see, for example, that 'c' contributes {1,3,3,1} / 4 to

{B,C,D,E}; 'd' contributes {1,3,3,1} / 4 to {D,E,F,G}; and so on.

To downsample from the A B C ... back to the a b c ... you just apply

this in reverse:

c = ( B + 3C + 3D + E ) / 8

d = ( D + 3E + 3F + G ) / 8"

You should have stopped here. Except for a scalar multiplier and border

effects, your enlargement and reductions operations are matrix transposes

of each other.

Let x = [A B C D E F G ...]^T, y = [a b c d e ...]^T and y = Ax where

A = (1/8)*

1 3 3 1

0 0 1 3 3 1

0 0 0 0 1 3 3 1

....

A^T = (1/8)*

1 0 0 .

3 0 0 .

3 1 0 .

1 3 0 .

0 3 1 .

0 1 3 .

0 0 3 .

0 0 1 .

(leading zeros omitted)

Your description of enlargement with the same "kernel", 0.125*[1 3 3 1],

threw me off because in the case of your enlargement, the

quasi-convolution kernels are 0.25*[1 3] and 0.25*[3 1]. Also,

conceptualizing A^T as a "reversal" of A seems inappropriate. A more

appropriate "reversal" would be a regularized pseudoinverse, (A^T A +

C)^(-1) A^T, where C incorporates regularization constraints.

As I previously mentioned to you, I have observed that when discrete

reduction kernels are derived from expanded versions of the same

continuous kernel from which an enlargement kernel is derived, the two

operations are represented by the matrix transpose. This suggests to me

that your method has an underlying continuous kernel. Once this

continuous kernel is found, you can use it to derive kernels for arbitrary

scales. I suggest you look at the kernels embedded in A^k to see if the

points all fall on the same continuous function.
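The transpose relationship is mechanical to verify for the factor-2 pair (a sketch with invented names; small n, border columns truncated exactly as in the matrices written out above):

```python
import numpy as np

WEIGHTS = (1.0, 3.0, 3.0, 1.0)

def enlarge_matrix(n):
    """(2n x n) enlargement matrix: column j holds {1,3,3,1}/4 on
    rows 2j-3 .. 2j (truncated at the borders)."""
    U = np.zeros((2 * n, n))
    for j in range(n):
        for t, w in enumerate(WEIGHTS):
            i = 2 * j - 3 + t
            if 0 <= i < 2 * n:
                U[i, j] = w / 4.0
    return U

def reduce_matrix(n):
    """(n x 2n) reduction matrix: row j holds {1,3,3,1}/8 on the same
    positions."""
    D = np.zeros((n, 2 * n))
    for j in range(n):
        for t, w in enumerate(WEIGHTS):
            i = 2 * j - 3 + t
            if 0 <= i < 2 * n:
                D[j, i] = w / 8.0
    return D
```

Up to the scalar factor of 2, reduction is exactly the transpose of enlargement.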

Jun 8, 2006, 11:11:04 AM

"I suggest you look at the kernels embedded in A^k to see if the points all

fall on the same continuous function."

Since A is not conformable with itself, I meant A1*A2*...*Ak. Also, I add

that it is not necessary to completely characterize the underlying

continuous function. Once enough discrete points are determined, you can

interpolate between these points.

Jun 8, 2006, 1:15:41 PM

John Costella wrote:

> aruzinsky wrote:

> > Again, you lost me here. Please, stop belaboring everything else and

> > explain only the above. Since your kernel is separable, you should be able

> > to give a 1D example to explain your enlargement process.

>

> Good idea. Sorry.

>

> Take the original 1-d 'image' to be

>

> a b c d e ...

>

> and the enlarged image to be

>

> A B C D E F G ...

>

> Ignore edge effects, and put b half-way between A and B, c half-way

> between C and D, d half-way between D and E, etc. Then

>

> A = ( a + 3b ) / 4

> B = ( 3b + c ) / 4

> C = ( b + 3c ) / 4

> D = ( 3c + d ) / 4

> E = ( c + 3d ) / 4

> F = ( 3d + e ) / 4

> G = ( d + 3e ) / 4

>

> etc. You can see, for example, that 'c' contributes {1,3,3,1} / 4 to

> {B,C,D,E}; 'd' contributes {1,3,3,1} / 4 to {D,E,F,G}; and so on.

This is equivalent to the kernel 1 3 3 1 convolved with the original

image interspersed with alternate zeroes. Subject to normalisation this

is the result of using a 4 point sinc interpolation evaluated at

-3pi/4, -pi/4, pi/4, 3pi/4 on the original data to interpolate into

pixel values at -pi/4 pi/4 relative to the original pixels.

>

> To downsample from the A B C ... back to the a b c ... you just apply

> this in reverse:

>

> c = ( B + 3C + 3D + E ) / 8

> d = ( D + 3E + 3F + G ) / 8

>

> etc.

ITYM c' = ( -B + 3C + 3D - E ) / 8

viz

c' = ( -( 3b + c ) + 3( b + 3c ) + 3( 3c + d ) - ( c + 3d ) ) / 4

   = ( -3b + 3b - c + 9c + 9c - c + 3d - 3d ) / 4

-1 3 3 -1

which is the sinc interpolation kernel for -3pi/2, -pi/2, pi/2, 3pi/2,

reflecting the fact that we have doubled the linear scaling

(and ignoring normalisation factors).

> Does this make more sense now?

It is a 4 point sinc interpolation and its inverse.

Regards,

Martin Brown

Jun 9, 2006, 7:20:20 AM

Thanks all -- it is making much more sense to me now. I started off

with filters and sinc functions, didn't do a good job of it, and went

back to something simple, that worked better than I expected it to. The

explanation that it is a truncated sinc makes a helluva lot

more sense -- many thanks.


I've also followed up on the excellent suggestion to determine the

continuous version of the discrete case, to figure out what I'm

actually using. It's becoming a lot clearer now, although I'm still

trying to figure out the best way to express it mathematically.

The key seems to be that, in one dimension, for upsampling by a factor

of m = 2^k, each pixel of the original image contributes to three

pixels in the upsampled image, or in other words that the kernel, from

the upsampled image's point of view, is 3m pixels wide. The kernel is

made up of three parabolas, each of width m pixels, which smoothly join

together. The parabolas on either end reduce to zero at the extremes.

The curvature of the center parabola is twice the magnitude, and of

opposite sign, of the other two.

The key net effect of these qualities seems to be the property

k(x-m) + k(x) + k(x+m) = const.

for -m <= x <= m. This is what ensures that the convolution of each

pixel with the kernel "joins together" with all the other pixels, like

jigsaw pieces, without any systematic artifacts.

Put another way, if you place a copy of the kernel every m pixels

apart, and add them together, you get a constant function across all

pixels.
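That partition-of-unity property is easy to confirm numerically for the cumulative-sum construction given earlier in the thread (a sketch; one period of the shifted copies is summed):

```python
import numpy as np

def magic_kernel_1d(m):
    """1-d magic kernel for m = 2**k: double cumulative sum of
    (m ones, m minus-twos, m ones), trailing zeros dropped."""
    d2 = np.concatenate([np.ones(m), -2.0 * np.ones(m), np.ones(m)])
    return np.cumsum(np.cumsum(d2))[:-2]

def shifted_copies_sum(m):
    """k(x-m) + k(x) + k(x+m) over one period of width m: pad the
    kernel back out to width 3m with its zero ends, then add the three
    m-wide tiles."""
    k = np.concatenate([[0.0], magic_kernel_1d(m), [0.0]])
    return k[:m] + k[m:2*m] + k[2*m:]
```

For every m the three tiles sum to the constant m^2 at every position, which is the "jigsaw" property described above.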

It is not clear to me whether it is possible to do this for an

arbitrary type of function. In fact, it's not even clear to me that it

is easy to do for functions of finite support. In any case, I believe

that this is why the kernel does such a good job of avoiding visually

obvious artifacts.

And, of course, the best thing about the kernel is that it is so

lightning fast to implement, not to mention everywhere non-negative. :)

John
