
Jun 6, 2006, 8:38:15 AM (John Costella)

Hi folks,

I've recently generated a solution to an image zooming problem, and

can't find anything quite like it on our good old web or usenet. I'm

trying to figure out if I've stumbled on something that might actually

be useful to other people. :)

My need was to downsample an image by a factor of 8 in each direction

(i.e. 1 downsampled pixel for every 64 original image pixels), and then

upsample the result back to the original resolution. I understand all

the information theoretical aspects of this process; I was trying to

figure out if there was a kernel that did both tasks well (i.e., the

final result was smooth and without stellations or other strange

artifacts), and also did them fast.

After some truly terrible attempts (even though I thought I had sinc

functions well and truly under my belt 20 years ago), I found that the

following recursive algorithm works amazingly well:

1. To downsample or upsample by a factor of 2, use a kernel

1 3 3 1

3 9 9 3

3 9 9 3

1 3 3 1

placed over every second pixel in each direction. (Total normalization

of 64 is relevant for downsampling. For upsampling, only divide by 16,

because every upsampled pixel gets a contribution from four downsampled

pixels' kernels).

2. To downsample or upsample by a factor of 2^k, perform the above

resampling recursively, k times.
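In case the placement of the kernel isn't clear from the description above, here is a rough Python sketch of a single 2:1 downsampling step; the border handling (replicating the edge pixel) is just an assumption I've made for the sketch, not something I'm claiming is the right choice:

import numpy as np

# 2-d 'magic' kernel, the outer product of [1 3 3 1] with itself
K = np.array([[1, 3, 3, 1],
              [3, 9, 9, 3],
              [3, 9, 9, 3],
              [1, 3, 3, 1]], dtype=float)

def downsample2(img):
    # One 2:1 downsampling step on a greyscale float image.
    h, w = img.shape
    padded = np.pad(img, 1, mode='edge')   # one extra pixel all round, for the edge effect
    out = np.empty((h // 2, w // 2))
    for i in range(h // 2):
        for j in range(w // 2):
            # 4 x 4 window centred on the 2 x 2 tile whose top-left original pixel is (2i, 2j)
            window = padded[2 * i : 2 * i + 4, 2 * j : 2 * j + 4]
            out[i, j] = (K * window).sum() / 64.0   # normalization of 64 for downsampling
    return out

Resampling by 2^k is then just k applications of the step, and the division by 64 (or by 16 when upsampling) can be a bit-shift if everything is kept in integers.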

The results for upsampling are astoundingly good, really astoundingly

good, and better than what I could get Photoshop to do. (Some years ago

I did my fair share of 'extreme zooming' in some work on the

photographic evidence of the JFK assassination, so I'm very familiar

with all the strange artifacts that can pop out, and the disappointment

that accompanies it.)

I can upload a few images to my website if need be, to demonstrate what

I am talking about (or, equivalently, point me to some test images and

I'll show you what comes out).

For my particular purpose (8:1 downsampling and then 1:8 upsampling), applying

this 'magic' kernel three times yields a kernel that is only 22 x 22 in

size (if you want to precompute the kernel and apply it once, as I

ultimately want to, rather than actually performing the doubling process

three times). (Every time you apply it, double the kernel width or

height and add 2, so 2 x 4 + 2 = 10, and 2 x 10 + 2 = 22.) That's

pretty good, considering that, for 800% zooming, it's already 16 pixels

from nearest pixel to nearest pixel, in each dimension.

Of course, if you wanted to resample by something that is not a power

of 2, then you'd need to use the 'magic' kernel for the nearest power

of 2, and then use something more traditional for the final small

adjustment in resolution. That's no major problem, because getting to

the final result from the nearest power of 2 is never worse than

between a 71% and 141% zoom, and just about any resampling method does

a decent job in this range.

My question is: Is this 'magic kernel' something new, or is this trick

known?

The closest I could find on the net is the 'stair interpolation' trick,

which uses Photoshop's bicubic for successive 110% increases, which is

sort of, kind of, the same idea, but not quite. The other resampling

kernels I could find on the net look much more like what I was trying

to do in the first place, but nothing like what I ended up with.

The 'magic' kernel sort of reminds me of the Fast Fourier Transform,

which also gets all sorts of amazing efficiencies with powers of 2, and

then needs a bit of a (non-critical) fudge if you don't quite have a

power of 2.

Oh, and by the way, I know how I arrived at the 'magic' kernel above

(for another aspect of the project that needed just 2:2 downsampling

and upsampling), and it has some nice mathematical properties (it is

'separable' in the sense that it is the product of an 'x' 1-3-3-1 and a

'y' 1-3-3-1, and its normalization is always a power of 2, which means

everything is integer look-ups with bit-shifts, which makes me

extremely happy), but I have no mathematical proof at all of why it

works so damn well when applied to itself recursively.

Anyhow, thanks in advance for any words of advice. And apologies in

advance if I've rediscovered someone's magic kernel, as is bound to be

the case. :)

John Costella

_______________________________________________

Dr. John P. Costella

BE(Elec)(Hons) BSc(Hons) PhD(Physics) GradDipEd

Melbourne, Victoria, Australia

jpcos...@hotmail.com

cost...@bigpond.com

assassinationscience.com/johncostella

Jun 6, 2006, 11:58:52 AM (aruzinsky)

For some test images with a large number of enlargement results, see

http://www.general-cathexis.com/interpolation.html

"For upsampling, only divide by 16,

because every upsampled pixel gets a contribution from four downsampled

pixels' kernels)."

Huh? Both kernels should be normalized. Also, for enlargement, I expect

that you are interlacing this kernel with identity so all of the original

pixels are retained.

Your 1D separated kernel, 0.125*[1 3 3 1], seems like an approximate

Gaussian and this has been covered in literature, but maybe not for your

particular stand. dev..

Regarding size reductions, the most commonly used COMMERCIAL method is to

use a discrete kernel derived from an expanded version of the same

continuous kernel, e.g., bicubic, used for enlargement. If the size

factor is 1/N, then the continuous kernel is expanded by N. Of course,

the compact support of the discrete reduction kernel is expanded. Your

use of the same kernels for both operations violates this.
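To make that concrete, here is a rough Python sketch of deriving a discrete reduction kernel by expanding a continuous cubic; the Catmull-Rom variant of the cubic and the zero-phase sampling are only assumptions for illustration, not necessarily what any particular program uses:

def cubic(x, a=-0.5):
    # Keys cubic interpolation kernel (a = -0.5 is the Catmull-Rom variant)
    x = abs(x)
    if x < 1:
        return (a + 2) * x**3 - (a + 3) * x**2 + 1
    if x < 2:
        return a * x**3 - 5 * a * x**2 + 8 * a * x - 4 * a
    return 0.0

def reduction_kernel(N):
    # expand the continuous kernel by N, sample it at the fine-pixel
    # spacing, and renormalize; the support grows from 4 to roughly 4N taps
    taps = [cubic(i / N) for i in range(-2 * N + 1, 2 * N)]
    s = sum(taps)
    return [t / s for t in taps]

For example, reduction_kernel(2) is a 7 tap filter for a 1/2 reduction and reduction_kernel(4) is a 15 tap filter for a 1/4 reduction.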

I have not seen any theoretical treatment of why this is done, but it is

my personal observation that if a linear enlargement operation is put in

matrix form A, then the reduction is 1/(N*N) * A^T (its transpose) except near

the image borders. This provides a way to derive discrete reduction

kernels from discrete enlargement kernels and vice versa.
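As a quick 1-d illustration (in 1-d the factor is 1/N rather than 1/(N*N); plain linear interpolation is used for the enlargement here purely as a convenient example, it is not your method):

import numpy as np

n = 8                                        # coarse samples; the fine grid has 2*n samples
U = np.zeros((2 * n, n))                     # enlargement matrix: fine = U @ coarse
for j in range(n):
    U[2 * j, j] = 1.0                        # even fine pixels coincide with coarse pixels
    U[2 * j + 1, j] += 0.5                   # odd fine pixels average the two neighbouring
    U[2 * j + 1, min(j + 1, n - 1)] += 0.5   # coarse pixels (clamped at the right border)

R = 0.5 * U.T                                # candidate reduction matrix, (1/N) * A^T with N = 2
print(np.round(R[3], 3))                     # interior row: ... 0.25 0.5 0.25 ..., i.e. [1 2 1]/4

Away from the borders each row of R is the [1 2 1]/4 filter, which is also what you get by expanding the linear (tent) kernel by 2 and resampling it, so the two observations agree in this simple case.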

Jun 6, 2006, 12:41:47 PM (aruzinsky)

To elaborate on, "Your 1D separated kernel, 0.125*[1 3 3 1], seems like an

approximate Gaussian and this has been covered in literature, but maybe

not for your particular stand. dev..", see:

http://www.worldserver.com/turk/computergraphics/ResamplingFilters.pdf

Jun 6, 2006, 10:06:40 PM (Bart van der Wolf)

<jpcos...@hotmail.com> wrote in message

news:1149597495.8...@y43g2000cwc.googlegroups.com...

SNIP

> Anyhow, thanks in advance for any words of advice. And apologies

> in advance if I've rediscovered someone's magic kernel, as is bound

> to be the case. :)

John, you might want to also post your question to

<news://sci.image.processing>; there are some knowledgeable folks there

and the tone of voice is reasonable. I've yet to test your kernel on e.g. a

zone-plate type of target, like I used for my informal/empirical

verdict on proper down-sampling:

<http://www.xs4all.nl/~bvdwolf/main/foto/down_sample/down_sample.htm>.

Objectively quantifying the resulting quality is probably beyond my

capabilities, unless you have a suggestion for further research.

Obviously this was intended to inform about down-sampling pitfalls

only, since down-sampling is becoming more of a common process as the

average sensor array increases in size/sampling density.

As an aside, last night I saw a feature on TV following a new approx.

3 hour documentary on the JFK assassination, which concludes that the

shots were (apparently) fired by James Files rather than Lee Harvey

Oswald (who was assumed in the documentary to be an undercover CIA

operative). It is based on the extensive research of fellow Dutchman

Wim Dankbaar, a researcher fascinated by the conspiracy, murder, and

cover-up of this factual coup d'etat. Lots of new (to me anyway)

details were uncovered, including an interview with James Files, who

revealed a lot of verifiable specifics about how he, as a backup Mob

hitman, fired the lethal head shot with a mercury-filled bullet from a

"Remington Arms Fireball XP-100" from behind the fence at the grassy

knoll

(http://www.jfkmurdersolved.com/filestruth.htm).

Bart

Jun 7, 2006, 12:02:07 AM (Kevin McMurtrie)

Factor of two resampling is trivial. It's the other ratios that tend to

produce beats where pixels go in and out of phase with the original.

Various algorithms attempt to reduce the beats without blurring the

image too much. One trick is to find image edges and nudge them into

alignment with the destination pixels, but that only creates a different

kind of distortion.

Jun 7, 2006, 7:33:05 AM (John Costella)

aruzinsky wrote:

> For some test images with a large number of enlargement results, see

> http://www.general-cathexis.com/interpolation.html

Thanks for the link -- an interesting comparison. However, I don't

agree with the first step applied in the process, which was to reduce

the original image using a box filter. That's not valid, in my opinion.

I tried to download the original, undownsampled image, but the link did

not work. Do you know where I could get it?

I must emphasize that my goal was to get a smooth upsampling free of

artifacts, not to attempt to give the upsampled image an appearance of

more 'detail' than information theory tells us that it really has. Some

of the other algorithms have other goals.

> "For upsampling, only divide by 16,

> because every upsampled pixel gets a contribution from four downsampled

> pixels' kernels)."

>

> Huh? Both kernels should be normalized. Also, for enlargement, I expect

> that you are interlacing this kernel with identity so all of the original

> pixels are retained.

Both are normalized, but I was probably too obscure in my description.

And no, I am not interlacing this kernel with identity for enlargement.

I'd better explain it more clearly. Start with downsampling. Label the

pixels of the original image (x,y). For a box filter, the top-left

downsampled pixel averages (0,0), (1,0), (0,1) and (1,1), and the

result is notionally centered at (0.5,0.5). I am saying that, instead

of this, compute it as

(-1,-1) + 3(0,-1) + 3(1,-1) + (2,-1)

+ 3(-1, 0) + 9(0, 0) + 9(1, 0) + 3(2, 0)

+ 3(-1, 1) + 9(0, 1) + 9(1, 1) + 3(2, 1)

+ (-1, 2) + 3(0, 2) + 3(1, 2) + (2, 2)

and divide the result by 64. OK, ignore the fact that we have gone one

pixel outside the image; that's an edge effect that we can deal with.

You can see the original 2 x 2 'tile' is in the middle of this kernel.

Now, for the next downsampled value to the right, shift everything

right by two pixels. The box filter would just average out (2,0),

(3,0), (2,1) and (3,1), and the result is notionally centered at

(2.5,0.5). I'm saying to compute

(1,-1) + 3(2,-1) + 3(3,-1) + (4,-1)

+ 3(1, 0) + 9(2, 0) + 9(3, 0) + 3(4, 0)

+ 3(1, 1) + 9(2, 1) + 9(3, 1) + 3(4, 1)

+ (1, 2) + 3(2, 2) + 3(3, 2) + (4, 2)

and again divide the result by 64. Again you can see the original 2 x 2

tile in the middle of the calculation. And so on.

For upsampling, it's simplest to imagine that we're reversing the above

process. Label the downsampled pixels [x,y], where x and y are the

'notional' centered positions listed above, i.e., we calculated

[0.5,0.5] and [2.5,0.5] above. For each such downsampled pixel, place

a copy of the kernel, multiplied by the weight of the pixel, and divide

by 16. If you work it through, you'll see that each upsampled pixel

receives a contribution of 9 times the nearest downsampled pixel (the

one whose corner it's 'in'), 3 times the next two nearest ones (other

adjacent corners of the 2 x 2 tile), and 1 times the opposite corner.

That's a total of 16, so you divide by 16. For example,

(0,0) = { 9[0.5,0.5] + 3[0.5,-1.5] + 3[-1.5,0.5] + [-1.5,-1.5] } / 16

Again, we have edge effects in that we've gone outside the image, but

we can deal with that.
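Since the kernel is separable, the whole thing can also be written as two one-dimensional passes. Here is a rough Python sketch of both directions in 1-d; the replicated-edge handling is again just an assumed way of dealing with the out-of-range samples:

import numpy as np

def downsample1d(x):
    # d[j] = ( x[2j-1] + 3 x[2j] + 3 x[2j+1] + x[2j+2] ) / 8,
    # with the end samples replicated to cover the out-of-range terms
    xp = np.pad(np.asarray(x, dtype=float), 1, mode='edge')
    return (xp[0:-3:2] + 3 * xp[1:-2:2] + 3 * xp[2:-1:2] + xp[3::2]) / 8.0

def upsample1d(d):
    # downsampled pixel j sits at position 2j + 0.5 on the fine grid, so
    #   fine pixel 2j   = ( 3 d[j] + d[j-1] ) / 4
    #   fine pixel 2j+1 = ( 3 d[j] + d[j+1] ) / 4
    dp = np.pad(np.asarray(d, dtype=float), 1, mode='edge')
    n = len(d)
    up = np.empty(2 * n)
    up[0::2] = (3 * dp[1:n + 1] + dp[0:n]) / 4.0
    up[1::2] = (3 * dp[1:n + 1] + dp[2:n + 2]) / 4.0
    return up

A 2-d pass is just an 'x' pass followed by a 'y' pass, so the per-pixel divisors of 8 and 4 become the 64 and 16 quoted above. As a small check, on a linear ramp the round trip upsample1d(downsample1d(x)) reproduces the ramp exactly away from the ends.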

> Your 1D separated kernel, 0.125*[1 3 3 1], seems like an approximate

> Gaussian and this has been covered in literature, but maybe not for your

> particular stand. dev..

Well, yeah, [1 3 3 1] is an approximation to just about anything if you

try hard enough. :)

You can make the two terms (0.5,3) and (1.5,1) fit a Gaussian by making

its standard deviation 1 / sqrt( ln( 3 ) ) = 0.954.... The next two

terms would then be 1/9 and 1/243, so what you're telling me is that

it's a 'truncated Gaussian'? I suppose so, but that doesn't really tell

me why it works so well in two dimensions. (More on this below.)

I looked at the site you recommended in your next reply. I know all

about the information theoretical aspects of 1-dimensional resampling,

and have played with those things in many different forms for many

years (and continue to, in other areas). The problems come with a

two-dimensional grid, as far as I can tell.

None of the Gaussians listed on that site have coefficients that match

the 'magic' kernel. That's not to say that this isn't a fruitful way to

look at it, but at this stage I'm not quite sure.

For interest, I've uploaded a simple graph of the 800% kernel to

http://www.assassinationscience.com/johncostella/kernel800percent.jpg

Looking at the numbers, the kernel is still the product of two 1-d 'x'

and 'y' kernels. The 1-d kernel is actually three parabolas:

[0] 1 3 6 10 15 21 28 [36] 42 46 48 48 46 42 [36] 28 21 15 10 6 3 1 [0]

where the joins between the parabolas are marked by the [36] terms.

That it is three parabolas can be proved from its first and second differences:

[0] 1 2 3 4 5 6 7 [8] 6 4 2 0 -2 -4 -6 [-8] -7 -6 -5 -4 -3 -2 -1 [0]

1 1 1 1 1 1 1 1 | -2 -2 -2 -2 -2 -2 -2 -2 | 1 1 1 1 1 1 1 1

The 400% kernel follows the same pattern: it is the product of

[0] 1 3 6 [10] 12 12 [10] 6 3 1 [0]

the differences of which give

[0] 1 2 3 [4] 2 0 -2 [-4] -3 -2 -1 [0]

1 1 1 1 | -2 -2 -2 -2 | 1 1 1 1

I guess then the 200% one can be written

[0] 1 [3] [3] 1 [0]

which on differencing gives

[0] 1 [2] 0 [-2] -1 [0]

1 1 | -2 -2 | 1 1

And I suppose that's the algorithm for generating the 2^k kernel:

A. Form a list of 2^k 1's, followed by 2^k -2's, followed by 2^k 1's.

B. Starting at 0, form the cumulative sum of the first list.

C. Starting at 0, form the cumulative sum of the second list.

D. Use the 3rd list as the 1-d kernel.

E. Form the 2-d kernel as the product of the two 1-d 'x' and 'y'

kernels.

So for 1600% magnification, the steps would be:

A. 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2

-2 -2 -2 -2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

B. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 14 12 10 8 6 4 2 0 -2 -4 -6

-8 -10 -12 -14 -16 -15 -14 -13 -12 -11 -10 -9 -8 -7 -6 -5 -4 -3 -2 -1

C. 1 3 6 10 15 21 28 36 45 55 66 78 91 105 120 136 150 162 172 180 186

190 192 192 190 186 180 172 162 150 136 120 105 91 78 66 55 45 36 28 21

15 10 6 3 1

and so on.

This makes for a pretty efficient implementation. The kernel dimensions

are 3(2^k)-2 by 3(2^k)-2 (i.e. 4 by 4 for 2^1, 10 by 10 for 2^2, 22 by

22 for 2^3, 46 by 46 for 2^4, etc.). The 1-d kernel has 3(2^(k-1))-1

unique elements (i.e. 2 for 2^1, 5 for 2^2, 11 for 2^3, 23 for 2^4),

one of which is always 1, so you only need 3(2^(k-1))-2 integer

multiplication lookup tables (e.g. for 2^k=16, you only need 22 lookup

tables), because at any position multiplying by the kernel coefficient

is the same as multiplying by the 'x' 1-d kernel element and then the

'y' 1-d kernel element. The above algorithm gives you them all. The

normalization is 2^(6k).
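As a sanity check, here is a short Python sketch of steps A to E; it just reproduces the lists above:

import numpy as np

def magic_kernel_1d(k):
    # steps A-D: 2^k ones, 2^k minus-twos, 2^k ones, then two cumulative
    # sums; the trailing zeros are dropped, leaving 3(2^k) - 2 entries
    m = 2 ** k
    a = np.concatenate([np.ones(m), -2 * np.ones(m), np.ones(m)]).astype(int)
    c = np.cumsum(np.cumsum(a))
    return c[: 3 * m - 2]

def magic_kernel_2d(k):
    # step E: outer product of the 'x' and 'y' 1-d kernels
    one_d = magic_kernel_1d(k)
    return np.outer(one_d, one_d)

For example, magic_kernel_1d(2) gives 1 3 6 10 12 12 10 6 3 1 (the 400% kernel), and magic_kernel_1d(k).sum() is 2^(3k), so the 2-d normalization is 2^(6k) as stated.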

> Regarding size reductions, the most commonly used COMMERCIAL method is to

> use a discrete kernel derived from an expanded version of the same

> continuous kernel, e.g., bicubic, used for enlargement. If the size

> factor is 1/N, then the continuous kernel is expanded by N. Of course,

> the compact support of the discrete reduction kernel is expanded. Your

> use of the same kernels for both operations violates this.

Yeah, I know I'm not doing the normal thing, but it seems to work. :)

John

Jun 7, 2006, 8:00:31 AM (John Costella)

Bart van der Wolf wrote:

[...]

> I've yet to test your kernel on e.g. a

> zone-plate type of target, like I used for my informal/empirical

> verdict on proper down-sampling:

> <http://www.xs4all.nl/~bvdwolf/main/foto/down_sample/down_sample.htm>.

[...]

Bart,

That's an interesting test you have applied, and you're right in that

it highlights obvious pitfalls. However, distinguishing the 'best' ones

is tricky. I note that your original GIF image has faint Moire rings

away from the center itself, as can be seen if you zoom it and look

carefully. I guess what you're really testing is 'relative badness'.

(And I guess the way that you generated your 'antialiased' original

image presupposes that you know how to do this in the first place!)

It's also tricky looking at your site, because the results are

presented at actual size, and my LCD screen is producing Moire rings on

some of the ones you claim to be the 'best'.

But in any case, I used the 'magic' kernel on the original image, to

downsample it to 25%. I've posted the result to

http://www.assassinationscience.com/johncostella/rings1_on_4.gif

To me, it looks comparable to the best results you obtained. Given that

it's remarkably simple to implement (see my additional posting today,

above, outlining a very simple algorithm for generating an arbitrary

2^k kernel), I'm happy with the result. However, again, I'm a little

reserved about your original image, and also about whether a zone plate

is necessarily the best test of downsampling, given the difficulties in

generating it in the first place.

John

Jun 7, 2006, 8:09:05 AM (John Costella)

Kevin,

You may be right about a factor of 2, but what about a factor of 2^k? I

know that when I ask Photoshop to zoom something 800%, it doesn't do a

very good job. I am really looking at the case of high zoom (like 800%

or 1600% or 25600%). If you are telling me that 2^k is trivial, then I

do not agree.

I'm not really touching on the issue of resampling by a factor between

70% and 141%. You're right that frequencies in the original image will

beat in a resampling, but there is nothing much you can do about that

(unless, as you say, you distort the original). I'm taking it for

granted that resampling in this range has its own issues, but basically

gives acceptable results except in singular cases. It's the poor

performance for large factors that I'm targeting.

John

Jun 7, 2006, 9:02:28 AM (Jani Huhtanen)

jpcos...@hotmail.com wrote:

>

> Kevin McMurtrie wrote:

>> Factor of two resampling is trivial. It's the other ratios that tend to

>> produce beats where pixels go in and out of phase with the original.

>> Various algorithms attempt to reduce the beats without blurring the

>> image too much. One trick is to find image edges and nudge them into

>> alignment with the destination pixels, but that only creates a different

>> kind of distortion.

>

> Kevin,

>

> You may be right about a factor of 2, but what about a factor of 2^k?

2^k is as trivial as 2. Actually any ratio (n/m)^k where n and m are small

integers is quite trivial. Just look for polyphase filtering.

> John

--

Jani Huhtanen

Tampere University of Technology, Pori

Jun 7, 2006, 11:04:58 AM (aruzinsky)

"Thanks for the link -- an interesting comparison. However, I don't

agree with the first step applied in the process, which was to reduce

the original image using a box filter. That's not valid, in my opinion."

There are two reduced images, clown.png and PDI-Target574x900.PNG. Only

PDI-Target574x900.PNG was produced with a box kernel. A 3rd order Lanczos

kernel was used to produce clown.png and this is the image I would like to

see your enlargement process applied to.

"I tried to download the original, undownsampled image, but the link did

not work. Do you know where I could get it?"

Sorry about that. I must have accidentally deleted it, and I have

problems uploading > 1 mb, so it will be awhile before I can replace it.

Meanwhile, if you google "pdi target," you will find sources all over the

internet, e.g., http://www.pbase.com/npb/tools. Supposedly there is a

TIFF version, but I can't find it.

"For upsampling, it's simplest to imagine that we're reversing the above

process. Label the downsampled pixels [x,y], where x and y are the

'notional' centered positions listed above, i.e., we calculated

[0.5,0.5] and [2.5,0.5] above. For each such downsampled pixel, place

a copy of the kernel, multiplied by the weight of the pixel, and divide

by 16. If you work it through, you'll see that each upsampled pixel

receives a contribution of 9 times the nearest downsampled pixel (the

one whose corner it's 'in'), 3 times the next two nearest ones (other

adjacent corners of the 2 x 2 tile), and 1 times the opposite corner.

That's a total of 16, so you divide by 16. For example,

(0,0) = { 9[0.5,0.5] + 3[0.5,-1.5] + 3[-1.5,0.5] + [-1.5,-1.5] } / 16"

Again, you lost me here. Please, stop belaboring everything else and

explain only the above. Since your kernel is separable, you should be

able to give a simple 1D example to explain your enlargement process.

Jun 7, 2006, 12:21:48 PM (aruzinsky)

"It's the poor

performance for large factors that I'm targeting."

In repetitive enlargements, the accuracies of the earliest steps are the

most important, because the enlargements become progressively smoother

(with fewer high frequencies). If the first enlargement is 2X, it matters very

little what methods are used next. Practically, you can use your method

for 2X followed by bicubic or even bilinear to arbitrary size. The final

result will be practically the same as that produced by your method alone.

Jun 8, 2006, 7:26:34 AM (John Costella)

aruzinsky wrote:

> There are two reduced images, clown.png and PDI-Target574x900.PNG. Only

> PDI-Target574x900.PNG was produced with a box kernel. A 3rd order Lanczos

> kernel was used to produce clown.png and this is the image I would like to

> see your enlargement process applied to.

Sorry, I misunderstood the text on the page. I'd still be interested to

have the original clown image, so that I can assess both the

downsampling and upsampling aspects. Nevertheless, I have run the

enlarger on the clown.png on your page.

At the moment I only have a 24-bit implementation of the image doubler,

and I have to apply it twice to get 4X magnification. This means that

the dynamic range won't be as good as if it was done in one step in

48-bit. Nevertheless, here is what comes out:

http://www.assassinationscience.com/johncostella/clown_by_4.png

For the test implementation, I avoided the edge problems by trimming

off the pixels that would require extension of the original image. This

will of course be fixed when I actually use it for my original purpose.

> Again, you lost me here. Please, stop belaboring everything else and

> explain only the above. Since your kernel is separable, you should be

> able

> to give a simple 1D example to explain your enlargement process.

Sorry, I saw this question of yours on the re-post on

sci.image.processing, and answered it there. Hopefully it is now much

clearer.

Apologies for my confusion and unclarity,

John

Jun 8, 2006, 7:34:21 AM (John Costella)

aruzinsky wrote:

> In repetitive enlargements, the accuracies of the earliest steps are the

> most important, because the enlargements become progressively smoother

> (with fewer high frequencies). If the first enlargement is 2X, it matters very

> little what methods are used next. Practically, you can use your method

> for 2X followed by bicubic or even bilinear to arbitrary size. The final

> result will be practically the same as that produced by your method alone.

Not true. I used Photoshop to do bicubic enlargement by 400% on an

image that had been enlarged 200% using the 'magic' kernel. Whilst it

is better than 800% bicubic of the original, it still has visible

artifacts that are not present on the enlargement obtained from using

the 'magic' kernel three times.

If you doubt this, I can post some of the images in question.

John

Jun 8, 2006, 7:42:33 AM (John Costella)

Jani Huhtanen wrote:

> 2^k is as trivial as 2. Actually any ratio (n/m)^k where n and m are small

> integers is quite trivial. Just look for polyphase filtering.

Jani, I looked up polyphase filtering, but it's not jumping out at me.

Could you provide a simple example, say for 2 or 4?

Thanks

John
