
New transform for video

Mihai Cartoaje, Jun 9, 2009, 10:05:16 PM

A description of my new transform for video compression is available:
http://libima.com/video.htm

Thomas Richter, Jun 9, 2009, 10:48:23 PM

Mihai Cartoaje wrote:
> A description of my new transform for video compression is available:
> http://libima.com/video.htm

A couple of comments:

o) Your presentation should say which type of entropy coding you're
using. Your patent statement mentions the SPIHT, though are you also
using the SPIHT in your comparison?

o) You seem to refer to an "H.265" standard, though there is none - H.265
is under discussion at best, so which code are you referring to?
Published where? Sounds fishy to me, at best.

o) If you want to compete with any other code, you should show PSNR or
SSIM plots, none of which are present on your site. This is the standard
method for comparing codecs and measuring their performance. A bare "is
0.7 dB better" is so general a statement that it makes no sense. At which
rate? For which image?

o) I would recommend comparing to *imaging* standards like JPEG 2000 if
you compress images. If so, make sure you run the JPEG 2000
implementation of your choice in the right mode (i.e. visually vs. PSNR
optimal).

o) Your transformation is essentially a hierarchical lifting scheme, and
as such you could represent that in JPEG 2000 part 2, so why not do that?

o) You should probably know that wavelet filters (as yours essentially
is) are not very well-suited for video, which up to today depends on
blocks for motion compensation. Wavelets do not work well on blocks, and
react badly to the sharp edges created by the motion compensation. So do
you really mean "for video"? This might work ok for still images - have
you actually tried it on video?


From a legal perspective,

o) I don't think you can patent the transformation as such, as it is
covered by a publication already, so prior art exists for the
transformation and the entropy coding,

o) To my knowledge, the claims must be complete. I don't think you can
refer in the claims to the detailed description, as it is the *claims*
that are finally tested in a potential court case. Thus, I would believe
that your application is likely to be denied for this formal reason -
but please check with your lawyer on this, I'm not an expert in patent
law.

My advice would be: if you want to test your ingenuity, go and publish
your ideas at a scientific conference to get them reviewed by experts in
the field. In the US, you then still have a grace period of one year to
file a patent should the idea prove effective. Otherwise, you might have
already invested money into the application without knowing whether it
is worth it.

So long,
Thomas

Industrial One, Jun 9, 2009, 11:29:33 PM

On Jun 9, 10:48 pm, Thomas Richter <t...@math.tu-berlin.de> wrote:
> o) Your presentation should say which type of entropy coding you're
> using. Your patent statement mentions the SPIHT, though are you also
> using the SPIHT in your comparison?
>
> o) You seem to refer to an "H.265" standard, though there is none - H.265
> is under discussion at best, so which code are you referring to?
> Published where? Sounds fishy to me, at best.

http://iphome.hhi.de/suehring/tml/download/KTA/jm11.0kta2.3.zip

Mihai Cartoaje, Jun 10, 2009, 1:38:44 AM

On Jun 9, 10:48 pm, Thomas Richter <t...@math.tu-berlin.de> wrote:

> o) Your presentation should say which type of entropy coding you're
> using. Your patent statement mentions the SPIHT, though are you also
> using the SPIHT in your comparison?

Yes.

> o) You seem to refer to an "H.265" standard, though there is none - H.265
> is under discussion at best, so which code are you referring to?
> Published where? Sounds fishy to me, at best.

KTA 2.2r1
http://iphome.hhi.de/suehring/tml/download/KTA/

> o) If you want to compete with any other code, you should show PSNR or
> SSIM plots, none of which are present on your site. This is the standard
> method for comparing codecs and measuring their performance. A bare "is
> 0.7 dB better" is so general a statement that it makes no sense. At which
> rate? For which image?

I don't have time to do that now because I am also writing an image
codec. The rates and images are on the site.

> o) I would recommend to compare to *imaging* standards like JPEG 2000 if
> you compress images. If so, make sure you run the JPEG 2000
> implementation of your choice in the right mode (i.e. visually vs. PSNR
> optimal).

The proposal is for video.

> o) Your transformation is essentially a hierarchical lifting scheme, and
> as such you could represent that in JPEG 2000 part 2, so why not do that?

I'll take a look.

> o) You should probably know that wavelet filters (as yours is one,
> essentially) are not very well-suited for video, which up to today
> depends on blocks for motion compensation. Wavelets do not work well on
> blocks, and react badly on the sharp edges created by the motion
> compensation. So do you really mean "for video"? This might work ok for
> still images - have you actually tried for video?

I haven't implemented it for video with complex inter prediction
because I am busy writing an image codec.

S+P can be done on regions of any shape so this includes blocks. Here
is a detailed example.

Suppose that the region is,

s0 s1 s2 s3 t4 t5 s6 s7

with the s samples inside the region and the t samples outside the
region. To take a numerical example:

1 3 5 7 t4 t5 13 15

after one level of the S transform:

2 -2 6 -2 t4 t5 14 -2

since we don't have t4 for prediction, we can predict s3 as

s3 -= (s0 - s2) / 4

after one pass of P we get,

2 -1 6 -1 t4 t5 14 -2

after one more level of S we get,

4 -1 -4 -1 14 t5 s6 -2

t4 is outside the region but I use it as a buffer.

After one pass of P, using

s2 -= (s0 - t4) / 4

we get:

4 -1 -2 -1 14 t5 s6 -2

after one more level of S we get,

9 -1 -2 -1 -10 t5 s6 -2

the value of s6 can be reconstructed because it was copied into t4.
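
The first S level and P pass of the example above can be written out as a short Python sketch. This is illustrative only: the floor rounding, the placeholder values for the outside samples, and the left-side fallback at the region boundary are inferred from the numbers in the example, not taken from a published implementation.

```python
def s_step(x, inside):
    # One level of the in-place S transform, restricted to the region:
    # each pair (a, b) with both samples inside the region becomes
    # (floor((a + b) / 2), a - b), i.e. low-pass then high-pass.
    y = list(x)
    for i in range(0, len(x) - 1, 2):
        if inside[i] and inside[i + 1]:
            a, b = x[i], x[i + 1]
            y[i], y[i + 1] = (a + b) // 2, a - b
    return y

def p_step(y, inside):
    # One P pass: refine each high-pass from the difference of the two
    # neighbouring low-passes. When the low-pass to the right falls
    # outside the region, fall back to the two low-passes on the left,
    # as in "s3 -= (s0 - s2) / 4" in the example above.
    z = list(y)
    for i in range(1, len(y), 2):
        if not inside[i]:
            continue
        if i + 1 < len(y) and inside[i + 1]:
            z[i] -= (y[i - 1] - y[i + 1]) // 4
        elif i - 3 >= 0 and inside[i - 3]:
            z[i] -= (y[i - 3] - y[i - 1]) // 4
    return z

inside = [True, True, True, True, False, False, True, True]
x = [1, 3, 5, 7, 0, 0, 13, 15]   # 0 stands in for the outside samples t4, t5
y = s_step(x, inside)            # [2, -2, 6, -2, 0, 0, 14, -2]
z = p_step(y, inside)            # [2, -1, 6, -1, 0, 0, 14, -2]
```

The outside samples are never read or written by either step, matching the region restriction described in the post.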

> o) I don't think you can patent the transformation as such as it is
> covered by a publication already, so prior art exists for the
> transformation and the entropy coding,

My transform is an improvement to S+P. It is a packet transform of S+P
and one level of Haar, plus a reordering of the coefficients. S+P may or
may not be a wavelet transform depending on how the prediction of
diagonal terms is done. The way I implemented it, it is not a wavelet
transform. I implemented one level of the S transform on the whole
image, followed by a P step that uses 2-dimensional data to make a
prediction.

> o) To my knowledge, the claims must be complete. I don't think you can
> refer in the claims to the detailed description, as it is the *claims*
> that are finally tested in a potential court case. Thus, I would believe
> that your application is likely to be denied for this formal reason -
> but please check with your lawyer on this, I'm not an expert in patent
> law.

Thanks. I believe that claim 1 is novel and since it doesn't refer to
the description section it is sufficient.

Industrial One, Jun 10, 2009, 6:18:52 AM

On Jun 10, 5:38 am, Mihai Cartoaje <mcarto...@gmail.com> wrote:
> On Jun 9, 10:48 pm, Thomas Richter <t...@math.tu-berlin.de> wrote:
>
> > o) Your presentation should say which type of entropy coding you're
> > using. Your patent statement mentions the SPIHT, though are you also
> > using the SPIHT in your comparison?
>
> Yes.
>
> > o) You seem to refer to an "H.265" standard, though there is none - H.265
> > is under discussion at best, so which code are you referring to?
> > Published where? Sounds fishy to me, at best.
>
> KTA 2.2r1 http://iphome.hhi.de/suehring/tml/download/KTA/

Are you an MPEG employee?

Thomas Richter, Jun 10, 2009, 10:26:55 AM

Industrial One wrote:

>> Thanks. I believe that claim 1 is novel and since it doesn't refer to
>> the description section it is sufficient.
>
> Are you an MPEG employee?

Nobody is. MPEG doesn't hire people. MPEG is (like JPEG) a group of
volunteers.

Greetings,
Thomas

Thomas Richter, Jun 10, 2009, 10:37:44 AM

Mihai Cartoaje wrote:
> On Jun 9, 10:48 pm, Thomas Richter <t...@math.tu-berlin.de> wrote:
>
>> o) Your presentation should say which type of entropy coding you're
>> using. Your patent statement mentions the SPIHT, though are you also
>> using the SPIHT in your comparison?
>
> Yes.
>
>> o) You seem to refer to an "H.265" standard, though there is none - H.265
>> is under discussion at best, so which code are you referring to?
>> Published where? Sounds fishy to me, at best.
>
> KTA 2.2r1
> http://iphome.hhi.de/suehring/tml/download/KTA/

Why is that a good reference if you only test I-frame compression?

>> o) If you want to compete with any other code, you should show PSNR or
>> SSIM plots, none of which are present on your site. This is the standard
>> method for comparing codecs and measuring their performance. A bare "is
>> 0.7 dB better" is so general a statement that it makes no sense. At which
>> rate? For which image?
>
> I don't have time to do that now because I am also writing an image
> codec. The rate and images on the site.

?? Come on, if you claim to be better, you need to provide evidence for
that. I'm not saying you're not. I'm only saying that you need to do
your homework.

>> o) I would recommend to compare to *imaging* standards like JPEG 2000 if
>> you compress images. If so, make sure you run the JPEG 2000
>> implementation of your choice in the right mode (i.e. visually vs. PSNR
>> optimal).
>
> The proposal is for video.

Then there's a lot missing, at least (besides tests) the motion
prediction and motion compensation - unless you have a novel design for
handling motion without blocks.

>> o) You should probably know that wavelet filters (as yours is one,
>> essentially) are not very well-suited for video, which up to today
>> depends on blocks for motion compensation. Wavelets do not work well on
>> blocks, and react badly on the sharp edges created by the motion
>> compensation. So do you really mean "for video"? This might work ok for
>> still images - have you actually tried for video?
>
> I haven't implemented it for video with complex inter prediction
> because I am busy writing
> an image codec.

See above - please do that.

> S+P can be done on regions of any shape so this includes blocks. Here
> is a detailed example.
>
> Suppose that the region is,
>
> s0 s1 s2 s3 t4 t5 s6 s7
>
> with the s samples inside the region and the t samples outside the
> region. To take a numerical example:
>
> 1 3 5 7 t4 t5 13 15
>
> after one level of the S transform:
>
> 2 -2 6 -2 t4 t5 14 -2

Sorry, I don't understand. What do you do at the block boundaries? You
need to predict from somewhere. Again, if you stop the filter there (at
block boundaries) no matter what type of "extension" you pick, it will
impact the performance of the filter. It's a general problem of all such
filter types.

> since we don't have t4 for prediction, we can predict s3 as
>
> s3 -= (s0 - s2) / 4

If only I now knew where the S's are located after the
reordering... (-;

>> o) I don't think you can patent the transformation as such as it is
>> covered by a publication already, so prior art exists for the
>> transformation and the entropy coding,
>
> My transform is an improvement to S+P. It is a packet transform of S+P
> and one level of Haar and a reordering of the coefficients. S+P may or
> may not be a wavelet transform depending on how the prediction of
> diagonal terms is done.

Close enough as a hierarchical transform at least.

> The way I implemented it it is not a wavelet
> transform. I implemented one level of the S transform on the whole
> image, followed by a P step that uses 2-dimensional data to make a
> prediction.

Sorry, but if I follow your algorithm, it operates on rows and columns
separately, so it is a separable transform, i.e. its 2D operation is
simply a tensor product of two identical one-dimensional filters. Isn't
that correct? What do I miss?
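
For reference, "separable" here means the 2D transform is the 1D filter applied to every row, then to every column of the result, i.e. a tensor product of two identical 1D filters. A minimal sketch using the integer S step (with integer rounding, the result is only separable "up to rounding", which is exactly the caveat raised later in the thread):

```python
def s1d(v):
    # One level of the 1D S transform: each pair (a, b) becomes
    # (floor((a + b) / 2), a - b).
    out = list(v)
    for i in range(0, len(v) - 1, 2):
        out[i], out[i + 1] = (v[i] + v[i + 1]) // 2, v[i] - v[i + 1]
    return out

def separable_2d(img):
    # A separable 2D transform: the same 1D filter on every row,
    # then on every column of the result.
    rows = [s1d(r) for r in img]
    cols = [s1d(list(c)) for c in zip(*rows)]   # transpose, filter columns
    return [list(r) for r in zip(*cols)]        # transpose back
```

For example, `separable_2d([[1, 3], [5, 7]])` yields `[[4, -2], [-4, 0]]`.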

>> o) To my knowledge, the claims must be complete. I don't think you can
>> refer in the claims to the detailed description, as it is the *claims*
>> that are finally tested in a potential court case. Thus, I would believe
>> that your application is likely to be denied for this formal reason -
>> but please check with your lawyer on this, I'm not an expert in patent
>> law.
>
> Thanks. I believe that claim 1 is novel and since it doesn't refer to
> the description section it is sufficient.

Don't tell me, I'm not checking it. It's something you need to clarify
with an expert in patents - I'm not. All I'm saying is: Be careful, this
doesn't look right to me.

So long,
Thomas

Industrial One, Jun 10, 2009, 11:38:09 AM


Okay, I'll rephrase that: Mihai, are you a member of the MPEG
committee? Are you developing the tech for H.265?

dlima...@gmail.com, Jun 10, 2009, 4:13:01 PM

On Jun 9, 10:05 pm, Mihai Cartoaje <mcarto...@gmail.com> wrote:
> A description of my new transform for video compression is available: http://libima.com/video.htm

I ran some comparisons with the images you provided versus existing
standards and my own work. For what it's worth, these are the hard
numbers.

PSNR (dB)
33.81 IJG JPEG
35.69 "New transform"
36.01 Kakadu JPEG 2000
36.39 H.265
37.38 DLI

MS-SSIM
0.97923 IJG JPEG
0.98346 "New transform"
0.98395 Kakadu JPEG 2000
0.98421 H.265
0.98802 DLI

For image compression, the new transform is outperformed by JPEG
2000. From the other postings in this thread I gather the new
transform hasn't been applied in a video compression framework yet.
Its claim of being a video compression transform is from a design
standpoint, being compatible with block-based compression. This is
not meant to discourage further work, but its use for video
compression is likely to show similar performance relative to current
video codecs as these image compression results.

Take care,
Dennis

Mihai Cartoaje, Jun 10, 2009, 10:11:37 PM

I wrote,

> after one more level of S we get,
>
> 9 -1 -2 -1 -10 t5 s6 -2

This needs an additional step to replace the buffered coefficient
inside the region:

9 -1 -2 -1 t4 t5 -10 -2

Mihai Cartoaje, Jun 10, 2009, 10:17:59 PM

Thomas Richter wrote:

> ?? Come on, if you claim to be better, you need to provide evidence for
> that. I'm not saying you're not. I'm only saying that you need to do
> your homework.

I wrote "0.7 dB below H.265 KTA 2.2r1".

> Then there's a lot missing, at least (besides tests), namely the motion
> prediction and motion compensation - unless you have a novel design how
> to handle motion without blocks.

The new transform is compatible with blocks. If it is done on a 4x4
array, the synthesis functions are almost like the DCT.

> Sorry, I don't understand. What do you do at the block boundaries? You
> need to predict from somewhere. Again, if you stop the filter there (at
> block boundaries) no matter what type of "extension" you pick, it will
> impact the performance of the filter. It's a general problem of all such
> filter types.

It's a general problem of all transforms at edges. Regions can be any
shape. A region can be a block, or a set of connected blocks, or the
whole frame.

Something like this: an inter frame can be divided into 2 regions: one
region is coded independently, the other region is inter predicted.
The region which is inter predicted is divided into subregions, each
subregion corresponding to particular motion vectors. The S step can be
done on each region independently. The P step would be done without
crossing subregion boundaries.

So if in a frame the camera pans and all blocks have the same motion
vectors, then the whole frame can be coded as one region and there
would be no block artifacts.

> Sorry, but if I follow your algorithm, it operates on rows and columns
> separately, so it is a separable transform, i.e. its 2D operation is
> simply a tensor product of two identical one-dimensional filters. Isn't
> that correct? What do I miss?

The new transform can be described in 1 dimension or in 2 dimensions.
In the application the 1-dimensional description is used because it is
simpler. In Libima it is implemented in 2 dimensions; see
transform_new in the file transformic.c.

Thomas Richter, Jun 10, 2009, 10:53:19 PM

Mihai Cartoaje wrote:
> Thomas Richter wrote:
>
>> ?? Come on, if you claim to be better, you need to provide evidence for
>> that. I'm not saying you're not. I'm only saying that you need to do
>> your homework.
>
> I wrote "0.7 dB below H.265 KTA 2.2r1".

This statement is of such generality that it is likely wrong. Again, for
which quality range? For which image types? Look, the performance of any
lossy compression algorithm is defined by its rate-distortion curve,
that means, you need to give distortion values for target rates.
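
Concretely, a rate-distortion point pairs a rate (say, bits per pixel) with a distortion measure such as PSNR, and a codec's performance is the curve those points trace out. A minimal PSNR helper, assuming 8-bit samples:

```python
import math

def psnr(orig, recon, peak=255.0):
    # Peak signal-to-noise ratio in dB between two equal-length
    # sample sequences; peak is 255 for 8-bit images.
    mse = sum((a - b) ** 2 for a, b in zip(orig, recon)) / len(orig)
    return float("inf") if mse == 0 else 10.0 * math.log10(peak * peak / mse)
```

A rate-distortion curve is then a list of (rate, psnr(orig, recon)) pairs measured at several target rates, which is the evidence being asked for here.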

>> Then there's a lot missing, at least (besides tests), namely the motion
>> prediction and motion compensation - unless you have a novel design how
>> to handle motion without blocks.
>
> The new transform is compatible with blocks. If it is done on a 4x4
> array, the synthesis functions are almost like the DCT.

Which is fine, but what performance degradations are implied by doing
that? Similarly, yes, I can run the 9/7 wavelet filter on 4x4 blocks, but
do I want to do that? So again, if that is an applicable approach, what
is the R/D curve if you operate on blocks? What is it if you don't? The
difference might be huge. What are you doing on the specific image you
claim to be 0.7 dB better on (at which rate, in the first place)? The
data you provide is not sufficient to decide whether that's better or not.

>> Sorry, I don't understand. What do you do at the block boundaries? You
>> need to predict from somewhere. Again, if you stop the filter there (at
>> block boundaries) no matter what type of "extension" you pick, it will
>> impact the performance of the filter. It's a general problem of all such
>> filter types.
>
> It's a general problem of all transforms at edges. Regions can be any
> shape. A region can be a block, or a set of connected blocks, or the
> whole frame.
>
> Something like this: an inter frame can be divided into 2 regions: one
> region is coded independently, the other region is inter predicted.
> The region which is inter predicted is divided into subregions, each
> subregion corresponding to particular motion vectors. The S step can be
> done on each region independently. The P step would be done without
> crossing subregion boundaries.
>
> So if in a frame the camera pans and all blocks have the same motion
> vectors, then the whole frame can be coded as one region and there
> would be no block artifacts.

That holds for any transform in case a global motion compensation is
possible. It's still a very specific case.

>> Sorry, but if I follow your algorithm, it operates on rows and columns
>> separately, so it is a separable transform, i.e. its 2D operation is
>> simply a tensor product of two identical one-dimensional filters. Isn't
>> that correct? What do I miss?
>
> The new transform can be described in 1 dimension or in 2 dimensions.
> In the application the 1-dimension description is used because it is
> simpler. In Libima it is implemented in 2 dimensions. See
> transform_new in file transformic.c

Ok, to ask this again, is this a separable transform, yes or no? I
suppose yes, but maybe I'm wrong.

So long,
Thomas

Thomas Richter, Jun 10, 2009, 10:56:51 PM

dlima...@gmail.com wrote:
> On Jun 9, 10:05 pm, Mihai Cartoaje <mcarto...@gmail.com> wrote:
>> A description of my new transform for video compression is available: http://libima.com/video.htm
>
> I ran some comparisons with the images you provided versus existing
> standards and my work. For what it's worth these are the hard
> numbers.
>
> PSNR (dB)
> 33.81 IJG JPEG
> 35.69 "New transform"
> 36.01 Kakadu JPEG 2000
> 36.39 H.265
> 37.38 DLI
>
> MS-SSIM
> 0.97923 IJG JPEG
> 0.98346 "New transform"
> 0.98395 Kakadu JPEG 2000
> 0.98421 H.265
> 0.98802 DLI

That looks like what I would expect - or at least it looks more
realistic. You're using Kakadu in a PSNR-optimal mode here, I suppose?

> For image compression, the new transform is outperformed by JPEG
> 2000. From the other postings in this thread I gather the new
> transform hasn't been applied in a video compression framework yet.
> It's claim of being a video compression transform is from a design
> standpoint, being compatible with block based compression. Not meant
> to discourage further work, but it's use with video compression is
> likely to result in similar performance versus current video codecs as
> the image compression results.

That's what I'm saying: It still needs to be verified, otherwise it's
just a claim.

So long,
Thomas


DL, Jun 11, 2009, 12:33:54 AM

On Jun 10, 10:56 pm, Thomas Richter <t...@math.tu-berlin.de> wrote:

> dlimagec...@gmail.com wrote:
> > On Jun 9, 10:05 pm, Mihai Cartoaje <mcarto...@gmail.com> wrote:
> >> A description of my new transform for video compression is available: http://libima.com/video.htm
>
> > I ran some comparisons with the images you provided versus existing
> > standards and my work.  For what it's worth these are the hard
> > numbers.
>
> > PSNR (dB)
> > 33.81 IJG JPEG
> > 35.69 "New transform"
> > 36.01 Kakadu JPEG 2000
> > 36.39 H.265
> > 37.38 DLI
>
> > MS-SSIM
> > 0.97923 IJG JPEG
> > 0.98346 "New transform"
> > 0.98395 Kakadu JPEG 2000
> > 0.98421 H.265
> > 0.98802 DLI
>
> That looks like what I would expect - or at least it looks more
> realistic. You're using Kakadu in a PSNR-optimal mode here, I suppose?

Yes, for grayscale images Kakadu only has the one mode, I believe. The
image used is a grayscale Lena.

Take care,
Dennis

Mihai Cartoaje, Jun 11, 2009, 9:55:02 PM

I wrote,

> > Sorry, I don't understand. What do you do at the block boundaries? You
> > need to predict from somewhere. Again, if you stop the filter there (at
> > block boundaries) no matter what type of "extension" you pick, it will
> > impact the performance of the filter. It's a general problem of all such
> > filter types.
>
> It's a general problem of all transforms at edges. Regions can be any
> shape. A region can be a block, or a set of connected blocks, or the
> whole frame.

The new transform has the option of stopping or not stopping at block
boundaries. If a block has different motion vectors than its
neighbors, then it stops at block boundaries. If a block has the same
motion vectors as a neighbor, then the new transform can take
advantage of correlations along the shared boundary.

Thomas Richter, Jun 12, 2009, 12:01:27 AM

Mihai Cartoaje wrote:

> The new transform has the option of stopping or not stopping at block
> boundaries. If a block has different motion vectors than its
> neighbors, then it stops at block boundaries. If a block has the same
> motion vectors as a neighbor, then the new transform can take
> advantage of correlations along the shared boundary.

Well, I agree, but that's nothing special. See, "wavelets" also have
that property, namely I can define them in such a way that they "stop at
a block boundary". Instead of saying "I'm flipping coefficients across
edges", I could also say "I'm modifying the filter in the vicinity of
the edge", which is completely equivalent.
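
One standard way to make a filter effectively "stop at" a boundary is symmetric extension: mirror the samples across the edge before filtering, which is exactly the "modifying the filter near the edge" view. A sketch of whole-sample symmetric extension (the convention used by JPEG 2000 for its odd-length filters, as far as I recall):

```python
def symmetric_extend(x, pad):
    # Whole-sample symmetric extension: mirror about the boundary
    # samples themselves, so x[-k] = x[k] and x[n-1+k] = x[n-1-k].
    left = x[1:pad + 1][::-1]
    right = x[-pad - 1:-1][::-1]
    return left + x + right

symmetric_extend([10, 20, 30, 40], 2)   # [30, 20, 10, 20, 30, 40, 30, 20]
```

Filtering the extended signal and keeping only the middle coefficients gives the same result as using edge-modified filters on the original samples.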

However, it doesn't help either. You do not need to convince me that you
can create a transform that stops on block boundaries. You need to
convince me that doing so doesn't impact performance. (-;

That said, you should probably really look into 15444-2 (JPEG 2000 part
2), or the publications quoted therein. You'll find a couple of good
modifications for wavelet filters there that *do* improve the
performance at edges. The idea there is that, basically, the problem
arises from the high-passes, and you need to modify the high-pass filter
a bit to avoid the trouble. It is also noted there that things work
considerably better if edges are at "even coefficient indices", which -
due to the conventions taken by JPEG 2000 - implies that they are at
low-pass coefficients.

You will also find there that such techniques are covered by patents,
which is one of the reasons why they are there, and not in part 1. I'm
not certain, but I seem to remember that Siemens was one of the patent
holders, and it was a general technique that could be applied to all WSS
filters. But that's from memory.

Greetings,
Thomas

Mihai Cartoaje, Jun 12, 2009, 12:14:44 AM

> difference might be huge. What are you doing on the specific image you
> claim to be 0.7 dB better (at which rate, in first place)? The data you

The entire image is coded as one region.

> Ok, to ask this again, is this a separable transform, yes or no? I
> suppose yes, but maybe I'm wrong.

It depends on how the prediction is done. With integers the transform
is not separable because it is difficult to do a normalized 1D Haar
transform.

Thomas Richter, Jun 12, 2009, 10:10:39 PM


Ok, thus I presume that "up to implementation details" it is separable
(that said, the 5/3 integer filter in JPEG 2000 is not separable in a
precise sense either, due to the implied rounding, but this is not quite
what I meant.) (-:

So long,
Thomas
