
3-dimensional transform coding


Dario Salvi

Jul 12, 2003, 10:37:37 AM
Hello,

I was thinking about this today: why don't we use a 3-dimensional transform to
code a video stream?

I mean, the MPEG standard uses a 2-dimensional DCT on pixel blocks and then
quantizes it (as in JPEG); now, is there any format that transforms a pixel
cube and then quantizes it?

Thanks a lot,

Dario


Thomas Richter

Jul 12, 2003, 11:01:14 AM
Dario Salvi wrote:

> I mean, the MPEG standard uses a 2-dimensional DCT on pixel blocks and then
> quantizes it (as in JPEG); now, is there any format that transforms a pixel
> cube and then quantizes it?

There are actually some experimental video codecs which use a 3D
transformation for the motion estimation. However, time and space are
not fully symmetric in video, thus it is not quite clear whether this
idea is as clever as it may sound. A good still-image compression in the x-y
direction is most likely a bad motion prediction in the t direction.

So long,
Thomas

Dario Salvi

Jul 12, 2003, 2:52:28 PM
Thomas Richter wrote:
> There are actually some experimental video codecs which use a 3D
> transformation for the motion estimation.

Can you post some links?

> However, time and space are
> not fully symmetric in video, thus it is not quite clear whether this
> idea is as clever as it may sound.

My professor said that the problem initially was the delay: if you want to
transform an 8x8x8 pixel cube you have to wait for 8 frames.
But actually MPEG waits 8 or more frames (I can't remember exactly now) to
do the motion compensation.

> A good still-image compression in the
> x-y direction is most likely a bad motion prediction in the t direction.

How does this happen?
Does it depend on the choice of the quantization coefficients?

Could it just be a matter of technology? Maybe they should find a better
quantizer.

Thanks in advance,

Dario Salvi


Dario Salvi

Jul 13, 2003, 7:57:20 AM
I've found these good links:

http://www.itecohio.net/Presentations/TAF%20recipients/Video%20Compression_Zheng1.ppt
They use a 3-dimensional wavelet

http://www.icspat.com/papers/395mfi.pdf
They use a 3-dimensional DCT

http://www.elektrorevue.cz/clanky/03009/english.htm
Another good work about the 3D DCT

Real-Time Video Compression: Techniques and Algorithms (Kluwer International
Series in Engineering and Computer Science, 376)
by Raymond Westwater and Borko Furht

A good book about this subject

http://research.microsoft.com/asia/dload_files/g-imedia/spli/IMG-3.pdf
They use a 3-dimensional shape-adaptive wavelet

http://lcavwww.epfl.ch/~sbaiz/jsac98.pdf
Here they use 3-dimensional motion estimation to do the motion compensation


Thomas Richter

Jul 13, 2003, 11:19:44 AM
Dario Salvi wrote:

>>There are actually some experimental video codecs which use a 3D
>>transformation for the motion estimation.
>
>
> Can you post some links?

Unfortunately, I don't have them handy, but I could ask some people who
should know better than I do.

>>However, time and space are
>>not fully symmetric in video, thus it is not quite clear whether this
>>idea is as clever as it may sound.
>
>
> My professor said that the problem initially was the delay: if you want to
> transform an 8x8x8 pixel cube you have to wait for 8 frames.

This is not a primary problem. Similar problems hit you all the time in
video compression, e.g. for "B-frame" encoding. You always have a delay.

>> A good still-image compression in the
>> x-y direction is most likely a bad motion prediction in the t direction.
>
>
> How does this happen?
> Does it depend on the choice of the quantization coefficients?

The basic law of compression: good compression requires a good model
for the data it should compress. Now, is it sensible to speak of
moving objects in terms of "frequencies" or "wavelets"? Less so. You
should rather find the objects in the previous and the current frame
and just transmit the distance the objects moved. This is what
classical "motion prediction" tries to do, and because this model fits
natural video scenes so well, it has become very popular.
For the same reason, it makes sense to encode audio information in
frequency space (i.e., use a DCT for it).

This doesn't necessarily mean that these compression schemes are bad;
they might have advantages I'm not aware of. But at the least, you'd
have to work harder to make them better than classical motion prediction. (-;
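What classical motion prediction does can be sketched as a tiny full-search block matcher. This is purely illustrative (real encoders use far faster search strategies and sub-pixel refinement), and the frames below are made up for the example:

```python
# Full-search block matching: for an 8x8 block in the current frame,
# find the best-matching block in the previous frame within a small
# window, and "transmit" only the displacement (dy, dx).
import numpy as np

def best_motion_vector(prev, cur, by, bx, block=8, search=4):
    """Return the (dy, dx) displacement minimizing SAD over the window."""
    target = cur[by:by + block, bx:bx + block]
    best, best_mv = np.inf, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = by + dy, bx + dx
            if y < 0 or x < 0 or y + block > prev.shape[0] or x + block > prev.shape[1]:
                continue  # candidate block falls outside the previous frame
            sad = np.abs(prev[y:y + block, x:x + block] - target).sum()
            if sad < best:
                best, best_mv = sad, (dy, dx)
    return best_mv

# Toy frames: the "current" frame is the "previous" one shifted
# down by 2 pixels and right by 1 pixel.
rng = np.random.default_rng(1)
prev = rng.random((32, 32))
cur = np.roll(prev, shift=(2, 1), axis=(0, 1))
print(best_motion_vector(prev, cur, 8, 8))  # -> (-2, -1)
```

The recovered vector points back to where the block came from in the previous frame, which is exactly the displacement a motion-compensated codec would encode.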

> Could it just be a matter of technology? Maybe they should find a better
> quantizer.

Depends on how you define what a "quantizer" is. From my traditional
point of view, the quantizer (= discretizer of measured values, not
considering transform quantization) is the least of the problems. The
transformation of the data into a space that can be suitably quantized
without a huge quality impact is the goal.
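A small numerical illustration of this point, assuming smooth 1-D data and the orthonormal DCT from scipy (the signal itself is made up for the example): on smooth data the transform packs almost all of the energy into a few coefficients, so everything else can be quantized coarsely at little cost.

```python
# Energy compaction: most of a smooth signal's energy ends up in a
# handful of DCT coefficients. Signal and sizes are arbitrary choices.
import numpy as np
from scipy.fft import dct

n = 64
signal = np.cos(np.linspace(0.0, np.pi, n)) * 100.0 + 128.0  # smooth made-up data
coeffs = dct(signal, type=2, norm="ortho")

energy = coeffs ** 2
top4 = np.sort(energy)[::-1][:4].sum()
print(top4 / energy.sum())  # close to 1: a few coefficients carry nearly all energy
```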

So long,
Thomas


Dario Salvi

Jul 14, 2003, 6:05:57 AM
Thomas Richter wrote:

> Unfortunately, I don't have them handy, but I could ask some people
> who should know better than I do.

I think it would be interesting to read something about it.
It would be even more interesting to start a project... Anyone
interested in this topic can contact me.

> The basic law of compression: good compression requires a good model
> for the data it should compress. Now, is it sensible to speak of
> moving objects in terms of "frequencies" or "wavelets"? Less so. You
> should rather find the objects in the previous and the current frame
> and just transmit the distance the objects moved. This is what
> classical "motion prediction" tries to do, and because this model fits
> natural video scenes so well, it has become very popular.
> For the same reason, it makes sense to encode audio information in
> frequency space (i.e., use a DCT for it).
>
> This doesn't necessarily mean that these compression schemes are bad;
> they might have advantages I'm not aware of. But at the least, you'd
> have to work harder to make them better than classical motion prediction. (-;

Motion prediction is a well-known subject; maybe finding a different
scheme, based on a 3-dimensional transform, would lead us to better
performance... IMHO it is obvious that motion prediction is easier to
study, because a lot of the work is already done!

I know that the reason the DCT is used is that it is a good transform
for decorrelating data.
In theory the best transform is the KLT, but it is not used because it is
based on the pdf of the data, which is not known and has to be estimated.

I don't know whether the DCT is always a good decorrelator, or only on
certain types of data.
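The KLT/DCT relation can be checked numerically: for a first-order Gauss-Markov (AR(1)) source the covariance matrix is known in closed form, and for high correlation the KLT basis (the eigenvectors of that covariance) comes close to the DCT basis. The correlation value 0.95 below is an arbitrary illustrative choice:

```python
# Compare the KLT of an AR(1) source with the DCT basis.
import numpy as np
from scipy.fft import dct

n, rho = 8, 0.95
# Covariance of an AR(1) source: cov[i, j] = rho^|i - j|.
cov = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))

# KLT basis: eigenvectors of the covariance, by decreasing eigenvalue.
eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending order
klt = eigvecs[:, ::-1]

# Orthonormal DCT-II matrix: row k is the k-th DCT basis vector.
dct_matrix = dct(np.eye(n), type=2, norm="ortho", axis=0)

# Compare corresponding basis vectors (up to sign, which is
# arbitrary for eigenvectors).
for k in range(n):
    similarity = abs(np.dot(klt[:, k], dct_matrix[k, :]))
    print(k, round(similarity, 3))  # values near 1: the bases nearly coincide
```

This is why the DCT is a good "universal" substitute for the KLT on data that behaves roughly like a highly correlated Markov source, such as rows of natural images.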

Motion prediction is based on an object-oriented model; this model
would, in theory, give the best compression, but reality is not always
"object oriented", and the complexity is very high. I know that a lot of
work is being done now to find a good representation of video objects, but
I think that it's too early: it is currently too difficult to find objects
in a video stream. So maybe it is a good idea to explore the classical
method, the one with the transform, a little bit more.

I don't think that the object-oriented model is necessarily the better one:
well, it should be, in the case where you are able to find objects in a
video stream. Otherwise, why don't we use the same model to compress
single images too?

They are two completely different kinds of approach: one (the DCT or any
other transform) is based on a generic mathematical model; the other
(object-oriented) is based on an informatic model, is more semantic, and
says less about "how" to operate.
Research has to be done in both directions.
This is what I know and what I think.

> Depends on how you define what a "quantizer" is.

> The transformation of the data into a space that can be suitably
> quantized without a huge quality impact is the goal.

The quantizer, for me, is the one that quantizes the transformed space: the
one that reduces the information and allows the compression...
All the data loss is there, and it is not simple to find a good quantizer, I
think. All the work is there: finding a good compromise between compression
and quality.

Dario


Camping Gaz

Aug 7, 2003, 12:30:12 AM
no :)

in mpeg, the sequence of frames

I B B B P1 B B B P2

is stored as

I P1 B B B P2 B B B

so that the delay is no more than one frame;
otherwise you'd have "buffer overflow" anyway.

And yes, the delay problem is indeed a serious issue :)
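The reordering above can be sketched as a toy function: B-frames are predicted from the anchor (I or P) frames on both sides, so the bitstream carries each anchor before the B-frames that depend on it. Frame labels are illustrative only:

```python
# Convert an MPEG-style display order into coded (bitstream) order:
# each anchor frame is emitted before the B-frames that reference it.
display_order = ["I", "B1", "B2", "B3", "P1", "B4", "B5", "B6", "P2"]

def to_coded_order(frames):
    """Move each anchor frame ahead of the B-frames it anchors."""
    coded, pending_b = [], []
    for f in frames:
        if f.startswith("B"):
            pending_b.append(f)       # hold B-frames until their anchor arrives
        else:
            coded.append(f)           # emit the anchor first...
            coded.extend(pending_b)   # ...then the B-frames it closes
            pending_b = []
    return coded + pending_b

print(to_coded_order(display_order))
# -> ['I', 'P1', 'B1', 'B2', 'B3', 'P2', 'B4', 'B5', 'B6']
```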


"Dario Salvi" <dars...@iol.it> wrote in message
news:MNYPa.168881$Ny5.4...@twister2.libero.it...
