Charles A. Poynton
Sun Microsystems Computer Corp.
(c) April 27, 1992
This note describes video luma (Y), and outlines six video-related colour
spaces: (Y, B-Y, R-Y), YPbPr, YCbCr, Kodak PhotoYCC[tm], YUV and YIQ.
All of these spaces share the same luma (Y) signal; the other two signals
are "colour differences" that differ only in scaling (and in the case of
IQ, a rotation). The different scalings are appropriate for different
applications.
Often the symbol YUV is used loosely to represent generic
colour-difference space. The scaling associated with YUV space is not
correct for component digital video, and YCbCr should almost always be
used instead.
To clear up some initial confusion, the symbol "Y" as used in video does
**NOT** have the same meaning as the symbol Y defined by the CIE and used
in colour science (such as in the CIE XYZ and CIE xyY colour spaces).
Both are associated with brightness, but in colour science, luminance --
denoted Y -- is proportional to linear-light intensity. Video originates
with linear light, but applies a power function with an exponent of about
0.45 to implement "gamma-correction". I call the video Y symbol "luma" to
differentiate it from CIE "luminance".
Human vision treats brightness information very differently than colour
information, and this is fundamentally the reason that both colour science
and television identify a special channel to convey brightness-related
information.
Video uses U and V components to represent colour differences, but these
are essentially unrelated to colour science (u, v) or (u', v') or
(u*, v*). All of these pairs can loosely be described as representing
"chroma" but they are numerically and functionally different. Video YUV
is neither based on nor superceded by any of the CIE colour-difference
Video originates with linear-light ("tristimulus") RGB primary components,
conventionally contained in the range 0 (black) to +1 (white).
From the RGB triple, three gamma-corrected primary signals are computed;
each is essentially the 0.45-power of the corresponding tristimulus value,
similar to a square-root function.
In a practical system such as a television camera, however, in order to
minimize noise in the dark regions of the picture it is necessary to limit
the slope (gain) of the curve near black. It is now standard to limit gain
to 4.5 below a tristimulus value of +0.018, and to stretch the remainder of
the curve to place the Y-intercept at -0.099 in order to maintain function
and tangent continuity at the breakpoint:
Rgamma = (1.099 * pow(R,0.45)) - 0.099
Ggamma = (1.099 * pow(G,0.45)) - 0.099
Bgamma = (1.099 * pow(B,0.45)) - 0.099
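The full transfer function -- linear segment below the breakpoint, offset
power law above -- can be sketched in code as follows (a Python sketch, for
illustration only):

```python
def gamma_correct(t):
    """Gamma-correct one tristimulus value in the range 0..1:
    gain is limited to 4.5 below t = 0.018, and the 0.45-power
    curve above is scaled and offset to meet it at the breakpoint."""
    if t < 0.018:
        return 4.5 * t
    return 1.099 * t ** 0.45 - 0.099

# Both segments give about 0.081 at the breakpoint, so the function
# is continuous; unity input maps to unity output.
```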
Luma is then computed as a weighted sum of the gamma-corrected primaries:
Y = 0.299*Rgamma + 0.587*Ggamma + 0.114*Bgamma
The three coefficients in this equation correspond to the sensitivity of
human vision to each of the RGB primaries standardized for video. For
example, the low value of the blue coefficient is a consequence of
saturated blue colours being perceived as having low brightness.
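A quick check of the weighted sum (again a Python sketch): the coefficients
sum to unity, so luma spans the same 0..1 range as the primaries, while
saturated blue contributes a luma of only 0.114.

```python
def luma(rgamma, ggamma, bgamma):
    # Weighted sum of gamma-corrected primaries; the weights
    # 0.299, 0.587 and 0.114 sum to 1.
    return 0.299 * rgamma + 0.587 * ggamma + 0.114 * bgamma

# White (1, 1, 1) gives luma 1; saturated blue (0, 0, 1) gives
# luma 0.114, consistent with its low perceived brightness.
```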
The luma coefficients are also a function of the white point (or
"chromaticity of reference white"). Computer users commonly have a white
point with a colour temperature in the range of 9300 K, which contains
twice as much blue as the daylight reference CIE D65 used in television.
This is reflected in pictures and monitors that look too blue.
Although television primaries have changed over the years since the
adoption of the NTSC standard in 1953, the coefficients of the luma
equation for 525 and 625 line video have remained unchanged. For
HDTV, the primaries are different and the luma coefficients have been
standardized with somewhat different values.
The human visual system has much less acuity for spatial variation of
colour than for brightness. Rather than conveying RGB, it is advantageous
to convey luma in one channel, and colour information that has had luma
"removed" in the two other channels. In an analog system, the two colour
channels can have less bandwidth, typically one-third that of luma. In a
digital system each of the two colour channels can have considerably
less data rate (or data capacity) than luma.
Green dominates the luma channel: green information makes up about 59% of
the luma signal. Therefore it is sensible -- and advantageous, for
signal-to-noise reasons -- to base the two colour channels on blue and
red. The simplest way to "remove" luma from each of these is to subtract
it to form the "difference" between a primary colour and luma. Hence, the
basic video colour-difference pair is (B-Y), (R-Y) [pronounced "B
minus Y, R minus Y"].
The (B-Y) signal reaches its extreme values at blue (R=0, G=0, B=1;
Y=0.114; B-Y=+0.886) and at yellow (R=1, G=1, B=0; Y=0.886; B-Y=-0.886).
Similarly, the extrema of (R-Y), +-0.701, occur at red and cyan. These
are inconvenient values for both digital and analog systems. The colour
spaces YPbPr, YCbCr, PhotoYCC and YUV are simply scaled versions of (Y,
B-Y, R-Y) that place the extrema of the colour difference channels at more
convenient values.
If three components are to be conveyed in three separate channels with
identical unity excursions, then the Pb and Pr colour difference
components are used:
Pb = (0.5/0.886) * (Bgamma - Y)
Pr = (0.5/0.701) * (Rgamma - Y)
These scale factors limit the excursion of EACH colour difference
component to -0.5 .. +0.5 with respect to unity Y excursion: 0.886 is just
unity less the luma coefficient of blue. In the analog domain Y is usually
0 mV (black) to 700 mV (white), and Pb and Pr are usually +- 350 mV.
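As a sketch of the scaling (Python again, using the standard-definition
luma coefficients above):

```python
def ypbpr(rgamma, ggamma, bgamma):
    y = 0.299 * rgamma + 0.587 * ggamma + 0.114 * bgamma
    pb = (0.5 / 0.886) * (bgamma - y)   # 0.886 = 1 - 0.114
    pr = (0.5 / 0.701) * (rgamma - y)   # 0.701 = 1 - 0.299
    return y, pb, pr

# Blue reaches the Pb extreme of +0.5, and red the Pr extreme of +0.5.
```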
YPbPr is part of the CCIR Rec. 709 HDTV standard, although different luma
coefficients are used, and it is denoted E'Pb and E'Pr with a subscript
arrangement too complicated to reproduce here.
YPbPr is employed by component analog video equipment such as M-II and
BetaCam; Pb and Pr bandwidth is half that of luma.
The international standard CCIR Recommendation 601-1 specifies eight-bit
digital coding for component video, with black at luma code 16 and white
at luma code 235, and chroma in eight-bit offset binary form centred on
128 with an excursion of +-112 (codes 16 through 240). This coding has a
slightly smaller excursion
for luma than for chroma: luma has 219 "risers" compared to 224 for Cb and
Cr. The notation CbCr distinguishes this set from PbPr where the luma and
chroma excursions are identical.
For Rec. 601-1 coding in eight bits per component,
Y_8b = 16 + 219 * Y
Cb_8b = 128 + 224 * (0.5/0.886) * (Bgamma - Y)
Cr_8b = 128 + 224 * (0.5/0.701) * (Rgamma - Y)
Some computer applications place black at luma code 0 and white at luma
code 255. In this case, the scaling and offsets above can be changed
accordingly, although broadcast-quality video requires the accommodation
for headroom and footroom provided in the CCIR 601-1 equations.
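Putting the Rec. 601-1 offsets and excursions together (a Python sketch;
note that the full chroma scale factor is 224 counts for the +-0.5 Pb/Pr
excursion, which places Cb at code 240 for saturated blue):

```python
def ycbcr_601(rgamma, ggamma, bgamma):
    y = 0.299 * rgamma + 0.587 * ggamma + 0.114 * bgamma
    y8  = 16 + 219 * y                               # 16 (black) .. 235 (white)
    cb8 = 128 + 224 * (0.5 / 0.886) * (bgamma - y)   # 16 .. 240
    cr8 = 128 + 224 * (0.5 / 0.701) * (rgamma - y)   # 16 .. 240
    return y8, cb8, cr8

# White codes to (235, 128, 128); saturated blue pushes Cb to 240.
```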
CCIR Rec. 601-1 calls for two-to-one horizontal subsampling of Cb and Cr,
to achieve 2/3 the data rate of RGB with virtually no perceptible
penalty. This is denoted 4:2:2. A few digital video systems have utilized
horizontal subsampling by a factor of four, denoted 4:1:1. JPEG and MPEG
normally subsample Cb and Cr two-to-one horizontally and also two-to-one
vertically, to get 1/2 the data rate of RGB. No standard nomenclature has
been adopted to describe vertical subsampling. To get good results using
subsampling you should not just drop and replicate pixels, but implement
proper decimation and interpolation filters.
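A toy example of 2:1 horizontal chroma decimation with a minimal [1 2 1]/4
prefilter (an assumed illustrative filter; real systems use longer ones):

```python
def subsample_422(chroma_row):
    """Halve a row of chroma samples: prefilter each kept sample with
    its neighbours ([1 2 1]/4), then drop every other sample. Edges
    are handled by repeating the boundary sample."""
    n = len(chroma_row)
    out = []
    for i in range(0, n, 2):
        left = chroma_row[max(i - 1, 0)]
        right = chroma_row[min(i + 1, n - 1)]
        out.append((left + 2 * chroma_row[i] + right) / 4)
    return out

# A flat (neutral) chroma row stays flat; sharp chroma edges are
# softened rather than aliased, unlike naive drop-and-replicate.
```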
YCbCr coding is employed by D-1 component digital video equipment.
Kodak PhotoYCC [tm]
Kodak's PhotoYCC colour space (for PhotoCD) is similar to YCbCr, except
that Y is coded with lots of headroom and no footroom, and the scaling of
Cb and Cr is different from that of Rec. 601-1 in order to accommodate a
wider colour gamut:
Y_8bit = (255/1.402) * Y
C1_8bit = 156 + 111.40 * (Bgamma - Y)
C2_8bit = 137 + 135.64 * (Rgamma - Y)
The C1 and C2 components are subsequently subsampled by factors of two
horizontally and vertically, but that subsampling should be considered a
feature of the compression process and not of the colour space.
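In the same sketch style as above, the PhotoYCC coding of a gamma-corrected
triple would be:

```python
def photo_ycc(rgamma, ggamma, bgamma):
    y = 0.299 * rgamma + 0.587 * ggamma + 0.114 * bgamma
    y8 = (255 / 1.402) * y            # white lands near code 182: headroom above
    c1 = 156 + 111.40 * (bgamma - y)
    c2 = 137 + 135.64 * (rgamma - y)
    return y8, c1, c2

# White: Y about 182 (not 255); C1 = 156 and C2 = 137 are the
# neutral (zero colour difference) offsets.
```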
In composite NTSC, PAL or S-video systems, it is necessary to scale (B-Y)
and (R-Y) so that the composite NTSC or PAL signal -- luma plus modulated
chroma -- is contained within the range -1/3 to +4/3. These limits reflect
the capability of a composite recording or transmission channel. The scale
factors are obtained by solving two simultaneous equations involving both
B-Y and R-Y, because the limits of the composite excursion are reached at
combinations of B-Y and R-Y that are intermediate to the primary colours.
The scale factors are as follows:
U = 0.493 * (B - Y)
V = 0.877 * (R - Y)
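These scale factors can be checked numerically: luma plus the modulated
chroma envelope, Y +/- sqrt(U^2 + V^2), just touches -1/3 (at red and blue)
and +4/3 (at yellow and cyan). A Python sketch:

```python
import math

def composite_extremes(rgamma, ggamma, bgamma):
    y = 0.299 * rgamma + 0.587 * ggamma + 0.114 * bgamma
    u = 0.493 * (bgamma - y)
    v = 0.877 * (rgamma - y)
    amp = math.hypot(u, v)       # peak excursion of the modulated chroma
    return y - amp, y + amp

# 100% colour bars stay within roughly -1/3 .. +4/3:
for bar in [(1, 1, 0), (0, 1, 1), (0, 1, 0), (1, 0, 1), (1, 0, 0), (0, 0, 1)]:
    lo, hi = composite_extremes(*bar)
    assert -0.34 < lo and hi < 1.34
```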
U and V components are typically modulated into a "chroma" component:
C = U*cos(w*t) + V*sin(w*t)
where w is the angular frequency of the ~3.58 MHz NTSC colour subcarrier.
PAL coding is similar, except that the V component switches Phase on
Alternate Lines (+-1), and the subcarrier is at a different frequency,
about 4.43 MHz.
It is conventional for an NTSC luma signal in a composite environment
(NTSC or S-video) to have "7.5% setup":
Y_setup = (3/40) + (37/40) * Y
A PAL signal has zero setup.
The two signals Y (or Y_setup) and C can be conveyed separately across an
S-video interface, or Y and C can be combined ("encoded") into composite
NTSC or PAL:
NTSC = Y_setup + C
PAL = Y + C
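Sampling the composite NTSC waveform at one instant, per the equations
above (a Python sketch; burst phase and blanking intervals are ignored):

```python
import math

FSC = 3.579545e6   # NTSC colour subcarrier frequency, Hz

def ntsc_sample(y, u, v, t):
    """Composite NTSC signal value at time t (seconds): luma with
    7.5% setup, plus chroma modulated on the subcarrier."""
    y_setup = 3 / 40 + (37 / 40) * y
    w = 2 * math.pi * FSC * t
    return y_setup + u * math.cos(w) + v * math.sin(w)

# Blanking-level black (u = v = 0) sits at 0.075; white at 1.0.
```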
U and V are only appropriate for composite transmission as 1-wire NTSC or
PAL, or 2-wire S-video. The UV scaling (or the IQ set, described below) is
incorrect when the signal is conveyed as three separate components.
Certain component video equipment has connectors labelled "YUV" that in
fact convey YPbPr signals.
The U and V signals above must be carried with equal bandwidth, albeit
less than that of luma. However, the human visual system has less spatial
acuity for magenta-green transitions than it does for red-cyan. Thus, if
signals I and Q are formed from a 123 degree rotation of U and V
respectively [sic], the Q signal can be more severely filtered than I --
to about 600 kHz, compared to about 1.3 MHz -- without being perceptible
to a viewer at typical TV viewing distance. YIQ is equivalent to YUV with
a 33 degree rotation -- AND AN AXIS FLIP -- in the UV plane. The first
edition of W.K. Pratt "Digital Image Processing" -- and presumably other
authors that follow that bible -- has a matrix that erroneously omits the
axis flip; the second edition corrects the error.
Since an analog NTSC decoder has no way of knowing whether the encoder was
encoding YUV or YIQ -- it cannot detect whether the encoder was running at
0 degree or 33 degree phase -- in analog usage the terms YUV and YIQ are
often used somewhat interchangeably. YIQ was important in the early days
of NTSC but most broadcasting equipment now encodes equiband U and V.
The D-2 composite digital DVTR (and the associated interface standard)
conveys NTSC modulated on the YIQ axes in the 525-line version and PAL
modulated on the YUV axes in the 625-line version.
"Television Engineering Handbook", second edition (1986) by K. Blair
Benson -- published by McGraw-Hill -- has an explanation of the principles
of colourimetry as applied to video systems. R.W.G. Hunt's book "Colour
Reproduction in Business, Science and Industry" is an excellent text on
many aspects of colour -- I recommend it highly -- but it has only a page
of rather dated information about NTSC and PAL composite video.
In the 601 Digital or 4:2:2 standard, the components are Y:U:V and the EBU
(European Broadcasting Union) clock rate is 13.5 MHz. So Y is sampled at
13.5 MHz and U and V are each sampled at 6.75 MHz. That's why it's called
4:2:2. I designed this in 1987, when D-1 was just about to be released
(this was in an ADAC standards converter).
In PAL it's YUV and in NTSC it's YIQ; lots of people will argue, and I am
not going into details, but if you do the theoretical calculation you will
agree with me. By now, though, YUV is a universal name for both PAL and
NTSC. (Only designers know the real truth.)
In Beta (don't confuse it with the home format) and M-II component formats,
YUV is used, and only the luminance and chroma scaling differ slightly
between the two.
One of the cockups in the graphics area is that it was designed by ordinary
hardware and software engineers. If this field were tackled by broadcast
engineers you would not get lots of different standards and misleading
technical information.
Broadcasters are now going to a 4:4:4 standard, and I see no reason for
this, as existing PAL and NTSC are well supported by 4:2:2; if HDTV becomes
universal then we may have to think again about a new digital format.
Thanks for the compliment on my theoretical article.
> One of the cockups in the graphics area is that it was designed by
> ordinary hardware and software engineers. If this field were tackled by
> broadcast engineers you would not get lots of different standards and
> misleading technical information.
In my opinion, Siri, it's the other way around. The "cockup" is that the
broadcasting experts use terminology loosely, and they mislead the normal
engineers. For example, as a broadcasting expert you say
> In the 601 Digital or 4:2:2 standard, the components are Y:U:V
but if you consult your copy of CCIR Rec. 601 you will find no mention of
YUV, only YCbCr; and if you compare the matrix in 601 with the YUV-RGB
matrix of your Television Engineering Handbook you will find that the
elements differ by up to 22%. YUV is not YCbCr.
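The discrepancy is easy to verify from the colour-difference scale factors
alone (a Python check; the full matrix comparison gives similar numbers,
in line with the "up to 22%" figure):

```python
# Scale factors applied to (B-Y) and (R-Y) in each space:
u_scale, v_scale = 0.493, 0.877                 # YUV, composite coding
pb_scale, pr_scale = 0.5 / 0.886, 0.5 / 0.701   # YPbPr / YCbCr, component

rel_b = abs(u_scale - pb_scale) / pb_scale   # about 13%
rel_r = abs(v_scale - pr_scale) / pr_scale   # about 23%
```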
> (Only designers know the real truth).
Well "the designers" may know the real truth but they have given us:
- four different component codings (YUV, YIQ, YCbCr and YPbPr);
- four different sets of primary chromaticities (FCC 1953 NTSC, SMPTE
RP-145, EBU 3213 and CCIR 709);
- three different gamma values (NTSC 2.2, EBU 2.8, CCIR 709 1/0.45);
- two possible setup values (zero and 7.5%);
- two different picture-to-sync ratios (10:4 and 7:3);
- four subcarrier frequencies (3.579, 3.575, 3.582, 4.433 MHz);
and this is not even to mention the French technology triumph SECAM.
> if HDTV becomes universal then we may have to think again about a new
> digital format.
One of my intentions in painstakingly documenting the way video is
currently practiced is to get it right the next time. Maybe you've seen my
postings on 1920x1080, two megapixel, 24 Hz, square pixel HDTV?
Siri, I think it's time we experts paid attention to the needs of the
"normal hardware and software engineers".
> Siri Hewa.
> OTC Research
+> The U and V signals above must be carried with equal bandwidth, albeit
+> less than that of luma. However, the human visual system has less spatial
+> acuity for magenta-green transitions than it does for red-cyan. Thus, if
+> signals I and Q are formed from a 123 degree rotation of U and V
+> respectively [sic], the Q signal can be more severely filtered than I --
+> to about 600 kHz, compared to about 1.3 MHz -- without being perceptible
+> to a viewer at typical TV viewing distance. YIQ is equivalent to YUV with
+> a 33 degree rotation -- AND AN AXIS FLIP -- in the UV plane. The first
+> edition of W.K. Pratt "Digital Image Processing" -- and presumably other
+> authors that follow that bible -- has a matrix that erroneously omits the
+> axis flip; the second edition corrects the error.
The second edition (p. 68) corrects the error in the equations, but
still doesn't mention the axis flip in the text. It says "The I and Q
signals are related to the U and V signals by a simple rotation of
coordinates in color space."
I = - U sin(33) + V cos(33)
Q = U cos(33) + V sin(33)
I prefer looking at this in matrix notation as
    | I |   | 0  1 |   |  cos(33)  sin(33) |   | U |
    |   | = |      | * |                   | * |   | ,
    | Q |   | 1  0 |   | -sin(33)  cos(33) |   | V |
which shows the axis swap and the 33 degree rotation separately.
One note: I find it interesting that the conversion matrix is
its own inverse. What do you call a matrix that has that property?
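For what it's worth, a matrix equal to its own inverse is called
involutory; this one is a reflection (symmetric and orthogonal, with
determinant -1), and every reflection has that property. A quick numerical
check of the factorization:

```python
import math

c, s = math.cos(math.radians(33)), math.sin(math.radians(33))

# Axis swap composed with the 33 degree rotation, as factored above:
M = [[-s, c],
     [ c, s]]

def matmul2(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

mm = matmul2(M, M)   # equals the 2x2 identity to within rounding
```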
Gary Sullivan (ga...@pictel.com)