ATSC/DTV Compression
Posted: Sat Dec 25, 2004 2:16 am
----- HDTV Magazine Tips List -----
Let me take a stab at this:
Jason Burroughs wrote:
>
>Hugh, there are 2 types of compression, lossy and lossless:
>
>Lossless makes a sort of table of contents and instead of saying
>111111111111111111 would say '14 1's' as one way of looking at it. In
>other words, since there is so much repetition, you can rewrite the data
>in a way that is shorter. This is what zip files on computers do.
Yes. This is known as "run-length coding".
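If you want to see the idea in code, here's a toy Python sketch.
(It isn't what zip actually uses -- real zip files use DEFLATE,
which combines LZ77 dictionary matching with Huffman coding --
but the exploit-the-repetition idea is the same.)

# Run-length coding: collapse runs of repeated symbols into
# (symbol, count) pairs, and expand them back out losslessly.
def rle_encode(data):
    encoded = []
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i]:
            run += 1
        encoded.append((data[i], run))
        i += run
    return encoded

def rle_decode(pairs):
    return ''.join(symbol * count for symbol, count in pairs)

raw = '1' * 18                          # a long run of 1's, as above
packed = rle_encode(raw)
print(packed)                           # [('1', 18)]
assert rle_decode(packed) == raw        # lossless: round-trips exactly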
Jason Burroughs also wrote:
>
>Lossy video does some things to make the file smaller but in a way that
>pulls data from the stream that the programmers decide probably will go
>unnoticed. Just like taking an analog signal from an audio master
> and
>convert to CD - they decided that frequencies above a certain level were
>undetectable and weren't needed. We all know that may not be true, but
>they did the best they could with the technology they had.
This is only partially correct. MPEG-2 compression (like many
other systems) uses a number of different tools to reduce the
amount of information within a signal. One is the Discrete Cosine
Transform (DCT), where the pixel information within an 8x8 pixel
block of an individual frame is transformed into a table of
cosine coefficients. Why cosines? Because one of the principles
of complex waveforms is that by summing a selection of cosine
waves you can come extremely close to representing any complex
waveform (which video is). One of the tools of compression is to
discard some of those cosine frequency values dynamically,
changing from one 8x8 block to the next depending on how much
data a given frame or group of frames requires and on which
frequencies the eye perceives less well than others within a
picture.
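For the programmers on the list, here's a toy numpy sketch of the
idea. The 8x8 basis matrix is the standard orthonormal DCT-II;
the 4x4 "keep" region is purely illustrative, since real MPEG-2
quantizes every coefficient against a weighting matrix rather
than zeroing the high frequencies outright:

import numpy as np

# 8x8 orthonormal DCT-II basis matrix.  T @ block @ T.T gives the
# 2-D DCT coefficients; T.T @ coeffs @ T inverts it exactly.
N = 8
n = np.arange(N)
T = np.cos(np.pi * (2 * n[None, :] + 1) * n[:, None] / (2 * N))
T[0, :] /= np.sqrt(2)
T *= np.sqrt(2.0 / N)

# A smooth 8x8 luma block: a gentle horizontal ramp.
block = np.tile(np.linspace(64, 192, N), (N, 1))

coeffs = T @ block @ T.T                # forward 2-D DCT

# Crude "compression": keep only the lowest 4x4 frequencies.
kept = np.zeros_like(coeffs)
kept[:4, :4] = coeffs[:4, :4]

restored = T.T @ kept @ T               # inverse 2-D DCT
print(abs(block - restored).max())      # small error on smooth content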
How good the encoded picture looks depends on how well the
encoder's software is written to use all of these tools.
Second, the color information is subsampled to what is called
"4:2:0". Nearly all digital video today (except for some very
specific uses) is coded 4:2:2, meaning that for every 4 samples
of Y (luma), 2 samples of Pr and 2 samples of Pb (the two
color-difference signals) are taken. That works out to 2 samples
per pixel on average, or 1440 samples across a 720-pixel line
(for SD). Since the human eye is much less sensitive to color
than to luma, for final transmission you can subsample the chroma
portion of the video by averaging the Pr/Pb samples of adjacent
lines (lines 1 and 3 for interlace, lines 1 and 2 for
progressive, for example) so that a single row of Pr/Pb samples
serves every two lines. The human eye has great difficulty seeing
this averaging, but a good example is the softness and
venetian-blind effect on the edge of a very bright red area of
the picture bordered by black. That is the chroma averaging you
are seeing.
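Here's a toy sketch of that line-averaging step, shown for the
simpler progressive case (plain averaging; real encoders use
proper filters, and interlaced material pairs lines within each
field as noted above):

import numpy as np

# 4:2:2 -> 4:2:0: the Pb/Pr planes (already half the width of Y
# in 4:2:2) are averaged across each pair of adjacent lines,
# halving the chroma resolution vertically too.  Y is untouched.
def chroma_422_to_420(pb, pr):
    pb420 = (pb[0::2, :] + pb[1::2, :]) / 2.0
    pr420 = (pr[0::2, :] + pr[1::2, :]) / 2.0
    return pb420, pr420

lines, samples = 480, 360               # 360 = half of 720 Y samples
pb = np.random.randint(16, 240, (lines, samples)).astype(float)
pr = np.random.randint(16, 240, (lines, samples)).astype(float)
pb420, pr420 = chroma_422_to_420(pb, pr)
print(pb.shape, '->', pb420.shape)      # (480, 360) -> (240, 360)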
Third, MPEG-2 uses three types of compressed frames to compress
further: I, P and B frames.
I frames are single images that have only been DCT coded. These
are also known as 'anchor frames' since they start what is called
a GOP: a Group of Pictures. The P stands for "predictive":
information from the I frame is re-used (since it hasn't changed,
or has only changed position, in the time between the I frame and
the P frame), and only the changes in location and the parts of
the picture not in the I frame are transmitted. P frames are
moderately compressed frames.
The B stands for "bidirectional", and B frames are the most
heavily compressed (i.e. the smallest data rate). They use
information from both previous and subsequent I and P frames to
recreate the frame, transmitting only the differences between the
frame being coded and those reference frames.
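A toy sketch of the predictive idea (real MPEG-2 predicts per
16x16 macroblock using motion vectors; this skips motion
compensation entirely to show just the difference coding):

import numpy as np

rng = np.random.default_rng(0)
i_frame = rng.integers(0, 256, (480, 720)).astype(np.int16)

p_frame = i_frame.copy()
p_frame[100:150, 200:260] += 20         # only a small region changes

residual = p_frame - i_frame            # what actually gets coded
print(np.count_nonzero(residual), 'of', residual.size, 'samples differ')

decoded = i_frame + residual            # the decoder rebuilds the P frame
assert np.array_equal(decoded, p_frame)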
The most common Group of Pictures sequence is IBBP (an anchor
frame followed by two B frames and ending with a P frame), but
almost any combination of frames can be created, up to 120 frames
long. The benefit is that you get the same picture quality at a
much lower data rate. The negative is that if any interruption in
the transmission occurs in the middle of a GOP, the rest of the
sequence (which could be many frames long!) will show very
noticeable artifacts, since key pieces of information are
missing. That is the blockiness or the strange colored boxes you
see in signals on the edge of breaking up: they are P and B
frames missing information needed to decode properly.
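A quick toy illustration of how that damage spreads through a GOP
(simplified: in reality the B frames just before a lost P frame
also reference it, so the damage can start slightly earlier):

# One common 12-frame GOP, in display order.
gop = list('IBBPBBPBBPBB')

lost = 6                                # say the P frame at index 6 is hit
decodable = []
broken = False
for idx, frame in enumerate(gop):
    if idx == lost:
        broken = True                   # a reference is gone from here on
    decodable.append(frame if not broken else '?')
print(''.join(gop))                     # IBBPBBPBBPBB
print(''.join(decodable))               # IBBPBB??????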
These are the basic tools MPEG-2 uses to compress, although there is
much more to it than just what I've listed here. That would take a
book to describe, and there are some good ones out there for those
who are interested.
Jason Burroughs further wrote:
>
>Raw HDTV is 150MB/sec. That is like taking a 2 megapixel digital camera
>and taking a photo and saving it on your computer in a .bmp file. It
>will be about 2MB. Save this file as a jpg and you have basically the
>same file but drastically smaller. Do that 60 more times and you have
>120MB in one second plus a little overhead. The thing is, most of the
>data on the screen doesn't actually change. So if there's a scene with a
>person in front of a house and the person moves, only the portion of the
>screen with the person has to be redrawn. Couple the lossless
>compression in the bmp to jpg example and the lossy compression and you
>wind up with a picture that looks *almost* as good as the original but
>with drastically smaller size.
>
>Jason Burroughs
This is DEAD WRONG.
The raw HDTV data rate is 1.22 Gbps for 1080i29.97 and 978 Mbps
for 720p60; other frame rates give slightly different but very
similar numbers. Including audio and other timing data, the data
rate broadcast HDTV equipment uses to record and distribute
uncompressed HD is 1.485 Gbps. That's 1,485 Mbps. This standard,
SMPTE 292M, is known as HD-SDI (High Definition Serial Digital
Interface) and is used throughout the broadcast industry.
For SD, the raw data rate is 270 Mbps for 4:3 systems (whether
483i29.97 or 576i25), or 360 Mbps for 16:9-formatted 483-line
(525-line in analog) or 576-line (625-line in analog) video. Some
of you may be wondering where these odd line counts come from: in
digital there is no such thing as the Vertical Interval (the
black bars at the top and bottom of an analog signal that give
the electron beam time to retrace at the end of each field), so
only the lines actually used for the picture are counted in
digital formats.
The SD standard for broadcast recording and distribution is known
as SMPTE 259M, or SDI: Serial Digital Interface.
If anyone would like to know the math that is behind these numbers,
email me off list and I'll be happy to walk you through it.
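For the curious, the interface rates at least are easy to work
out (these are the nominal SMPTE figures; divide by 1.001 for
29.97 Hz systems):

# SMPTE 292M (HD-SDI) carries the full 1125-line, 2200-sample
# raster (active picture plus blanking) at 10 bits per sample,
# with a Y stream and a multiplexed Cb/Cr stream in parallel.
hd_sdi = 2200 * 1125 * 30 * 2 * 10
print(hd_sdi)                           # 1,485,000,000 = 1.485 Gbps

# SMPTE 259M (SDI): 13.5 MHz luma sampling plus two 6.75 MHz
# chroma streams, 10 bits per sample.
sd_sdi = (13500000 + 2 * 6750000) * 10
print(sd_sdi)                           # 270,000,000 = 270 Mbps

# The 16:9 SD variant samples luma at 18 MHz (chroma at 9 MHz).
sd_sdi_wide = (18000000 + 2 * 9000000) * 10
print(sd_sdi_wide)                      # 360,000,000 = 360 Mbps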
<Rodolfo> wrote:
>
>DVI or HDMI puts the removed data back into the signal.
>
No. The MPEG-2 decoder does the decoding, not the DVI or HDMI
connectors. They are simply conduits for getting the decoded signals
to your display.
Hope this helps!
James Snyder
Senior Video Technician
Intelsat Ground Network Systems
formerly DTV Consultant/Lecturer
PBS DTV Strategic Services Group
Harris/PBS DTV Express DTV Studio Technology lecturer