Digital Video for the Next Millennium


This publication is copyright 1999 by the Video Development Initiative (ViDe). The document may not be reproduced, in whole or in part, without written permission from ViDe, except that a single copy for personal use may be printed by the reader. Please direct all comments to the author of this white paper.

Section Two: Video Encoding Standards
MPEG-1 (ISO/IEC 11172)

The first MPEG digital video and audio encoding standard, MPEG-1, was adopted as an international standard in 1992 to provide digital video at bit rates up to 1.5 Mb/sec. (The standard actually scales higher than 1.5 Mb/sec, but 1.5 Mb/sec is the accepted "sweet spot" for MPEG-1.) The impetus for the standard was to provide encoding of VHS-quality digital video for CD-ROM playback. MPEG-1 encodes progressive (non-interlaced) video sequences only. The standard implementation of MPEG-1 (known as the "constrained parameters bitstream") supports 352 pixels x 240 lines at 30 frames/sec and requires about 1.5 Mbit/sec of bandwidth for transport. MPEG-1 compression exploits the considerable redundancy of information within and between frames to compress a video object without significantly compromising the integrity of the information it contains.
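To see why that much compression is needed, consider the raw data rate of 352 x 240 video at 30 frames/sec. The sketch below is a back-of-the-envelope calculation, not part of the standard; it assumes 8-bit samples, 4:2:0 chroma subsampling (12 bits per pixel on average) and, for comparison, uncompressed 24-bit RGB.

```python
# Back-of-the-envelope data rates for 352 x 240 video at 30 frames/sec.
# Assumptions: 8-bit samples; 4:2:0 averages 12 bits/pixel, RGB uses 24.
WIDTH, HEIGHT, FPS = 352, 240, 30
TARGET_BPS = 1_500_000  # the 1.5 Mbit/sec MPEG-1 "sweet spot"


def raw_bitrate(width, height, fps, bits_per_pixel):
    """Uncompressed bit rate in bits per second."""
    return width * height * fps * bits_per_pixel


rgb = raw_bitrate(WIDTH, HEIGHT, FPS, 24)
yuv420 = raw_bitrate(WIDTH, HEIGHT, FPS, 12)

for label, bps in (("24-bit RGB", rgb), ("4:2:0 YCbCr", yuv420)):
    # Prints 60.8 Mbit/s (about 41:1) for RGB and 30.4 Mbit/s (about 20:1) for 4:2:0.
    print(f"{label}: {bps / 1e6:.1f} Mbit/s, "
          f"about {bps / TARGET_BPS:.0f}:1 to reach 1.5 Mbit/s")
```

Even after 4:2:0 subsampling, roughly 95 percent of the remaining data (about 20:1) must still be removed by spatial and temporal compression.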

Video contains spatial, spectral and temporal redundancies, which can be compressed without significant loss of meaning. The encoding techniques in MPEG-1 exploit statistical redundancies in the spatial and temporal dimensions. Spatial redundancy arises from the similarity of color values shared by adjacent pixels: a red sweater in a video frame will generally have a uniform color value, with little or no perceptual variation from one pixel to the next. MPEG-1 applies intraframe spatial compression to such redundant color values using the DCT (discrete cosine transform) on 8x8 blocks of pixels. The DCT is an invertible transform that concentrates the energy of a smooth block into a few low-frequency coefficients, so the remaining near-zero coefficients can be coarsely quantized and efficiently coded with run-length and variable-length codes.
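The energy-compaction property can be illustrated with a direct (and deliberately slow) orthonormal 2-D DCT-II over one 8x8 block. This is a textbook formulation written for clarity, not the fast transform a real MPEG-1 encoder would use; the helper names and the flat block value of 120, standing in for the red sweater, are illustrative assumptions.

```python
import math

N = 8  # MPEG-1 applies the DCT to 8x8 pixel blocks


def dct_2d(block):
    """Direct orthonormal 2-D DCT-II of an N x N block given as lists of rows."""
    def scale(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    coeffs = [[0.0] * N for _ in range(N)]
    for u in range(N):          # vertical frequency
        for v in range(N):      # horizontal frequency
            total = 0.0
            for x in range(N):      # row index
                for y in range(N):  # column index
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            coeffs[u][v] = scale(u) * scale(v) * total
    return coeffs


def significant(coeffs, threshold=1e-6):
    """(u, v) positions of coefficients whose magnitude exceeds threshold."""
    return [(u, v) for u in range(N) for v in range(N)
            if abs(coeffs[u][v]) > threshold]


# A perfectly flat block, e.g. part of the uniformly colored sweater.
flat = [[120] * N for _ in range(N)]
flat_coeffs = dct_2d(flat)
print(round(flat_coeffs[0][0], 6), significant(flat_coeffs))  # 960.0 [(0, 0)]

# A gentle left-to-right brightness ramp (100..107).
ramp = [[100 + y for y in range(N)] for _ in range(N)]
print(significant(dct_2d(ramp)))
```

The flat block collapses to a single DC coefficient, and the ramp, which varies only horizontally, leaves nonzero values only in the top row of horizontal frequencies; everything else is zero and costs almost nothing to code.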

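Spectral redundancy in video is the correlation among a pixel's color components, whose red, green and blue values largely track its overall brightness. MPEG-1 operates in the YUV color space (more precisely, YCbCr): 24-bit RGB data is converted to YCbCr, where Y is luminance (brightness) and Cb and Cr are chrominance (color difference) components, and the chrominance is then subsampled at 4:2:0, keeping one Cb and one Cr sample for each 2x2 block of luminance samples. This works because the human eye distinguishes differences in brightness more readily than differences in pure color value, and it halves the raw data from 24 to 12 bits per pixel before any further compression.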
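The conversion and subsampling steps can be sketched as follows. The sketch assumes the full-range ITU-R BT.601 ("JPEG-style") conversion coefficients and forms each subsampled chroma value by simply averaging a 2x2 block; a real encoder may use studio-range values and different chroma siting and filtering, so treat this as an illustration of 4:2:0 rather than a conforming implementation. The function names are hypothetical.

```python
def rgb_to_ycbcr(r, g, b):
    """Full-range BT.601 RGB -> YCbCr (assumed coefficients, 8-bit inputs)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def subsample_420(plane):
    """Keep one chroma value per 2x2 block by averaging (even dimensions assumed)."""
    return [[(plane[i][j] + plane[i][j + 1]
              + plane[i + 1][j] + plane[i + 1][j + 1]) / 4
             for j in range(0, len(plane[0]), 2)]
            for i in range(0, len(plane), 2)]


def to_ycbcr_420(image):
    """Convert rows of (R, G, B) pixels to a full Y plane and 4:2:0 Cb/Cr planes."""
    converted = [[rgb_to_ycbcr(*pixel) for pixel in row] for row in image]
    y = [[p[0] for p in row] for row in converted]
    cb = [[p[1] for p in row] for row in converted]
    cr = [[p[2] for p in row] for row in converted]
    return y, subsample_420(cb), subsample_420(cr)


# A 4x4 patch of a red sweater.
image = [[(200, 30, 40)] * 4 for _ in range(4)]
y, cb, cr = to_ycbcr_420(image)
samples = sum(len(row) for plane in (y, cb, cr) for row in plane)
print(samples, samples * 8 / 16)  # 24 12.0 (RGB would need 48 samples, 24 bits/pixel)
```

The 16 luminance samples are kept at full resolution, while each chroma plane shrinks from 16 samples to 4.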

Temporal redundancy is the similarity of content between successive video frames. If frames were not largely redundant, there would be no perception of smooth, realistic motion in video. MPEG-1 relies on prediction, more precisely motion-compensated prediction, for temporal compression between frames. MPEG-1 uses three frame types to achieve temporal compression: I-frames, P-frames and B-frames. An I-frame is an intra-coded frame, a single image heading a group of pictures, encoded with no reference to past or future frames; MPEG-1 compresses it only within the frame. P-frames are forward-predicted frames, encoded with reference to a past I- or P-frame, using motion vectors that point to matching information in that past frame. B-frames (bidirectionally predicted frames) are encoded with reference to a past reference frame, a future reference frame or both, so the motion vectors employed may be forward, backward, or both. B-frames are also sometimes known as digital video "spackle."
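One practical consequence of B-frames is that a decoder needs both of a B-frame's reference frames before it can reconstruct it, so the encoder transmits each future I- or P-frame ahead of the B-frames that depend on it. The hypothetical helper below sketches that reordering for a pattern of frame types given in display order; it assumes the pattern ends with an I- or P-frame and ignores details such as open and closed groups of pictures.

```python
def coding_order(display):
    """Reorder frame types from display order to coding (transmission) order.

    Returns (display_index, frame_type) pairs. Each B-frame is held back
    until the following I- or P-frame, its future reference, is emitted.
    """
    coded, pending_b = [], []
    for idx, ftype in enumerate(display):
        if ftype == "B":
            pending_b.append((idx, ftype))
        elif ftype in ("I", "P"):
            coded.append((idx, ftype))   # send the reference frame first
            coded.extend(pending_b)      # then the B-frames that needed it
            pending_b = []
        else:
            raise ValueError(f"unknown frame type {ftype!r}")
    if pending_b:
        raise ValueError("this sketch requires the pattern to end with I or P")
    return coded


order = coding_order("IBBPBBP")
print("".join(t for _, t in order))  # IPBBPBB
print([i for i, _ in order])         # [0, 3, 1, 2, 6, 4, 5]
```

The decoder reverses this, displaying frames by their original index once both references for each B-frame are available.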

The MPEG-1 coding standard is a generic standard, intended to be independent of a specific application, serving as a toolbox to be adapted to different applications and their associated hardware and software.