The latest video compression standard – H.264 – represents a major development for IP video systems. Barry Keepence explains what’s behind the standard, and the benefits it can realise for end users. Illustrations courtesy of IndigoVision

H.264 is the latest official video compression standard. It follows on from the highly successful MPEG-2 and MPEG-4 video standards, offering improvements in both video quality and compression. The most significant benefit for IP video systems is the ability to deliver the same high quality, low-latency digital video with savings of between 25% and 50% on bandwidth and storage requirements. Put another way, it delivers significantly higher video quality for the same bandwidth.

H.264 is a video codec (compressor and decompressor) standard. A video codec is designed to compress and decompress digital video in order to reduce the amount of bandwidth required to transmit and store the video. This is needed because the raw data rate of uncompressed CCIR601 active digital video (720 x 480 pixel 4:2:2 video at 30 fps) is in excess of 158 Mbps – over 300 times the capacity of a 512 kbps ADSL connection, and enough to fill an 80 GB hard disk with only just over one hour of recording.
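The figures in the paragraph above can be reproduced with a few lines of arithmetic. The sketch below assumes 8 bits per sample (so 4:2:2 video averages 16 bits per pixel) and a disk measured in decimal gigabytes; the function and constant names are illustrative only. The quoted 158 Mbps appears to correspond to a "mega" of 2^20 bits, which is an inference rather than something the text states.

```python
def raw_bit_rate(width, height, bits_per_pixel, fps):
    """Uncompressed video bit rate in bits per second."""
    return width * height * bits_per_pixel * fps


# Assumption: 8 bits per sample. 4:2:2 sampling then averages 16 bits per
# pixel (8 luma bits plus 8 chroma bits shared between neighbouring pixels).
RATE_601 = raw_bit_rate(720, 480, 16, 30)       # 165,888,000 bit/s

MBPS_BINARY = RATE_601 / 2**20                  # about 158.2 (mega = 2^20)
ADSL_MULTIPLE = RATE_601 / 512_000              # 324 times a 512 kbps link

DISK_BITS = 80 * 10**9 * 8                      # assumed 80 GB, decimal
HOURS_ON_DISK = DISK_BITS / RATE_601 / 3600     # about 1.07 hours

print(f"{RATE_601:,} bit/s raw")
print(f"{ADSL_MULTIPLE:.0f} x ADSL, {HOURS_ON_DISK:.2f} h on disk")
```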

Simply scaling the video to SIF resolution (352 x 240 pixel 4:2:0 video at 30 fps) and compressing with standard utilities such as WinZip or Gzip could achieve 10:1 compression. However, at least 300:1 compression is needed to stream live video over an ADSL connection and to achieve 300 hours of recording on an 80 GB hard disk. This level of compression can be achieved with H.264.
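As a rough check on these ratios, the sketch below works out how much of the 10:1 comes from scaling alone and what compression the ADSL and recording targets imply. It assumes 8 bits per sample (16 bits per pixel for 4:2:2, 12 for 4:2:0) and a decimal 80 GB disk; all names are illustrative.

```python
RAW_601 = 720 * 480 * 16 * 30    # 4:2:2 source, 165,888,000 bit/s
RAW_SIF = 352 * 240 * 12 * 30    # 4:2:0 SIF, 30,412,800 bit/s

# Scaling to SIF already accounts for about 5.5:1 of the quoted 10:1;
# the remainder would have to come from the general-purpose compressor.
scaling_ratio = RAW_601 / RAW_SIF


def bit_rate_for_recording(disk_bytes, hours):
    """Highest average bit rate (bit/s) that fits `hours` on the disk."""
    return disk_bytes * 8 / (hours * 3600)


target_bps = bit_rate_for_recording(80 * 10**9, 300)   # about 593 kbps
ratio_for_disk = RAW_601 / target_bps                  # about 280:1
ratio_for_adsl = RAW_601 / 512_000                     # 324:1

print(f"scaling only: {scaling_ratio:.2f}:1")
print(f"300 h on disk: {ratio_for_disk:.0f}:1, ADSL: {ratio_for_adsl:.0f}:1")
```

Both targets land close to the 300:1 figure, measured against the full-resolution source.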

Standards and their implementation

Before we look at H.264 in more detail, it’s important to understand the difference between a standard and an implementation of that standard. The two are very different. Thus when people say: “H.264 provides better video quality than MPEG-2” this is a little misleading. H.264 is a video compression standard. It defines the syntax of a compliant bit stream, and a compliant decoder must conform to it exactly, implementing all the necessary tools defined by the standard in order to decode that bit stream.

Conversely, an H.264 encoder can implement a subset of the syntax defined by the standard, provided it produces a compliant bit stream. Various implementations and algorithms within the encoder are also not defined by the standard, and are created by the designer of the encoder. As such, H.264 encoders from different vendors will produce streams of differing quality for the same bit rate.

Thus it is far more appropriate to say: “H.264 provides a richer syntax and toolset than MPEG-2 and, as such, allows the possibility of implementing a superior video encoder that can generate higher quality video for the same bit rate, and can generate the same quality video at a much lower bit rate”. This can be demonstrated using the reference software encoder (JM11) freely available from the International Organization for Standardization (ISO). The H.264 reference encoder allows a user to select which tools to use in order to encode a particular video sequence.

The more tools and algorithms that are used, the greater the compression achieved for the same quality of video. However, the addition of tools comes at the expense of increased complexity – in this case measured by the execution time of the encoding process. It is this increase in complexity that often causes some tools or algorithms to be omitted from the design of an H.264 encoder.

Relationship to MPEG-4 Part 2

MPEG-4 (ISO/IEC 14496) is a collection of standards defining the coding of audio-visual objects. The collection is divided into a number of parts describing video compression and audio compression standards, as well as system level parts, describing features such as the MPEG-4 file format.

The video compression standard found in many products is the traditional DCT-based MPEG-4 Part 2 (ISO/IEC 14496-2) standard.

The H.264 video compression standard has been incorporated into MPEG-4 as MPEG-4 Part 10 (ISO/IEC 14496-10). This means MPEG-4 now has two video compression standards available. However, these two video compression standards are non-interoperable, with each standard using different methods to compress and represent the data (ie an MPEG-4 Part 10 H.264 decoder cannot decode an MPEG-4 Part 2 bit stream and vice versa).

The best way to see the benefits of H.264 in IP video solutions is to look at an actual implementation of the standard, in this case IndigoVision’s new 9000 Series. Inside a 9000 transmitter, frames of video are captured from the camera and sent to the internal H.264 encoder to be compressed. Each frame of video is then compressed in one of two ways: as an I-frame or as a P-frame.

An I-frame is a video frame that has been encoded without reference to any other frame of video. A video stream or recording will always start with an I-frame and will typically contain regular I-frames throughout the stream. These regular I-frames – also called intra frames, key frames or access points – are crucial for random access of recorded H.264 files, such as with rewind and seek operations during playback. The regularity of these I-frames is known as the I-frame interval. However, the disadvantage of I-frames is that they tend to be much larger than P-frames.
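The role of the I-frame interval in random access can be sketched as follows. This is a simplified model, assuming a fixed interval and a stream containing only I-frames and P-frames; the function names are hypothetical and do not come from any particular product.

```python
def frame_types(num_frames, i_frame_interval):
    """Label each frame 'I' or 'P', with an I-frame every interval frames."""
    return ["I" if n % i_frame_interval == 0 else "P"
            for n in range(num_frames)]


def seek_start(target_frame, i_frame_interval):
    """Index of the I-frame from which decoding must begin."""
    return (target_frame // i_frame_interval) * i_frame_interval


def frames_to_decode(target_frame, i_frame_interval):
    """Frames decoded before the target frame can be displayed."""
    return target_frame - seek_start(target_frame, i_frame_interval) + 1


print("".join(frame_types(12, 4)))    # IPPPIPPPIPPP
print(seek_start(10, 4))              # 8
print(frames_to_decode(10, 4))        # 3
```

A shorter interval means fewer frames to decode on each seek, at the cost of more of the larger I-frames in the stream.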

P-frames are motion-compensated frames, that is to say the encoder makes use of the difference between the current frame being processed and a previous frame of video, ensuring that information that does not change (for example a static background) is not repeatedly transmitted. Unlike purely difference-based codecs, such as delta-MJPEG, H.264 not only looks for differences but searches for motion that has occurred in the video. This means that motion-compensated codecs will typically outperform simple difference-based codecs when there is motion. The process of searching for motion is known as motion estimation.
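The distinction between simple differencing and motion compensation can be illustrated with a toy example. The sketch below is not how any real H.264 encoder is written: it assumes tiny synthetic frames, a 4 x 4 block and an exhaustive search scored by the sum of absolute differences (SAD), and every name in it is illustrative.

```python
def make_frame(width, height, box_x, box_y, box_size=4):
    """Black frame with one bright square standing in for a moving object."""
    frame = [[0] * width for _ in range(height)]
    for y in range(box_y, box_y + box_size):
        for x in range(box_x, box_x + box_size):
            frame[y][x] = 255
    return frame


def sad(cur, ref, bx, by, dx, dy, block=4):
    """SAD between a block in cur and the block displaced by (dx, dy) in ref."""
    return sum(abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
               for y in range(block) for x in range(block))


def best_motion_vector(cur, ref, bx, by, search=2, block=4):
    """Exhaustive search over displacements within +/- search pixels."""
    height, width = len(ref), len(ref[0])
    best_cost, best_vec = sad(cur, ref, bx, by, 0, 0, block), (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            inside_x = 0 <= bx + dx and bx + dx + block <= width
            inside_y = 0 <= by + dy and by + dy + block <= height
            if not (inside_x and inside_y):
                continue  # candidate block would fall outside the frame
            cost = sad(cur, ref, bx, by, dx, dy, block)
            if cost < best_cost:
                best_cost, best_vec = cost, (dx, dy)
    return best_cost, best_vec


previous = make_frame(16, 16, box_x=4, box_y=4)
current = make_frame(16, 16, box_x=6, box_y=4)    # square moved 2 px right

plain_difference = sad(current, previous, 6, 4, 0, 0)
compensated, vector = best_motion_vector(current, previous, 6, 4)
print(plain_difference, compensated, vector)      # 2040 0 (-2, 0)
```

With plain differencing the moved square leaves a large residual to encode; once the motion vector is found, the residual for that block drops to zero and only the vector needs to be sent.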

Within the codec the motion estimation unit is one of the most computationally expensive parts and critical to the performance of the H.264 encoder. Motion estimation is a complex procedure and often encoders, especially real-time software or DSP-based encoders, will use reduced search areas or a restrictive search algorithm in order to achieve real-time performance. However, this can often result in poor quality video and significantly reduced compression.
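The trade-off described above can be put into rough numbers. The sketch below counts the work for an exhaustive search on 16 x 16 macroblocks at SIF resolution; the search ranges of 16 and 4 pixels are hypothetical values chosen for illustration, and real encoders use faster search strategies than this simple model.

```python
def candidates_per_block(search_range):
    """Positions tested by an exhaustive search of +/- search_range pixels."""
    return (2 * search_range + 1) ** 2


def comparisons_per_frame(width, height, search_range, block=16):
    """Pixel comparisons for one frame, ignoring frame-edge effects."""
    blocks = (width // block) * (height // block)
    return blocks * candidates_per_block(search_range) * block * block


def motion_is_reachable(dx, dy, search_range):
    """True if a displacement lies inside the search window."""
    return abs(dx) <= search_range and abs(dy) <= search_range


full = comparisons_per_frame(352, 240, search_range=16)
reduced = comparisons_per_frame(352, 240, search_range=4)
print(f"{full:,} vs {reduced:,} comparisons per frame")
print(motion_is_reachable(6, 0, 16), motion_is_reachable(6, 0, 4))
```

In this model the reduced window cuts the work by a factor of about 13, but an object moving six pixels between frames falls outside it, so the encoder would have to spend bits on a large residual instead.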

An example of the savings that can be achieved on a scene is demonstrated in the graph reproduced above. In this example, the same video sequence has been encoded using four different encoders: IndigoVision 8000 MPEG-4, the new IndigoVision 9000 H.264, an MPEG-4 encoder with no motion estimation and an MJPEG encoder. All were encoded at 25 fps (with the exception of MJPEG at 5 fps) to the same subjective video quality.

The graph shows that, compared to MPEG-4, H.264 can achieve savings of typically between 20% and 25% in bandwidth usage and in excess of 50% during periods of scene inactivity – ie when there is no moving traffic. Not only does this reduce the overall bandwidth requirements of the IP video system but, more importantly, it can significantly reduce the amount of storage required for recording the video – often one of the most expensive items in the system.
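To see what such savings mean for storage, consider the sketch below. The 1,000 kbps MPEG-4 bit rate and the 30-day retention period are hypothetical values chosen only to make the arithmetic concrete; they are not taken from the graph.

```python
def storage_gb(bit_rate_kbps, hours, cameras=1):
    """Storage in decimal gigabytes for continuous recording."""
    bits = bit_rate_kbps * 1000 * 3600 * hours * cameras
    return bits / 8 / 10**9


def with_saving(bit_rate_kbps, saving):
    """Bit rate after a fractional saving, e.g. 0.25 for 25%."""
    return bit_rate_kbps * (1 - saving)


MPEG4_KBPS = 1000       # hypothetical bit rate, not a measured figure
HOURS = 24 * 30         # hypothetical 30 days of continuous recording

baseline = storage_gb(MPEG4_KBPS, HOURS)
for saving in (0.20, 0.25, 0.50):
    h264 = storage_gb(with_saving(MPEG4_KBPS, saving), HOURS)
    print(f"{saving:.0%} saving: {baseline:.0f} GB -> {h264:.1f} GB per camera")
```

Because storage scales linearly with bit rate, the percentage saved on bandwidth carries straight through to disk capacity, multiplied by the number of cameras.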

Demands on processing

It’s clear from looking at how H.264 can be implemented that the demands on the processing power of the codec are significant if the full range of features is used and the full benefits of the technology are to be realised. H.264 is a general purpose video compression standard not specifically designed for digital CCTV applications.

However, by using a custom FPGA-based design, the necessary processing power can be provided and the design tailored for CCTV applications.

For example, extra compression can be achieved when there is low activity in the video – a situation common in many surveillance applications. The custom FPGA approach has a number of other benefits, as follows:

*high quality video can be maintained during fast-moving activity without frames being dropped, regardless of bit rate and motion (this is paramount in applications such as casino gaming table surveillance);

*low cost, high performance encoding of 4SIF 30 fps video that is fully compliant with H.264;

*field upgrade to existing installations as compression standards advance;

*real-time analytics algorithms can be executed in high performance dedicated hardware rather than in software – doing this at the edge of the network, ie at the camera, makes for a truly scalable solution.

Ultimately, H.264 offers significant benefits to the user and system designer. However, the extra complexity of the implementation comes at extra cost. H.264 will not replace MPEG-4 overnight, but rather sit alongside it providing a wider choice of solutions to the end user.

SIF or CIF? Which one is right?

Well, they are both right, but one means you have MPEG-4 while the other means that you’re using an old H.261/H.263-based compression. Which one is which? writes Brian Sims.

CIF and SIF are measures of video resolution. Basically, CIF resolution measures 352 x 288 pixels regardless of whether the video input is NTSC or PAL. SIF resolution, on the other hand, measures 352 x 288 pixels for PAL cameras but 352 x 240 for NTSC cameras. You may also see it in other literature expressed as 320 x 240 pixels. This is also a valid SIF resolution. Though it may not appear to be so from the numbers, there is absolutely no advantage to be gained from using CIF over and above SIF.

CIF is commonly associated with H.261/H.263 and SIF with MPEG. Unfortunately, because of the phonetic similarities (and the fact that, for PAL sources, they are identical) these terms are sometimes used interchangeably. Strictly speaking, though, they are distinct. Thus on occasion when individuals refer to CIF they actually mean SIF.
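The resolutions quoted above can be summarised as a small lookup table. The sketch below simply restates those numbers; the dictionary and function names are illustrative, and the 320 x 240 SIF variant is left out for brevity.

```python
# (format, video standard) -> (width, height), as quoted in the text.
RESOLUTIONS = {
    ("CIF", "PAL"): (352, 288),
    ("CIF", "NTSC"): (352, 288),
    ("SIF", "PAL"): (352, 288),
    ("SIF", "NTSC"): (352, 240),
}


def formats_match(video_standard):
    """True if CIF and SIF give the same resolution for this standard."""
    return (RESOLUTIONS[("CIF", video_standard)]
            == RESOLUTIONS[("SIF", video_standard)])


print(formats_match("PAL"))     # True
print(formats_match("NTSC"))    # False
```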

If your system is CIF (or 2CIF or 4CIF) then you are using an old H.261/H.263-based codec. You will not be compliant with MPEG-4 or H.264. All MPEG-4 and H.264 systems are based on SIF. In other words, if you want genuine MPEG-4, whether Part 2 or Part 10, you will need to spell it with an ‘S’!
