H.264 is an open video compression standard. Uniquely, it is the first compression format to be developed jointly by members of both the IT and telecommunications industries, and each body has its own name for it. H.264 is the name used by the ITU-T (the standardization sector of the International Telecommunication Union) and MPEG-4 Part 10 AVC (Advanced Video Coding) is the name used by the ISO (International Organization for Standardization). The video surveillance industry has adopted the term H.264 and this has become the primary reference to the standard. This is also the term we use.
H.264 is fast becoming the standard video compression format for the video surveillance world and if we look at the claims it makes we can see why. We hear bold statements about low bandwidth usage, reduced storage requirements, higher resolution monitoring and better quality images and it all sounds too good to be true… doesn’t it?
Benefits for security surveillance
Demands from the security industry constantly push for higher resolution monitoring and faster frame rates without any compromise on image quality. With conventional compression formats this just isn’t possible but with the introduction of H.264 we can see many benefits which can improve the quality of security surveillance applications.
A bit about networking
The bitrate is the number of bits traveling between two devices in a given period of time. It is typically measured per second and expressed as kilobits per second (Kbps), megabits per second (Mbps; 1Mb = 1000Kb) or gigabits per second (Gbps; 1Gb = 1000Mb). It is also known as the transmission speed, transmission rate or data rate. The bitrate of an IP camera directly affects the bandwidth usage on a network. Bandwidth is the term used to describe the maximum amount of data which can travel along a specific channel in a fixed amount of time, i.e. it’s the maximum amount of data you can transfer over your network at any given time.
Bandwidth is probably the most valuable commodity in networking. If your surveillance system uses more bandwidth than is available then you will lose live video feeds and have breaks in your recordings. If you have computers attached to the same network their network connections will become slow and unusable.
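To make the relationship between camera bitrate and network bandwidth concrete, here is a minimal Python sketch. The function names, the 25% headroom and the camera figures are our own illustrative assumptions, not measurements or manufacturer values.

```python
KBPS_PER_MBPS = 1000  # 1 Mbps = 1000 Kbps

def total_bitrate_mbps(camera_bitrates_kbps):
    """Sum per-camera bitrates (given in Kbps) and return the total in Mbps."""
    return sum(camera_bitrates_kbps) / KBPS_PER_MBPS

def fits_on_link(camera_bitrates_kbps, link_bandwidth_mbps, headroom=0.25):
    """Return True if the cameras fit while leaving `headroom` of the link free.

    The 25% default headroom is an assumption for illustration only.
    """
    usable = link_bandwidth_mbps * (1 - headroom)
    return total_bitrate_mbps(camera_bitrates_kbps) <= usable

# Hypothetical example: ten cameras streaming 2000 Kbps each.
cameras = [2000] * 10
print(total_bitrate_mbps(cameras))  # 20.0
print(fits_on_link(cameras, 100))   # True on a 100 Mbps link
print(fits_on_link(cameras, 25))    # False on a 25 Mbps link
```

If the check fails, the options are the ones discussed in this article: a faster link, fewer cameras, or a lower bitrate per camera.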
What H.264 offers is a low bitrate for a reduction in bandwidth usage. In fact it offers significant reductions, 80% lower than Motion JPEG video and 30-50% lower than MPEG-4.
Lowering the bitrate of each stream means that more video can be carried over the same bandwidth, whether as more cameras or as more frames per second from each camera. This is ideal for security applications which need a fast frame rate such as casinos, traffic monitoring, object counting (such as vehicles, people) etc.
Having a low bitrate also reduces the file sizes for surveillance recordings. When designing a new video surveillance system attention has to be paid to the amount of storage required for recordings. The maximum amount of storage space determines how many days of archive material you can retain at any one time. Using H.264 will provide 30-80% total saving on storage space compared to conventional compression formats. This reduces the cost of ownership or allows you to dramatically increase the retention period for your recorded archives.
Using H.264 allows you to make cost savings when designing a new surveillance system. For example, if H.264 provides a 50% saving over conventional compression methods this would mean you could halve the amount of storage attached to your system or you can retain archives for twice as long. This makes H.264 an ideal format to use when designing large systems which require a lot of storage.
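The storage arithmetic can be sketched in a few lines of Python. The bitrates and the 1000GB disk below are hypothetical figures chosen to illustrate a 50% saving; they are not test results.

```python
SECONDS_PER_DAY = 86400

def storage_gb_per_day(bitrate_kbps, cameras=1):
    """Storage used per day, in GB (1 GB = 10^9 bytes), for continuous recording."""
    bits_per_day = bitrate_kbps * 1000 * SECONDS_PER_DAY * cameras
    return bits_per_day / 8 / 1e9

def retention_days(storage_gb, bitrate_kbps, cameras=1):
    """Number of days of archive that fit in the given storage."""
    return storage_gb / storage_gb_per_day(bitrate_kbps, cameras)

# Hypothetical figures: one camera, 1000 GB of storage.
print(storage_gb_per_day(2000))               # 21.6 GB per day at 2000 Kbps
print(f"{retention_days(1000, 2000):.1f}")    # 46.3 days at 2000 Kbps
print(f"{retention_days(1000, 1000):.1f}")    # 92.6 days at half the bitrate
```

Halving the bitrate doubles the retention period for the same disk, or equally lets you keep the same retention period with half the storage.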
Of course, the savings in bandwidth and storage space mean nothing if the image quality suffers. For us this is where H.264 shines. We have taken a look at quite a few IP cameras with H.264 support and can say that the image quality does not suffer. We would say that it is almost JPEG quality and certainly outclasses MPEG-4, especially with moving objects, particularly in mid-high bandwidth networks.
Image quality is of great importance for any security surveillance application. Without high quality images a security system is worthless. H.264 takes the benefits of MPEG-4, improves on them and also provides far better quality output making it ideal for security around the home, small businesses or large businesses with mission critical needs.
The difference in quality can be attributed to a few factors: advanced motion compensation, intra-prediction encoding, and an in-loop deblocking filter which smooths out the images. We’ll talk more about how H.264 works later in this article.
Latency is the time it takes to encode, send, decode and display the video to the operator. H.264 offers a low latency (in milliseconds) which is a necessity for surveillance monitoring. Without low latency the images would not appear in real-time to control room operators and cameras with pan-tilt-zoom (PTZ) control would be difficult to operate as the images would not update in time with the controls.
Higher Res/megapixel uptake
We can look at H.264 from another angle. When you use H.264 to view a conventional VGA video stream you will make great savings in terms of bandwidth and storage requirements, as we have already mentioned, but what it also gives us is the ability to deliver higher resolution video and better image quality at the same bitrates as we are using for conventional VGA cameras.
The professional market is always looking for higher resolution video and faster frame rates. We think the quick adoption of H.264 will see a rise in megapixel and HDTV cameras which will be able to meet the market’s demands while maintaining current levels of bandwidth and storage use. Axis has already released their first HDTV camera with the Q1755 and the results are impressive.
Video compression involves the removal of redundant video data so that the video file can be transmitted or stored effectively. Without any form of compression the raw data rate of a video stream would exceed 150Mbps, which is around 300 times the bandwidth of a 512Kbps ADSL connection, and an 80GB hard disk drive would only hold around 1 hour’s worth of video. This is why we need to use compression.
Video compression works by using a pair of algorithms. One algorithm is used to encode the source video stream while a reverse algorithm is used to decode the video stream and display it at, or close to, the quality of the source video. Combined, these two algorithms are known as a codec (encode/decode, compress/decompress).
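The figures above can be checked with some simple arithmetic. In the Python sketch below, the frame size, frame rate and bits per pixel (720x576 at 25 frames per second and 16 bits per pixel) are our own assumptions for a standard-definition source; the article's 150Mbps, 512Kbps and 80GB figures are used as given.

```python
def raw_bitrate_mbps(width, height, fps, bits_per_pixel):
    """Uncompressed video data rate in Mbps (1 Mbps = 10^6 bits per second)."""
    return width * height * fps * bits_per_pixel / 1e6

def recording_hours(disk_gb, bitrate_mbps):
    """Hours of video that fit on a disk (1 GB = 10^9 bytes) at a given bitrate."""
    seconds = disk_gb * 1e9 * 8 / (bitrate_mbps * 1e6)
    return seconds / 3600

# Assumed source format: 720x576 pixels, 25 fps, 16 bits per pixel.
print(f"{raw_bitrate_mbps(720, 576, 25, 16):.1f}")  # 165.9 Mbps, above 150 Mbps
print(f"{150 / 0.512:.0f}")                         # 293, i.e. roughly 300 ADSL lines
print(f"{recording_hours(80, 150):.1f}")            # 1.2 hours on an 80 GB disk
```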
In IP video the encoding would be done by the camera/video encoder and the decoding is normally done on the computer/device which is displaying the live video.
How does compression work?
We’ll try to keep this as simple as we can.
Video compression formats such as H.264 and MPEG-4 use a technique known as ‘difference coding’. What this does is compare the current video frame with the preceding frame so that information which does not change (static background etc.) is not repeatedly transmitted.
When encoding the video, the encoder assigns each frame one of several types. These different types of frames are known as I-frames, P-frames and B-frames. Here is an example of a GOP (Group of Pictures) structure for video compression:
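As a toy illustration of difference coding, the Python sketch below stores only the pixels that changed since the previous frame. This is a deliberate simplification under our own assumptions (tiny greyscale frames held as lists of lists, per-pixel comparison); real encoders work on blocks and apply further transforms, so treat it as a picture of the idea rather than of how H.264 is implemented.

```python
def encode_difference(previous, current):
    """Return (row, col, value) for every pixel that differs from the previous frame."""
    changes = []
    for r, (prev_row, cur_row) in enumerate(zip(previous, current)):
        for c, (old, new) in enumerate(zip(prev_row, cur_row)):
            if old != new:
                changes.append((r, c, new))
    return changes

def decode_difference(previous, changes):
    """Rebuild the current frame from the previous frame plus the changes."""
    frame = [row[:] for row in previous]
    for r, c, value in changes:
        frame[r][c] = value
    return frame

# A static 3x4 background in which a single pixel changes.
previous = [[10, 10, 10, 10] for _ in range(3)]
current = [row[:] for row in previous]
current[1][2] = 200

changes = encode_difference(previous, current)
print(changes)                                          # [(1, 2, 200)]
print(decode_difference(previous, changes) == current)  # True
```

Only one value has to be sent for the second frame instead of twelve, which is where the saving on static scenes comes from.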
GOP Structure for H.264/MPEG-4
I-frame: An intra-frame, or I-frame, is a video frame which has been encoded without any reference to any other frame. A video file will always start with an I-frame and will have subsequent I-frames added at regular intervals. I-frames are also known as key-frames or access points and are important for random access of video files such as rewind, fast-forward and seek operations. The downside to I-frames is that they are the largest in terms of size as the whole video frame is encoded every time.
P-frame: A predictive inter-frame, or P-frame, uses previous I or P-frames as a reference when encoding. This means the encoder analyzes a previous I or P-frame for any static elements which do not change between frames. Any areas which do not change are not encoded, so a P-frame only stores the parts of the picture which register movement, making it much smaller than an I-frame. The downside to P-frames is that they are sensitive to transmission errors because of their dependency on earlier frames.
B-frame: A bi-predictive inter-frame, or B-frame, makes reference to both a preceding reference frame and a future reference frame. Using B-frames improves the prediction and ultimately the quality of decoded video but it also increases the processing requirements and latency, so they are not generally used in the profile for IP video.
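The way these frame types are arranged within a GOP can be sketched as a short Python function. The function name, the GOP length of 10 and the choice of two B-frames between reference frames are illustrative assumptions; real encoders let you configure these values and may arrange frames differently.

```python
def gop_pattern(gop_length, b_frames=0):
    """Return the frame types of one GOP, in display order, as a string.

    The GOP always starts with an I-frame. Each following reference frame
    is a P-frame, optionally preceded by up to `b_frames` B-frames.
    """
    types = ["I"]
    while len(types) < gop_length:
        for _ in range(b_frames):
            # Leave room for the P-frame that the B-frames refer forward to
            if len(types) < gop_length - 1:
                types.append("B")
        types.append("P")
    return "".join(types)

print(gop_pattern(10, b_frames=2))  # IBBPBBPBBP
print(gop_pattern(10, b_frames=0))  # IPPPPPPPPP
```

With `b_frames=0` the GOP contains only I and P-frames, which is the arrangement that avoids the extra latency B-frames introduce.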
Block-Based Motion Compensation
With video containing a lot of movement, difference coding alone is not enough to achieve high levels of compression. During the encoding process the encoder can therefore also use a method known as block-based motion compensation to estimate which objects are moving and where they are moving to.
Block-based motion compensation works on the principle that most of the pixels which make up a new frame can be found in a previous frame, just in a different location. The frame is split into a series of macroblocks and is analyzed for pixel patterns. If a block of pixels is found at a different location then the encoder simply codes the new location instead of the pixel information. Recording the location takes fewer bits to encode than the full pixel information found in the block.
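A minimal Python sketch of the block-matching idea follows. It uses an exhaustive search with the sum of absolute differences (SAD) as the matching cost, which is one common textbook approach and not necessarily what any particular camera does; the block size, search range and test frames are our own assumptions.

```python
def sad(frame_a, ax, ay, frame_b, bx, by, size):
    """Sum of absolute differences between two size x size blocks."""
    total = 0
    for dy in range(size):
        for dx in range(size):
            total += abs(frame_a[ay + dy][ax + dx] - frame_b[by + dy][bx + dx])
    return total

def find_motion_vector(previous, current, bx, by, size=4, search=2):
    """Find where the block at (bx, by) in `current` came from in `previous`.

    Returns (dx, dy, cost): the offset of the best match and its SAD.
    """
    height, width = len(previous), len(previous[0])
    best = (0, 0, sad(current, bx, by, previous, bx, by, size))
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            x, y = bx + dx, by + dy
            # Only consider candidate blocks that lie fully inside the frame
            if 0 <= x <= width - size and 0 <= y <= height - size:
                cost = sad(current, bx, by, previous, x, y, size)
                if cost < best[2]:
                    best = (dx, dy, cost)
    return best

def make_frame(obj_x, obj_y, width=8, height=8):
    """Black 8x8 frame containing a bright 2x2 object (test data only)."""
    frame = [[0] * width for _ in range(height)]
    for y in range(obj_y, obj_y + 2):
        for x in range(obj_x, obj_x + 2):
            frame[y][x] = 255
    return frame

previous = make_frame(1, 1)
current = make_frame(3, 1)  # the object has moved 2 pixels to the right
print(find_motion_vector(previous, current, bx=2, by=0))  # (-2, 0, 0)
```

The result says the block is found 2 pixels to the left in the previous frame with a perfect match (cost 0), so the encoder only needs to store the offset rather than the 16 pixel values.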
Results from encoders using H.264 may vary as each manufacturer may choose a different implementation of the H.264 standard in their encoder. This is acceptable as long as the output from the encoder conforms to the standard and is compatible with an H.264 decoder. This makes H.264 an open standard which is flexible enough to adapt to a wide range of uses.
Within the H.264 standard is a range of capabilities which are known as profiles. These profiles range from low-cost implementations for video-conferencing and mobile use, up to high quality production applications and broadcast use.
For IP surveillance the most commonly used profile is known as the ‘baseline profile’. This is a low-cost profile which uses fewer computational resources and has a very low latency, ideal for live surveillance video and real-time PTZ operation. It achieves this by using a simplified implementation of the H.264 standard which only uses I and P-frames.
GOP Structure for H.264 Baseline Profile
What makes H.264 special?
H.264 takes video compression to greater heights. It uses advanced techniques which account for its high performance and picture quality:
Intra-prediction for I-frames
New for H.264 is an advanced intra-prediction scheme for encoding I-frames. This technique greatly reduces the size of an I-frame and as I-frames are the largest of the video frames this has a great impact on the overall size of the video file. H.264 achieves a smaller bit size for the frame while maintaining the quality by predicting smaller blocks of pixels which have been previously encoded. It looks for matches among pixels surrounding a new macroblock and reuses the pixel information if a match was found.
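To give a feel for intra-prediction, here is a heavily simplified Python sketch of one prediction mode: predicting a block as the average of the pixels directly above and to the left of it, then storing only the difference (the residual). H.264 defines several prediction modes and predicts from reconstructed rather than original pixels; the block size, frame contents and function names here are our own assumptions.

```python
def dc_predict(frame, bx, by, size=4):
    """Predict a block as the mean of the neighbouring pixels above and to the left."""
    neighbours = []
    if by > 0:
        neighbours += frame[by - 1][bx:bx + size]
    if bx > 0:
        neighbours += [frame[by + dy][bx - 1] for dy in range(size)]
    if not neighbours:
        return 128  # no neighbours available: fall back to mid-grey
    return round(sum(neighbours) / len(neighbours))

def residual(frame, bx, by, size=4):
    """Difference between the actual block and its prediction."""
    prediction = dc_predict(frame, bx, by, size)
    return [[frame[by + dy][bx + dx] - prediction for dx in range(size)]
            for dy in range(size)]

# An 8x8 frame of flat grey with one slightly brighter pixel.
frame = [[100] * 8 for _ in range(8)]
frame[5][5] = 104

print(dc_predict(frame, 4, 4))  # 100
print(residual(frame, 4, 4))
# [[0, 0, 0, 0], [0, 4, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
```

Because the prediction is already close, the residual is mostly zeros and small numbers, which take far fewer bits to encode than the original pixel values.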
Theoretically, if we were to encode a video using H.264 with a GOP structure of purely I-frames this would result in a smaller bitrate than MJPEG compression (which is a format which uses continuous I-frames) because of the intra-prediction for I-frames built in to H.264.
In-loop deblocking filter
A notable benefit of H.264 is the image quality it produces. We feel that the images are of far higher quality than you would get from MPEG-4 compression, sometimes even coming close to JPEG quality. Some of this is down to the in-loop deblocking filter.
The in-loop deblocking filter helps to reduce image artifacts by smoothing the edges using an adaptive filter. The result is compressed video with fewer artifacts providing crisp, clear images every time.
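The adaptive part of the filter can be illustrated with a one-dimensional Python sketch. The idea shown is that a small step at a block boundary is probably a compression artifact and gets softened, while a large step is probably a real edge in the picture and is left alone. The threshold, the amount of smoothing and the block size are our own assumptions and are much simpler than the rules in the H.264 standard.

```python
def deblock_row(row, block_size=4, threshold=10):
    """Soften small steps at block boundaries in one row of pixels."""
    out = row[:]
    for edge in range(block_size, len(row), block_size):
        p, q = row[edge - 1], row[edge]  # pixels either side of the boundary
        step = q - p
        if 0 < abs(step) < threshold:
            # Small step: treat it as a blocking artifact and pull both sides together
            delta = int(step / 4)
            out[edge - 1] = p + delta
            out[edge] = q - delta
        # Large step: treat it as a genuine edge and keep it sharp
    return out

print(deblock_row([100, 100, 100, 100, 108, 108, 108, 108]))
# [100, 100, 100, 102, 106, 108, 108, 108]
print(deblock_row([100, 100, 100, 100, 200, 200, 200, 200]))
# [100, 100, 100, 100, 200, 200, 200, 200]
```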
Comparison to MPEG-4 compression
H.264 offers a bitrate which is typically 25-50% lower than MPEG-4 and this is down to H.264’s inter-prediction encoding and advanced motion estimation techniques. The image quality is also noticeably better by comparison, due in part to better prediction of I-, P- and B-frames as well as the in-loop deblocking filter which reduces image artifacts.
Comparison to MJPEG compression
MJPEG will produce a better overall image quality than H.264 but that is down to the format not using any form of difference coding at all. It makes no predictions and encodes every frame as an I-frame. This makes every image very high quality, but an H.264 stream is typically up to 80% smaller than the equivalent MJPEG stream. H.264 provides a more efficient method of compression, reducing the bitrate of streaming video so that it will use far less bandwidth over the network and take up less storage space for recordings while providing comparable images.
The downside to H.264 is the computational power it requires from both the encoder and decoder. As the compression techniques are more advanced in H.264 so is the hardware that is required to encode and decode the video.
This is the main reason that up until recently H.264 could only be found in some of the higher-end security cameras (such as the Sony SNC-RX series): the cost of extra computational power was prohibitive for entry-level IP cameras. This has recently changed with the launch of the Axis M10 series which provides H.264 compression (as well as MJPEG and MPEG-4 options) at an affordable price.
Processing power at the decoder is also increased. It is stated that the decoder complexity is around 2 times higher than MPEG-4 which means that certain allowances have to be made at the decoder to ensure that there will be enough processing power available to decode the number of required cameras effectively.
We ran some tests using an Axis M1011 entry level home security IP camera. This camera is especially useful when testing having support for H.264, MJPEG and MPEG-4 in one device.
We ran the camera in all 3 compression formats on a direct local connection with a crossover cable to a PC running a copy of Wireshark. The Wireshark software measures the bitrate coming through the Ethernet port. The scene was the same for all 3 tests and there was no movement, illumination approx. 265 lux. The results were as follows:
We ran the same test but with the camera pointed at a different scene showing a lot of movement, illumination approx. 150 lux:
Our results show a significant saving for H.264: a 40% saving against MPEG-4 for a static image and approx. 30% saving against MPEG-4 when there is a lot of movement.
We can see from the following 3 videos that the MJPEG video provides the best quality, which is to be expected but what is really telling is the difference between MPEG-4 and H.264. The MPEG-4 video shows typical artifacting when there is a lot of movement while the H.264 video remains artifact-free and is half the size.
We ran the Axis M1011 camera on a standard Windows XP machine with no other programs running in the background and viewed live video using the built-in Axis ActiveX control in all three compression formats while monitoring the CPU load.
We expected to see a clear indication that H.264 used double the processing requirements of MPEG-4 but that wasn’t the case. Both MPEG-4 and H.264 used between 0 and 15% of the processor at any given time while the computer was displaying live video. The only indication that H.264 used a bit extra was that (very) occasionally it would peak at 19% before dropping right down again, whereas MPEG-4 peaked at 15%.
Although our test was in no way conclusive we didn’t see the kind of results that would lead us to believe that H.264 was prohibitively processor intensive. If you study the required specifications of an H.264 decoder from the manufacturer you will also see they are pretty low and nothing to be alarmed about.
H.264 is a huge step forward for not only the video surveillance world but for video compression in general.
We feel H.264 is a hugely important compression format for the security industry. The claims it makes about reduced bitrates are confirmed and we can say that the image quality is superior to MPEG-4 and even comes close to JPEG quality. It’s that good.
Sure it does come at a cost. The processing power in the camera/encoder does have to be higher than those cameras which only run in MPEG-4 or MJPEG mode but what we’ve seen with the latest M10 series of cameras from Axis is that they can now be produced at an affordable price. We also feel that the arguments about greater computational power required from the decoder would have held better when the format was released over 5 years ago. Today’s computers should have no real problems as our findings showed us and let us not forget that these concerns were raised about MPEG-4 when it was first introduced into IP surveillance equipment several years ago.
H.264 is not going away. Being able to lower the bitrate and reduce your storage requirements while maintaining a high resolution is too valuable to ignore. Today it is common for us to design our surveillance systems around the available storage capacity or bandwidth allowance. H.264 will allow us more flexibility. As systems grow larger and the demands for high resolution images and faster frame rates increase, H.264 will be a key differentiator between various system solutions and for those who choose to ignore this new standard, we think they will be left behind.