In this article, I will explain the various kinds of video redundancy, why eliminating them is of utmost importance, and why it tends to be the first thing programmers focus on when making a codec.
First, let's go into what a redundancy is. In simple terms, redundancy is the amount of useless data contained in a piece of information. “Useless” is, admittedly, an extremely subjective term; what I actually mean is that although the data does contribute to the information, the contribution is so insignificant that reducing or eliminating it won't have any adverse effects.
Those coming here from my Handbrake Complete Tutorial Part 1: How to Transcode & Compress Videos article might remember how most video codecs process each frame of the video in terms of macroblocks. Basically, there are four kinds of redundancies that could exist in a lossless raw video:
- Spatial Redundancies: Spatial video redundancies are the similarities between neighbouring macroblocks within a single frame. Large uniform regions, such as a clear sky or a plain wall, consist of blocks that are nearly identical to the ones around them, so every block need not be described from scratch (see the left-neighbour prediction sketch further down).
- Temporal Redundancies: Temporal video redundancies are the similarities between successive frames; the moving-car example discussed after this list illustrates this.
- Psychovisual Redundancies: Psychovisual video redundancies exist because the human eye is not equally sensitive to all visual information. By eliminating the parts to which the eye is less sensitive, a frame can be compressed without much noticeable quality loss (see the chroma subsampling sketch further down).
- Coding Redundancies: In an uncompressed raw video, each pixel in a frame is usually encoded with a fixed number of bits. This results in a file much bigger than necessary, since the same length is used for every pixel in every frame regardless of whether it is needed. This is called coding redundancy, and it can be eliminated by using Variable Length Coding schemes. To see how coding redundancies can be reduced (albeit through a crude example), consider an 8-bit 5×4 px greyscale image, with the grey levels of its 20 pixels going as:
120, 120, 120, 120, 120, 119, 119, 119, 119, 119, 118, 118, 118, 118, 117, 117, 117, 115, 115, 115
This could be stored as is, in which case a total of 160 bits (20 pixels × 8 bits each) would be required. Or, it could be represented much more efficiently as follows:
5×120, 5×119, 4×118, 3×117, 3×115
Each pair means “this grey level, repeated this many times”, so only five count/value pairs need to be stored instead of twenty pixel values. This is the basic principle of Run Length Encoding (RLE), one of the oldest and simplest encoding schemes.
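To make that concrete, here is a minimal run-length encoder and decoder in Python, applied to the pixel series above. It is only a toy sketch; real codecs use far more elaborate entropy coders, but the principle is the same.

```python
def rle_encode(values):
    """Collapse consecutive repeats into (count, value) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][1] == v:
            runs[-1][0] += 1      # extend the current run
        else:
            runs.append([1, v])   # start a new run
    return [tuple(run) for run in runs]

def rle_decode(runs):
    """Expand (count, value) pairs back into the original series."""
    return [value for count, value in runs for _ in range(count)]

pixels = [120]*5 + [119]*5 + [118]*4 + [117]*3 + [115]*3

runs = rle_encode(pixels)
print(runs)   # [(5, 120), (5, 119), (4, 118), (3, 117), (3, 115)]
assert rle_decode(runs) == pixels

# 20 pixels x 8 bits = 160 bits raw; 5 runs x (8-bit count + 8-bit value) = 80 bits
print(len(pixels) * 8, "bits raw vs", len(runs) * 16, "bits encoded")
```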
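The other redundancies can be sketched in the same spirit. For spatial redundancy, the snippet below uses simple left-neighbour prediction, a crude stand-in for the intra prediction real codecs perform: because neighbouring pixels tend to be similar, the residuals cluster around zero and are far easier to compress than the raw values. The gradient row is, of course, made-up data.

```python
# Left-neighbour prediction: store each pixel as the difference from the
# pixel to its left. A toy stand-in for a real codec's intra prediction.
row = [100, 101, 101, 102, 104, 104, 105, 105, 106, 108]   # a smooth gradient

residuals = [row[0]] + [row[i] - row[i - 1] for i in range(1, len(row))]
print(residuals)   # [100, 1, 0, 1, 2, 0, 1, 0, 1, 2] -- small, repetitive values

# Reconstruction is exact: add each residual onto the previous pixel.
decoded = [residuals[0]]
for r in residuals[1:]:
    decoded.append(decoded[-1] + r)
assert decoded == row
```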
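Psychovisual redundancy is what schemes like 4:2:0 chroma subsampling exploit: the eye resolves brightness far more finely than colour, so the two colour (chroma) planes can be stored at a quarter of the resolution with little visible loss. A rough numpy sketch, assuming the frame has already been split into Y (luma) and Cb/Cr (chroma) planes:

```python
import numpy as np

h, w = 1080, 1920
y  = np.random.randint(0, 256, (h, w), dtype=np.uint8)   # luma plane
cb = np.random.randint(0, 256, (h, w), dtype=np.uint8)   # chroma planes
cr = np.random.randint(0, 256, (h, w), dtype=np.uint8)

# 4:2:0 subsampling: keep luma at full resolution, halve the chroma planes
# in both dimensions (naive decimation here; real encoders filter first).
cb420 = cb[::2, ::2]
cr420 = cr[::2, ::2]

full = y.size + cb.size + cr.size          # 3 bytes per pixel
sub  = y.size + cb420.size + cr420.size    # 1.5 bytes per pixel
print(f"{full} bytes -> {sub} bytes ({sub / full:.0%} of the original)")
```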
Coming back to temporal redundancy: picture a sequence of frames in which only a car moves while the rest of the frame (the background) stays stationary. The background pixels are identical from one frame to the next, so there is no need to transmit that information again in every frame; the encoder can simply refer back to the previous frame and code only the regions that actually changed.
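Here is a sketch of that idea, assuming 16×16 macroblocks and a naive changed/unchanged test per block. (Real codecs go much further with motion-compensated prediction, which can also follow the car itself from frame to frame.)

```python
import numpy as np

np.random.seed(0)
BLOCK = 16
prev = np.random.randint(0, 256, (64, 64), dtype=np.uint8)  # previous frame
curr = prev.copy()                                          # next frame:
curr[16:32, 32:48] = np.random.randint(0, 256, (16, 16))    # only the "car" block changed

changed = 0
for y in range(0, curr.shape[0], BLOCK):
    for x in range(0, curr.shape[1], BLOCK):
        if not np.array_equal(curr[y:y+BLOCK, x:x+BLOCK],
                              prev[y:y+BLOCK, x:x+BLOCK]):
            changed += 1   # only these blocks need to be coded and sent

total = (curr.shape[0] // BLOCK) * (curr.shape[1] // BLOCK)
print(f"{changed} of {total} macroblocks changed")   # 1 of 16
```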
How these redundancies are processed and eliminated is one of the defining factors of a codec's overall efficiency and performance.
Eliminating redundancies is extremely important, especially in the case of videos, because of the ridiculously large amount of redundant data they can contain. Consider a scenario where you have 86,400 images averaging 350 KB each (really not an inconceivable size for a reasonably high-resolution image). At 24 fps, that many images make exactly one hour of video, and storing every frame in full would take around 30 GB (86,400 × 350 KB). Let that sink in. And in a video you really won't see much of a difference between consecutive frames, so most of the data being used to represent them is, as I stated above, “Useless”!
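For anyone who wants to check the arithmetic behind that scenario:

```python
frames = 86_400        # one still image per frame
frame_kb = 350         # assumed average size per image
fps = 24

duration_min = frames / fps / 60            # 60 minutes of footage
total_gb = frames * frame_kb / 1_000_000    # treating 1 GB as 10^6 KB
print(f"{duration_min:.0f} min of video, ~{total_gb:.1f} GB before compression")
```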