What is the difference between analog and digital? To explain, let’s imagine an audio recording of a tone. An analog recording would look like the original wave - all the details intact. It’s a copy - an analog to the original. On the other hand (pun intended), a digital recording breaks the wave into chunks called samples, then measures the amplitude of the wave at each sample and stores these measurements in a stream of binary code - a square wave of 0s and 1s. A digital player would reconstruct the wave using these measurements.
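To make the sampling idea a bit more concrete, here is a minimal sketch in Python. The tone frequency, sample rate, and 8-bit depth are all assumptions picked for illustration, and the function names are made up for this example. It measures a sine wave at regular intervals, stores each measurement as a binary number, and then rebuilds an approximation of the wave from those numbers.

```python
import math

def sample_tone(freq_hz, sample_rate_hz, n_samples, bits=8):
    """Measure a sine tone at regular intervals and quantize each measurement."""
    levels = 2 ** bits
    codes = []
    for n in range(n_samples):
        t = n / sample_rate_hz
        amplitude = math.sin(2 * math.pi * freq_hz * t)  # value in [-1, 1]
        # Map [-1, 1] onto the integer codes 0 .. levels - 1
        codes.append(round((amplitude + 1) / 2 * (levels - 1)))
    return codes

def to_bitstream(codes, bits=8):
    """Write each code as a fixed-width binary number: the stream of 0s and 1s."""
    return "".join(format(c, f"0{bits}b") for c in codes)

def reconstruct(codes, bits=8):
    """Turn the stored codes back into amplitudes in [-1, 1]."""
    levels = 2 ** bits
    return [c / (levels - 1) * 2 - 1 for c in codes]

# Illustrative values only: a 440 Hz tone sampled 8000 times per second
codes = sample_tone(freq_hz=440, sample_rate_hz=8000, n_samples=8)
bitstream = to_bitstream(codes)
approx = reconstruct(codes)
originals = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(8)]
```

The reconstructed values are close to the originals but not identical. The small rounding error is the price of turning a smooth wave into a finite set of numbers.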
So right off the bat, you may think that analog is the better of the two formats - and you aren’t alone. There are plenty of people who swear that analog audio recordings are the best. But digital comes with some great advantages that analog simply doesn’t have.
The first is resistance to noise - introduce noise into an analog signal and you’re going to degrade the signal, because the noise becomes part of the wave. Digital signals, because they’re either 0 or 1 and nothing in between, can withstand some noise and not lose any quality at all.
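Here is a small sketch of why that works, under some simplifying assumptions: bits are sent as voltage levels of 0.0 and 1.0, the noise never exceeds 0.3, and the receiver simply asks "is this above or below 0.5?". The names and numbers are invented for the example.

```python
import random

def add_noise(levels, max_noise, rng):
    """Nudge every level up or down by a random amount of at most max_noise."""
    return [v + rng.uniform(-max_noise, max_noise) for v in levels]

def decode_bits(levels, threshold=0.5):
    """Anything above the threshold is read as 1, anything else as 0."""
    return [1 if v > threshold else 0 for v in levels]

rng = random.Random(0)  # fixed seed so the run is repeatable
bits = [1, 0, 1, 1, 0, 0, 1, 0]
sent = [float(b) for b in bits]
received = add_noise(sent, max_noise=0.3, rng=rng)
recovered = decode_bits(received)
```

Every received level is a little off, which for an analog signal would be a permanent change to the recording. The decoded bits, however, come out exactly as they went in, as long as the noise stays below the distance to the threshold.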
Digital is also easier to copy. There is no generation loss, whereas analog loses a bit of quality every time it’s copied - like a game of telephone. Digital signals can also be synced up and read by computers, which analog can’t. And very importantly for video, patterns can be found in the sequence of 1s and 0s in digital signals, so digital can be compressed - and that is key for making video as ubiquitous as it is today.
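One of the simplest ways to exploit patterns in a stream of 1s and 0s is run-length encoding: instead of writing out every bit, you write "eight 0s, then twelve 1s". This is only an illustrative sketch, not the compression that video formats actually use, and the function names are made up for the example.

```python
from itertools import groupby

def rle_encode(bits):
    """Collapse each run of identical characters into a (bit, run_length) pair."""
    return [(bit, len(list(run))) for bit, run in groupby(bits)]

def rle_decode(runs):
    """Expand the (bit, run_length) pairs back into the original string."""
    return "".join(bit * count for bit, count in runs)

stream = "0" * 8 + "1" * 12 + "0" * 8
runs = rle_encode(stream)
restored = rle_decode(runs)
```

Twenty-eight characters shrink to three pairs, and decoding gives back exactly the same stream.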
The First Digital Tapes
By the late 1970s and into the 80s, electronics manufacturers were experimenting with digital recording. The first commercially available digital video tape format arrived in 1986 with the Sony D1 video tape recorder. This machine recorded an uncompressed standard definition component signal onto a ¾” tape at a whopping 173 million bits per second! That’s a lot of zeros and ones in a single second!
In comparison you are watching the video in this lesson in HD at a bit rate of only 5 million bits per second.
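To put those two bit rates side by side, here is a quick back-of-the-envelope calculation. It assumes 8 bits per byte and counts a gigabyte as one billion bytes; the function name is invented for the example.

```python
def gigabytes_per_hour(bits_per_second):
    """Convert a bit rate into storage per hour (1 GB taken as 10**9 bytes)."""
    bytes_per_hour = bits_per_second * 3600 / 8
    return bytes_per_hour / 1e9

d1_gb = gigabytes_per_hour(173_000_000)   # uncompressed D1
web_gb = gigabytes_per_hour(5_000_000)    # the HD stream mentioned above
ratio = 173_000_000 / 5_000_000

print(f"D1: {d1_gb:.2f} GB/hour, web HD: {web_gb:.2f} GB/hour, ratio: {ratio:.1f}x")
```

By this rough estimate, an hour of D1 video works out to almost 78 GB, against a little over 2 GB for an hour of the compressed HD stream.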
The D1 was expensive and only large networks could afford it. But it soon proved its worth as a rugged format. The Sony D1 would be challenged by Ampex with D2 in 1988, and Panasonic with D3 in 1991. Sony’s follow-up to the D1 was the Digital Betacam (DigiBeta) format in 1993. DigiBeta was cheaper than D1, used tapes similar to Betacam SP, which was a standard television industry tape at the time, it recorded composite video, which was how most TV studios were wired, and it used a 3 to 1 Discrete Cosine Transform video compression to get the bitrate down to 90 million bits per second.
Before we dive too deeply into how data is compressed, let’s talk about chroma subsampling - a type of compression that was used even on the uncompressed D1 digital video recorder.
The retina of the human eye contains light-sensitive cells called rods and cones. Rods are sensitive to changes in brightness only and provide images to the brain in black and white. Cones are sensitive to either red, green or blue and provide our brains with the information to see in color. But we have a lot more rods in our eyes than cones - about 120 million rods to only 6 million cones.
Because of this, we’re more responsive to changes in brightness than to changes in color, which means you can take an image and throw away some of the color information while keeping the brightness, and it would still look as crisp and bright as fully colored video.
So to compress color images, first we have to pull the brightness information out of the signal.
Video is made of the primary colors red, green and blue - but storing signals in RGB leads to a lot of redundancies. So the RGB signal is converted to what’s called a YCbCr colorspace. Y stands for luma, or brightness, Cb is the difference between the blue channel and the luma, and Cr is the difference between the red channel and the luma.
- A 3d Model of the YCbCr Colorspace
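Here is a rough sketch of that conversion. The lesson doesn’t say which weights are used, so this assumes the luma weights from the standard definition BT.601 recommendation (0.299, 0.587, 0.114) and works on values between 0 and 1. The function names are made up for the example.

```python
# Assumed luma weights (BT.601); other standards use slightly different numbers
KR, KG, KB = 0.299, 0.587, 0.114

def rgb_to_ycbcr(r, g, b):
    """Split a pixel into brightness (Y) and two color-difference values."""
    y = KR * r + KG * g + KB * b
    cb = (b - y) / (2 * (1 - KB))   # blue minus luma, scaled into [-0.5, 0.5]
    cr = (r - y) / (2 * (1 - KR))   # red minus luma, scaled into [-0.5, 0.5]
    return y, cb, cr

def ycbcr_to_rgb(y, cb, cr):
    """Undo the conversion above."""
    r = y + 2 * (1 - KR) * cr
    b = y + 2 * (1 - KB) * cb
    g = (y - KR * r - KB * b) / KG
    return r, g, b

white = rgb_to_ycbcr(1.0, 1.0, 1.0)   # all brightness, no color difference
blue = rgb_to_ycbcr(0.0, 0.0, 1.0)    # low brightness, maximum Cb
round_trip = ycbcr_to_rgb(*rgb_to_ycbcr(0.2, 0.6, 0.9))
```

Notice that a white or gray pixel ends up with Cb and Cr of zero: all of its information sits in Y. That is what makes it possible to treat brightness and color separately.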
Now, by separating out the color from the brightness, we can start to compress the color information by reducing the resolution of the Cb and Cr channels.
The amount of subsampling, or how much we’re reducing the color resolution, is expressed as a ratio, J:a:b, where
- J is the number of horizontal pixels in the sample region, usually 4,
- “a” is the number of Cb and Cr pixels in the first row of the sample, and
- “b” is the number of Cb and Cr pixels in the next row of the sample.
Let’s illustrate what this means.
A 4:4:4 signal is said to have NO chroma subsampling.
There are 4 pixels in our sample - that’s four pixels of Y. Each of those 4 pixels has its own Cr and Cb values - so 4 Cr and Cb pixels. And in the next line there are 4 more Cr and Cb pixels.
Now let’s start subsampling.
In a 4:2:2 subsample we again have 4 pixels in the sample - four pixels of Y - we never throw away the Y value. But now we only have 2 pixels of Cr and Cb... two of the pixels share the same values. And in the next line again we have 2 pixels of Cr and Cb.
The information needed to construct a 4:2:2 image is a third smaller than 4:4:4 and is considered good enough for most professional uses.
Another common one is 4:1:1 - 4 pixels in the sample and this time only 1 pixel of Cr and Cb in the sample line, and one on the following line.
Here’s 4:2:0 - four in the sample, 2 pixels in the sample line and zero in the next line - essentially the 2 get carried over to the next line.
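The schemes above can be sketched on a tiny 4 by 2 patch of Cb values. This is a simplification: it just keeps some samples and drops the rest, whereas real encoders typically filter or average neighboring values first. The function names and the sample numbers are invented for the example.

```python
def subsample_422(chroma_rows):
    """4:2:2 - keep every other chroma sample in each row."""
    return [row[::2] for row in chroma_rows]

def subsample_411(chroma_rows):
    """4:1:1 - keep one chroma sample out of every four in each row."""
    return [row[::4] for row in chroma_rows]

def subsample_420(chroma_rows):
    """4:2:0 - keep every other sample, and only on every other row."""
    return [row[::2] for row in chroma_rows[::2]]

def upsample(sub_rows, h_factor, v_factor):
    """Rebuild a full-size plane by repeating each stored sample."""
    full = []
    for row in sub_rows:
        wide = [v for v in row for _ in range(h_factor)]
        full.extend([list(wide) for _ in range(v_factor)])
    return full

# A 4-pixel-wide, 2-row patch of Cb values (illustrative numbers)
cb = [[10, 12, 20, 22],
      [11, 13, 21, 23]]

cb_422 = subsample_422(cb)
cb_411 = subsample_411(cb)
cb_420 = subsample_420(cb)
rebuilt_420 = upsample(cb_420, h_factor=2, v_factor=2)
```

In the rebuilt 4:2:0 patch, the two stored values are shared by four pixels each, which is the "carried over to the next line" behavior described above. The Y values are never touched by any of this.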
Both 4:1:1 and 4:2:0 need half as much data as 4:4:4.
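Those fractions are easy to check by counting. The sketch below assumes the sample region is J pixels wide and two rows tall, with one Y value per pixel and one Cb plus one Cr value at each chroma position. The function names are made up for the example.

```python
def samples_per_region(j, a, b):
    """Count the stored values in a J-wide, 2-row region under J:a:b."""
    luma = 2 * j             # Y is kept for every pixel in both rows
    chroma = 2 * (a + b)     # each chroma position stores a Cb and a Cr
    return luma + chroma

def relative_size(j, a, b):
    """Data needed compared with no subsampling (J:J:J)."""
    return samples_per_region(j, a, b) / samples_per_region(j, j, j)

sizes = {
    "4:4:4": relative_size(4, 4, 4),
    "4:2:2": relative_size(4, 2, 2),
    "4:1:1": relative_size(4, 1, 1),
    "4:2:0": relative_size(4, 2, 0),
}
```

With these counts, 4:2:2 comes out at two thirds of the 4:4:4 data, and both 4:1:1 and 4:2:0 come out at one half.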
The Compression Revolution
Chroma subsampling is a good start, but we have ways to get the video data even smaller. One of the most important ways is the Discrete Cosine Transform. DCT is a seriously brilliant mathematical achievement - basically, what it does is describe a small block of the picture as a sum of cosine waves of different frequencies.
The mathematics is nothing short of amazing and seriously well beyond my capability to explain. In the simplest terms - the more cosine waves you use, the more accurately you can describe the original block. And since most of the picture information sits in just a few of those waves, you don’t need that many to get a pretty accurate result.
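As a rough illustration of the "more cosine waves, more accuracy" idea, here is a sketch that runs a one-dimensional DCT over a row of eight sample values, throws away all but the first few cosine terms, and measures how far the rebuilt row is from the original. Real video codecs generally work on two-dimensional blocks and add a quantization step, so treat this as a simplified model. The sample values and function names are invented for the example.

```python
import math

def dct(values):
    """Orthonormal DCT-II: express the values as weights of cosine waves."""
    n = len(values)
    coeffs = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        total = sum(v * math.cos(math.pi * (i + 0.5) * k / n)
                    for i, v in enumerate(values))
        coeffs.append(scale * total)
    return coeffs

def idct(coeffs):
    """Inverse of dct(): add the weighted cosine waves back together."""
    n = len(coeffs)
    values = []
    for i in range(n):
        total = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
            total += scale * c * math.cos(math.pi * (i + 0.5) * k / n)
        values.append(total)
    return values

def keep_first(coeffs, count):
    """Keep only the first `count` cosine terms and zero out the rest."""
    return coeffs[:count] + [0.0] * (len(coeffs) - count)

def rms_error(a, b):
    """Root-mean-square difference between two rows."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

row = [52, 55, 61, 66, 70, 61, 64, 73]   # illustrative brightness values
coeffs = dct(row)
errors = [rms_error(row, idct(keep_first(coeffs, m))) for m in (2, 4, 8)]
```

The error shrinks as more cosine terms are kept, and with all eight terms the row comes back essentially unchanged. Compression comes from deciding how many of those terms are worth storing.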
The first compression widely used for editing video was Motion-JPEG in the early 90s.
Motion JPEG is an intraframe compression. It uses DCT to break down individual frames into macroblocks. It basically looks at the frame, finds chunks of the image that are similar, and then simplifies them. Now, it didn’t look that great - the first Avid editing systems in the early 90s used an early form of Motion JPEG compression and the quality was about that of VHS tape. But since the compression was done frame by frame, the codec wasn’t too taxing for the computer hardware at the time - and it was just good enough for offline editing.
Major breakthroughs came in 1995 with two important technological releases.
On the distribution side - 1995 saw the introduction of DVD optical discs. These discs used a new kind of compression called MPEG-2 - not to be confused with Motion JPEG. MPEG-2 was developed by the Moving Picture Experts Group, who had a rather novel approach to handling compression. Instead of standardizing the way video signals were encoded, they standardized the way video was decoded from a digital stream. The way an MPEG-2 stream was decoded stayed the same no matter where it was done: on a DVD player, on your computer, or even on a modern-day DVR.
Now how that digital stream was encoded - what algorithms were used to compress the original data - was left open so that media companies could continually fight it out and develop more and more efficient encoders.
MPEG-2 was interframe compression. Unlike intraframe, which compresses frames individually, interframe compression puts frames into GOPs - groups of pictures. These GOPs would start with an I-frame, or reference frame - a full image. Then the encoder would wait a few frames and then record a P-frame - a predictive frame. This frame only contains the information that is DIFFERENT from the I-frame. Then the encoder goes back and calculates the differences for the frames between the I and P frames and records them as B-frames - bidirectional predictive frames.
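The core trick, storing one full picture and then only what changed, can be sketched in a few lines. This is a heavily simplified model: frames here are just short lists of pixel values, each later frame is compared against the one before it, and there are no B-frames or motion prediction, all of which a real MPEG-2 encoder would handle very differently. The function names are made up for the example.

```python
def encode_gop(frames):
    """Store the first frame whole (I) and later frames as changes only (P-like)."""
    encoded = [("I", list(frames[0]))]
    previous = frames[0]
    for frame in frames[1:]:
        # Record only the positions whose value differs from the previous frame
        changes = {i: v for i, (v, p) in enumerate(zip(frame, previous)) if v != p}
        encoded.append(("P", changes))
        previous = frame
    return encoded

def decode_gop(encoded):
    """Rebuild every frame by applying each set of changes in order."""
    frames = []
    current = None
    for kind, payload in encoded:
        if kind == "I":
            current = list(payload)
        else:
            current = list(current)
            for i, v in payload.items():
                current[i] = v
        frames.append(current)
    return frames

# A bright pixel (9) moving right across a flat background (5)
frames = [
    [5, 5, 5, 5, 9, 5, 5, 5],
    [5, 5, 5, 5, 5, 9, 5, 5],
    [5, 5, 5, 5, 5, 5, 9, 5],
]
encoded = encode_gop(frames)
decoded = decode_gop(encoded)
```

Each of the later frames is stored as just two changed pixels instead of eight values, yet decoding gives back all three frames intact. It also hints at why this kind of footage is awkward to edit: to show any one frame, the decoder first has to rebuild the frames it depends on.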
Describing the process almost sounds like magic - and this process of building frames based on reference frames was rather computationally taxing. It would take a while before computers could muster the processing power to edit this type of compression.
But in 1995 they didn’t have to, as that was the same year the DV format was introduced. Intended to be a consumer grade video tape format, DV recorded video at a 4:1:1 color subsample using an intraframe DCT compression, giving 25 million bits per second - quite an improvement in size from the original D1.
This wasn’t considered a professional quality standard, but it was a huge step up from consumer analog formats like VHS or Hi8. And all the DV cameras had IEEE 1394 (FireWire) connections, which meant people could get a perfect digital copy of their video onto their computer without needing specialized hardware to encode the file. The tapes themselves were extremely cheap, $3-5 per hour.
With relatively inexpensive cameras now available, digital video production began to take off.