The Evolution of Modern Non-Linear Editing: Part 2 – the Digital Revolution

In the first part we looked at the accomplishments of video engineers that made it possible to record video signals and edit them. But now the story turns from engineers to programmers and computer scientists as we look at the explosion of digital and how it made filmmaking accessible to practically everyone.

Analog vs Digital

What is the difference between analog and digital? To explain, let’s imagine an audio recording of a tone. An analog recording would look like the original wave – all the details intact. It’s a copy, an analog of the original. On the other hand (pun intended), a digital recording breaks the wave into chunks called samples, then measures the amplitude of the wave at each sample and stores these measurements in a stream of binary code – a square wave of 0s and 1s. A digital player reconstructs the wave using these measurements.

So right off the bat, you may think that analog is the better of the two formats – and you aren’t alone. There are plenty of people who swear that analog audio recordings are the best. But digital comes with some great advantages that analog simply doesn’t have.
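To make the sample-and-measure idea concrete, here is a minimal Python sketch. The 8 samples per cycle, the 8-bit depth and the function name sample_tone are assumptions chosen to keep the numbers small; they are not how any particular recorder works.

```python
import math

def sample_tone(samples_per_cycle=8, bit_depth=8):
    """Sample one cycle of a sine tone and quantize each sample to an integer code."""
    levels = 2 ** bit_depth  # 256 possible values for 8 bits
    codes = []
    for n in range(samples_per_cycle):
        amplitude = math.sin(2 * math.pi * n / samples_per_cycle)  # between -1.0 and 1.0
        # Map -1..1 onto 0..levels-1 and round to the nearest whole level.
        codes.append(round((amplitude + 1) / 2 * (levels - 1)))
    return codes

codes = sample_tone()
# Write each measurement as 8 binary digits: this is the stored stream of 0s and 1s.
bitstream = "".join(format(c, "08b") for c in codes)
print(codes)
print(bitstream)
```

A player would read the stream back eight bits at a time to recover the measurements and redraw the wave from them.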

The first is resistance to noise – introduce noise into an analog signal and you’re going to destroy the signal. Digital signals, because they’re either 0 or 1 and nothing in between, can withstand some noise and not lose any quality at all.
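Here is a small Python sketch of that noise resistance. It assumes bits are sent as levels of 0.0 and 1.0, that the noise never exceeds 0.4 in either direction, and that the receiver uses a threshold of 0.5; the function names transmit and receive are made up for this illustration.

```python
import random

def transmit(bits, noise_amplitude, seed=0):
    """Send bits as 0.0/1.0 levels and add bounded random noise to each one."""
    rng = random.Random(seed)
    return [b + rng.uniform(-noise_amplitude, noise_amplitude) for b in bits]

def receive(levels, threshold=0.5):
    """Decide each bit by asking only whether the level is above the threshold."""
    return [1 if level > threshold else 0 for level in levels]

bits = [1, 0, 1, 1, 0, 0, 1, 0]
noisy = transmit(bits, noise_amplitude=0.4)
recovered = receive(noisy)
print(recovered == bits)
```

As long as the noise stays below half the gap between the two levels, every bit comes back exactly as it was sent. An analog signal has no such threshold to snap back to, so the noise stays in the recording.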

Digital is also easier to copy. There is no generation loss, whereas analog loses a bit of quality every time it’s copied – like a game of telephone. Digital signals can also be synced up and read by computers, which analog can’t. And very importantly for video, patterns can be found in the sequence of 1s and 0s in digital signals, so digital can be compressed – and that is key for making video as ubiquitous as it is today.

The First Digital Tapes

By the late 1970s and into the 80s, electronics manufacturers were experimenting with digital recording. The first commercially available digital video tape format arrived in 1986 with the Sony D1 video tape recorder. This machine recorded an uncompressed standard definition component signal onto a ¾” tape at a whopping 173 million bits per second! That’s a lot of zeros and ones in a single second!

In comparison you are watching the video in this lesson in HD at a bit rate of only 5 million bits per second.
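To put those two bit rates side by side, this short Python sketch converts each into the storage needed for one hour of video. The helper name gigabytes_per_hour is made up for this illustration, and it counts only the quoted video bit rates, using decimal gigabytes.

```python
def gigabytes_per_hour(bits_per_second):
    """Convert a bit rate into the storage needed for one hour of video."""
    bytes_per_hour = bits_per_second * 3600 / 8  # 3600 seconds per hour, 8 bits per byte
    return bytes_per_hour / 1e9                  # decimal gigabytes

d1 = gigabytes_per_hour(173_000_000)     # the D1 rate quoted above
web_hd = gigabytes_per_hour(5_000_000)   # the HD streaming rate quoted above
print(round(d1, 2), round(web_hd, 2), round(d1 / web_hd, 1))
```

By this arithmetic an hour of D1 video is close to 78 gigabytes, while an hour of the compressed HD stream is a little over 2.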

The D1 was expensive and only large networks could afford one. But it soon proved its worth as a rugged format. The Sony D1 would be challenged by Ampex with D2 in 1988 and Panasonic with D3 in 1991. Sony’s follow-up to the D1 was the Digital Betacam format in 1993. DigiBeta was cheaper than D1, used tapes similar to Betacam SP, which was a standard television industry tape at the time, it recorded composite video which was how most TV studios were wired, and it used a 3 to 1 Discrete Cosine Transform video compression to get the bitrate down to 90 million bits per second.

Chroma Subsampling

Before we dive too deeply into how data is compressed, let’s talk about chroma subsampling – a type of compression that was used even on the uncompressed D1 digital video recorder.

The human eye contains light-sensitive cells called rods and cones. Rods are sensitive to changes in brightness only and provide images to the brain in black and white. Cones are sensitive to either red, green or blue and provide our brains with the information to see in color. But we have a lot more rods in our eyes than cones – about 120 million rods to only 6 million cones.

Because of this, we’re more responsive to changes in brightness than to changes in color, which means you can take an image and throw away some of the color information while keeping the brightness, and it will still look as crisp and bright as fully colored video.

So to compress color images, first we have to pull the brightness information out of the signal.

Video is made of the primary colors red, green and blue – but storing signals in RGB leads to a lot of redundancy. So the RGB signal is converted to what’s called the YCbCr colorspace. Y stands for luma, or brightness; Cb is the blue-difference channel (blue minus luma) and Cr is the red-difference channel (red minus luma).

A 3d Model of the YCbCr Colorspace
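As a rough illustration of that conversion, here is a Python sketch. It assumes the BT.601 luma weights and the full-range 8-bit scaling used by JPEG; other standards use slightly different numbers, and real converters also round and clamp the results to 0–255. The function name rgb_to_ycbcr is made up for this example.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert 8-bit RGB to full-range YCbCr using BT.601 luma weights (an assumption)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b  # brightness: a weighted sum of R, G and B
    cb = 128 + (b - y) / 1.772             # how far blue sits from the brightness
    cr = 128 + (r - y) / 1.402             # how far red sits from the brightness
    return y, cb, cr

for name, rgb in [("mid gray", (128, 128, 128)), ("pure red", (255, 0, 0))]:
    y, cb, cr = rgb_to_ycbcr(*rgb)
    print(name, round(y, 1), round(cb, 1), round(cr, 1))
```

Notice that a gray pixel lands in the middle of both color channels (128), so all of its information lives in Y. That is the redundancy the conversion pulls out.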

Now, by separating the color from the brightness, we can start to compress the color information by reducing the resolution of the Cb and Cr channels.

The amount of subsampling, or how much we’re reducing the color resolution, is expressed as a ratio J:a:b, where

  • J is the number of horizontal pixels in the sample, usually 4,
  • “a” is the number of Cb and Cr pixels in the first row of the sample and
  • “b” is the number of different Cb and Cr pixels in the second row of the sample.

Let’s illustrate what this means.

Chroma-Subsampling-Schemes

A 4:4:4 signal is said to have NO chroma subsampling.
There are 4 pixels in our sample – that’s four pixels of Y. Each of those 4 pixels has its own Cr and Cb values – so 4 Cr and Cb pixels. And in the next line there are 4 more Cr and Cb pixels.

Now let’s start subsampling.

In a 4:2:2 subsample we again have 4 pixels in the sample – four pixels of Y – we never throw away the Y value. But now we only have 2 pixels of Cr and Cb, so each pair of neighboring pixels shares the same values. And in the next line we again have 2 pixels of Cr and Cb.

The information needed to construct a 4:2:2 image is a third smaller than 4:4:4 and is considered good enough for most professional uses.

Another common one is 4:1:1: 4 pixels in the sample, and this time only 1 pixel of Cr and Cb in the sample line and one on the following line.

Here’s 4:2:0 – four in the sample, 2 pixels of Cr and Cb in the sample line and zero in the next line – essentially the 2 get carried over to the next line.

Both 4:1:1 and 4:2:0 need half as much data as 4:4:4.
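Those fractions can be checked by counting values in a block that is J pixels wide and two rows tall. This Python sketch assumes each chroma position stores one Cb and one Cr value, and that every value takes the same number of bits; the function name relative_size is made up for this illustration.

```python
from fractions import Fraction

def relative_size(j, a, b):
    """Data needed by a J:a:b scheme as a fraction of 4:4:4, over a J-wide, 2-row block."""
    luma = 2 * j                # every pixel in both rows keeps its Y value
    chroma = 2 * (a + b)        # a + b chroma positions, each storing one Cb and one Cr
    full = 2 * j + 2 * (j + j)  # no subsampling: Y, Cb and Cr for every pixel
    return Fraction(luma + chroma, full)

for scheme in [(4, 4, 4), (4, 2, 2), (4, 1, 1), (4, 2, 0)]:
    print(scheme, relative_size(*scheme))
```

The count gives 2/3 for 4:2:2 and 1/2 for both 4:1:1 and 4:2:0, matching the figures above.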

The Compression Revolution

Chroma subsampling is a good start, but we have ways to get the video data even smaller. One of the most important is the Discrete Cosine Transform. DCT is a seriously brilliant mathematical achievement – basically, what it does is approximate the square wave signal that is the binary stream as a sum of different cosine waves.

Creating a square wave out of cosine waves

The mathematics is nothing short of amazing and seriously well beyond my capability to explain. In the simplest terms – the more cosine waves you use, the more accurately you can describe the original square wave of the binary code. And since binary is resistant to noise you don’t need that many waves to get a pretty accurate result.
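The "more cosine waves, more accuracy" idea can be tried out with a small Python sketch. This is a simplified, one-dimensional illustration on eight made-up sample values shaped like a square wave step. Image codecs such as JPEG apply a two-dimensional version of the transform to small blocks of pixel values. The function names and the choice of the orthonormal DCT-II are assumptions made for this example.

```python
import math

def dct(x):
    """Orthonormal DCT-II: express the samples as weights of cosine waves."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * sum(x[i] * math.cos(math.pi / n * (i + 0.5) * k) for i in range(n)))
    return out

def idct(coeffs):
    """Inverse transform: add the weighted cosine waves back together."""
    n = len(coeffs)
    out = []
    for i in range(n):
        total = 0.0
        for k in range(n):
            scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
            total += scale * coeffs[k] * math.cos(math.pi / n * (i + 0.5) * k)
        out.append(total)
    return out

def approximate(x, keep):
    """Rebuild x from only its first `keep` cosine terms, with the rest set to zero."""
    coeffs = dct(x)
    return idct(coeffs[:keep] + [0.0] * (len(x) - keep))

def rms_error(a, b):
    """Root-mean-square difference between two equal-length sample lists."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)) / len(a))

square = [1, 1, 1, 1, 0, 0, 0, 0]  # one step of a square wave, 8 samples
for keep in (1, 2, 4, 8):
    print(keep, round(rms_error(square, approximate(square, keep)), 4))
```

The error never grows as terms are added, and with all eight terms the original samples come back exactly (up to floating-point rounding). Compression comes from storing only the terms that matter most.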

The first compression widely used for editing video was Motion-JPEG in the early 90s.

Motion JPEG is an intraframe compression. It uses DCT to break down individual frames into macroblocks. It basically looks at the frame, finds chunks of the image that are similar, then simplifies them. Now, it didn’t look that great – the first Avid editing systems in the early 90s used an early form of Motion JPEG compression and the quality was about that of VHS tape. But since the compression was done frame by frame, the codec wasn’t too taxing for the computer hardware of the time – and it was just good enough for offline editing.

Major breakthroughs came in 1995 with two important technological releases.

On the distribution side, 1995 saw the introduction of DVD optical discs. These discs used a new kind of compression called MPEG-2 – not to be confused with Motion JPEG. MPEG-2 was developed by the Moving Picture Experts Group, who had a rather novel approach to handling compression. Instead of standardizing the way video signals were encoded, they standardized the way video was decoded from a digital stream. The way an MPEG-2 stream was decoded stayed the same no matter where it was done: on a DVD player, on your computer, or even on a modern-day DVR.

Now, how that digital stream was encoded – what algorithms were used to compress the original data – was left open so that media companies could continually fight it out and develop more and more efficient encoders.

GOP

MPEG-2 is an interframe compression. Unlike intraframe compression, which compresses frames individually, interframe compression puts frames into GOPs – groups of pictures. These GOPs start with an I frame, or reference frame – a full image. Then the encoder waits a few frames and records a P frame – a predictive frame. This frame only contains the information that is DIFFERENT from the I frame. Then the encoder goes back and calculates the differences for the frames in between the I and P frames and records them as B frames – bidirectional predictive frames.

Describing the process almost sounds like magic. This process of building frames based on reference frames was rather computationally taxing – it would take a while before computers could muster the processing power to edit this type of compression.

But in 1995 they didn’t have to, as that was the same year the DV format was introduced. Intended to be a consumer-grade video tape format, DV recorded video at a 4:1:1 color subsample using an intraframe DCT compression, giving 25 million bits per second – quite an improvement in size over the original D1.

This wasn’t considered a professional quality standard, but it was a huge step up from consumer analog formats like VHS or Hi8. And all the DV cameras had IEEE 1394 (FireWire) connections, which meant people could get a perfect digital copy of their video onto their computer without needing specialized hardware to encode the file. The tapes themselves were extremely cheap, $3-5 per hour.

Armed with relatively inexpensive cameras, digital video production began to take off.
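The core idea, storing one full picture and then only what changes, can be sketched in Python. This is a heavily simplified illustration and not how an MPEG-2 encoder actually works: the frames are tiny lists of made-up pixel values, each P frame is a plain difference from the frame before it, there is no motion estimation, and B frames are left out entirely. The function names are hypothetical.

```python
def encode_gop(frames):
    """Encode a group of pictures: first frame in full, later frames as differences."""
    encoded = [("I", list(frames[0]))]  # I frame: the complete image
    previous = frames[0]
    for frame in frames[1:]:
        # Simplified P frame: keep only the pixels that changed since the previous frame.
        changes = {i: v for i, (v, old) in enumerate(zip(frame, previous)) if v != old}
        encoded.append(("P", changes))
        previous = frame
    return encoded

def decode_gop(encoded):
    """Rebuild every frame by applying each P frame's changes to the frame before it."""
    current = list(encoded[0][1])
    frames = [list(current)]
    for _kind, changes in encoded[1:]:
        for i, v in changes.items():
            current[i] = v
        frames.append(list(current))
    return frames

frames = [
    [10, 10, 10, 10, 10, 10],
    [10, 10, 99, 10, 10, 10],
    [10, 10, 10, 99, 10, 10],
]
gop = encode_gop(frames)
print(gop)
print(decode_gop(gop) == frames)
```

Even in this toy version you can see the tradeoff: the P frames are much smaller than the I frame, but you cannot show frame three without first rebuilding the frames before it.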

Pressure from Below

In Hollywood during the 90s, Avid was the king of nonlinear editing systems, but it was still a fairly expensive system. Several companies tried to compete for a share of that video production market.

In 1990, NewTek released the first Video Toaster on the Amiga system. Though technically it was more of a video switcher with only limited linear editing capabilities until the Flyer was added, the Video Toaster brought video production to lots of small television studios, production shops and schools. Costing only a few thousand dollars but loaded with effects, a character generator and even a 3D package called LightWave 3D, the Video Toaster proved there was a market for small-scale media production.

As computers continued getting more powerful and storage cheaper and cheaper, software-based nonlinear editors like Adobe Premiere and Media 100 kept nipping at the heels of Avid, forcing the company to constantly lower the price of its system.

A media company called Macromedia wanted to get in the game. They hired the lead developer of Adobe Premiere, Randy Ubillos (you-bil-los), to create a program called “KeyGrip” based on Apple’s QuickTime codecs. The product was fairly developed when Macromedia realized it would be unable to release the program, as it interfered with the licensing agreements they had with their partners Truevision and Microsoft. So Macromedia sought a company to buy the product they had developed, and they found one at a private demonstration at NAB in 1998. The buyer’s name was Steve Jobs, and his company Apple would release the software the following year as Final Cut Pro.

High-Definition Video and Film Scanning

The divide between television/video production and film production began to close with the adoption of high definition video production. Engineering commissions had been working on the standardization of High Def video since the 70s and experiments in HD broadcast were being conducted by the late 80s in Japan. The first public HD broadcast in the United States occurred on July 23, 1996.

Digital Intermediate

Now about this same time, the mid to late 90s, Hollywood studios were beginning to use DIs, or digital intermediates, to create special effects. DIs were created by sending 35mm celluloid film through a telecine, which scanned the film to create digital files. These could be manipulated and composited in the computer, and when the filmmakers were satisfied, the final shot would be sent to an optical printer which put the digital images back on film. Hence the term intermediate.

In 1992, visual effects supervisor/producer Chris Woods overcame several technological barriers with telecine to create the visual effects for the 1993 release Super Mario Bros. Over 700 visual effects plates were created at 2K resolution – that’s roughly 2,000 pixels across.

Chris Watts further revolutionized the DI process with 1998’s Pleasantville. Pleasantville held the record for most visual effects shots in a single film, as almost every shot in which the characters visit the fictional, idyllic 1950s town of Pleasantville required some kind of color special effect.

That title for most digital effects would not last long as it was overtaken by Star Wars Episode I in 1999.

The first Hollywood film to use the DI process for the ENTIRE length of the movie was the Coen brothers’ O Brother, Where Art Thou? in 2000. After trying bleach processes but never quite getting the right look, cinematographer Roger Deakins suggested doing it digitally with a DI. He spent 11 weeks pushing the color of the scanned 2K DI, fine-tuning the look of the old-timey American South.

The thing is, HD video and 2K film scans share roughly the same resolution – HD being 1920×1080 whereas 2K is 2048×1080. So it wasn’t long before Hollywood started asking: can we just skip the whole 35mm film step altogether?
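A quick bit of Python arithmetic shows just how close those two frame sizes are. The variable names are made up for this illustration.

```python
hd = 1920 * 1080      # pixels in an HD frame, using the dimensions quoted above
two_k = 2048 * 1080   # pixels in a 2K frame, using the dimensions quoted above
print(hd, two_k, round(two_k / hd, 3))
```

A 2K frame at these dimensions has only about 7% more pixels than an HD frame.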

HD-and-2K

The first major motion picture shot entirely on digital was Star Wars Episode II: Attack of the Clones, on a pre-production Sony HDW-F900.

Star Wars Episode 2

And by the latter half of the 2000s, with faster computers and storage, better cameras and even 4K resolution, it became conceivable to capture straight onto a digital format, edit online (which means working with the original full-quality files rather than a low-quality working file), and even project digital files – all without celluloid film.

Moving into the second decade of the 21st century, we’re adding even faster computers and video processors, incredibly efficient compression techniques like MPEG-4 and H.265, and a powerful network of data distribution with broadband internet.

The journey to modern-day film and video editing can be traced all the way back to TV networks needing to delay the broadcast of their shows. Everything we have now is built on the sparks of genius that electronics engineers, software engineers, and mathematicians had over the past 60 years – coming up with incredibly brilliant solutions to problems that hounded electronics from the start. Each step, each advancement, added more and more tools for us filmmakers to realize our dreams. How can you look at the momentum of history and how we got here and not wonder in awe that so much has changed in so little time, and that it’s all so we can just tell stories to each other? Filmmaking is the technological fulfillment of our most basic human need, the need to communicate. So go out there and communicate! Use these tools that are available. Be part of the next chapter in filmmaking history.

