Alex Mercer is a doctoral candidate in computer science at Stanford University and the inaugural leader of the Public Engagement for Scholars initiative.
The 1990s were a turbulent era, marked by quirky fads like Crystal Pepsi and the Macarena, not to mention the infamous Tickle Me Elmo. Yet, one of the most exasperating experiences of that decade was the glacial pace of the Internet. Whenever I needed to email a PowerPoint presentation for school, I’d connect the modem, endure the symphony of beeping sounds, initiate the upload, and head off for dinner. By the time I returned, I might have successfully sent a single email.
However, I discovered a handy trick when time was of the essence: file compression, commonly known as “zipping.” Software like WinZip could take an 80 MB PowerPoint file, process it for a moment, and produce a ZIP file that retained all of the original content yet took up only about a third of the space.
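To see that nothing mysterious is going on, here is a minimal sketch in Python using the standard-library zlib module, which implements the same DEFLATE algorithm found inside ZIP files. The repetitive “slide text” is invented purely for illustration.

```python
# A minimal demonstration of lossless compression with Python's built-in zlib
# module (DEFLATE, the same family of algorithm used inside ZIP archives).
# The sample "slide text" is made up for illustration.
import zlib

slide_text = ("Quarterly results improved across every region this year. " * 400).encode("utf-8")

compressed = zlib.compress(slide_text)
restored = zlib.decompress(compressed)

print(f"original:   {len(slide_text):,} bytes")
print(f"compressed: {len(compressed):,} bytes")
print("lossless:", restored == slide_text)  # True: nothing was thrown away
```

Because the sample is highly repetitive it shrinks dramatically; a real presentation full of images and varied text compresses less, but the round trip is always exact.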
Initially, I viewed this technique as a minor convenience. But the more I pondered it, the more it seemed like a form of sorcery. The file was smaller, yet no data had been lost, allowing the recipient to seamlessly reconstruct the original document. It was akin to squeezing a 6-foot package into a 2-foot box, only to have it emerge at its original size on the other end. Where did all that data go during the journey?
Extracting the Excess
The packaging analogy provides a glimpse into the answer. Imagine you are shipping a large inflatable object, such as a beach ball. Instead of sending it in its inflated state, you could deflate the ball and tuck it into a smaller box, including instructions for re-inflation. Yet the analogy only goes so far: the ball is mostly air, and nobody misses the air. If WinZip started removing portions of my meticulously crafted presentation, I would be quite upset. So what is the “air” that can be squeezed out of a PowerPoint file?
To achieve compression, computers utilize strategies similar to how humans process information. Consider a scenario in which someone is memorizing a complex piece of music, such as Ravel’s iconic “Boléro.” This composition contains a staggering 4,050 drumbeats. However, the task becomes significantly easier upon realizing the snare drum part is highly redundant; it consists of a single sequence of 24 beats repeated continuously until the end. In psychological terms, there’s merely one chunk of information to remember. Rather than memorizing each individual note, one can simplify it to “chunk chunk chunk…”
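In code, one simple version of the musician’s trick is run-length encoding: store one copy of the repeating chunk along with how many times to play it. The sketch below uses a made-up drum part rather than Ravel’s actual score.

```python
# A toy run-length encoder: the code equivalent of "play this chunk N times."
# The beat sequence is invented for illustration; real compressors are far
# more sophisticated.
from itertools import groupby

def rle_encode(seq):
    """Collapse runs of identical items into (item, count) pairs."""
    return [(item, len(list(run))) for item, run in groupby(seq)]

def rle_decode(pairs):
    """Expand (item, count) pairs back into the original sequence."""
    return [item for item, count in pairs for _ in range(count)]

beats = ["boom"] * 3 + ["tap"] * 24 + ["boom"] * 2
encoded = rle_encode(beats)
print(encoded)                       # [('boom', 3), ('tap', 24), ('boom', 2)]
print(rle_decode(encoded) == beats)  # True: the full part comes back intact
```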
This process mirrors how a computer compresses files. Just as a musician seeks patterns in music, a compression algorithm identifies recurring sequences within a file and replaces them with shorthand notations. For instance, if my presentation contained the phrase, “How much wood could a woodchuck chuck if a woodchuck could chuck wood?” the program would recognize the repeated words and substitute symbols for them, say “X,” “Y,” and “Z.” These redundant elements represent the air that can be squeezed out of the document.
Of course, the receiving computer needs to understand these shorthand terms, so the compression program includes a symbol table, analogous to instructions for re-inflating the ball. This table enables the recipient’s computer to accurately reconstruct the original file.
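Here is a deliberately simplified sketch of those two steps in Python. The sentence, symbols, and symbol table are just for illustration; real ZIP compression relies on the far more elaborate DEFLATE algorithm rather than whole-word substitution.

```python
# A toy "compressor": replace repeated words with short symbols, and keep a
# symbol table so the receiver can rebuild the original exactly.
sentence = ("How much wood could a woodchuck chuck "
            "if a woodchuck could chuck wood?")

table = {"wood": "X", "woodchuck": "Y", "chuck": "Z"}   # the symbol table

def compress(text, table):
    # Swap each known word for its shorthand symbol.
    return " ".join(table.get(word, word) for word in text.split())

def decompress(text, table):
    # Invert the table and swap the symbols back.
    reverse = {symbol: word for word, symbol in table.items()}
    return " ".join(reverse.get(word, word) for word in text.split())

small = compress(sentence, table)
print(small)   # How much X could a Y Z if a Y could Z wood?
               # (the final "wood?" keeps its question mark, so this toy skips it)
print(decompress(small, table) == sentence)   # True: reconstructed perfectly
```

Shipping the symbol table alongside the shorthand is what makes the reconstruction exact, just as the re-inflation instructions travel in the box with the deflated ball.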
Redundancy explains the apparent magic of compression, and it points the way toward squeezing data down even further. Sharing large media files such as music and video is only practical because of clever techniques that strip out even more redundancy. Yet this raises another question: if there is so much redundancy to remove, why store an 80 MB file when 30 MB would do?
Developers of software like PowerPoint are certainly aware of compression; file size simply isn’t their only concern. Imagine if you had to re-inflate your beach ball every time you wanted to use it. That would make the most of your storage space, but it would also be highly inconvenient. Similarly, if a computer had to decompress a file every time it opened one, using it would feel like those 56K modem days all over again. Retaining redundancy makes for larger files, but it also spares us a great deal of hassle.
For computers, as for people, redundancy is a balancing act. Strip out too much and you are forever reconstructing the same information; keep too much and files swell until they strain storage and bandwidth, a problem streaming services know well. Striking the right balance is what lets us watch films like “The Shawshank Redemption” or “Braveheart” without a second thought. Perhaps the 90s weren’t so bad after all.
Summary
This article uses the everyday magic of file compression to show how computers, like people, save effort by spotting patterns: a compression program finds repeated sequences in a file, replaces them with shorthand, and includes a symbol table so the original can be rebuilt exactly. It also explains why some redundancy is deliberately kept, since constantly decompressing files would trade a little disk space for a lot of inconvenience.