There Would Be No YouTube or the Internet Without Compression

Welcome to the world of compression. When we store or recycle certain items, we compress them to reduce the amount of space they take up. For instance, we flatten aluminum cans and put blankets or clothes in a vacuum storage bag to reduce their volume. This technique of applying pressure to reduce volume is called compression. In computer science, compression means decreasing the size of data by removing unnecessary or repeated parts using special coding techniques. Without compression, we might be living in a totally different world.

# Principle of Compression

The principle of compression can be explained with an example. Suppose a text file (.txt) contains the string 'aaaaabbbccccccddeeee'. This 20-character string can be compressed into a 10-character string by writing each character followed by the number of times it repeats. This method is called RLE (Run-Length Encoding).

aaaaabbbccccccddeeee -> a5b3c6d2e4
The 20-character string is compressed by 50% into a 10-character string.
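
The idea above can be sketched in a few lines of Python. This is a minimal illustration, not a production codec: the function names `rle_encode` and `rle_decode` are made up for this example, and the sketch assumes the input text contains no digits, since digits are used to store the counts.

```python
import re
from itertools import groupby


def rle_encode(text: str) -> str:
    """Write each run of identical characters as the character plus the run length."""
    return "".join(f"{ch}{len(list(run))}" for ch, run in groupby(text))


def rle_decode(encoded: str) -> str:
    """Expand each (character, count) pair back into the original run.

    Assumption: the original text contained no digits.
    """
    return "".join(ch * int(count) for ch, count in re.findall(r"(\D)(\d+)", encoded))


original = "aaaaabbbccccccddeeee"
encoded = rle_encode(original)
print(encoded)                          # a5b3c6d2e4
print(len(original), len(encoded))      # 20 10
print(rle_decode(encoded) == original)  # True
```

Note that RLE only helps when characters repeat. A string with no repetition gets longer: `rle_encode("abc")` returns `a1b1c1`.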

You don’t need to remember the above method, since other methods such as Lempel-Ziv-Welch (LZW) or Huffman coding are used more frequently in practice. These techniques are not easy for novices to understand, so a brief grasp of the concepts is sufficient.

The history of compression goes back further than you might think. An early file compression utility called SQ appeared in the early 1980s, allowing users to reduce the storage space used on their computers. In 1989, the ZIP compression format was introduced, and it remains one of the most widely used formats today. Of course, you can now choose from several formats and filename extensions depending on the type of file compression.

# World of Wonders Created by Compression

Recently, Bluetooth earphones have gained popularity because they are wireless and convenient for listening to music. Did you know that the mp3 files you play through them are compressed files? Most of the files that we use in our daily lives are compressed. Another example is the jpg (or jpeg) format, which is based on an image compression technique. When you record a video using a smartphone, the video is usually stored in the mp4 format, which is also a compressed file format. What would happen without such compression technologies?

First, we would probably still be using CD players. You can digitize analog data, but you cannot reduce the size of the data without compression technologies. Besides, there would be no UHD TV or 4K broadcasting, because digital TV broadcasting relies on video compression standards such as MPEG-2, H.264, or HEVC. Sending such a massive amount of image data would be impractical without compression, even with high-speed internet or 5G. What’s more, the so-called 5G era might not have arrived, since 5G telecommunication services also use compression technology. Ultimately, the Internet as we know it would be hard to imagine, because data communications use compression technologies as well.
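
To get a feel for the scale, here is a rough back-of-the-envelope calculation of how much data uncompressed 4K video would need. The 3840 x 2160 resolution is the usual 4K UHD frame size. The 3 bytes per pixel (8-bit RGB) and 30 frames per second are assumptions chosen for illustration, and `raw_video_bitrate` is a hypothetical helper name.

```python
def raw_video_bitrate(width: int, height: int, bytes_per_pixel: int, fps: int) -> int:
    """Return the bits per second needed to send video with no compression at all."""
    bytes_per_frame = width * height * bytes_per_pixel
    return bytes_per_frame * fps * 8


# Assumed parameters: 4K UHD frame, 8-bit RGB, 30 frames per second.
bits_per_second = raw_video_bitrate(3840, 2160, 3, 30)
print(f"{bits_per_second / 1e9:.2f} Gbit/s")  # 5.97 Gbit/s
```

Under these assumptions a single uncompressed 4K stream needs roughly 6 gigabits every second, which is why video is compressed before it is broadcast or streamed.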

Can you imagine a world without the Internet, YouTube, or Netflix, or even without smartphones?

# Two Types of Compression: Lossless and Lossy

If you are interested in music, you may have heard that it is better to listen to FLAC files, a lossless audio format, than to mp3 files. There are two types of compression: lossy and lossless. Since human eyes and ears are not as sensitive as machines, humans do not notice much difference even if a certain amount of data is omitted.

For instance, sound frequencies are divided into audible and inaudible ranges. The latter refers to sounds that humans cannot hear, although some animals can. Sound data that falls in the inaudible frequency ranges can be removed during compression, since it does not affect the sound quality that humans perceive. Compressing data by discarding a certain amount of it in this way is called lossy compression, whereas compressing data so that all of it can be restored exactly is called lossless compression. FLAC is a representative lossless format, while mp3 is a leading lossy format.

Lossless compression: files before compression -> compressed files (Zip, Lha, etc.) -> restored files (same as the files before compression)

Lossy compression: files before compression -> compressed files (JPEG, etc.) -> restored files (part of the information omitted)
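
The difference between the two round trips can be sketched with Python's standard `zlib` module. This is a toy illustration under stated assumptions: the lossy step here simply throws away the low bits of every byte before compressing, which is not how JPEG or mp3 actually work, and the names `lossless_roundtrip`, `lossy_roundtrip`, and `samples` are made up for this example.

```python
import zlib


def lossless_roundtrip(data: bytes) -> bytes:
    """Compress and restore: the result is identical to the input."""
    return zlib.decompress(zlib.compress(data))


def lossy_roundtrip(data: bytes, keep_bits: int = 4) -> bytes:
    """Toy lossy scheme: discard the low bits of each byte, then compress and restore.

    The discarded bits cannot be recovered, so the result only approximates the input.
    """
    mask = (0xFF << (8 - keep_bits)) & 0xFF
    quantized = bytes(b & mask for b in data)
    return zlib.decompress(zlib.compress(quantized))


# Hypothetical sample values standing in for audio or pixel data.
samples = bytes([100, 101, 103, 102, 100, 99, 101, 104] * 32)

print(lossless_roundtrip(samples) == samples)  # True
print(lossy_roundtrip(samples) == samples)     # False
print(list(lossy_roundtrip(samples)[:8]))      # [96, 96, 96, 96, 96, 96, 96, 96]
```

In the lossy case every restored value is close to the original but not equal to it. Real lossy formats choose what to discard far more carefully, so that the missing information is hard for people to notice.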

The same thing happens with human eyes. Humans cannot perceive certain differences in color or fine detail. So areas that are extremely bright or dark, or areas that have similar saturation or brightness, can be expressed with the same color to reduce the volume of data. A good example is the JPEG standard.

Indeed, when the human brain processes images or sounds, data loss occurs automatically. If the human brain processed all visual and auditory data, we would likely suffer a severe headache. In other words, the concept of lossy compression was developed based on an understanding of how the human brain perceives the world.

Such an understanding of the human brain also contributed to the development of artificial intelligence. The concept of artificial intelligence became widely known thanks to the emergence of AlphaGo. Other examples of such technologies include neural network learning, brain decoding (the hypothetical reconstruction of thoughts), and natural language and voice recognition.

By Andy Cho principal Professional_Development Competency Office