Zip, archive and container

Zip has become a universal compressed archive format, often not using software as a storage medium.

The format was placed in the public domain, which allows operating systems such as Windows, Linux, Mac to integrate it into their file management tools and authors to distribute their software in this format. Thus, a JAR file containing Java bytecode is only a ZIP file with a JAR extension.
It also made it possible to see the distribution of GUI archive management software such as WinZip, IZArc, PeaZip, etc.

Le format zip et son histoire If

we will not jump, we will be locked in the end!

When using the source code of a freely available ARC compression program to create his own version, which he called PKARC, Phil Katz was put on trial. While ARC was all C, PKARC was a combination of Assembler and C, so significantly faster. ARC was widely used for software distribution at the time, and the term "ARCing" was used to refer to the creation of an archive. But PKARC (Phil Katz ARC) quickly became more popular since 1986, which did not please the author of the original.

The lawsuit was based on the use of the word ARC, which is a registered trademark of SEA, as well as the similarity between the source codes, confirmed at the hearing by an expert. Finally, in 1988, both authors came to an understanding. Phil Katz had to pay $62,500. USA and pledged to no longer use the term "ARC," as well as pay SEA 6.5% of its software sales.

Then Phil Katz got better at writing his own compression code, which he called DEFLATE, and implementing it in the PKZIP archiver, common in shareware. Now the archive had a ZIP extension, a word symbolizing speed. PCZIP was not only more effective than ARC, but behind it the author had the entire community of users who appreciate little that competition problems are resolved through litigation .
Indeed, a second lawsuit was filed by SEA because the PKZIP manual used the term "arcing" to say "archive..." It's like tenacity! The lawsuit, by the way, lost to SEA.
As a protest, the. "arc" extension was often replaced with. "sue," which leads to "prosecution."
ZIP also quickly became the most used format, and ARC went into oblivion! And now the term "zipping" is used to refer to the creation of an archive ...

Phil Katz's company brought in millions. However, since the 90s, the author has faced difficulties due to his addiction to alcohol. He is arrested many times for driving under the influence. He died in April 2000, twenty years after the debut of PKZIP in a hotel room, due to loss of internal blood from his chronic alcoholism. He was 37 years old.

Zip, Gzip and so on...

ARC (1985) was based on the LZW (Lempel-Ziv-Welsh) algorithm. Since the source code was available at the beginning, it could be ported to several platforms and, on occasion, a second coding was added - Hufman coding.

PKZIP 1.0 (1989) used the same algorithm as ARC. In version 2.0, he adopted the new DEFLATE algorithm always based on the first LZW compression, and then the second with Huffman encoding. PKZIP has since used several compression modes with several other algorithms.

IHF (1987). This graphic format uses LZW. Interestingly, Unisys, which holds a patent for the LZ78 and LZW format (in the US), launched a series of lawsuits to force users to pay, which led to the creation of a competing PNG format.

PNG (1995, "Portable Network Graphics"). Based on zlib, uses a more efficient and non-embossed patent DEFLATE algorithm.

Тар (Tape Archive, 1975, Bell Labs). A container for storing a list of files in one archive, which can be extracted individually. There is no compression, but then you can compress the archive into a block. Most often with Gzip, which produces the .tar.gz archive.

Gzip (1992). Based on DEFLATE, unlike PKZIP, it is planned to compress a unique file.

Zip (1992). One of the unzip Info-Zip tools that is compatible with the ZIP format and uses DEFLATE is open source and is ported to many platforms. The archive also has a .zip extension. Info-zip uses Gzip's zlib library.

Zlib (1995). Created as a separate gzip module to use PNG and the libpng library, it is now used in many programming languages ​ ​ to create archives and access content. Thus, this library is used by Zip, Gzip, PNG, Git and others. It is also used to compress files between the server and the browser and speed up the transfer.

Zip format

The format has always been public and described in the APPNOTE.TXT file includes in the archive what allows third-party software to be compatible. The file mentions that this specification can be implemented.

In the zip archive, the directory with the list of files is put to the end. This allows you to add other files. In this case, a new directory is added to the end, which adds them to the list .

Each file has:

The central directory consists of a list of headers, one per file, and then an end-of-directory entry.

Each header contains the same information as the file header plus the relative position of the file in the archive. This allows you to both get a quick list of files in the archive, and access one of them.

The entry at the end of the directory includes information such as the number of files, the disk number for archives distributed across multiple disks (it is designed during floppy disks), the location of the central directory, etc .

Thus, if you want to access the content of one archive through the program, then you need to arrange a certain number of bytes to the end of the file and get information that will allow you to access everything else.

Links

See APPNOTE.TXT for more information.

RFU 1951. Description of the DEFLATE compression algorithm based on LZW combining and Huffman coding.