Plenty of people have done this already, but I wanted to try it myself and document the results in this post. I will be comparing the compression formats that can be used with the tar program on Linux, specifically GNU tar, which differs a bit from the standard UNIX tar program.
For the first example, I will be compressing the source code of dwm version 6.5 from suckless.org, with the .git directory removed from the source tree. The source can be found on this website.
gzip reduced the size significantly, as the du output below shows.
$ tar czf dwm.tar.gz dwm/
$ file dwm.tar.gz
dwm.tar.gz: gzip compressed data, from Unix, original size modulo 2^32 102400
$ du dwm
120 dwm
$ du dwm.tar.gz
28 dwm.tar.gz
I repeated the same step for bzip2, XZ, Zstandard and LZMA:
$ ls -l
-rw-r--r-- 1 spekie spekie 24009 Apr 12 14:14 dwm.tar.bz2
-rw-r--r-- 1 spekie spekie 25732 Apr 12 14:22 dwm.tar.gz
-rw-r--r-- 1 spekie spekie 23212 Apr 12 14:25 dwm.tar.lzma
-rw-r--r-- 1 spekie spekie 23264 Apr 12 14:12 dwm.tar.xz
-rw-r--r-- 1 spekie spekie 27374 Apr 12 14:21 dwm.tar.zst
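The exact commands for the other four formats are not shown, so here is a minimal sketch of how they could be produced. It assumes GNU tar with the -z, -j, -J, --lzma and --zstd options and the matching compressor programs installed; the make_archive helper and the fallback demo directory are hypothetical names used only for illustration.

```shell
#!/bin/sh
# Sketch: build one tarball per compression format with GNU tar.
set -u

src=${1:-}
if [ -z "$src" ]; then
    # No directory given: fall back to a throwaway sample tree (hypothetical data).
    cd "$(mktemp -d)" || exit 1
    src=demo
    mkdir "$src"
    printf 'int main(void) { return 0; }\n' > "$src/main.c"
fi

# make_archive <suffix> <compressor program> <tar option>  (hypothetical helper)
make_archive() {
    if command -v "$2" >/dev/null 2>&1; then
        tar -c "$3" -f "$src.tar.$1" "$src" || echo "tar failed for .$1" >&2
    else
        echo "skipping .$1: $2 is not installed" >&2
    fi
}

make_archive gz   gzip  -z
make_archive bz2  bzip2 -j
make_archive xz   xz    -J
make_archive lzma lzma  --lzma
make_archive zst  zstd  --zstd

# Exact byte sizes of whatever was created
ls -l "$src".tar.*
```

Run with the directory as the first argument, for example `sh script.sh dwm`, to get the same set of files as in the listing above. All formats use their default compression level here, which is an assumption about how the original archives were made.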
To be more accurate, I used ls -l to show the size of these files, since du reports disk usage in blocks rather than the exact byte count. As we can see, XZ and LZMA came very close in terms of file size, but LZMA still beats XZ by a tiny bit (52 bytes). The compression ratio and the resulting file size vary between these formats depending on the kind of files you are compressing; here I am only testing regular text files and one small PNG file.
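To make the difference between the two commands concrete, here is a small sketch that writes a file of exactly 25732 bytes (the size of dwm.tar.gz reported by ls -l) and compares its byte count with what du reports. The file name sample.bin is hypothetical, and the 4 KiB block size in the last step is an assumption about the filesystem, although it would be consistent with the 28 that du printed earlier.

```shell
#!/bin/sh
# Sketch: exact byte count versus allocated disk space for the same file.
set -eu

workdir=$(mktemp -d)
file="$workdir/sample.bin"   # hypothetical demo file

# Write exactly 25732 bytes, the size ls -l reported for dwm.tar.gz.
dd if=/dev/zero of="$file" bs=1 count=25732 2>/dev/null

bytes=$(wc -c < "$file" | tr -d ' ')   # exact length in bytes, as ls -l shows it
blocks=$(du -k "$file" | cut -f1)      # allocated space in 1 KiB units

echo "exact size: $bytes bytes"
echo "du -k:      $blocks KiB on disk"

# Assuming 4096-byte filesystem blocks, round the size up to whole blocks:
rounded=$(( (bytes + 4095) / 4096 * 4 ))
echo "rounded up to 4 KiB blocks: $rounded KiB"   # prints 28 for 25732 bytes

rm -rf "$workdir"
```

The du figure depends on the filesystem, so it is not hard-coded here. GNU du also accepts -b to print the apparent size in bytes instead of the allocated space.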
For the second example, I will be compressing a directory that contains six FLAC files, metadata included. Below is the ffprobe output for one of the FLAC files in the directory, for reference and bitrate information.
ffprobe version 6.1.2 Copyright (c) 2007-2024 the FFmpeg developers
  built with gcc 14 (Gentoo 14.2.1_p20241221 p7)
  libavutil 58. 29.100 / 58. 29.100
  libavcodec 60. 31.102 / 60. 31.102
  libavformat 60. 16.100 / 60. 16.100
  libavdevice 60. 3.100 / 60. 3.100
  libavfilter 9. 12.100 / 9. 12.100
  libswscale 7. 5.100 / 7. 5.100
  libswresample 4. 12.100 / 4. 12.100
  libpostproc 57. 3.100 / 57. 3.100
Input #0, flac, from 'ReCoda/01. ReCoda.flac':
  Metadata:
    TITLE : ReCoda
    GENRE : Anime
    ARTIST : TRUE
    ALBUM : ReCoda/ブルーデイズ
    DATE : 2024
    track : 1
  Duration: 00:04:12.92, start: 0.000000, bitrate: 1364 kb/s
  Stream #0:0: Audio: flac, 44100 Hz, stereo, s16
  Stream #0:1: Video: png, rgb24(pc, gbr/bt709/iec61966-2-1), 3307x3307, 90k tbr, 90k tbn (attached pic)
    Metadata:
      comment : Cover (front)
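The results are below. LZMA performed worst here: its tarball is even slightly larger than the uncompressed one, while XZ produced the smallest file. Since FLAC audio is already losslessly compressed, none of the formats saved more than roughly 1%. I have also included a plain uncompressed tarball for comparison.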
$ ls -l
-rw-r--r-- 1 spekie spekie 222504960 Apr 12 14:44 ReCoda.tar
-rw-r--r-- 1 spekie spekie 221134890 Apr 12 14:46 ReCoda.tar.bz2
-rw-r--r-- 1 spekie spekie 220179375 Apr 12 14:44 ReCoda.tar.gz
-rw-r--r-- 1 spekie spekie 222712912 Apr 12 14:48 ReCoda.tar.lzma
-rw-r--r-- 1 spekie spekie 220164368 Apr 12 14:45 ReCoda.tar.xz
-rw-r--r-- 1 spekie spekie 220180221 Apr 12 14:46 ReCoda.tar.zst
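Since the byte counts are long and hard to compare by eye, the following sketch turns them into a percentage saved relative to the uncompressed ReCoda.tar. The sizes are copied from the listing above; the variable names are only illustrative.

```shell
#!/bin/sh
# Sketch: percentage saved by each format relative to the uncompressed tarball.
base=222504960   # size of ReCoda.tar in bytes

# "<format> <size in bytes>" pairs taken from the ls -l listing
sizes='bz2 221134890
gz 220179375
lzma 222712912
xz 220164368
zst 220180221'

report=$(printf '%s\n' "$sizes" | LC_ALL=C awk -v base="$base" '
    { printf "%-5s %10d bytes  %6.2f%% saved\n", $1, $2, (base - $2) * 100 / base }
')

printf '%s\n' "$report"
```

A negative percentage means the "compressed" file came out larger than the original tarball, which is what happens for the LZMA file here.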
From this I can conclude that XZ is probably the best choice for reducing size if you do not particularly care about compression or extraction speed. Keep in mind that I only tested FLAC audio and regular text files. I did not test video files, since video formats are containers that can hold different video and audio codecs, and the results would vary with the codecs in use. Zstandard was the fastest to extract in most tests, which helps explain why Arch Linux uses Zstandard for the binary packages handled by its package manager, pacman, rather than formats like LZMA and XZ, which can take longer to extract depending on the size of the package.
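The post does not show how extraction speed was measured, so the following is only a rough sketch of one way to compare it, not the method used for the tests above. It decompresses each archive to /dev/null several times with tar's -O (extract to standard output) option, so disk writes do not affect the result, and reports the total time at one-second resolution. The time_extract function and the RUNS variable are hypothetical names.

```shell
#!/bin/sh
# Sketch: total wall-clock time for repeated extraction of each archive.
runs=${RUNS:-10}   # hypothetical knob: how many times each archive is extracted

time_extract() {
    archive=$1
    start=$(date +%s)
    i=0
    while [ "$i" -lt "$runs" ]; do
        # tar detects the compression format on its own when reading a file;
        # -O sends file contents to stdout instead of writing them to disk.
        tar -xOf "$archive" > /dev/null || return 1
        i=$((i + 1))
    done
    end=$(date +%s)
    echo "$archive: $((end - start)) s for $runs extractions"
}

for archive in "$@"; do
    time_extract "$archive"
done
```

Pass the archives as arguments, for example `RUNS=5 sh script.sh ReCoda.tar.xz ReCoda.tar.zst`. Whole seconds are too coarse for the small dwm tarballs; for those, a higher repeat count or a dedicated benchmarking tool would be needed.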