Why .tar.gz Is Still Popular: Compression, Streaming, and Real-World Performance

If you've shipped software, distributed updates, or archived build artifacts, chances are you've encountered `.tar.gz`. Despite being decades old, it's still the go-to format in many modern workflows, from distributing software packages to transferring build artifacts and backups.

But why does it remain so widely used? Is it really better than alternatives like `.zip`, `.tar.zst`, or `.tar.xz`? And how does it actually work under the hood?

This article breaks down what `.tar.gz` is, why it's still relevant, when it falls short, and what to consider as alternatives.
Understanding `.tar.gz`: Two Layers, One Purpose
A `.tar.gz` file is not a single format, but a combination of two:

- TAR (`.tar`): a simple archive format that bundles files together. No compression.
- GZIP (`.gz`): a compression algorithm that reduces size by encoding repeating patterns.
So when you create a `.tar.gz`, you're essentially doing:

```shell
tar -cf archive.tar folder/
gzip archive.tar
# or more commonly:
tar -czf archive.tar.gz folder/
```
This separation is a strength. `tar` handles file structure and metadata, while `gzip` handles compression. This modularity also allows you to swap in different compression algorithms (`zstd`, `xz`, `bzip2`) without changing how the archive is created or extracted.
Why `.tar.gz` Works So Well for Streaming
One of the lesser-known advantages of `.tar.gz` is that it supports true streaming: you can start reading and processing the archive before it's even fully downloaded or written to disk.

This works because:

- Gzip decompresses the data sequentially as it's read.
- Tar stores files linearly: each file's header is followed by its content.

As a result, you can extract files one-by-one as the stream comes in. Here's what that looks like in Go:
```go
package targz

import (
	"archive/tar"
	"compress/gzip"
	"fmt"
	"io"
)

func ExtractAndProcessTarGz(r io.Reader, process func(name string, data []byte) error) error {
	gzReader, err := gzip.NewReader(r)
	if err != nil {
		return fmt.Errorf("failed to create gzip reader: %w", err)
	}
	defer gzReader.Close()

	tarReader := tar.NewReader(gzReader)
	for {
		header, err := tarReader.Next()
		if err == io.EOF {
			break // end of archive
		}
		if err != nil {
			return fmt.Errorf("failed to read tar entry: %w", err)
		}
		if header.Typeflag != tar.TypeReg {
			continue // skip non-regular files
		}
		data := make([]byte, header.Size)
		if _, err := io.ReadFull(tarReader, data); err != nil {
			return fmt.Errorf("failed to read file %s: %w", header.Name, err)
		}
		if err := process(header.Name, data); err != nil {
			return fmt.Errorf("error processing file %s: %w", header.Name, err)
		}
	}
	return nil
}
```
There's no need to write the full `.tar.gz` to disk, nor extract the whole archive at once. This is ideal for CI/CD pipelines, mobile update servers (e.g., Expo), or any case where latency or memory matters.
Compression Benchmarks: How `.tar.gz` Stacks Up

So how does `.tar.gz` compare to other formats in terms of size and speed? Here's a rough benchmark using a 100 MB mixed-content folder:
| Format | Compressed Size | Compression Time | Decompression Time |
| --- | --- | --- | --- |
| `.tar.gz` | 35 MB | 1.8 s | 1.2 s |
| `.tar.zst` | 33 MB | 0.8 s | 0.6 s |
| `.tar.xz` | 29 MB | 4.5 s | 2.3 s |
| `.zip` | 41 MB | 1.5 s | 1.5 s |
| `.7z` | 26 MB | 5.2 s | 2.6 s |
Takeaways:

- `.tar.gz` is a good balance of speed and size.
- `.tar.zst` is faster and slightly smaller, but requires `zstd` tooling.
- `.tar.xz` and `.7z` compress better but are slow.
- `.zip` is universally supported but larger and not stream-friendly.
Limitations of `.tar.gz`

Of course, `.tar.gz` isn't perfect.
- No random access: you can't seek to a specific file without reading through the whole stream.
- Not ideal for Windows: native support for `.zip` makes it more common in that ecosystem.
- Not indexed: you have to list and scan everything up front to know what's inside.
Also, gzip is fast but dated. Algorithms like Zstandard (`.tar.zst`) offer better performance and compression, at the cost of requiring newer tooling.
When to Use Something Else

You might consider `.zip`, `.7z`, or `.tar.zst` depending on:

- Your audience: `.zip` is best for non-technical users or Windows.
- Speed: `.tar.zst` is significantly faster for both compression and decompression.
- Disk space: `.tar.xz` and `.7z` offer better compression ratios for archival use.
But for streaming workflows, `.tar.gz` and `.tar.zst` remain the top contenders.
Real-World Use Cases

- Expo OTA Updates: the update bundle is shipped as `.tar.gz`, then streamed and extracted per-file.
- Docker Layers: image layers are packed and compressed using tar and gzip or zstd.
- CI/CD Artifacts: many pipelines cache and restore build artifacts using `.tar.gz` for speed and simplicity.
- Linux Distributions: kernel source, patches, and package trees are still commonly shipped this way.
Conclusion

Despite its age, `.tar.gz` is still one of the most practical archive formats available. It's not the best in every category, but it hits a rare sweet spot: good compression, fast performance, and strong support across platforms and tools.

For use cases where streaming, simplicity, and compatibility matter, it's hard to beat. But if you're optimizing for speed and efficiency, and you control the toolchain, `.tar.zst` is a worthy upgrade.

Just keep in mind: not all Linux distributions ship with `zstd` support out of the box, so you may need to install it manually or update your `tar` version to take full advantage.