Comparison of Go data compression implementations - gzip, zstandard, brotli
A comparison of the different data compression algorithms and which performs the best on XML files
↑Introduction
I need to store XML messages within the vault repository as part of my ATNA Audit Vault project. Although the XML messages are individually small, the vault must hold millions of new messages each year; therefore, the repository would benefit from data compression.
This article examines and compares the different Go compression implementations. The comparison focuses on the resulting size; it's not concerned with compression/decompression speed, as the main requirement is the smallest size possible.
We'll look at gzip from the standard library. For zstandard and brotli, we'll use the third-party libraries linked to, which need installing first:
Installation of dependencies
go get github.com/klauspost/compress
go get github.com/andybalholm/brotli
↑Opening the XML file
Firstly, we need to import the two additional libraries and the packages we need from the standard library. We then define variables to hold the XML data and the compressed formats as byte buffers, and use ioutil to read the contents of the XML file into a byte slice.
package main

import (
	"bytes"
	"compress/gzip"
	"io"
	"io/ioutil"
	"log"

	"github.com/andybalholm/brotli"
	"github.com/klauspost/compress/zstd"
)

func main() {
	var err error
	var auditMessageContent []byte
	var auditBrotliMessageBuffer bytes.Buffer
	var auditGZIPMessageBuffer bytes.Buffer
	var auditZSTDMessageBuffer bytes.Buffer

	auditMessageContent, err = ioutil.ReadFile("test-document.xml")
	if err != nil {
		log.Fatalln(err.Error())
	}
}
↑gzip
The first implementation I wanted to try was gzip, which is based on the DEFLATE algorithm, a combination of LZ77 and Huffman coding that has been around for over thirty years. Many tools support the gzip format, and as it's part of the Go standard library, it's a good choice if you want to avoid third-party dependencies.
The function CompressWithGZIP accepts a single parameter: the data to compress. It returns a bytes.Buffer containing the compressed data, and an error value if anything goes wrong within the function.
func CompressWithGZIP(input []byte) (bytes.Buffer, error) {
	var encoder *gzip.Writer
	var err error
	var tmpBuffer bytes.Buffer

	encoder, err = gzip.NewWriterLevel(
		&tmpBuffer,
		gzip.BestCompression)
	if err != nil {
		return tmpBuffer, err
	}

	_, err = encoder.Write(input)
	if err != nil {
		return tmpBuffer, err
	}

	if err := encoder.Close(); err != nil {
		return tmpBuffer, err
	}

	return tmpBuffer, nil
}
It can be called from the main function like:
auditGZIPMessageBuffer, err = CompressWithGZIP(auditMessageContent)
if err != nil {
log.Fatalln(err.Error())
}
The original file was 43,099 bytes in size. The gzip'd version was 12,623 bytes, a saving of 70.72%.
↑zstandard
Zstandard, or zstd for short, is a fast lossless compression algorithm targeting real-time compression scenarios at zlib-level and better compression ratios. It's backed by a fast entropy stage provided by the Huff0 and FSE libraries.
The function CompressWithZSTD accepts a single parameter: the data to compress. It returns a bytes.Buffer containing the compressed data, and an error value if anything goes wrong within the function.
func CompressWithZSTD(input []byte) (bytes.Buffer, error) {
	var encoder *zstd.Encoder
	var err error
	var tmpBuffer bytes.Buffer

	encoder, err = zstd.NewWriter(
		&tmpBuffer,
		zstd.WithEncoderLevel(zstd.SpeedBestCompression))
	if err != nil {
		return tmpBuffer, err
	}

	_, err = encoder.Write(input)
	if err != nil {
		return tmpBuffer, err
	}

	if err := encoder.Close(); err != nil {
		return tmpBuffer, err
	}

	return tmpBuffer, nil
}
It can be called from the main function like:
auditZSTDMessageBuffer, err = CompressWithZSTD(auditMessageContent)
if err != nil {
log.Fatalln(err.Error())
}
The original file was 43,099 bytes in size. The zstd'd version was 12,495 bytes, a saving of 71.01%.
↑brotli
Brotli, developed by Google, is a general-purpose lossless compression algorithm that compresses data using a combination of a modern variant of the LZ77 algorithm, Huffman coding, and second-order context modelling, with a compression ratio comparable to the best currently available general-purpose compression methods. It is similar in speed to DEFLATE but offers denser compression.
The function CompressWithBrotli accepts a single parameter: the data to compress. It returns a bytes.Buffer containing the compressed data, and an error value if anything goes wrong within the function.
func CompressWithBrotli(input []byte) (bytes.Buffer, error) {
	var encoder *brotli.Writer
	var err error
	var tmpBuffer bytes.Buffer

	encoder = brotli.NewWriterLevel(
		&tmpBuffer,
		brotli.BestCompression)

	_, err = encoder.Write(input)
	if err != nil {
		return tmpBuffer, err
	}

	if err := encoder.Close(); err != nil {
		return tmpBuffer, err
	}

	return tmpBuffer, nil
}
It can be called from the main function like:
auditBrotliMessageBuffer, err = CompressWithBrotli(auditMessageContent)
if err != nil {
log.Fatalln(err.Error())
}
↑The Results
The results were calculated by printing out the length of each byte buffer:
log.Println("Original:", len(auditMessageContent))
log.Println("GZIP :", len(auditGZIPMessageBuffer.Bytes()))
log.Println("ZSTD :", len(auditZSTDMessageBuffer.Bytes()))
log.Println("Brotli :", len(auditBrotliMessageBuffer.Bytes()))
Which produced the following results:
Original: 43099
GZIP : 12623 70.72% savings
ZSTD : 12495 71.01% savings
Brotli : 10605 75.40% savings
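The savings percentages above are derived as one minus the ratio of compressed size to original size. A small sketch of that calculation (my addition; the helper name saving is my own):

```go
package main

import "fmt"

// saving returns the percentage size reduction of compressedSize
// relative to originalSize.
func saving(originalSize, compressedSize int) float64 {
	return (1 - float64(compressedSize)/float64(originalSize)) * 100
}

func main() {
	// The byte counts reported in the article.
	fmt.Printf("GZIP  : %.2f%%\n", saving(43099, 12623))
	fmt.Printf("ZSTD  : %.2f%%\n", saving(43099, 12495))
	fmt.Printf("Brotli: %.2f%%\n", saving(43099, 10605))
}
```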
↑Decompressing Brotli
From these results, I will pick Brotli as the compression algorithm, so let’s look at how you would decompress the data back into the original data.
func UnCompressWithBrotli(input []byte) (string, error) {
	var decompressedContent bytes.Buffer

	br := brotli.NewReader(bytes.NewReader(input))
	_, err := io.Copy(&decompressedContent, br)
	if err != nil {
		return "", err
	}

	return decompressedContent.String(), nil
}