Comparison of Go data compression implementations - gzip, zstandard, brotli
A comparison of the different data compression algorithms and which performs the best on XML files
Introduction
I need to store XML messages within the vault repository as part of my ATNA Audit Vault project. Although the XML messages are individually small, the vault must hold millions of new messages each year; therefore, the repository would benefit from data compression.
This article examines and compares the different Golang compression implementations. The comparison focuses on the resulting size. It’s not concerned with compression or decompression speed, as the main requirement is the smallest possible size.
We’ll look at gzip from the standard library. For zstandard and brotli, we’ll use the third-party libraries github.com/klauspost/compress and github.com/andybalholm/brotli, which need to be installed first:
Installation of dependencies
go get github.com/klauspost/compress
go get github.com/andybalholm/brotli
Opening the XML file
First, we import the two additional libraries along with the packages we need from the standard library. We then declare variables to hold the XML data and a byte buffer for each compressed format. Finally, we read the contents of the XML file into a byte slice with os.ReadFile, which replaced the deprecated ioutil.ReadFile in Go 1.16.
package main

import (
	"bytes"
	"compress/gzip"
	"io"
	"log"
	"os"

	"github.com/andybalholm/brotli"
	"github.com/klauspost/compress/zstd"
)

func main() {
	var err error
	var auditMessageContent []byte
	var auditBrotliMessageBuffer bytes.Buffer
	var auditGZIPMessageBuffer bytes.Buffer
	var auditZSTDMessageBuffer bytes.Buffer

	// Read the whole XML file into memory.
	auditMessageContent, err = os.ReadFile("test-document.xml")
	if err != nil {
		log.Fatalln(err.Error())
	}

	// The compression calls from the following sections go here.
}
gzip
The first implementation I wanted to try was gzip. It is based on the DEFLATE algorithm, a combination of LZ77 and Huffman coding, and has been around for over thirty years. Many tools support the gzip format, and as it’s part of the Go standard library, it’s a good choice if you want to avoid third-party libraries.
The function CompressWithGZIP accepts a single parameter: the data to compress. It returns a bytes.Buffer containing the compressed data, plus a non-nil error if anything fails along the way.
func CompressWithGZIP(input []byte) (bytes.Buffer, error) {
	var encoder *gzip.Writer
	var err error
	var tmpBuffer bytes.Buffer

	encoder, err = gzip.NewWriterLevel(
		&tmpBuffer,
		gzip.BestCompression)

	if err != nil {
		return tmpBuffer, err
	}

	_, err = encoder.Write(input)
	if err != nil {
		return tmpBuffer, err
	}

	// Close flushes any pending data and writes the gzip footer.
	if err := encoder.Close(); err != nil {
		return tmpBuffer, err
	}

	return tmpBuffer, nil
}
It can be called from the main function like:
auditGZIPMessageBuffer, err = CompressWithGZIP(auditMessageContent)
if err != nil {
	log.Fatalln(err.Error())
}
The original file was 43,099 bytes in size. The gzip’d version was 12,623 bytes, a saving of 70.71%.
zstandard
Zstandard, or zstd for short, is a fast lossless compression algorithm that targets real-time compression scenarios with compression ratios at zlib level or better. It’s backed by a fast entropy stage provided by the Huff0 and FSE library.
The function CompressWithZSTD accepts a single parameter: the data to compress. It returns a bytes.Buffer containing the compressed data, plus a non-nil error if anything fails along the way.
func CompressWithZSTD(input []byte) (bytes.Buffer, error) {
	var encoder *zstd.Encoder
	var err error
	var tmpBuffer bytes.Buffer

	encoder, err = zstd.NewWriter(
		&tmpBuffer,
		zstd.WithEncoderLevel(zstd.SpeedBestCompression))

	if err != nil {
		return tmpBuffer, err
	}

	// Capture the error from Write so the check below is meaningful.
	_, err = encoder.Write(input)
	if err != nil {
		return tmpBuffer, err
	}

	// Close flushes any pending data and finishes the zstd frame.
	if err := encoder.Close(); err != nil {
		return tmpBuffer, err
	}

	return tmpBuffer, nil
}
It can be called from the main function like:
auditZSTDMessageBuffer, err = CompressWithZSTD(auditMessageContent)
if err != nil {
	log.Fatalln(err.Error())
}
The original file was 43,099 bytes in size. The zstd’d version was 12,495 bytes, a saving of 71.01%.
brotli
brotli, developed by Google, is a general-purpose lossless compression algorithm that combines a modern variant of the LZ77 algorithm, Huffman coding and 2nd-order context modeling. Its compression ratio is comparable to the best currently available general-purpose compression methods. It is similar in speed to deflate but offers denser compression.
The function CompressWithBrotli accepts a single parameter: the data to compress. It returns a bytes.Buffer containing the compressed data, plus a non-nil error if anything fails along the way.
func CompressWithBrotli(input []byte) (bytes.Buffer, error) {
	var encoder *brotli.Writer
	var err error
	var tmpBuffer bytes.Buffer

	// Unlike the gzip and zstd constructors, NewWriterLevel returns no error.
	encoder = brotli.NewWriterLevel(
		&tmpBuffer,
		brotli.BestCompression)

	_, err = encoder.Write(input)
	if err != nil {
		return tmpBuffer, err
	}

	// Close flushes any pending data and finishes the brotli stream.
	if err := encoder.Close(); err != nil {
		return tmpBuffer, err
	}

	return tmpBuffer, nil
}
It can be called from the main function like:
auditBrotliMessageBuffer, err = CompressWithBrotli(auditMessageContent)
if err != nil {
	log.Fatalln(err.Error())
}
The Results
The results were calculated by printing out the length of each byte buffer:
log.Println("Original:", len(auditMessageContent))
log.Println("GZIP :", len(auditGZIPMessageBuffer.Bytes()))
log.Println("ZSTD :", len(auditZSTDMessageBuffer.Bytes()))
log.Println("Brotli :", len(auditBrotliMessageBuffer.Bytes()))
Which produced the following sizes in bytes, shown here with the savings relative to the original:
Original: 43099
GZIP    : 12623  70.71% savings
ZSTD    : 12495  71.01% savings
Brotli  : 10605  75.39% savings
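The log statements above print only the sizes, so the savings percentages have to be worked out separately. A minimal sketch of that arithmetic follows, using the byte counts reported above. The function name savingsPercent is a hypothetical helper, not part of the original program.

```go
package main

import "fmt"

// savingsPercent returns how much smaller compressedSize is than
// originalSize, as a percentage of the original size.
// (Hypothetical helper name, used only for this sketch.)
func savingsPercent(originalSize, compressedSize int) float64 {
	if originalSize == 0 {
		return 0 // avoid dividing by zero for an empty input
	}
	return (1 - float64(compressedSize)/float64(originalSize)) * 100
}

func main() {
	const original = 43099 // original file size reported in the article

	// Compressed sizes reported in the article.
	results := []struct {
		name string
		size int
	}{
		{"GZIP", 12623},
		{"ZSTD", 12495},
		{"Brotli", 10605},
	}

	for _, r := range results {
		fmt.Printf("%-7s: %6d bytes  %.2f%% savings\n",
			r.name, r.size, savingsPercent(original, r.size))
	}
}
```

In the article's program, the same helper could be called with len(auditMessageContent) and the Len method of each buffer instead of the hard-coded figures.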
Decompressing Brotli
From these results, I will pick Brotli as the compression algorithm, so let’s look at how you would decompress the compressed bytes back into the original data.
func UnCompressWithBrotli(input []byte) (string, error) {
	var decompressedContent bytes.Buffer

	// Wrap the compressed bytes in a brotli reader and copy out the decoded data.
	br := brotli.NewReader(bytes.NewReader(input))
	_, err := io.Copy(&decompressedContent, br)
	if err != nil {
		return "", err
	}

	return decompressedContent.String(), nil
}
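Before trusting a compressed archive, it may be worth checking that a message survives compression followed by decompression. The sketch below shows one way to do that with function parameters that match the signatures of CompressWithBrotli and UnCompressWithBrotli. The name verifyRoundTrip is a hypothetical helper, and the pass-through functions are stand-ins (an assumption, so the example runs with only the standard library); in the real program you would pass the Brotli functions instead.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// verifyRoundTrip compresses original, decompresses the result, and returns
// an error if the restored data differs from the input.
// (Hypothetical helper name, used only for this sketch.)
func verifyRoundTrip(
	original []byte,
	compress func([]byte) (bytes.Buffer, error),
	decompress func([]byte) (string, error),
) error {
	compressed, err := compress(original)
	if err != nil {
		return fmt.Errorf("compress: %w", err)
	}

	restored, err := decompress(compressed.Bytes())
	if err != nil {
		return fmt.Errorf("decompress: %w", err)
	}

	if restored != string(original) {
		return errors.New("round trip mismatch: restored data differs from original")
	}
	return nil
}

// passThroughCompress is a stand-in with the same signature as
// CompressWithBrotli; it copies the input without compressing it.
func passThroughCompress(input []byte) (bytes.Buffer, error) {
	var buf bytes.Buffer
	buf.Write(input)
	return buf, nil
}

// passThroughDecompress is a stand-in with the same signature as
// UnCompressWithBrotli; it returns the input unchanged.
func passThroughDecompress(input []byte) (string, error) {
	return string(input), nil
}

func main() {
	sample := []byte("<AuditMessage/>") // made-up sample message

	err := verifyRoundTrip(sample, passThroughCompress, passThroughDecompress)
	if err != nil {
		fmt.Println("verification failed:", err)
		return
	}
	fmt.Println("round trip verified")
}
```

With the article's functions in place, the call would become verifyRoundTrip(auditMessageContent, CompressWithBrotli, UnCompressWithBrotli).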