VPack is a general purpose library for binary serialization/deserialization (which we shall refer to as packing/unpacking).
It implements a robust scheme that allows you to be confident you're not making subtle mistakes which could otherwise cause data corruption.
While this package was originally designed to serve as a building block to VBolt, it is in fact general purpose and has no dependency on VBolt.
The techniques described below were largely inspired by the following blog post:
How Media Molecule Does Serialization
Which itself was informed by the author (Oswald Hurlem) asking Alex Evans from Media Molecule about how to do robust versioned serialization; the answers from Alex were recorded here:
gist: mmalex serialization formats
Given a type T, we define a packing function like this:
func PackT(obj *T, buf *vpack.Buffer)
The buffer has a Writing
flag that determines whether we're reading from it or
writing to it.
This packing function can then be used to both pack and unpack instances of
T to and from a byte buffer using vpack.ToBytes
and vpack.FromBytes
or
vpack.FromBytesInto
.
// t is T
// data is []byte
data := vpack.ToBytes(&t, PackT)
// data is []byte
// t is *T
t := vpack.FromBytes(data, PackT)
// data is []byte
var t T // empty object
vpack.FromBytesInto(data, &t, PackT)
The difference can be subtle, but FromBytesInto
allows you to program without
littering your code with nil checks.
If an error was encountered during unpacking, FromBytes will return a nil
pointer, while FromBytesInto
will return false
.
Let's look at a concrete - if contrived - example.
type XYZ struct {
Unit int
Energy int
Chapter string
President bool
}
func PackXYZ(xyz *XYZ, buf *vpack.Buffer) {
vpack.Version(1, buf)
vpack.Int(&xyz.Unit, buf)
vpack.Int(&xyz.Energy, buf)
vpack.String(&xyz.Chapter, buf)
vpack.Bool(&xyz.President, buf)
}
The function packs the struct fields in sequence using packing functions
defined in the vpack
package. Each of these functions itself follows the same
signature: taking a pointer to the item it packs and a pointer to the buffer.
func Int(n *int, buf *Buffer)
func String(s *string, buf *Buffer)
func Bool(b *bool, buf *Buffer)
Now let's consider one line in isolations:
vpack.Int(&xyz.Unit, buf)
Depending on the Writing flag on buf, this function will either dump the value
of xyz.Unit
to the buffer, or it will read data from the buffer and assign it
to xyz.Unit
.
This same logic applies to all these lines:
vpack.Int(&xyz.Unit, buf)
vpack.Int(&xyz.Energy, buf)
vpack.String(&xyz.Chapter, buf)
vpack.Bool(&xyz.President, buf)
The aggregate effect is that when the Writing flag is on, the entire struct is written to the buffer, and when the Writing flag is off, the entire struct is read from the buffer.
As a user of this package, you almost never have to worry about checking the
Writing
flag yourself; the functions for packing the primitive types do that
so that you don't have to.
This architecture solves two problems at once:
- You don't need to duplicate the code for reading and writing.
- You don't need to worry about subtle mistakes with catastrophic effects!
Just pack the fields in some sequence, and you're guaranteed that unpacking will happen in exactly the same order they were written.
The first line specifies the version. It doesn't look like the other packing functions, so why is it there and what does it do?
vpack.Version(1, buf)
Robust packing for long term storage requires supporting schema evolution through a version flag: before writing out the fields, write a version number.
For version 1, we don't do anything special. We just write the version to allow the code to evolve in the future.
In the future, when we make changes (add fields, remove fields), we surround the relevant item packing lines with if blocks against version ranges.
Keeping with our example, let's suppose that we decided to remove the Energy
field, and add a Price
field.
type XYZ struct {
Unit int
Price int
Chapter string
President bool
}
How would the packing function change?
We need to consider three things:
- Fields we used to pack, but that no longer exist:
- Guard with if version <= last version field existed
- Read into a local varible instead of a struct field
- New fields we added:
- Guard with if version >= first version field was added
- Changes in data representation
- Application specific
In this example, since it's contrived, we'll say that in the new model, Price
is Energy * Unit
.
Here's how the packing function would look like:
func PackXYZ(xyz *XYZ, buf *vpack.Buffer) {
version := vpack.Version(2, buf)
vpack.Int(&xyz.Unit, buf)
if version == 1 {
// version 1 is the last version that had the Energy field
var energy int
vpack.Int(&energy, buf)
// version migration code:
// In the new version, price is energy * unit
xyz.Price = energy * xyz.Unit
}
if version >= 2 {
// version 2 is the first version where Price was added
vpack.Int(&xyz.Price, buf)
}
vpack.String(&xyz.Chapter, buf)
vpack.Bool(&xyz.President, buf)
}
You do need to be careful so that the field order does not change between versions. We introduce a simple testing technique further down.
The basic building block is a Buffer
struct that wraps a byte slice, and has a
Writing
flag. This flag is the key innovation that allows this library to be
what it is.
It allows the packing function to fulfill the role of both reading and writing (serialization and deserialization, or packing and unpacking) at the same time.
If the buffer has the Writing
flag set, the packing function writes the
binary representation of the object to the buffer. If the flag is not set, it
reads from the buffer into the object passed.
We saw struct packing code that does not check the flag itself; it just calls more "primitive" packing functions.
These functions do work with the flag. For example, here's the function for
packing a uint64
using varint
encoding:
func VInt64(func VInt64(n *int64, buf *Buffer) {
if buf.Writing {
buf.Data = binary.AppendVarint(buf.Data, *n)
} else {
var err error
*n, err = binary.ReadVarint(buf)
if err != nil {
buf.Error = true
}
}
}
VPack provides enough primitive packing functions that you generally don't need to implement your own, but we don't discourage you from implementing your own primitive packing functions.
You just have to be really careful with them, and test them very well. Small subtle mistakes here can cause data corruption if you're not careful!
The most important thing to worry about is the reading code has to consume the exact same number of bytes that were written by the writing code.
If this condition is not upheld, data will be corrupted.
VPack provides helper functions to pack slices and maps:
func Slice[T any](list *[]T, fn PackFn[T], buf *Buffer)
func Map[K comparable, T any](m *map[K]T, keyFn PackFn[K], valFn PackFn[T], buf *Buffer)
To pack an int slice field []int
named Ids
, you can call:
vpack.Slice(&item.Ids, vpack.Int, buf)
You can also use this to packs slices of objects. Suppose you have these types:
type ArticleAuthor struct {
AuthorId int
Nickname string
Contribution string
}
type Article struct {
ArticleId int
Title string
Authors []ArticleAuthor
}
Notice the field Authors on Article is a slice of objects.
Here are the packing functions:
func PackArticleAuthor(self *ArticleAuthor, buf *vpack.Buffer) {
vpack.Int(&self.AuthorId, buf)
vpack.String(&self.Nickname, buf)
vpack.String(&self.Contribution, buf)
}
func PackArticle(self *Article, buf *vpack.Buffer) {
vpack.Int(&self.ArticleId, buf)
vpack.String(&self.Title, buf)
vpack.Slice(&self.Authors, PackArticleAuthor, buf) // <<<<<<
}
Notice how we pack the list of Authors using vpack.Slice
.
Earlier we explained versioning on a contrived example, and we mentioned that we do need to take some care with how code is changed, and that this is one area where a small mistake could cause problems.
So how do we test the versioning code? Here's a simple idea:
We can duplicate (and rename) our old versions of the struct and serialization functions
type XYZ_V1 struct {
Unit int
Energy int
Chapter string
President bool
}
func PackXYZ_V1(xyz *XYZ, buf *vpack.Buffer) {
vpack.Version(1, buf)
vpack.Int(&xyz.Unit, buf)
vpack.Int(&xyz.Energy, buf)
vpack.String(&xyz.Chapter, buf)
vpack.Bool(&xyz.President, buf)
}
In the test code, we write to a buffer using PackXYZ_V1
, then we read from the
same buffer using PackXYZ
, and we check the fields we read to make sure the
data matches what we expect
// helper function inside yor unit test
compareXYZ_V1_V2 := (v1 XYZ_V1, expected XYZ) {
var packed = vpack.ToBytes(&v1, PackXYZ_V1)
var actual XYZ
vpack.FromBytes(packed, &actual, PackXYZ)
if actual != expected {
// report error
}
}