Delta Coding Techniques
Introduction
Delta Coding is the idea of encoding the difference
(diff/delta) between two instances of something instead of
encoding the "whole" value. It can be seen as a form of
lossless compression mechanism, where we have two values: a base
value (sometimes called a previous value) and a new incoming
value. The objective is to generate an encoding using the base
value and the new value and to be able to restore the new value
using the encoding and the base value. We therefore always
require the base value to both encode/decode, the "new" value is
what gets encoded.
In this article we will go over
some examples, ideas and some actual implementations.
Some programs might be running some form of
computation every n seconds. Instead of storing the result
of every computation, we could store the difference between
the most recent computation (new value) and a base computation (base value),
resulting in data that is potentially smaller (this
is explained in more details below). This would be the
encoding part. We could then restore the most recent
value by doing the opposite, which would be the decoding
part.
Another example might be some form of streaming software where
frames are computed and sent over the wire at a certain
rate. Instead of sending the full frame every time, we could
send the delta between the previous frame and the current
frame. This could potentially reduce network traffic by a lot.
The receiving end could then restore the value from the delta and
the previous frame and so on.
This also has some obvious downsides, one being that
if a packet containing one frame is dropped (or if they arrive out
of order, depending on the underlying protocol) then the
following delta won't be restored properly on the receiving
end. Another being that the sending side needs to keep a valid copy
of the previous frame at all times. There's also the
time it takes from generating the delta. As always there are
tradeoffs.
Delta coding therefore has two phases: an encoding
part (serialization), where we get the delta between two instances of a same
type (or any data with some form of schema) and a decoding
(deserialization) part,
where we restore
the value that was delta'ed using the base value. We therefore
always need a non-encoded base value to encode/decode against,
this is the interesting particularity, and also a potential downside for
some use cases of delta coding.
The goal is typically to reduce network traffic, memory bandwith or storage size.
Fundamentals
Delta coding typically require the implementator to write encoders
(serializers) and decoders (deserializers) for whatever needs to be
delta encoded/decoded.
To do so it's useful to have a mental model where we think
purely in terms of information theory. Depending on how deep we
want/need to go it's good to have an understanding of the binary
representation of the data we want to delta encode.
The general idea is to obtain the smallest possible binary encoding
for our data, ideally while keeping things as simple as possible. We
might also need to consider endianess (it's quite easy to keep
things endian agnostic) and to keep the cost of
encoding/decoding our data to a minimum.
To do so we need to come up with a schema that for our encodings.
Typically we want our encodings to be byte-even (or byte rounded).
What I mean by this is that if the delta encoding of two values
require 6 bits, we might have to round it to 8bits/1byte for
storage or usage purposes.
The usual strategy is to generate a delta value that has the most
leading or trailing 0 bits possible, so that we can simply
shave them off. For example a 64 bit value representing the number 1
only requires 1 actual bit of information, if that bit is on
the right most side (reading from left to right),
we would say it has 63 leading zeroes
that we can remove (wasted space). The cpu often offers
intrinsics for counting leading and trailing zeroes that we
can leverage.
The encoders accept two instances of a type and
output an array of bytes.
The decoders take in the previous instance and the encoded byte
array. It outputs the restored value.
Encoder(newInstance: T, previousInstance: T) -> byte[] Decoder(encoding: byte[], previousInstance: T) -> T
In the above snippet the T that the decoder returns has the value of "newInstance" that the Encoder() fn takes in, effectively restoring it.
For over the wire schemas we might also want to wrap our encodings or to prefix them with useful metadata about the encoding so that the decoders can have information about how to decode the delta and restore the value. This is particularily important for algebraic data types. We will see concrete examples that will shed light on all of this later.
32bits Integer Example
To get us warmed up let's say we have a system in which we are
constantly generating a signed 32 bit value. We first need to
define a schema that will represent the binary layout of our
encoding. This schema is the contract between the
encoder/decoder, it's effectively the protocol.
For the best delta encoding results, it's useful to have information
about our values. For instance if we know that in our
program the integer never has values that go over
100000, then we can design our schema with this in mind.
This is very much related to my previous post about
performance aware programming. For the sake of this example let's
say we can't make any assumptions about the values of our
data: we will need to make a generic schema.
ENCODING: [value: nbits][delta or not:1bit][sign bit: 1bit]
The above encoding is quite simple: We either encode the bits that represent the delta (aka the substraction) between two integer values or we directly encode the bits that represent the value when it's not worth it. To inform the decoder of that disctinction we have a bit that is set to 1 when we encode the delta. We then have the right most bit which represents the sign.
The delta encoding itself will be a substraction between the two int values. Signed integers typically have a sign bit at the MSB (i.e on the left most end). Our goal is to pack everything so that our encoding can remove as many bits as possible from the value, we therefore encode the sign bit at the LSB, right after the value. For example:
The value -6 in a byte: 1000_0110 // notice the leading 1 (sign bit) The value -6, sign bit at the left most position: 0000_1101
With this simple tactic we can now completely get rid of all the leading bits! We therefore went from requiring 8 bits (since we needed the sign bit which was at the right most position) to requiring 4 bits to hold the same information.
The other thing to notice in our encoding from above is that there is a bit reserved for "delta or not". This is because for some values doing the delta is not worth it. For example, if we have a previous value 124124124 and a new value of 3, it's much better to directly encode 3 than to encode the delta (the substraction) between 124124124 and 3 (which is 124124121) since the number 3 requires way less bits to represent.
Here's an implementation of the encoder and decoder we just described:
// encoding reminder: value of the delta or of the new value // directly if not worth doing delta, 1bit delta or not, 1bit for the sign bit. // Takes two signed 32bits integers and returns the binary encoding as an // unsigned 32 bit integer, respecting the schema. fn EncodeInt32(toEncode: i32, previous: i32) -> byte[] { // A single byte with values set to 0 means toEncode and previous are equal. if (toEncode == previous) { return {0b_0000_0000}; } delta = previous - toEncode; // Encode the bit that says if the encoding holds a delta or not. // set it to 0 (not holding a delta) by default. delta <<= 1; // worth it (delta requires less bit to represent than the value to encode) : set the bit to 1 if (Math.Abs(delta) < Math.Abs(toEncode)) { delta |= 1 << 0 } // if the resulting delta is negative, set the sign bit (lsb) to 1. if (delta < 0) { return (Math.abs(delta) << 1 | 1); } // else set it to 0 return delta << 1; } // Takes the encoding and the previous value and returns the restored value. DecodeInt32(toDecode: byte[], previous: i32) -> i32 { // indicates they previous/new are the same, as seen in the encoder. if (toDecode[0] == 0) return previous; const byteSize = 8; // shift the encoded byte array into an integer variable, so that it // can be decoded. u64 bitmap = 0; for (int i = toDecode.length - 1; i >= 0; i--) { bitmap = (bitmap << byteSize | (toDecode[i] & 0xFF)); } // control bits are the "delta or not" bit and the sign bit const nbControlBits = 2; // introduce a new var that represents the encoding without the ctrl bits. encodedValue = (bitmap >> nbControlBits); // The "delta or not bit" (which is the second right-most bit) is 1, // meaning we are dealing with a delta. if ((bitmap & 0b_0000_0010) == 1) { // Sign bit (right-most bit) is 1, meaning delta is negative. if ((bitmap & 0x01) == 1) { return previous + encodedValue; } // delta is positive. return previous - encodedValue; } // we are not dealing with a delta. Simply check if negative or not // Sign bit is 1, meaning the value is negative (convert it). if ((toDecode & 0x01) == 1) { return 0 - encodedValue; } // Sign bit is 0, return the value directly. return encodedValue; }
Surely this all seems a bit complicated to simply save a few bits. It's probably not worth it to do all of this for 32 bits integers in most cases. With this technique in the best case (when the values are the same), the encoding requires a single byte. The interesting part happens when we are dealing with more complex types or with collections. But first let's look at the two top level approaches.
The Simple Approach
The simplest way to do delta coding is a form of "fake" delta encoding, where we never actually compute the delta between two values. We simply look at the "new" value and if it's the same as the previous one, we say it's "0" (or have an invalid value that represent this state). If it's different, then we simply have the new value. That way we only save space when the values are the same.
The decoder see's 0, and therefore knows the value was 5. 0 requires less bits to represent than 5, therefore a win. (Bare with me and ignore to caveats (like not being able to represent the actual value 0) for now).
The decoder see's it's not 0, and therefore knows the "updated" value is 15. With this approach we only save space when the previous value and the new value are the exact same. If this happens often-ish this is a very good solution since it has extremely low complexity.
This is also often the best solution for many types. For example strings are notoriously bad for these sorts of manipulations, it's often simply better to have an encoding where when two strings are the same we output 0, and when they are different we output the most recent string. Maps (k/v) are also notiriously bad since they are not contiguous in memory by design: we therefore often need to keep the key intact in the encoding, which takes space. The simple method is to design an encoding schema where the key is always present followed or prefixed by a bit (or byte) that says if the value is the same as it previously was for that key or if it's delta encoded (or simply encoded as is, without any modifications/delta). Obviously this depend on the types of the key/value. A k/v pair where the key is a primitive or a string and the value is a more complex type can get a lot of benefit from using delta encoding between the different values of a given key.The Interesting Approach
This is the "legit"/actual approach, where we defined schemas and build encoders/decoders that are able to generate a binary delta between two instances of our data. Our signed 32 bits integer example from above used this approach. Let's see more examples below.
Example: Array of 32 bit integers
This is where we can start to see enormous compression gains.
We want to encode arbitrary sized arrays of 32 bits signed integers.
The decoders will need to know how many elements it has to decode,
and we need a way to know how to delta, since there are many values.
The simplest way is probably to encode via index. Array1[5] will be
delta'ed with Array2[5], this makes it easier for the encoder/decoder
without having to make assumptions about the data. If we could make
assumptions we absolutely should, but let's keep things generic
here.
As previously displayed in the 32 bit integer example above, a good
way to tell the decoder that a given value is the same as it
previously was is to encode a 0. In that example, we needed things to be
byte rounded, since we were encoding a single int directly and wanted to
manipulate it (send it over the wire, decode it, etc..). With an array
of integers (or of any types for that matter), every single int within
the array can be encoded in arbitrary bit lengths, as long as the whole array
is encoded as a valid byte array. This means that a delta between two
numbers that result in the number 6 would not need to be encoded using 1
byte, it could be encoded using 3 bits only, as long as the top level
decoder knows and is able to restore the number properly. This gives
us way more opportunities to optimize if we so desire.
We will use the following strategy to encode two arrays of integers:
fn IntToBytes(toEncode: i32) -> byte[] { byte[4] bytes; bytes[0] = number & 0xFF; bytes[1] = (number >> 8) & 0xFF; bytes[2] = (number >> 16) & 0xFF; bytes[3] = (number >> 24) & 0xFF; return bytes; } BytesToInt(toDecode: byte[]) -> i32 { return (bytes[0] & 0xFF) | ((bytes[1] & 0xFF) << 8) | ((bytes[2] & 0xFF) << 16) | ((bytes[3] & 0xFF) << 24); } EncodeInt32Raw(toEncode: i32, previous: i32) -> i64 { // as always, when they are the same return 0. if (toEncode == previous) { return 0; } i64 delta = previous - toEncode; // if delta is negative shift sign bit to right most position. if (delta < 0) { return (Math.abs(delta) << 1 | 1); } // if delta is positive, sign bit is 0. return delta << 1; } // other way around. Gets us the i32 back from the encoding and the base. DecodeInt32Raw(toDecode: i64, previous: i32) -> i32 { if (toDecode == 0) { return 0; } // remove the sign bit. delta = toDecode >> 1; // Sign bit is 1, meaning delta is negative. if ((toDecode & 0x01) == 1) { return previous + delta; } return previous - delta; }
Here's the schema we will use for our array/list of int's encoding:
[HEADER][SEQUENCE] HEADER -> [32bits: number of encoded items][32bits: total nb of bytes for the encoding]. SEQUENCE -> [[change byte][x: encodings], [next change byte..], ..., n]
The first 8 bytes represent the header which has two 4 bytes sections: the number of encoded items and the total nb of bytes for the encoding.
This is followed by the "sequence" (the encoding of the elements). Which follows the pattern: change byte, encodings, change byte, encodings.. Where every bit of every change byte represents if the next value to decode changed from the base array to the new array or not. The encodings will contain a header for each value, which tells us how many bits are required for the encoding of the value.
Encoder:
// NOTE: this is not production code at all and is meant as an example. // Porting it to your favorite language will result in a fully functional encoder though. fn EncodeIntArray(toEncode: i32[], previous: i32[]) -> byte[] { // no changes, decoder needs to check for single byte to decode. if (toEncode == previous) { return {0b_0000_0000}; } // use helper fn to get the nb of items in the array to encode as an int. byte[] header = IntToBytes(toEncode.length); // dynamic array that will contain the encoding. (in real scenario have a // heuristic that initiates with a capacity to avoid too much realloc). byte[] encodedList; // The first 4 bytes of the encoding is the first part of the header: // the number of items to encode. encodedList[0] = header[0]; encodedList[1] = header[1]; encodedList[2] = header[2]; encodedList[3] = header[3]; // the next (and last) part of the header will be known at the end. Declare it now. byte[] subHeader = byte[4]; encodedList[4] = 0; encodedList[5] = 0; encodedList[6] = 0; encodedList[7] = 0; // The "sequence" part of the schema starts here, at index 8 (0-7 is header // defined above). changeByteIndex = 8; // the first change byte is at index 8, after header. encodedList[8] = 0; // set first change byte to 0 // Keeps track of the index of the current bit in the current changeByte. changeByteBitIndex = 7; // last bit in byte from pos 0, we go from left to right // Iterate over all the elements of the "new" array we need to encode. for (int i = 0; i < toEncode.length; i++) { // When true this means the next byte is the next changebyte, we // therefore reset. if (changeByteBitIndex < 2 || changeByteBitIndex > 7) { changeByteBitIndex = 7; encodedList.push(0); // start next change byte with a value of 0. changeByteIndex = encodedList.length - 1; } // this will contain the encoding for the value we are currently encoding. encodedInt = 0; // Only encode with previous when possible (to protect from when the base array (previous // array) has less elements than the new one we are trying to encode). if (i < previous.length) { // Use our "raw" int32 encoder which returns an int. encodedInt = EncodeInt32Raw(toEncode[i]), previous.[i]); // previous[i] == toEncode[i], set/leave current bit of changeByte // is 0 and move on to the next iteration. This is the NO CHANGE scenario if (encodedInt == 0) { changeByteBitIndex--; continue; } } // the new int array (toEncode) is longer than the base (previous) array // and we reached an index that exists in toEncode but not in previous. // This means we can't delta by index, we need to encode the int as is. else { // When we cant delta we still want to encode the sign bit at lsb to save on bits. // Leverage our int32 raw encoder for this, encoding against 0. encodedInt = EncodeInt32Raw(0, toEncode.[i]); } // there is a change: set corresponding bit in changeByteIndex to 1 updatedByteIdx = encodedList[changeByteIndex] | 0x01 << (changeByteBitIndex) & 0xFF); encodedList[changeByteIndex] = updatedByteIdx; changeByteBitIndex = changeByteBitIndex - 1; // move down by 1. // Get the even nb of bytes required to hold the encoded int. nbBits = CpuIntrinsics.numberOfLeadingZeros(encodedInt); nbBytesRequired = ((32 - nbBits) + 7) / 8; // this is to encode the nb of bytes required (1-5) in 4 bits. 00 == 1, 01 == 2... nbBytesRequiredEncoded = nbBytesRequired - 1; // encode nb of bytes required for the encoded int in change byte (-1 because shifting "2" bits) updatedByteIdx = (encodedList[changeByteIndex] | (nbBytesRequiredEncoded << (changeByteBitIndex - 1)) & 0xFF); // If you dont fully grasp the above, just know that what we were doing // is setting the bits in the changeByte that represents the size of the // encoding for the current int we are encoding. encodedList[changeByteIndex] = updatedByteIdx; // -2 because above we used 2 bits to describe the nb of bytes required. changeByteBitIndex = changeByteBitIndex - 2; // finally, encoding the value. nextByteValue = 0; idx = nbBytesRequired - 1; for (int j = 0; j < nbBytesRequired; j++) { nextByteValue = (encodedInt >> (8 * idx)) & 0xFF; encodedList.push(nextByte); idx = idx - 1; } } // We have iterated over all the ints of the "new" array that we want to // encode. We can now finish writing the header and return. subHeader = IntToBytes(encodedList.size()); encodedList[4] = subHeader[0]; encodedList[5] = subHeader[1]; encodedList[6] = subHeader[2]; encodedList[7] = subHeader[3]; return encodedList; }
phew! we got it. Hopefully that makes sense, I understand it can be a bit of an headache. Now for the decoder...
// Takes in the encoding (byte[]) and the base i32[]. Returns our initially encoded i32[]. fn DecodeIntArray(toDecode: byte[], previous: i32[]) -> i32[] { // no changes! new == base. So return base (previous). if (toDecode.length == 1 && toDecode[0] == 0) return previous; // get the first section of the header, to know how many elements we need to decode. header = BytesToInt(toDecode[0..3]); i32[] decodedList = i32[header]; // init with exactly the amount we'll need. // the first change byte starts at 8 (after the header) as we know. changeByte = toDecode[8]; // First encoding index is unknown at this point // (but we know that it's at least at changeByteIndex + 1) index = changeByteIndex + 1; changeByteBitIndex = 7; // last bit in byte from pos 0 // iterate "header" times: the exact nb of items we will decode. for (int i = 0; i < header; i++) { // hop to next change byte if needed. if (changeByteBitIndex < 2 || changeByteBitIndex > 7) { changeByteBitIndex = 7; // this implies the past 8 values were all encoded as 0s (so all the // same between the new array and the base array during encoding) in the // changeByte itself, the index therefore did not move. if (changeByteIndex == index) { changeByteIndex = index + 1; } else { changeByteIndex = index; } changeByte = toDecode[changeByteIndex]; index = index + 1; } currentChangeBit = (changeByte >> changeByteBitIndex & 0x01) & 0xFF; nbBytesRequired = 0; // implies there was a change, read the pseudo header at current // position in the change byte which tells us the length of the encoding. if (currentChangeBit != 0) { // When a change bit is set to 1, get the next 2 bits in the // change byte which inform about the width of the // encoding. Do +1 since 00 == 1, 01 == 2.. nbBytesRequired = (changeByte >> (changeByteBitIndex - 2)) & 0xFF) & 0b0000_0011; nbBytesRequired = nbBytesRequired + 1; changeByteBitIndex = changeByteBitIndex - 3; } // will contain the encoding for the current value we try to decode. encoded = 0; // situation where there's going to be a delta (this index exists in both the new array and the base/previous array). if (i < previous.length) { // no change, so simply add what was in the base/previous array at index i in the new array if (currentChangeBit == 0) { decodedList.push(previous[i]); changeByteBitIndex--; continue; } for (int j = 0; j < nbBytesRequired; j++) { encoded = encoded << 8 | (toDecode[index]) & 0xFF; index = index + 1; } decodedList.push(DecodeInt32Raw(encoded, previous.[i])); continue; } for (int j = 0; j < nbBytesRequired; j++) { encoded = encoded << 8 | (toDecode[index]) & 0xFF; index = index + 1; } // remove encoded sign bit delta = encoded >> 1; if ((encoded & 0x01) == 1) { // nb is negative delta = delta * -1; } decodedList.push(delta); } return decodedList; }
Alright we got it. An encoder and decoder for i32[]!
Strategy for algebraic data types (structs/classes)
// todo!();
This article is getting quite long. A part 2 will come at some point.