What's behind Automerge? (Part II)

2023-06-13

16 min read

Storage

From operations to changes

Before applying operations received from other actors, it’s necessary to ensure that they are causally ready. This rule applies to most Operation-based CRDTs ¹.

We can use vector clocks to ensure that all operations are causally applied, but it is not Byzantine fault tolerant. Another way is to use Hash DAG ². By recording the dependencies of each operation, other actors can determine if it is causally ready when they receive it by checking if its dependencies exist in the current OpSet.

As the number of operations increases, maintaining the Hash DAG can become cumbersome. In Automerge, we take a different approach by using change. A change is a combination of operations, as you have seen before:

let mut doc = Automerge::new();
let mut tx = doc.transaction();
tx.put(ROOT, "name", "Alice")?;
tx.put(ROOT, "age", "21")?;
tx.commit();

In this example, put(root, "name", "Alice") and put(root, "age", 21) belong to the same transaction. When committing, we combine all operations to create a change. A change is the minimum unit of applying, which implies that all the operations in a change are either fully applied or not applied at all.

By doing so, we can manage the dependencies between changes rather than dependencies between operations. In Automerge, we use a struct ChangeGraph to maintain this Hash DAG:

pub(crate) struct ChangeGraph {
    nodes: Vec<ChangeNode>,
    edges: Vec<Edge>,
    hashes: Vec<ChangeHash>,
    nodes_by_hash: BTreeMap<ChangeHash, NodeIdx>,
}

For example, a change graph created by actors may look like this:

Before applying change D, it is necessary to ensure that changes B and C have already been applied, as illustrated in the diagram.

Moreover, using changes not only avoids the need to maintain a large number of dependencies but also enables us to compress operations efficiently. These operations can be viewed as a table, where each operation represents a row in the table. By utilizing columnar encoding, we can encode this table and greatly reduce the space required to store it.

Moving forward, we will explore the process of encoding these operations into a compact format. Before proceeding, you have the option to familiarize yourself with the specification of Automerge binary encoding. However, it is also possible to skip the specification for now, as we will provide essential information on encoding a basic change in the following sections. In case you encounter any difficulties, you can refer back to the specification to resolve them.

Change encoding

In order to exchange changes between actors, we need to encode them. If we serialize all operations directly in sequence, it would take up a lot of space. This is because we use a large amount of metadata to describe elements in Automerge. For example, when working with text, we use UTF-8 to describe a character, but we need to use hundreds of bytes to describe metadata related to it, such as OpId, objId, and so on.

Therefore, to compress operations during the encoding of changes, we utilize columnar encoding. To give you an intuitive impression, we provide an example:

When we perform columnar encoding on a table consisting of such operations, we can use [5, 0@actor0] to encode the obj column, which represents 5 occurrences of 0@actor0.

In the next section, we will describe how to perform columnar encoding on a table of operations specifically.

Encoding

Run Length Encoding (RLE)

Run length encoded column consists of multiple (length, value) pairs. For example, [5, 5, 5, 3, 3] can be encoded as [(3, 5), (2, 3)].

To elaborate further, assuming the length is k, we follow these rules:

If k > 0, it means that the value is repeated k times.
If k = 0, it means that null is repeated value times.
If k < 0, it means that the next |k| values are not repeated. In other words, there is no compression in this case.

In Automerge, we use RLE on the following columns:

Column Name	Description	Format
object counter	counter of the object ID each operation targets	RLE of uLEB
key counter	counter of the operation ID of the key of each operation	RLE of uLEB
key string	The string key each operation targets	RLE of UTF8 Strings
action	The Action of each operation	RLE of uLEB

uLEB (unsigned Little-Endian Base) is a variable-length encoding for unsigned integer.

Delta encoding

When dealing with certain columns, RLE alone may not be efficient enough. For instance, if we have a column [1, 2, 3, 5, 7], we cannot use RLE to compress it. One solution is to use delta encoding first, encoding it as [+1, +1, +1, +2, +2], and then use RLE to encode it as [(3, +1), (2, +2)].

In Automerge, we use delta encoding on the following columns:

Column Name	Description	Format
counter	The counter of each operations ID	Delta encoding of uLEB

Boolean encoding

For the boolean type, we can make it more compact. The column contains sequences of uLEB integers that represent the lengths of alternating sequences of false/true. The initial value of the column is always false. For example, [false, true, true, true, false, false] would be encoded as [1, 3, 2].

In Automerge, we use boolean encoding on the following columns:

Column Name	Description
insert	Whether or not this is an insert operation

Actor encoding

Automerge utilizes a 16-byte random number to denote an actor ID. Due to its high byte count and small cardinality, it can be more efficiently managed by first saving it in a index -> actor ID map and then storing only the index. To store the index, RLE of uLEB is used.

In Automerge, we use actor encoding on the following columns:

Column Name	Description
object actor ID	actor index of object ID each operation targets
key actor ID	actor of the operation ID of the key of each operation
actor ID	The actor of each operations ID

Group encoding

In Automerge, some columns can contain multiple values per operation. To handle this, group encoding is used. It consists of two columns:

The first column represents the number of elements in the row and is encoded using RLE of uLEB. For instance, [[1, 2], [1, 2, 3]] would be encoded as [2, 3].
The second column contains the actual stored values and is encoded as a flat list. For example, [[1, 2], [1, 2, 3]] would be encoded as [1, 2, 1, 2, 3]. Additionally, group encoding can be combined with other encodings such as delta encoding, which would encode [1, 2, 1, 2, 3] as [+1, +1, -1, +1, +1].

In Automerge, we use group encoding on the following columns:

Column Name	Description	Format
predecessor actor IDs	The actor ID of each predecessor’s operation ID	Grouped Actor encoding
predecessor counters	The counter of each predecessor’s operation ID	Grouped Delta encoding
successor actor IDs	The actor ID of each successor’s operation ID	Grouped Actor encoding
successor counters	The counter of each successor’s operation ID	Grouped Delta encoding

Value encoding

For primitive values, we need to indicate their type and length. Therefore, we divide the value encoding into two columns:

The first column stores the value metadata, which looks like this：

The types represented by numbers 0 to 9 currently include: null, false, true, unsigned integer, signed integer, IEEE754 float, UTF8 string, bytes, counter, and timestamp.

The second column stores the actual bytes.

In Automerge, we use value encoding on the following columns:

Column Name	Description
value	The value of this operation

Change metadata

In addition to including all the operations, a change also needs to include some of its own metadata. In memory, we represent a change as follows:

pub(crate) struct Change<'a, O: OpReadState> {
    /// The raw bytes of the entire chunk containing this change, including the header.
    bytes: Cow<'a, [u8]>,
    header: Header,
    dependencies: Vec<ChangeHash>,
    actor: ActorId,
    other_actors: Vec<ActorId>,
    seq: u64,
    start_op: NonZeroU64,
    timestamp: i64,
    message: Option<String>,
    ops_meta: ChangeOpsColumns,
    /// The range in `Self::bytes` where the ops column data is
    ops_data: Range<usize>,
    extra_bytes: Range<usize>,
    _phantom: PhantomData<O>,
}

Here’s a brief explanation of what these metadata mean:

header contains information about the change chunk, such as the chunk type and checksum.
dependencies are the hash values of the changes on which this change depends.
actor refers to the actor ID that constructed the current change.
other_actors includes all the actors that appear later. It is actually the index -> ActorID map.
seq refers to the number assigned to the current change within the actor. Note that it is not the Lamport clock. <actor ID, seq> uniquely identifies a change.
The start_op refers to the operation ID at the beginning of the current change.
timestamp and message are specified during commit.
ops_meta describes the metadata of the operation table, such as which columns are included.
ops_data includes the actual data of the operation table.

Example

We use an example to illustrate how a change is encoded. Suppose we have the following code:

let mut doc = Automerge::new();
let mut tx = doc.transaction();
tx.put(ROOT, "name", "Alice")?;
tx.put(ROOT, "age", 21)?;
tx.commit();

It will be encoded to the byte sequence like this:

856f4a83264ba5060140001003ebab6d29df47f39c5ea7d4cd9d6e03010100000006150a340142025604570970027e046e616d65036167650202017e8601144c69616e6772756e150200
856f4a83fc117446013c0010ba92a37960334606aa47606579716f20010100000006150a340142025603570670027e046e616d65036167650202017e5614416c696365150200

We can deconstruct it into different parts:

0x856f4a83 // magic
0xfc117446 // checksum
0x01 // chuck type
0x3c // chunk length
0x00 // dependencies
0x10 // actor length
0xba92a37960334606aa47606579716f20 //actor 
0x01 // sequence number
0x01 // start op
0x00 // time
0x00 // message length
0x00 // other actors
0x06150a34014202560357067002 // operation column metadata
0x7e046e616d65036167650202017e5614416c696365150200 // operation columns

Change metadata

magic

The magic is defined as [0x85, 0x6f, 0x4a, 0x83].

checksum

The checksum is calculated by computing the SHA256 of all subsequent parts and taking the first 4 bytes.

SHA256(013c0010ba92a37960334606aa47606579716f20010100000006150a340142025603570670027e046e616d65036167650202017e5614416c696365150200) = fc117446c2701317ab462d610d17981fc12ac4cae6e242515d401db831a6e6d4

chunk type

The chunk type of a change is always equal to 0x01.

chunk length

The chunk length indicates how many bytes follow and is of type uLEB. In this case, 0x3c is equal to 0b00111100, so the length of the following bytes is 0b011 1100, which is 60 bytes.

dependencies

The format for dependencies is as follows: first, there is a uLEB that indicates how many dependencies there are. In this case, the value is 0, which means there are no dependencies

actor length

This is in uLEB format, which indicates the length of the actor in bytes. 0x10 represents 16 bytes, so the next 16 bytes represent the actor.

actor

0xba92a37960334606aa47606579716f20 is the actor ID.

sequence number

In uLEB format, it represents the seq of the change. Here, 0x01 = 1, so the sequence number is 1.

start op

In uLEB format, 0x01 = 1, so the start op is 1.

time

In uLEB format, the time is empty by default, so here 0x00 = 0.

message length

In uLEB format, the message is empty by default, so here 0x00 = 0.

other actors

It uses a uLEB to indicate how many actors there are. Here, since we have no other actors, 0x00 = 0.

Operation column metadata

Following the uLEB 0x06 = 6 indicating the number of columns, there are 6 pairs of <Column Specification, Column Length>:

Column Specification	Column Length
0x15 = 21, key string	0x0a
0x34 = 52, insert	0x01
0x42 = 66, action	0x02
0x56 = 86, value metadata	0x03
0x57 = 87, value	0x06
0x70 = 112, predecessor group	0x02

The sum of 0x0a + 0x01 + 0x02 + 0x03 + 0x06 + 0x02 is 24, so the next 24 bytes are operation columns.

Operation columns

key string：

7e046e616d650361676

In Automerge, key string is encoded using Run Length Encoding (RLE).

Here, the length is 0x7e = 0b01111110, which in two’s complement is -2, indicating that there is no compression.

The value part is 046e616d6503616765, which consists of several <uLEB byte count + actual bytes> pairs.

The first byte count is 0x04 = 4, indicating that the next 4 bytes is the first string, which is 0x6e616d65, resulting in “name” when decoded using UTF-8.
The second byte count is 0x03 = 3, indicating that the next 3 bytes is the second string, which is 0x616765, resulting in “age” when decoded using UTF-8.

Therefore, we end up with the key strings ["name", "age"].

insert

The type of insert is boolean, which is encoded using Boolean Encoding.

Therefore, [0x02] represents [false, false].

action

RLE is used for the action column. Here, 0x02 = 2, indicating that the value is repeated twice.

The value part is 0x01. Repeating it twice gives us [0x01, 0x01].

According to the Automerge specification, 0x01 as the action value means “set”. Therefore, the action column is [set, set].

value metadata

7e5614

The value metadata is encoded using the Value Encoding.

0x56 = 0101 0110 can be interpreted as 0b 101 0110, which means the first type is 0b0110 (String), and the length is 0b101 = 5 bytes.

The following 0x14 = 0b0001 0100, indicates that the second type is 0b100 (Signed Integer), and the length is 0b1 = 1 byte.

So the entire value metadata column can be interpreted as [<String, 8>, <Signed Integer, 1>].

value

416c69636515

0x416c696365 is converted to the UTF-8 string “Alice”. And 0x15 = 21.

Therefore, the value column can be interpreted as ["Alice", 21].

predecessor group

The first column of the predecessor group is encoded using RLE, which is equivalent to [0x00, 0x00] here. This means that there are no predecessors for these two operations. Therefore, the predecessor group is interpreted as [[], []].

Document encoding

A new actor may start joining the synchronization of a document without any changes. It is inefficient to transmit each change individually. In this scenario, we can serialize and transmit the entire document. Furthermore, serialization of the document is also useful when we intend to persist it in disk storage.

The document is essentially a set of changes. However, when we encode it, we divide it into two distinct tables: the change table and the operation table. The change table stores the metadata related to the changes, whereas the operation table stores all the operations of the document.

The reason why we separate the change table and the operation table is that doing so allows us to better utilize columnar encoding, which results in higher compression rates.

pub(crate) struct Document<'a> {
    bytes: Cow<'a, [u8]>,
    #[allow(dead_code)]
    compressed_bytes: Option<Cow<'a, [u8]>>,
    header: Header,
    actors: Vec<ActorId>,
    heads: Vec<ChangeHash>,
    op_metadata: DocOpColumns,
    op_bytes: Range<usize>,
    change_metadata: DocChangeColumns,
    change_bytes: Range<usize>,
    #[allow(dead_code)]
    head_indices: Vec<u64>,
}

Example

We use an example to illustrate how a document is encoded. Suppose we have the following code:

let mut doc = Automerge::new();
let mut tx1 = doc.transaction();
tx1.put(ROOT, "name", "Bob")?;
tx1.put(ROOT, "age", 21)?;
tx1.commit();

let mut tx2 = doc.transaction();
tx2.put(ROOT, "gender", "male")?;
tx2.commit();
doc.save();

It will be encoded to the byte sequence like this:

856f4a83e7a6f50e009301011013336ec1ed354befa60b3e3f05346028012f2f0a65b40461263a496749d8bb0b0746c234cbddb092e11473861242638a0c07010203021303230240034302560208151121022304340142025605570d800102020002017e020102007e00017f0002077d036167650667656e646572046e616d6503007d02017e0303017d14468601156d616c654c69616e6772756e030001
856f4a834afcae9c008d01011015cb7623f0314fc09773daafcf4138d7016cdffc539c7e02a93ab4f9762fc4466b90fc4134c6662382d067f02d9e9418bf070102030213032302400343025602081511210223043401420256045708800102020002017e020102007e00017f0002077d036167650667656e646572046e616d6503007d02017e0303017d144636156d616c65426f62030001

We can deconstruct it into different parts:

0x856f4a83 // magic
0x4afcae9c // checksum
0x00 // chunk type
0x8d01 // chunk length
0x011015cb7623f0314fc09773daafcf4138d7 // actors
0x016cdffc539c7e02a93ab4f9762fc4466b90fc4134c6662382d067f02d9e9418bf // heads
0x070102030213032302400343025602 // change column metadata
0x081511210223043401420256045708800102 // operation column metadata
0x020002017e020102007e00017f000207 // change column
0x7d036167650667656e646572046e616d6503007d02017e0303017d144636156d616c65426f620300 // operation columns
0x01 // heads index

As most of the fields are similar to change encoding, we will not explain them here. Instead, we will only illustrate the change column metadata , operation column metadata, change column and operation column .

change columns metadata

0x070102030213032302400343025602

Following the uLEB 0x07 = 7 indicating the number of columns, there are 7 pairs of <Column Specification, Column Length>:

Column Specification	Column Length
0x01 = 1, actor	0x02
0x03 = 3, sequence number	0x02
0x13 = 19, maxop	0x03
0x23 = 35, time	0x02
0x40 = 64, dependencies group	0x03
0x43 = 67, dependencies index	0x02
0x56 = 86, extra metadata	0x02

0x02 + 0x02 + 0x03 + 0x02 + 0x03 + 0x02 + 0x02 = 16 bytes.

operation columns metadata

0x081511210223043401420256045708800102

Following the uLEB 0x08 = 8 indicating the number of columns, there are 8 pairs of <Column Specification, Column Length>:

Column Specification	Column Length
0x15 = 21, key string	0x11
0x21 = 33, actor ID	0x02
0x23 = 35, counter	0x04
0x34 = 52, insert	0x01
0x42 = 66, action	0x02
0x56 = 86, value metadata	0x04
0x57 = 87, value	0x08
0x8001 = 128, successor group	0x02

0x11 + 0x02 + 0x04 + 0x01 + 0x02 + 0x04 + 0x08 + 0x02 = 40 bytes

change columns

0x020002017e020102007e00017f000207

actor

The actor column uses RLE. First, the length is 0x02, indicating that the value is repeated twice. Then, the value is 0x00, indicating that the actor column is [0x00, 0x00].

In this case, 0x00 refers to the first actor in the actors list, which is 0x011015cb7623f0314fc09773daafcf4138d7.

sequence number

The sequence number uses RLE. Here, the length is 0x02, indicating that the value is repeated twice. The value is 0x01, which means that the actual value is [0x01, 0x01].

The value part of the sequence number uses delta encoding. Therefore, [0x01, 0x01] is equivalent to [1, 2].

maxop

7e0201

maxop uses RLE. Here, the length is 0x7e, which is equivalent to -2 in 2’s comlement number format. The value is 0x0201, which mean the actual value is [0x02, 0x01].

Since it is delta encoded, this value represents [0x02, 0x03].

time

The length of the RLE is 0x02, indicating that 0x00 is repeated twice. Therefore, time is not recorded here.

dependencies group

7e0001

Dependencies group can be interpreted as [0x00, 0x01]. As a result, the first change has no dependencies, while the second change has one dependency.

dependencies index

7f00

0x7f is equivalent to -1 in RLE encoding, indicating that the value is not repeated. The following value is 0x00. The dependencies index for the two changes is therefore: [[], [0x00]].

operation columns

0x7d036167650667656e646572046e616d6503007d02017e0303017d144636156d616c65426f620300

key string

7d036167650667656e646572046e616d65

0x7d is -3 in 2’s complement, indicating that the value is not repeated. The value part is as follows:

03 represents the length of the first value, so the next three bytes are 0x616765, which corresponds to “age”.
06 represents the length of the second value, so the next six bytes are 0x67656e646572, which corresponds to “gender”.
04 represents the length of the third value, so the next four bytes are 0xe616d65, which corresponds to “name”.

Therefore, the final result is [“age”, “gender”, “name”].

actor ID

Here is RLE encoded, so 0x03 represents repeating 3 times, and the value is 0x00. Therefore, the final decoding result is [0x00, 0x00, 0x00].

counter

7d02017e

The value 0x7d indicates that there is no repetition, and the sequence of values 0x02 0x01 0x7e represents the delta encoding [+2, +1, -2]. The resulting decoded sequence is [2, 3, 1].

insert

0x03 represents 3 times false, which is [false, false, false].

action

0301 represents three times 0x01, which means [0x01, 0x01, 0x01].

According to the Automerge specification, 0x01 as the action value means “set”. Therefore, the action column is [set, set, set].

value metadata

7d144636

0x7d = -3, which means there is no compression used.

Next is the value part, which consists of several 64-bit uLEBs:

14 = 0001 0100, which means type = 0b100 (Signed integer), length = 0b1 = 1 byte.
46 = 0100 0110, which means type = 0b110 (String), length = 0b100 = 4 bytes.
36 = 0011 0110, which means type = 0b110 (String), length = 0b11 = 3 bytes.

Therefore, the entire value metadata column can be interpreted as [<Signed Integer, 1>, <String, 4>, <String, 3>].

value

156d616c65426f62

Signed Integer, 0x15 = 21
String, 0x6d616c65 = “male”
String, 0x426f62 = “Bob”

Therefore, the entire value column can be interpreted as [21, “male”, “Bob”].

successor group

0x0300 means three times 0x00, which is equivalent to [0x00, 0x00, 0x00].

So there is no successor for all the operations. Hence the operation group can be interpreted as [[], [], []].

Reference

[1]. M. Shapiro, N. Preguiça, C. Baquero, and M. Zawirski, “Conflict-free replicated data types,” Lecture Notes in Computer Science, pp. 386–400, 2011. doi:10.1007/978-3-642-24550-3_29

[2]. Kleppmann M, Howard H. Byzantine eventual consistency and the fundamental limits of peer-to-peer databases[J]. arXiv preprint arXiv:2012.00472, 2020.

What's behind Automerge? (Part II)

Storage

From operations to changes

Change encoding

Encoding

Change metadata

Example

Change metadata

Operation column metadata

Operation columns

Document encoding

Example

change columns metadata

operation columns metadata

change columns

operation columns

Reference

See Also