Questions about Collapsing Time Machines

19 views
Skip to first unread message

Michiel de Jong

unread,
Jun 10, 2025, 4:35:24 AMJun 10
to Braid
Hi Michael,

I absolutely loved your presentation yesterday, and I have a lot of questions left that I still wanted to ask.

> The term "Time Machine" is not yet popular in Computer Science; but it should be.

Yes, agree! :)

I think I understand what you mean by 'time machine' and I think I'm slowly starting to grasp what you mean by 'collapsing'. Here are my questions:

* Calculating the Operation by looking at a diff
In the "Any CRDT can be extended into a CTM" section you describe how you can determine what the Operation was just by looking at a diff between two Snapshots. How can that work deterministically?

For instance, if the old Snapshot of a number register is `4` and the new Snapshot is `5`, then what was the Operation? Was it 'increment'?
Or was it 'set to 5'? Or even 'times 2, then subtract 3'? There are many possible Operations that would create that same diff, right? Is it out of scope which one you pick there?

A trivial but useless Collapsing Time Machine would be one that translates every operation to "throw away what is there and pick a new value at random". If that is how it defines its operations, then its Travel() function is always correct. The trick is then to define a system that satisfies the requirements of being a CTM and that additionally translates operations in a 'nice' way. Right?

* Definition of Operation
I think an 'Operation' defines the diff between snapshots S and S', so e.g. "DELETE: 4, INSERT: 5",
but in some sense it's also more generalisable, so referring to a patch that when applied to snapshot S, gives S', e.g. "INCREMENT" or "SET TO 5" or ...
In other settings I think of the Operation as a user gesture (a term used in browser technology) so that would be for instance 'the user clicked the checkbox once'. I think gesture is very close to patch, but defined in terms of UI and not in terms of data structure.
And in other settings I think of an Operation as an event in the version history, so what git calls a commit, namely not only the gesture or the patch or the diff, but also the fact that it actually happened at that point in the version history of at least one peer.
What is the proper definition of Operation? Gesture, patch, diff, event, or all of the above?

> A op-based CRDT takes an operation in the Observed History format, and collapses it into an internal format.

Maybe this internal format is the generalisation from diff to patch?
And the read() turns the 'patch' back into a new and different 'diff', one that applies the patch in its new position.

* Understanding the "CRDT that provides OT, and is implemented by doing OT over an internal CRDT" phrase
I love how it's a memorable phrase, but I still haven't fully wrapped my head around this. What is the internal CRDT and how does it relate to the outer CRDT?

The definition combines functional requirements ('that provides...') with implementation requirements ('by doing...'), so that makes me think whether there would be other ways to provide the functionality, or whether the implementation is part of the nice properties. Probably the latter, since the inner CRDT solves the TP2 puzzle, right?

So would TTF qualify here?

> OT is the property of being able to Travel() operations
> CRDT is the property of being able to View() a consistent snapshot

Love that definition! So there is an outer CRDT and an internal one, both define a View() function.
Presumably the outer one defines a user-facing one (e.g. without tombstones), and the internal CRDT provides a more complete history-aware view (e.g. with tombstones).

Requiring the Travel() function is implying a lot. If an op can travel, then that means that this op can be meaningfully generalised, and made specific again in another location in the version history, i.e. conceptually there is some translation from diff to patch to diff.
Maybe the inner CRDT defines this translation?

Are there examples of CRDTs that do not provide OT? I think you can pretty trivially do OT over any internal CRDT, right? As long as the internal CRDT is a 'real' CRDT (with commutativity of operations), not just your more abstract 'being able to View() a consistent snapshot' definition, right?

* Guaranteeing deterministic merging
Putting aside the history pruning for a moment, I think the revolutionary insight here (and that is already soaking through the Braid project in general over the years of course) is that if you run a CTM and I run a different one, we can send each other operations in a format that is independent of your CTM's internals and of my CTM's internals, and this is great for interoperability.

Different peers have freedom to implement their merge algorithm, as long as they are functionally equivalent, right?

So for instance, if you implement 'increment wins' and I implement 'decrement wins', we would arrive at different end results. That's not what we want. We do have to specify the merge type like braid-http does. But we don't need to specify anything more specific than that.

I think 'each peer runs a CTM' is a great description of what braid-http already allows us to do.

Super exciting work!


Cheers,
Michiel de Jong

Michael Toomim

unread,
Jun 28, 2025, 4:53:49 AMJun 28
to Michiel de Jong, Braid

These are great questions! Thank you very much!

The answers weren't clear in the talk, which I've been rewriting to make clearer— but let me address these questions now, since they're so astute, and provide the right mindset for clarifying the revision!

On 6/10/25 3:35 AM, Michiel de Jong wrote:
Hi Michael,

I absolutely loved your presentation yesterday, and I have a lot of questions left that I still wanted to ask.

> The term "Time Machine" is not yet popular in Computer Science; but it should be.

Yes, agree! :)

I think I understand what you mean by 'time machine' and I think I'm slowly starting to grasp what you mean by 'collapsing'. Here are my questions:

* Definition of Operation
I think an 'Operation' defines the diff between snapshots S and S', so e.g. "DELETE: 4, INSERT: 5",
but in some sense it's also more generalisable, so referring to a patch that when applied to snapshot S, gives S', e.g. "INCREMENT" or "SET TO 5" or ...
In other settings I think of the Operation as a user gesture (a term used in browser technology) so that would be for instance 'the user clicked the checkbox once'. I think gesture is very close to patch, but defined in terms of UI and not in terms of data structure.
And in other settings I think of an Operation as an event in the version history, so what git calls a commit, namely not only the gesture or the patch or the diff, but also the fact that it actually happened at that point in the version history of at least one peer.
What is the proper definition of Operation? Gesture, patch, diff, event, or all of the above?


Let's start with definitions!

  • A Snapshot is the value of the state at a point in time
  • An Operation op(snap) = snap' is a function that mutates state, taking as input one Snapshot and producing another.
    • An operation can be represented as a diff.
    • A patch is a diff that is intended to be used as an operation to mutate state.
  • We write a sequence of operations [op1, op2, op3, ...]
  • We can apply an operation to a snapshot: snapop = snap'
  • An Event is specific time (in the partial order) where which a mutation operation can occur.
    • Events have a causal order.
      • Each event has a set of ancestor(event) → {event, event...} that had been seen at the time and place that the event occurred.
    • The frontier of the ancestor set (the leaves of the DAG) is called the parents(event) of the event.
  • A Version is a subgraph of events, specified as the frontier of a cut of the DAG.
    • A merger of two versions is specified as a union: VersionA ∪ VersionB

In this universe of terms, a "UI Gesture" would typically be referred to as an Event, which implies an operation.

For instance, your UI code for gestures probably has an "event handler" that recognize gestures, and runs an "operation" in response that mutates state, like this:

document.addEventListener("swipeleft", (event) => state.history.pop())

In this case:

  • A swipeleft event
  • Runs the operation state.history.pop()
  • Which patches state.history, applying a diff of replace [-1:-0] = [], or "replace the last item of the history with the empty array"
  • This changes the history from the snapshot [a, b, c, d, e] to the snapshot [a, b, c, d]

These are all concepts in Observed History. We haven't done any "collapsing" yet. Note that everything in the above refers to only the blink of time in between two snapshots. When we "collapse", we'll merge spans of history into a blooped data structure. This brings us to your next question:

> A op-based CRDT takes an operation in the Observed History format, and collapses it into an internal format.

Maybe this internal format is the generalisation from diff to patch?
And the read() turns the 'patch' back into a new and different 'diff', one that applies the patch in its new position.

A diff, which can be used to patch, is a type of Operation, and operations can be externally observed. The internal format is not externally observable; it is created internally by collapsing a set of operations across time in a way particular to a particular CRDT.

However, you are right that after collapsing an operation at a point in time, the Travel() function is equivalent to uncollapsing the operation at a different point in time! It transforms the operation into a different operation, which could be described as a diff or patch, that produces the same result when applied at the new point in time.

As for read()— that produces a Snapshot, not an Operation, so it does not Travel() an op.

Now let's get into the details of how we can create a CTM for any CRDT by doing a diff over two snapshots to travel an operation:

* Calculating the Operation by looking at a diff
In the "Any CRDT can be extended into a CTM" section you describe how you can determine what the Operation was just by looking at a diff between two Snapshots. How can that work deterministically?

For instance, if the old Snapshot of a number register is `4` and the new Snapshot is `5`, then what was the Operation? Was it 'increment'?
Or was it 'set to 5'? Or even 'times 2, then subtract 3'? There are many possible Operations that would create that same diff, right? Is it out of scope which one you pick there?


Ah, yes, this is a key issue!

You're right that a diff is ambiguous. There are many different diffs that can represent any Snap1 → Snap2. However, that doesn't matter for how we're constructing a CTM from a CRDT. All we need is for the operation to transition us from Snap1 to Snap2. It's not going to e.g. get transformed by another operation, or get merged with another operation. In fact, we might throw away the transformed operation after applying it to this snapshot. And if we do later find some other operation that occurred in parallel... we will just collapse this newly-received operation itself into the inner CRDT, and uncollapse it to produce a new op that takes us to Snap3.

Now, remember as well that the diff is not what you would do in a practical implementation. This is just a thought experiment to prove to yourself that you *can* produce a transformed op for *any* CRDT, as long as you can diff any two snapshots produced by that CRDT, and produce an op from that diff. But in a real CTM, we'd do something more efficient than the diff. What we'd do is different for each CRDT... so we're talking about the diff right now just to wrap our minds around the general concept.

This also makes clear precisely what property we need of our Travel(op, from, to) → op' function in order to guarantee consistency during a merge. Consider that legacy Operational Transform systems specified the TP1 and TP2 properties to try to guarantee consistency of merges. What these properties actually specify, though, is symmetry of operations when applied and transformed in different orders:

TP1:  [op1, T(op1, op2)]) ≡ [op2, T(op2, op1)]          (where ≡ means "is equivalent to")
TP2:  T(op3, [op1, T(op2, op1)])] = T(op3, [op2, T(op1, op2)])

Symmetry is nice. But remember that this T(op1, op2) → op1' function transforms ops by ops, which (A) doesn't work with 3+ peers, because it loses information when we transform ops by transformed ops; and (B) this op-by-op transform was created—in the end—just as an instrumental function within an iterative approach to get what we really want: which is to Travel an op from one version to another version. And a CTM's inner CRDT gives us the ability to Travel across versions directly through wormholes of collapsed history, rather than trying to trace an externally observable path by one-op-at-a-time through the observable history.

So instead of bothering with TP1 and TP2, we say that a CTM's Travel() function has to satisfy a TP0 property, which is defined in terms of snapshots and versions, rather than symmetric operation chains:

TP0: View(edited ∪ to) = View(to) • Travel(op, from, to)

This is the property you need to consistently update a local snapshot with incoming remote edits that you are Travel()ing to your local version.

TP0 says that if you are merging an incoming op that took a remote peer from version from to version edited, and your peer's state is currently at version to, then it should be true that the snapshot you read from the CRDT when merging the versions edited & to (ie. View(edited ∪ to)) should be equal to the result of applying the traveled op to the snapshot of the receiving peer's state at version to (ie. View(to) • Travel(op, from, to)).

This is trivially satisfied by computing the travel as the diff between View(to) and View(edited ∪ to).

For more on TP0, see meeting-77 where we introduced the TP0 property. (Spoiler alert: We also proposed a TP½ property... but TP½ turned out to be equivalent to TP2.)


A trivial but useless Collapsing Time Machine would be one that translates every operation to "throw away what is there and pick a new value at random". If that is how it defines its operations, then its Travel() function is always correct. The trick is then to define a system that satisfies the requirements of being a CTM and that additionally translates operations in a 'nice' way. Right?

Specifically, the CTM's value has to produce the same value as its inner CRDT. The inner CRDT defines the value of View(version) for both it, and for the outer CRDT. The outer CRDT, however, can be implemented by holding a snapshot in memory, and simply instantiating (and forgetting) various subsets of collapsed history, using the inner CRDT's definition of collapse, in order to Travel() incoming edits and apply them to its current snapshot.


* Understanding the "CRDT that provides OT, and is implemented by doing OT over an internal CRDT" phrase
I love how it's a memorable phrase, but I still haven't fully wrapped my head around this. What is the internal CRDT and how does it relate to the outer CRDT?

In our prior thought experiment to "create a CTM from any CRDT", we use the CRDT's view and collapse functions (called the "query" and "prepare update" methods in shapiro et al) as the inner CRDT's view and collapse functions. We then create a Time Machine around it, which itself happens to be an outer CRDT, by holding a current snapshot in memory, along with all observed history it's received, and implementing a Travel() function that instantiates the inner CRDT twice for the subsets of history at to and at edited ∪ to and runs View() on them and diffs the result to produce the traveled op' to apply to our local snapshot.

This CTM is a Time Machine, that happens to also qualify as an outer CRDT with a no-op collapse function. Rather than collapsing observable edits and sending the collapsed version over the network, this "outer CRDT" simply forwards along the observable history directly to other "outer CRDT" peers. Those peers implement their own inner CRDT (meeting the spec of the original CRDT's merge-type, as defined by its collapse and view functions) in order to travel incoming edits into their local times and apply them to their local snapshots.

We can define the outer CRDT formally in terms of Shapiro et al's op-based CRDT definition (see section 2.4) like this:

S: {history, snapshot} where history ⊂ Events is the observed history received so far, and where each e ∈ Events has e = (eventid, op, parents); and snapshot is the current snapshot.
S0: {history: {}, snapshot: initial_snapshot}
q: The function () → S.snapshot.
t: The identity function e → e.
u: Adds the new event e to S.history. Travels the event's op to its current version, and applies the result to its current snapshot.
P: The precondition predicate (e) → parents(e) ⊂ S.history. Ensures we have ancestors of e so we can interpret e.

Note the identify function for "t" aka "prepare update". This is what we call the "collapse()" function in Time Machine lingo, and you can see that this outer CRDT has a no-op (aka "identity") collapse function.

The definition combines functional requirements ('that provides...') with implementation requirements ('by doing...'), so that makes me think whether there would be other ways to provide the functionality, or whether the implementation is part of the nice properties. Probably the latter, since the inner CRDT solves the TP2 puzzle, right?

So would TTF qualify here?


Yes! Good point. You're right to point out that although a Collapsing Time Machine implements Travel() with an inner CRDT (aka a "collapse"), we don't necessarily need to collapse if all we want is consistency. In fact, there's a larger class of Consistent Time Machines that stay consistent via other means than an inner CRDT's collapse.

You're right to think that TTF might be a candidate; however TTF actually *is* a CRDT. It tries to look like a typical OT algorithm that iterates through observable history; but the states it iterates through are not what the user observes — they encode past edits (traces of tombstones) into the spatial offsets. So it's actually tracing through an internal format. It's collapsing tombstones into its definition of space itself.

In fact, I suspect that the only way to handle 2+ peers in general is via some collapse of history. Maybe I'm wrong. It would be fun to try to prove this one way or the other.

But in general, yeah, you could conceivably use any algorithm that satisfies TP2 to implement the Travel() function, and still have a Consistent Time Machine. Or you could even use a TP1 system, if you know you will only have <= 2 peers exchanging edits at a time. As long as you can implement Travel(op, from, to)  op' and satisfy TP0, for whatever domain and range of from and to your system will encounter in practice, you're good.

This knowledge of which domain and range of from and to you encounter in practice is key to pruning. In a later article I'll show how you can define, for any particular merge-type, which spans of history you need to keep around (and which you can prune) in order to implement View(version) and Travel(op, from, to) for any desired domain and range of version, from, and to. This is how we can produce interoperable specs for peers to know which spans of history they need to keep around in order to work with each other's edits.

> OT is the property of being able to Travel() operations
> CRDT is the property of being able to View() a consistent snapshot

Love that definition! So there is an outer CRDT and an internal one, both define a View() function.
Presumably the outer one defines a user-facing one (e.g. without tombstones), and the internal CRDT provides a more complete history-aware view (e.g. with tombstones).

Requiring the Travel() function is implying a lot. If an op can travel, then that means that this op can be meaningfully generalised, and made specific again in another location in the version history, i.e. conceptually there is some translation from diff to patch to diff.
Maybe the inner CRDT defines this translation?

Yes, the inner CRDT defines a method to collapse time into a wormhole, that lets you time-travel operations through it, while guaranteeing consistency with all other Time Machines that create wormholes following the same CRDT specification— the same merge-type!


Are there examples of CRDTs that do not provide OT? I think you can pretty trivially do OT over any internal CRDT, right? As long as the internal CRDT is a 'real' CRDT (with commutativity of operations), not just your more abstract 'being able to View() a consistent snapshot' definition, right?


Yes, specifically we say that you can implement OT on any differentiable CRDT, which we define as any CRDT where you can diff the viewed snapshot between two states, as in diff(snap1, snap2) → op, where snap1 • op = snap2. If you can define that diff function, then it's possible to implement Travel(op, from, to) → op', and therefore to do OT over it.

But again, the next step is to actually do so efficiently, and this is different for each CRDT.

* Guaranteeing deterministic merging
Putting aside the history pruning for a moment, I think the revolutionary insight here (and that is already soaking through the Braid project in general over the years of course) is that if you run a CTM and I run a different one, we can send each other operations in a format that is independent of your CTM's internals and of my CTM's internals, and this is great for interoperability.

Different peers have freedom to implement their merge algorithm, as long as they are functionally equivalent, right?

Yes! And in fact, this freedom to re-implement our internals while still interoperating *is* what allows us to prune history without introducing central control, because it frees each peer up to make its own decisions of what history to drop without coordination!


So for instance, if you implement 'increment wins' and I implement 'decrement wins', we would arrive at different end results. That's not what we want. We do have to specify the merge type like braid-http does. But we don't need to specify anything more specific than that.

Precisely correct!


I think 'each peer runs a CTM' is a great description of what braid-http already allows us to do.

Excellent conclusion!


Super exciting work!


Cheers,
Michiel de Jong


Thank you, Michiel! It sounds like we're succeeding at communicating the CTM idea now. It's been super helpful articulating my responses to these questions — clear snapshots into your mind — as I revise the article to make it clearer.

Excelsior! See you in the future.

Michael

Michiel de Jong

unread,
Jun 30, 2025, 7:07:36 AMJun 30
to Michael Toomim, Braid
Thanks a lot for your elaborate answers, I think I get it now! This does sound like an exciting new step forward for this research field.
The distributed history pruning you mention also sounds super exciting, looking forward to that too.

Cheers,
Michiel
Reply all
Reply to author
Forward
0 new messages