Fwd: New Version Notification for draft-toomim-httpbis-versions-00.txt

16 views
Skip to first unread message

Michael Toomim

unread,
Jul 15, 2024, 9:26:41 PM (9 days ago) Jul 15
to HTTP Working Group, Braid

Hi everyone in HTTP!

Last fall we solicited feedback on the Braid State Synchronization proposal [draft, slides], which I'd summarize as:

"We're enthusiastic about the general work, but the proposal is too high-level. Break the spec up into multiple independent specs, and work bottom-up. Focus on concrete 'bits-on-the-wire'."

So I'm breaking the spec up, and have drafted up the first chunk for you. I would very much like your review on:
Versioning of HTTP Resources
draft-toomim-httpbis-versions
https://datatracker.ietf.org/doc/html/draft-toomim-httpbis-versions-00
Versioning is necessary for state synchronization—and occurs in a range of HTTP systems:
  • Caching
  • Archiving
  • Version Control
  • Collaborative Editing
Today, HTTP has resource versions in the Last-Modified and ETag headers, and sometimes embeds versions in URLs, like with WebDAV. Each of these options serves some needs, but also has specific limitations. An improved general approach is proposed, which provides new features, that could enable cool new applications, such as incrementally-updated RSS feeds, and could simplify existing specifications, such as resumeable uploads, and history compression in OT/CRDT algorithms.

I would love to know if people find this work interesting. I think we could improve performance, interoperability, and be one step closer to having Google Docs power within HTTP URLs.

Michael

-------- Forwarded Message --------
Subject: New Version Notification for draft-toomim-httpbis-versions-00.txt
Date: Mon, 08 Jul 2024 11:02:11 -0700
From: interne...@ietf.org
To: Michael Toomim <too...@gmail.com>


A new version of Internet-Draft draft-toomim-httpbis-versions-00.txt has been
successfully submitted by Michael Toomim and posted to the
IETF repository.

Name: draft-toomim-httpbis-versions
Revision: 00
Title: HTTP Resource Versioning
Date: 2024-07-08
Group: Individual Submission
Pages: 19
URL: https://www.ietf.org/archive/id/draft-toomim-httpbis-versions-00.txt
Status: https://datatracker.ietf.org/doc/draft-toomim-httpbis-versions/
HTMLized: https://datatracker.ietf.org/doc/html/draft-toomim-httpbis-versions


Abstract:

HTTP resources change over time. Each change to a resource creates a
new "version" of its state. HTTP systems often need a way to
identify, read, write, navigate, and/or merge these versions, in
order to implement cache consistency, create history archives, settle
race conditions, request incremental updates to resources, interpret
incremental updates to versions, or implement distributed
collaborative editing algorithms.

This document analyzes existing methods of versioning in HTTP,
highlights limitations, and sketches a more general versioning
approach that can enable new use-cases for HTTP.



The IETF Secretariat


Rory Hewitt

unread,
Jul 17, 2024, 5:56:19 PM (7 days ago) Jul 17
to Michael Toomim, HTTP Working Group, Braid
Hey Michael,

A few thoughts...

First, I agree that the concept of versioning hasn't been thought about enough, and this is definitely a 'good idea (TM)'.

However, I have a few concerns:

1.1.2 Versioning with ETag

Because ETags are, by definition, unformatted, while it's true to say that you often can't rely on them to establish a version, that's entirely dependent on the format chosen by the user. An ETag *could* validly be specified as a date: 

    ETag: "Sat, 6 Jul 2024 07:28:00 GMT"

or as a version number:

    ETag: "v1.0.2"

or as a random string:

    ETag: "Michael is cool"

IOW, it's totally possible for a site that cares about versioning to use a format that specifies a version number. I recognize this isn't *necessarily* the case, but it helps to be clear here. It should be noted that many web servers that include the creation of ETags natively (e.g. Apache) include an effective version as part of the ETag.

Likewise ETags don't *have* to be sensitive to encoding - there's nothing to stop a server from sending the exact same ETag for two differently-encoded copies of the same underlying resource. It's just that they typically do.

None of this is to say that ETags are better or worse than you describe - just to say that they *can* be better than they are.

2.3 Version and Parents headers

You state that the Parents header can include multiple parents (parents, grandparents, great-grandparents?) and provide an example:

    Parents: "ajtva12kid", "cmdpvkpll2"

and then say "Any version can be recreated by first merging its parents, and then applying the its update onto that merger." (Nit: additional "the" in this sentence). However, you also say that the order of the values in a Parents header makes no difference.

Maybe I'm missing something, but in this scenario, how could that work? Using your example above, here are two possible scenarios:

* Version "ajtva12kid" is earlier. Version "cmdpvkpll2" is later and contains an additional section of HTML
* Version "ajtva12kid" is earlier and contains a section of HTML which is removed in the later "cmdpvkpll2" version

If you merge the two parent versions, then does the outcome (onto which you will apply the update) include that section of HTML?

I guess it just makes sense to me to have the order in the Parents have some meaning - whether oldest first or last. Or you could specify that both Version and Parent values must be integers.

2.4.3 PUT a new version

This seems like it could lead to either race conditions or some other issue with duplicate Version values. Surely it's better to have the client submit a new version of a resource (passing the Parents header but *not* passing the Version header) and have the server, which is presumably the prime source of versioning truth, calculate a version (perhaps after retrieving other PUT requests from other clients) and return that value in the Version response header?

I see you discuss this later with the Current-Version header, so perhaps you covered this and my old eyes missed it.

Rory

Michael Toomim

unread,
Jul 22, 2024, 2:45:02 PM (2 days ago) Jul 22
to HTTP Working Group, Braid, Peter van Hardenberg

We've got divergent discussion threads that I'm merging together.

First, Peter Van Hardenberg (of Ink & Switch, Local-First, and Automerge) wrote this initial review of the draft. He's cc'd, and we can respond in this thread.

------------------------
-- Peter Van Hardenberg: --
------------------------

Hi Michael,

I had a quick look at the spec and gave some thought to whether we'd want to adopt it. I think right now it has quite a lot of per-version overhead, and viewing this through a local-first lens, one can imagine having to publish a large number of versions each as separate PUT calls. You might want to consider supporting ranges for PUT in a single message.

Overall, our goals appear to differ from what you're proposing here so this feedback may not be particularly important. My sense is that the expected granularity of changes for Braid is relatively large and that the frequency is relatively long -- on par with a changed HTML form submission, perhaps. We spend quite a lot of our time thinking about optimizing updates for potentially thousands of edits and trying to minimize the number of round trips required to synchronize state in both directions. You mention that the design intends to be optimizable but I didn't see much in the text that clarified how. 

One other observation is that this spec does not appear to prioritize retention of history: 
>      - If the Parents header is absent, the server SHOULD return a
>      single response, containing the requested version of the resource
>      in its body, with the Version response header set to the same
>      version.
This design may centralize the system, as clients default to receiving "flattened" versions of resources and thus may not be able to merge changes from other sources.

Last, have you considered specifying some kind of signature / validation feature? If clients are applying patches iteratively, it might help for them to be able to validate that they're in the expected state either before or after applying a patch.

All the best,
-p

Michael Toomim

unread,
Jul 22, 2024, 2:45:32 PM (2 days ago) Jul 22
to HTTP Working Group, Braid, Pierre Chapuis

I'm also merging Pierre Chapuis' comments into this thread. Pierre is cc'd, and we can respond here.

------------------
-- Pierre Chapuis: --
------------------

Hello everyone,

this email got me interested. I have been silently following the progress on Braid for a while. I have worked with various data replication and synchronization techniques (including CRDTs and others) and a HTTP standard for resumable feeds is something I have wanted for a long time, to support use cases similar to HTTP Feeds.

Here is a small observation I have from reading the draft. In "4. Version-Type Header", regarding the vector-clock type, did you consider the alternative of just using parents instead of the complex dedicated version format, where each Parent would be of the form "agentidX: counterX"? If the data is modified, the Version response header could potentially be used for the current "server" node.

For instance, considering a CRDT implementation, if the "client" Bob is at {alice: 2, bob: 2, charlie: 3} and the "server" Alice is at {alice: 3, bob: 2, charlie: 4}, and Bob submits a new version, the request headers could be:

Version: "bob: 3"
Parents: "alice: 2", "bob: 2", "charlie: 3"

and the response headers could would be:

Parents: "alice: 3", "bob: 3" , "charlie: 4"

Best,

-- 
Pierre Chapuis

On 7/15/24 6:26 PM, Michael Toomim wrote:

Michael Toomim

unread,
Jul 22, 2024, 3:41:08 PM (2 days ago) Jul 22
to HTTP Working Group, Braid, Peter van Hardenberg

Peter, thank you for your interest! I'm excited that you are bringing up performance for discussion! There's a lot to say on that, and I give an overview below:

== Compression & Performance ==

First, let me correct a big misinterpretation— this work absolutely prioritizes high-performance, realtime data synchronization. It should support thousands of mutations per second. Our implementations are higher-performance than Automerge, for instance. I regularly work today with a doc composed of 110,000 edits. It loads instantly, thanks to some great Version-Types we've designed.

The Version-Type (in the proposed Version-Type header) is the way you get performance increases. The key to performance is managing history growth. You manage that by finding a pattern in history, and then compressing or ignoring history. You can express those patterns as a Version-Type spec. (There's a robust theory behind this called Time Machines.)

I apologize that this wasn't clear in the draft -00. I thought this would be an advanced feature that people wouldn't comment on for a bit — but am pleasantly surprised to hear your interest in it! I will be adding more clarity to the spec on Version-Types, and already have begun doing so in github:

https://github.com/braid-org/braid-spec/blob/master/draft-toomim-httpbis-versions-01.txt#L885

I'd also encourage you to check out this sketch of how to bake RLE into HTTP Header Compression:

https://braid.org/meeting-69/header-compression
https://braid.org/video/https://invisiblecollege.s3.us-west-1.amazonaws.com/braid-meeting-69.mp4#4166

In any case, keep in mind that at this stage, we need to know only whether there is interest in this area of work — not whether this particular spec meets your needs. If we adopt this work into the HTTP WG, we will get a chance to change or rewrite any part of the spec. This spec is just a starting point to get discussion going. So think of this as a problem statement rather than a solution statement.

== PUTs ==

As for PUTs, I suspect you might be thinking about HTTP/1.0 where each PUT might require a new TCP connection with its own TLS handshake. But keep in mind that with HTTP/2 and 3, all HTTP semantics are expressed in binary, and a PUT is usually just a single packet! This is just as efficient as any hand-rolled protocol you have, and it has the advantage of being interoperable with all of HTTP.

== History Retention ==

This versioning model supports Time Machines— the beauty of which is that peers become free to independently choose how much history to store. An archival peer can store the full history. A light client can store just the latest version (see the amazing Simpleton client, which needs zero history).

So each peer can choose how much history to store. If a peer doesn't have enough history to merge an edit, it can simply request that history from another peer. In this draft, you do so by requesting a GET with both Version and Parents headers specified.

== Signatures & Validation ==

This is out of scope for this proposal on versions. However, (a) there are some Version-Types that double as signatures. When this happens, it can be specified by authoring a Version-Type spec to articulate the new constraint. And (b) this is a generally important area of work that I encourage.

Cheers!

Michael

Michael Toomim

unread,
Jul 22, 2024, 6:30:29 PM (2 days ago) Jul 22
to Rory Hewitt, HTTP Working Group, Braid

Rory, thanks for these excellent thoughts! It's exciting to see other people digging into the versioning problem with us. :)

Responses:

== Versioning with ETag ==

You make a good point that ETag headers, like the proposed Version header, are opaque strings that can be formatted to express additional information if we want to. This is true for both ETag and Version:

ETag: "Sat, 6 Jul 2024 07:28:00 GMT"

Version: "Sat, 6 Jul 2024 07:28:00 GMT"

ETag: "v1.0.2"
Version: "v1.0.2"

We propose articulating the structure of these version ids using a Version-Type header. You could, for instance, use "Version-Type: date" for the first example, and "Version-Type: semver" for the second.

The main problem with ETag, though, is that it marks *unique content* rather than *unique time*. If you mutate the state of the resource from "foo" to "bar" and then back to "foo", you'll revert to the same ETag, even though this is at a different point in time. This breaks collaborative editing algorithms.

Finally, I'll note that your claim that ETags don't have to be sensitive to content-encoding is only true for *weak* ETags. Strong ETags must change whenever the byte sequence of the response body changes. This means they should be sensitive to content-encoding. RFC9110 is also explicit that they depend on content-type:

> A strong validator might change for reasons other than a change to the representation data, such as when a semantically significant part of the representation metadata is changed (e.g., Content-Type)
https://datatracker.ietf.org/doc/html/rfc9110#section-8.8.1

Consider the case where a user edits a markdown resource:

PUT /foo
Content-Type: text/markdown
Version: "mike-99"

# This is a markdown file

Hello world!

And the server then shares this as HTML:

GET /foo
Accept: application/html


HTTP/1.1 200 OK
Content-Type: application/html
Version: "mike-99"

<html>
  <body>
    <h1>This is a markdown file</h1>
    <p>Hello world!</p>
  </body>
</html>

Using the Version header, we're able to express that these are two representations of the resource at the same point in time. You can't do this with a strong ETag.

== Version and Parents headers ==

I think there's been a miscommunication here. The reason there are multiple version IDs in the Parents header is for edits that happen *in parallel*, not for edits that happen in sequence. This is to represent a version DAG:

                  a  <-- oldest version
                 / \
                b   c
                 \ /
                  d  <-- current version

In this example, the current version "d" would have:

Parents: "b", "c"

This is not allowed:

Parents: "d", "b"

Because of this language in the spec:

For any two version IDs A and B that are specified in a Version or
Parents header, A cannot be a descendent of B or vice versa.  The
ordering of version IDs within the header carries no meaning.

Good question!

== Client-generated Version IDs on PUT ==

Yes, there would be a problem if two clients generate the same version IDs for two different PUTs. Then the versions would not be unique!

However, requiring the server to generate versions is only one possible solution— and is a solution that requires a server. We also want to support distributed p2p systems, which don't have servers.

In these systems, it's quite common for clients to generate version IDs. There are two common ways to solve this problem:

  1. Use a large random hash space so that collisions are extremely unlikely. This works well enough for git, for instance.
  2. Each client gets a unique ID, possibly by coordinating with a server, and then versions are constructed by concatenating "<client-id>:<counter>" for each client.

Does this all make sense?

Again, good questions, and I am glad to see this interest in the topic! I think we can do a lot with it!

Michael

Michael Toomim

unread,
Jul 22, 2024, 7:49:41 PM (2 days ago) Jul 22
to HTTP Working Group, Braid, Peter van Hardenberg

Peter, I just wrote up an explicit example of how to compress four PUTs into 7 bytes. Check out the new section 5.1 here:

https://github.com/braid-org/braid-spec/blob/master/draft-toomim-httpbis-versions-01.txt#L945

These four puts compress down to 0.0146% of their original size, at least in theory. Note that said compression scheme isn't fully specified in this draft — the focus of this draft is just to gather interest in working on a versioning system that makes such compression possible. The actual compression schemes would be future work.

Pierre Chapuis

unread,
Jul 22, 2024, 11:04:23 PM (2 days ago) Jul 22
to Michael Toomim, Rory Hewitt, HTTP Working Group, Braid
Hello Michael,

regarding the "version and parents headers" ordering issue Rory mentioned, I don't think he was talking about the case where one version descends the other one.

The fact that you say this has very strong implications:

> Any version can be recreated by first merging its parents, and then applying the its update onto that merger. 

It either means there cannot be conflicts between parents - or in other words that conflict resolution is deterministic, commutative *and* associative (like CRDTs), or that updates must always contain the conflict resolution of their parents like Git.

That last solution also means updates can be rejected by the server if its history is incoherent, and comes with its own issues. The way Git works is that conflict resolution is always performed with human intervention on pull, not on push.

I know Braid has answers to this (Merge Types) and you are trying to break up the spec here, but it is not surprising that if you have a spec that says "versions can have several parents and you can merge them" people are going to wonder how.

-- 
Pierre Chapuis
--
You received this message because you are subscribed to the Google Groups "Braid" group.
To unsubscribe from this group and stop receiving emails from it, send an email to braid-http+...@googlegroups.com.

Pierre Chapuis

unread,
Jul 22, 2024, 11:12:12 PM (2 days ago) Jul 22
to Michael Toomim, Rory Hewitt, HTTP Working Group, Braid
I sent this too fast, I should have edited out the part where I said this is what Rory meant. I think he did talk about the case (that is prohibited) where one version descends the other one.

But I think the rest of my comment still holds: specifying multiple concurrent parents without specifying merges and conflict resolution is a bit weird.

-- 
Pierre Chapuis

Rory Hewitt

unread,
Jul 23, 2024, 1:14:09 AM (yesterday) Jul 23
to Pierre Chapuis, Michael Toomim, HTTP Working Group, Braid
Hey Pierre, 

Actually, I kinda WAS talking about my (mis)understanding that the Parents header could contain both parents and grandparents... 

That being said, if the Parents header can ONLY contain direct parents, that seems like a (possibly significant) limitation.

Would it not be an improvement to allow the header to contain a list of ancestors back to whatever level the server feels is appropriate or retains information, complete with level information (ancestor level): 

Parents: "parent1","parent 2":1; "grandparent":2;"greatgrandparent1","greatgrandparent2","greatgrandparent3";3

This indicates two parents (level 1), a single grandparent (level 2) and 3 great grandparents (level 3).

This could be compared with a similar Parents header for another object to determine where differences may be found, and how far back.

Maybe this is getting too far into the weeds - this was, as I noted, based on my misunderstanding, which Pierre obviously understands is a possibility. 

I guess my primary point is that in finding a balance between brevity and flexibility, a design that is able to specify detailed information is better, even if that detailed information is often elided or ignored. 

With these fairly 'generic' header names like Version and Parents, the ability to use them to (in theory) 'build' a history of a file and compare with a later, earlier or 'sibling' file send very useful...

But I defer to the smarter minds here - I am a mere tinkerer and may well have gotten too deep too early.

Rory

Marius Kleidl

unread,
Jul 23, 2024, 4:45:45 AM (yesterday) Jul 23
to Michael Toomim, HTTP Working Group, Braid, Peter van Hardenberg
Hi Michael,

talking about performance, I am curious how it would perform in a real-time, collaborative editing process (similar to Google Docs or the note taker tool during the IETF meeting). To facilitate the real-time aspects of the editing experience, would the client have to send a PUT request after every few keystrokes, so that the changes appear quickly on the peers' screens? Sending these requests is comparatively cheap for the client, especially with HTTP/2 and HTTP/3, but potentially more costly for the server, which has to perform authentication checks for each request and then load the resource's state from some storage. If many requests are sent in short succession, this can induce a higher load on the server. A stateful connection, like with WebSockets, in contrast to stateless HTTP requests could reuse the loaded and checked state - although such a method likely has other caveats attached.

Overall my question is whether you think this draft is suitable to deliver such real-time experiences in an efficient manner? 

Best regards
Marius Kleidl

Michael Toomim

unread,
Jul 23, 2024, 3:25:21 PM (yesterday) Jul 23
to Marius Kleidl, HTTP Working Group, Braid, Peter van Hardenberg

Yes, it works great for collaborative editing. I use it every day in production. It's very fast. We send a PUT per keystroke. I should show you a demo. It's real. :)

It's not true that an HTTP PUT induces more load on the server than a WebSocket message. They are equivalent. Consider that both H2 and WebSocket are TCP streams that stay persistently open. The only difference between these two streams is how the data is formatted. They don't impact how/when the server loads the resource from disk into ram. It's true that HTTP requests often contain a session ID in a cookie on each request, whereas a WebSocket might only send that when the user logs in/out, but that header gets compressed down with H2 header compression and isn't a significant performance problem.

Perhaps you're thinking about old-style threaded web servers? Those have a lot of overhead per request, because a 4mb OS thread has to be allocated to each request. But those don't support persistent connections (like WebSockets) at all. That's why everyone's moved to evented servers, like nodejs, which make persistent connections cheap, whether formatted as a WebSocket message stream or a H2 message stream.

Ben Schwartz

unread,
Jul 23, 2024, 11:57:29 PM (20 hours ago) Jul 23
to Michael Toomim, Marius Kleidl, HTTP Working Group, Braid, Peter van Hardenberg
HTTP requests tend to run through a lot more middleware than application-layer messages like WebSocket.  Every request is likely to be checked for compliance with various Web Application Firewall policies, passed through various caching gateways, logged to various request logs, etc.

I would recommend thinking carefully about whether the using individual HTTP requests, instead of an application-layer stream of some kind, is semantically correct and useful.

--Ben

From: Michael Toomim <too...@gmail.com>
Sent: Tuesday, July 23, 2024 3:25 PM
To: Marius Kleidl <mar...@transloadit.com>
Cc: HTTP Working Group <ietf-h...@w3.org>; Braid <braid...@googlegroups.com>; Peter van Hardenberg <p...@pvh.ca>
Subject: Re: Review of draft-toomim-httpbis-versions-00
 
Reply all
Reply to author
Forward
0 new messages