Hi everyone in HTTP!
Last fall we solicited feedback on the Braid State Synchronization proposal [draft, slides], which I'd summarize as:
"We're enthusiastic about the general work, but the proposal is too high-level. Break the spec up into multiple independent specs, and work bottom-up. Focus on concrete 'bits-on-the-wire'."
Versioning of HTTP Resources
draft-toomim-httpbis-versions
https://datatracker.ietf.org/doc/html/draft-toomim-httpbis-versions-00
Subject: New Version Notification for draft-toomim-httpbis-versions-00.txt
Date: Mon, 08 Jul 2024 11:02:11 -0700
From: interne...@ietf.org
To: Michael Toomim <too...@gmail.com>
We've got divergent discussion threads that I'm merging together.
First, Peter Van Hardenberg (of Ink & Switch, Local-First, and Automerge) wrote this initial review of the draft. He's cc'd, and we can respond in this thread.
------------------------
-- Peter Van Hardenberg: --
------------------------
Hi Michael,
I'm also merging Pierre Chapuis' comments into this thread.
Pierre is cc'd, and we can respond here.
------------------
-- Pierre Chapuis: --
------------------
Hello everyone,
Peter, thank you for your interest! I'm excited that you are
bringing up performance for discussion! There's a lot to say on
that, and I give an overview below:
== Compression & Performance ==
First, let me correct a big misinterpretation: this work
absolutely prioritizes high-performance, realtime
data synchronization. It should support thousands of mutations per
second. Our implementations are higher-performance
than Automerge, for instance. I regularly work today with a doc
composed of 110,000 edits. It loads instantly, thanks to some
great Version-Types we've designed.
The Version-Type (in the proposed Version-Type header) is the way
you get performance increases. The key to performance is managing
history growth. You manage that by finding a pattern in history,
and then compressing or ignoring history. You can express those
patterns as a Version-Type spec. (There's a robust theory behind
this called Time Machines.)
I apologize that this wasn't clear in draft -00. I thought this would be an advanced feature that people wouldn't comment on for a bit, but I am pleasantly surprised to hear your interest in it! I will be adding more clarity to the spec on Version-Types, and have already begun doing so on GitHub:
https://github.com/braid-org/braid-spec/blob/master/draft-toomim-httpbis-versions-01.txt#L885
I'd also encourage you to check out this sketch of how to bake RLE into HTTP Header Compression:
https://braid.org/meeting-69/header-compression
https://braid.org/video/https://invisiblecollege.s3.us-west-1.amazonaws.com/braid-meeting-69.mp4#4166
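To make the run-length idea concrete, here is a minimal Python sketch. It assumes version IDs of the hypothetical form "<agent>-<counter>" (like the "mike-99" style used elsewhere in this thread) and collapses consecutive counters from one agent into a single run. The function name, the agent names, and the run format are illustrative assumptions, not the scheme from the linked header-compression sketch.

```python
def rle_versions(version_ids):
    """Collapse consecutive "<agent>-<n>" IDs into (agent, start, count) runs.

    Assumption: IDs look like "mike-7". Real Version-Types may differ.
    """
    runs = []
    for vid in version_ids:
        agent, _, counter = vid.rpartition("-")
        n = int(counter)
        if runs and runs[-1][0] == agent and runs[-1][1] + runs[-1][2] == n:
            # This ID continues the current run, so only its length grows.
            runs[-1] = (agent, runs[-1][1], runs[-1][2] + 1)
        else:
            # A new agent or a gap in the counter starts a new run.
            runs.append((agent, n, 1))
    return runs


history = ["mike-1", "mike-2", "mike-3", "mike-4", "alice-1", "alice-2"]
print(rle_versions(history))  # [('mike', 1, 4), ('alice', 1, 2)]
```

The point is only that a regular pattern in the history (here, sequential counters) lets many version IDs be stored or sent as one short record.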
In any case, keep in mind that at this stage, we need to know
only whether there is interest in this area of work — not
whether this particular spec meets your needs. If we adopt this
work into the HTTP WG, we will get a chance to change or rewrite
any part of the spec. This spec is just a starting point to get
discussion going. So think of this as a problem statement rather
than a solution statement.
== PUTs ==
As for PUTs, I suspect you might be thinking about HTTP/1.0, where each PUT might require a new TCP connection with its own TLS handshake. But keep in mind that HTTP/2 and HTTP/3 multiplex requests over one persistent connection and express all HTTP semantics in binary, so a small PUT is usually just a single packet! This is just as efficient as any hand-rolled protocol you have, and it has the advantage of being interoperable with all of HTTP.
== History Retention ==
This versioning model supports Time Machines, the beauty of which is that peers become free to independently choose how much history to store. An archival peer can store the full history. A light client can store just the latest version (see the amazing Simpleton client, which needs zero history).
So each peer can choose how much history to store. If a peer doesn't have enough history to merge an edit, it can simply request that history from another peer. In this draft, you do so by requesting a GET with both Version and Parents headers specified.
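As a rough illustration, here is a Python sketch that builds the text of such a request. It assumes the header formatting used in the examples in this thread (comma-separated quoted strings); the helper names, the path, and the host are hypothetical, and the precise semantics of the request are defined by the draft, not by this sketch.

```python
def format_ids(ids):
    """Format version IDs as comma-separated quoted strings, e.g. "b", "c"."""
    return ", ".join('"' + i + '"' for i in ids)


def history_request(host, path, version, parents):
    """Build an HTTP/1.1 GET carrying both Version and Parents headers.

    Hypothetical helper: it only shows the shape of the request a peer
    might send when it is missing history.
    """
    lines = [
        "GET " + path + " HTTP/1.1",
        "Host: " + host,
        "Version: " + format_ids(version),
        "Parents: " + format_ids(parents),
    ]
    return "\r\n".join(lines) + "\r\n\r\n"


print(history_request("example.org", "/foo", ["d"], ["b", "c"]))
```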
== Signatures & Validation ==
This is out of scope for this proposal on versions. However, (a)
there are some Version-Types that double as signatures. When this
happens, it can be specified by authoring a Version-Type spec to
articulate the new constraint. And (b) this is a generally
important area of work that I encourage.
Cheers!
Michael
Rory, thanks for these excellent thoughts! It's exciting to see other people digging into the versioning problem with us. :)
Responses:
== Versioning with ETag ==
You make a good point that ETag headers, like the proposed
Version header, are opaque strings that can be formatted to
express additional information if we want to. This is true for
both ETag and Version:
ETag: "Sat, 6 Jul 2024 07:28:00 GMT"
Version: "Sat, 6 Jul 2024 07:28:00 GMT"
ETag: "v1.0.2"
Version: "v1.0.2"
We propose articulating the structure of these version ids using a Version-Type header. You could, for instance, use "Version-Type: date" for the first example, and "Version-Type: semver" for the second.
The main problem with ETag, though, is that it marks *unique content* rather than *unique time*. If you mutate the state of the resource from "foo" to "bar" and then back to "foo", you'll revert to the same ETag, even though this is at a different point in time. This breaks collaborative editing algorithms.
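Here is a small Python sketch of that foo/bar/foo case. It assumes a server that derives its ETag from a hash of the content (common, but not required by HTTP) and a hypothetical per-edit version ID of the form "mike-<n>".

```python
import hashlib


def content_etag(body):
    """A content-derived ETag: a quoted, truncated SHA-256 of the body."""
    return '"' + hashlib.sha256(body.encode("utf-8")).hexdigest()[:16] + '"'


# The resource goes from "foo" to "bar" and back to "foo".
states = ["foo", "bar", "foo"]
etags = [content_etag(s) for s in states]
versions = ['"mike-' + str(n) + '"' for n in range(1, len(states) + 1)]

print(etags[0] == etags[2])        # True: same content, same ETag
print(versions[0] == versions[2])  # False: two different points in time
```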
Finally, I'll note that your claim that ETags don't have to be sensitive to content-encoding is only true for *weak* ETags. Strong ETags must change whenever the byte sequence of the response body changes. This means they should be sensitive to content-encoding. RFC 9110 is also explicit that they can depend on Content-Type:
> A strong validator might change for reasons other than a change to the representation data, such as when a semantically significant part of the representation metadata is changed (e.g., Content-Type)
https://datatracker.ietf.org/doc/html/rfc9110#section-8.8.1
Consider the case where a user edits a markdown resource:
PUT /foo
Content-Type: text/markdown
Version: "mike-99"
# This is a markdown file
Hello world!
And the server then shares this as HTML:
GET /foo
Accept: text/html
HTTP/1.1 200 OK
Content-Type: text/html
Version: "mike-99"
<html>
<body>
<h1>This is a markdown file</h1>
<p>Hello world!</p>
</body>
</html>
Using the Version header, we're able to express that these are two representations of the resource at the same point in time. You can't do this with a strong ETag.
== Version and Parents headers ==
I think there's been a miscommunication here. The reason there are multiple version IDs in the Parents header is for edits that happen *in parallel*, not for edits that happen in sequence. This is to represent a version DAG:
    a     <-- oldest version
   / \
  b   c
   \ /
    d     <-- current version
In this example, the current version "d" would have:
Parents: "b", "c"
This is not allowed:
Parents: "d", "b"
Because of this language in the spec:
For any two version IDs A and B that are specified in a Version or
Parents header, A cannot be a descendent of B or vice versa. The
ordering of version IDs within the header carries no meaning.
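To illustrate that rule, here is a minimal Python sketch that checks a set of version IDs against the diamond-shaped history above. The dictionary representation of the DAG and the function names are assumptions made for this example; the draft does not prescribe them.

```python
def ancestors(dag, version):
    """Return every version reachable from `version` via parent links."""
    seen = set()
    stack = list(dag[version])
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(dag[v])
    return seen


def is_valid_id_set(dag, ids):
    """True if no ID in `ids` is an ancestor or descendant of another."""
    return all(a not in ancestors(dag, b) for a in ids for b in ids if a != b)


# Each version maps to its list of parents.
dag = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}

print(is_valid_id_set(dag, ["b", "c"]))  # True: b and c are parallel edits
print(is_valid_id_set(dag, ["d", "b"]))  # False: d is a descendant of b
```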
Good question!
== Client-generated Version IDs on PUT ==
Yes, there would be a problem if two clients generate the same version IDs for two different PUTs. Then the versions would not be unique!
However, requiring the server to generate versions is only one possible solution, and it is one that requires a server. We also want to support distributed p2p systems, which don't have servers.
In these systems, it's quite common for clients to generate version IDs. There are two common ways to solve this problem:
Does this all make sense?
Again, good questions, and I am glad to see this interest in the topic! I think we can do a lot with it!
Michael
Peter, I just wrote up an explicit example of how to compress four PUTs into 7 bytes. Check out the new section 5.1 here:
https://github.com/braid-org/braid-spec/blob/master/draft-toomim-httpbis-versions-01.txt#L945
These four PUTs compress down to 0.0146% of their original size,
at least in theory. Note that this compression scheme isn't fully
specified in this draft. The focus of this draft is just to
gather interest in working on a versioning system that makes such
compression possible. The actual compression schemes would be
future work.
--
You received this message because you are subscribed to the Google Groups "Braid" group.
To unsubscribe from this group and stop receiving emails from it, send an email to braid-http+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/braid-http/d713500c-c4db-4bf8-8096-edb0b5ff1751%40gmail.com.
Yes, it works great for collaborative editing. I use it every day
in production. It's very fast. We send a PUT per keystroke. I
should show you a demo. It's real. :)
It's not true that an HTTP PUT induces more load on the server
than a WebSocket message. They are equivalent. Consider that both
H2 and WebSocket are TCP streams that stay persistently open. The
only difference between these two streams is how the data is
formatted. The format doesn't affect how or when the server loads the
resource from disk into RAM. It's true that HTTP requests often
contain a session ID in a cookie on each request, whereas a
WebSocket might only send that when the user logs in/out, but that
header gets compressed down with H2 header compression and isn't a
significant performance problem.
Perhaps you're thinking about old-style threaded web servers?
Those have a lot of overhead per request, because a 4 MB OS thread
has to be allocated to each request. But those cope poorly with large
numbers of persistent connections (like WebSockets). That's why
everyone's moved to evented servers, like Node.js, which make
persistent connections cheap, whether formatted as a WebSocket
message stream or an H2 message stream.