Hey Everyone,
I'm thinking about building out a Cap'n Proto implementation in Elm for the fun of it. Thought I'd send an email to this list as suggested and get some feedback on the initial design which I've also linked below. Any thoughts, comments, or concerns are appreciated.
--Prasanth
Another link to the doc in case you missed it: https://docs.google.com/document/d/12qMVyQPOWTXviFKIpjKLXgusKZ95miuRmu9AxacyGOA/edit?usp=sharing
To unsubscribe from this group and stop receiving emails from it, send an email to capnproto+...@googlegroups.com.
Visit this group at https://groups.google.com/group/capnproto.
To view this discussion on the web visit https://groups.google.com/d/msgid/capnproto/155916089026.23375.1007569258138270396%40localhost.localdomain.
Quoting David Renshaw (2019-05-29 21:33:03)
> This has piqued my interest. Which parts of the schema language don't
> map well to Haskell/Elm?
The biggest one is nested namespaces, per discussion. Neither language
has intra-module namespaces, so you either end up doing a bunch of
complex logic to split stuff across multiple modules and still break
dependency cycles (in Haskell; per my earlier message, in Elm you're
just SOL, since mutually recursive modules are just not supported, full
stop), or you deal with long_names_with_underscores (the Haskell
implementation actually uses the single quote as a namespace
separator). This is a problem for the Go implementation as well; some
of the stuff from sandstorm's web-session.capnp spits out identifiers
that are pushing 100 characters.
(I actually bumped into @glycerine at a meetup just the other day; we
talked about this among other things).
The fact that union field names are scoped to the struct is a bit
awkward, since union tag names are scoped at the module level in
most ML-family languages. More makeshift namespacing.
The lack of a clean separation between unions and structs introduces a
bit of an impedance mismatch as well; if you do things naively you end
up with an awkward situation where *every* sum type is wrapped in a
struct, which is a bit odd since they are used so liberally (and are
normally so lightweight) in these languages. The Haskell implementation
specifically looks for structs which are one big anonymous union so it
can omit the wrapper.
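To make the wrapper issue concrete, here is a minimal Haskell sketch of the two translations. The `Shape` schema and every type and constructor name are made up for illustration; this is not actual output of the Haskell plugin, only the general shape of the problem.

```haskell
-- Hypothetical schema (illustration only):
--   struct Shape { union { circle @0 :Float64; square @1 :Float64; } }

-- Naive translation: the struct becomes a record, and the anonymous
-- union becomes a separate sum type stored in an invented field.
data Shape'Union
  = Shape'Union'circle Double
  | Shape'Union'square Double
  deriving (Show, Eq)

data ShapeWrapped = ShapeWrapped { union' :: Shape'Union }
  deriving (Show, Eq)

-- Flattened translation: the struct is one big anonymous union, so the
-- struct itself can be the sum type and the wrapper disappears.
data Shape
  = Shape'circle Double
  | Shape'square Double
  deriving (Show, Eq)

-- Pattern matching through the wrapper is noisier...
areaWrapped :: ShapeWrapped -> Double
areaWrapped (ShapeWrapped (Shape'Union'circle r)) = pi * r * r
areaWrapped (ShapeWrapped (Shape'Union'square s)) = s * s

-- ...than matching on the flattened type directly.
area :: Shape -> Double
area (Shape'circle r) = pi * r * r
area (Shape'square s) = s * s
```

The flattened form only works when the struct has no fields outside the union, which is why the special case has to be detected rather than applied everywhere.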
If you have an anonymous union you also need to invent a name for the
field, since you can't actually have "anonymous" fields in records.
For Haskell, there's no way to talk about a record type without giving
it a name, so every group needs an auxiliary type defined. There's not
really anything clearly nicer to do than just name it <Type>'<field> or
such, which makes the long name problem worse. Along similar lines, in
Haskell you end up having to define auxiliary types for parameter and
return types, and without more of a hint they end up being things like
<Type>'<method>'params and <Type>'<method>'results -- a mouthful even
for short type names. I've taken to just always manually giving my
parameter and return arguments names to avoid this kind of compiler
output; the schema is much more verbose, but the call site is much
nicer. None of this section applies to Elm since you can just have
anonymous record types.
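As a rough illustration of the auxiliary types a group forces in Haskell, assuming a made-up `Person` schema (the names below are hypothetical and only follow the <Type>'<field> pattern described above):

```haskell
-- Hypothetical schema (illustration only):
--   struct Person {
--     name @0 :Text;
--     address :group { street @1 :Text; city @2 :Text; }
--   }

-- A Haskell record cannot contain an anonymous record, so the group
-- needs its own named type.
data Person'address = Person'address
  { street :: String
  , city :: String
  } deriving (Show, Eq)

data Person = Person
  { name :: String
  , address :: Person'address
  } deriving (Show, Eq)

-- Access goes through the auxiliary type.
cityOf :: Person -> String
cityOf = city . address
```

In Elm the `address` field could simply be given the type `{ street : String, city : String }`, with no extra declaration.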
I intentionally decided to just not support custom default values for
pointer fields; it gets really awkward because messages can be mutable
or immutable, and you end up needing different implementation strategies
for each type; for immutable messages you can't do what most
implementations do (copy the value in place on first access), but you
could "follow" the pointer into some constant defined in the generated
code without a copy. But that gets weird because there are functions to
access the underlying message/segment, so you could run into situations
where you've jumped to a whole other message silently. With mutable
messages you can do the normal thing, but writing code that's generic
over both of these gets really weird. At some point I ended up checking
the schemas that ship with capnproto, and with sandstorm, and discovered
that, in >9000 lines of schema source, the feature was used exactly
twice, both to set the default value of a text parameter to the empty
string. So I just said "screw it, this is a waste of time." The plugin
just prints a warning to stderr and ignores the custom default.
I actually have a much longer critique that I think would be worth
writing, including some things that aren't a problem for Haskell
specifically, but cause problems for other languages -- and I am being
bothered to go help with dinner, so I'll leave it at this for now.
-Ian
> One bit of food for thought: you can't exactly mmap() in Elm, and even to get from bytes to `Array Int` you have to do some non-trivial unmarshalling. It may make as much sense to just bite the bullet and parse the whole thing (deeply) into an idiomatic data type up front, just like protobuf implementations do; by the time you take into account all of Elm's limitations, it's not clear to me how much keeping it in the wire format like the C++ implementation does actually buys you.
> Doing an up front parse solves a lot of things, so if you go another route, be clear on why.
This is intended to be a prototype. `Array Int` is an awful data type for this use case, but it works well enough for prototyping. Specifically, for the context of others reading this thread, `Array` in Elm is a tree structure and will not provide reasonable performance. That said, I’m hoping that I can use `elm-bytes` in a better way than being forced to decode it into a full Elm data type – though as I say this, it seems that I’d need some buy-in that I can’t be guaranteed. I may end up with that solution in the long run, but I want to implement one with the double array to start and see if I can convince a few people.
Still, it’s not clear to me why you’d use Cap’n Proto if you’re going to do a full serialization/deserialization. Just use Protobufs at that point. You could argue that this existing for completeness is valuable, i.e. you can run capnp on your backend and not be forced to translate into a protobuf on your frontend, but at that point I’m not sure that this is a good enough reason to write a library like this. Additionally, it’s not like JavaScript doesn’t have more complex capabilities like Uint8Array that Elm could take advantage of.
That said, I’m more or less treating this as immutable data and providing ways of reducing the cost of updates (such as batching updates). Haskell at least has the ST monad for performance. There just isn’t a better way of doing this in Elm as far as I know.
> It would make sense to publish this as a package by itself; it's a nice
> conceptual unit that would be useful as a library for other projects.
Sure, I was thinking the same thing. Just thought that I’d focus on the Capnproto implementation before publishing. It’s fairly separate though, so I’m not worried about separating it out once I’m ready.
> This was a somewhat awkward thing to cover with the Haskell
> implementation; what I ended up doing amounts to a glorified state
> monad:
So the `Struct` type is a glorified state monad. `fields` holds the record that acts as the struct’s definition. I’ve attached an example below that shows how I think this should work. Let me know if that makes sense and feels reasonably ergonomic.
Regarding namespacing in the parallel conversation: I think it’s kind of awful that Haskell records are accessed via functions instead of some scoped operator or the like. Not really useful as a comment, but I thought I’d add my displeasure.
Pointer field defaults: Field defaults in general are not features I feel super great about. Not that I’ve thought about this in horribly great depth, but they seem to be very problematic if they are ever updated – your binaries will read the same bytes as two different structs. I always assumed that’s why they were removed from proto3. They also don’t seem *that* useful as you can handle this on the application layer sufficiently well. I’m curious if others think differently and feel strongly about their inclusion.
getMainPhone : Struct AddressBook -> Struct PhoneNumber
-- assume d : Data exists. This is an `Array (Array Int)`
-- Struct
-- Outputs:
--
-- and is simply a pointer, but I’m not sure how this works exactly.
--
-- If I have to, I can always separate d from the struct definition.
From: David Renshaw <dwre...@gmail.com>
Sent: Thursday, May 30, 2019 5:30 AM
To: Ian Denhardt <i...@zenhack.net>
Cc: prasanth somasundar <Mez...@live.com>; capnproto <capn...@googlegroups.com>
Subject: Re: [capnproto] Cap'n Proto for Elm
Thanks! I wrote some comments inline below.
> The biggest one is nested namespaces, per discussion.
> The fact that union field names are scoped to the struct is a bit
> awkward, since union tag names are scoped at the module level in
> most ML-family languages. More makeshift namespacing.
> The lack of a clean separation between unions and structs introduces a
> bit of an impedance mismatch as well; if you do things naively you end
> up with an awkward situation where *every* sum type is wrapped in a
> struct, which is a bit odd since they are used so liberally (and are
> normally so lightweight) in these languages. The Haskell implementation
> specifically looks for structs which are one big anonymous union so it
> can omit the wrapper.
> For Haskell, there's no way to talk about a record type without giving
> it a name, so every group needs an auxiliary type defined.
> I intentionally decided to just not support custom default values for
> pointer fields;
> I actually have a much longer critique that I think would be worth
> writing, including some things that aren't a problem for Haskell
> specifically, but cause problems for other languages -- and I am being
> bothered to go help with dinner, so I'll leave it at this for now.
> One bit of food for thought: you can't exactly mmap() in elm,
> Pointer field defaults: Field defaults in general are not features I feel super great about. Not that I’ve thought about this in horribly great depth, but they seem to be very problematic if they are ever updated – your binaries will read the same bytes as two different structs. I always assumed that’s why they were removed from proto3. They also don’t seem *that* useful as you can handle this on the application layer sufficiently well. I’m curious if others think differently and feel strongly about their inclusion.
> Actual line of code from Cloudflare Workers:
> case PipelineDef::Stage::Worker::Global::Value::JSON:
> So again, I'm not sure this is a problem specific to certain languages.
> :)
Oof, point taken. Slightly off topic for this list, but how would you
feel about accepting (wire compatible) patches to the sandstorm schema
to flatten the namespace? Here's another doozy (from the output of the
go code generator):
websession.WebSession_WebSocketStream_sendBytes_Results_Future
Arrays are definitely workable for prototyping, but if you switch over
to a flat buffer representation in the future your current design for
updates will start being O(length of segment) instead of O(log(length of
segment)), so that's something to keep in mind.
One possible design, which I think I'd do for the Haskell implementation
if I were to start from scratch: don't support in-place modifications.
Have encode and decode. This way your read support can do the obvious
thing with buffers, and your write support can be an 'Encoder' type
which is a wrapper around Bytes.Encoder + some metadata about addresses.
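A very rough sketch of what such an 'Encoder' could look like, written in Haskell with `Data.ByteString.Builder` standing in for Elm's Bytes.Encoder. The `Encoder` type and its functions are hypothetical names for illustration; the only "metadata about addresses" tracked here is the number of 64-bit words written so far.

```haskell
import qualified Data.ByteString.Builder as B
import qualified Data.ByteString.Lazy as BL
import Data.Word (Word64)

-- Hypothetical write-only encoder: the pending output plus the address
-- (in 64-bit words) at which the next word will land.
data Encoder = Encoder
  { encWords :: !Int
  , encBuilder :: B.Builder
  }

emptyEncoder :: Encoder
emptyEncoder = Encoder 0 mempty

-- Append one little-endian word. Returns the address it was written at,
-- so a caller can later compute pointer offsets to it.
appendWord :: Word64 -> Encoder -> (Int, Encoder)
appendWord w (Encoder n b) = (n, Encoder (n + 1) (b <> B.word64LE w))

-- Produce the final bytes. Nothing is ever modified in place.
finish :: Encoder -> BL.ByteString
finish = B.toLazyByteString . encBuilder
```

Because output is append-only, the flat-buffer representation never needs an O(length of segment) copy for an update; the cost is that there is no way to go back and change a word already emitted.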
> Still, it's not clear to me why you'd use Cap'n Proto if you're going
> to do a full serialization/deserialization. Just use Protobufs at that
> point. You could argue that this existing for completeness is valuable
> i.e. you can run capnp on your backend and not be forced to translate
> into a protobuf on your frontend, but at that point I'm not sure that
> this is a good enough reason to write a library like this.
Fair enough. I'd say for RPC (which you've said you're not shooting for,
so maybe moot in this case) or if you've got some existing system you
want to talk to; it would be neat to have this for writing sandstorm
apps.
The salient difference in behavior here is that the traversal limit is
part of the struct, rather than the message, so if you have a branching
structure (like a tree), it can't really protect you, since it isn't
shared across branches.
I don't see an ergonomic way to address this.
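The difference can be shown with a toy model in Haskell. This is a sketch under heavy assumptions: `Tree` stands in for a message with nested structs, each visit costs one unit, and neither function corresponds to any real Cap'n Proto API.

```haskell
import Control.Monad (foldM)

-- Toy stand-in for a struct that points at child structs.
data Tree = Node [Tree]

-- A full binary tree with the given number of levels below the root.
fullTree :: Int -> Tree
fullTree 0 = Node []
fullTree d = Node [fullTree (d - 1), fullTree (d - 1)]

-- Limit stored in the struct: every child gets its own copy of the
-- remaining budget, so sibling branches do not affect each other.
-- Returns the number of nodes visited, or Nothing if a path ran out.
countPerStruct :: Int -> Tree -> Maybe Int
countPerStruct budget (Node kids)
  | budget <= 0 = Nothing
  | otherwise = (1 +) . sum <$> traverse (countPerStruct (budget - 1)) kids

-- Limit shared by the whole message: the remaining budget is threaded
-- through every visit. Returns the budget left over, or Nothing.
visitShared :: Int -> Tree -> Maybe Int
visitShared budget (Node kids)
  | budget <= 0 = Nothing
  | otherwise = foldM visitShared (budget - 1) kids
```

With a budget of 3 on `fullTree 2` (seven nodes), `countPerStruct` happily visits all seven, since the copied budget only bounds depth, while `visitShared` gives up partway through. Threading the budget is exactly the state-passing that is hard to make ergonomic in a pure API.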
Proto3's motivation for removing defaults, as I understand it, is that the designers of Go very much wanted for Protobufs to be represented as raw structs. Go does not have a concept of constructors for raw structs; they are simply zero-initialized. Instead of improving their language, they asked for changes to Protobuf.
Re: "mmap()" in Elm in another of your messages, this is why I suggested that Mezuzza get in touch with Evan early.
> > Actual line of code from Cloudflare Workers:
> > case PipelineDef::Stage::Worker::Global::Value::JSON:
> > So again, I'm not sure this is a problem specific to certain languages.
> > :)
> Oof, point taken. Slightly off topic for this list, but how would you
> feel about accepting (wire compatible) patches to the sandstorm schema
> to flatten the namespace? Here's another doozy (from the output of the
> go code generator):
> websession.WebSession_WebSocketStream_sendBytes_Results_Future
That's pretty bad in both cases. However, in C++, you get the benefit of using declarations on namespaces and classes, so if you wished, you could do something like:
using Global = Pipeline::Stage::Worker::Global;
case Global::Value::JSON:
I can say from experience writing quite a bit of C++ code that consumed protobufs that this was the only thing that made it look sane. Unfortunately, if we use function name based namespacing (if you want to call it that), we cannot get this benefit easily as you'd have to rename all the functions consistently and on an ad-hoc basis.
With Elm and Haskell, I believe the only layer of namespacing that's provided is the module name which is why both Ian and I gravitated to it initially.
> With Elm and Haskell, I believe the only layer of namespacing that's provided is the module name which is why both Ian and I gravitated to it initially.
Are you saying Elm and Haskell don't provide any way to declare aliases? That seems surprising to me.
Note that Prasanth's example aliases a module that's at an intermediate point in the nested hierarchy. I doubt that Haskell would let you do something like:
type Global = Pipeline'Stage'Worker'Global;
...
case Global'Value'JSON: ...
case Global'Value'NUMBER: ...
Because you're giving an alias to the *type*, not the variant tag.
Haskell has a PatternSynonyms extension that I've not used heavily, but
Elm doesn't give you anything to work with here.
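For reference, a minimal sketch of what PatternSynonyms would buy you. The generated type and constructor names below are invented to mirror the Cloudflare example and are not real plugin output; the point is that each variant needs its own synonym, since there is no way to alias the shared prefix once.

```haskell
{-# LANGUAGE PatternSynonyms #-}

-- Hypothetical generated type with flattened, prefix-heavy names.
data Pipeline'Stage'Worker'Global'Value
  = Pipeline'Stage'Worker'Global'Value'json String
  | Pipeline'Stage'Worker'Global'Value'number Double

-- One bidirectional synonym per variant.
pattern Json :: String -> Pipeline'Stage'Worker'Global'Value
pattern Json s = Pipeline'Stage'Worker'Global'Value'json s

pattern Number :: Double -> Pipeline'Stage'Worker'Global'Value
pattern Number n = Pipeline'Stage'Worker'Global'Value'number n

-- Tell the exhaustiveness checker that the synonyms cover the type.
{-# COMPLETE Json, Number #-}

describe :: Pipeline'Stage'Worker'Global'Value -> String
describe (Json s) = "json: " ++ s
describe (Number n) = "number: " ++ show n
```

This shortens the call sites, but the synonyms have to be written (or generated) by hand for every tag, which is the ad-hoc renaming problem again.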
Oh duh, good point... When the namespace is flattened, aliases only shorten the specific declaration aliased and not all of its nested declarations.
Hey guys,
I've updated the doc with a new write API suggestion. The gist is that writes never happen to the byte array. Instead, we build a virtual proto on top of the byte array that's accessed first for any modifications. It's built using `Maybe` all the way down. `Nothing` implies that there are no changes from the byte array. When you transmit/convert it into bytes, the framework performs a "merge" operation before sending over the wire. We will also provide this merge operation to advanced users who wish to manually do this compaction at some points.
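Here is a small sketch of how I read that idea, written in Haskell as a stand-in for Elm. All names are hypothetical, and plain records stand in for values that would really be read out of the byte array; the doc's actual API may look quite different.

```haskell
import Data.Maybe (fromMaybe)

-- Stand-ins for what would be read from the underlying bytes.
data Phone = Phone { phoneNumber :: String, phoneKind :: String }
  deriving (Show, Eq)

data Contact = Contact { contactName :: String, contactPhone :: Phone }
  deriving (Show, Eq)

-- The virtual layer: `Nothing` means "unchanged from the byte array",
-- and nested structs get their own layer of Maybe-valued fields.
data PhoneEdits = PhoneEdits
  { editNumber :: Maybe String
  , editKind :: Maybe String
  }

data ContactEdits = ContactEdits
  { editName :: Maybe String
  , editPhone :: Maybe PhoneEdits
  }

-- Reads consult the edits first and fall back to the base value.
readName :: Contact -> ContactEdits -> String
readName base edits = fromMaybe (contactName base) (editName edits)

mergePhone :: Phone -> PhoneEdits -> Phone
mergePhone base edits = Phone
  { phoneNumber = fromMaybe (phoneNumber base) (editNumber edits)
  , phoneKind = fromMaybe (phoneKind base) (editKind edits)
  }

-- The "merge" step performed before sending over the wire.
mergeContact :: Contact -> ContactEdits -> Contact
mergeContact base edits = Contact
  { contactName = fromMaybe (contactName base) (editName edits)
  , contactPhone =
      maybe (contactPhone base) (mergePhone (contactPhone base)) (editPhone edits)
  }
```

One consequence of this shape is that a field cannot be reset to "absent" through the edit layer alone, since `Nothing` already means "no change".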