Cap'n Proto for Elm

prasanth somasundar

May 29, 2019, 5:00:49 AM

to capn...@googlegroups.com

Hey Everyone,

I'm thinking about building out a Cap'n Proto implementation in Elm for the fun of it. Thought I'd send an email to this list as suggested and get some feedback on the initial design which I've also linked below. Any thoughts, comments, or concerns are appreciated.

--Prasanth

Another link to the doc in case you missed it: https://docs.google.com/document/d/12qMVyQPOWTXviFKIpjKLXgusKZ95miuRmu9AxacyGOA/edit?usp=sharing

Ian Denhardt

May 29, 2019, 4:19:00 PM

to prasanth somasundar, capn...@googlegroups.com

Oops, forgot to cc the list.

Quoting Ian Denhardt (2019-05-29 16:13:32)
> Neat. Some feedback:
>
> * Re: why not the Haskell implementation, it's definitely stable enough
> and complete enough to write a compiler plugin. At the serialization
> layer there are only a couple things missing, and they shouldn't be a
> problem for this use case. I would say don't let the stability
> disclaimers in the README scare you away (note: I am the author of the
> Haskell implementation).
>
> You may even be able to cannibalize parts of the Haskell code
> generator itself; especially the first bits of the translation process
> would look very similar if I were writing an elm backend. Feel free to
> pick my brain about it if you try to do this.
>
> * Re: Single module vs. Splitting things up for namespacing, note that
> Elm does not allow cyclic dependencies between modules, so this
> probably just won't work, since you can have cycles within a single
> capnp schema.
>
> Also, from having had the same initial instinct with the Haskell
> implementation (where it is possible to break cycles with .hs-boot
> files), I will say that I found the massive pile of imports wasn't any
> better than Long_Names_With_Underscores -- especially since you still
> have to distinguish identifiers at call sites, so you don't even save
> much typing outside of the imports. From an implementer's perspective
> it was also much simpler to do it in one file.
>
> I ultimately ended up going with the one-module-per-capnp-file
> approach, and my usual advice re: long names is to tell people to just
> not use nested namespaces in their schema file. It's always possible
> to change a schema to avoid these in a wire-compatible way.
>
> * More generally, if you try to make *every* feature of the schema
> language map to Elm in an ergonomic way, you will be in for a rough
> time. I suggest optimizing for "well-written" schema, and making peace
> with the fact that certain constructs may generate unpleasant APIs.
> This applies especially if the user can easily just avoid those
> features.
>
> * Re: built-in types, note that in addition to run-time checks, you will
> also have to do some special logic for 64-bit arithmetic, since Elm
> uses JavaScript's numbers internally, which are double-precision
> floating point, and thus can only faithfully represent integers up to
> 53 bits.
>
> * Re: interfaces, producing a warning strikes me as probably unnecessary;
> if the library is clearly marked serialization-only, users will not
> expect it to pay attention to interfaces.
>
> * It's not entirely clear to me how you plan to implement the traversal
> limit with the API you've described.
>
> * Note that as of 0.19 Elm no longer allows single quotes in
> identifiers, so you'll need to do something else for union names.
> Fortunately, underscores are also illegal in capnp names, but legal in
> Elm.
>
> Hope this is helpful. I'll keep an eye on this; interested to see where
> it goes.
>
> -Ian

David Renshaw

May 29, 2019, 9:33:15 PM

to Ian Denhardt, prasanth somasundar, capnproto

> I suggest optimizing for "well-written" schema, and making peace
> with the fact that certain constructs may generate unpleasant APIs.

This has piqued my interest. Which parts of the schema language don't map well to Haskell/Elm?

- David

To unsubscribe from this group and stop receiving emails from it, send an email to capnproto+...@googlegroups.com.
Visit this group at https://groups.google.com/group/capnproto.
To view this discussion on the web visit https://groups.google.com/d/msgid/capnproto/155916089026.23375.1007569258138270396%40localhost.localdomain.

prasanth somasundar

May 29, 2019, 10:08:30 PM

to Ian Denhardt, capn...@googlegroups.com

Hey Ian,
Thanks for the feedback. This is really helpful.

* Haskell vs C++
I'll think on this one. I might want to just implement this in C++ for personal learning. I don't think there's a clearly optimal call here.

* Single vs Multi-module export.
This is really helpful. I hadn't thought deeply about cyclic messages. I'll almost certainly build a single-file export. One reason that I really like a single file per struct is that it respects how the language itself would organize this data. For a user of the language, there's less cognitive dissonance when using Capnp.Foo.Bar.getPerson than when using Capnp.foo_bar_getPerson. The former fits the model the language uses; the latter does not.

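That said, there's a somewhat complex way that I *could* resolve this. Arbitrarily pick one type in the cycle and make it polymorphic:

-- A/Base.elm: A's shape, with the B it contains left as a type parameter
module A.Base exposing (A_Base(..))

type A_Base b = A_Base b

-- B.elm: depends only on A.Base, so there is no import cycle
module B exposing (B(..))

import A.Base exposing (A_Base)

type B = B (A_Base B)

-- A.elm: ties the knot
module A exposing (A)

import A.Base exposing (A_Base)
import B exposing (B)

type alias A = A_Base B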

I doubt it's truly worth the complexity, but it is possible.

* Not supporting every feature/Optimize for well written schema
Very interesting point. I'm not generally good at making those sorts of concessions, but I probably should do that more often. I'll think about where I can relax some of the requirements to allow a better API. If you have a suggestion on any place in the API, let me know.

* 64-bit ints/Numeric library
Yup, I’m aware. It's not particularly fun code to write (I find it boring at least), but I think I'll have to build it either way.

* Warning for interfaces
I agree that it's probably not necessary, but given that I'll be checking pointer type anyway, I can log this with a simple `Debug.log` statement. I just don't feel comfortable swallowing information like that. Also, if someone doesn't want the message, they can just not send a capability over the network.

* Traversal Limit
So I forgot to add a field to the struct data type. It was missing a `traversalLimit` field. This field is set on a call to `Capnp.init`. Once it's set, we keep track of the traversal depth by incrementing the `currentTraversalDistance` field. If a call to `Capnp.get` or similar function ever encounters a situation where `currentTraversalDistance > traversalLimit`, we return `Err TraversalLimitExceeded`. If they succeed, the function returns an `Ok` with the `currentTraversalDistance` updated.

One thing that isn't very clear to me is whether this is a limit per call to `Capnp.get` or a limit per message (you can only traverse 64 MiB in depth before you give up). I've interpreted this as the latter for this implementation. The former is almost trivial to implement within the `get` itself.
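
A minimal Elm sketch of the per-message bookkeeping described above, assuming the limit is counted in bytes (as the 67108864 figure in this thread suggests). The module name `TraversalBudget` and the helpers `init` and `charge` are hypothetical; they are not taken from the design doc or from any published package.

```elm
module TraversalBudget exposing (Budget, Error(..), charge, init)

-- Hypothetical names throughout; this only illustrates the bookkeeping.


type alias Budget =
    { traversalLimit : Int -- in bytes, fixed when the message is opened
    , currentTraversalDistance : Int -- bytes charged so far, per message
    }


type Error
    = TraversalLimitExceeded


init : Int -> Budget
init limit =
    { traversalLimit = limit, currentTraversalDistance = 0 }


-- Charge `bytes` against the budget before following a pointer.
-- Returns the updated budget, or an error once the limit is passed.
charge : Int -> Budget -> Result Error Budget
charge bytes budget =
    let
        spent =
            budget.currentTraversalDistance + bytes
    in
    if spent > budget.traversalLimit then
        Err TraversalLimitExceeded

    else
        Ok { budget | currentTraversalDistance = spent }
```

With this, `init 64 |> charge 40 |> Result.andThen (charge 40)` evaluates to `Err TraversalLimitExceeded`, because the running total of 80 exceeds the limit of 64. Since the counter lives in the value that is threaded through each read, the limit is naturally per message rather than per call.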

* Underscores in names
Sounds good. I really wish I could find a BNF notation for these things...

Thanks again for the advice. I'll probably get hacking sometime this weekend and see where this goes.

-- Prasanth

Ian Denhardt

May 29, 2019, 11:38:51 PM

to David Renshaw, prasanth somasundar, capnproto

Quoting David Renshaw (2019-05-29 21:33:03)

> This has piqued my interest. Which parts of the schema language don't
> map well to Haskell/Elm?

The biggest one is nested namespaces, per discussion. Neither language
has intra-module namespaces, so you either end up doing a bunch of
complex logic to split stuff across multiple modules and still break
dependency cycles (in Haskell; per my earlier message, in Elm you're
just SOL, since mutually recursive modules are just not supported, full
stop), or you deal with long_names_with_underscores (Haskell actually
uses the single quote as a namespace separator). This is a problem
for the Go implementation as well; some of the stuff from sandstorm's
web-session.capnp spits out identifiers that are pushing 100 characters.
(I actually bumped into @glycerine at a meetup just the other day; we
talked about this among other things).

The fact that union field names are scoped to the struct is a bit
awkward, since union tag names are scoped at the module level in
most ML-family languages. More makeshift namespacing.

The lack of a clean separation between unions and structs introduces a
bit of an impedance mismatch as well; if you do things naively you end
up with an awkward situation where *every* sum type is wrapped in a
struct, which is a bit odd since they are used so liberally (and are
normally so lightweight) in these languages. The Haskell implementation
specifically looks for structs which are one big anonymous union so it
can omit the wrapper.

If you have an anonymous union you also need to invent a name for the
field, since you can't actually have "anonymous" fields in records.
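
To illustrate the last three points in Elm, here is a hedged sketch of how a struct that is one big anonymous union might be mapped. The `Shape` schema shown in the comment, the field name `which`, and the `Type_tag` constructor naming are all assumptions made for the example, not the output of any existing code generator.

```elm
module Shape exposing (Shape(..), ShapeNaive, ShapeWhich(..), area)

-- Hypothetical schema, for illustration only:
--
--   struct Shape {
--     union {
--       circle @0 :Float64;   # radius
--       square @1 :Float64;   # side length
--     }
--   }


-- Naive mapping: the struct becomes a record, and the anonymous union
-- needs an invented field name (`which` here).
type alias ShapeNaive =
    { which : ShapeWhich }


type ShapeWhich
    = Which_circle Float
    | Which_square Float


-- Collapsed mapping: the struct is one big anonymous union, so the
-- record wrapper is dropped. Constructor names carry a type prefix
-- because Elm scopes them to the module, not to the type.
type Shape
    = Shape_circle Float
    | Shape_square Float


area : Shape -> Float
area shape =
    case shape of
        Shape_circle r ->
            pi * r * r

        Shape_square side ->
            side * side
```

The collapsed form reads like an ordinary Elm custom type at the call site, while the naive form forces every pattern match to go through `.which` first.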

For Haskell, there's no way to talk about a record type without giving
it a name, so every group needs an auxiliary type defined. There's not
really anything clearly nicer to do than just name it <Type>'<field> or
such, which makes the long name problem worse. Along similar lines, in
Haskell you end up having to define auxiliary types for parameter and
return types, and without more of a hint they end up being things like
<Type>'<method>'params and <Type>'<method>'results -- a mouthful even
for short type names. I've taken to just always manually giving my
parameter and return arguments names to avoid this kind of compiler
output; the schema is much more verbose, but the call site is much
nicer. None of this section applies to Elm since you can just have
anonymous record types.

I intentionally decided to just not support custom default values for
pointer fields; it gets really awkward because messages can be mutable
or immutable, and you end up needing different implementation strategies
for each type; for immutable messages you can't do what most
implementations do (copy the value in place on first access), but you
could "follow" the pointer into some constant defined in the generated
code without a copy. But that gets weird because there are functions to
access the underlying message/segment, so you could run into situations
where you've jumped to a whole other message silently. With mutable
messages you can do the normal thing, but writing code that's generic
over both of these gets really weird. At some point I ended up checking
the schemas that ship with capnproto, and with sandstorm, and discovered
that, in >9000 lines of schema source, the feature was used exactly
twice, both to set the default value of a text parameter to the empty
string. So I just said "screw it, this is a waste of time." The plugin
just prints a warning to stderr and ignores the custom default.

I actually have a much longer critique that I think would be worth
writing, including some things that aren't a problem for Haskell
specifically, but cause problems for other languages -- and I am being
bothered to go help with dinner, so I'll leave it at this for now.

-Ian

Ian Denhardt

May 30, 2019, 2:27:09 AM

to prasanth somasundar, capn...@googlegroups.com

Quoting prasanth somasundar (2019-05-29 22:08:27)

> Thanks for the feedback. This is really helpful.

You're welcome.

> * Not supporting every feature/Optimize for well written schema
> Very interesting point. I'm not generally good at making those sorts of concessions, but I probably should do that more often. I'll think about where I can relax some of the requirements to allow a better API. If you have a suggestion on any place in the API, let me know.

My advice: pin down some use cases that you care about. Once you have
actual requirements, those can help drive the design.

One bit of food for thought: you can't exactly mmap() in elm, and even
to get from bytes to `Array Int` you have to do some non-trivial
unmarshalling. It may make as much sense to just bite the bullet and
parse the whole thing (deeply) into an idiomatic data type up front,
just like protobuf implementations do; by the time you take into account
all of Elm's limitations, it's not clear to me how much keeping it in
the wire format, like the C++ implementation does, actually buys you.
Doing an up-front parse solves a lot of things, so if you go another
route, be clear on why.
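
As a first step of such an up-front parse, here is a hedged sketch of decoding the segment table with the `elm/bytes` package. It assumes the standard Cap'n Proto stream framing (a little-endian segment count minus one, one size per segment, then padding to an 8-byte boundary); the names `Framing`, `SegmentTable`, `sizes`, `sizesStep` and `skipPadding` are illustrative only.

```elm
module Framing exposing (SegmentTable, segmentTable)

import Bytes exposing (Endianness(..))
import Bytes.Decode as Decode exposing (Decoder, Step(..))


-- Sizes (in 8-byte words) of each segment in a framed message.
type alias SegmentTable =
    List Int


-- The first uint32 holds (segment count - 1).
segmentTable : Decoder SegmentTable
segmentTable =
    Decode.unsignedInt32 LE
        |> Decode.andThen (\countMinusOne -> sizes (countMinusOne + 1))


-- Read one uint32 size per segment, then skip any padding.
sizes : Int -> Decoder SegmentTable
sizes count =
    Decode.loop ( count, [] ) sizesStep
        |> Decode.andThen (skipPadding count)


sizesStep : ( Int, List Int ) -> Decoder (Step ( Int, List Int ) (List Int))
sizesStep ( remaining, acc ) =
    if remaining <= 0 then
        Decode.succeed (Done (List.reverse acc))

    else
        Decode.map (\size -> Loop ( remaining - 1, size :: acc )) (Decode.unsignedInt32 LE)


-- The count word plus `count` size words ends off an 8-byte boundary
-- exactly when `count` is even, in which case 4 padding bytes follow.
skipPadding : Int -> SegmentTable -> Decoder SegmentTable
skipPadding count table =
    if modBy 2 count == 0 then
        Decode.map (\_ -> table) (Decode.unsignedInt32 LE)

    else
        Decode.succeed table
```

Running `Decode.decode segmentTable` on the raw bytes yields `Nothing` if the input ends before the table does. A full eager parser would continue from here by decoding each segment's words and following pointers into an ordinary Elm record.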

> * 64-bit ints/Numeric library
> Yup, I’m aware. It's not particularly fun code to write (I find it boring at least), but I think I'll have to build it either way.

It would make sense to publish this as a package by itself; it's a nice
conceptual unit that would be useful as a library for other projects.
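
A rough sketch of what the core of such a package could look like, assuming a representation as two unsigned 32-bit halves so that each half stays well inside the 53 bits a JavaScript number can represent exactly. The module name `Int64Sketch` and its functions are hypothetical; only `Bitwise.shiftRightZfBy` comes from `elm/core`.

```elm
module Int64Sketch exposing (Int64, add, fromInt32s)

import Bitwise


-- A 64-bit value as two unsigned 32-bit halves. Whether the value is
-- read as signed or unsigned is a matter of interpreting `hi`; wrapping
-- addition is the same either way.
type alias Int64 =
    { hi : Int, lo : Int }


-- Normalise an Int to the unsigned 32-bit range [0, 2^32).
toUint32 : Int -> Int
toUint32 n =
    Bitwise.shiftRightZfBy 0 n


fromInt32s : Int -> Int -> Int64
fromInt32s hi lo =
    { hi = toUint32 hi, lo = toUint32 lo }


-- Wrapping addition modulo 2^64.
add : Int64 -> Int64 -> Int64
add a b =
    let
        -- At most about 2^33, so still exact as a double.
        loSum =
            a.lo + b.lo

        carry =
            if loSum >= 4294967296 then
                1

            else
                0
    in
    { hi = toUint32 (a.hi + b.hi + carry)
    , lo = toUint32 loSum
    }
```

For example, `add (fromInt32s 0 4294967295) (fromInt32s 0 1)` gives `{ hi = 1, lo = 0 }`: the low half overflows and the carry moves into the high half. Multiplication, comparison and conversion to and from decimal strings would need similar care.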

> * Traversal Limit
> So I forgot to add a field to the struct data type. It was missing a `traversalLimit` field. This field is set on a call to `Capnp.init`. Once it's set, we keep track of the traversal depth by incrementing the `currentTraversalDistance` field. If a call to `Capnp.get` or similar function ever encounters a situation where `currentTraversalDistance > traversalLimit`, we return `Err TraversalLimitExceeded`. If they succeed, the function returns an `Ok` with the `currentTraversalDistance` updated.
>
> One thing that isn't very clear to me is whether this is a limit per call to `Capnp.get` or a limit per message (you can only traverse 64 Mib in depth before you give up). I've interpreted this as the latter for this implementation. The former is almost trivial to implement within the `get` itself.

Most implementations track this per message. It's also not just about
limiting depth, but also about amplification attacks; consider:

struct Tree {
  union {
    leaf @0 :Int32;
    branch :group {
      left @1 :Tree;
      right @2 :Tree;
    }
  }
}

It's possible to encode a tree using the above where parts of the
structure are shared. Even if it's finite and has reasonable depth,
traversing it could still take exponential time in the size of the
message, if enough nodes are shared.
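
The blow-up can be seen with a small Elm model of the `Tree` schema. This is only an illustration of the counting argument: the `Tree` type below is a hand-written stand-in for generated code, and `sharedTree` and `visit` are hypothetical helpers.

```elm
module Amplification exposing (Tree(..), sharedTree, visit)

-- Hand-written Elm mirror of the `Tree` schema (not generated code).


type Tree
    = Leaf Int
    | Branch Tree Tree


-- Build a tree of the given depth whose left and right children are the
-- same value. On the wire this corresponds to both pointers targeting
-- one object, so the encoded size grows linearly with depth.
sharedTree : Int -> Tree
sharedTree depth =
    if depth <= 0 then
        Leaf 0

    else
        let
            child =
                sharedTree (depth - 1)
        in
        Branch child child


-- Count the nodes a naive reader visits: 2^(depth + 1) - 1.
visit : Tree -> Int
visit tree =
    case tree of
        Leaf _ ->
            1

        Branch left right ->
            1 + visit left + visit right
```

`visit (sharedTree 3)` is 15, and `visit (sharedTree 20)` is 2097151, even though only 21 distinct objects were ever built. A per-message traversal counter catches this, whereas a depth limit alone does not.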

This was a somewhat awkward thing to cover with the Haskell
implementation; what I ended up doing amounts to a glorified state
monad:

https://hackage.haskell.org/package/capnp-0.4.0.0/docs/Capnp-TraversalLimit.html

But I don't think this would translate especially easily to elm, at
least not in a way that was at all ergonomic.

-Ian

David Renshaw

May 30, 2019, 8:30:06 AM

to Ian Denhardt, prasanth somasundar, capnproto

Thanks! I wrote some comments inline below.

On Wed, May 29, 2019 at 11:38 PM Ian Denhardt <i...@zenhack.net> wrote:

> Quoting David Renshaw (2019-05-29 21:33:03)
>
> > This has piqued my interest. Which parts of the schema language don't
> > map well to Haskell/Elm?
>
> The biggest one is nested namespaces, per discussion. Neither language
> has intra-module namespaces, so you either end up doing a bunch of
> complex logic to split stuff across multiple modules and still break
> dependency cycles (in Haskell; per my earlier message, in Elm you're
> just SOL, since mutually recursive modules are just not supported, full
> stop), or you deal with long_names_with_underscores (Haskell actually
> uses the single quote as a namespace separator). This is a problem
> for the Go implementation as well; some of the stuff from sandstorm's
> web-session.capnp spits out identifiers that are pushing 100 characters.
> (I actually bumped into @glycerine at a meetup just the other day; we
> talked about this among other things).

That's unfortunate.

> The fact that union field names are scoped to the struct is a bit
> awkward, since union tag names are scoped at the module level in
> most ML-family languages. More makeshift namespacing.

Sounds like this is awkward mainly because of the previous problem, i.e. Haskell lacks nested namespaces. With nested namespaces, you would define your union datatype within the namespace of the enclosing struct, and the tag names would have exactly the right namespace.

> The lack of a clean separation between unions and structs introduces a
> bit of an impedance mismatch as well; if you do things naively you end
> up with an awkward situation where *every* sum type is wrapped in a
> struct, which is a bit odd since they are used so liberally (and are
> normally so lightweight) in these languages. The Haskell implementation
> specifically looks for structs which are one big anonymous union so it
> can omit the wrapper.
>
> If you have an anonymous union you also need to invent a name for the
> field, since you can't actually have "anonymous" fields in records.

`which` is the usual name for such a field, as in: https://github.com/capnproto/capnproto/blob/0f368d5781872ffc3e63db54b0ac4a138b0e0a05/c%2B%2B/src/capnp/encoding-test.c%2B%2B#L121

> For Haskell, there's no way to talk about a record type without giving
> it a name, so every group needs an auxiliary type defined. There's not
> really anything clearly nicer to do than just name it <Type>'<field> or
> such, which makes the long name problem worse. Along similar lines, in
> Haskell you end up having to define auxiliary types for parameter and
> return types, and without more of a hint they end up being things like
> <Type>'<method>'params and <Type>'<method>'results -- a mouthful even
> for short type names. I've taken to just always manually giving my
> parameter and return arguments names to avoid this kind of compiler
> output; the schema is much more verbose, but the call site is much
> nicer. None of this section applies to Elm since you can just have
> anonymous record types.

Again, sounds like this is awkward mainly as a consequence of Haskell's lack of nested namespaces.

> I intentionally decided to just not support custom default values for
> pointer fields; it gets really awkward because messages can be mutable
> or immutable, and you end up needing different implementation strategies
> for each type; for immutable messages you can't do what most
> implementations do (copy the value in place on first access),

Copying into an immutable message would mean mutating it, so I agree that's not a good way to go.

> but you
> could "follow" the pointer into some constant defined in the generated
> code without a copy. But that gets weird because there are functions to
> access the underlying message/segment, so you could run into situations
> where you've jumped to a whole other message silently.

Are there reasons that client code needs to use these functions? If not, is there a way for you to hide them or mark them as internal-use-only?

> With mutable
> messages you can do the normal thing, but writing code that's generic
> over both of these gets really weird. At some point I ended up checking
> the schemas that ship with capnproto, and with sandstorm, and discovered
> that, in >9000 lines of schema source, the feature was used exactly
> twice, both to set the default value of a text parameter to the empty
> string. So I just said "screw it, this is a waste of time." The plugin
> just prints a warning to stderr and ignores the custom default.

For what it's worth, I actually had someone request this feature last month: https://github.com/capnproto/capnproto-rust/issues/127

I'm not sure what their use case is, though.

> I actually have a much longer critique that I think would be worth
> writing, including some things that aren't a problem for Haskell
> specifically, but cause problems for other languages -- and I am being
> bothered to go help with dinner, so I'll leave it at this for now.

I'd be eager to read the longer critique!

> -Ian


prasanth somasundar

May 30, 2019, 1:50:31 PM

to David Renshaw, Ian Denhardt, capnproto

> One bit of food for thought: you can't exactly mmap() in elm, and even to get from bytes to `Array Int` you have to do some non-trivial unmarshalling. It may make as much sense to just bite the bullet and parse the whole thing (deeply) into an idiomatic data type up front, just like protobuf implementations do; by the time you take into account all of Elm's limitations, it's not clear to me how much keeping it in the wire format, like the C++ implementation does, actually buys you. Doing an up-front parse solves a lot of things, so if you go another route, be clear on why.

This is intended to be a prototype. `Array Int` is an awful data type for this use case, but it works well enough for prototyping. Specifically, for the context of others reading this thread, `Array` in Elm is a tree structure and will not provide reasonable performance. That said, I’m hoping that I can use `elm-bytes` in a better way than being forced to decode it into a full Elm data type – though as I say this, it seems that I’d need some buy-in that I can’t be guaranteed. I may end up with that solution in the long run, but I want to implement one with the double array to start and see if I can convince a few people.

Still, it’s not clear to me why you’d use Cap’n Proto if you’re going to do a full serialization/deserialization. Just use Protobufs at that point. You could argue that having this exist for completeness is valuable, i.e. you can run capnp on your backend and not be forced to translate into a protobuf on your frontend, but at that point I’m not sure that this is a good enough reason to write a library like this. Additionally, it’s not like JavaScript doesn’t have more complex capabilities like Uint8Array that Elm could take advantage of.

That said, I’m more or less treating this as immutable data and providing ways of reducing the cost of updates (such as batching updates). Haskell at least has the ST monad for performance. There just isn’t a better way of doing this in Elm as far as I know.

> It would make sense to publish this as a package by itself; it's a nice
> conceptual unit that would be useful as a library for other projects.

Sure, I was thinking the same thing. Just thought that I’d focus on the Capnproto implementation before publishing. It’s fairly separate though, so I’m not worried about separating it out once I’m ready.

> This was a somewhat awkward thing to cover with the Haskell
> implementation; what I ended up doing amounts to a glorified state
> monad:

So the `Struct` type is a glorified state monad. `fields` holds the record that acts as the struct’s definition. I’ve attached an example below that shows how I think this should work. Let me know if that makes sense and feels reasonably ergonomic.

Regarding namespacing in the parallel conversation: I think it’s kind of awful that Haskell records are accessed via functions instead of some scoped operator or the like. Not really useful as a comment, but I thought I’d add my displeasure.

Pointer field defaults: Field defaults in general are not features I feel super great about. Not that I’ve thought about this in horribly great depth, but they seem to be very problematic if they are ever updated – your binaries will read the same bytes as two different structs. I always assumed that’s why they were removed from proto3. They also don’t seem *that* useful as you can handle this on the application layer sufficiently well. I’m curious if others think differently and feel strongly about their inclusion.

getMainPhone : Struct AddressBook -> Struct PhoneNumber
getMainPhone s =
    s
        |> Capnp.get .people
        |> Capnp.List.get 0 AddressBook.person
        |> Capnp.get .mainPhone AddressBook.person_phoneNumber

-- Assume d : Data exists. This is an `Array (Array Int)`.
-- Inputs:
--
-- Struct
--    { data = d
--    , fields =
--      -- Field AddressBook (Capnp.List.List (StructField Person))
--      { people = ...
--      }
--    , viewOffset = (0, 0)
--    , currentTraversalDistance = 0
--    , traversalLimit = 67108864
--    }
--
-- Outputs:
--
-- Struct
--    { -- Data has not been updated. Hopefully, d is not actually copied,
--      -- and is simply a pointer, but I’m not sure how this works exactly.
--      -- If I have to, I can always separate d from the struct definition.
--      data = d
--    , fields =
--      -- Fields have been updated to a PhoneNumber
--      { number = ...
--      , type = ...
--      }
--    , -- View Offset represents the index into the data above.
--      -- Updated as necessary. We assume that the new offset is 40 here.
--      viewOffset = (0, 40)
--    , -- Data traversed so far. Assume that we've only traversed 30 bytes
--      -- for whatever reason.
--      currentTraversalDistance = 30
--    , traversalLimit = 67108864
--    }


Kenton Varda

May 30, 2019, 2:23:36 PM

to Ian Denhardt, David Renshaw, prasanth somasundar, capnproto

On Wed, May 29, 2019 at 8:38 PM Ian Denhardt <i...@zenhack.net> wrote:

> The biggest one is nested namespaces, per discussion.

If it's any consolation, even though C++ has nested namespaces, the code doesn't end up any less verbose. When you're using the declarations from a capnp file in C++ code, you either need to write out the declaration's full path wherever it is used, or you need to declare a shorter alias and use that. That is to say, I don't think there's much practical difference between "Foo::Bar::Baz" vs. "Foo_Bar_Baz" or "Foo'Bar'Baz" -- in fact, the latter two are technically shorter.

> The fact that union field names are scoped to the struct is a bit
> awkward, since union tag names are scoped at the module level in
> most ML-family languages. More makeshift namespacing.

Actual line of code from Cloudflare Workers:

case PipelineDef::Stage::Worker::Global::Value::JSON:

So again, I'm not sure this is a problem specific to certain languages. :)

> The lack of a clean separation between unions and structs introduces a
> bit of an impedance mismatch as well; if you do things naively you end
> up with an awkward situation where *every* sum type is wrapped in a
> struct, which is a bit odd since they are used so liberally (and are
> normally so lightweight) in these languages. The Haskell implementation
> specifically looks for structs which are one big anonymous union so it
> can omit the wrapper.

Hmm. It's unfortunate that this means that if someone adds a non-union field to a struct that previously contained only a union, Haskell code using the protocol will break and probably need major rewrites.

But I do see why you'd want to use the language's built-in variant types if at all possible.

FWIW I feel this is a fundamental flaw in variant types as seen in most functional languages: if you ever discover there's some field that is needed by *all* the variants, you can't add it without completely changing the type and updating every use site.

More generally, I feel like Haskell made many design choices that make code concise and beautiful but difficult to change and evolve. :/

> For Haskell, there's no way to talk about a record type without giving
> it a name, so every group needs an auxiliary type defined.

FWIW C++ does this too. The type name is formed by capitalizing the first letter of the field name.

> I intentionally decided to just not support custom default values for
> pointer fields;

I admit that, in practice, I've never had a serious need for default values for pointer fields. They probably were not worth the implementation complexity.

And, admittedly, I knew that they were unlikely to be used much when I designed the language.

My main motivation was just that it feels inconsistent to allow defaults only for primitives but not for pointers. And default values for primitives are used all the time.

But perhaps I should have been more practical here.

> I actually have a much longer critique that I think would be worth
> writing, including some things that aren't a problem for Haskell
> specifically, but cause problems for other languages -- and I am being
> bothered to go help with dinner, so I'll leave it at this for now.

Looking forward to more feedback.

-Kenton

Kenton Varda

May 30, 2019, 2:37:45 PM

to Ian Denhardt, prasanth somasundar, capn...@googlegroups.com

On Wed, May 29, 2019 at 11:27 PM Ian Denhardt <i...@zenhack.net> wrote:

> One bit of food for thought: you can't exactly mmap() in elm,

That's not as true as you might think.

I'm not familiar with Elm specifically, but as I understand it, it transpiles to JavaScript for execution in the browser.

JavaScript has ArrayBuffer, and it has the File and Blob APIs, which do in fact allow you to mmap a file as an ArrayBuffer, once you've been granted a capability to the file e.g. via the file-open dialog. Admittedly, that's still probably an obscure use case.

An ArrayBuffer can also point into the memory space of a WebAssembly module, and Cap'n Proto may in fact be an excellent way to share complex structures between WASM and JavaScript (or Elm).

With all that said, it certainly makes sense to consider eager parsing as a trade-off. I've been meaning to add an eagerly-parsed mode to C++ ("POCS" support), since it could allow for an easier-to-use API in cases where zero-copy is not important.

-Kenton

Ian Denhardt

May 30, 2019, 2:44:06 PM

to David Renshaw, prasanth somasundar, capnproto

Quoting prasanth somasundar (2019-05-30 13:50:26)

> I'm hoping that I can use `elm-bytes` in a better way than being
> forced to decode it into a full Elm data type - though as I say this,
> it seems that I'd need some buy-in that I can't be guaranteed. I may
> end up with that solution in the long run, but I want to implement one
> with the double array to start and see if I can convince a few people.

I would suggest prodding Evan about this sooner rather than later. He's
definitely not going to add random access support without some
motivating use cases. Also probably useful for planning purposes to know
how receptive he is to the idea.

Arrays are definitely workable for prototyping, but if you switch over
to a flat buffer representation in the future your current design for
updates will start being O(length of segment) instead of O(log(length of
segment)), so that's something to keep in mind.

One possible design, which I think I'd do for the Haskell implementation
if I were to start from scratch: don't support in-place modifications.
Have encode and decode. This way your read support can do the obvious
thing with buffers, and your write support can be an 'Encoder' type
which is a wrapper around Bytes.Encoder + some metadata about addresses.
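One way to picture this design, as a hedged sketch in Python: `Encoder`, `write_words`, and `decode_word` are hypothetical names, and the "metadata about addresses" is reduced to a running word offset. Nothing is modified in place; writes only append, and reads go straight to the finished buffer.

```python
import struct


class Encoder:
    """Append-only message builder that remembers where each value landed."""

    def __init__(self):
        self._chunks = []     # encoded pieces, in write order
        self._next_word = 0   # address (in 8-byte words) of the next write

    def write_words(self, *values):
        """Append 64-bit words and return the word address of the first one."""
        address = self._next_word
        self._chunks.append(struct.pack("<%dQ" % len(values), *values))
        self._next_word += len(values)
        return address

    def to_bytes(self):
        """Concatenate everything written so far into one flat buffer."""
        return b"".join(self._chunks)


def decode_word(data, address):
    """Read the 64-bit word stored at the given word address."""
    return struct.unpack_from("<Q", data, address * 8)[0]


enc = Encoder()
first = enc.write_words(1, 2)   # occupies words 0 and 1
second = enc.write_words(7)     # occupies word 2
message = enc.to_bytes()
```

Because the encoder hands back addresses as it goes, a later write can refer to an earlier one (for example, to emit a pointer) without ever revisiting bytes that were already written.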

> Still, it's not clear to me why you'd use Cap'n Proto if you're going
> to do a full serialization/deserialization. Just use Protobufs at that
> point. You could argue that this existing for completeness is valuable
> i.e. you can run capnp on your backend and not be forced to translate
> into a protobuf on your frontend, but at that point I'm not sure that
> this is a good enough reason to write a library like this.

Fair enough. I'd say for RPC (which you've said you're not shooting for,
so maybe moot in this case) or if you've got some existing system you
want to talk to; it would be neat to have this for writing sandstorm
apps.

> Additionally, it's not like JavaScript doesn't have more complex
> capabilities like Uint8Array that Elm could take advantage of.

Sure, but again poke Evan sooner rather than later.

> Regarding namespacing in the parallel conversation: I think it's kind
> of awful that Haskell records are accessed via functions instead of
> some scoped operator or the like. Not really useful as a comment, but I
> thought I'd add my displeasure.

+1. The Haskell implementation expects folks to use the
DuplicateRecordFields extension, which makes this a bit nicer, but yes,
records in Haskell are garbage.

> Pointer field defaults: Field defaults in general are not features I
> feel super great about. Not that I've thought about this in horribly
> great depth, but they seem to be very problematic if they are ever
> updated - your binaries will read the same bytes as two different
> structs. I always assumed that's why they were removed from proto3.
> They also don't seem *that* useful as you can handle this on the
> application layer sufficiently well. I'm curious if others think
> differently and feel strongly about their inclusion.

I definitely think they don't carry their weight in terms of
implementation cost, especially given how rarely I've seen them used in
the wild. I'll probably write that larger critique in the next week or
two; will link it to this mailing list when I do.

>
>
> getMainPhone : Struct AddressBook -> Struct PhoneNumber
> getMainPhone s =
>     let s : Struct AddressBook
>     in s
>         |> Capnp.get .people
>         |> Capnp.List.get 0 AddressBook.person
>         |> Capnp.get .mainPhone AddressBook.person_phoneNumber
>
> -- assume d : Data exists. This is an `Array (Array Int)`
> -- Inputs:
>
> -- Struct
> -- { data = d
> -- , fields =
> -- -- Field AddressBook (Capnp.List.List (StructField Person))
> -- { people = ...
> -- }
> -- , viewOffset = (0, 0)
> -- , currentTraversalDistance = 0
> -- , traversalLimit = 67108864
> -- }

The salient difference in behavior here is that the traversal limit is
part of the struct, rather than the message, so if you have a branching
structure (like a tree), it can't really protect you, since it isn't
shared across branches.

I don't see an ergonomic way to address this.
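A toy model of the difference, sketched in Python under loud assumptions: every struct visit costs one unit (the real limit is counted in words), and `Node`, `count_per_struct`, and `count_shared` are hypothetical names. When the counter travels with each struct, sibling branches each start from the parent's value, so only the depth is bounded; a counter shared by the whole message bounds the total work.

```python
class LimitExceeded(Exception):
    pass


class Node:
    def __init__(self, children=()):
        self.children = list(children)


def full_tree(depth, fanout=2):
    """Build a complete tree with `depth` levels below the root."""
    if depth == 0:
        return Node()
    return Node(full_tree(depth - 1, fanout) for _ in range(fanout))


def count_per_struct(node, distance, limit):
    """Counter is carried by value: each branch gets its own copy."""
    distance += 1
    if distance > limit:
        raise LimitExceeded()
    return 1 + sum(count_per_struct(c, distance, limit) for c in node.children)


def count_shared(node, counter, limit):
    """Counter is one mutable cell shared by every read of the message."""
    counter[0] += 1
    if counter[0] > limit:
        raise LimitExceeded()
    for child in node.children:
        count_shared(child, counter, limit)
    return counter[0]


tree = full_tree(4)                          # 31 nodes, 5 levels deep
visited = count_per_struct(tree, 0, 8)       # never trips: depth 5 <= 8

try:
    count_shared(tree, [0], 8)
    shared_tripped = False
except LimitExceeded:
    shared_tripped = True
```

With a limit of 8, the per-struct version happily visits all 31 nodes, while the shared counter stops at the ninth visit. A cyclic pointer is still caught in both models, since a cycle only ever increases depth.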

>
> -- Outputs:
> -- Struct
> -- { -- Data has not been updated. Hopefully, d is not actually copied,
> -- -- and is simply a pointer, but I'm not sure how this works exactly.
> -- -- If I have to, I can always separate d from the struct definition.

It should be a pointer; this is fine.

Kenton Varda

May 30, 2019, 2:59:22 PM5/30/19

to prasanth somasundar, David Renshaw, Ian Denhardt, capnproto

On Thu, May 30, 2019 at 10:50 AM prasanth somasundar <Mez...@live.com> wrote:

> Pointer field defaults: Field defaults in general are not features I feel super great about. Not that I've thought about this in horribly great depth, but they seem to be very problematic if they are ever updated - your binaries will read the same bytes as two different structs. I always assumed that's why they were removed from proto3. They also don't seem *that* useful as you can handle this on the application layer sufficiently well. I'm curious if others think differently and feel strongly about their inclusion.

Default values for pointers are almost never needed, mainly because it usually makes more sense for the app to check for null and implement a fallback explicitly, rather than try to construct a default that represents the right fallback.

For primitive fields, I use defaults all the time. The purpose of a default is to express how to handle messages from senders who are unaware of the field, e.g. because they were written before the field existed. I find quite often that when I introduce, say, a new boolean field, I set the default to `true`, because it controls a thing that was historically "on" but should now be possible to turn "off". (You could alternatively phrase the field name as a negative, like "disableFoo", but that leads to code containing double-negatives which are really hard to read.)
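For context, Cap'n Proto stores a primitive field XORed with its schema default, so an all-zero field on the wire reads back as the default. The Python sketch below (function names are hypothetical, and integers stand in for the raw field bits) shows why a sender that predates a new boolean yields `true`, and also why changing a default later changes what the same bytes mean, which is the concern raised in the quoted paragraph.

```python
def encode_field(value, default):
    """Store the value XORed with the schema default."""
    return value ^ default


def decode_field(wire, default):
    """Recover the value by XORing the wire bits with the default again."""
    return wire ^ default


# A sender that predates the field leaves its bits zeroed.
old_sender_wire = 0
new_bool = bool(decode_field(old_sender_wire, default=1))

# Round trip with an unchanged default.
wire = encode_field(7, default=5)
same_default = decode_field(wire, default=5)

# Same bytes, but the reader was built against a schema whose default changed.
changed_default = decode_field(wire, default=4)
```

Here `new_bool` comes out true without the old sender doing anything, and `changed_default` differs from the value that was written, which is why defaults are normally treated as frozen once a field has shipped.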

Proto3's motivation for removing defaults, as I understand it, is that the designers of Go very much wanted for Protobufs to be represented as raw structs. Go does not have a concept of constructors for raw structs; they are simply zero-initialized. Instead of improving their language, they asked for changes to Protobuf. This was also the motivation to get rid of "unknown field retention", which stores unknown fields seen on the wire off to the side so that if the message is serialized again, they can be included. And it was also the motivation to get rid of the ability to distinguish "set to default value" from "unset".

I've been told that despite all this effort, it was eventually concluded that raw Go structs still were not a good representation for protobufs...

-Kenton

Ian Denhardt

May 30, 2019, 6:40:56 PM5/30/19

to 'Kenton Varda' via Cap'n Proto, Kenton Varda, David Renshaw, prasanth somasundar, capnproto

Quoting 'Kenton Varda' via Cap'n Proto (2019-05-30 14:22:59)

> Actual line of code from Cloudflare Workers:
> case PipelineDef::Stage::Worker::Global::Value::JSON:
> So again, I'm not sure this is a problem specific to certain languages.
> :)

Oof, point taken. Slightly off topic for this list, but how would you
feel about accepting (wire compatible) patches to the sandstorm schema
to flatten the namespace? Here's another doozy (from the output of the
go code generator):

websession.WebSession_WebSocketStream_sendBytes_Results_Future

> Hmm. It's unfortunate that this means that if someone adds a non-union
> field to a struct that previously contained only a union, Haskell code
> using the protocol will break and probably need major rewrites.

I don't know about "major rewrites" -- the changes would be fairly
mechanical, and not outside of the realm of something you could write a
tool to mostly automate. You basically just end up having to wrap and
unwrap the one extra layer everywhere. It would definitely not be
backwards compatible at the source level, but source compatibility
doesn't work super well with capnproto in general.

But I think there's a similar ergonomics vs. extensibility trade-off
with the current situation, where it's not possible to turn a struct
into a union after the fact (See Yaron Minsky's recent mail). You could
tweak things so that there's always a tag field, and client code always
has to switch on it even if the schema doesn't define any other
variants. So you don't have to worry about the consequences of adding a
variant where there wasn't one, but you have to actually match on every
datatype, vs. not needing to worry about adding common fields but having
an extra .union_ or such to every use of a sum type.

I think this is basically the expression problem at work.

> But I do see why you'd want to use the language's built-in variant
> types if at all possible.

> FWIW I feel this is a fundamental flaw in variant types as seen in most
> functional languages: if you ever discover there's some field that is
> needed by *all* the variants, you can't add it without completely
> changing the type and updating every use site.

Yeah, this is always the tension with things like IDLs; it would be nice
if most programming languages had sums and products that distributed
over one another, but given that they don't, you have to contend with
the fact that if you add features like that to the schema language, they
won't map well (see also Go & default values). I tend to favor
optimizing for the call site, which in this case means the programming
language, rather than the schema language.

There are of course exceptions though; having unions is still a big boon
even in languages like Go that don't have proper sum types at all.

> My main motivation was just that it feels inconsistent to allow
> defaults only for primitives but not for pointers. And default values
> for primitives are used all the time.
> But perhaps I should have been more practical here.

There already are enough inconsistencies between how pointer vs.
non-pointer types behave that you have to keep the difference in mind;
given that, I don't think you gain much from this little bit of
extra consistency.

Re: "mmap()" in elm in another of your messages, this is why I suggested
that Mezuzza get in touch with Evan early; there's no reason why the
bytes package couldn't provide this functionality, but it currently
doesn't, and the javascript FFI isn't usable for this sort of thing, so
whether or not Evan seems inclined to add the needed support is really
important to where you end up taking the design.

Also, I'd actually love to see (and in the past toyed with the idea of
writing) an elm generator that spits out idiomatic data types and
encoders & decoders built on Elm's json package; this would be great for
talking to sandstorm's postMessage API. In that case you're not even
touching the capnp wire format.

-Ian

Prasanth Somasundar

May 31, 2019, 2:10:09 PM5/31/19

to Ian Denhardt, 'Kenton Varda' via Cap'n Proto, Kenton Varda, David Renshaw

> >    Actual line of code from Cloudflare Workers:
> >    case PipelineDef::Stage::Worker::Global::Value::JSON:
> >    So again, I'm not sure this is a problem specific to certain languages.
> >    :)
>
> Oof, point taken. Slightly off topic for this list, but how would you
> feel about accepting (wire compatible) patches to the sandstorm schema
> to flatten the namespace? Here's another doozy (from the output of the
> go code generator):
>
>     websession.WebSession_WebSocketStream_sendBytes_Results_Future

That's pretty bad in both cases. However, in C++, you get the benefit of using declarations on namespaces and classes, so if you wished, you could do something like:

using Global = Pipeline::Stage::Worker::Global;

case Global::Value::JSON:

I can say from experience writing quite a bit of C++ code that consumed protobufs that this was the only thing that made it look sane. Unfortunately, if we use function name based namespacing (if you want to call it that), we cannot get this benefit easily as you'd have to rename all the functions consistently and on an ad-hoc basis.

With Elm and Haskell, I believe the only layer of namespacing that's provided is the module name which is why both Ian and I gravitated to it initially.

Upon further reflection, I think this might even be important enough that I toy with single-module-per-struct for a prototype.

> Arrays are definitely workable for prototyping, but if you switch over
> to a flat buffer representation in the future your current design for
> updates will start being O(length of segment) instead of O(log(length of
> segment)), so that's something to keep in mind.

Fair enough.

> One possible design, which I think I'd do for the Haskell implementation
> if I were to start from scratch: don't support in-place modifications.
> Have encode and decode. This way your read support can do the obvious
> thing with buffers, and your write support can be an 'Encoder' type
> which is a wrapper around Bytes.Encoder + some metadata about addresses.

I was and still am not really sure how to handle writes properly. It doesn't feel super good. I don't think that Elm has any support for mutable state at the moment and I'm not sure what that would look like. The good news for Elm at least is that it runs on an event loop and you'd be able to atomize all changes using a good API surrounding `Cmd` if you're ok with async writes. The problem currently is that you have three options, and none of them are good:

* Have capnp live in JavaScript land for a good write API, but a terrible read API, as your reads become async in Elm.
* Have capnp live in Elm for a good read API, but terrible writes in terms of performance, complexity, or API.
* Have capnp live in both Elm and JavaScript and deal with synchronization and data duplication.

I guess I'll think further about what I want writes to look like and publish to the Elm discourse. Just for clarity, I did post a quick thread mentioning that I was looking into this, but there was less traction initially - probably for good reason - but I'll post the design there once I'm satisfied with the API. If Evan doesn't respond, then I'll try to prod some other way.

> >    Still, it's not clear to me why you'd use Cap'n Proto if you're going
> >    to do a full serialization/deserialization. Just use Protobufs at that
> >    point. You could argue that this existing for completeness is valuable
> >    i.e. you can run capnp on your backend and not be forced to translate
> >    into a protobuf on your frontend, but at that point I'm not sure that
> >    this is a good enough reason to write a library like this.
>
> Fair enough. I'd say for RPC (which you've said you're not shooting for,
> so maybe moot in this case) or if you've got some existing system you
> want to talk to; it would be neat to have this for writing sandstorm
> apps.

To be clear, this initial design was just something to get started. I'd like to implement RPC down the line, it's just that I don't want to start by thinking about it or making it the goal.

> The salient difference in behavior here is that the traversal limit is
> part of the struct, rather than the message, so if you have a branching
> structure (like a tree), it can't really protect you, since it isn't
> shared across branches.
>
> I don't see an ergonomic way to address this.

That's interesting. So it's clearly different behavior from what the other libraries are using, but unless I'm missing something (which is quite possible), I believe it should still guard against an attack. If the case we're guarding against is a cyclic pointer and therefore infinite recursion, and the client library has no means of creating a custom struct, then the counter is still going to guard against a poorly encoded/adversarial message.

That said, this can still be bad. In the worst case (a tree, as you mentioned, where the client decides to traverse every branch of the message), the depth is still bounded by the limit, but the total amount traversed could grow to roughly 2^limit. This can be mitigated by lowering the threshold from 64 MiB to 32 MiB or even 16 MiB. It might even make sense to have separate traversal and message size limits here to allow larger messages with limited traversal depth.

If someone with a better security background could vet that, I'd appreciate it.

> Proto3's motivation for removing defaults, as I understand it, is that the designers of Go very much wanted for Protobufs to be represented as raw structs. Go does not have a concept of constructors for raw structs; they are simply zero-initialized. Instead of improving their language, they asked for changes to Protobuf.

BWAHAHAHA 🤦‍♂️

> Re: "mmap()" in elm in another of your messages, this is why I suggested that Mezuzza get in touch with Evan early

Didn't realize my name wasn't appearing on google groups. Seems that I actually had this account in Google since 2010 and forgot about it. We can just use my real name.

--Prasanth

Kenton Varda

May 31, 2019, 3:07:36 PM5/31/19

to Prasanth Somasundar, Ian Denhardt, 'Kenton Varda' via Cap'n Proto, David Renshaw

On Fri, May 31, 2019 at 11:10 AM Prasanth Somasundar <mez...@live.com> wrote:

> With Elm and Haskell, I believe the only layer of namespacing that's provided is the module name which is why both Ian and I gravitated to it initially.

Are you saying Elm and Haskell don't provide any way to declare aliases? That seems surprising to me.

-Kenton

Kenton Varda

May 31, 2019, 3:10:50 PM5/31/19

to Ian Denhardt, 'Kenton Varda' via Cap'n Proto, David Renshaw, prasanth somasundar

On Thu, May 30, 2019 at 3:40 PM Ian Denhardt <i...@zenhack.net> wrote:

> Oof, point taken. Slightly off topic for this list, but how would you
> feel about accepting (wire compatible) patches to the sandstorm schema
> to flatten the namespace? Here's another doozy (from the output of the
> go code generator):
>
>     websession.WebSession_WebSocketStream_sendBytes_Results_Future

Ehh... Such changes would probably lead to a lot of source code churn, possibly including in apps... So I'd prefer not to.

-Kenton

David Renshaw

May 31, 2019, 5:54:32 PM5/31/19

to Kenton Varda, Prasanth Somasundar, Ian Denhardt, 'Kenton Varda' via Cap'n Proto

On Fri, May 31, 2019 at 3:07 PM Kenton Varda <ken...@cloudflare.com> wrote:

> > With Elm and Haskell, I believe the only layer of namespacing that's provided is the module name which is why both Ian and I gravitated to it initially.
>
> Are you saying Elm and Haskell don't provide any way to declare aliases? That seems surprising to me.

Note that Prasanth's example aliases a module that's at an intermediate point in the nested hierarchy. I doubt that Haskell would let you do something like:

    type Global = Pipeline'Stage'Worker'Global;
    ...
    case Global'Value'JSON: ...
    case Global'Value'NUMBER: ...

(Hm... or maybe something like that could in fact work, but it would require the code generator to provide, in each module, type aliases to all types in that module's "child" modules? Even so, I suspect that would not be very practical.)

Kenton Varda

May 31, 2019, 7:10:55 PM5/31/19

to David Renshaw, Prasanth Somasundar, Ian Denhardt, 'Kenton Varda' via Cap'n Proto

On Fri, May 31, 2019 at 2:54 PM David Renshaw <dwre...@gmail.com> wrote:

> Note that Prasanth's example aliases a module that's at an intermediate point in the nested hierarchy. I doubt that Haskell would let you do something like:
>
>     type Global = Pipeline'Stage'Worker'Global;
>     ...
>     case Global'Value'JSON: ...
>     case Global'Value'NUMBER: ...

Oh duh, good point... When the namespace is flattened, aliases only shorten the specific declaration aliased and not all of its nested declarations.

Yeah I guess that's tough...

-Kenton

Ian Denhardt

May 31, 2019, 7:48:04 PM5/31/19

to 'Kenton Varda' via Cap'n Proto, David Renshaw, Kenton Varda, Prasanth Somasundar, 'Kenton Varda' via Cap'n Proto

And in fact you can't even do:

    type JSON = Pipeline'Stage'Worker'Global'Value'JSON

    case ... of
        JSON -> ...

Because you're giving an alias to the *type*, not the variant tag.
Haskell has a PatternSynonyms extension that I've not used heavily, but
Elm doesn't give you anything to work with here.

-Ian

Prasanth Somasundar

Jun 1, 2019, 4:21:22 AM6/1/19

to Ian Denhardt, 'Kenton Varda' via Cap'n Proto, David Renshaw, Kenton Varda

> Because you're giving an alias to the *type*, not the variant tag.
> Haskell has a PatternSynonyms extension that I've not used heavily, but
> Elm doesn't give you anything to work with here.

Except for module names. Like I said, I'll try to fix this as best I can and see if it makes sense to pursue. My current plan is to topologically sort the graph, break any cycles with extra generics, and then hide the generics in `Internal` modules. Not 100% it'll work, but we'll see. It's fairly complex, but I would feel better about that than presenting long names with no escape.
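The first step of that plan can be sketched as follows. This is only an illustration in Python, not the actual generator: `topo_order` is a hypothetical helper, the dependency table is invented, and it only reports which modules are stuck on a cycle rather than breaking the cycle with extra generics.

```python
def topo_order(deps):
    """Return (order, stuck) for a module dependency graph.

    deps maps each module name to the set of modules it imports; every
    module is assumed to appear as a key. `order` lists modules so that
    imports come first. `stuck` lists modules that sit on a cycle or
    depend on one, which the generator would have to break up somehow.
    """
    remaining = {name: set(imports) for name, imports in deps.items()}
    order = []
    while True:
        # Modules whose imports have all been emitted already.
        ready = sorted(name for name, imports in remaining.items() if not imports)
        if not ready:
            break
        for name in ready:
            order.append(name)
            del remaining[name]
        for imports in remaining.values():
            imports.difference_update(ready)
    return order, sorted(remaining)


# Invented example: Person refers back to AddressBook, forming a cycle.
deps = {
    "AddressBook": {"Person"},
    "Person": {"PhoneNumber", "AddressBook"},
    "PhoneNumber": set(),
}
order, stuck = topo_order(deps)
```

Anything that ends up in `stuck` is where a module-per-struct layout runs into Elm's ban on cyclic imports.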

> Oh duh, good point... When the namespace is flattened, aliases only shorten the specific declaration aliased and not all of its nested declarations.

Yea, exactly. At that point, you're not flattening a bundle of code, you're just renaming a value.

Anyhoo, my next steps as I see them (roughly in order, but some can be parallelized):

1. Decide what writes could/should look like. Only a first pass as I'm sure people on the Elm side will have strong opinions here.
2. See if I can use a module per struct.
3. Update doc on the security discussion and the default value discussion.
4. Prod Evan on Elm discourse to see if he's willing to support some sort of byte array.
5. Build a fixed width numeric library.
6. Write the Elm library.
7. Build out the compiler plugin in either Haskell or C++.

Let me know if I missed anything. And thanks for the help and insights.

--Prasanth

Ian Denhardt

Jun 1, 2019, 1:06:37 PM6/1/19

to 'Kenton Varda' via Cap'n Proto, David Renshaw, Kenton Varda, Prasanth Somasundar

Quoting Prasanth Somasundar (2019-06-01 04:21:18)

> Except for module names. Like I said, I'll try to fix this as best I
> can and see if it makes sense to pursue. My current plan is to
> topologically sort the graph, break any cycles with extra generics, and
> then hide the generics in `Internal` modules. Not 100% it'll work, but
> we'll see. It's fairly complex, but I would feel better about that than
> presenting long names with no escape.

You may run into some trouble with this due to the fact that Elm doesn't
let you re-export values imported from another module. So, there's no
way to import a data constructor from anywhere other than the module in
which it was defined, iirc. If you decide to go this route make sure you
write some proof of concept modules early on to verify that the code you
intend to generate will actually build.

It's your project, but my advice would still be to keep it simple. I
spent far longer than I care to think about banging my head against this
kind of thing. As has been discussed, deeply nested namespaces generate
painful APIs even if the target language has the needed namespace
support, so I really think well-written schemas should avoid them anyhow.
And it is always possible to modify a schema to change the names and
namespace structure without breaking wire compatibility, so one is never
*really* stuck with what upstream has given them (e.g. I see why Kenton
doesn't want to change the sandstorm stuff, but there's no reason I
couldn't use my own forked versions to write apps).

Best of luck, whatever route you decide to go.

-Ian

Prasanth Somasundar

Jun 1, 2019, 3:09:32 PM6/1/19

to Ian Denhardt, 'Kenton Varda' via Cap'n Proto, David Renshaw, Kenton Varda, Prasanth Somasundar

Yea, I'm not going to put a ton of time into that particular problem. Maybe just a few hours.

--Prasanth

Prasanth Somasundar

Jun 1, 2019, 11:12:22 PM6/1/19

to Ian Denhardt, 'Kenton Varda' via Cap'n Proto, David Renshaw, Kenton Varda, Prasanth Somasundar

Ok, so I spent almost exactly one hour on modules per struct and have concluded that this is impossible. I did have one other idea that I thought I might float by you guys:

What about just dumping everything into the same module, but without the prefixed namespace for every value/type/type constructor. The reason that we're prefixing everything with underscores is to avoid name collisions, but we can simply error on those while still giving users `$Elm.overrideName` to allow a user to manually resolve collisions.

This works in Elm particularly because record fields do not have name collisions (or more accurately, record fields are encoded in the type, so the same field can access different types in different records). Fields, which are the things that are most likely to overlap, can overlap, but Structs/Unions/Enums cannot. This might be a bit opaque to the user, but I'm hoping through documentation, good error messages, and an easy escape hatch via annotations, this can be alleviated.

I suspect that 90% of schema files will not have name collisions internally, and this way we get names that are shorter and easier to use, at the cost of transferring some complexity to the schema file.

There is one exception to field names overlapping: A struct with a union uses its field names as Type constructors. For this I basically have no option other than to prefix the union's name.
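A rough sketch of the naming rule described above, in Python for illustration only. `flatten_names` is a hypothetical helper, declarations are modelled as tuples of nested names, and the `overrides` table stands in for whatever a `$Elm.overrideName` annotation would supply.

```python
def flatten_names(paths, overrides=None):
    """Give each declaration its last path component as its flat name.

    paths is a list of tuples such as ("Person", "PhoneNumber").
    overrides maps a path to a hand-picked name. A collision that is
    not resolved by an override is reported as an error.
    """
    overrides = overrides or {}
    owners = {}   # flat name -> path that claimed it
    chosen = {}   # path -> flat name
    for path in paths:
        name = overrides.get(path, path[-1])
        if name in owners:
            raise ValueError(
                "name %r is used by both %s and %s; add an override"
                % (name, ".".join(owners[name]), ".".join(path))
            )
        owners[name] = path
        chosen[path] = name
    return chosen


paths = [("Person",), ("Person", "PhoneNumber"), ("Company", "PhoneNumber")]

try:
    flatten_names(paths)
    collided = False
except ValueError:
    collided = True

resolved = flatten_names(
    paths, overrides={("Company", "PhoneNumber"): "CompanyPhone"}
)
```

The point of erroring instead of silently prefixing is that existing names never change on their own; the schema author decides how each collision is resolved.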

--Prasanth

Ian Denhardt

Jun 1, 2019, 11:28:52 PM6/1/19

to 'Kenton Varda' via Cap'n Proto, David Renshaw, Kenton Varda, Prasanth Somasundar, Prasanth Somasundar

Quoting Prasanth Somasundar (2019-06-01 23:12:20)

> What about just dumping everything into the same module, but without
> the prefixed namespace for every value/type/type constructor. The
> reason that we're prefixing everything with underscores is to avoid
> name collisions, but we can simply error on those while still giving
> users `$Elm.overrideName` to allow a user to manually resolve
> collisions.

I like this, though I might suggest still prefixing data constructors
(at least with the type name, not necessarily the whole path), as it's
pretty common to have overlapping names for union members, which would
cause reasonably frequent conflicts. I've found having *one* level of
name spacing like this for the Haskell implementation isn't too bad.

I toyed at one point with the idea of having the code generator
automatically chop off as many layers of nesting as it could without
causing collisions, though that's a bit more complicated to implement.

-Ian

Prasanth Somasundar

Jun 1, 2019, 11:48:19 PM6/1/19

to Ian Denhardt, 'Kenton Varda' via Cap'n Proto, David Renshaw, Kenton Varda

> I like this
Cool

> I toyed at one point with the idea of having the code generator automatically chop off as many layers of nesting as it could without causing collisions, though that's a bit more complicated to implement.

I thought about this myself, but the biggest issue I have with it isn't the technical challenge, but the UX challenge. This would mean that if you ever added the wrong name to a schema file, you could end up changing most if not all of the names in the file. That sounds really rough to impose and is basically a showstopper for me.
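The instability is easy to reproduce with a toy version of the "chop off as many layers as possible" rule. The Python below is only a sketch: `shortest_unique_names` is a hypothetical helper, declarations are tuples of nested names (assumed unique), and names are joined with underscores.

```python
def shortest_unique_names(paths):
    """Name each declaration by its shortest suffix that no other path shares."""
    names = {}
    for path in paths:
        for k in range(1, len(path) + 1):
            suffix = path[-k:]
            matches = sum(1 for other in paths if other[-k:] == suffix)
            if matches == 1:
                names[path] = "_".join(suffix)
                break
        else:
            # No suffix is unique (e.g. a top-level name reused deeper down),
            # so fall back to the full path.
            names[path] = "_".join(path)
    return names


before = shortest_unique_names([
    ("Person", "PhoneNumber"),
    ("Person", "Address"),
])

# Someone adds one unrelated declaration to the schema...
after = shortest_unique_names([
    ("Person", "PhoneNumber"),
    ("Person", "Address"),
    ("Company", "PhoneNumber"),
])
```

In `before`, the phone type is just `PhoneNumber`; in `after`, it has become `Person_PhoneNumber`, so every call site that used the old name breaks even though that declaration was not touched.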

--Prasanth

Prasanth Somasundar

Jun 6, 2019, 10:41:31 PM6/6/19

to Ian Denhardt, 'Kenton Varda' via Cap'n Proto, David Renshaw, Kenton Varda

Hey guys,

I've updated the doc with a new write API suggestion. The gist is that writes never happen to the byte array. Instead, we build a virtual proto on top of the byte array that's accessed first for any modifications. It's built using `Maybe` all the way down. `Nothing` implies that there are no changes from the byte array. When you transmit/convert it into bytes, the framework performs a "merge" operation before sending over the wire. We will also provide this merge operation to advanced users who wish to manually do this compaction at some points.
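As a rough picture of the idea, here is a much-simplified sketch in Python. It is not the design in the doc: `OverlayMessage` is a hypothetical name, the message is treated as a flat run of 64-bit words, and a dictionary of changed words stands in for the nested `Maybe` structure (a missing key plays the role of `Nothing`).

```python
import struct


class OverlayMessage:
    """Immutable base bytes plus a table of pending changes."""

    def __init__(self, base, changes=None):
        self.base = bytes(base)
        self.changes = dict(changes or {})   # word index -> new value

    def get_word(self, index):
        """Consult the overlay first, then fall back to the base bytes."""
        if index in self.changes:
            return self.changes[index]
        return struct.unpack_from("<Q", self.base, index * 8)[0]

    def set_word(self, index, value):
        """Return a new message; neither self nor the base is modified."""
        return OverlayMessage(self.base, {**self.changes, index: value})

    def merge(self):
        """Apply all pending changes to a fresh copy of the base bytes."""
        out = bytearray(self.base)
        for index, value in self.changes.items():
            struct.pack_into("<Q", out, index * 8, value)
        return bytes(out)


base = struct.pack("<3Q", 1, 2, 3)
original = OverlayMessage(base)
updated = original.set_word(1, 99)
merged = updated.merge()
```

Reads on `updated` see 99 at index 1 while `original` still sees 2, and the base bytes are only rewritten (into a copy) when `merge` is called, which corresponds to the step performed before sending over the wire.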

This is clearly complicated and worrying, but it's the best API for in-place writes over immutable data that I could think of. There are alternatives, but I wasn't too fond of them. You can read about a few that I thought of in the doc.

As for random-access on gets, it seems that we can use the bytes function. Its implementation simply creates a new DataView and updates the index, so it should be a constant time operation. No changes to elm/bytes required 😂

There are also a few small details around the types that have changed in order to support the write operations. They've become more complex, but I'm going to live with that for now and see if I can reduce their complexity later.

I'm publishing this doc to Elm discourse soon to get feedback from the Elm side of things as well. Thanks again for all the feedback; it's been super helpful.

--Prasanth

From: capn...@googlegroups.com <capn...@googlegroups.com> on behalf of Prasanth Somasundar <Mez...@live.com>
Sent: Saturday, June 1, 2019 11:48 PM
To: Ian Denhardt; 'Kenton Varda' via Cap'n Proto; David Renshaw; Kenton Varda

Subject: RE: [capnproto] Cap'n Proto for Elm

> I like this
Cool

> I toyed at one point with the idea of having the code generator automatically chop off as many layers of nesting as it could without causing collisions, though that's a bit more complicated to implement.

I thought about this myself, but the biggest issue I have with it isn't the technical challenge, but the UX challenge. This would mean that if you ever added the wrong name to a schema file, you could end up changing most if not all of the names in the file. That sounds really rough to impose and is basically a showstopper for me.
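The UX concern can be made concrete with a small sketch. It assumes one plausible reading of "chop off as many layers of nesting as possible": give each declaration the shortest path suffix that no other declaration shares. The function name and the sample schema paths are hypothetical.

```python
def shortest_unique_names(paths):
    """Map each path (a tuple of segments) to its shortest collision-free suffix."""
    names = {}
    for path in paths:
        for depth in range(1, len(path) + 1):
            suffix = path[-depth:]
            # Another path ending in the same segments means we need more context.
            clash = any(p != path and p[-depth:] == suffix for p in paths)
            if not clash:
                break
        names[path] = "_".join(suffix)
    return names


before = [("Schema", "User", "Id"), ("Schema", "Post", "Title")]
after = before + [("Schema", "Post", "Id")]

names_before = shortest_unique_names(before)
names_after = shortest_unique_names(after)
```

Here `("Schema", "User", "Id")` is generated as `Id` before the new field is added and as `User_Id` afterwards, so an unrelated schema addition silently renames an identifier that existing Elm code already uses.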

--Prasanth

-----Original Message-----
From: Ian Denhardt <i...@zenhack.net>
Sent: Saturday, June 1, 2019 8:25 PM
To: 'Kenton Varda' via Cap'n Proto <capn...@googlegroups.com>; David Renshaw <dwre...@gmail.com>; Kenton Varda <ken...@cloudflare.com>; Prasanth Somasundar <Mez...@live.com>; Prasanth Somasundar <mez...@live.com>
Subject: RE: [capnproto] Cap'n Proto for Elm

Quoting Prasanth Somasundar (2019-06-01 23:12:20)

> What about just dumping everything into the same module, but without
> the prefixed namespace for every value/type/type constructor. The
> reason that we're prefixing everything with underscores is to avoid
> name collisions, but we can simply error on those while still giving
> users `$Elm.overrideName` to allow a user to manually resolve
> collisions.

I like this, though I might suggest still prefixing data constructors (at least with the type name, not necessarily the whole path), as it's pretty common to have overlapping names for union members, which would cause reasonably frequent conflicts. I've found having *one* level of namespacing like this for the Haskell implementation isn't too bad.
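A sketch of how the combined scheme might look in a code generator: unprefixed names for types, constructors prefixed with just their type name, a hard error on any remaining collision, and an override table standing in for the `$Elm.overrideName` annotation. The data shapes and the names `flat_names` and `NameCollision` are assumptions made for illustration.

```python
class NameCollision(Exception):
    """Raised when two declarations would receive the same generated name."""


def flat_names(declarations, overrides=None):
    """declarations: list of (path, kind) with kind in {"type", "constructor"}.

    overrides: optional dict from path to a user-chosen name.
    """
    overrides = overrides or {}
    taken = {}   # generated name -> path that claimed it first
    result = {}
    for path, kind in declarations:
        if path in overrides:
            name = overrides[path]
        elif kind == "constructor":
            # One level of namespacing: the enclosing type name only.
            name = f"{path[-2]}_{path[-1]}"
        else:
            name = path[-1]
        if name in taken:
            raise NameCollision(
                f"{name}: {'.'.join(taken[name])} vs {'.'.join(path)}"
            )
        taken[name] = path
        result[path] = name
    return result


unions = [
    (("Shape",), "type"),
    (("Shape", "Circle"), "constructor"),
    (("Icon",), "type"),
    (("Icon", "Circle"), "constructor"),
]
union_names = flat_names(unions)
```

The two `Circle` members do not conflict because each carries its type name, while two nested types with the same final segment still fail loudly until the user supplies an override.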

I toyed at one point with the idea of having the code generator automatically chop off as many layers of nesting as it could without causing collisions, though that's a bit more complicated to implement.

-Ian

--
You received this message because you are subscribed to the Google Groups "Cap'n Proto" group.
To unsubscribe from this group and stop receiving emails from it, send an email to capnproto+...@googlegroups.com.
Visit this group at https://groups.google.com/group/capnproto.

To view this discussion on the web visit https://groups.google.com/d/msgid/capnproto/BYAPR11MB259904C8B85E0B51169A7816C51B0%40BYAPR11MB2599.namprd11.prod.outlook.com.

Kenton Varda

Jun 7, 2019, 9:46:16 AM

to Prasanth Somasundar, Ian Denhardt, 'Kenton Varda' via Cap'n Proto, David Renshaw

On Thu, Jun 6, 2019 at 7:41 PM Prasanth Somasundar <mez...@live.com> wrote:

> Hey guys,
>
> I've updated the doc with a new write API suggestion. The gist is that writes never happen to the byte array. Instead, we build a virtual proto on top of the byte array that's accessed first for any modifications. It's built using `Maybe` all the way down. `Nothing` implies that there are no changes from the byte array. When you transmit/convert it into bytes, the framework performs a "merge" operation before sending over the wire. We will also provide this merge operation to advanced users who wish to manually do this compaction at some points.

This sounds like almost a standard design pattern in pure-functional languages -- amortize modifications by storing them in a side structure with periodic merges. I feel like I read about this kind of approach in Okasaki's "Purely Functional Data Structures" (though it's been a while...).

Which is to say, it seems like a pretty reasonable approach. :)

-Kenton

Ian Denhardt

Jun 7, 2019, 1:08:21 PM

to Kenton Varda, Prasanth Somasundar, 'Kenton Varda' via Cap'n Proto, David Renshaw

Quoting Kenton Varda (2019-06-07 09:45:37)

> On Thu, Jun 6, 2019 at 7:41 PM Prasanth Somasundar
> <mez...@live.com> wrote:
>
> > Hey guys,
> >
> > I've updated the doc with a new write API suggestion. The gist is
> > that writes never happen to the byte array. Instead, we build a virtual
> > proto on top of the byte array that's accessed first for any
> > modifications. It's built using `Maybe` all the way down. `Nothing`
> > implies that there are no changes from the byte array. When you
> > transmit/convert it into bytes, the framework performs a "merge"
> > operation before sending over the wire. We will also provide this merge
> > operation to advanced users who wish to manually do this compaction at
> > some points.
>
> This sounds like almost a standard design pattern in pure-functional
> languages -- amortize modifications by storing them to a side structure
> with periodic merges. I feel like I read about this kind of approach in
> Okasaki's "Purely Functional Data Structures" (though it's been a
> while...).
> Which is to say, it seems like a pretty reasonable approach. :)
> -Kenton

You can't really do the amortization tricks from Okasaki's work in Elm,
since they rely critically on being able to do lazy evaluation -- which
the Elm runtime doesn't provide. So amortization in Elm can only work if
you use the data structure in a "single-threaded" way.
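A toy illustration of the "single-threaded" caveat, in Python with a strict (non-lazy) merge. It only counts the cells copied during merges; the type `Versioned` and the helpers `write` and `merge` are invented for this sketch and do not model any particular Okasaki structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    base: tuple          # the merged, immutable data
    pending: tuple = ()  # ((index, value), ...) edits not yet merged


def write(v: Versioned, index: int, value: int) -> Versioned:
    # Cheap step: record the edit next to the base instead of copying the base.
    return Versioned(v.base, v.pending + ((index, value),))


def merge(v: Versioned, counter: dict) -> Versioned:
    # Expensive step: copy every cell, then apply the pending edits.
    cells = list(v.base)
    counter["cells_copied"] += len(cells)
    for index, value in v.pending:
        cells[index] = value
    return Versioned(tuple(cells))


old = write(write(Versioned(tuple(range(1000))), 0, -1), 1, -2)

# Single-threaded use: the version with pending edits is merged exactly once.
single = {"cells_copied": 0}
merged = merge(old, single)

# Persistent use: the same old version is merged again from several places.
# Nothing memoizes the result, so the full cost is paid every time.
persistent = {"cells_copied": 0}
for _ in range(5):
    merge(old, persistent)
```

With lazy evaluation the merge could be a suspended computation attached to `old`, forced once and shared by every later use; in a strict runtime each reuse of `old` repeats the copy, so the cheap writes no longer pay for it.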

I think you may be on to something, but it needs fleshing out -- it's
still not clear to me how it would actually work.

-Ian

Ian Denhardt

Mar 18, 2020, 1:44:00 PM

to capn...@googlegroups.com, prasanth somasundar

Did this ever go anywhere? I'm working on some elm <-> haskell
interaction for a sandstorm app and this would be nice to have.

Quoting prasanth somasundar (2019-05-29 05:00:46)
> Hey Everyone,
>
> I'm thinking about building out a Cap'n Proto implementation in Elm for
> the fun of it. Thought I'd send an email to this list as suggested and
> get some feedback on [1]the initial design which I've also linked
> below. Any thoughts, comments, or concerns are appreciated.
>
>
> --Prasanth
>
>
> Another link to the doc in case you missed it:
> https://docs.google.com/document/d/12qMVyQPOWTXviFKIpjKLXgusKZ95miuRmu9AxacyGOA/edit?usp=sharing
>
>

> References
>
> 1. https://docs.google.com/document/d/12qMVyQPOWTXviFKIpjKLXgusKZ95miuRmu9AxacyGOA/edit?usp=sharing

Prasanth Somasundar

Apr 12, 2020, 3:03:23 PM

to Ian Denhardt, capn...@googlegroups.com, Prasanth Somasundar

Hey Ian,
I'm still working on it. Unfortunately, I got pulled into a bunch of things in the middle of last year and haven't gotten a chance to work on it for a few months.

--Prasanth

-----Original Message-----
From: Ian Denhardt <i...@zenhack.net>
