Friends, esp Edsko, Tim, Facundo, Neil, Mathieu
Neel, Dimitrios, and I have started to think a bit more seriously about how static values and closures should work in Cloud Haskell and similar.
The distributed-closures library is a major inspiration.
Also relevant are various wiki pages: StaticPointers, an older version (?), and a bit about polymorphism.
We are very interested in things like
· What are the primary use-cases?
· Where does the shoe pinch? What is awkward or impossible?
· Any documents, blog posts, we should read?
Thanks!
Simon
Great talk Edsko! That does indeed help a lot. Thank you Alp.
Neel, Dimitrios: watch the talk!
Simon
Hi Simon,
The general drift in our experience over time was that allowing
arbitrary closures to be serialised, while convenient, exposes APIs
which aren't very stable and have too much power. That's fine for a
certain subset of problems, but for what we were using serialising
closures for, it started to get problematic and we moved to explicit
endpoints with real APIs. Perhaps that's a viewpoint unique to the
domain I was working in at the time, or perhaps that suggests that an
incremental path to/from serialised closures would help. I appreciate
that is probably of no use to how to evolve static pointers!
Thanks, Neil
| Most get by (just fine?) with API endpoints as Neil suggests
Can you just say a bit more about what an “API endpoint” means, concretely?
I believe it means something like:
Sorry to be so ignorant. A short cut might be to point to a canonical concrete example.
A big advantage of this approach is that it is fully language-independent: since it specifies the on-the-wire format, the client and server can be written in different languages.
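For concreteness, here is a sketch of such an endpoint in servant’s type-level DSL (the route name and payload types are invented for illustration):

    {-# LANGUAGE DataKinds, TypeOperators #-}
    module EndpointSketch where

    import Servant.API

    -- The route name and the JSON wire format are the whole contract:
    -- any client that speaks HTTP and JSON can call this endpoint,
    -- from any language.
    type SortAPI = "sort" :> ReqBody '[JSON] [Int] :> Post '[JSON] [Int]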
On the other hand, Erlang has a sophisticated failure model that allows a complex distributed system to recover robustly. It may be an outlier but I think Scala’s version (Akka) is pretty successful too. I have no idea of the tradeoff between the additional power of Erlang’s model vs the simplicity of a lowest-common-denominator. Someone must have written papers or blog posts about this!
Thanks
Simon
From: Boespflug, Mathieu <m...@tweag.io>
Sent: 26 November 2018 14:53
To: Tim Watson <watson....@gmail.com>
Cc: Neil Mitchell <ndmit...@gmail.com>; Simon Peyton Jones <sim...@microsoft.com>; parallel...@googlegroups.com; Facundo Domínguez <facundo....@tweag.io>; well-...@edsko.net; dimi...@gmail.com; neelakantan....@gmail.com
Subject: Re: Cloud Haskell, closures, and statics
And a viewpoint held pretty consistently across many sections of the industry. To be honest, Erlang is a bit of an outlier here. It's practically the only framework widely used across hundreds of companies in the industry that allows for shippable closures. Most get by (just fine?) with API endpoints as Neil suggests, provided there are good strategies in place to locate these endpoints and adequately describe the expected format of their inputs and outputs.
On the “same binary” question, here’s what I think:
· There should be no requirement to be running the same binary.
· When sending a static pointer, the implementation must send some bit-pattern on the wire; let’s call that the “static key”.
· We should assume the possibility of a “man in the middle” attack; so when receiving a static key you cannot assume that it is valid. The receiving end should never crash, regardless of how badly mangled the static keys are.
· It follows that
o   The receiver R may de-serialise free variables etc. from the message, using its own types. If the sender S was using different types, chances are the bit-pattern won’t parse, and we’ll fail during deserialisation (but not crash).
o   It’s possible that S will serialise some value (x :: T) into a bitstring that R successfully parses as a value of some totally different type (y :: T’). Then again, no seg-fault. If you are worried about this we could include some additional integrity information, such as the fingerprint of T. But the no-crash guarantee does not depend on this.
· The static key could be:
I’m sure that more variations are possible. The key thing is that there exist variants that are robust to recompilation. And indeed, the client and server could be running on different machine architectures!
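To make this concrete, a minimal sketch of one recompilation-robust key (the field names are invented, not GHC’s actual representation):

    import GHC.Fingerprint (Fingerprint)

    -- A symbolic name plus a fingerprint of the value's type.
    data StaticKey = StaticKey
      { skPackage :: String       -- e.g. a package name or unit id
      , skModule  :: String
      , skName    :: String
      , skTypeFp  :: Fingerprint  -- optional integrity check: guards
                                  -- against parsing x :: T as y :: T'
      }

    data SomeStaticPtr  -- placeholder for the implementation's dynamic wrapper

    -- Lookup on the receiving side must be total: an unknown or
    -- mismatched key yields Nothing rather than a crash.
    resolveKey :: StaticKey -> Maybe SomeStaticPtr
    resolveKey _ = Nothing  -- stub; a real static-pointer table lookup goes here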
I’m not quite sure what is implemented right now. It’d be good to decide what we want, and write down the plan, and then implement it.
Simon
From: Tim Watson <watson....@gmail.com>
Sent: 03 December 2018 00:49
To: Gershom Bazerman <gers...@gmail.com>
Cc: well-...@edsko.net; Neil Mitchell <ndmit...@gmail.com>; Neelakantan Krishnaswami <neelakantan....@gmail.com>; Simon Peyton Jones <sim...@microsoft.com>; Dimitrios Vytiniotis <dimi...@gmail.com>; Facundo Domínguez <facundo....@tweag.io>;
Mathieu Boespflug <m...@tweag.io>; parallel...@googlegroups.com
Subject: Re: Cloud Haskell, closures, and statics
That would be very good news indeed!
This is almost the current approach. We use a unit id instead of a package name.
Great
mkStaticPtrFingerprint :: Int -> Fingerprint
What is ‘n’?
Simon
From: Domínguez, Facundo <facundo....@tweag.io>
Sent: 06 December 2018 13:38
To: Simon Peyton Jones <sim...@microsoft.com>
Ah but that’s exactly what we /don’t/ want, since it’s vulnerable to recompilation wibbles.
Surely the name and type are sufficient? For reasons discussed earlier in the thread.
PS: How is the fingerprint of the type included? (I agree it’s not strictly necessary.)
For the reasons Neil mentioned upthread, static pointers are problematic for the very use case they were initially designed for (interacting services - let's call that distributed concurrency).
I think you mean this:
| The general drift in our experience over time was that allowing
| arbitrary closures to be serialised, while convenient, exposes APIs
| which aren't very stable and have too much power. That's fine for a
| certain subset of problems, but for what we were using serialising
| closures for, it started to get problematic and we moved to explicit
| endpoints with real APIs. Perhaps that's a viewpoint unique to the
| domain I was working in at the time, or perhaps that suggests that an
| incremental path to/from serialised closures would help. I appreciate
| that is probably of no use to how to evolve static pointers!
I didn’t really understand this. Erlang seems to make very effective use of distribution to make long-lived, highly-reliable distributed services. So I’m not ready to write off this class of applications. Are you sure we should?
Simon
I'm losing the connection between "the availability of static pointers" and "the use of bare send/receive". No one is arguing for the latter -- of course, powerful but hard-to-use primitives should be wrapped in libraries. But powerful primitives make better libraries possible!
I /think/ you may be saying that "distributed-process-supervisor" (a good library) doesn't need static pointers? It would be interesting to know what it does and how.
Let me have a guess:
| You both pointed to gen_server as a common pattern in Erlang. If I'm using
| that pattern, is it common to, from the local node, explicitly instantiate
| a gen_server with some handler function on a remote node (and therefore
| perform code shipping)?
I'm interested in this "code shipping" point, and I'd like to understand it better.
If I'm a client talking to a server I may want it to perform a variety of services. One way to do that is to communicate on a channel of type
server :: Chan Service
Then in the server I'm going to have a case statement to dispatch to the right service code
So in effect I have enumerated the functions I can call remotely (the Service type), and I send data, not code pointers, to invoke them. It's very like "de-functionalisation" of higher order programs, in which we replace functions by data, and have a dispatch function.
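A tiny sketch of that shape (the service names are invented, and Chan here is the ordinary in-process one, standing in for whatever typed channel the transport provides):

    import Control.Concurrent.Chan (Chan, readChan, writeChan)
    import Control.Monad (forever)
    import Data.List (sort)

    -- Each remotely invocable function becomes a data constructor,
    -- carrying its arguments and a reply channel:
    data Service = Sort [Int] (Chan [Int])
                 | Sum  [Int] (Chan Int)

    -- The dispatch loop: a case statement over the enumerated services.
    server :: Chan Service -> IO ()
    server requests = forever $ do
      req <- readChan requests
      case req of
        Sort xs reply -> writeChan reply (sort xs)
        Sum  xs reply -> writeChan reply (sum xs)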
I think this is what you mean when you say "expose the server through a well-defined API"?
This works fine, until I want to add a service. At that point I change the Service type, and all the clients need to be upgraded simultaneously. (Unless you very carefully design the serialisation protocol so that you can write a data value from a type with 3 constructors and read into a type with 4.) With static pointers you can send a static pointer to the service code; and when both client and server have been upgraded it'll work; until then it'll fail gracefully.
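One way to build in that slack, sketched with the binary package (the constructors and tag choices are invented): reserve a catch-all tag, so a reader degrades gracefully on a message from a peer whose type has an extra constructor:

    import Data.Binary (Binary (..), getWord8, putWord8)

    data Service = SortJob | SumJob | UnknownJob
      deriving Show

    instance Binary Service where
      put SortJob    = putWord8 0
      put SumJob     = putWord8 1
      put UnknownJob = putWord8 255
      get = do
        tag <- getWord8
        pure (case tag of
                0 -> SortJob
                1 -> SumJob
                _ -> UnknownJob)  -- a newer peer's tag: degrade, don't fail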
Perhaps these use-cases are not that important in practice. But thinking of it as defunctionalisation is helpful to me.
How hot code upgrades work is that you make changes to your code, compile the modules, and run a system upgrade. What that upgrade effectively does is ship new bytecode for all the changed modules out to all your nodes; then, at some point in the server loop, the OTP gen_ behaviour gets a message in its mailbox telling it that we're applying new code as part of a system upgrade. For a gen_server, the gen_ behaviour loop (the bit that controls the mailbox, which is equivalent to the Cloud Haskell code in ...) will call an upgrade function, giving you a chance to do any comparisons and/or work with the old state.
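A rough Haskell analogue of that hook (all names invented; Erlang's actual callback is code_change/3): the managed loop intercepts an Upgrade control message and hands the old state to a migration function before continuing:

    import Control.Concurrent.Chan (Chan, readChan)

    data Msg req = Request req | Upgrade

    -- 'migrate' plays the role of the upgrade callback: it sees the
    -- old state and returns the new one before the loop carries on.
    serverLoop :: (state -> state)           -- state migration on upgrade
               -> (req -> state -> IO state) -- ordinary request handler
               -> Chan (Msg req) -> state -> IO ()
    serverLoop migrate handle inbox st = do
      msg <- readChan inbox
      case msg of
        Upgrade     -> serverLoop migrate handle inbox (migrate st)
        Request req -> handle req st >>= serverLoop migrate handle inbox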
To me, the biggest barrier to rolling updates in Cloud Haskell is Binary serialization. If I want to extend an existing service so that it returns a new piece of data - I may have consumers of that service that don't need to care about the new field but still have to be upgraded simultaneously with the service in order to continue using it. Whereas if a serialization format such as Protocol Buffers is used, new "optional" data is ignored by old code. In Haskell this comes at a tremendous cost in boilerplate and usability, but I am intrigued by the idea of having pluggable serialization formats so that applications or even individual data types could make these trade-off decisions independently.
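That pluggable idea might look like this (a sketch; WireFormat is an invented class, not an existing library API):

    import Data.ByteString.Lazy (ByteString)

    -- Each message type chooses its own codec, so the compatibility
    -- trade-off is made per type rather than globally.
    class WireFormat a where
      encodeWire :: a -> ByteString
      decodeWire :: ByteString -> Either String a

    -- A Binary-backed instance gets compact encoding but strict
    -- compatibility; a protobuf-backed instance tolerates unknown new
    -- fields, at the cost of the boilerplate mentioned above.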
I'd like to also point out some counter-points to the idea that "Spark, but in Haskell" is a great idea and that it requires code shipping.
There is one glaring weakness in the Spark model of execution that is solved by separating code shipping from execution. Security is THE big issue with Spark.
... enter GDPR and other laws - you need to be able to document that you don't give out access to data needlessly. Now Spark starts to fail, because the computational model is "ship code". It is hard/slow/difficult to create a cluster from scratch, but that's what you need in order to enforce the security requirement that individual engineers have different access restrictions imposed on them.
Spark has no business enforcing access restrictions on external data. That's re-creating yet another security layer that needs to be managed. Rather, if the creation of a cluster-wide computation by using fresh processes allocated across a cluster using an existing cluster manager is cheap, you will get what you want - secure isolation of jobs.
The combination of not being able to enforce security policies on shared clusters, and not being able to dynamically increase/decrease the cluster size on a real cloud leads to very inefficient use of resources in Spark. What a sorting benchmark looks like in Spark doesn't matter at all compared to all the idle CPU cycles the wrong computational model entails.
You can say that these are orthogonal issues, but the ability to have a good *cluster agent library* and maybe a build system that can easily integrate with this library in order to quickly bootstrap a virtual cluster, scale it up and down based on what the app wants, and ship binaries, is a lot more useful in the real world where security must be managed and audited.
In this kind of setup, shipping code becomes mostly irrelevant as there is always an existing agent that can create a VM, a job, a process or similar, and that's the agent we need to interact with in a seamless manner, not build a competing agent within our own executable that should receive and execute closures.
Let me jump in a bit.
Theoretically, we don't have the same-executable restriction, though we do have the same library restrictions. If you compile a library that uses static pointers, ship it to the remote node, and load it by any means, then you can use static pointers across packages. All of this functionality is available in modern Haskell: we can build object files, send them remotely, and load and unload them. This means that if we have a static core and dynamic logic, we can have everything these days, though it requires some additional care.
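For reference, a sketch of that load-at-runtime step using the RTS linker interface in the ghci package (the file name and symbol name are invented, and error handling is omitted):

    import Foreign.Ptr (Ptr)
    import GHCi.ObjLink (ShouldRetainCAFs (..), initObjLinker,
                         loadObj, lookupSymbol, resolveObjs)

    -- Load an object file shipped by a peer and look up a closure in it.
    loadDynamicLogic :: IO (Maybe (Ptr ()))
    loadDynamicLogic = do
      initObjLinker RetainCAFs  -- initialise the RTS linker once
      loadObj "DynamicLogic.o"  -- the object file received over the wire
      ok <- resolveObjs         -- resolve its undefined symbols
      if ok
        then lookupSymbol "DynamicLogic_handler_closure"
        else pure Nothing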
Cloud Haskell really is designed this way: there is a static core, and then messages, a static dictionary, and serialisation for the code that does not depend on recompilation or library changes. You can think of the Erlang executable as the static core for Erlang, for instance.
This means that you can implement hot updates, reloading, and different executables, as long as they share the same library.
But let's look a bit more at the hotfix update example. We may have two indistinguishable cases:
1. the user found a bug and wants to ship and run an updated version of the symbol;
2. the user has the wrong library and calls an incompatible version of the symbol.
If we use more stable names for static pointers, we can easily fall into the trap of the latter case. The current solution rules that out by generating a fresh name each time, but this is not very convenient for a user who wants more stability in naming.
However, I would argue that allowing the first use case is very application-specific and can be handled at the application layer. A few options are possible:
1. run static pointer resolution first, when nodes start to talk;
2. use more stable names associated with data or methods and resolve them locally.
I don't think either solution is strictly better than the other, and they can coexist.
> This means that you can implement hot updates, reloading, and different executables, as long as they share the same library.

We really need to document that this is possible then, as per Gershom's point. Even recent talks, e.g. the one at the top of this thread by Edsko, mention that this is a risky proposition.
Very interesting perspective, Alexander!

At BT we have a huge data lake, and we are also massively impacted by GDPR. It's an issue we've solved through architecture governance and design assurance, rather than by applying specific tactics against our information systems or technology architectures. But still, this is an interesting point that I'd like to address.
This feels like a rather contrived situation, but I can appreciate what you're saying.
Well yes, you could just use https://github.com/weaveworks/weave and spin things up in a fresh environment, sizing as you please. Or a million other approaches.
I feel that I might be derailing the discussion, and I am not presenting strong arguments against shipping closures. However you have been discussing the "developer experience", and as a product feature, I don't think it's as important as great integration with orchestration systems.
Assuming a great orchestration system is already available opens up new points in the design space, and then it might be possible to innovate instead of imitate.
Btw, another huge issue with Spark is the caching layer. As it doesn't provide proper security, the default mode of operation on Amazon for example is to ship data to and from S3 which is enormously inefficient compared to a secure caching layer. This is another performance hit taken because the architecture (IMO) is wrong.
Some context - my experience is from an organization where there are maybe 40 subsidiaries - wildly varying in size, technical competencies and preferences, and thus very different requirements - and those always changing. There are legally multiple controllers (in a GDPR sense) in this environment, so enforcing compliance through policy and procedures is not an option.
> Well yes, you could just use https://github.com/weaveworks/weave and spin things up in a fresh environment, sizing as you please. Or a million other approaches.

Yes, and today container orchestration is a part of the stack, just like TCP/IP or VMs. So it's beneficial to take a stand and make this explicit. Unlike when Akka, Spark etc. were designed, today we can assume that some set of orchestration APIs similar to what Kubernetes provides is available.
So for example instead of:
"Given a cluster of servers, our wonderful system can utilize them to the max by .. <magic tech>"it's more like:"Given an orchestration system, our wonderful system will automatically express priority, deployment preferences, latency requirements, security config and emit data and code containers that works well with your existing orchestration system using.. <magic tech>"
We can assume a lot more, so we should be able to simplify and simultaneously improve on the spark model.
Part of the assumption is that if the system is able to produce a container, then deployment and scaling is trivially done for us. We can assume that we can encapsulate data and have it be distributed along with our code, but also securely shared with other tenants as a caching layer. We can assume that the orchestration layer can atomically switch code for us so we don't need to be binary compatible, or not switch atomically but just ensure that we can discover all components we need.
For a REPL use-case (like a Spark REPL), we could think in terms of creating a container, deploying it, and tearing it down for every expression typed into the REPL (should be <3s overhead at the 99th percentile on Kubernetes, for example). We could encapsulate data (like RDDs in Spark) as separate objects and gain superior caching and security, and use POD abstractions and local communication with simpler failure modes (I think this is part of what you suggested).
We can simplify by assuming an orchestration API. Don't distribute code, that's the job of the orchestration system - just produce containers quickly.
Don't try to utilize the cluster to the fullest, try to express the intent and priority of the current task and let the external scheduler figure it out.
We can actually assume something more specific than just any orchestration API, since Kubernetes won (https://twitter.com/redmonk/status/1000066226482221064). This also simplifies a lot.
This isn't just limited to the Spark use-case. If there's a solid framework for this sort of stuff, normal cabal or stack builds should likewise be accelerated using Cloud Haskell.
I do most of my Haskell builds on a cloud VM, and I have a spare Kubernetes cluster lying around. If I don't, then I can build one in a minute or so on all major clouds.
Again, this is a bit orthogonal to code shipping, but I think it's core to the update of Cloud Haskell.
On Sat, 8 Dec 2018, 15:08 Alexander Kjeldaas wrote:

> I feel that I might be derailing the discussion, and I am not presenting strong arguments against shipping closures. However you have been discussing the "developer experience", and as a product feature, I don't think it's as important as great integration with orchestration systems.

Okay, well if we take that on board, then let's discuss what it means to prioritise one thing over the other. Also, I'm going to create a ticket on the distributed-process issue tracker so we can take this particular conversation offline, if people feel that's the right thing to do (I personally don't mind either way).
> I do think there's an important point gestured at here though, which
> is actually the reason that even in the nicest version cloud haskell
> wouldn't make sense in my work project -- we have a very polyglot
> environment, and so necessarily had to pick a single substrate that
> can interop between a variety of languages and runtimes.
Yes, precisely. In my experience this isn't bad luck. It's the common,
and even desirable, reality. In the use case you describe above, I
would prefer a design that doesn't force the same implementation
language everywhere. That makes static pointers a poor fit except
where tight coupling is desirable, like in the definition of a
parallel computation on bulk data. But I understand that others prefer
different architecture styles.
Wow – this is a very rich conversation with many sub-strands. I’m delighted about this because somehow I feel that Haskell is well suited to building robust, long-lived distributed systems – but I have literally zero front-line experience of building such systems, so it’s great to see a debate among people who really do know what they are talking about.
I have often had the experience of “let’s try doing X in Haskell” that started as just copying X, but because of the new context we ended up inventing really interesting new stuff. Take STM for example; GHC’s STM started as a rip-off of ideas in Java, but ended up with ‘retry’ and ‘orElse’, which are a qualitative step forward IMHO. So maybe – not necessarily, but maybe – we might find some additional leverage for parallel or distributed systems by re-imagining them in Haskell.
So Tim’s enterprise of refactoring/reimagining Cloud Haskell is a great one. It’d be great if everyone could join in to improve that design.
And ultimately there may well be primitives that GHC itself needs to support better, static pointers and closures being the obvious ones. So please do distil your conversations into specific design requests for GHC. The idea is that GHC should implement simple but powerful primitives that let you build amazing libraries on top.
The conversation has many strands. I wonder if it might be a good service for someone to summarise those separate strands on a wiki page somewhere? I can see:
Neel, Dimitrios and I are (slowly) thinking about the how-to-improve-static-pointers-and-closures piece. One thing that would help us would be small poster-child examples that demonstrate an existing problem: this should be easy, and today it isn’t. Edsko’s Haskell Exchange talk was great in that way, but I don’t want to over-fit to one example.
Thanks
Simon
From: Tim Watson <watson....@gmail.com>
Sent: 08 December 2018 21:59
To: Mathieu Boespflug <m...@tweag.io>
Cc: Gershom Bazerman <gers...@gmail.com>; Facundo Domínguez <facundo....@tweag.io>; Simon Peyton Jones <sim...@microsoft.com>; well-...@edsko.net; Neil Mitchell <ndmit...@gmail.com>; neelakantan....@gmail.com; dimi...@gmail.com;
parallel...@googlegroups.com
Subject: Re: Cloud Haskell, closures, and statics
On Sat, 8 Dec 2018, 18:10 Boespflug, Mathieu <m...@tweag.io> wrote: