Friends, esp Edsko, Tim, Facundo, Neil, Mathieu
Neel, Dimitrios, and I have started to think a bit more seriously about how static values and closures should work in Cloud Haskell and similar.
The distributed-closures library is a major inspiration.
Also relevant are various wiki pages: StaticPointers, an older version (?), and a bit about polymorphism.
We are very interested in things like
· What are the primary use-cases?
· Where does the shoe pinch? What is awkward or impossible?
· Any documents, blog posts, we should read?
Thanks!
Simon
Great talk Edsko! That does indeed help a lot. Thank you Alp.
Neel, Dimitrios: watch the talk!
Simon
Hi Simon,
The general drift in our experience over time was that allowing
arbitrary closures to be serialised, while convenient, exposes APIs
which aren't very stable and have too much power. That's fine for a
certain subset of problems, but for what we were using serialising
closures for, it started to get problematic and we moved to explicit
endpoints with real APIs. Perhaps that's a viewpoint unique to the
domain I was working in at the time, or perhaps that suggests that an
incremental path to/from serialised closures would help. I appreciate
that is probably of no use to how to evolve static pointers!
Thanks, Neil
| Most get by (just fine?) with API endpoints as Neil suggests
Can you just say a bit more about what an “API endpoint” means, concretely?
I believe it means something like:
Sorry to be so ignorant. A short cut might be to point to a canonical concrete example.
A big advantage of this approach is that it is fully language-independent: since it specifies the on-the-wire format, the client and server can be written in different languages.
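For concreteness, here is a sketch of such an endpoint in servant’s type-level DSL (the route name and payload types are invented for illustration):

    {-# LANGUAGE DataKinds, TypeOperators #-}
    module EndpointSketch where

    import Servant.API

    -- The route name and the JSON wire format are the whole contract:
    -- any client that speaks HTTP and JSON can call this endpoint,
    -- from any language.
    type SortAPI = "sort" :> ReqBody '[JSON] [Int] :> Post '[JSON] [Int]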
On the other hand, Erlang has a sophisticated failure model that allows a complex distributed system to recover robustly. It may be an outlier but I think Scala’s version (Akka) is pretty successful too. I have no idea of the tradeoff between the additional power of Erlang’s model vs the simplicity of a lowest-common-denominator. Someone must have written papers or blog posts about this!
Thanks
Simon
From: Boespflug, Mathieu <m...@tweag.io>
Sent: 26 November 2018 14:53
To: Tim Watson <watson....@gmail.com>
Cc: Neil Mitchell <ndmit...@gmail.com>; Simon Peyton Jones <sim...@microsoft.com>; parallel...@googlegroups.com; Facundo Domínguez <facundo....@tweag.io>; well-...@edsko.net; dimi...@gmail.com; neelakantan....@gmail.com
Subject: Re: Cloud Haskell, closures, and statics
And a viewpoint held pretty consistently across many sections of the industry. To be honest, Erlang is a bit of an outlier here. It's practically the only framework widely used across hundreds of companies in the industry that allows for shippable closures. Most get by (just fine?) with API endpoints as Neil suggests, provided there are good strategies in place to locate these endpoints and adequately describe the expected format of their inputs and outputs.
On the “same binary” question, here’s what I think:
· There should be no requirement to be running the same binary.
· When sending a static pointer, the implementation must send some bit-pattern on the wire; let’s call that the “static key”.
· We should assume the possibility of a “man in the middle” attack; so when receiving a static key you cannot assume that it is valid. The receiving end should never crash, regardless of how badly mangled the static keys are.
· It follows that
o   The receiver R may de-serialise free variables etc. from the message, using its own types. If the sender S was using different types, chances are the bit-pattern won’t parse, and we’ll fail during deserialisation (but not crash).
o   It’s possible that S will serialise some value (x :: T) into a bitstring that R successfully parses as a value of some totally different type (y :: T’). Then again, no seg-fault. If you are worried about this we could include some additional integrity information, such as the fingerprint of T. But the no-crash guarantee does not depend on this.
· The static key could be:
I’m sure that more variations are possible. The key thing is that there exist variants that are robust to recompilation. And indeed, the client and server could be running on different machine architectures!
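To make this concrete, a minimal sketch of one recompilation-robust key (the field names are invented, not GHC’s actual representation):

    import GHC.Fingerprint (Fingerprint)

    -- A symbolic name plus a fingerprint of the value's type.
    data StaticKey = StaticKey
      { skPackage :: String       -- e.g. a package name or unit id
      , skModule  :: String
      , skName    :: String
      , skTypeFp  :: Fingerprint  -- optional integrity check: guards
                                  -- against parsing x :: T as y :: T'
      }

    data SomeStaticPtr  -- placeholder for the implementation's dynamic wrapper

    -- Lookup on the receiving side must be total: an unknown or
    -- mismatched key yields Nothing rather than a crash.
    resolveKey :: StaticKey -> Maybe SomeStaticPtr
    resolveKey _ = Nothing  -- stub; a real static-pointer table lookup goes here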
I’m not quite sure what is implemented right now. It’d be good to decide what we want, and write down the plan, and then implement it.
Simon
From: Tim Watson <watson....@gmail.com>
Sent: 03 December 2018 00:49
To: Gershom Bazerman <gers...@gmail.com>
Cc: well-...@edsko.net; Neil Mitchell <ndmit...@gmail.com>; Neelakantan Krishnaswami <neelakantan....@gmail.com>; Simon Peyton Jones <sim...@microsoft.com>; Dimitrios Vytiniotis <dimi...@gmail.com>; Facundo Domínguez <facundo....@tweag.io>;
Mathieu Boespflug <m...@tweag.io>; parallel...@googlegroups.com
Subject: Re: Cloud Haskell, closures, and statics
That would be very good news indeed!
This is almost the current approach. We use a unit id instead of a package name.
Great
mkStaticPtrFingerprint :: Int -> Fingerprint
What is ‘n’?
Simon
From: Domínguez, Facundo <facundo....@tweag.io>
Sent: 06 December 2018 13:38
To: Simon Peyton Jones <sim...@microsoft.com>
Ah but that’s exactly what we /don’t/ want, since it’s vulnerable to recompilation wibbles.
Surely the name and type are sufficient? For reasons discussed earlier in the thread.
PS: How is the fingerprint of the type included? (I agree it’s not strictly necessary.)
For the reasons Neil mentioned upthread, static pointers are problematic for the very use case they were initially designed for (interacting services - let's call that distributed concurrency).
I think you mean this:
| The general drift in our experience over time was that allowing
| arbitrary closures to be serialised, while convenient, exposes APIs
| which aren't very stable and have too much power. That's fine for a
| certain subset of problems, but for what we were using serialising
| closures for, it started to get problematic and we moved to explicit
| endpoints with real APIs. Perhaps that's a viewpoint unique to the
| domain I was working in at the time, or perhaps that suggests that an
| incremental path to/from serialised closures would help. I appreciate
| that is probably of no use to how to evolve static pointers!
I didn’t really understand this. Erlang seems to make very effective use of distribution to make long-lived, highly-reliable distributed services. So I’m not ready to write off this class of applications. Are you sure we should?
Simon
I'm losing the connection between "the availability of static pointers" and "the use of bare send/receive". No one is arguing for the latter -- of course, powerful but hard-to-use primitives should be wrapped in libraries. But powerful primitives make better libraries possible!
I /think/ you may be saying that "distributed-process-supervisor" (a good library) doesn't need static pointers? It would be interesting to know what it does and how.
Let me have a guess:
| You both pointed to gen_server as a common pattern in Erlang. If I'm using
| that pattern, is it common to, from the local node, explicitly instantiate
| a gen_server with some handler function on a remote node (and therefore
| perform code shipping)?
I'm interested in this "code shipping" point, and I'd like to understand it better.
If I'm a client talking to a server I may want it to perform a variety of services. One way to do that is to communicate on a channel of type
server :: Chan Service
Then in the server I'm going to have a case statement to dispatch to the right service code
So in effect I have enumerated the functions I can call remotely (the Service type), and I send data, not code pointers, to invoke them. It's very like "de-functionalisation" of higher order programs, in which we replace functions by data, and have a dispatch function.
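A tiny sketch of that shape (the service names are invented, and Chan here is the ordinary in-process one, standing in for whatever typed channel the transport provides):

    import Control.Concurrent.Chan (Chan, readChan, writeChan)
    import Control.Monad (forever)
    import Data.List (sort)

    -- Each remotely invocable function becomes a data constructor,
    -- carrying its arguments and a reply channel:
    data Service = Sort [Int] (Chan [Int])
                 | Sum  [Int] (Chan Int)

    -- The dispatch loop: a case statement over the enumerated services.
    server :: Chan Service -> IO ()
    server requests = forever $ do
      req <- readChan requests
      case req of
        Sort xs reply -> writeChan reply (sort xs)
        Sum  xs reply -> writeChan reply (sum xs)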
I think this is what you mean when you say "expose the server through a well-defined API"?
This works fine, until I want to add a service. At that point I change the Service type, and all the clients need to be upgraded simultaneously. (Unless you very carefully design the serialisation protocol so that you can write a data value from a type with 3 constructors and read into a type with 4.) With static pointers you can send a static pointer to the service code; and when both client and server have been upgraded it'll work; until then it'll fail gracefully.
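One way to build in that slack, sketched with the binary package (the constructors and tag choices are invented): reserve a catch-all tag, so a reader degrades gracefully on a message from a peer whose type has an extra constructor:

    import Data.Binary (Binary (..), getWord8, putWord8)

    data Service = SortJob | SumJob | UnknownJob
      deriving Show

    instance Binary Service where
      put SortJob    = putWord8 0
      put SumJob     = putWord8 1
      put UnknownJob = putWord8 255
      get = do
        tag <- getWord8
        pure (case tag of
                0 -> SortJob
                1 -> SumJob
                _ -> UnknownJob)  -- a newer peer's tag: degrade, don't fail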
Perhaps these use-cases are not that important in practice. But thinking of it as defunctionalisation is helpful to me.
How hot code upgrades work is that you make changes to your code, compile the modules, and run a system upgrade. What that upgrade effectively does is ship new bytecode for all the changed modules out to all your nodes; then, at some point in the server loop, the OTP gen_ behaviour gets a message in its mailbox telling it that we're applying new code as part of a system upgrade. For a gen_server, the gen_ behaviour loop (the bit that controls the mailbox, which is equivalent to the Cloud Haskell code in ...) will call an upgrade function, giving you a chance to do any comparisons and/or work with the old state.
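A rough Haskell analogue of that hook (all names invented; Erlang's actual callback is code_change/3): the managed loop intercepts an Upgrade control message and hands the old state to a migration function before continuing:

    import Control.Concurrent.Chan (Chan, readChan)

    data Msg req = Request req | Upgrade

    -- 'migrate' plays the role of the upgrade callback: it sees the
    -- old state and returns the new one before the loop carries on.
    serverLoop :: (state -> state)           -- state migration on upgrade
               -> (req -> state -> IO state) -- ordinary request handler
               -> Chan (Msg req) -> state -> IO ()
    serverLoop migrate handle inbox st = do
      msg <- readChan inbox
      case msg of
        Upgrade     -> serverLoop migrate handle inbox (migrate st)
        Request req -> handle req st >>= serverLoop migrate handle inbox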
To me, the biggest barrier to rolling updates in Cloud Haskell is Binary serialization. If I want to extend an existing service so that it returns a new piece of data - I may have consumers of that service that don't need to care about the new field but still have to be upgraded simultaneously with the service in order to continue using it. Whereas if a serialization format such as Protocol Buffers is used, new "optional" data is ignored by old code. In Haskell this comes at a tremendous cost in boilerplate and usability, but I am intrigued by the idea of having pluggable serialization formats so that applications or even individual data types could make these trade-off decisions independently.
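That pluggable idea might look like this (a sketch; WireFormat is an invented class, not an existing library API):

    import Data.ByteString.Lazy (ByteString)

    -- Each message type chooses its own codec, so the compatibility
    -- trade-off is made per type rather than globally.
    class WireFormat a where
      encodeWire :: a -> ByteString
      decodeWire :: ByteString -> Either String a

    -- A Binary-backed instance gets compact encoding but strict
    -- compatibility; a protobuf-backed instance tolerates unknown new
    -- fields, at the cost of the boilerplate mentioned above.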
I'd like to also point out some counter-points to the idea that "Spark, but in Haskell" is a great idea and that it requires code shipping.
There is one glaring weakness in the Spark model of execution that is solved by separating code shipping from execution. Security is THE big issue with Spark.
... enter GDPR and other laws - you need to be able to document that you don't give out access to data needlessly. Now Spark starts to fail, because the computational model is "ship code". It is hard/slow/difficult to create a cluster from scratch, but that's what you need in order to enforce the security requirement that individual engineers have different access restrictions imposed on them.
Spark has no business enforcing access restrictions on external data. That's re-creating yet another security layer that needs to be managed. Rather, if the creation of a cluster-wide computation by using fresh processes allocated across a cluster using an existing cluster manager is cheap, you will get what you want - secure isolation of jobs.
The combination of not being able to enforce security policies on shared clusters, and not being able to dynamically increase/decrease the cluster size on a real cloud leads to very inefficient use of resources in Spark. What a sorting benchmark looks like in Spark doesn't matter at all compared to all the idle CPU cycles the wrong computational model entails.
You can say that these are orthogonal issues, but the ability to have a good *cluster agent library* and maybe a build system that can easily integrate with this library in order to quickly bootstrap a virtual cluster, scale it up and down based on what the app wants, and ship binaries, is a lot more useful in the real world where security must be managed and audited.
In this kind of setup, shipping code becomes mostly irrelevant as there is always an existing agent that can create a VM, a job, a process or similar, and that's the agent we need to interact with in a seamless manner, not build a competing agent within our own executable that should receive and execute closures.
Let me jump in a bit.
Theoretically, we don't have the same-executable restriction, though we do have the same library restrictions. If you compile a library that uses static pointers, ship it to the remote node, and load it by any means, then you can use static pointers across packages. All of this functionality is available in modern Haskell: we can build object files, send them remotely, and load and unload them. This means that if we have a static core and dynamic logic, we can have everything these days, though it requires some additional care.
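For reference, a sketch of that load-at-runtime step using the RTS linker interface in the ghci package (the file name and symbol name are invented, and error handling is omitted):

    import Foreign.Ptr (Ptr)
    import GHCi.ObjLink (ShouldRetainCAFs (..), initObjLinker,
                         loadObj, lookupSymbol, resolveObjs)

    -- Load an object file shipped by a peer and look up a closure in it.
    loadDynamicLogic :: IO (Maybe (Ptr ()))
    loadDynamicLogic = do
      initObjLinker RetainCAFs  -- initialise the RTS linker once
      loadObj "DynamicLogic.o"  -- the object file received over the wire
      ok <- resolveObjs         -- resolve its undefined symbols
      if ok
        then lookupSymbol "DynamicLogic_handler_closure"
        else pure Nothing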
Cloud Haskell really is designed this way: there is a static core, and then messages, a static dictionary, and serialisation for the code that does not depend on recompilation or library changes. You can think of the Erlang executable as the static core for Erlang, for instance.
This means that you can implement hot updates, reloading, and different executables, as long as they share the same library.
But let's look a bit more at the hotfix update example. We may have two indistinguishable cases:
1. the user found a bug and wants to ship and run an updated version of the symbol;
2. the user has the wrong library and calls an incompatible version of the symbol.
If we use more stable names for static pointers, we can easily fall into the trap of the latter case. The current solution rules that out by generating a fresh name each time, but this is not very convenient for a user who wants more stability in naming.
However, I would argue that allowing the first use case is very application-specific and can be handled at the application layer. A few options are possible:
1. run static pointer resolution first, when nodes start to talk;
2. use more stable names associated with data or methods and resolve them locally.
I don't think either solution is strictly better than the other, and they can coexist.
> This means that you can implement hot updates, reloading, and different executables, as long as they share the same library.

We really need to document that this is possible then, as per Gershom's point. Even recent talks, e.g. the one at the top of this thread by Edsko, mention that this is a risky proposition.
Very interesting perspective, Alexander!

At BT we have a huge data lake, and we are also massively impacted by GDPR. It's an issue we've solved through architecture governance and design assurance, rather than by applying specific tactics against our information systems or technology architectures. But still, this is an interesting point that I'd like to address.
This feels like a rather contrived situation, but I can appreciate what you're saying.
Well yes, you could just use https://github.com/weaveworks/weave and spin things up in a fresh environment, sizing as you please. Or a million other approaches.
I feel that I might be derailing the discussion, and I am not presenting strong arguments against shipping closures. However you have been discussing the "developer experience", and as a product feature, I don't think it's as important as great integration with orchestration systems.
Assuming a great orchestration system is already available opens up new points in the design space, and then it might be possible to innovate instead of imitate.
Btw, another huge issue with Spark is the caching layer. As it doesn't provide proper security, the default mode of operation on Amazon for example is to ship data to and from S3 which is enormously inefficient compared to a secure caching layer. This is another performance hit taken because the architecture (IMO) is wrong.
Some context - my experience is from an organization where there are maybe 40 subsidiaries - wildly varying in size, technical competencies and preferences, and thus very different requirements - and those always changing. There are legally multiple controllers (in a GDPR sense) in this environment, so enforcing compliance through policy and procedures is not an option.
> Well yes, you could just use https://github.com/weaveworks/weave and spin things up in a fresh environment, sizing as you please. Or a million other approaches.

Yes, and today container orchestration is a part of the stack, just like TCP/IP or VMs. So it's beneficial to take a stand and make this explicit. Unlike when Akka, Spark etc. were designed, today we can assume that some set of orchestration APIs similar to what Kubernetes provides is available.
So for example instead of:
"Given a cluster of servers, our wonderful system can utilize them to the max by .. <magic tech>"it's more like:"Given an orchestration system, our wonderful system will automatically express priority, deployment preferences, latency requirements, security config and emit data and code containers that works well with your existing orchestration system using.. <magic tech>"
We can assume a lot more, so we should be able to simplify and simultaneously improve on the spark model.
Part of the assumption is that if the system is able to produce a container, then deployment and scaling is trivially done for us. We can assume that we can encapsulate data and have it be distributed along with our code, but also securely shared with other tenants as a caching layer. We can assume that the orchestration layer can atomically switch code for us so we don't need to be binary compatible, or not switch atomically but just ensure that we can discover all components we need.
For a REPL use-case (like a Spark REPL), we could think in terms of creating a container, deploying it, and tearing it down for every expression typed into the REPL (should be <3s overhead at the 99th percentile on Kubernetes, for example). We could encapsulate data (like RDDs in Spark) as separate objects and gain superior caching and security, and use POD abstractions and local communication with simpler failure modes (I think this is part of what you suggested).
We can simplify by assuming an orchestration API. Don't distribute code, that's the job of the orchestration system - just produce containers quickly.
Don't try to utilize the cluster to the fullest, try to express the intent and priority of the current task and let the external scheduler figure it out.
We can actually assume something more specific than just any orchestration API, since Kubernetes won (https://twitter.com/redmonk/status/1000066226482221064). This also simplifies a lot.
This isn't just limited to the Spark use-case. If there's a solid framework for this sort of stuff, normal cabal or stack builds should likewise be accelerated using Cloud Haskell.
I do most of my Haskell builds on a cloud VM, and I have a spare Kubernetes cluster lying around. If I don't, then I can build one in a minute or so on all major clouds.
Again, this is a bit orthogonal to code shipping, but I think it's core to the update of Cloud Haskell.
On Sat, 8 Dec 2018, 15:08 Alexander Kjeldaas wrote:

> I feel that I might be derailing the discussion, and I am not presenting strong arguments against shipping closures. However you have been discussing the "developer experience", and as a product feature, I don't think it's as important as great integration with orchestration systems.

Okay, well if we take that on board, then let's discuss what it means to prioritise one thing over the other. Also, I'm going to create a ticket on the distributed-process issue tracker so we can take this particular conversation offline, if people feel that's the right thing to do (I personally don't mind either way).
> I do think there's an important point gestured at here though, which
> is actually the reason that even in the nicest version cloud haskell
> wouldn't make sense in my work project -- we have a very polyglot
> environment, and so necessarily had to pick a single substrate that
> can interop between a variety of languages and runtimes.
Yes, precisely. In my experience this isn't bad luck. It's the common,
and even desirable, reality. In the use case you describe above, I
would prefer a design that doesn't force the same implementation
language everywhere. That makes static pointers a poor fit except
where tight coupling is desirable, like in the definition of a
parallel computation on bulk data. But I understand that others prefer
different architecture styles.
Wow – this is a very rich conversation with many sub-strands. I’m delighted about this because somehow I feel that Haskell is well suited to building robust, long-lived distributed systems – but I have literally zero front-line experience of building such systems, so it’s great to see a debate among people who really do know what they are talking about.
I have often had the experience of “let’s try doing X in Haskell” that started as just copying X, but because of the new context we ended up inventing really interesting new stuff. Take STM for example; GHC’s STM started as a rip-off of ideas in Java, but ended up with ‘retry’ and ‘orElse’, which are a qualitative step forward IMHO. So maybe – not necessarily, but maybe – we might find some additional leverage for parallel or distributed systems by re-imagining them in Haskell.
So Tim’s enterprise of refactoring/reimagining Cloud Haskell is a great one. It’d be great if everyone could join in to improve that design.
And ultimately there may well be primitives that GHC itself needs to support better, static pointers and closures being the obvious ones. So please do distil your conversations into specific design requests for GHC. The idea is that GHC should implement simple but powerful primitives that let you build amazing libraries on top.
The conversation has many strands. I wonder if it might be a good service for someone to summarise those separate strands on a wiki page somewhere? I can see:
Neel, Dimitrios and I are (slowly) thinking about the how-to-improve-static-pointers-and-closures piece. One thing that would help us would be small poster-child examples that demonstrate an existing problem: this should be easy, and today it isn’t. Edsko’s Haskell Exchange talk was great in that way, but I don’t want to over-fit to one example.
Thanks
Simon
From: Tim Watson <watson....@gmail.com>
Sent: 08 December 2018 21:59
To: Mathieu Boespflug <m...@tweag.io>
Cc: Gershom Bazerman <gers...@gmail.com>; Facundo Domínguez <facundo....@tweag.io>; Simon Peyton Jones <sim...@microsoft.com>; well-...@edsko.net; Neil Mitchell <ndmit...@gmail.com>; neelakantan....@gmail.com; dimi...@gmail.com;
parallel...@googlegroups.com
Subject: Re: Cloud Haskell, closures, and statics
On Sat, 8 Dec 2018, 18:10 Boespflug, Mathieu <m...@tweag.io> wrote: