1. Actors are utterly based on side-effects. When we say that
"everything is asynchronous", keep in mind that implies every
operation has Unit return type. If operations return no information,
then they can only act via side-effects. We've thrown the central
tenet of functional programming out of the window. Do we get something
as powerful in compensation? I can't see it.
(Further material
[http://pchiusano.blogspot.com.au/2010/01/actors-are-not-good-concurrency-model.html],
[http://nescala.org/2011/#mealy])
2. Distributed systems are Actor systems, and they are not easy to
build, operate and (especially) diagnose. Many multi-node distributed
enterprise systems, such as those I've worked on at REA, Goldman Sachs JB
Were, Sensis and Intamission, conform to the Actor model. No memory is
shared, the nodes communicate via message passing. Are they easy to
build or support? No. (So much so that Martin Fowler coined his "First
Law of Distributed Object Design": Don't distribute your objects!) Are they free of deadlocks
and race conditions? No. Do they scale easily? Not especially. Do they
"compose" easily? In my experience, rather, most distributed systems
"decompose" easily.
3. Actor people talk techno-babble. In most actor videos/slide decks
I've seen, the emphasis is on low-level technical details of actors:
thread pooling, message delivery, supervisors, failover. Carl Hewitt's
talk is a great example. But where is the coherent big picture? Simply,
how do I map my business problem onto Actors?
4. Hype of Actors, e.g. the mythical 9 nines. There's this claim
circulating the internet about how Erlang achieved 99.9999999% (9 9s)
uptime via Actor based systems. Anyone heard that one? Eg, it made an
appearance at YOW last December, or here
[http://twitter.com/#!/JelleVictoor/status/137115913790038016].
Here's Joe Armstrong's response (creator of Erlang), when asked about
it in an SERadio interview
[http://www.se-radio.net/2008/03/episode-89-joe-armstrong-on-erlang/]:
"That figure of 9 9s is kind of bloated around on a lot of blogs...it
doesn't represent an average behaviour, it represents a best case
behaviour, that was observed once, and British Telecom reported it to
us.....Was it a systematic study? No, it's more apocryphal than hard
science"
Apocryphal: "of doubtful authorship or authenticity".
Maybe some hands-on time with Akka will change my mind...
-Ben
PS For perspective, there are plenty of software movements that seem
promising, but in the fullness of time don't have the impact they
expected:
- Aspect oriented programming. IIRC, AOP was included in MIT 1997
Technology Review's "Top 10 Technologies that will change the world".
- Tuplespaces and Javaspaces: From the same period, there was a
plausible sounding book about how Javaspaces were the answer to
distributed systems
[http://java.sun.com/developer/Books/JavaSpaces/introduction.html].
There are quite a lot of "Actor skeptics" around the place -- just have
to ask them. To me, it's a rather mundane observation. I wonder if that
is why Paul has bothered to speak up.
--
Tony Morris
http://tmorris.net/
On 2012-04-15 10:21 , Ben Hutchison wrote:
> 1. Actors are utterly based on side-effects. When we say that
> "everything is asynchronous", keep in mind that implies every
> operation has Unit return type. If operations return no information,
> then they can only act via side-effects. We've thrown the central
> tenet of functional programming out of the window. Do we get something
> as powerful in compensation? I can't see it.
In distributed systems, one fact-of-life that can't be ignored is
"partial network failure" or "network partition".
If you have two (or more) machines that are cooperating to perform some
computation, then the situation where both machines are still operating
and have become disconnected for some unpredictable period (perhaps
forever) ... needs to be dealt with.
I presume this is analogous to a function being called and *never*
returning. I don't mean ... returns with some type like Unit, but it
truly never returns. I'd be interested to hear from knowledgeable
functional programmers what is an appropriate (i.e. correct) solution in
that case.
For all the flaws attributed to the Actor model (some of which were just
straw man arguments) ... the Actor model does provide an approach for
that important distributed systems case. Especially, when combined with
"leasing" (typically for robust distributed garbage collection).
- - - - - - - - - - - - - - - - -
At this point in time, perhaps there isn't just one known model of
computation that is always superior in all possible situations.
For the time being (until there is some successful grand programming
unification theory) ... then some awkward, uneasy combination of
computational models will be required when dealing with distributed
systems. Somewhat like the dilemma facing physicists who have to deal
with quantum mechanics and gravity ... whilst still continuing to chase
their holy grail of unification.
- - - - - - - - - - - - - - - - -
At the risk of being sternly re-educated (hi Tony !) ... is it safe to
suggest that a central tenet of functional programming is not the
"elimination of side-effects", but rather a formal, strict and clear
"management of side-effects" ?
After all, an application that has zero side-effects ... has achieved
nothing-at-all (once it halts).
So ... is there some appropriate functional programming approach to
dealing with the types of side-effects, due to network partition, which
must not be ignored in distributed systems ?
- - - - - - - - - - - - - - - - -
If you only have two choices: asynchronous or synchronous ... then, the
only known robust way to deal with the network partition problem is the
asynchronous approach.
If you also wish to maintain a predominately functional programming
approach in a distributed system, then I believe that you will be forced
to consider some mixture of asynchronous and synchronous design. Of
course, that is a huge design mismatch ... and a considerable source of
errors lying-in-wait.
I'd suggest that if you consider the end-to-end control path in such a
mixed approach ... then it is critical to have synchronous design either
at the edges of your network ...
Note: The symbol "<-->" indicates some form of communication between two
disparate machines on your network ... and the symbol "|" indicates some
form of communication within the same process on the same machine.
S|A <--> A <--> A <--> A|S
... or synchronous design encapsulated by asynchronous network protocols ...
A <--> A|S|A <--> A
... where the synchronous communications must never span two different
machines (or even processes within the same machine).
- - - - - - - - - - - - - - - - -
In summary, I'd absolutely recommend that "everything over the network
is asynchronous".
Which is a loose way of saying that "every communication between
divisible computational engines must be asynchronous" ... or ... "only
communication within an indivisible computational engine may be
synchronous".
Where typically the "indivisible computational engine" is an operating
system process, e.g. a JVM.
--
-O- cheers = /\ /\/ /) `/ =
--O -- http://www.geekscape.org --
OOO -- an...@geekscape.org -- http://twitter.com/geekscape --
This is sometimes called a coprogram -- the dual of a program (which
does return).
> For all the flaws attributed to the Actor model (some of which were just
> straw man arguments) ... the Actor model does provide an approach for
> that important distributed systems case. Especially, when combined with
> "leasing" (typically for robust distributed garbage collection).
>
> - - - - - - - - - - - - - - - - -
>
> At this point in time, perhaps there isn't just one known model of
> computation that is always superior in all possible situations.
There are models that are strictly superior to others though -- the
actor model as it stands is vastly inferior.
> For the time being (until there is some successful grand programming
> unification theory) ... then some awkward, uneasy combination of
> computational models will be required when dealing with distributed
> systems. Somewhat like the dilemma facing physicists who have to deal
> with quantum mechanics and gravity ... whilst still continuing to chase
> their holy grail of unification.
>
> - - - - - - - - - - - - - - - - -
>
> At the risk of being sternly re-educated (hi Tony !) ... is it safe to
> suggest that a central tenet of functional programming is not the
> "elimination of side-effects", but rather a formal, strict and clear
> "management of side-effects" ?
Sure, if you ask for it.
"Management of side-effects" in such a way where "management" is a very
rigorous discipline and not any handwavy notion.
> After all, an application that has zero side-effects ... has achieved
> nothing-at-all (once it halts).
False. Here is a program with no side-effects and achieves something:
main = putStrLn "hi"
Understanding this undeniable, truly demonstrably true fact is really
where the "re-education" opportunity is. Just requires a lot of
adjustment of the "fundamentals."
--
Tony Morris
http://tmorris.net/
On 2012-04-17 12:52 , Tony Morris wrote:
> There are models that are strictly superior to others though -- the
> actor model as it stands is vastly inferior.
For the domain of distributed systems, would you mind suggesting a
superior model to the actor model ?
> "Management of side-effects" in such a way where "management" is a very
> rigorous discipline and not any handwavy notion.
Agreed. That is definitely what I meant by "formal and strict".
Andy wrote:
>> After all, an application that has zero side-effects ... has achieved
>> nothing-at-all (once it halts).
Tony wrote:
>> False. Here is a program with no side-effects and achieves something:
>> main = putStrLn "hi"
When that Haskell program is compiled and run ... it results in
characters "hi" being sent to the terminal console. Why isn't that a
side-effect ... given that a state change (of the console) occurred
outside of any value returned by a function ?
FYI. This is interesting ... on Mac OS X, using "ghc --make test.hs",
to compile the Haskell source into an executable resulted in it being
756,484 bytes long. The traditional C "hello world" executable that
achieved a similar effect was 8,688 bytes long.
>> For all the flaws attributed to the Actor model (some of which were just
>> straw man arguments) ... the Actor model does provide an approach for
>> that important distributed systems case. Especially, when combined with
>> "leasing" (typically for robust distributed garbage collection).
>>
>> - - - - - - - - - - - - - - - - -
>>
>> At this point in time, perhaps there isn't just one known model of
>> computation that is always superior in all possible situations.
>
> There are models that are strictly superior to others though -- the
> actor model as it stands is vastly inferior.
In the case of distributed systems where node/network failures are
unavoidable, I feel asynchronous designs - hence the actors model -
are more suitable than fully synchronous ones.
And when you look at these as event driven systems, asynchronous
designs start to feel more natural (subjective, I admit), even if they
could be more complex to implement.
What models do you have in mind that are superior?
Not sure if it's strictly a 'model' but the only other design I can
think of would be based on pseudo-synchronous calls that time out.
>> After all, an application that has zero side-effects ... has achieved
>> nothing-at-all (once it halts).
>
> False. Here is a program with no side-effects and achieves something:
> main = putStrLn "hi"
I think this would depend on your definition of side-effects.
For most people, this causes something to happen, thus has a side-effect.
Perhaps it's being (un?)able to explicitly state that a function has a
non return-value-based side-effect that's the issue here?
my 2c
King
The best way to start answering these hard questions, is to fix the easy
ones. See below.
>
>>> After all, an application that has zero side-effects ... has achieved
>>> nothing-at-all (once it halts).
>> False. Here is a program with no side-effects and achieves something:
>> main = putStrLn "hi"
> I think this would depend on your definition of side-effects.
Of course it does. Thankfully, there is a very rigorous, well-defined,
undeniable definition of "side-effect."
> For most people, this causes something to happen, thus has a side-effect.
Luckily, this is not true. If I take the subset of programmers that I
know who also know what side-effect means ("most" of them, perhaps a
hundred or so), then *none* of them believe the aforementioned program
(Z) is side-effecting. Not just not most, but *none*. This might seem
quite remarkable, except of course, the observation is subject to our
selection bias.
So let's put it like this: there does not exist a person on this planet
who knows what side-effect means and also agrees that Z is
side-effecting. This is my personal challenge to anyone -- find one
person who knows what side-effect means who also agrees that Z
side-effects; I put it to you that you will fail in this endeavour. Now,
this might be all confrontational, but it's not intended to be -- it's
to provoke the possible introspective thought of, "gee, maybe I am
really wrong and there is some principled thought to be applied here",
the challenge of teaching.
Even if "most people" did believe something, this doesn't make it true
or even hint that it is true. Thankfully, neither the premise (many
people believe proposition Q) nor the conclusion (Q is true) is true.
Both are false.
The obvious question is then, what does side-effect mean?
It means that I can take an expression and substitute it with its value
and observe no program outcome change for any given program.
You will probably agree with this:
Let program P1 be:
p1 = let x = [()] in x ++ x
Let program P2 be:
p2 = [()] ++ [()]
There is no observable difference between the outcomes of programs P1
and P2. All I did here was replace the expression x with its value [()]
in a program, and I observed no difference. Does there exist a
program for which there is an observable difference? You're free to
fiddle with program P2 all you like, and the moment you come up with the
answer yes, or you prove that you cannot come up with such a program, is
exactly the moment you have answered the question, "is the expression
[()] side-effecting?", with the precise same answer.
Now I just gave you a mundane example, using list values and what-not.
In my experience, people who have been trained in languages that force
you inside "the IO interpreter", also have their mental models trapped
inside the same environment. For some reason, enormously
disproportionate bias perhaps, IO is given special attention. Notice how,
inside "the [] interpreter", you thought the example
was mundane and uninteresting and obvious? That's exactly the same
"feeling" that many others (those who I happen to know well) get when we
look at values inside IO. In other words, Z (main = putStrLn "hi") is
obviously, mundanely, uninterestingly not side-effecting.
So why then am I writing this email? Well, it's a personal indulgence --
it turns out that some people don't know that! Furthermore, if they did
come to know that, then I believe we could make significantly better
progress in computing -- for example, I could start having an
interesting discussion with you about actors and distributed computing.
I'd like to have those discussions, but I think retraining on the
fundamentals is required first. I digress.
Let's apply the same reasoning to this program:
main = putStrLn "hi"
I am going to come up with a program in which I can substitute the
expression with its value:
hi = putStrLn "hi" >> putStrLn "hi"
and substitute:
hi' = let x = putStrLn "hi" in x >> x
Is there an observable difference between the values hi and hi'? The
answer is no. Is there a program that can be constructed, and after
substitution, we can observe a difference? The answer is no, therefore
(and only therefore), this program is *not* side-effecting.
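For readers more at home in Scala, the substitution test can be sketched with a toy IO type. This is only an illustration under stated assumptions: the IO class, unsafeRun, putStrLn, say and the in-memory console buffer are names made up for this sketch. They are not Haskell's IO and not any library's API. The sketch tries to show that hi and hi2 are interchangeable values, whereas giving a name to the result of a directly effectful call changes what the program does.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical toy type: an IO[A] only describes an action.
// Nothing happens until unsafeRun is applied.
final case class IO[A](unsafeRun: () => A) {
  // Sequence two descriptions into one bigger description.
  def >>[B](next: IO[B]): IO[B] = IO(() => { unsafeRun(); next.unsafeRun() })
}

object SubstitutionDemo {
  // Stand-in for the terminal, so the output can be inspected.
  val console = ListBuffer.empty[String]

  def putStrLn(s: String): IO[Unit] = IO(() => { console += s; () })

  // The expression written out twice ...
  val hi: IO[Unit] = putStrLn("hi") >> putStrLn("hi")
  // ... and the same program after naming the expression once.
  val hi2: IO[Unit] = { val x = putStrLn("hi"); x >> x }

  // Contrast (an addition for this sketch): a call that performs its effect
  // immediately does not survive the same substitution.
  def say(s: String): Unit = { console += s; () }
  def twiceInline(): (Unit, Unit) = (say("hi"), say("hi"))
  def twiceNamed(): (Unit, Unit) = { val y = say("hi"); (y, y) }

  // Run something against an empty console and report what was written.
  def outputOf(run: () => Any): List[String] = {
    console.clear()
    run()
    console.toList
  }
}
```

Running hi and hi2 through outputOf gives the same two lines, while twiceInline and twiceNamed differ, which is the observable difference the substitution test looks for.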
Now, you might try and handwave away, saying I moved the goal posts or
something. But I remind you, it is mundanely, uninterestingly true that
Z does not side-effect. IO deserves no special attention above any other
"EDSL." That you (anyone, metaphorical you) might be compelled to
continually attribute this undeserved attention is beyond my control --
it is a bias that I can only sit here and wait for it to subside.
You might start talking about "terminals" and "networks", in an effort
to convince me that I may have just slipped up back there. But I am just
going to start talking to you about typewriters, spaghetti and jelly
sandwiches, just to join in the indulgence of discussing irrelevant
things. I am not trying to be a smart-arse; I am attempting to highlight
just how irrelevant these things are -- *completely* irrelevant. Not
"loosely related", not "surely you can't deny", not anything like that,
but completely unrelated. In an effort to further highlight the absence
of relation, I will simply shift the context to some other scenario,
which you might already accept as mundane and uninteresting, then make
the same argument back at you and watch you reel in surprise at how
absurd that argument is -- that absurd.
Hope this helps. It's a difficult subject to teach.
> Perhaps it's being (un?)able to explicitly state that a function has a
> non return-value-based side-effect that's the issue here?
>
> my 2c
>
> King
>
On 17 April 2012 13:24, Andy Gelme <an...@geekscape.org> wrote:
> FYI. This is interesting ... on Mac OS X, using "ghc --make test.hs",
> to compile the Haskell source into an executable resulted in it being
> 756,484 bytes long. The traditional C "hello world" executable that
> achieved a similar effect was 8,688 bytes long.
Link it dynamically: ghc --make -dynamic test.hs - came to 16K on my Ubuntu box. Your C compiler also probably does some dynamic linking and gets a small on-disk footprint with that.
In general, the higher footprint is the price you pay for including things like a garbage collector and all the other nice things Haskell gives you that C does not.
Oh dear, I think I'm about to put my head in the dragon's mouth, but
here goes ...
On 2012-04-17 14:53 , Tony Morris wrote:
> The obvious question is then, what does side-effect mean?
>
> It means that I can take an expression and substitute it with its value
> and observe no program outcome change for any given program.
I believe you have provided us with the exact definition of "referential
transparency".
An absence of "side effects" is a necessary (and by itself insufficient)
precondition for achieving referential transparency. However, a side
effect is a different concept from referential transparency ... if I
understand correctly.
> I am just going to start talking to you about typewriters, spaghetti and jelly
> sandwiches
Mmmmm ... "spaghetti and jelly sandwiches" ... nom, nom !
--
You received this message because you are subscribed to the Google Groups "Melbourne Scala User Group" group.
To post to this group, send an email to scala...@googlegroups.com.
To unsubscribe from this group, send email to scala-melb+...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/scala-melb?hl=en-GB.
We have 2 choices here:
(a) acknowledge the failure in the type of the function, by returning
something like an Either[Result, Error] or an Akka Future (which has
an error case).
(String) => Int becomes (String) => Future[Int] or (String) =>
Either[Int, Error]
(b) treat the failure as being "outside" our application domain and
ignore it. Let it show up as a runtime TimeoutException, that might be
fatal, or at least tear down whatever computation was in flight
"Correctness" depends, of course, on whether you are writing a quick script
or a reactor control system, and whether you can usefully proceed in
the error case anyway.
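A minimal sketch of choice (a), assuming a remote call that is represented as a Future and may never complete. The names awaitEither, CallError, TimedOut and Failed are hypothetical and exist only for this illustration. The Either keeps the ordering used in this thread (result on the left, error on the right), although many Scala codebases put the error on the left instead.

```scala
import java.util.concurrent.TimeoutException
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.duration._
import scala.util.control.NonFatal

object FailureInTheType {
  // Hypothetical error type for this sketch.
  sealed trait CallError
  case object TimedOut extends CallError
  final case class Failed(cause: Throwable) extends CallError

  // Choice (a): the signature admits that the call may fail or never return.
  // Blocking with Await is used here only to keep the sketch short.
  def awaitEither[A](call: Future[A], limit: FiniteDuration): Either[A, CallError] =
    try Left(Await.result(call, limit))
    catch {
      case _: TimeoutException => Right(TimedOut)
      case NonFatal(e)         => Right(Failed(e))
    }

  // A call that never returns: its promise is never completed.
  val neverReturns: Future[Int] = Promise[Int]().future
  // A call that has already returned.
  val returns: Future[Int] = Future.successful(42)

  val limit: FiniteDuration = 50.millis
}
```

With these definitions the caller of awaitEither has to handle the TimedOut case explicitly, whereas under choice (b) the TimeoutException thrown by Await.result would simply propagate.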
> Especially, when combined with
> "leasing" (typically for robust distributed garbage collection).
+1. I'm with you on the goodness of leasing.
> If you only have two choices: asynchronous or synchronous ... then, the
> only known robust way to deal with the network partition problem is the
> asynchronous approach.
We build all synchronous systems over asynchronous substrates. Network
packets are purely asynch. Synchronicity is an illusion, but one that
is necessary to declaratively express program intent.
For the last 10 years, the illusion has typically been achieved by
blocking threads. It's now realised that this performs poorly. But let's find
a better way to maintain the illusion, rather than try to be rid of
synchronicity. We cannot be rid of it; we'll just keep re-inventing
it. If my program depends on the output of a function to proceed, no
amount of hand-wringing will remove that dependency.
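One way the illusion can be kept without parking a thread per call is to sequence asynchronous results in a for-comprehension. The sketch below uses the standard library's scala.concurrent.Future. The functions lookupPrice and applyTax and their numbers are invented for the example, and the global execution context is an assumption made for brevity.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object SequencedFutures {
  // Hypothetical asynchronous steps; each returns immediately with a Future.
  def lookupPrice(item: String): Future[Int] =
    Future { if (item == "book") 20 else 5 }

  def applyTax(price: Int): Future[Int] =
    Future { price + price / 10 }

  // Reads like synchronous code: the second step depends on the output of the
  // first, yet no thread is blocked while waiting for it.
  def total(item: String): Future[Int] =
    for {
      price   <- lookupPrice(item)
      withTax <- applyTax(price)
    } yield withTax
}
```

The dependency between the two steps is still there, as the paragraph above argues. It is expressed in the types and the comprehension instead of in a blocked thread. Only the outermost caller, such as a test or a main method, would need something like Await.result(SequencedFutures.total("book"), 5.seconds).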
> If you also wish to maintain a predominately functional programming
> approach in a distributed system, then I believe that you will be forced
> to consider some mixture of asynchronous and synchronous design. Of
> course, that is a huge design mismatch ... and a considerable source of
> errors lying-in-wait.
I think it will be possible to do distributed functional programming
just fine. No mismatch.
Yes, the underlying messaging substrate will use asynch and probably
actors. But it is important to see these as the building blocks only.
Not the end game. We should use FP on top IMO.
My money is on:
http://doc.akka.io/docs/akka/2.0.1/scala/typed-actors.html
http://doc.akka.io/docs/akka/2.0/scala/dataflow.html
http://msdn.microsoft.com/en-us/library/hh191443(v=vs.110).aspx
Gotta go - school pickup time...
-Ben
Coincidentally, the language I used earlier, called SafeHaskell, has no
potential to define side-effects. If I was to show you a side-effect,
I'd need to use a different programming model altogether. In other
words, not only is the program I gave most explicitly not
side-effecting, it is not possible to give a side-effecting program
without stepping outside of the language.
Again, to emphasise, IO is related to side-effects exactly as much as [],
State s, ReaderT f a, Maybe and a whole list of completely unrelated
type constructors or "EDSLs" if you prefer. That is, completely and
absolutely and most emphatically unrelated.
On 2012-04-17 15:09 , Matthew Moloney wrote:
> Quick question: How do I send bytes across the network without side
> effects ?
If you have Ben's and Tony's perspective of Functional Programming on
top of asynchronous Distributed Systems design ... then, perhaps someone
who has experience with Functional Reactive Programming or Arrows might
have something helpful to say ?
However, I'll defer to greater expertise for providing the correct or
better answer in that situation (FP on top of DS).
- - - - - - - - -
Personally, I believe that any non-trivial distributed system will
involve the careful (strict / formal) management of side-effects.
Ben wrote:
> Yes, the underlying messaging substrate will use asynch and probably
> actors. But it is important to see these as the building blocks only.
> Not the end game. We should use FP on top IMO.
Tony wrote:
> Mine too.
Ah, I don't see it as a matter of "FP on top of DS" or conversely "DS on
top of FP" ... but, rather "FP alongside DS" as design principle
peers in the overall system/network design.
I realize that traditionally the network layers (just like the database
and all other I/O layers, e.g. console or file) are considered as
libraries that we build applications on top of. However, a network of
computers provides failure modes that just can't be wall-papered over
(by a layer of synchronicity).
Whatever your DS building blocks are, e.g. actors or something else ...
the nature of networks and failure will fundamentally affect the overall
design of your whole system (you just can't stick networking in a black
box and forget about the ramifications). I believe it is a mistake to
consider distributed systems design as something you can restrict to be
a "building block" ... if you also wish to have a robust and secure
system as a result.
When you are thinking about the application logic and writing functions
... then, sure FP will dominate. But, you can't ignore the design
requirements of the interconnections between your FP nodes of the
network. When you are thinking about the networking protocols ... then,
sure DS will dominate and everything will be asynchronous. But, you
can't ignore the design requirements of the message contents
sent/received across the network by the FP nodes.
> We need to start from an appropriate distributed computing model first
> though.
I completely agree with that goal.
I would disagree with broad suggestions that you can limit the effects
of distributed computing to just a layer or building block in the
overall system design.
Tony: It would be good to hear what you suggest as a superior approach
to the actor model for distributed systems ?
> Have you tried "tacking on" expression-based programming when you
> start with actors ?
I would strongly recommend against such an approach. A long time ago, I
was in such a situation ... and it is clearly the wrong path to take.
On 17/04/12 15:24, Ben Hutchison wrote:
> We should use FP on top IMO.
>
Mine too. We need to start from an appropriate distributed computing
model first though. Have you tried "tacking on" expression-based
programming when you start with actors?
On 2012-04-17 16:24 , Matthew Moloney wrote:
> Hi Andy, my question was in jest ;)
Yes, that was clear. But, it was also an interesting question (if not
taken literally) !
Hopefully, this is all shaping up to be a really interesting next Scala
Users Group meeting :)
Hi Andy, my question was in jest ;)
> If you have Ben's and Tony's perspective of Functional Programming on
> top of asynchronous Distributed Systems design ... then, perhaps someone
> who has experience with Functional Reactive Programming or Arrows might
> have something helpful to say ?
>
> However, I'll defer to greater expertise for providing the correct or
> better answer in that situation (FP on top of DS).
Curiously, if you want to do distributed computing in Haskell today, then the best choice is probably "Cloud Haskell" (bad name), which is highly inspired by Erlang:
http://research.microsoft.com/en-us/um/people/simonpj/papers/parallel/remote.pdf
There is also a Haskell interface to MPI, which focusses on high-performance computing, and is fairly similar to the actors model:
http://themonadreader.files.wordpress.com/2011/10/issue19.pdf
There have been various experimental parallel Haskells in the past, some of which had distributed components, but none particularly successful.
Cheers,
Bernie.
It's possible to type-check actor messages. We wrote that in scalaz in 2008. Functional Java too, just for giggles. It requires a shift from the typical actor implementation of e.g. Erlang. Pi-calculus is great.
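As a rough idea of what type-checked messages can look like, here is a minimal sketch in which the actor is parameterised by the message type it accepts. It is not the scalaz or Functional Java implementation. Actor, drain, CounterMsg and the single-threaded mailbox are assumptions made for illustration only.

```scala
import java.util.concurrent.ConcurrentLinkedQueue

object TypedMailbox {
  // Hypothetical minimal actor: the type parameter M fixes which messages it accepts.
  final class Actor[M](handler: M => Unit) {
    private val mailbox = new ConcurrentLinkedQueue[M]()

    // Sending is fire-and-forget and returns Unit.
    def !(msg: M): Unit = { mailbox.add(msg); () }

    // Process everything queued so far on the calling thread. A real actor
    // runtime would schedule this work on a thread pool instead.
    @annotation.tailrec
    def drain(): Unit = Option(mailbox.poll()) match {
      case Some(msg) => handler(msg); drain()
      case None      => ()
    }
  }

  // The protocol of a small counter actor, as a closed set of messages.
  sealed trait CounterMsg
  case object Increment extends CounterMsg
  final case class Add(n: Int) extends CounterMsg

  var total = 0
  val counter = new Actor[CounterMsg]({
    case Increment => total += 1
    case Add(n)    => total += n
  })

  // counter ! "hello"   // rejected at compile time: a String is not a CounterMsg
}
```

Sending Increment and then Add(41), followed by drain(), leaves total at 42. Because the message type is sealed, the compiler can also warn when the handler forgets a case.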
> Taking hardware into account, there is a very interesting network topology called the 3D torus which can improve performance ~100x Hadoop.
Incidentally, the BlueGene/P supercomputer is configured in a 3D torus, and the next version, the BlueGene/Q, is configured in a 5D torus (which is challenging to visualise).
I guess it comes down to a compromise between the cost and complexity of the network infrastructure versus the mean hop distance between any two nodes.
Cheers,
Bernie.
Your sentiments echo those of the famous 1994 paper by Waldo, et al,
"A Note on Distributed Computing":
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.41.7628
Its thrust is that, generally, network partitions cannot be papered
over and that you shouldn't try; rather, let the network boundaries
show up explicitly in your code and application domain.
By and large, I don't agree with it. It's tricky to neatly summarize
in a few words opinions formed over years of working with distributed
systems, but I'm going to try using a hypothetical 2-node example:
Our main node is an Ecommerce online store. The other node handles
payment processing, and we need to successfully communicate with it to
complete the order. At my application level, all I really care about
is that I got a response, modelled perhaps as
Either[PaymentTransaction, PaymentFail]. If I get payment fail, I will
push out some error message and return to the "place order" screen.
PaymentFail might be because:
- The other node was unreachable
- Communication with the other node dropped out mid-conversation
- The payment application had an exception due to some environmental
condition (e.g. low disk space)
- The payment application had an intermittent bug in it, like a race condition
My point is, reasons 1 & 2 are network related, but 3 & 4 are not.
Overall, distributed or local, /payment might fail/. You'd need to
code for the failure case even if the payment was processed locally.
So polluting your domain model with network-related details is not
necessary.
Rather, I advocate abstracting to simply Success & Fail cases, or
whatever the application actually cares about. Unless the application
itself wants to respond differently for 5 different network layer
errors, it seems counterproductive to even expose them.
Exposing too many network/distribution related details in the domain
model makes it hard to reorganise what runs where, since the
type-signatures will likely have to change.
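A minimal sketch of that abstraction, assuming hypothetical names throughout: RawOutcome and its cases stand for whatever the messaging layer reports, and toDomain and nextScreen are invented for the example. The Either keeps the ordering used above, Either[PaymentTransaction, PaymentFail].

```scala
object PaymentExample {
  final case class PaymentTransaction(id: String)
  final case class PaymentFail(message: String)

  // Hypothetical low-level outcomes, network-related or not.
  sealed trait RawOutcome
  final case class Completed(txId: String) extends RawOutcome
  case object NodeUnreachable extends RawOutcome
  case object ConnectionDropped extends RawOutcome
  final case class RemoteFault(detail: String) extends RawOutcome

  // The boundary collapses every cause into the one case the application cares about.
  def toDomain(raw: RawOutcome): Either[PaymentTransaction, PaymentFail] = raw match {
    case Completed(id) => Left(PaymentTransaction(id))
    case NodeUnreachable | ConnectionDropped | RemoteFault(_) =>
      Right(PaymentFail("payment could not be completed"))
  }

  // Application logic sees only success or failure, never the network details.
  def nextScreen(result: Either[PaymentTransaction, PaymentFail]): String =
    result.fold(
      tx   => s"order confirmed (${tx.id})",
      fail => s"place order (${fail.message})"
    )
}
```

Moving the payment code into the same process would change which RawOutcome values can occur, but not the signature of nextScreen.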
-Ben
Hi folks,
Bernie; with cheap off-the-shelf 4-port 1 Gb/s NICs (of which you get two per machine), and in a 3x3x3 configuration (max 3 hops with wrap-around), the configuration becomes incredibly performant and cost-effective, as you need 3 hops for replication anyway. Getting switches that can have every node working at line rate is by comparison very expensive. Granted, the software to manage this architecture is not yet commodity, but it is possible to write.
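The hop count can be checked with a few lines of arithmetic. The sketch below assumes a torus with the same number of nodes along every dimension and one hop per link. The names hops, nodes and maxHops are made up for the illustration.

```scala
object Torus {
  // Hops between two nodes: per dimension, take the shorter way around the ring.
  def hops(a: Seq[Int], b: Seq[Int], size: Int): Int =
    a.zip(b).map { case (x, y) =>
      val direct = math.abs(x - y)
      math.min(direct, size - direct)
    }.sum

  // All coordinates of a torus with `dims` dimensions and `size` nodes per dimension.
  def nodes(dims: Int, size: Int): Seq[Seq[Int]] =
    (1 to dims).foldLeft(Seq(Seq.empty[Int])) { (acc, _) =>
      for (prefix <- acc; i <- 0 until size) yield prefix :+ i
    }

  // Worst-case distance over every pair of nodes (brute force, fine for small sizes).
  def maxHops(dims: Int, size: Int): Int = {
    val all = nodes(dims, size)
    (for (a <- all; b <- all) yield hops(a, b, size)).max
  }
}
```

For the 3x3x3 case, Torus.maxHops(3, 3) evaluates to 3: with wrap-around no node is more than one hop away along any single dimension.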
Ben; I agree with your method of wrapping failure into the Either monad. The way I do it is to use three states: Success, Failure, and Exception. For example, the three states returned by a $100 withdrawal could be (successful withdrawal | insufficient funds | network error). The combinator library that this results in is very elegant.
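A rough sketch of what such a three-state result might look like. This is an assumption about the shape, not the actual library described above. The third case is named Exceptional here to avoid clashing with java.lang.Exception, and withdraw, with its parameters, is a made-up example.

```scala
object ThreeState {
  sealed trait Outcome[+A] {
    // Continue only on success; the two non-success states pass through untouched.
    def flatMap[B](f: A => Outcome[B]): Outcome[B] = this match {
      case Success(a)      => f(a)
      case Failure(reason) => Failure(reason)
      case Exceptional(e)  => Exceptional(e)
    }
    def map[B](f: A => B): Outcome[B] = flatMap(a => Success(f(a)))
  }
  final case class Success[+A](value: A) extends Outcome[A]
  // An expected, domain-level refusal such as insufficient funds.
  final case class Failure(reason: String) extends Outcome[Nothing]
  // An infrastructure problem such as a network error.
  final case class Exceptional(error: String) extends Outcome[Nothing]

  // Hypothetical withdrawal; returns the new balance on success.
  def withdraw(balance: Int, amount: Int, networkUp: Boolean): Outcome[Int] =
    if (!networkUp) Exceptional("network error")
    else if (amount > balance) Failure("insufficient funds")
    else Success(balance - amount)
}
```

Because map and flatMap are defined, withdrawals can be chained in a for-comprehension, and the first Failure or Exceptional short-circuits the rest.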
Cheers,
Matt
On 2012-04-18 18:03 , Ben Hutchison wrote:
> Your sentiments echo those of the famous 1994 paper by Waldo, et al,
> "A Note on Distributed Computing":
Yes, I have been strongly influenced by Jim and the Jini team. Spent a
number of years using Jini (and related efforts). I believe they broke
some new ground (in the context of that time) in what they attempted to
bring to main-stream developers. I have some opinions about where that
effort fell short, but I'm careful not to pretend to be any smarter or
more experienced than the Jini team.
> Rather, I advocate abstracting to simply Success & Fail cases, or
> whatever the application actually cares about. Unless the application
> itself wants to respond differently for 5 different network layer
> errors, it seems counterproductive to even expose them.
I completely agree.
I think you'll find that "A note on distributed computing" was
advocating that fundamental differences due to network failure must
influence the overall design of a true distributed application /
system. And, that a simplistic exposure of "5 different network layer
errors" is never mentioned in Jim's note, nor is it the conclusion of
his argument.
- - - - - - - - - - - -
Elsewhere, Jim Waldo also discusses issues such as deciding whether to
have a homogeneous language distributed system architecture (the Java,
RMI, Jini approach), which enables "code movement" over the network and
avoids IDLs, etc., versus a heterogeneous language distributed systems
architecture, which is more data-centric (Microsoft CLR or XML or JSON).
Other issues that interest me are ...
- Managing the life-cycle of long-term persistent distributed services
(components, actors, etc) ... and how to make that easy for application
developers in a fully asynchronous environment. Particularly,
dynamically creating new services.
- Security and delegation of authority, i.e. how to securely allow a
distributed service to act on your behalf in a constrained fashion and
revoke those privileges when the action is complete.
- Creating distributed development tools (IDE), operational tools and
user interfaces that are first-class citizens of the distributed system,
i.e. not bolted on the side using a simplistic client/server approach.
You'll need to be able to monitor and selectively upgrade your system in
real-time. You'll never be able to take the whole thing down for an
upgrade. Version control of services / components is another fun challenge.
- When a call is made to a distributed service and a problem occurs,
then only providing the failure information to the caller is
insufficient. The caller probably isn't responsible for the life-cycle
management of that service. There is no point returning detailed
failure information to the caller who can't do anything about the
problem and probably won't even understand what the failure means.
- How to have small CPU/memory/power-constrained embedded devices act as
first-class citizens in a distributed system.
We are going to have many billions of mobile and embedded devices acting
on our behalf (often autonomously) and incorporated into our lives ...
and they'll be much more invasive than any tweet or Facebook status
update. We'd better start figuring out how to do this, because HTTP /
REST / JSON isn't going to be sufficient.
This will be way more challenging than a "hypothetical 2-node example"!
On Thu, Apr 19, 2012 at 5:50 AM, Andy Gelme <an...@geekscape.org> wrote:
> I think you'll find that "A note on distributed computing" was
> advocating that fundamental differences due to network failure must
> influence the overall design of a true distributed application /
> system.
Andy, thanks for your comments, you raise a number of good points,
such as the question of mobile code, which was a big part of Jini/RMI
but has never really taken off in the form of remote bytecode loading
(yet mobile /interpreted/ code is wildly successful, as SQL and
JavaScript demonstrate).
However, I want to return to 2 things that the "Waldo school" has
typically argued against: /transparency/ and /synchronicity/, because
I believe basically that both are a Good Thing (TM).
WRT transparency, I honestly think the community has voted with its
feet on this one. The popularity of Hibernate and Rails is in no small
part because they make database network interactions transparent. When
I call "save" on a domain object, all sorts of complex network
messages result, but they are hidden from me. And because it "just
works", for most people most of the time, it's proven very popular.
Now "synchronous" interactions are out of fashion at present. "Asynch
everything" is very 2012. But most important asynch interactions seem
to need 3 parts:
(a) An asynch send that sends a message and remembers that it was sent
(b) An asynch receiver that waits for some kind of reply
(c) A timer that detects if (b) failed to occur within a given time
To me, that's just re-inventing synchronous calls. So why not move to
a higher level of abstraction and factor out the common code, keeping
"first-class citizen status" for synchronous interactions?
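To make the (a)/(b)/(c) point concrete, here is a rough sketch of the three parts factored into one request/reply helper. It uses only the standard library's Promise plus a JDK scheduled executor; `RequestReply`, `ask` and the shape of the `send` parameter are assumptions for illustration, not any particular actor library's API.

```scala
import java.util.concurrent.{Executors, ThreadFactory, TimeUnit, TimeoutException}
import scala.concurrent.{Future, Promise}
import scala.concurrent.duration.FiniteDuration

object RequestReply {
  // One daemon thread runs all the timers; callers are never parked on it.
  private val timer = Executors.newSingleThreadScheduledExecutor(new ThreadFactory {
    def newThread(r: Runnable): Thread = {
      val t = new Thread(r, "reply-timer")
      t.setDaemon(true)
      t
    }
  })

  // `send` is the caller-supplied asynchronous send. It is handed a callback
  // that the transport invokes when (and if) a reply arrives.
  def ask[Req, Rep](send: (Req, Rep => Unit) => Unit, timeout: FiniteDuration)(req: Req): Future[Rep] = {
    // (a) the promise is the record that a request is outstanding
    val reply = Promise[Rep]()
    // (c) the timer fails the promise if nothing has completed it in time
    val expiry = timer.schedule(new Runnable {
      def run(): Unit = { reply.tryFailure(new TimeoutException(s"no reply within $timeout")); () }
    }, timeout.toMillis, TimeUnit.MILLISECONDS)
    // (b) the receiver completes the promise and cancels the timer
    send(req, (rep: Rep) => { if (reply.trySuccess(rep)) expiry.cancel(false); () })
    reply.future
  }
}
```

From the caller's side this is a call that either yields a reply or fails with a timeout, which is the synchronous contract, even though no thread blocks inside `ask`.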
The main objection is that synchronous has become conflated with
blocked threads and the resultant context switching. But mechanisms
like C#'s "async", or Akka's Dataflow concurrency, give us the ability
to write high-level code that looks and feels synchronous, yet is
implemented via asynch building blocks that do not block threads.
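As a small illustration of the same idea, here is a sketch using plain scala.concurrent Futures rather than C#'s async or Akka Dataflow themselves. The two "remote" calls are hypothetical stubs that return already-completed futures.

```scala
import scala.concurrent.{ExecutionContext, Future}

object SequentialStyle {
  // Hypothetical remote calls, stubbed so the example is self-contained.
  def lookupPrice(item: String): Future[Int] = Future.successful(40)
  def reserveStock(item: String, qty: Int): Future[Boolean] = Future.successful(qty <= 5)

  // Reads top to bottom like blocking code, but the for-comprehension
  // desugars to flatMap/map callbacks, so no thread waits on a result.
  def quote(item: String, qty: Int)(implicit ec: ExecutionContext): Future[Option[Int]] =
    for {
      price    <- lookupPrice(item)
      reserved <- reserveStock(item, qty)
    } yield if (reserved) Some(price * qty) else None
}
```

The second call is only issued once the first has completed, which is the sequencing a synchronous version would give, without tying up a thread in between.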
-Ben
On Tue, Apr 17, 2012 at 4:18 PM, Andy Gelme wrote:
>... However, a network of
> computers provides failure modes that just can't be wall-papered over
> (by a layer of synchronicity).
Your sentiments echo those of the famous 1994 paper by Waldo, et al,
"A Note on Distributed Computing":
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.41.7628
Its thrust is that, generally, network partitions cannot be papered
over and that you shouldn't try; rather, let the network boundaries
show up explicitly in your code and application domain.

By and large, I don't agree with it. It's tricky to neatly summarize
in a few words opinions formed over years of working with distributed
systems, but I'm going to try using a hypothetical 2-node example:

Our main node is an e-commerce online store. The other node handles
payment processing, and we need to successfully communicate with it to
complete the order. At my application level, all I really care about
is that I got a response, modelled perhaps as
Either[PaymentTransaction, PaymentFail]. If I get a PaymentFail, I will
push out some error message and return to the "place order" screen.
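A minimal sketch of that application-level view, under assumptions: the fields on PaymentTransaction and PaymentFail, the Screen types and `nextScreen` are invented for illustration. It keeps the ordering written above, with the transaction on the Left, although the usual Scala convention puts success on the Right.

```scala
object Checkout {
  // Hypothetical shapes; only the two type names come from the example above.
  final case class PaymentTransaction(id: String, amountCents: Long)
  final case class PaymentFail(message: String)

  sealed trait Screen
  final case class OrderConfirmed(transactionId: String) extends Screen
  final case class PlaceOrder(error: String) extends Screen

  // The store only distinguishes "got a transaction" from "payment failed";
  // whatever went wrong on the network is already folded into PaymentFail.
  def nextScreen(response: Either[PaymentTransaction, PaymentFail]): Screen =
    response match {
      case Left(tx)    => OrderConfirmed(tx.id)
      case Right(fail) => PlaceOrder(s"Payment failed: ${fail.message}")
    }
}
```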
...
Good points. It depends I guess on how you interpret their meaning.
There's some design continuum between network transparency and
opaqueness (?). I've felt that Waldo's piece generally argues more
towards the opaque end than I sit myself.
The original remark from Andy Gelme that brought up Waldo was in
reference to whether FP could sit on top of a distributed system, or
"alongside" it. I still think that's an interesting question. What if
I have some function "foo" whose body executes remotely, but is
otherwise pure. Obviously, the network adds a side-effect, but I might
choose not to acknowledge that within the type system, if I feel the
network is reliable enough, and generally treat it as pure. But in a
language-enforced effect system like Haskell's, or that proposed for
Scala, will it be possible to make remote invocations transparent, due
to the intervening network comms?
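To show the two ends of that choice in types, here is a rough sketch. `Transparent`, `Explicit`, `NetworkError` and `explicitly` are hypothetical names, and the "remote" body is just a local function standing in for a network call.

```scala
import scala.concurrent.Future
import scala.util.control.NonFatal

object RemoteFoo {
  final case class NetworkError(detail: String)

  // Transparent: the signature claims purity and hides the network hop,
  // so a network problem can only surface as a thrown exception.
  trait Transparent { def foo(x: Int): Int }

  // Explicit: latency and failure both show up in the return type.
  trait Explicit { def foo(x: Int): Future[Either[NetworkError, Int]] }

  // Adapter from the first view to the second: anything thrown by the
  // transparent call is reported as a NetworkError value instead.
  def explicitly(t: Transparent): Explicit = new Explicit {
    def foo(x: Int): Future[Either[NetworkError, Int]] =
      Future.successful[Either[NetworkError, Int]](
        try Right(t.foo(x))
        catch { case NonFatal(e) => Left(NetworkError(String.valueOf(e.getMessage))) }
      )
  }
}
```

The open question is whether an enforced effect system would still let a library offer the Transparent signature at all, or would insist on something like the Explicit one.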
-Ben