Sorry for the delay... In the recent discussions (and especially in a
long IM discussion with Michael Bridgen last week) I've realised I am
using a lot of 0MQ terminology that was never clearly defined and
explained. Which of course made the discussion pretty hard.
So, I've spent a couple of days writing it all down. The result can be
found here:
http://www.250bpm.com/concepts
Hopefully, it'll make the discussion more clear and comprehensible.
For reference, here's a similar document for broker-based messaging
(written by Pieter back in 2004 IIRC):
http://www.openamq.org/doc:amqp-background
Martin
> So, I've spent couple of days writing it all down. The result can be found
> here:
> http://www.250bpm.com/concepts
Nice, this is very useful.
-Pieter
If I paraphrase one quote, messaging makes (concurrent and
distributed) computation digestible.
kohei
> --
> Note Well: This discussion group is meant to become an IETF working group in
> the future. Thus, the posts to this discussion should comply with IETF
> contribution policy as explained here:
> http://www.ietf.org/about/note-well.html
>
This is a great explanation, thank you for writing it Martin.
Do you have ideas how ZeroMQ might be adapted to fulfill the design
principles given in the appendix?
For instance, the uniformity principle: "Consider PUB/SUB pattern as
currently implemented in ØMQ. It allows for multiple publishers in the
topology which introduces non-uniformity".
So far as I can tell, the design given at http://www.250bpm.com/pubsub
also suffers from this problem. (Not sure if that comes under "currently
implemented in ØMQ").
Is the implication that a topology should have just one publisher? Or
that the topology name should resolve to a single address for
publishers? (sorry, mixing in terminology, I know ..)
--Michael
This is important in most enterprises but also in multi-tenant data
centers which utilize the concept of Central Limit Theorem to properly
size capacity (basically flatten the supply/demand curve).
I am curious what others think about how tightly bound the service is to
topology and the expectations for instance that information cannot be
leaked out of the segmented boundary.
The concept that a "service" is not associated with a specific endpoint
helps to promote the share-nothing[1] architecture people want for
stateless services. Rod Johnson recently has been promoting the need to
embrace data-grid technologies to promote this decoupling and enhance
scalability (he believes, as do I, that this is central to building Platform
as a Service).
Certainly data-grid approaches like Gigaspaces, Coherence, and Gemfire
have a component which includes messaging so maybe there are some
interesting ideas surrounding this space.
As for the name resolution discussion, some interesting work has been done
by Van Jacobson, specifically Networking Named Content[2], which the group
might find interesting.
1. http://en.wikipedia.org/wiki/Shared_nothing_architecture
2. http://www.named-data.net/education.html
Although the example of the problem is a little bit subject to
discussion -- you can see the graph as two topologies, one endpoint
(B) participating in both -- the pub/sub pattern is often inherently
broken in regards to at least one of the three principles. To scale
reliably in a uniform network and permit interjection of nodes would
require that all nodes see almost all messages at some
point. Scaling pub/sub would necessarily require some tradeoff at
some point.
Fabien
> Do you have ideas how ZeroMQ might be adapted to fulfill the design
> principles given in the appendix?
I think it's important to distinguish 0MQ as a product and the
"scalability" issues that this group is meant to research.
While there's a lot of intersection, 0MQ contains patterns that are
inherently non-scalable (pair) or offer limited scalability (pipeline).
There are various reasons for that: Some people really want to use
non-scalable patterns and there's no legitimate reason not to allow them
to do so. There are backward compatibility reasons. Etc.
As for SP work these reasons don't apply as the goal is to address
scalability per se.
> For instance, the uniformity principle: "Consider PUB/SUB pattern as
> currently implemented in ØMQ. It allows for multiple publishers in the
> topology which introduces non-uniformity".
>
> So far as I can tell, the design given at http://www.250bpm.com/pubsub
> also suffers from this problem. (Not sure if that comes under "currently
> implemented in ØMQ").
>
> Is the implication that a topology should have just one publisher? Or
> that the topology name should resolve to a single address for
> publishers?
I would say there's a need to separate the "pub/sub" and "aggregator"
patterns. Pub/sub would have a single publisher and a tree of
subscribers, while aggregator would allow for multiple publishers and
only a single consumer.
Note that both these patterns meet the outlined principles.
Moreover, any topology created using existing 0MQ pub/sub can be broken
down into "new pub/sub" and "aggregator" topologies, providing exactly the
same functionality. The only difference is that instead of one big
inconsistent topology, the user would be forced to define a couple of
smaller consistent topologies.
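
To make the decomposition concrete, here is a minimal sketch of one possible reading of it: a multi-publisher pub/sub topology is replaced by an aggregator that funnels all publishers into a single relay node, which in turn acts as the sole publisher of a pub/sub tree. The relay node ("hub"), the class names and the function names are all assumptions made for illustration; none of them come from 0MQ or from the concepts document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Aggregator:
    """Many producers, exactly one consumer."""
    producers: frozenset
    consumer: str


@dataclass(frozen=True)
class PubSub:
    """Exactly one publisher, any number of subscribers."""
    publisher: str
    subscribers: frozenset


def split(publishers, subscribers, hub="hub"):
    """Replace one multi-publisher topology with two uniform ones.

    `hub` is a hypothetical relay node: it is the single consumer of the
    aggregator and the single publisher of the pub/sub tree.
    """
    return (Aggregator(frozenset(publishers), hub),
            PubSub(hub, frozenset(subscribers)))


def deliver(aggregator, pubsub, sender, message):
    """Route one message: sender -> hub (aggregator) -> all subscribers."""
    if sender not in aggregator.producers:
        raise ValueError(f"{sender} is not a producer in the aggregator")
    if aggregator.consumer != pubsub.publisher:
        raise ValueError("the two topologies are not joined at the same hub")
    return {subscriber: message for subscriber in sorted(pubsub.subscribers)}


aggregator, pubsub = split(["p1", "p2"], ["s1", "s2", "s3"])
print(deliver(aggregator, pubsub, "p2", "tick"))
```

Every subscriber still receives every message from every publisher, but each of the two topologies taken alone has a single well-defined role for each endpoint.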
As for resolving the topology names I have no idea. There was almost no
work done in that area, so everybody is free to propose suggestions.
Martin
> This is great.. I think however the concept of a topology needs to be
> dealt with primarily for proper service segmentation.
>
> This is important in most enterprises but also in multi-tenant data
> centers which utilize the concept of Central Limit Theorem to properly
> size capacity (basically flatten the supply/demand curve).
Yes. I believe that's the main point here.
TCP tried to codify the notion of "service" (the TCP port), which was only
partly successful for many reasons, not the least being that a TCP
service accounts only for the classic star topology (server & clients, all
communicating on the same port).
We have a chance now to define what service is in a broader fashion.
By providing a strict definition and making services automatically
distinguishable from one another, we are basically providing
information about formal properties of the business logic to the network.
The network, having that information available, can provide all kinds of
smart behaviour that is currently either implemented in applications or
not implemented at all.
> I am curious what others think about how tightly bound the service is to
> topology and the expectations for instance that information cannot be
> leaked out of the segmented boundary.
I have no experience with security, but secure separation of the
topologies (say in multi-tenant environment) sounds like one possible
application of the model.
> The concept that a "service" is not associated with a specific endpoint
> helps to promote the share-nothing[1] architecture people want for
> stateless services. Rod Johnson recently has been promoting the need to
> embrace data-grid technologies to promote this decoupling and enhance
> scalability(He believes as do I that this is central to building Platform
> as a Service).
>
> Certainly data-grid approaches like Gigaspaces, Coherence, and Gemfire
> have a component which includes messaging so maybe there are some
> interesting ideas surrounding this space.
GemStone was acquired by VMware IIRC, so the RabbitMQ guys may have a
contact there; however, I've dealt with GemStone in the past and my
feeling was that they are interested in the DB side of things rather
than in networking. I may be wrong though.
> As for the name resolution discussion some interesting work has been done
> by Van Jacobson specifically Networking Named Content[2] which the group
> might find interesting.
I'll give it a look. Thanks!
Martin
If pub/sub is constrained to a distribution tree (i.e. one publisher, N
consumers) all messages would have to be generated at a single point
which would place a sane cap on message flow. Note that adding more
nodes (intermediaries, consumers) won't add any more messages to the
topology, meaning that each node would have to process at most the
number of messages generated by the ultimate publisher.
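
As a rough illustration of that cap, the sketch below pushes a publisher's message count down a distribution tree and records how many messages each node has to handle. The tree layout, node names and the function name are made up for the example, and the function assumes its input really is a tree (every node has exactly one parent).

```python
from collections import deque


def messages_per_node(children, publisher, published):
    """Return how many messages each node receives in a distribution tree.

    `children` maps a node to the nodes directly below it. Counts are
    accumulated per incoming link, so a node with a single parent ends up
    with exactly the parent's count.
    """
    load = {publisher: published}
    queue = deque([publisher])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            load[child] = load.get(child, 0) + load[node]
            queue.append(child)
    return load


tree = {"pub": ["relay1", "relay2"], "relay1": ["c1", "c2"], "relay2": ["c3"]}
load = messages_per_node(tree, "pub", 1000)

# Adding another consumer changes the number of nodes, not the per-node load.
tree["relay2"].append("c4")
bigger = messages_per_node(tree, "pub", 1000)

print(max(load.values()), max(bigger.values()))
```

With several publishers feeding the same set of nodes, the per-node count would instead grow with the number of publishers, which is the non-uniformity discussed earlier in the thread.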
Martin
Actual endpoint(s) to receive the message are selected in a transparent manner by ØMQ.
With respect to the use-case scenario of mobile devices:
If you want to send data to a specific endpoint, you should use TCP or a similar protocol.
If you want to send it to the topology and let the topology decide on the destination, you should use ØMQ.
> Given rapidly changing endpoints, what underlying mechanisms does ØMQ
> use to maintain topology?
> Rapidly changing can mean as low as every 5 minutes.
I am not an expert on mobile applications; however, given that TCP is
used underneath, I would assume that if the IP address changes abruptly
(because you walk from the office to Starbucks) the other end won't be
notified about the old address disappearing in less than 2 hours (see
the TCP keepalive spec). As for the mobile end I have no idea what happens;
however, even if a disconnect notification is issued by the OS, and the
topology is reestablished from the new IP address, the old lingering TCP
connection is not going to go away.
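
For what it's worth, the 2-hour figure is the usual default keepalive idle time, and it can be lowered per socket. Below is a minimal sketch of doing so with plain Python sockets; the function name and the 60/10/5 values are arbitrary choices for illustration, the per-socket tuning options are not available on every platform (hence the guards), and this says nothing about how ØMQ itself configures its connections.

```python
import socket


def enable_fast_keepalive(sock, idle=60, interval=10, probes=5):
    """Turn on TCP keepalive and, where the platform allows, shorten it.

    idle:     seconds of silence before the first probe is sent
    interval: seconds between unanswered probes
    probes:   unanswered probes before the connection is declared dead
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    tuning = (("TCP_KEEPIDLE", idle),
              ("TCP_KEEPINTVL", interval),
              ("TCP_KEEPCNT", probes))
    for name, value in tuning:
        option = getattr(socket, name, None)  # missing on some platforms
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_fast_keepalive(sock)
keepalive_on = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
sock.close()
print("keepalive enabled:", keepalive_on)
```

Where all three options are honoured, a dead peer would be noticed after roughly idle + interval * probes seconds instead of hours, at the cost of extra traffic on idle connections.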
If you have any thoughts on how to make SP robust on mobile devices,
feel free to make suggestions.
At the risk of appearing stupid: Isn't this an L3 or L4 problem,
rather than a messaging problem?
Martin