Re: [mqtt] Efficient topic organization for 1-to-1 communication between publishers and subscribers.

Raphael Cohn

Oct 17, 2012, 3:44:37 AM
to mq...@googlegroups.com
Philip,

A lot of questions for a fascinating use case. Overall, MQTT as a protocol can do what you seek.

I'll do my best to answer from my own perspective of the broker we designed (I apologise in advance if in doing so I overstep the mark into self-promotion). I'm sure others on this list will want to contribute. There's a lot of knowledge out there.

1 Numbers of Devices
MQTT is a very lightweight, asynchronous protocol based on TCP. Thus for scaling to millions of devices:-
- The problem is not one of processing power; a single modern server can be designed to handle tens of thousands to hundreds of thousands of devices, easily;
- Since data is asynchronous, and messages are often small and 'occasional' (ie not every ms for every device), there is no real cap on TCP connections except available port numbers
  - this can be managed by using virtual interfaces
  - not all devices are connected at once
- Actually, the TCP problem becomes flow control. With this many devices, some external TCP networks can have issues (eg mobile operators)
  - If this is an issue, one could write a SCTP binding quite simply for MQTT. The downside is the lack of support outside the Linux / Unix space, particularly in embedded stacks.
- Most 'traditional' brokers use a thread per connection; for a broker to cope, it needs to use a thread pool or a thread per CPU model ('asynchronous IO' or 'NIO' in Java)
- One then scales by adding CPUs and interfaces initially
- Nearly all brokers are C or Java. However, Java based NIO does NOT scale well when core counts get large (eg 24+) (this includes MINA, etc) because it uses synchronous locks
  - So you need a thread per CPU model which uses non-blocking, CAS based operations and can store and forward internally without synchronous memory locks
   - This reduces the need to flush caches and so can leverage NUMA designs
- When these limits are reached, you can use multiple brokers. Even if the broker doesn't support distributed message queues (and many don't because it's hard to get right and frankly, not that useful normally as they usually perform poorly) it's not really an issue - because most application software you're going to write is going to struggle with this volume, too. (Indeed, the usual problem we've found is not devices, but processing the vast stream of data they produce). So you logically divide your brokers up into separate clusters.
- If you need truly distributed brokers, then you use a broker design with a shared-nothing approach to distributed queues so you can even share in-memory messages without using disk-based replication or insanely expensive memory replication (eg Infiniband) (which has its own set of complexities).

2 Topic Overheads - 'Heaviness'
In our broker, they are incredibly light. We actually model a topic as nothing more than a selector on messages in a queue. The impact then is per connection, not per topic - so the performance penalty scales with connections. If a client is not connected there is no performance penalty. In addition, in our design, there is a small amount of memory used per connection to manage state. The queue itself is used as the buffer to send the message, so the send to the network is zero-copy - there is no need for an extra memory copy (and utilisation is better).

A challenge here is that a broker can easily run out of memory. A good design will use a memory quota model (eg each potential device is allocated x RAM and y disk, etc) to prevent this. Thus an 'out-of-memory' scenario results in connections being denied - not calamity or messages being lost, ie degraded service vs no service.
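To illustrate the quota idea, here is a minimal sketch in Python. The class name `QuotaAdmission`, its methods and the byte figures are hypothetical and are not taken from any particular broker: each connection reserves a fixed RAM allowance up front, and a new connection is refused once the budget is spent, so existing clients keep working.

```python
class QuotaAdmission:
    """Hypothetical admission control: reserve a fixed RAM quota per connection."""

    def __init__(self, total_bytes, per_connection_bytes):
        self.total_bytes = total_bytes
        self.per_connection_bytes = per_connection_bytes
        self.reserved = {}  # client_id -> bytes reserved

    def try_connect(self, client_id):
        """Return True if the connection is admitted, False if it is denied."""
        if client_id in self.reserved:
            return True  # already admitted; keep the existing reservation
        used = len(self.reserved) * self.per_connection_bytes
        if used + self.per_connection_bytes > self.total_bytes:
            return False  # degrade: deny the newcomer, keep existing clients
        self.reserved[client_id] = self.per_connection_bytes
        return True

    def disconnect(self, client_id):
        """Release the reservation so another device can connect."""
        self.reserved.pop(client_id, None)


# Budget for exactly two connections (illustrative numbers only).
broker = QuotaAdmission(total_bytes=8192, per_connection_bytes=4096)
results = [broker.try_connect(name) for name in ("dev-1", "dev-2", "dev-3")]
broker.disconnect("dev-1")
retry = broker.try_connect("dev-3")  # admitted now that dev-1 has gone
```

The third device is denied rather than the broker failing, which is the 'degraded service vs no service' behaviour described above.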

So in our broker memory usage is basically message + a little overhead per message in the memory store (RAM or disk, your choice) + a small 4K page per connection. So about 4GB for 1m connections.
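The per-connection part of that estimate can be checked with a few lines of Python. This assumes "4K" means 4 x 1024 bytes and "1m" means 1,000,000 connections; message storage comes on top and is not counted here.

```python
PAGE_BYTES = 4 * 1024      # the small 4K page per connection (assumed 4096 bytes)
CONNECTIONS = 1_000_000    # 1m connections


def connection_state_bytes(connections, page_bytes=PAGE_BYTES):
    """Per-connection state only; message storage is extra."""
    return connections * page_bytes


total = connection_state_bytes(CONNECTIONS)
print(f"{total / 10**9:.2f} GB ({total / 2**30:.2f} GiB)")
# prints "4.10 GB (3.81 GiB)"
```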

3 Mailbox per topic
Shouldn't be a problem. Where it can be a problem is if 1m devices are all sending messages to the same space. This means that even with non-blocking multiple reader, multiple writer queue designs using CAS based algorithms, there is still a problem of back-off thrash. However, this is not usually the case, as actually, the number of truly simultaneous operations is more limited - in our design, to the number of cores, as we allocate each connection to a core (circa - we reserve a number of cores for other things). If it is a genuine issue, then one simply uses multiple topic spaces, and then a small daemon process to 'consolidate' them. However, it may not be necessary, depending on the messages and their intended use cases.
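A rough sketch of the 'multiple topic spaces plus consolidation' idea, in Python. The topic layout `inbound/<shard>/<device>`, the function names and the use of CRC32 as the hash are all assumptions made for illustration; a real consolidating daemon would subscribe to each shard and republish, which is reduced here to merging lists.

```python
import zlib


def shard_topic(device_id, shards, prefix="inbound"):
    """Map a device onto one of `shards` topic spaces using a stable hash."""
    index = zlib.crc32(device_id.encode("utf-8")) % shards
    return f"{prefix}/{index}/{device_id}"


def consolidate(messages_by_shard):
    """Stand-in for the small daemon: merge per-shard streams into one list."""
    merged = []
    for shard in sorted(messages_by_shard):
        merged.extend(messages_by_shard[shard])
    return merged


topic = shard_topic("dev-42", shards=8)
merged = consolidate({1: ["b1", "b2"], 0: ["a1"]})
```

Because the hash is stable, a given device always publishes into the same shard, so writers are spread across topic spaces rather than all contending for one.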

For devices receiving messages, see my answer below.

4 Topic Levels, Topics
With MQTT, you have one topic space. If your broker works like ours does, then I'd suggest making your topic hierarchy as rich as possible. This only becomes a problem if a broker uses regex-like approaches to do topic matching. Most don't for the MQTT filters. If you find yourself needing multiple topic hierarchies then you might want to use multiple connections. If a topic is fairly quiet, then there isn't a huge overhead to doing that (TCP isn't chatty, although you may need to tweak keepalive and be aware of some bizarreness that mobile operators can do). Ultimately, if you REALLY need multiple hierarchies and matching on other criteria, then MQTT isn't going to be right - something like AMQP does this very well. But that is a very complex protocol for small devices to use, and it quite probably has higher bandwidth, particularly if it's for occasional connections. (A disclaimer: we started off as an AMQP implementation). It does have an advantage in allowing each device to be represented by a 'link', so you can manage device connection characteristics persistently, eg change flow control per device, or disable a device spewing too much data, or re-direct a particular device to another broker (say you sold half your fleet of device-using trucks to BuyOut Corp or some such). But I doubt it wins here.
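For reference, MQTT filter matching can be done level by level without any regex. The following is a minimal Python sketch of the standard `+` (single level) and `#` (multi-level, last position only) wildcards; it is not how any particular broker implements matching, it ignores the special handling of topics beginning with `$`, and the topic names are made up.

```python
def topic_matches(topic_filter, topic):
    """Return True if an MQTT topic filter matches a topic name."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            # Multi-level wildcard: only valid as the last level; it matches
            # everything below, and the parent level itself.
            return i == len(filter_levels) - 1
        if i >= len(topic_levels):
            return False  # filter is deeper than the topic
        if level != "+" and level != topic_levels[i]:
            return False  # literal level differs
    return len(filter_levels) == len(topic_levels)


# Hypothetical per-device mailbox layout: devices/<id>/inbox
own_mailbox = topic_matches("devices/dev-42/inbox", "devices/dev-42/inbox")
any_mailbox = topic_matches("devices/+/inbox", "devices/dev-7/inbox")
everything = topic_matches("devices/#", "devices/dev-7/inbox/urgent")
too_shallow = topic_matches("devices/+", "devices/dev-7/inbox")
```

The cost of a match depends on the number of levels in the filter, not on how many topics exist, which is one reason a rich hierarchy need not be expensive.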

5 Persistence
It depends on the broker. For us, the actual topic isn't persisted at all. We offer the choice of whether to operate a persistent message queue underneath it, and whether the subscription is volatile or durable. Others do so too. If you do need the topic to hang about, you might want message replay. This is when a broker lets you replay a stream of previously sent messages. Useful if things have gone calamitously wrong.

6 Device Updates
We've done this before. There are a number of techniques. Rather than a mailbox per device, one can use a topic hierarchy organised by, say, firmware revision. That way, devices subscribe to their current firmware revision to receive their update; only those devices not yet updated receive that message. One can also sub-divide by geographic locale or (if operating with multiple telcos) telco, etc (a bit like DVD regions, say). Even if your topic design isn't perfect (and it's hard to get hierarchies right in the first iteration of any MQ based architecture), one can use this to push out a device update that then perhaps connects to an alternative hierarchy. Other techniques include publishing package updates this way. We've done an exercise previously in mapping debian repositories to MQTT...
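As a sketch of what a revision-based hierarchy might look like, in Python. The layout `updates/firmware/<rev>[/<region>]`, the function names and the device record are all hypothetical choices for illustration.

```python
def update_topic(firmware_rev, region=None):
    """Hypothetical layout: updates/firmware/<rev>[/<region>]."""
    levels = ["updates", "firmware", firmware_rev]
    if region is not None:
        levels.append(region)
    return "/".join(levels)


def subscriptions_for(device):
    """Topics a device would subscribe to, given the firmware it runs now."""
    return [
        update_topic(device["firmware"]),
        update_topic(device["firmware"], device["region"]),
    ]


device = {"id": "dev-42", "firmware": "1.0.3", "region": "eu"}
before = subscriptions_for(device)

# After applying the update the device re-subscribes under its new revision,
# so it no longer receives anything published for 1.0.3.
device["firmware"] = "1.0.4"
after = subscriptions_for(device)
```

The publisher sends one message to `updates/firmware/1.0.3` and only devices still on that revision are subscribed to it.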

If the broker is implemented like ours, there is only one copy of that message; when all have received it, it is discarded. This is efficient. Where problems arise is if the update is not KB but MB. Then there is the problem of partial transmission (loss during network outages, quite common in the mobile space). This can be solved in at least 3 ways:-
- Dividing updates into smaller chunks or packages
- Providing a synchronous mechanism in parallel, eg
  - message queue simply has a 'device update ready' message
  - device then uses HTTP with range requests to receive update pieces (requires state management)
- Using a different protocol; this is a use case where AMQP might make sense, but that then complicates device implementations, especially for small devices... but then one is doing this because the device update is large anyway, so it's probably not a device and more of a Linux server or some such.
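The first two options can be sketched together in Python. The chunk record layout and the function names are assumptions for illustration; the `Range` value follows the usual HTTP byte-range form, where both ends are inclusive.

```python
import hashlib


def split_update(blob, chunk_size):
    """Split an update into numbered chunks, each with its own checksum."""
    chunks = []
    for offset in range(0, len(blob), chunk_size):
        piece = blob[offset:offset + chunk_size]
        chunks.append({
            "index": offset // chunk_size,
            "offset": offset,
            "sha256": hashlib.sha256(piece).hexdigest(),
            "data": piece,
        })
    return chunks


def range_header(offset, length):
    """HTTP Range header value for one piece (byte ranges are inclusive)."""
    return f"bytes={offset}-{offset + length - 1}"


def reassemble(chunks):
    """Verify each chunk and join them in index order; raise if one is corrupt."""
    out = bytearray()
    for chunk in sorted(chunks, key=lambda c: c["index"]):
        if hashlib.sha256(chunk["data"]).hexdigest() != chunk["sha256"]:
            raise ValueError(f"chunk {chunk['index']} failed its checksum")
        out += chunk["data"]
    return bytes(out)


firmware = bytes(range(256)) * 10            # 2560 bytes of stand-in data
chunks = split_update(firmware, chunk_size=1000)
rebuilt = reassemble(list(reversed(chunks)))  # arrival order does not matter
first_range = range_header(chunks[0]["offset"], len(chunks[0]["data"]))
```

With either approach the device has to remember which pieces it already holds, which is the state management mentioned above; after an outage it only re-requests the missing ones.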

7 Recommendation
I'm sure others will have their favourites! In the words of Francis Urquhart, 'I couldn't possibly comment.'

I hope that helps. I'm sure others will have lots more to contribute.

Raph

Chief Architect, stormmq
Secretary, OASIS AMQP Standard
raphae...@stormmq.com

UK Office:
Hamblethorpe Farm, Crag Lane, Bradley BD20 9DB, North Yorkshire, United Kingdom
Telephone: +44 845 3712 567

Registered office:
16 Anchor Street, Chelmsford, Essex, CM2 0JY, United Kingdom
StormMQ Limited is Registered in England and Wales under Company Number 07175657
StormMQ.com



On 16 October 2012 20:40, Philip Lombardi <plomb...@gmail.com> wrote:

Good afternoon,

I am conducting preliminary research for using MQTT as a new transport protocol for a new device communication protocol in an existing product. Recently, while discussing the protocol a question was posed about 1-to-1 communication between our platform and a device; in particular how we would handle sending a single message to a large number of devices, but not the complete set of devices, just a subset. My response at the time was to have a separate topic or topic-level per device which would be treated as a mailbox – each device would subscribe to the correct mailbox. However, we are considering a rather large number of devices this 1-to-1 communication could be occurring with; in our case that number is between 2 and 3 million devices and is expected to grow rapidly in the next 12 to 18 months. The idea of maintaining a couple of million topic subscriptions as ‘mailboxes’ is a bit terrifying and it doesn’t even factor in other topics that are not unique to a single device but apply to various subsets of different sizes and overlap.

Some initial questions for the MQTT gurus:

  • Is the mailbox per device approach a good idea or is it a case that sounds good on paper but is probably flawed due to scalability issues?
  • How 'heavy' are topics usually? How much memory does a topic consume on the broker? What kind of resource utilization (CPU, memory, disk) might we be looking at here?
  • Presumably if I were to publish a software update for a device, say a 64K firmware update that applied to only some subset of all the devices (say 500,000 of 3,000,000) I would need to publish to each device's mailbox in the subset, thus creating a situation where the storage for the message is 500,000 * 64K ... which is ~30GB?
    • Note: An alternative here would be to send a message to the devices to subscribe to a special topic containing the update; but then we have doubled network communications (one to notify, one to download).
  • Would there be any advantage to using topic-levels compared to individual topics?
  • Are topics and topic levels persisted forever or are they created by the subscriber, publisher or both?
  • Does anyone have a broker recommendation for this type of scale? I’m not sure how well Mosquitto or ActiveMQ is going to do with this kind of potential throughput.  

Thank you for your time. I look forward to reading any responses I receive.

Regards,

Philip Lombardi


Philip Lombardi

Oct 21, 2012, 9:46:47 PM
to mq...@googlegroups.com
Raph,

Thank you for the extremely detailed post. You have given me quite a bit to chew on as I continue my research and this post has a lot of great information for the group as a whole. I will update this post if I have any further questions and to let people know my experience with MQTT as I fall further down the rabbit hole!

Cheers,
Phil