Thousands of concurrent clients possible with Python?


Zsolt Ero

Aug 25, 2017, 4:06:40 PM
to MQTT

I am developing a GPS-based live-tracking mobile app where users can see the real-time position of other nearby users. I am trying to figure out what the server side of such an app should look like. Specifically, how can I communicate concurrently with 10,000+ clients using a Python server?

I have three approaches in mind, but no experience with how viable any of them is:

  1. Use a standard WSGI Python app with an HTTP REST API, running under a few normal sync workers (say 8 gunicorn workers):

    • Every location update would be an HTTP POST request with a JSON body, sent about once every 5 seconds by each live client.
    • The server would store the received locations in Redis.
    • Each client would also issue an HTTP GET request once every 5 seconds, and the server would query Redis for nearby clients and return them as JSON.
  2. Use some kind of WebSocket / ASGI / async Python implementation to provide persistent connections, with the same logic behind it.

  3. Use the MQTT protocol on the clients with some kind of MQTT broker, and split the server side into a WSGI REST API (authentication) and an MQTT client (location updates).

Which of these approaches are viable, if this is possible at all in Python? From what I've seen, WebSockets in Python are mostly benchmarked up to hundreds of connections, maybe up to 1,000 concurrent connections. This is in stark contrast to frameworks like Phoenix/Elixir, which has been benchmarked at 2M concurrent connections on a single box. So I believe option 2 is not a viable path.

Would option 1 or 3 work reliably with 10,000+ concurrent users?
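
As a rough sanity check on option 1, the request rate can be estimated from the numbers in the question (10,000 clients, one POST and one GET per client every 5 seconds, 8 sync workers). The sketch below is only arithmetic, not a benchmark; the function name `polling_load` is illustrative and assumes load is spread evenly across workers.

```python
# Back-of-envelope load estimate for option 1 (HTTP polling).
# All inputs come from the question above; nothing here is measured.

def polling_load(clients: int, interval_s: float, workers: int) -> dict:
    """Estimate request rate and average per-request time budget."""
    # Each client sends one POST (its own location) and one GET
    # (nearby users) per interval.
    requests_per_second = clients * 2 / interval_s
    per_worker = requests_per_second / workers
    # A sync worker handles one request at a time, so this is the longest
    # a request may take on average before a backlog starts to build.
    budget_ms = 1000 / per_worker
    return {
        "requests_per_second": requests_per_second,
        "per_worker": per_worker,
        "budget_ms": budget_ms,
    }


if __name__ == "__main__":
    print(polling_load(clients=10_000, interval_s=5, workers=8))
```

Under these assumptions the server sees 4,000 requests per second, or 500 per worker, which leaves about 2 ms per request including the Redis round trip. Whether that is achievable depends on the deployment and would need to be measured.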

Vidyadhar Kothekar

Aug 26, 2017, 10:51:15 PM
to MQTT
I would go with option 3, an MQTT broker, for the following reasons:
- Much lighter than REST/HTTP in terms of overhead
- Best fit for the given problem, which is purely in the publish/subscribe domain
- Can be scaled linearly by adding more MQTT brokers
- Can easily serve thousands to millions of mobile app instances
- You can choose the QoS level that is fit for purpose
- The heavy lifting of data distribution and routing is taken care of by the MQTT broker
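
To make the routing point concrete: an MQTT broker decides who receives a message by matching its topic name against each subscriber's topic filter, where `+` stands for exactly one level and `#` for the remaining levels. The following is a minimal sketch of that matching rule, not code from any broker; the topic names are made up, and it ignores the special handling of topics that begin with `$`.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT-style topic filter matches a topic name."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            # Multi-level wildcard: valid only as the last level; it matches
            # the parent level itself and everything below it.
            return i == len(filter_levels) - 1
        if i >= len(topic_levels):
            # The filter has more levels than the topic.
            return False
        if level != "+" and level != topic_levels[i]:
            # A literal level must match exactly; "+" matches any one level.
            return False
    # Without "#", filter and topic must have the same number of levels.
    return len(filter_levels) == len(topic_levels)


if __name__ == "__main__":
    # Hypothetical topic layout: pos/<city>/<user>
    print(topic_matches("pos/london/+", "pos/london/user42"))
    print(topic_matches("pos/#", "pos/london/user42"))
    print(topic_matches("pos/+", "pos/london/user42"))
```

With a layout like this, a client interested in one city subscribes once and the broker does the filtering, so no application server has to decide who gets each update.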

You may want to take a look at http://dev.solace.com and the live demo of geo-proximity filters for cars at http://london.solacesystems.com/aaron/geo/


Regards,
Vidyadhar Kothekar

Zsolt Ero

Aug 27, 2017, 8:41:06 AM
to mq...@googlegroups.com
Vidyadhar, thanks a lot for the reply, I was thinking of the same
choice actually. The Solace demo is also impressive.


Hans Jespersen

Aug 27, 2017, 5:27:05 PM
to MQTT
Most connected-car companies (including Uber) run Apache Kafka in the back-end datacenter for streaming analytics, storage and replay of driving sessions, and fan-out to multiple repositories such as Redis, Hadoop, Elasticsearch, Cassandra, etc.

Device-to-datacenter traffic can use REST, MQTT, or both, with a set of Kafka connectors for each protocol running as dynamically scalable microservices in the DMZ.

If you are going to run MQTT over the public internet, you will likely need to tunnel it through secure WebSockets (WSS) to keep it secure and still be able to traverse firewalls that might otherwise block raw MQTT on TCP port 1883.

Zsolt Ero

Aug 27, 2017, 5:31:39 PM
to mq...@googlegroups.com
Thanks, that's important information to know. So in the end I'd need
to use WebSockets anyway, so I think I'd probably start with that.
Right now I'm thinking of investing time in learning Elixir/Phoenix,
as it can reportedly handle millions of concurrent WebSocket
connections without trouble.

Hans Jespersen

Aug 27, 2017, 5:53:44 PM
to mq...@googlegroups.com
There is value in running MQTT over WebSockets versus running just raw WebSockets, though. WebSockets are more of a transport than a pub/sub messaging system, and you would want QoS 1 or 2, which MQTT provides and raw WebSockets do not. When you get into putting hardware on cars, it's nice to have a local lightweight MQTT broker that can buffer data when you drive through tunnels, dead coverage areas, or underground parking lots.
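
The buffering idea can be illustrated with a small sketch. This is a simplified stand-in, not an MQTT broker and not QoS 1 or 2 (there are no acknowledgements or retries): it only holds messages while the link is down and flushes them in order on reconnect. The class name `BufferedPublisher` and the `send` callback are hypothetical.

```python
from collections import deque
from typing import Callable, Deque


class BufferedPublisher:
    """Hold messages while offline and flush them in order on reconnect."""

    def __init__(self, send: Callable[[str], None], max_buffered: int = 1000) -> None:
        self._send = send
        # A bounded deque silently drops the oldest entry when full, which
        # suits position updates where only recent data matters.
        self._pending: Deque[str] = deque(maxlen=max_buffered)
        self.connected = True

    def publish(self, message: str) -> None:
        if self.connected:
            self._send(message)
        else:
            self._pending.append(message)

    def on_disconnect(self) -> None:
        self.connected = False

    def on_reconnect(self) -> None:
        self.connected = True
        # Deliver everything that accumulated while the link was down.
        while self._pending:
            self._send(self._pending.popleft())


if __name__ == "__main__":
    sent = []
    pub = BufferedPublisher(sent.append, max_buffered=2)
    pub.publish("fix-1")
    pub.on_disconnect()          # e.g. entering a tunnel
    for fix in ("fix-2", "fix-3", "fix-4"):
        pub.publish(fix)
    pub.on_reconnect()
    print(sent)
```

A real deployment would more likely rely on the client library's or a local broker's own queuing and on MQTT acknowledgements rather than on hand-written buffering like this.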

If you want to learn more about the back end of Uber, there was a great talk at last year's Kafka Summit that was recorded and is available here: https://kafka-summit.org/sessions/stream-processing-kafka-uber/

-hans

Vidyadhar Kothekar

Aug 28, 2017, 8:12:02 PM
to MQTT

Hi Zsolt,

Referring to a few lines from your problem statement: “I am developing a GPS-based live-tracking mobile app where users can see the real-time position of other nearby users. I am trying to figure out how would the server side look for serving such an app? I mean how can I solve the problem of concurrently communicating with up to 10.000+ clients using a Python server?”

If I understand the goal correctly, you are looking for a solution along similar lines to the demo I linked earlier, the one where you can “Draw the boundary of the vicinity you are interested in tracking the users within”. I would like to refer you to the paper that describes the mechanics of the boundary creation and how it translates into simple lat/lon-based topics for subscriptions. The beauty of the solution lies in its simplicity: http://worldcomp-proceedings.com/proc/p2016/ICM3967.pdf
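
One simple way to turn positions into topics is a fixed lat/lon grid: each client publishes to the topic of the cell it is in and subscribes to that cell plus its eight neighbours. The sketch below illustrates that general idea only; it is not the scheme from the paper, and the `pos/<row>/<col>` topic layout and the 0.01-degree cell size are assumptions. It also ignores wrap-around at the poles and at ±180° longitude.

```python
import math
from typing import List, Tuple

CELL_DEG = 0.01  # assumed cell size; 0.01 degrees of latitude is about 1.1 km


def cell_index(lat: float, lon: float, cell_deg: float = CELL_DEG) -> Tuple[int, int]:
    """Map a position to integer (row, col) grid coordinates."""
    return math.floor(lat / cell_deg), math.floor(lon / cell_deg)


def publish_topic(lat: float, lon: float) -> str:
    """Topic a client at this position would publish its location to."""
    row, col = cell_index(lat, lon)
    return f"pos/{row}/{col}"


def nearby_topics(lat: float, lon: float) -> List[str]:
    """Topics to subscribe to: the client's own cell and its 8 neighbours."""
    row, col = cell_index(lat, lon)
    return [
        f"pos/{row + dr}/{col + dc}"
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
    ]


if __name__ == "__main__":
    print(publish_topic(51.5074, -0.1278))
    print(nearby_topics(51.5074, -0.1278))
```

Clients re-subscribe when they cross a cell boundary. Note that cells get narrower in the east-west direction away from the equator, so the cell size would need tuning for the region being served.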

 

If you are interested in fanning out the information to streaming analytics or data-at-rest analytics, you can use the Wire Tap pattern, where information published on topics is also spooled to a queue. You could choose an open wire protocol such as AMQP for richer data-processing functionality within the core infrastructure and use MQTT for mobile-to-edge communication: the right tool for the right job.


Regards,
Vidyadhar Kothekar

Zsolt Ero

Aug 29, 2017, 8:43:00 AM
to mq...@googlegroups.com
Hi Vidyadhar,

I'm developing a location-aware mobile game, but the use case is quite
similar to Uber / car tracking. One part where mine is much simpler is
that I don't care about historical data, only the most up-to-date
position.

Thanks for sharing the paper. My idea right now is to use Redis and
its built-in GEORADIUS command, once every n seconds. I've read some
articles that mention millisecond query times using GEORADIUS with a
million points.
https://redis.io/commands/georadius
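
For readers unfamiliar with radius queries, the following sketch shows what such a query returns, using a brute-force great-circle distance check in plain Python. It is a stand-in for illustration only: Redis answers GEORADIUS from an indexed sorted set rather than by scanning every point, and the names `haversine_m` and `nearby` are made up here.

```python
import math
from typing import Dict, List, Tuple

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in metres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearby(positions: Dict[str, Tuple[float, float]],
           lat: float, lon: float, radius_m: float) -> List[str]:
    """Return the users within radius_m of (lat, lon), sorted by name."""
    return sorted(
        user
        for user, (ulat, ulon) in positions.items()
        if haversine_m(lat, lon, ulat, ulon) <= radius_m
    )


if __name__ == "__main__":
    positions = {
        "alice": (51.5074, -0.1278),
        "bob": (51.5080, -0.1280),
        "carol": (48.8566, 2.3522),
    }
    print(nearby(positions, 51.5074, -0.1278, radius_m=500))
```

Scanning every user on every query costs time proportional to the number of users, which is the cost a geo index is meant to avoid.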

What I still need to research is how much of the information sharing
is usually done by the broker, and how much is handled by the server.
For example, in a Slack-like application, do clients subscribe to a
"room", or is each client subscribed to its own unique channel, with
the backend server filtering messages and sending them to each
client?

Zsolt


Vidyadhar Kothekar

Aug 29, 2017, 9:39:47 PM
to MQTT
Hi Zsolt,

Your requirement of *not* tracking historical data and being interested only in the "current" state makes it even more compelling to use an MQTT broker. I say this because you can scale out horizontally without getting into the complexities of the state management that queueing / storage requires. As the number of users increases, you can scale out linearly, and scale back in when demand goes down. I have done a similar implementation at a gaming / betting company, where we scale brokers in and out elastically in the cloud to adapt to changing demand cycles.

Based on my quick read-through, GEORADIUS seems to follow a "query, then take action" paradigm rather than an "event happened, take action" model. If you want to build the solution around event-driven architectural concepts, then a messaging middleware broker would be the right fit.

Regarding the Slack scenario you mentioned: it is very much possible to build 100% of the "chat data movement / information sharing" using only a broker, without any server-side component. In fact, I view the broker as an intelligent intermediary that handles information-movement semantics using well-defined message header attributes (to be specific, the messaging topic). The client can subscribe to a "Room" to receive information broadcast to that room, or to "myPrivateInbox" to receive personal messages sent to it. The client implementation remains the same in either case. The heavy lifting is done by the broker, which tracks who is subscribed to "Room" and who is subscribed to "myPrivateInbox". If you don't use a broker, you need to build these semantics yourself. You may find a server-side offering that mimics this functionality without being a middleware broker, but you will have to figure out how reliable and robust that option is. Messaging brokers and the concept of middleware-based information movement have been around for more than a quarter century and are well proven for this purpose.
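
The room versus private-inbox distinction can be sketched with a toy in-memory broker. This only illustrates the semantics described above (the client code is the same for both cases and the broker tracks who is subscribed to what); it is not a real broker, and the class `TinyBroker` and the topic names are hypothetical.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[str, str], None]  # called with (topic, payload)


class TinyBroker:
    """Toy exact-match pub/sub broker, for illustration only."""

    def __init__(self) -> None:
        self._subs: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subs[topic].append(handler)

    def publish(self, topic: str, payload: str) -> int:
        """Deliver payload to every subscriber of topic; return the count."""
        handlers = self._subs.get(topic, [])
        for handler in handlers:
            handler(topic, payload)
        return len(handlers)


if __name__ == "__main__":
    broker = TinyBroker()
    alice_log, bob_log = [], []

    # Both clients join the room; only Alice has a private inbox here.
    broker.subscribe("room/general", lambda t, p: alice_log.append((t, p)))
    broker.subscribe("room/general", lambda t, p: bob_log.append((t, p)))
    broker.subscribe("inbox/alice", lambda t, p: alice_log.append((t, p)))

    broker.publish("room/general", "hello everyone")   # broadcast
    broker.publish("inbox/alice", "just for you")      # personal message
    print(alice_log)
    print(bob_log)
```

The publisher never needs to know who is listening: sending to a room and sending to one user differ only in the topic name.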

Regards,
Vidyadhar Kothekar