Hi All,
I have a question about approaching a problem with golang and its ecosystem.
At happypancake.com we are building a chat server for the next version of our social dating web site. We are trying to stick to golang, since it has really nice concurrency primitives and a great web stack.
However, we recently hit a wall with a design problem. This is most likely due to me just being silly, so I hope you can help me out.
So, in our chat two people can engage in a conversation with each other. They can send messages and also see "User X is typing" presence notifications.
I expect to have 20k people chatting at the same time (based on the numbers from the existing system). Obviously, we don't want to go to the database each time a message is sent, a user starts typing, or a client asks for new updates (both presence and messages).
As I understand, people solve this problem by caching the latest messages in memory and occasionally spilling them to disk in batches. The majority of requests will be served from that cache, and will be really fast and cheap.
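To make the caching idea concrete, here is a minimal single-node sketch of what I have in mind. The names (`Message`, `Cache`, `Post`, `Latest`, `Flush`) are made up for illustration, and the `save` callback stands in for whatever batch write the database would offer. If `save` fails, the batch is simply dropped, which matches our tolerance for losing a few messages.

```go
package main

import (
	"fmt"
	"sync"
)

// Message is a hypothetical chat message; the fields are illustrative.
type Message struct {
	ConversationID string
	From           string
	Body           string
}

// Cache keeps the most recent messages per conversation in memory and
// remembers which messages have not been written to storage yet.
type Cache struct {
	mu      sync.Mutex
	recent  map[string][]Message // what readers are served from
	pending []Message            // waiting for the next batch write
	keep    int                  // max messages kept per conversation
}

func NewCache(keep int) *Cache {
	return &Cache{recent: make(map[string][]Message), keep: keep}
}

// Post stores a message in memory only; no database call happens here.
func (c *Cache) Post(m Message) {
	c.mu.Lock()
	defer c.mu.Unlock()
	msgs := append(c.recent[m.ConversationID], m)
	if len(msgs) > c.keep {
		msgs = msgs[len(msgs)-c.keep:] // drop the oldest entries
	}
	c.recent[m.ConversationID] = msgs
	c.pending = append(c.pending, m)
}

// Latest returns a copy of the cached messages for one conversation.
func (c *Cache) Latest(conversationID string) []Message {
	c.mu.Lock()
	defer c.mu.Unlock()
	return append([]Message(nil), c.recent[conversationID]...)
}

// Flush hands every pending message to save as one batch.
// The batch is dropped if save fails (assumption: small losses are OK).
func (c *Cache) Flush(save func([]Message) error) error {
	c.mu.Lock()
	batch := c.pending
	c.pending = nil
	c.mu.Unlock()
	if len(batch) == 0 {
		return nil
	}
	return save(batch)
}

func main() {
	c := NewCache(100)
	c.Post(Message{ConversationID: "conv-1", From: "alice", Body: "hi"})
	c.Post(Message{ConversationID: "conv-1", From: "bob", Body: "hello"})

	fmt.Println("cached:", len(c.Latest("conv-1")))

	// In a real server this would run from a time.Ticker loop.
	err := c.Flush(func(batch []Message) error {
		fmt.Println("writing batch of", len(batch))
		return nil
	})
	if err != nil {
		fmt.Println("flush failed:", err)
	}
}
```

This works fine on one machine; the open question is what happens once there is more than one node.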
This approach can probably be implemented in two ways in a cluster (neither of which seems to be supported in golang):
1. Actor style : conversation state is located on one of the nodes in the cluster. All requests for that conversation (post message, get last events since X, notify that user is typing) will be somehow routed to that node. The node will regularly save state from memory to disk (losing a few messages is OK in our case). In case of a crash, the conversation state will be assigned to another node and loaded from disk. From then on, all new requests will be routed to that node.
Problem with that approach : As far as I know, there is no tooling in golang ecosystem to deal with this kind of state-aware routing, leader election and actor management functionality.
2. Replicate everywhere : all messages are replicated to all nodes in the cluster (e.g. via zeromq, nanomsg or http connections between the peers). Each node maintains a copy of the conversation state in memory. State between the nodes is kept in sync (in an eventually consistent way).
Problems with that approach : It does not scale out, and we would have to manage peer relations (including failover) and handle network partitions.
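For option 1, the "pick a node for a conversation" part on its own could be sketched with rendezvous (highest-random-weight) hashing. This is only a sketch under a big assumption: every node already agrees on the list of live nodes. Membership, failure detection and handing the state over are exactly the parts I don't know how to get in golang. The names `Owner` and `score` and the node names are made up for illustration.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// score mixes a node name and a conversation ID into one number.
func score(node, conversationID string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(node))
	h.Write([]byte{0}) // separator, so "ab"+"c" differs from "a"+"bc"
	h.Write([]byte(conversationID))
	return h.Sum64()
}

// Owner returns the live node with the highest score for the conversation.
// Any node that sees the same set of live nodes picks the same owner,
// regardless of the order of the list. It returns "" for an empty list.
func Owner(liveNodes []string, conversationID string) string {
	var best string
	var bestScore uint64
	for i, n := range liveNodes {
		s := score(n, conversationID)
		// Ties are broken by name so the result does not depend on order.
		if i == 0 || s > bestScore || (s == bestScore && n < best) {
			best, bestScore = n, s
		}
	}
	return best
}

func main() {
	nodes := []string{"node-a", "node-b", "node-c"}
	owner := Owner(nodes, "conv-42")
	fmt.Println("owner:", owner)

	// Simulate the owner crashing: drop it from the live list and re-route.
	var survivors []string
	for _, n := range nodes {
		if n != owner {
			survivors = append(survivors, n)
		}
	}
	fmt.Println("owner after crash:", Owner(survivors, "conv-42"))
}
```

The nice property is that when a node goes away, only the conversations it owned move to other nodes; everything else stays where it was. It does nothing about two nodes disagreeing on who is alive, which is where leader election or some membership service would come in.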
So the question is : how would you recommend dealing with this design challenge in go? Is there an approach that fits the ecosystem and existing tooling that I missed?
Best regards,
Rinat