Hi,
We are investigating Akka Distributed Data for storing a (Long -> Long) mapping in an LWWMap.
We plan for up to ~1 million entries and expect a write load of a few hundred to a few thousand entries per day; in contrast, we expect a read load of a few hundred requests per second. Therefore, we plan to use readLocal together with writeMajority.
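To make this concrete, here is a rough sketch of how we would talk to the replicator (assuming a recent Akka 2.5/2.6 with the classic Replicator API; the key name, the AddMapping/Lookup messages and the timeout are just placeholders):

```scala
import akka.actor.{Actor, ActorRef}
import akka.cluster.ddata.{DistributedData, LWWMap, LWWMapKey, SelfUniqueAddress}
import akka.cluster.ddata.Replicator._
import scala.concurrent.duration._

// placeholder messages for this sketch
final case class AddMapping(engineId: Long, carId: Long)
final case class Lookup(engineId: Long)

class MappingCache extends Actor {
  private implicit val node: SelfUniqueAddress =
    DistributedData(context.system).selfUniqueAddress
  private val replicator = DistributedData(context.system).replicator

  // one top-level key for the engineId -> carId map (placeholder name)
  private val EngineKey = LWWMapKey[Long, Long]("engine-to-car")
  private val writeMajority = WriteMajority(timeout = 5.seconds)

  def receive = {
    case AddMapping(engineId, carId) =>
      // rare writes go through WriteMajority
      replicator ! Update(EngineKey, LWWMap.empty[Long, Long], writeMajority)(
        existing => existing :+ (engineId -> carId))

    case Lookup(engineId) =>
      // frequent reads are answered from the local replica
      replicator ! Get(EngineKey, ReadLocal, request = Some((engineId, sender())))

    case g @ GetSuccess(EngineKey, Some((engineId: Long, replyTo: ActorRef))) =>
      replyTo ! g.get(EngineKey).get(engineId)

    case NotFound(EngineKey, Some((engineId: Long, replyTo: ActorRef))) =>
      replyTo ! None
  }
}
```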
Roughly calculating the size of the map, we should end up with some 20-30 MB, which feels okay-ish as long as delta-CRDT replication does its job.
While testing with only a few thousand entries, we already see huge propagation delays from one node to another, in the order of tens of seconds (both nodes running on the same PC). This raises the question of whether Akka Distributed Data is really a valid option for our use case (I will explain it in a bit more detail below). What concerns us most is what happens when we need to do a full cluster restart and this map fills up again at a rate of roughly 100 entries per second. We currently do not feel confident that this can work.
I already asked in the Akka-User chat and got the feedback that Artery could help a bit with the head-of-line blocking (we see the phi accrual failure detector reporting higher delays). We tried that and, IIRC, got even slower updates. The other usual suspect is to manually shard the data into multiple top-level maps (see the sketch below), which we could go for, but we would still end up with something like 100 maps of 10k entries each.
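For reference, the manual split into multiple top-level keys would look roughly like this (shard count and key naming are placeholders):

```scala
import akka.cluster.ddata.LWWMapKey

object EngineKeys {
  // split the single engineId -> carId map into N smaller top-level maps,
  // so that each replicated (and gossiped) value stays small
  val NumberOfTopLevelMaps = 100

  def engineKeyFor(engineId: Long): LWWMapKey[Long, Long] =
    LWWMapKey[Long, Long]("engine-to-car-" + math.abs(engineId % NumberOfTopLevelMaps))
}

// every Update/Get above would then use EngineKeys.engineKeyFor(engineId)
// instead of the single EngineKey
```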
Our use case is as follows; I will describe it using the example of a CarActor, which is addressed by its ID via Cluster Sharding (so that we have exactly one actor per imaginary car in the whole cluster). However, we also need to send messages to that actor via 3 other identifiers, let's say the IDs of the engine, the gear, and the exhaust pipe (that these parts are occasionally replaced and even transferred to a different car reflects our scenario pretty well; reads would happen e.g. every time a car passes a toll bridge... damn, I think we could earn way more money if we actually built that very system :-)). So these three mappings (engine ID -> car ID, gear ID -> car ID, exhaust ID -> car ID) are what additionally needs to be disseminated among the nodes participating in cluster sharding. The characteristics of CRDTs sound very appealing here: an engine wouldn't end up in another car within the very next second, which leaves enough time for convergence via gossip. Keeping the whole data structure in memory feels like nothing compared to how we run our system currently (all nodes store all data and mostly successfully invalidate the other nodes' state, backed by a heap of >100 GB).
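Just to illustrate the intended read path at a toll bridge (again a sketch; the TollPassage/CarCommand messages and the shard region wiring are placeholders):

```scala
import akka.actor.{Actor, ActorRef}
import akka.cluster.ddata.{DistributedData, LWWMapKey}
import akka.cluster.ddata.Replicator._

// placeholder messages
final case class TollPassage(engineId: Long, timestamp: Long)
final case class CarCommand(carId: Long, passage: TollPassage)

// resolves engineId -> carId via the replicated map and forwards to the
// sharded CarActor region (carRegion is the ClusterSharding shard region ref)
class TollGate(carRegion: ActorRef) extends Actor {
  private val replicator = DistributedData(context.system).replicator
  private val EngineKey = LWWMapKey[Long, Long]("engine-to-car")

  def receive = {
    case p: TollPassage =>
      // ReadLocal: the lookup is answered from the local replica
      replicator ! Get(EngineKey, ReadLocal, request = Some(p))

    case g @ GetSuccess(EngineKey, Some(p: TollPassage)) =>
      g.get(EngineKey).get(p.engineId) match {
        case Some(carId) => carRegion ! CarCommand(carId, p)
        case None        => // map exists but has no entry for this engine id: drop or log
      }

    case NotFound(EngineKey, Some(p: TollPassage)) =>
      // no mapping replicated yet: drop or log
  }
}
```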
It would be nice to get your feedback here on both points: whether CRDTs will kill us one day, and whether you have other ideas for how to disseminate the mapping. The alternatives we have in mind are 1) storing the mappings in Redis, accessible by all cluster sharding nodes, and 2) having dedicated mapping actors that hold the mappings in some local data structure, either as 2a) one mapping actor per node knowing the whole state, or 2b) one actor per engine ID, addressed via cluster sharding (see the sketch below), knowing only its car ID by looking it up once in the database, which would leave us with ~3 million tiny mapping actors alive ¯\_(ツ)_/¯.
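For completeness, a rough sketch of what we mean by 2b (classic Cluster Sharding; the message type, entity Props, and shard count are placeholders):

```scala
import akka.actor.{ActorSystem, Props}
import akka.cluster.sharding.{ClusterSharding, ClusterShardingSettings, ShardRegion}

// placeholder message: anything addressed to the per-engine mapping actor
final case class EngineLookup(engineId: Long)

object EngineMappingSharding {
  val extractEntityId: ShardRegion.ExtractEntityId = {
    case msg @ EngineLookup(engineId) => (engineId.toString, msg)
  }

  val extractShardId: ShardRegion.ExtractShardId = {
    case EngineLookup(engineId) => (math.abs(engineId % 100)).toString // placeholder shard count
  }

  // one tiny entity per engine id; it would look up its carId in the database
  // once on first access and then just answer/forward
  def start(system: ActorSystem, engineEntityProps: Props) =
    ClusterSharding(system).start(
      typeName = "engine-mapping",
      entityProps = engineEntityProps,
      settings = ClusterShardingSettings(system),
      extractEntityId = extractEntityId,
      extractShardId = extractShardId)
}
```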
Thanks
Steffen