TL;DR: We discussed higher-availability database options in a work
week session and identified three solutions to prototype:
A) sticking with a single-master MySQL cluster, focusing on improving
failover and error messaging;
B) a multi-master, multi-region MySQL cluster (possibly using Tungsten);
C) a multi-region Cassandra ring.
*We'd like suggestions from the community on solutions that we didn't
consider which satisfy the requirements.*
This email follows up on the initial DB planning message that Jared sent a
few weeks ago [1]; see it for background and a list of requirements.
We were able to talk through and call out some very specific constraints
and opportunities related to our data storage choice:
* Low read latency is not very important, because so much of Persona is
intentionally CPU-bound that any other latency is effectively hidden behind
500ms of compute time.
* The workload is very read-heavy and write-light.
* Any existing places in Persona where a write is followed closely by a read
of the same data are not desired/required and will be removed. This
effectively removes the need for immediate consistency.
* The data set is small and is not expected to grow beyond the storage on a
single server, effectively removing the need to shard the data.
* We need the data to be highly available: within a given datacenter/region
we can withstand the loss of a host, and across the world we can withstand
the loss of an entire datacenter/region, in both cases without human
intervention. The goal is to avoid service downtime, and it drives our data
replication needs. We are OK without immediate consistency, so some writes
(which are infrequent anyway) could be lost during failover.
* Reads must be highly available, but we can tolerate write outages of
fairly short duration.
* The data structure/schema is so simple that we don't need advanced query
functionality (SELECT WHERE, ORDER BY, etc.). We only ever look at data for
a single user at a time.
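
The constraints above can be sketched in code. This is a minimal
illustration, not our implementation: `Replica`, `UserStore`, `replicate()`,
and the region names are all hypothetical, and the in-memory dictionaries
stand in for whichever database we end up choosing. It shows single-user
lookups only, reads that fall through to another host or region when one is
down, and writes that fail cleanly when the primary is unavailable.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


class HostDown(Exception):
    """Raised when a host (or every host) cannot be reached."""


class WriteUnavailable(Exception):
    """Raised when the primary is down; callers may retry later."""


@dataclass
class Replica:
    """Hypothetical stand-in for one database host in one region."""
    region: str
    up: bool = True
    rows: Dict[str, dict] = field(default_factory=dict)

    def get(self, user: str) -> Optional[dict]:
        if not self.up:
            raise HostDown(self.region)
        return self.rows.get(user)

    def put(self, user: str, record: dict) -> None:
        if not self.up:
            raise HostDown(self.region)
        self.rows[user] = record


class UserStore:
    """Key-value access by user only; no SELECT WHERE / ORDER BY needed."""

    def __init__(self, replicas: List[Replica], local_region: str):
        # Assumption for this sketch: the first replica is the write primary.
        self.primary = replicas[0]
        # Try hosts in the local region first, then other regions.
        self.replicas = sorted(replicas, key=lambda r: r.region != local_region)

    def read(self, user: str) -> Optional[dict]:
        # Reads survive the loss of a host or a whole region.
        for replica in self.replicas:
            try:
                return replica.get(user)
            except HostDown:
                continue
        raise HostDown("all replicas unreachable")

    def write(self, user: str, record: dict) -> None:
        # Writes may be briefly unavailable; that is acceptable here.
        try:
            self.primary.put(user, record)
        except HostDown as exc:
            raise WriteUnavailable("primary is down") from exc

    def replicate(self) -> None:
        # Stand-in for asynchronous replication: reads on other hosts
        # only see a write after this has run (no immediate consistency).
        if not self.primary.up:
            return
        for replica in self.replicas:
            if replica is not self.primary and replica.up:
                replica.rows.update(self.primary.rows)
```

In this sketch a write made just before the primary fails, and before
`replicate()` runs, is simply lost, which matches the tolerance described in
the list above.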
Here is more detail on the prototypes listed above that we hope to
implement:
A) Install ScaleBase [2] or some other tool that automates the process of
failover. Possibly look into MySQL 5.6 [3], which provides more master
promotion options than our existing version.
B) Sheeri Cabral is going to look into Tungsten and let us know how she
sees it fitting with our needs. If it looks applicable we'll bring up a
prototype.
C) In consultation with Ben Bangert, we're going to bring up a
Cassandra installation.
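
For option A, the core decision an automated failover tool has to make can
be sketched as follows. This is only an illustration under stated
assumptions: `ReplicaStatus`, `applied_txn`, `choose_new_master`, and
`lost_writes` are hypothetical names, and a real tool would compare binlog
positions or GTID sets rather than a single counter.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ReplicaStatus:
    """Hypothetical health report for one replica."""
    host: str
    reachable: bool
    applied_txn: int  # assumed: count of transactions applied so far


def choose_new_master(replicas: List[ReplicaStatus]) -> Optional[ReplicaStatus]:
    """Pick the reachable replica that has applied the most transactions."""
    candidates = [r for r in replicas if r.reachable]
    if not candidates:
        return None  # nothing to promote; writes stay unavailable
    return max(candidates, key=lambda r: r.applied_txn)


def lost_writes(old_master_txn: int, new_master: ReplicaStatus) -> int:
    """Writes the old master accepted that the promoted replica never saw."""
    return max(0, old_master_txn - new_master.applied_txn)
```

Promoting the most up-to-date reachable replica keeps the number of lost
writes as small as possible, which fits our tolerance for losing a few
infrequent writes during failover.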
-Gene
[1] https://groups.google.com/d/msg/mozilla.dev.identity/kRzXJNfmQmI/lu4qCIFRUs8J
[2] http://www.scalebase.com/
[3] http://dev.mysql.com/tech-resources/articles/whats-new-in-mysql-5.6.html#replication