Persona data storage improvement discussion and next steps

Gene Wood

unread,

May 12, 2013, 12:39:40 PM5/12/13

to dev-id...@lists.mozilla.org, Mozilla Services Operations, bban...@mozilla.com, Sheeri Cabral

TL;DR: We discussed higher availability database options in a work
week session, and we identifies three solutions to prototype:
A) sticking with a single-master MySQL cluster, focusing on improving
failover and error messaging;
B) a multi-master, multi-region MySQL cluster (possibly using Tungsten);
C) a multi-region Cassandra ring.
*We'd like suggestions from the community on solutions that we didn't
consider which satisfy the requirements.*

This email follows on an initial DB planning message that Jared sent a few
weeks ago[1], see there for background and a list of requirements.

We were able to talk through and call out some very specific constraints
and opportunities related to our data storage choice :
* Low read latency is not very important because so much of persona is
intentionally CPU bound, effectively hiding any other latency behind 500ms
of compute time
* The read/write ratio is very read heavy and very write light
* Any existing instances in persona of writes followed closely by reads are
not desired/required and will be removed. This effectively removes a need
for immediate consistency
* The data set is small and is expected not to grow beyond the storage on a
single server effectively removing the need to shard the data.
* We need the data to be highly available such that within a given
datacenter/region, we can stand the loss of a host and across the world we
can stand the loss of an entire datacenter/region without human
intervention. This is to have high availability to avoid service downtime.
This relates to data replication needs. We are ok without immediate
consistency such that some writes (in that they're infrequent) could be
lost during failover.
* Though we're intolerant of having reads not be highly available, we
are tolerant of write outages of somewhat short durations.
* The data structure/schema is so simple that we don't have any needs for
advanced data search functionalities (SELECT WHERE, ORDER BY etc.). We only
ever look at data for a single user at a time.

Here is more detail on the prototypes listed above that we hope to
implement :
A) Installing ScaleBase[2] or some other tool which will automate the
process of failover. Possibly look into MySQL 5.6[3] which provides more
master promotion options than the existing version
B) Sheeri Cabral is going to look into Tungsten and let us know how she
sees it fitting with our needs. If it looks applicable we'll bring up a
prototype.
C) With some consultation with Ben Bangert we're going to bring up a
Cassandra installation

-Gene

[1]
https://groups.google.com/d/msg/mozilla.dev.identity/kRzXJNfmQmI/lu4qCIFRUs8J
[2] http://www.scalebase.com/
[3]
http://dev.mysql.com/tech-resources/articles/whats-new-in-mysql-5.6.html#replication

Dirkjan Ochtman

unread,

May 12, 2013, 3:50:43 PM5/12/13

to Gene Wood, Mozilla Services Operations, bban...@mozilla.com, Sheeri Cabral, dev-id...@lists.mozilla.org

On Sun, May 12, 2013 at 6:39 PM, Gene Wood <ge...@mozilla.com> wrote:
> TL;DR: We discussed higher availability database options in a work
> week session, and we identifies three solutions to prototype:
> A) sticking with a single-master MySQL cluster, focusing on improving
> failover and error messaging;
> B) a multi-master, multi-region MySQL cluster (possibly using Tungsten);
> C) a multi-region Cassandra ring.
> *We'd like suggestions from the community on solutions that we didn't
> consider which satisfy the requirements.*

At work, I've been very happy with CouchDB. It:

- works very well with read-heavy loads
- has insanely simple multi-master replication, no complexity at all
- the REST API is very clean, easy to use
- it should integrate well with node.js stuff (npm is built on it IIRC)

I don't have any experience with MySQL replication or Cassandra
myself, but from what I've heard Cassandra is pretty complex, and
MySQL is maybe not so complex but also not quite reliable. CouchDB
uses MVCC with append-only file stores and has never gotten any
corruption in the 4 years I've been using it.

(Also, I'm quite happy to field any questions or generally get my
hands dirty with CouchDB if it's selected.)

Cheers,

Dirkjan

P.S. Sorry, I guess I didn't read the previous thread close enough or
I'd have commented sooner.

Ben Bangert

unread,

May 12, 2013, 5:13:23 PM5/12/13

to Dirkjan Ochtman, Mozilla Services Operations, Sheeri Cabral, Gene Wood, dev-id...@lists.mozilla.org

On May 12, 2013, at 12:50 PM, Dirkjan Ochtman <dir...@ochtman.nl> wrote:

> At work, I've been very happy with CouchDB. It:
>
> - works very well with read-heavy loads
> - has insanely simple multi-master replication, no complexity at all
> - the REST API is very clean, easy to use
> - it should integrate well with node.js stuff (npm is built on it IIRC)

The main issues with CouchDB I've had are:
- Views need to be written in Javascript, they're separate from the code-base, so all the complexity in maintaining and versioning stored procedures occurs. Also, new views need to be built in the background if the dataset is large as they can take awhile, additional data does incremental updates so that doesn't hurt at least. This means you don't really *update* a view, so much as make a new one entirely, materialize it in the background, then switch to using it.
- The multi-master replication is indeed super easy, unless there's inconsistencies, in which case you can either have the last update win, or worse, resolve the inconsistent updates yourself (more code).
- It's not terribly fast (maybe not an issue), this is due to how it actually fsync's during every write (because it has a habit of crashing, though since it fsync's before ack'ing, I've never ever lost data and its back up immediately).

I completely agree with Dirkjan on his other points, thanks to all the views being materialized with incremental updates, they're all consistently very fast. I built the prior version of PylonsHQ on CouchDB and generally enjoyed it quite a bit despite my issues I had above.

> I don't have any experience with MySQL replication or Cassandra
> myself, but from what I've heard Cassandra is pretty complex, and
> MySQL is maybe not so complex but also not quite reliable. CouchDB
> uses MVCC with append-only file stores and has never gotten any
> corruption in the 4 years I've been using it.

Yep, same here. Rock solid on storing the data.

Cheers,
Ben

Dirkjan Ochtman

unread,

May 13, 2013, 2:27:13 AM5/13/13

to Ben Bangert, Mozilla Services Operations, Sheeri Cabral, Gene Wood, dev-id...@lists.mozilla.org

On Sun, May 12, 2013 at 11:13 PM, Ben Bangert <bban...@mozilla.com> wrote:
> The main issues with CouchDB I've had are:
> - Views need to be written in Javascript, they're separate from the code-base, so all the complexity in maintaining and versioning stored procedures occurs. Also, new views need to be built in the background if the dataset is large as they can take awhile, additional data does incremental updates so that doesn't hurt at least. This means you don't really *update* a view, so much as make a new one entirely, materialize it in the background, then switch to using it.

It's fairly easy to write some tools to sync your views from a version
controlled repository to the design documents, though.

> - The multi-master replication is indeed super easy, unless there's inconsistencies, in which case you can either have the last update win, or worse, resolve the inconsistent updates yourself (more code).

That might be an issue depending on how the storage architecture is
setup (i.e. what's the chance of conflicts).

> - It's not terribly fast (maybe not an issue), this is due to how it actually fsync's during every write (because it has a habit of crashing, though since it fsync's before ack'ing, I've never ever lost data and its back up immediately).

I think this isn't true anymore; the default in current versions is to
fsync every second or so. Still, if you have a read-heavy load, this
shouldn't be much of an issue anyhow.

Cheers,

Dirkjan

Jared Hirsch

unread,

May 14, 2013, 1:32:01 PM5/14/13

to Dirkjan Ochtman, Gene Wood, Ben Bangert, Mozilla Services Operations, Sheeri Cabral, dev-id...@lists.mozilla.org

Who's interested in helping build prototypes in the next few weeks? Please
ping this thread + let's get to it :-)

Here's my suggested timeline from here:
* 2-3 weeks to build prototypes
* 2-3 weeks to evaluate prototypes
* End of Q2 (end of June): select a winner
* Q3: actually put the winner into production

Dirkjan--you're totally welcome to prototype a CouchDB solution. We
narrowed to 3 options mainly because we don't have more people to help. (If
we had more time/people, I'd personally really like to try using durable
queues to handle cross-region replication.) Prototyping would entail
building an equivalent of [1] and [2], then working with ops to stand up a
stack in AWS.

Also, thanks for the fantastic summary, Gene :-)

Cheers,

Jared

[1] https://github.com/mozilla/browserid/blob/dev/lib/db/mysql.js
[2] https://github.com/mozilla/browserid/blob/dev/lib/db/json.js

> _______________________________________________
> dev-identity mailing list
> dev-id...@lists.mozilla.org
> https://lists.mozilla.org/listinfo/dev-identity
>