NoSQL vs Relational DBs - pros and cons

David Hardwick

unread,

Jul 18, 2014, 9:57:22 AM7/18/14

to google-a...@googlegroups.com

And how are you?

My purpose with this email to not to seek out a religious debate on NoSQL vs Relational databases, I'm really looking for someone that has had a great experience with successfully sharding relational databases for multi-tentant applications operating at very large scale, and overcame the associated challenges. (or someone that's combined a NoSQL database and Elastic search, for example, to have relational database like query features atop a NoSQL database).

The reason I seek such a person out is to answer the question: Can I have the querying features of a relational database while at the same time have the ease of maintaining a NoSQL database with respect to scale, replication, and availability when operating a massive, multi-tenant database?

We know there are no magic architectures, and that such comparisons of NoSQL vs Relational database really depend on the need. The need in this case is for:

a highly scalable and reliable database for a multi-tenant application that will serve an estimated 40,000 companies with an average of 1000 users per company and about 10,000 entries per user (or 10 million entries per company spread across about 50 tables).

Given this particular use case, which I admit is still a bit vague, here is a pro/con analysis on NoSQL vs Relational DB based on my experiences with both types of databases.

When NoSQL databases - like Google Datastore, Amazon SimpleDB, Opensource Casandra and MongoDB - are stacked up against this use case, they have the following pros (+) and cons ( - ), IMHO:

+ Scales easily - sharding on a key-blob entry in a NoSQL datastore is fairly easy compared sharding a relational database
+ Replicates easily - for the same reason as sharding
+ Query speed is always fast as each query has to be based on a key index.
+ Easy to update data structures, no schema management to deal with
+ Easy to maintain...in cases of Datastore and SimpleDB, no DBA is needed!
- Ad hoc querying across entities/tables is difficult, typically done in the code rather than by the database
- More upfront data processing work is needed to get things like counts on an entity
- Sorting & filtering on an entity requires pre-defined indexes

When Relational Databases - like SQL Server, Postgres, Oracle, MySQL - are stacked up against this use case (i.e., 30,000 customers with 10M entries each across about 50 tables) have the following pros (+) and cons ( - ), IMHO:

+ Easier to query for ad-hoc reporting needs
+ Easier to throw aggregate queries (group by/ counts/ max/ min) to the database instead of doing it in code
+ Pagination (go to page x of Y ) is doable, even on really large datasets
+ Sorting and filtering on an entity or even between entities is easier; cross entity/table querying is easier
- Scaling is difficult - sharding strategy needs to be figured out ahead of time. How to shard for 100,000 customers, with 10M records for each customers (but those 10M records are stored amoungst 50 different tables)
- Replication and Scaling is complicated (??), need serious expertise / experience
- Queries can drag because there are so many possibilities -- need to have a represented dataset in a non production instance to do pre-analysis and optimization on new code/queries getting introduced. Need to monitor and analyze the query stats on existing queries to see what needs tuning.
- Data structure changes are harder, need to main sequenced DDL and DML scripts, and coordinate deployments to have the database changes go before the code changes
- Need a DBA for replication, backups, performance, scalability, monitoring

It almost comes down to saying...

hey, if you want your product team to be able to do any querying they want, then go with a relational database but the technology team will have to deal with the technical consequences and challenges.

...or ...

hey, you don't want to deal with the technical challenges for maintaining a massively scalable and reliable relational database, so go with NoSQL and have your product team deal with the consequences and challenges.

I lean toward NoSQL because I've never seen a database scale so easily and be so reliable compared to the very large relational database projects I've worked on (and I've seen some MASSIVE Oracle environments and datasets in my work history in both government and commercial installations, but they weren't multi-tenant).

However, if I was able to hear the experiences of someone that's been able to either:

a) figure out how to shard a relational database easily even for 30,000 customers containing 10M records per customer, across 50 tables such that the relational database scales and is as reliable as a NoSQL database, then I would definitely go with a Relational Database.
or
b) Overcome querying deficiencies of a NoSQL database by writing the data to both a NoSQL datastore and something like Elastic Search (which also scales easily). This would make up for a number of the 'cons' of a NoSQL database for this use case based on a small spike project we did last summer. So that's another option I would like to hear from someone about that has implemented this architecture and are fans.

Please let me know if you have any thoughts on any of the above, and I'm most interested in speaking with folks about how they've figured out how to have the benefits of both Relational and NoSQL databases.

Rock on,

Hardwick

Kaan Soral

unread,

Jul 20, 2014, 3:33:58 PM7/20/14

to google-a...@googlegroups.com

Relational databases also require indexes, however NoSQL indexes are much harder to generate and costly

You generally have to build with precision when developing with NoSQL, always knowing what you want and how you want it, build the models and indexes / systems for that

There are also many other gotchas you learn with NoSQL - or AppEngine specifically

But I will say this, I've never felt hopeless while using AppEngine, there is always someone out there to help with issues, or an efficient documentation

Long story short, I've been using AppEngine at large scale for a while now, I would suggest it
(The taskqueue system is also pretty strong, you can easily build a mapping task (map of mapreduce) and iterate over all entities in a relatively short time, so if costs are not a big issue, you can always iterate over everything instead of running a flexible relational db query that you would do otherwise)

(Issues also happen, but there is always the soothing fact that someone else solves them at scale, however, since you are intending for a corporate usage, you might take the responsibility yourself and build a custom system, would be much harder, much more costly)

I'm guessing this isn't the detailed reply you were looking for, but just my .02

David Hardwick

unread,

Jul 21, 2014, 1:13:04 PM7/21/14

to google-a...@googlegroups.com

Thanks, Kaan.

I realize my post looks too mired in opinion. In retrospect, I should have stated that we are using App Engine and the Datastore, and we ARE operating at that scale mentioned (40,000 customers, 10million records each across 50 or so entities), we don't have a DBA, etc.. I just wanted to talk to someone that has either:

a) figured out how to shard a relational database at multi-tenant scale (e.g., 30,000 customers containing 10M records per customer, across 50 tables each) such that the relational database scales and is as reliable as a NoSQL database. For example, did they just shard on a 'namespace' column and shard on that using the database's sharding capabilities vs creating a db/schema for every customer and having to reshuffle these around.

...OR...

b) Overcome querying deficiencies of a NoSQL database by writing the data to both a NoSQL datastore and something like Elastic Search (which also scales easily). This would make up for a number of the 'cons' of a NoSQL database for this use case based on a small spike project we did last summer. So that's another option I would like to hear from someone about that has implemented this architecture and are fans. I've recently heard from a person using a 200M record, multi-tentant app in mysql that used Elastic Search in front of it too for speed reasons.

Thanks again!

Hardwick

--
You received this message because you are subscribed to a topic in the Google Groups "Google App Engine" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/google-appengine/txoW1CyVaPs/unsubscribe.
To unsubscribe from this group and all its topics, send an email to google-appengi...@googlegroups.com.
To post to this group, send email to google-a...@googlegroups.com.
Visit this group at http://groups.google.com/group/google-appengine.
For more options, visit https://groups.google.com/d/optout.

--

David Hardwick, CTO

p. 646.237.5388
m. 703.338.0741
a. 3405 Piedmont Road NE, Suite 325, Atlanta, GA 30305
e. david.h...@bettercloud.com

New Guide: Best Practices for Managing Google Apps Using FlashPanel

Jeff Schnitzer

unread,

Jul 21, 2014, 1:25:21 PM7/21/14

to Google App Engine

There is a tech talk video online (several years old at this point) by a Facebook engineer which describes the lengths they went to to get MySQL to scale. You can probably find it with some googling. The upshot is they basically use MySQL as a NoSQL system: Data is stored in schemaless blobs & there are no joins.

Jeff

--
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To unsubscribe from this group and stop receiving emails from it, send an email to google-appengi...@googlegroups.com.

David Hardwick

unread,

Jul 21, 2014, 1:39:43 PM7/21/14

to google-a...@googlegroups.com

Thanks, Jeff. Yes, I'm starting to get some calls in with folks and one company told me they don't make the decision at an application level, they make it per table....so they use both relational and NoSQL. Not going to have massive scale needs and need relational? Then they use relational...going to have mind blowing scale (X thousand records per second) they use NoSQL.

Thanks for the tip on the Google search for how Facebook scaled MySQL, I'm seeing some great articles and videos to get through here.

Rock on,

Hardwick

Nick

unread,

Jul 21, 2014, 8:43:44 PM7/21/14

to google-a...@googlegroups.com

I've had very good results on AppEngine mixing the datastore and full text search api.

I use java, so the search api is used to index java objects, and we use Objectify to store the same object in the datastore.

Basic crud operations, or anything required to be transactional, are performed on the datastore. Queries are run through the search api which returns an id. We then fetch the object from the datastore using that id as the single source of 'truth'.

The only downside is that the search service does not participate in transactions, and its consistency semantics are not documented. If you assume eventual consistentency then it's no worse off than the datastore.

As with every service on appengine, there are limitations however, particularly with the type of data you can store and the precision of that data. For example numeric values in the search index are doubles but constrained to the range of an int, and range queries on dates only take into account the date itself, not the time. In most cases, you can make it work for you, but it isn't as easy as it could be.

Reply all

Reply to author

Forward