The meeting started promptly at 6:30; in fact, it almost started early thanks to our overachievers: Alex, Bill, Dan, Christian and Ron.
We had a number of newcomers, including two BSD users (one FreeBSD and one OpenBSD).
The presentation started after introductions. The topic was Vitess, an open-source MySQL scaling and sharding solution.
When using relational DBs, you often reach a point where you start asking these questions:
Too many transactions? Too much data? Too much risk? Vitess has you covered!!
But if you aren't asking these questions you likely don't need to worry about all of this sharding stuff...
Also remember your database is important and websites don't sleep!
Heed these words; otherwise angry customers will leave you nastygrams reminding you that you should be providing a REAL service that doesn't suck!
Option 1. Not shart, shard! (DIY method)
* Sharding a relational database basically means splitting your app data along a boundary or key so that app writes can scale across multiple databases!
* Sharding is also hard! How do I divide my data? How do I route my data? How do I connect to multiple database schemas/nodes?
Dividing data options?
* vertical or functional sharding: model/table
* horizontal: by some sharding key
Routing data options?
* hard coded by model
* range based?
* lookup based?
Multiple database connection options?
* This is the hardest part to figure out in a DIY sharding solution, and it will require custom libraries in your code to handle connections to multiple database hosts
Doing this the DIY way is HARD and a bit of a mess. There are also many ways to get it wrong, like choosing the wrong sharding key or outgrowing your largest shard...
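To make the three routing options above a little more concrete, here is a minimal Python sketch of what the DIY routing layer might look like. It is not from the talk: the shard names, table names, range boundaries, and tenant names are all made up for illustration, and a real setup would map each shard name to actual connection settings.

```python
from bisect import bisect_right

# Hypothetical shard names (assumption); each would map to a real database host.
SHARDS = ["shard0", "shard1", "shard2"]

# Option 1: hard coded by model (vertical/functional sharding).
# Each table lives entirely on one shard.
MODEL_TO_SHARD = {"users": "shard0", "orders": "shard1", "events": "shard2"}

def route_by_model(table: str) -> str:
    return MODEL_TO_SHARD[table]

# Option 2: range based (horizontal sharding on a numeric key).
# shard0 holds ids below 1000, shard1 holds 1000-1999, shard2 holds the rest.
RANGE_UPPER_BOUNDS = [1000, 2000]

def route_by_range(user_id: int) -> str:
    return SHARDS[bisect_right(RANGE_UPPER_BOUNDS, user_id)]

# Option 3: lookup based.
# An explicit table (often itself stored in a database) maps each key to a shard.
TENANT_TO_SHARD = {"acme": "shard2", "globex": "shard0"}

def route_by_lookup(tenant: str) -> str:
    return TENANT_TO_SHARD[tenant]

if __name__ == "__main__":
    print(route_by_model("orders"))
    print(route_by_range(1500))
    print(route_by_lookup("acme"))
```

Even in this toy form you can see where the pain comes from: the range boundaries and lookup table have to be kept correct as data grows, and every query in the app has to go through one of these functions before it can pick a connection.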
Option 2. Just skip the relational DB (MySQL) and run NoSQL instead!
* MongoDB is often considered an option for web-scale databases; unfortunately, this talk didn't explore that option.
* Based on my own subjective experience, MongoDB replica sets and sharding are heavyweight and resource intensive, but I have not used MongoDB at cloud scale and would be hesitant to try...
We had some discussion about moving production databases to RDS and Cloud SQL and the woes that come with it. We also talked about whether managed cloud database hosting saves any money; generally it doesn't, and it probably costs more.
Option 3. Vitess
* created at YouTube in 2010
* Open source, written in Go
* CNCF project since Feb 2018
* PlanetScale now does most of the development
* Slack, Square, HubSpot and Flipkart (majority-owned by Walmart) use Vitess
What is Vitess?
* Vitess is best described as a database proxy (vtgate) plus a middleware layer (vttablet) that uses MySQL as its data backend
* It simplifies app->database architecture because your app will only need to talk directly to vtgate which will automatically route traffic
* vtgate is a stateless proxy and uses a metadata store (the Topology, backed by etcd, Consul, or ZooKeeper) to determine where to find sharded data
* vtgate can also do read and write routing to spread load evenly
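As a rough illustration of the read/write routing idea in the last bullet, here is a toy Python sketch. To be clear, this is not how vtgate is implemented; the class name, host names, and the "anything starting with SELECT is a read" rule are simplifying assumptions made up for this example.

```python
from itertools import cycle

class ToyRouter:
    """Toy read/write splitter (illustration only, not vtgate's real logic)."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Round-robin over replicas so reads are spread evenly.
        self._replicas = cycle(replicas)

    def route(self, sql):
        words = sql.split()
        # Naive assumption: statements starting with SELECT are reads.
        if words and words[0].upper() == "SELECT":
            return next(self._replicas)
        # Everything else (INSERT, UPDATE, DELETE, DDL...) goes to the primary.
        return self.primary

if __name__ == "__main__":
    router = ToyRouter("primary", ["replica1", "replica2"])
    for stmt in ["SELECT 1", "SELECT 2", "INSERT INTO t VALUES (1)"]:
        print(stmt, "->", router.route(stmt))
```

The point is that the app only ever talks to the router; which backend actually serves the query is decided behind that single endpoint.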
More Vitess terms
* Cell: Some datacenter boundary that contains a set of Tablets, a vtgate pool and the app servers that use the cluster
* Keyspace: a logical database (aka schema)
* Shard: a division within a keyspace, typically consisting of a MySQL master and many replicas
* VSchema: how data is organized within keyspaces and shards
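To give a feel for what a VSchema looks like, here is a small sketch written as a Python dict that mirrors the JSON layout used in the Vitess examples, as far as I understand it (check the Vitess docs for the authoritative format). The table and column names (`customer`, `corder`, `customer_id`) are placeholders, and the `sharding_columns` helper is my own addition, not part of Vitess.

```python
import json

# Sketch of a VSchema for one sharded keyspace.
# Table and column names are hypothetical placeholders.
VSCHEMA = {
    "sharded": True,
    "vindexes": {
        # A vindex maps a column value to a shard; "hash" is a built-in type.
        "hash": {"type": "hash"},
    },
    "tables": {
        "customer": {
            "column_vindexes": [{"column": "customer_id", "name": "hash"}],
        },
        "corder": {
            "column_vindexes": [{"column": "customer_id", "name": "hash"}],
        },
    },
}

def sharding_columns(vschema):
    """Return {table: sharding column}, taking each table's first vindex column."""
    return {
        table: spec["column_vindexes"][0]["column"]
        for table, spec in vschema["tables"].items()
    }

if __name__ == "__main__":
    print(json.dumps(VSCHEMA, indent=2))
    print(sharding_columns(VSCHEMA))
```

Sharding both tables on `customer_id` is a deliberate choice in this sketch: it keeps a customer's rows and their orders on the same shard.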
Features:
* Connection pooling
* Query Guardrails (built-in long running query kill, row returned limits)
* Translucent Sharding both Vertical and Horizontal
* M -> N shard materialized views (new with VReplication)
* Best used for OLTP workloads, not so much for OLAP (data warehousing)
* supports multiple sharding options!
Setting it up
Skip the k8s or minikube method; just go with a Debian 10 VM and build it out that way.
Rails on Vitess:
Vitess does not allow certain SQL commands to work...
After applying the patch, it appears to work, and we got to see it in Alex's live demo.
Final thoughts
Vitess assumes you have a huge app and ops team. It unlocks the potential of sharding/resharding your data.
It gives you a playbook for scaling to YouTube-scale workloads!