February meeting notes

Aaron Johnson

Feb 11, 2020, 9:12:48 PM
to QCLUG

The meeting started promptly at 6:30. Actually, it almost started early thanks to these overachievers: Alex, Bill, Dan, Christian and Ron.

We had a number of newcomers, including two BSD users (one FreeBSD and one OpenBSD).

The presentation started after introductions. The topic was Vitess, an open source MySQL scaling and sharding solution.

When using relational DBs you often reach a point where you start asking these questions:
Too many transactions? Too much Data? Too much risk? Vitess has you covered!!
But if you aren't asking these questions you likely don't need to worry about all of this sharding stuff...

Also remember: your database is important and websites don't sleep!
Heed these words, otherwise angry customers will leave you nastygrams reminding you that you should be providing a REAL service that doesn't suck!

Option 1. Not shart, shard! (DIY method)
  * Sharding a relational database basically means splitting your app data on a boundary or key so that writes can scale across multiple databases!
  * Sharding is also hard! How do I divide my data? How do I route my data? How do I connect to multiple database schemas/nodes?

Dividing data options?
  * vertical or functional sharding: model/table
  * horizontal: by some sharding key

Routing data options?
  * hard coded by model
  * range based?
  * lookup based?
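
To make the three routing options a bit more concrete, here is a rough Python sketch (not from the talk). The table names, shard names, range boundaries and lookup entries are all made up for illustration.

```python
import bisect

# Hard coded by model: each model/table lives in exactly one database.
# (Hypothetical table and database names.)
MODEL_TO_DB = {"users": "db_users", "orders": "db_orders"}

def route_by_model(model):
    return MODEL_TO_DB[model]

# Range based: shard i holds keys below RANGE_UPPER_BOUNDS[i]
# (exclusive upper bounds, sorted ascending).
RANGE_UPPER_BOUNDS = [1000, 2000, 3000]
RANGE_SHARDS = ["shard0", "shard1", "shard2"]

def route_by_range(user_id):
    idx = bisect.bisect_right(RANGE_UPPER_BOUNDS, user_id)
    if idx >= len(RANGE_SHARDS):
        raise KeyError(f"no shard covers key {user_id}")
    return RANGE_SHARDS[idx]

# Lookup based: an explicit key -> shard table, typically stored somewhere
# the app can query. Here it is just a dict.
LOOKUP_TABLE = {"alice": "shard0", "bob": "shard2"}

def route_by_lookup(username):
    return LOOKUP_TABLE[username]
```

Range routing needs no per-key state but can leave shards unevenly filled, while lookup routing lets you place any key anywhere at the cost of maintaining the table.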

Multiple database connection options?
  * This is the hardest part to figure out in a DIY sharding solution and will require custom libraries in your code to handle the connections to multiple database hosts

Doing this the DIY way is HARD and a bit of a mess. There are also many ways to get it wrong, like choosing the wrong sharding key or outgrowing your largest shard...
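
As a small illustration of the "wrong sharding key" problem (my own toy example, not something shown at the meeting), the sketch below counts how many rows land on each shard for two candidate keys. The row layout, the `country_id` column and the modulo placement are all assumptions made for the example.

```python
def shard_sizes(rows, key_fn, num_shards):
    """Count rows per shard when placing each row at key_fn(row) % num_shards."""
    counts = [0] * num_shards
    for row in rows:
        counts[key_fn(row) % num_shards] += 1
    return counts

# Fake data: 1000 users, 90% of them in country 1 and 10% in country 2.
rows = [{"user_id": i, "country_id": 1 if i % 10 else 2} for i in range(1000)]

by_country = shard_sizes(rows, lambda r: r["country_id"], 4)
by_user = shard_sizes(rows, lambda r: r["user_id"], 4)

print(by_country)  # [0, 900, 100, 0]
print(by_user)     # [250, 250, 250, 250]
```

Sharding on the low-cardinality country column leaves two shards empty and puts 90% of the rows on one shard, which is exactly the shard you will outgrow first.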

Option 2. Just skip relational DB (mysql) and run NoSQL instead!

  * MongoDB is often considered an option for webscale databases. Unfortunately, this talk didn't explore that option.
  * Based on my own subjective experience, MongoDB replica sets and sharding are heavyweight and resource intensive, but I have not used MongoDB at cloud scale and would be hesitant to try...

We had some discussion about moving production databases to RDS and CloudSQL and the woes that come with it. There was also some talk about whether managed cloud database hosting saves any money; generally it doesn't, and it probably costs more.

Option 3. Vitess

* Created at YouTube in 2010
* Open source, written in Go
* CNCF project since Feb 2018
* PlanetScale now does most of the development
* Slack, Square, HubSpot and Flipkart (majority owned by Walmart) use Vitess

What is Vitess?

* Vitess is best described as a database proxy (vtgate) plus middleware (vttablet), with MySQL as the data backend
* It simplifies the app->database architecture because your app only needs to talk to vtgate, which routes traffic automatically
* vtgate is a stateless proxy and uses a metadata store (Topology) to determine where to find sharded data (uses etcd, consul, or zookeeper)
* vtgate can also do read and write routing to spread load evenly
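
To give a feel for what read/write routing in a stateless proxy means, here is a generic Python sketch. This is NOT how vtgate is implemented or how it decides where a query goes; the `ShardRouter` class, the host names and the "starts with SELECT" check are all simplifications invented for this example.

```python
import itertools

class ShardRouter:
    """Toy read/write splitter: writes go to the primary, reads rotate over replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Round-robin iterator over replicas; None when there are no replicas.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def target_for(self, sql):
        # Very naive classification: anything starting with SELECT is a read.
        is_read = sql.lstrip().lower().startswith("select")
        if is_read and self._replicas is not None:
            return next(self._replicas)
        return self.primary

router = ShardRouter("mysql-primary", ["mysql-replica-1", "mysql-replica-2"])
print(router.target_for("SELECT * FROM users"))        # mysql-replica-1
print(router.target_for("INSERT INTO users VALUES (1)"))  # mysql-primary
```

The app only ever talks to the router, which is the same architectural simplification the bullets above describe for vtgate.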

More Vitess terms

* Cell: Some datacenter boundary that contains a set of Tablets, a vtgate pool and the app servers that use the cluster
* Keyspace: a logical database (aka schema)
* Shard: a division within a keyspace, typically consisting of a MySQL master and many replicas
* Vschema: how data is organized within keyspaces and shards
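
Here is a toy Python model of how these terms fit together (my own sketch, not code from the talk). Vitess names the shards of a sharded keyspace after the range of keyspace IDs they cover, for example `-80` and `80-`. Everything else below is an assumption for illustration: the cell, keyspace and tablet names are made up, only single-byte range boundaries are handled, and the MD5-based hash is a stand-in, not the function Vitess actually uses.

```python
import hashlib

# Hypothetical topology: cell -> keyspace -> shard -> tablets.
TOPOLOGY = {
    "cell1": {
        "commerce": {
            "-80": ["tablet-100 (master)", "tablet-101 (replica)"],
            "80-": ["tablet-200 (master)", "tablet-201 (replica)"],
        }
    }
}

def parse_shard_range(name):
    """Turn a shard name like "-80", "40-80" or "80-" into (low, high) byte bounds."""
    low_hex, high_hex = name.split("-")
    low = int(low_hex, 16) if low_hex else 0x00
    high = int(high_hex, 16) if high_hex else 0x100  # open-ended upper bound
    return low, high

def keyspace_id_first_byte(sharding_key):
    # Stand-in hash for this sketch only.
    return hashlib.md5(str(sharding_key).encode()).digest()[0]

def shard_for(sharding_key, shard_names):
    """Pick the shard whose range contains the hashed key's first byte."""
    first_byte = keyspace_id_first_byte(sharding_key)
    for name in shard_names:
        low, high = parse_shard_range(name)
        if low <= first_byte < high:
            return name
    raise LookupError(f"no shard covers key {sharding_key!r}")

shards = TOPOLOGY["cell1"]["commerce"]
shard = shard_for(42, shards)
print(shard, shards[shard])
```

The point of range-named shards is that splitting `-80` into `-40` and `40-80` later does not change which keyspace ID a row has, only which shard covers it.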

Features:

* Connection pooling
* Query guardrails (built-in killing of long-running queries, limits on rows returned)
* Translucent Sharding both Vertical and Horizontal
* M -> N shard materialized views (new with vReplication)
* Best used for OLTP workloads, not so much for OLAP (data warehousing)
* Supports multiple sharding options!

Setting it up

Avoid the k8s or minikube method; just use a Debian 10 VM and build it out that way.
Then follow the instructions at https://vitess.io/docs/get-started/local

Rails on Vitess:

Vitess does not support certain SQL statements...
But... you can patch Rails to work with Vitess! - https://github.com/ajmaidak/rails/tree/ajm-ar-vitess
After applying the patch it appears to work, and we got to see it in Alex's live demo.

Final thoughts

Vitess assumes you have a huge app and an ops team. It unlocks the potential of sharding and resharding your data.
It gives you a playbook for scaling to YouTube-scale workloads!
