Is Vitess right for us?


Seth Ammons SG

May 7, 2016, 2:50:34 PM
to vitess
Heyo! 

At Sendgrid, we are exploring ways to improve (i.e., replace) our existing MySQL database abstraction layer. Vitess seems like a really solid candidate and would more than likely be a more robust alternative to a homegrown solution.

Does anyone have experience taking a mature set of databases (already with a master/read-replica setup, some sharding, and some tables in the terabyte range that still need sharding) and moving them over to Vitess? While we would like the ability to move over whole databases, it is not a hard requirement; it is possible that we would only develop new services against a new abstraction layer.

Any guidance or advice would be greatly appreciated! 

Seth Ammons SG

May 7, 2016, 2:54:47 PM
to vitess
Ah, another question I had: when querying through Vitess, we still have support for normal MySQL queries, yeah? Like LIMIT/OFFSET, "INSERT IGNORE" / "ON DUPLICATE KEY UPDATE", subqueries, and pretty much anything else you can do in MySQL? The examples I've seen so far only show some simple SELECTs. Thanks!

Sugu Sougoumarane

May 7, 2016, 7:34:37 PM
to vitess
I can give you some preliminary answers. We'll probably need to drill down more before you can be more certain.

In terms of production-worthiness, Vitess is used by YouTube and handles a pretty large volume of data and QPS. There are also other serious users that are working on migrating to Vitess. Hopefully, they'll share their experiences here.

In our opinion, Vitess is the "best out there" if you want to scale out MySQL :). We come close to a drop-in replacement, but are not there yet. We're working hard at closing this gap. But you also have to take reality into consideration. Scaling out a database will require some trade-offs. If all the data is not in one place, things will be less efficient, less consistent, or slower. It all depends on what the app can afford to tolerate.

In terms of query mix, Vitess should support most queries that target a single shard. It can also handle trivial cross-shard joins. This is usually sufficient for most apps that have a good sharding strategy. Otherwise, even if we can handle a query, things will get too slow if data has to be gathered from too many places.

I assume you've already been to vitess.io. You should definitely download Vitess and give it a try. Let us know if you have more questions. We love to help :).


--
You received this message because you are subscribed to the Google Groups "vitess" group.
To unsubscribe from this group and stop receiving emails from it, send an email to vitess+un...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Seth Ammons SG

May 9, 2016, 8:11:42 PM
to vitess
Cool. Thanks. I went through the getting started guide. I have a few questions (I may have just missed them as I went over things). 

  • For the metadata store, can that be Consul, or are we restricted to ZK/Etcd? 
  • In general, would you think that starting off with only new tables and services and then working to get our bigger databases moved in over time is appropriate? 
  • Do developers/DBAs still use the MySQL client? In dev, it is pretty common for folks to explore the db(s) with something like Sequel Pro; is that still available, or is there a different prescribed workflow?
  • For connection pooling, are we able to specify timeouts to make sure unused resources are released?
  • For rowcache (I am under the impression that this is memoization), how does one control cache invalidation? Assuming it is a timeout, what granularity of control is there? 
  • I see that transaction support exists within a shard; how does Vitess make sure that multiple related queries in a transaction are routed to the same host? My thinking is that a single transaction may contain writes and some reads that could go to the master or one of many read-only instances, and I would think you would have to guarantee they all go to the same MySQL instance.
  • Is there support for a multi-master approach (or other solution) for when you have writes to the same db/table from different data centers where you would want a write master available with low latency and then replicate across to the other datacenter(s) for later reading?
  • How often do you see read and write errors? Is there built in support for retries?
  • For metrics, we are hoping to collect query execution time, success counts, error counts, total data bytes/sec, number of active connections, number of idle connections, and I'm sure others that I'm forgetting. Do we have a way to collect this info? 
  • For traceability, I believe I saw that there is a query killer; are we able to determine in what app/service the query originated? Same for any expensive queries.
  • And on the practical side, I'm curious what this task would look like (something we are doing this week): we have a table that is just too large. We are migrating it to a new schema that will have fewer fields and prevent duplicate keys, and the table will be sharded on two columns. What would this task look like in Vitess? In our current infrastructure, this would take the form of: put out the new schema, begin reading from both, begin to only write to the new one, and run a series of background scripts in parallel to migrate old data to the new table (draining it as we go along). 
Welp, those are the questions off the top of my head! I know that is a lot, and so I really do thank you for taking the time to read through them.

Anthony Yeh

May 9, 2016, 9:27:37 PM
to vit...@googlegroups.com
Please see answers inline:

On Mon, May 9, 2016 at 5:11 PM 'Seth Ammons SG' via vitess <vit...@googlegroups.com> wrote:
Cool. Thanks. I went through the getting started guide. I have a few questions (I may have just missed them as I went over things). 
  • For the metadata store, can that be Consul, or are we restricted to ZK/Etcd? 
We use a plugin architecture for the metadata store. It should be straightforward to write a plugin for Consul.
  • In general, would you think that starting off with only new tables and services and then working to get our bigger databases moved in over time is appropriate? 
If there's no urgency in migrating existing databases, starting with new services first is a good strategy.
  • Do developers/DBAs still access the mysql client? In dev, it is pretty common for folks to explore the db(s) with something like Sequel Pro; is that still available, or is there a different prescribed workflow?
For DBA activities, we do often use tools that talk directly to mysqld, unaware of Vitess. For exploring, you can also point tools at the underlying mysqld, as long as you're ok with each shard appearing as a separate database. If your exploration tool is open source and uses one of the interfaces for which we have Vitess wrappers (JDBC, PDO, PEP 249, Go database/sql), you might be able to get it working in shard-transparent mode (looking in via Vitess) by swapping our library in place of the default MySQL one. Flipkart are doing this with JDBC, which is why they've implemented much of the management interfaces in that spec, although I forget which exploration tool they're running on top of it.

The important thing is that during filtered replication (while you're in the process of doing a resharding event), you need to make sure no DMLs bypass Vitess. That's because they would lack the annotation that Vitess needs to keep both old and new shards up to date by routing binlog events.
  • For connection pooling, are we able to specify timeouts to make sure unused resources are released?
Yes, there are server-side timeouts for both DQLs and transactions at the vttablet level. We're working on a "launch doc" that explains all the important knobs, but the first place to check in the meantime is all the flags starting with '-queryserver-*' in the listing of 'vttablet -help'.
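To make the idle-timeout idea concrete, here is a toy sketch of a pool that reaps unused connections (the class and field names are invented; this is not Vitess code, just the shape of the mechanism):

```python
import time

class PooledConn:
    def __init__(self, conn_id):
        self.conn_id = conn_id
        self.last_used = time.monotonic()

class Pool:
    """Toy connection pool that drops connections idle longer than idle_timeout."""
    def __init__(self, idle_timeout):
        self.idle_timeout = idle_timeout
        self.idle = []

    def put(self, conn):
        conn.last_used = time.monotonic()
        self.idle.append(conn)

    def reap(self, now=None):
        """Release connections unused for idle_timeout seconds; return them."""
        now = time.monotonic() if now is None else now
        keep, dropped = [], []
        for c in self.idle:
            (dropped if now - c.last_used >= self.idle_timeout else keep).append(c)
        self.idle = keep
        return dropped
```

In Vitess the equivalent knobs are server-side vttablet flags rather than anything you implement yourself.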
  • For rowcache (I am under the impression that this is memoization), how does one control cache invalidation? Assuming it is a timeout, what granularity of control is there? 
The vttablet rowcache automatically invalidates itself by listening to its own binlog stream (which requires log-slave-updates). So the cache is always approximately as up-to-date as the replica itself. With that said, Sugu has some plans to recommend a different caching integration in Vitess, and he's thinking about deprecating the current rowcache. I'd recommend spawning a separate thread for that to get his attention if you're interested in discussing more.
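As a toy model of that binlog-driven invalidation (all names invented; the real rowcache parses actual binlog events rather than dicts):

```python
class RowCache:
    """Cache of rows by primary key, invalidated by replayed binlog events."""
    def __init__(self):
        self.rows = {}

    def get(self, pk):
        return self.rows.get(pk)

    def fill(self, pk, row):
        self.rows[pk] = row

    def apply_binlog_event(self, event):
        # Any DML touching a cached primary key evicts that entry, so the
        # cache stays roughly as fresh as the replica producing the stream.
        if event["type"] in ("UPDATE", "DELETE"):
            self.rows.pop(event["pk"], None)

cache = RowCache()
cache.fill(42, {"id": 42, "name": "old"})
cache.apply_binlog_event({"type": "UPDATE", "pk": 42})
```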
  • I see that transaction support exists within a shard; how does vitess make sure that multiple related transaction queries are routed to the same host? My thinking being you may have writes and some reads in a single transaction that could go to the master or one of many read-only instances, and I would think you would have to guarantee they all go to the same mysql instance.
Once you start the transaction, the Vitess client library maintains a session cookie for you (under the covers) that ensures any other queries in that transaction go to a specific MySQL connection, which is reserved for you from the connection pool until you finish (commit, rollback, or timeout). If you do a read from within a transaction, it will also go to that same connection. So if you open a master transaction, both reads and writes on that transaction will go to the master.
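The session-cookie mechanism can be pictured with this toy sketch (names invented; the real client library hides all of this under the covers):

```python
class Connection:
    def __init__(self, conn_id):
        self.conn_id = conn_id

class ShardGateway:
    """Routes statements; a transaction pins all of them to one reserved connection."""
    def __init__(self, pool):
        self.pool = list(pool)   # free MySQL connections on the master
        self.sessions = {}       # session cookie -> reserved connection
        self.next_cookie = 0

    def begin(self):
        conn = self.pool.pop()   # reserve a connection for the transaction
        self.next_cookie += 1
        self.sessions[self.next_cookie] = conn
        return self.next_cookie

    def execute(self, cookie, query):
        # Reads and writes inside the transaction all hit the same connection.
        return self.sessions[cookie].conn_id

    def commit(self, cookie):
        self.pool.append(self.sessions.pop(cookie))
```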
  • Is there support for a multi-master approach (or other solution) for when you have writes to the same db/table from different data centers where you would want a write master available with low latency and then replicate across to the other datacenter(s) for later reading?
Currently, writes for a given shard have to go to the one, global master. It is possible to have each shard's master in a different datacenter, though. If your data set is amenable, you could define a custom sharding scheme that's keyed by some location field. Sugu also had some ideas a while ago for globally low latency eventually consistent writes, but we haven't had any need for it yet.
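A location-keyed custom sharding scheme could be as simple as this sketch (the regions and shard ranges are invented examples; in Vitess this would be expressed as a custom vindex over the keyspace ID):

```python
# Map a row's location field to a shard whose master lives in that region.
# Regions and key ranges here are illustrative placeholders only.
REGION_SHARDS = {
    "us": "-55",
    "eu": "55-aa",
    "asia": "aa-",
}

def shard_for(row):
    """Route a row to the shard for its home region."""
    return REGION_SHARDS[row["region"]]
```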
  • How often do you see read and write errors? Is there built in support for retries?
For reads, there are certain types of errors that vtgate might retry for you, if trying on a different tablet might help (for example, mysql connection errors). For writes, there is only one master so there's nowhere else for vtgate to try. In that case, I think we currently just return the error and expect the client to retry, but we are working on a write buffering solution within vtgate that would avoid this.
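Until such write buffering lands, client-side retry logic can be a small wrapper like this sketch (the attempt count and backoff values are arbitrary choices, not Vitess defaults):

```python
import time

def execute_with_retry(execute, attempts=3, backoff=0.1):
    """Retry a statement on transient errors, with simple exponential backoff."""
    for attempt in range(attempts):
        try:
            return execute()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts; surface the error to the caller
            time.sleep(backoff * (2 ** attempt))
```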
  • For metrics, we are hoping to collect query execution time, success counts, error counts, total data bytes/sec, number of active connections, number of idle connections, and I'm sure others that I'm forgetting. Do we have a way to collect this info? 
Off the top of my head, I think all of those are exported at /debug/vars by vtgate and vttablet, using Go expvar format (JSON). We have a plugin to push those to InfluxDB, and have plans to export those variables in Prometheus format as well.
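Because /debug/vars is plain expvar JSON, scraping it takes only a few lines. This sketch parses a trimmed, illustrative payload (the variable names are examples, not a promise of exactly what vtgate exports):

```python
import json

# A trimmed example of the kind of expvar JSON a vtgate/vttablet serves at
# /debug/vars; in practice you would fetch it over HTTP.
raw = json.dumps({
    "QueryCounts": {"PASS_SELECT": 1200, "INSERT": 340},
    "ErrorCounts": {"INSERT": 2},
})

def scrape(payload):
    """Reduce an expvar payload to (total queries, total errors)."""
    stats = json.loads(payload)
    total_queries = sum(stats.get("QueryCounts", {}).values())
    total_errors = sum(stats.get("ErrorCounts", {}).values())
    return total_queries, total_errors
```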
  • For traceability, I believe I saw that there is a query killer; are we able to determine in what app/service the query originated? Same for any expensive queries.
The Vitess client has a feature called Caller ID, where the app can specify a username and a component/subcomponent. The username can be used to control access permissions with the Vitess Table ACL feature (something you otherwise would lose whenever you do connection pooling). The component/subcomponent can be used to narrow down which part of the app sent the query. They get exported in the /debug/vars (so you can use a TSDB to plot various stats per user/component), and reported in vttablet logs (e.g. when it kills a slow query).
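The per-caller accounting that Caller ID enables looks roughly like this toy sketch (the field names are invented, not the Vitess wire format):

```python
class QueryLog:
    """Toy per-caller stats, mimicking what Caller ID makes possible."""
    def __init__(self):
        self.counts = {}

    def execute(self, sql, caller_id):
        # (username, component) travels with every query, so stats and
        # slow-query logs can name the originating app or service.
        key = (caller_id["user"], caller_id["component"])
        self.counts[key] = self.counts.get(key, 0) + 1

log = QueryLog()
log.execute("SELECT 1", {"user": "web", "component": "signup"})
log.execute("SELECT 2", {"user": "web", "component": "signup"})
```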
  • And on the practical side, I'm curious what this task would look like (something we are doing this week): we have a table that is just too large. We are migrating it to a new schema that will have less fields and prevent duplicate keys and the table will be sharded based on two columns. What would this task look like in Vitess? In our current infrastructure, this would take on the form of: put out the new schema, begin reading from both, begin to only write to the new one, and run a series of background scripts in parallel to migrate old data to the new table (draining it as we go along). 
I'm not sure I fully understand the exact scenario, but in general our toolkit for these kinds of migrations includes:

  • Pivot schema change: This lets us do slow ALTERs (anywhere from 30s to days), similar to online schema change, but only one server does the work, and then it's restored everywhere else. So we can drop columns, etc. on large tables without any intermediate phase in which updates have to be propagated from one table to another. We're working on a guide that shows how to do this through Vitess.
  • Vertical split: Vitess can move tables out of a keyspace (logical database) into a new keyspace. During the migration, vtgate handles redirects for you (sending queries/updates to the right place) and vttablet replicates from old to new (via filtered replication) until you do the final switch-over.
  • Horizontal split: This is the regular resharding workflow as seen in the guide on vitess.io.
So we would probably do a pivot to get the table schema into the final unsharded form, then do a horizontal split to go from unsharded (1 shard) to N shards. During the pivot, the app's queries would have to work against either schema.
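For comparison, the dual-read/single-write pattern described in the question reduces to a small amount of bookkeeping; this toy sketch (all names invented) shows the shape of it:

```python
class MigratingStore:
    """Toy dual-read/single-write store: reads check new then old;
    writes go only to the new table; a backfill drains the old one."""
    def __init__(self):
        self.old = {}
        self.new = {}

    def read(self, key):
        if key in self.new:
            return self.new[key]
        return self.old.get(key)

    def write(self, key, row):
        self.new[key] = row  # writes go only to the new schema

    def backfill_step(self, batch=100):
        # Background script: migrate (and drain) a batch of old rows,
        # never clobbering a row already written to the new table.
        for key in list(self.old)[:batch]:
            self.new.setdefault(key, self.old.pop(key))
```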

Sorry if I've totally missed the point of your question. If I did, it would help if you could maybe give a reduced sample of what the initial and final schemas are supposed to look like.

Bob Violier

May 10, 2016, 2:49:09 PM
to vitess
I'm also quite interested in roughly the same use case Seth describes. I'd even like to extend the question about multi-master across several data centers.

Currently, writes for a given shard have to go to the one, global master. It is possible to have each shard's master in a different datacenter, though. If your data set is amenable, you could define a custom sharding scheme that's keyed by some location field. Sugu also had some ideas a while ago for globally low latency eventually consistent writes, but we haven't had any need for it yet.

We were also thinking about a sharding scheme based on the user's location (or at least the location of their account). This would keep all of a user's data on a specific set of masters in a specific region, so that reads and writes can happen close by. But reading your answer, it seems this would not be an option at the moment, because everything still has to go through one global master; is that correct? We are looking into this because we have a lot of writes that need to happen fast, and sending every write to a server with a 250ms round trip (Asia->Europe, USA->Europe, etc.) is just too much for our use case (I think Sendgrid is having the same issue, @Seth?). Also, storing data in only a specific region could be a privacy plus.

I do wonder, how does YouTube solve this problem? Is everything just written in one place (say, a DC in the US) and then distributed to all read servers around the world? So a write from the Netherlands would go to the USA and then be synced back to the Netherlands? (How about comments?)

Thanks!

Anthony Yeh

May 10, 2016, 10:43:05 PM
to vit...@googlegroups.com
Bob,

Your region-based plan is indeed doable; it's actually what I was referring to. What I meant is that each region you want to define will need its own shard (or set of shards) because each shard has only one master. The master of each shard can be in a different DC, and there's no requirement to go outside the DC if a DML comes into the same DC as the master.

At YouTube we use distributed pipelines for certain updates that are latency-critical. Those pipelines then feed asynchronously into Vitess, which is the source of truth. For other updates, we do the hop across regions to the global master.


arth...@github.com

Aug 17, 2017, 1:13:36 PM
to vitess
Hey Seth,

I am curious: did you end up using Vitess? Would you mind sharing your findings?

Thanks

Seth Ammons SG

Aug 17, 2017, 2:36:51 PM
to vitess
Hi! We ended up looking at two potential solutions: Vitess or ProxySQL. We liked what Vitess had to offer, but we felt that it was something that we would have to jump in with both feet. We did not see a gradual transition plan. We felt that ProxySQL could give us the crawl, walk, run approach we needed (allowing a more iterative build, measure, learn curve) while allowing others in our org to use our legacy system(s) for data access. We now have ProxySQL running in production doing some heavy lifting. 
