Performance and other issues with Vitess sharding


gpfe...@gmail.com

Jan 8, 2015, 11:24:31 PM
to vit...@googlegroups.com
Very glad to see this Golang project!

I have two questions:

1. Vitess introduces vtgate and vttablet between the app and MySQL, and stores sharding-related data in ZooKeeper. So each query first goes to vtgate, which acquires the sharding rules from ZooKeeper, and is then routed to vttablet. I think the network round-trip cost is a big overhead even on a LAN. I have not done any performance tests, but the architecture makes me hesitate.

2. Transactions: how are transactions within a single shard supported in Vitess? Our app is written in Go and uses go-sql-driver (https://github.com/go-sql-driver/mysql/). From reading the source code, I see that it implements transaction Begin() as 'START TRANSACTION', which does not contain any sharding information, so I don't know which vttablet should receive and execute this SQL command. I really want to know how Vitess handles this: does it defer until the next SQL command that includes sharding information, or does it switch the autocommit variable on and off?

Thanks.

Sugu Sougoumarane

Jan 9, 2015, 12:36:27 AM
to gpfe...@gmail.com, vitess
On Thu, Jan 8, 2015 at 8:24 PM, <gpfe...@gmail.com> wrote:
> Very glad to see this Golang project!
>
> I have two questions:
>
> 1. Vitess introduces vtgate and vttablet between the app and MySQL, and stores sharding-related data in ZooKeeper. So each query first goes to vtgate, which acquires the sharding rules from ZooKeeper, and is then routed to vttablet. I think the network round-trip cost is a big overhead even on a LAN. I have not done any performance tests, but the architecture makes me hesitate.

As a general principle, you have to be willing to give up some efficiency to achieve scalability. However, there are a few areas where Vitess makes MySQL more efficient. So, we kind of get a refund on the additional network hops we add:
1. Connection pooling: The cost of a MySQL connection is approximately 250k, whereas the cost of a Vitess connection is negligible.
2. Rowcache: MySQL's buffer cache is not very efficient for random-access reads, which the Vitess rowcache makes up for.
3. Other query protection mechanisms, like results reuse, blacklisting, etc.

In our benchmarks, each network hop adds about 1ms. So, we're looking at around 2ms of additional latency overall. But, YMMV. So, you should just benchmark it for yourself, just in case.
In my opinion, the other benefits of vitess far outweigh this overhead.
 

> 2. Transactions: how are transactions within a single shard supported in Vitess? Our app is written in Go and uses go-sql-driver (https://github.com/go-sql-driver/mysql/). From reading the source code, I see that it implements transaction Begin() as 'START TRANSACTION', which does not contain any sharding information, so I don't know which vttablet should receive and execute this SQL command. I really want to know how Vitess handles this: does it defer until the next SQL command that includes sharding information, or does it switch the autocommit variable on and off?
 
Vitess expects a transaction to stay within a single shard: the app is responsible for ensuring that its transactions don't cross shard boundaries. If you send statements that go to different shards as part of a single transaction, VTGate will try to commit them in statement order, but it's best effort. A failure halfway through could leave some shards committed and others not. We've debated adding a flag to explicitly disallow multi-shard transactions. We may implement it if there's demand.
We're also studying the feasibility of adding 2PC support.
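
To make the best-effort behavior concrete, here is a minimal sketch of committing per-shard transactions in order and stopping at the first failure. It is an illustration of the idea only, not VTGate's code: `shardTx`, `failOnCommit`, and `commitInOrder` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// shardTx is a hypothetical stand-in for an open transaction on one shard.
type shardTx struct {
	name         string
	failOnCommit bool // simulates a commit failure on this shard
	committed    bool
}

func (t *shardTx) commit() error {
	if t.failOnCommit {
		return errors.New("commit failed on " + t.name)
	}
	t.committed = true
	return nil
}

// commitInOrder commits each shard transaction in the given order and stops
// at the first error. Shards committed before the failure stay committed;
// there is no rollback across shards, which is why this is only best effort.
func commitInOrder(txs []*shardTx) ([]string, error) {
	var done []string
	for _, tx := range txs {
		if err := tx.commit(); err != nil {
			return done, err
		}
		done = append(done, tx.name)
	}
	return done, nil
}

func main() {
	txs := []*shardTx{
		{name: "shard-1"},
		{name: "shard-2", failOnCommit: true},
		{name: "shard-3"},
	}
	done, err := commitInOrder(txs)
	fmt.Println("committed:", done, "error:", err)
}
```

In this run shard-1 is committed, shard-2 fails, and shard-3 is never attempted, so the data is left in a partially applied state. Keeping each transaction on one shard avoids this case entirely.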

PS: I covered many of these topics in my presentation at @Scale. You can take a look at the video if you haven't already: http://youtu.be/5yDO-tmIoXY.

Anthony Yeh

Jan 9, 2015, 1:10:44 AM
to Sugu Sougoumarane, gpfe...@gmail.com, vitess
I'd like to add some other clarifications about the topology:

1. VTGate caches the routing rules from ZooKeeper. The normal serving path for a query does not involve ZooKeeper. A typical query just goes App -> VTGate -> VTTablet -> MySQL.

2. Although VTTablet sits in front of MySQL, it always runs on the same physical machine as MySQL (1:1 correspondence). So VTTablet is not really an extra network hop.
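
The caching point in item 1 can be sketched as follows. This is a simplified illustration, not VTGate's implementation: `routingCache`, `topoFetcher`, the shard names, and the tablet addresses are all made up for the example, and a plain function stands in for ZooKeeper.

```go
package main

import (
	"fmt"
	"sync"
)

// topoFetcher stands in for a read from the topology server (ZooKeeper in
// this thread). It returns shard name -> tablet address.
type topoFetcher func() map[string]string

// routingCache is a hypothetical in-memory copy of the routing rules.
type routingCache struct {
	mu    sync.RWMutex
	rules map[string]string
}

// refresh replaces the cached rules. It would run at startup and whenever
// the topology changes, not once per query.
func (c *routingCache) refresh(fetch topoFetcher) {
	fresh := fetch()
	c.mu.Lock()
	c.rules = fresh
	c.mu.Unlock()
}

// route is the per-query path: a local map lookup, no network call.
func (c *routingCache) route(shard string) (string, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	addr, ok := c.rules[shard]
	return addr, ok
}

func main() {
	topoReads := 0
	fetch := func() map[string]string {
		topoReads++
		return map[string]string{"-80": "tablet-a:15001", "80-": "tablet-b:15001"}
	}

	cache := &routingCache{}
	cache.refresh(fetch)
	for i := 0; i < 1000; i++ {
		cache.route("-80")
	}
	fmt.Println("queries routed: 1000, topology reads:", topoReads)
}
```

A thousand routed queries cost a single topology read here, which is why the topology server stays off the normal serving path.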


gpfe...@gmail.com

Jan 9, 2015, 1:17:19 AM
to vit...@googlegroups.com, gpfe...@gmail.com
Thanks for the quick reply.

I didn't mean cross-shard transactions in the second question. I just want to know which vttablet should execute "START TRANSACTION" when a transaction arrives at vtgate.

Maybe I should just analyse that from the source code.

Thanks again.

gpfe...@gmail.com

Jan 9, 2015, 1:25:53 AM
to vit...@googlegroups.com, sou...@google.com, gpfe...@gmail.com
That sounds reasonable. ZooKeeper would become a bottleneck if every query had to consult it to pick the destination vttablet.

Thanks

gpfe...@gmail.com

Jan 9, 2015, 2:11:56 AM
to vit...@googlegroups.com, gpfe...@gmail.com
Vitess has done more than I imagined: it has re-implemented the MySQL driver for Go.

The transaction problem can't be solved if go-sql-driver (https://github.com/go-sql-driver/mysql/) is used.

Sugu Sougoumarane

Jan 9, 2015, 2:30:00 AM
to gpfe...@gmail.com, vitess
If I'm reading your question correctly, you've written an app that uses https://github.com/go-sql-driver/mysql/ and want to know if you can just point that at vitess instead of mysql.
The simple answer is no, because Vitess doesn't implement the MySQL protocol. It uses an RPC protocol instead.
However, we're planning to implement a database/sql-compliant driver because somebody else has asked for the same thing.
So, if you're using your driver through the database/sql package, you should be able to replace it with our driver.

Of course, you'll also have to bring up a vitess cluster behind all of this.

gpfe...@gmail.com

Jan 9, 2015, 3:09:20 AM
to vit...@googlegroups.com, gpfe...@gmail.com
Yes, I got it. Vitess actually uses the MySQL C API to handle MySQL protocol packets, and supports transactions by designing and implementing its own database driver interface, which is similar to but different from the built-in database/sql/driver,

while https://github.com/go-sql-driver/mysql/ can't support transactions in a sharding scenario because of its implementation of database/sql/driver.Conn.Begin(), which looks like this:

    func (mc *mysqlConn) Begin() (driver.Tx, error) {
        if mc.netConn == nil {
            errLog.Print(ErrInvalidConn)
            return nil, driver.ErrBadConn
        }
        err := mc.exec("START TRANSACTION")
        if err == nil {
            return &mysqlTx{mc}, err
        }

        return nil, err
    }
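
For contrast, here is a sketch of the "defer" option raised in the original question: a shard-aware layer records the intent to start a transaction and only sends BEGIN once the first statement reveals the target shard. This is a hypothetical illustration, not a description of how Vitess actually does it; `lazyTx`, `begin`, and `exec` are invented names, and rejecting a second shard is just one possible policy.

```go
package main

import (
	"errors"
	"fmt"
)

// lazyTx is a hypothetical shard-aware transaction handle.
type lazyTx struct {
	shard string   // empty until the first statement picks a shard
	log   []string // what would be sent, prefixed with the target shard
}

// begin records the intent only. Nothing is sent yet, because the target
// shard is still unknown.
func begin() *lazyTx { return &lazyTx{} }

// exec binds the transaction to a shard on first use, sending BEGIN there,
// and rejects statements aimed at any other shard.
func (t *lazyTx) exec(shard, query string) error {
	if t.shard == "" {
		t.shard = shard
		t.log = append(t.log, shard+": BEGIN")
	} else if t.shard != shard {
		return errors.New("transaction already bound to " + t.shard)
	}
	t.log = append(t.log, shard+": "+query)
	return nil
}

// commit sends COMMIT only if a shard was ever touched.
func (t *lazyTx) commit() {
	if t.shard != "" {
		t.log = append(t.log, t.shard+": COMMIT")
	}
}

func main() {
	tx := begin()
	_ = tx.exec("shard-1", "UPDATE users SET name = 'a' WHERE id = 1")
	if err := tx.exec("shard-2", "UPDATE users SET name = 'b' WHERE id = 2"); err != nil {
		fmt.Println("rejected:", err)
	}
	tx.commit()
	for _, line := range tx.log {
		fmt.Println(line)
	}
}
```

A driver that sends START TRANSACTION immediately, as in the Begin() shown above, has no shard to send it to, which is the mismatch described in this message.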

