Galera Load Balancer status and capabilities?

Otto Kekäläinen

Feb 8, 2016, 5:02:37 AM
to codersh...@googlegroups.com
Hello!

Three questions about GLB:

1) Is https://github.com/codership/glb still usable? It has had very
few commits in recent years. Or do you nowadays prefer some other load
balancing solution over glb?

2) How much of the SQL traffic semantics does glb understand? If it
proxies a query to a Galera node which is up (TCP connection is OK)
but which is not synced to the cluster or has some other failure, and
the node returns an error to the SQL query, does the proxy "see" the
error in the reply and redirect traffic to another node?

3) When the proxy detects a failure in a node, will it automatically
re-try the same SQL query to another node or will it error that SQL
query and redirect the SQL query to a healthy node only starting from
the next new query?

I am asking about question 3 because Nginx has a very nice proxy mode:
if the HTTP request to the proxy upstream fails, it can automatically
fall back to a second proxy upstream, and if the HTTP request is
successful there, it will return a successful response to the original
client. The original client will never see an error, only a slightly
longer delay in the response (as Nginx needs to make the same request
to multiple upstreams before there is a success). It would be nice to
have this failure tolerance on the Galera proxy level.

- Otto

Philip Stoev

Feb 8, 2016, 5:12:54 AM
to Otto Kekäläinen, codersh...@googlegroups.com
Hi,

> Three questions about GLB:

> 1) Is https://github.com/codership/glb still usable? It has had very
> few commits in recent years. Or do you nowadays prefer some other load
> balancing solution over glb?

Yes, we believe that it is usable, and people seem to be using it. We do not
recommend any particular load-balancing solution for Galera.

> 2) How much of the SQL traffic semantics does glb understand? If it
> proxies a query to a Galera node which is up (TCP connection is OK)
> but which is not synced to the cluster or has some other failure, and
> the node returns an error to the SQL query, does the proxy "see" the
> error in the reply and redirect traffic to another node?

GLB operates on the TCP level only and does not listen in or understand the
traffic being proxied. Therefore, it does not have the ability to detect
that a given SQL query has returned an error.

GLB considers a server to have failed if it no longer responds to TCP
connections. For better error detection, one needs to set up a "watchdog"
script, which can be made more intelligent, but it will operate outside of
the (SQL) connections being proxied.
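
As a rough illustration of what a "more intelligent" check could look at, here is a minimal Python sketch. It is not part of glb and does not show how glb invokes a watchdog; it only shows a decision function. It assumes the caller has already fetched the node's wsrep status variables (for example with SHOW GLOBAL STATUS LIKE 'wsrep_%') into a dict. The function name is_node_usable is hypothetical.

```python
def is_node_usable(status):
    """Decide whether a Galera node should receive traffic.

    `status` is assumed to be a dict of wsrep status variables that the
    caller fetched from the node itself. A node that still accepts TCP
    connections can fail every one of these checks.
    """
    return (
        status.get("wsrep_ready") == "ON"
        and status.get("wsrep_cluster_status") == "Primary"
        and status.get("wsrep_local_state_comment") == "Synced"
    )


if __name__ == "__main__":
    healthy = {
        "wsrep_ready": "ON",
        "wsrep_cluster_status": "Primary",
        "wsrep_local_state_comment": "Synced",
    }
    desynced = dict(healthy, wsrep_local_state_comment="Donor/Desynced")
    print(is_node_usable(healthy), is_node_usable(desynced))
```

A missing variable is treated as "not usable", so a node that cannot be queried at all is taken out of rotation rather than left in.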

> 3) When the proxy detects a failure in a node, will it automatically
> re-try the same SQL query to another node or will it error that SQL
> query and redirect the SQL query to a healthy node only starting from
> the next new query?

GLB will not retry the SQL query. The application will need to detect that
the connection is not good and abort it. Upon reconnecting, glb will send
the new connection to another node provided that glb itself has also
detected that the previous node is down. The application should also start
the entire transaction from the beginning rather than just rerun the failed
query.
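
The application-side behaviour described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not code from glb or from any particular MySQL driver: connect is a hypothetical callable that opens a new connection through the load balancer, work is a hypothetical callable that issues every statement of the transaction, and ConnectionLost stands in for whatever "connection is gone" exception the real driver raises.

```python
import time


class ConnectionLost(Exception):
    """Hypothetical stand-in for a driver's 'connection lost' error."""


def run_transaction(connect, work, attempts=3, delay=0.0):
    """Run `work` as one transaction, restarting it from the beginning
    on a fresh connection if the current connection turns out to be dead."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        conn = None
        try:
            conn = connect()      # new connection; the balancer picks the node
            result = work(conn)   # all statements of the transaction, in order
            conn.commit()
            return result
        except ConnectionLost as exc:
            last_error = exc      # abandon this connection and start over
            time.sleep(delay)
        finally:
            if conn is not None:
                try:
                    conn.close()
                except ConnectionLost:
                    pass          # closing a dead connection may fail too
    raise last_error
```

The whole of work is rerun on each attempt, never just the statement that failed. This is only safe if the transaction can be repeated without side effects outside the database.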

Otto Kekäläinen

Feb 26, 2016, 1:46:43 AM
to Philip Stoev, codersh...@googlegroups.com
Hello!

2016-02-08 12:12 GMT+02:00 Philip Stoev <philip...@galeracluster.com>:
>> 3) When the proxy detects a failure in a node, will it automatically
>> re-try the same SQL query to another node or will it error that SQL
>> query and redirect the SQL query to a healthy node only starting from
>> the next new query?
>
>
> GLB will not retry the SQL query. The application will need to detect that
> the connection is not good and abort it. Upon reconnecting, glb will send
> the new connection to another node provided that glb itself has also
> detected that the previous node is down. The application should also start
> the entire transaction from the beginning rather than just rerun the failed
> query.

In my opinion the ideal load balancer would understand the semantics
of the connections, and then route and retry failing requests
automatically. I know that MaxScale from MariaDB has a built-in SQL
parser but I haven't investigated yet how smart it is.

Another new option is Nginx TCP load balancing. Many use Galera on web
servers that already have Nginx and HTTP load balancing, so balancing
traffic to port 3306 would be a small step to do in those cases:
https://www.nginx.com/blog/mysql-high-availability-with-nginx-plus-and-galera-cluster/

The fact that there are so many options available (pen, glb, HAProxy,
MaxScale, Nginx, MySQL drivers with LB features in different
programming languages, etc.) might indicate that none of them is
perfect yet. So we'll continue our quest to find the best one :)

alexey.y...@galeracluster.com

Feb 26, 2016, 9:35:11 PM
to Otto Kekäläinen, Philip Stoev, codersh...@googlegroups.com
On 2016-02-26 03:46, Otto Kekäläinen wrote:
> In my opinion the ideal load balancer would understand the semantics
> of the connections, and then route and retry failing requests
> automatically.

That's why TCP proxies are immediately ruled out as ideal load balancers
;)

> I know that MaxScale from MariaDB has a built-in SQL
> parser but I haven't investigated yet how smart it is.

MaxScale introduces another component in the system, another parser and
another hop in communication. If you want to go really IDEAL, you would
want this functionality in the driver, on the client. You would probably
also have to completely rework the client-server protocol (or create a
parallel one) so that it supports parsing on the client instead of the server.
