Issues when automatically running DDL queries on Galera cluster

Ciprian Stoica

unread,

Sep 30, 2015, 8:26:03 AM9/30/15

to codership

Hi,

We have a Java web application that runs on WildFly 8.2 with MariaDB 10. We are currently evaluating the migration from a single MariaDB server to a three-node Galera/MariaDB cluster with HAProxy 1.5 as a load balancer.

All works fine when only DML commands are issued against the cluster. But there are situations when our Java app runs DDL commands like CREATE TABLE or ALTER TABLE. For example, in the web interface it is possible to create new customers. Any such customer gets 50 new tables to store its data.

Suppose we add the customer with the id 123, the application will run the following mix of DDL and DML queries:

-- start customer creation
create table Table1_123 (id int not null,.., primary key (id));
create table Table2_123 (id int not null, tid int not null,..., primary key (id));
....
alter table Table2_123 add constraint constraintName foreign key (tid) references Table1_123 (id);
...
insert into Table1_123 ...; -- insert one or more values
insert into Table2_123 ...; -- insert one or more values
...
-- define tables and insert data up to Table50_123

We have wsrep_osu_method=toi (the default).

After creating such a customer, I check the newly created tables and many time I see inconsistencies between the nodes. The tables rhemselves were properly replicated, but not their content. Some new tables on some node contains data that hasn't been replicated to the other nodes.

I made some tests and only connected the Java app to a single node, directly, avoiding HA Proxy. I got the same inconsistencies but the difference was that only the node that the app directly connected to was holding un-replicated data. In the previous test the un-replicated data was scattered on all the nodes.

I don't understand this behavior. I don't get any relevant errors in the logs.

Running such DDL queries is something that happens often. Of course, we would like to do them automatically instead of manually.

What would be the best approach for such situation, when the app runs DDLs as wel as DMLs?

Our priorities are the following (in this order):

1. Maintain consistency during and after DDL are run.

2. Do the fewest changes in the application code.

3. If possible run DDLs automatically and not manually.

Thank you,

Ciprian

Philip Stoev

unread,

Sep 30, 2015, 9:50:59 AM9/30/15

to Ciprian Stoica, codersh...@googlegroups.com

Hello,

DDL as you describe them should work just fine with Galera. There should be
no need for application changes or anything like that.

You can check the following:
- There should be at least something in the MySQL error logs of your nodes.
- When you say that some data is not replicated, what data is that? Can you
demonstrate with SELECT queries what data is missing?
- SHOW STATUS LIKE 'wsrep_last_committed'; on all nodes must show the same
value. Is that the case?
- as a last resort, you can enable --log-bin and --log-slave-updates on all
nodes. This will create a binary log on every node that will record all the
updates that the node has received. You can then compare them using the
mysqlbinlog utility to check if they all contain the updates in question.

Philip Stoev

--
You received this message because you are subscribed to the Google Groups
"codership" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to codership-tea...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

alexey.y...@galeracluster.com

unread,

Oct 1, 2015, 3:32:57 AM10/1/15

to Ciprian Stoica, codership

It is very likely that your application creates MyISAM tables. Please
check that.

James Wang

unread,

Oct 1, 2015, 4:58:28 AM10/1/15

to codership, cip...@cipris.ro

I had some similar experience: Thought all my tables were InnoDB by default. But a couple were created as MyISAM, so they were not replicated.

Ciprian Stoica

unread,

Oct 1, 2015, 5:13:17 AM10/1/15

to codership, cip...@cipris.ro

Hi Philip,

For example, I've created the customer 1003. Among the customer tables, there are two tables called UserPreference_1003 respectively ClientPreference_1003. Both were properly created on all three nodes. They were supposed to contain 170 respectively 85 rows. But only the tables on node2 contains these rows. On the other two nodes, the tables are empty.

It is the same for all the other customer tables. Not all the customer tables are filled with some rows from the beginning. But those that the application runs inserts on only contain those rows on node2 and they are empty on the other nodes. On another customer insert, it might be possible that node1 or node3 to contain all the above mentioned rows and the other nodes to have empty tables.

SHOW STATUS LIKE 'wsrep_last_committed' shows the same value on all three nodes.

I've enabled --log-bin and --log-slave-updates. I've compared the binlogs between node2 and node3 and node doesn't show the inserts that populate the above tables. I see this in node2 binlog:

...
#150930 15:01:12 server id 1  end_log_pos 88339 Table_map: `dbname`.`ClientPreference_1003` mapped to number 554
#150930 15:01:12 server id 1  end_log_pos 88465 Write_rows: table id 554 flags: STMT_END_F


BINLOG '
OPkLVhMBAAAAXwAAABNZAQAAACoCAAAAAAEAEnZpc2l4ZF9WSURGV0VONF9XRgAVQ2xpZW50UHJl
ZmVyZW5jZV8xMDAzAAoICA8PDw8PBQwDC5YAlgC4C7gLaAEI+AM=
OPkLVhcBAAAAfgAAAJFZAQAAACoCAAAAAAEACv//4P2jTw8AAAAAAOsDAAAAAAAAGUNVU1RPTV9V
U0VSX1BSRUZFUkVOQ0VfMTEEVEVYVCkAQ3VzdG9tIHVzZXIgcHJlZmVyZW5jZSAjMTEgZm9yIGZy
ZWUgdXNpbmcCAAAA
'/*!*/;
# at 88465
# at 88560
#150930 15:01:12 server id 1  end_log_pos 88560 Table_map: `dbname`.`ClientPreference_1003` mapped to number 554
#150930 15:01:12 server id 1  end_log_pos 88686 Write_rows: table id 554 flags: STMT_END_F


BINLOG '
OPkLVhMBAAAAXwAAAPBZAQAAACoCAAAAAAEAEnZpc2l4ZF9WSURGV0VONF9XRgAVQ2xpZW50UHJl
ZmVyZW5jZV8xMDAzAAoICA8PDw8PBQwDC5YAlgC4C7gLaAEI+AM=
OPkLVhcBAAAAfgAAAG5aAQAAACoCAAAAAAEACv//4P2kTw8AAAAAAOsDAAAAAAAAGUNVU1RPTV9V
U0VSX1BSRUZFUkVOQ0VfMTIEVEVYVCkAQ3VzdG9tIHVzZXIgcHJlZmVyZW5jZSAjMTIgZm9yIGZy
ZWUgdXNpbmcCAAAA
'/*!*/;
# at 88686
# at 88781
...

This is missing from the binlog on node3 and node1.

I paste below the settings I have in my.cnf file on node2. The settings are simillar on all the nodes. Only the IP, node name, etc.. is customized for each node.

[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock


binlog_format=ROW
bind-address=0.0.0.0
default-storage-engine=innodb
innodb_autoinc_lock_mode=2
wsrep_provider=/usr/lib64/galera/libgalera_smm.so
wsrep_provider_options="gcache.size=300M; gcache.page_size=1G"
wsrep_cluster_name="example_cluster"
wsrep_cluster_address="gcomm://192.168.210.81,192.168.210.68"
wsrep_sst_method=rsync
wsrep_node_name=MyNode2
wsrep_node_address="192.168.210.67"
wsrep_debug=ON
wsrep_log_conflicts=ON
wsrep_provider_options="cert.log_conflicts=YES"
skip-name-resolve
skip-host-cache
log_basename=node2
log_bin=mysql_logbin.log
log_slave_updates
log_output = 'FILE'
general_log_file='mysql_general.log'
general_log=1
symbolic-links=0
user=mysql
sql-mode="STRICT_TRANS_TABLES,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION,NO_AUTO_VALUE_ON_ZERO"
max_allowed_packet=128M
max_connections=2048
query_cache_size=15M
table_cache=2048
tmp_table_size=18M
slow_query_log
thread_cache_size=8
innodb_file_per_table=1
innodb_flush_log_at_trx_commit=0
innodb_log_buffer_size=1M
innodb_buffer_pool_size=122M
innodb_log_file_size=10M
innodb_thread_concurrency=8


[mysqld_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid


!includedir /etc/my.cnf.d

Ciprian

Ciprian Stoica

unread,

Oct 1, 2015, 5:15:25 AM10/1/15

to codership, cip...@cipris.ro

Hi Alexey,

All the tables are InnoDB, have primary keys, etc.. We've met all the Galera recommendations.

Regards,

Ciprian

Ciprian Stoica

unread,

Oct 1, 2015, 5:17:14 AM10/1/15

to codership, cip...@cipris.ro

Hi James,

You had similar experience with the MyISAM tables or also with the InnoDB ones?

Regards,

Ciprian

James Wang

unread,

Oct 1, 2015, 5:23:26 AM10/1/15

to codership, cip...@cipris.ro

MyISAM

Still don't know why my table was created with MyISAM engine though

Ciprian Stoica

unread,

Oct 1, 2015, 5:26:31 AM10/1/15

to codership

Ok. Mine are InnoDB for sure.

show create table UserPreference_1003;



CREATE TABLE `UserPreference_1003` (
  `preferenceId` bigint(20) NOT NULL AUTO_INCREMENT,
  ...
  PRIMARY KEY (`preferenceId`)
) ENGINE=InnoDB AUTO_INCREMENT=1000 DEFAULT CHARSET=utf8

Philip Stoev

unread,

Oct 1, 2015, 5:35:17 AM10/1/15

to Ciprian Stoica, codersh...@googlegroups.com

Hello,

There is only a very limited number of circumstances when Galera will not
replicate a row (usually when the plugin is not loaded, wsrep_on is set to
OFF, etc.)

I guess the next step would be to enable the general query log and capture
the actual sequence of queries that populate those tables. Statements such
as SET GLOBAL wsrep_... , SET SESSION wsrep_... , CREATE ... SELECT, INSERT
... SELECT , triggers, views, temporary tables may be relevant to the
problem at hand.

Philip Stoev

-----Original Message-----
From: Ciprian Stoica
Sent: Thursday, October 01, 2015 12:13
To: codership
Cc: cip...@cipris.ro
Subject: Re: [codership-team] Issues when automatically running DDL queries
on Galera cluster

Ciprian Stoica

unread,

Oct 2, 2015, 8:20:58 AM10/2/15

to codership

I gathered all the queries that run when a customer is created. There are no less than 1145 queries (mixed DDL with DML) in the same transaction. Could be possible that such a big number to cause the issue? But I still don't see any errors in the logs.

Ciprian

Ciprian Stoica

unread,

Oct 6, 2015, 6:56:36 AM10/6/15

to codership

Problem solved. Jboss was using XA transactions. But Galera does not support XA transactions (yet). It is clearly explained here: http://galeracluster.com/documentation-webpages/limitations.html#differences-in-transactions but for some reason I missed reading that page.

I reconfigured Jboss to use non-XA transactions and the problem has gone.

Best regards,

Ciprian

Reply all

Reply to author

Forward