Fresh deployment on Oracle Linux 9 unsuccessful


Marcin Groszek

Feb 3, 2025, 8:30:41 AM
to codership
On my test server I set up a 4-node cluster using Galera Manager, but I ran into some issues.
All VPSs run OL9; GM version: 1.8.7, MariaDB version 10.11.10. All VPSs have private IPs on the same /24.
During installation the GM node did not enable the nginx service at boot; this is an easy fix.
Connection status is offline, and live monitoring is offline.
There is no firewalld running and no iptables rules in place; all nodes and the GM node have access to the internet. SSH keys are in place, yet all nodes are "greyed out".
I see requests from all nodes to GM on port 8081:
POST /write?db=gmd HTTP/1.1..Host: 10.0.2.3:8081..User-Agent: Telegraf/1.33.1....
and the responses:
HTTP/1.1 204 No Content..Server: nginx/1.20.1..
Disabling SELinux did not help.
After the initial deployment the Galera cluster was working, but the Recover Cluster function "broke" the cluster.

Marcin Groszek

Feb 3, 2025, 9:19:49 AM
to codership
A little update after digging:
The deployment log shows: No agent connection
systemctl status telegraf shows:
mysql-node-1 telegraf[10114]: 2025-02-03T13:11:01Z E! [inputs.execd] stderr: "2025/02/03 07:11:01 E! Error in plugin: Error 1054: Unknown column 'status' in 'WHERE'"
strings /usr/local/bin/mysql_wsrep shows the failed query: SELECT FROM information_schema.INNODB_METRICS WHERE status='enabled'
but the status column in information_schema.INNODB_METRICS does not exist.
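A minimal sketch of how a collector could pick the filter from the columns that are actually present, instead of hard-coding status. It assumes, and this is not confirmed in this thread, that MariaDB 10.5 and later expose an ENABLED column (0/1) in place of the MySQL-style STATUS column; the real list can be checked with SHOW COLUMNS FROM information_schema.INNODB_METRICS. The function name innodb_metrics_filter is hypothetical and is not part of mysql_wsrep.

```python
def innodb_metrics_filter(columns):
    """Return a WHERE clause that selects enabled counters.

    `columns` is the list of column names reported by the server for
    information_schema.INNODB_METRICS.
    """
    names = {c.upper() for c in columns}
    if "STATUS" in names:
        # MySQL-style layout: textual status column.
        return "WHERE status = 'enabled'"
    if "ENABLED" in names:
        # Assumed MariaDB 10.5+ layout: numeric flag column.
        return "WHERE enabled = 1"
    # Unknown layout: no filter rather than a query that fails with Error 1054.
    return ""


if __name__ == "__main__":
    # Hypothetical column lists for illustration only.
    print(innodb_metrics_filter(["NAME", "SUBSYSTEM", "COUNT", "STATUS"]))
    print(innodb_metrics_filter(["NAME", "SUBSYSTEM", "COUNT", "ENABLED"]))
```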
Commenting out gather_innodb_metrics = true in /etc/telegraf/mysql_wsrep-telegraf-plugin.conf got past that error, but a new error showed up after telegraf started:
[outputs.influxdb] When writing to [http://10.0.2.3:8081]: database "gmd" creation failed: 200 OK: not implemented: CREATE DATABASE
Neither granting new privileges to mysqld_exporter on the node nor manually creating the gmd database corrected the issue, and the error continued.
Then I looked at the gma service, and its logs show the following error:
Feb 03 07:35:28 mysql-node-1 gma[13954]: time="2025-02-03T07:35:28-06:00" level=error msg="Failed to create stream: rpc error: code = Unavailable desc = connection error: desc = \"error reading server preface: http2: frame too large\""
Feb 03 07:35:28 mysql-node-1 gma[13954]: time="2025-02-03T07:35:28-06:00" level=error msg="Error while serving: rpc error: code = Unavailable desc = connection error: desc = \"error reading server preface: http2: frame too large\"; slee>
Feb 03 07:35:29 mysql-node-1 gma[13954]: time="2025-02-03T07:35:29-06:00" level=info msg="Connecting (used scheme http)..."
Feb 03 07:35:29 mysql-node-1 gma[13954]: time="2025-02-03T07:35:29-06:00" level=info msg="Creating agentcom stream..."
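One possible reading of "http2: frame too large", not confirmed in this thread: an HTTP/2 client reports this when the peer answers in plain HTTP/1.1 and the first bytes of the text reply are parsed as an HTTP/2 frame header. The sketch below decodes the 9-byte frame header (24-bit length, 8-bit type, 8-bit flags, 31-bit stream id) from a hypothetical HTTP/1.1 reply; the reply bytes are an illustration, not captured traffic.

```python
MAX_FRAME_SIZE = 16384  # default SETTINGS_MAX_FRAME_SIZE in HTTP/2 (2**14)


def parse_frame_header(data: bytes):
    """Decode the fixed 9-byte HTTP/2 frame header from the start of `data`."""
    if len(data) < 9:
        raise ValueError("need at least 9 bytes for a frame header")
    length = int.from_bytes(data[0:3], "big")                  # 24-bit payload length
    frame_type = data[3]                                       # 8-bit frame type
    flags = data[4]                                            # 8-bit flags
    stream_id = int.from_bytes(data[5:9], "big") & 0x7FFFFFFF  # drop reserved bit
    return length, frame_type, flags, stream_id


if __name__ == "__main__":
    # Hypothetical plain-text reply from a server that does not speak HTTP/2.
    reply = b"HTTP/1.1 400 Bad Request\r\n"
    length, frame_type, flags, stream_id = parse_frame_header(reply)
    # The ASCII bytes "HTT" are read as the frame length.
    print(length, length > MAX_FRAME_SIZE)  # prints: 4740180 True
```

If that reading applies here, the thing to check is whether the port gma connects to is actually served over HTTP/2 (gRPC) rather than HTTP/1.1.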

Marcin Groszek

Feb 4, 2025, 5:50:27 AM
to codership
One more update, but still no solution.
The gmd service on OL9 does not create the /var/run/gmd.pid file, and the telegraf configuration checks for it at start. The solution was to replace the pid_file = "/var/run/gmd.pid" line in /etc/telegraf/telegraf.conf with systemd_unit = "gmd.service", and this eliminated the telegraf errors.
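The same edit can be scripted when several hosts need it. This is a minimal sketch that rewrites the configuration text only; the function name use_systemd_unit is hypothetical, reading and writing /etc/telegraf/telegraf.conf and restarting telegraf are left to the caller, and a backup of the file is assumed.

```python
import re


def use_systemd_unit(conf_text, pid_file="/var/run/gmd.pid", unit="gmd.service"):
    """Replace a `pid_file = "<pid_file>"` line with `systemd_unit = "<unit>"`.

    Indentation is preserved; lines pointing at other pid files are untouched.
    """
    pattern = re.compile(
        r'^([ \t]*)pid_file[ \t]*=[ \t]*"' + re.escape(pid_file) + r'"[ \t]*$',
        re.MULTILINE,
    )
    # A function replacement avoids backslash handling in the unit name.
    return pattern.sub(lambda m: f'{m.group(1)}systemd_unit = "{unit}"', conf_text)


if __name__ == "__main__":
    # Hypothetical fragment of a telegraf.conf for illustration.
    sample = '[[inputs.procstat]]\n  pid_file = "/var/run/gmd.pid"\n'
    print(use_systemd_unit(sample))
```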
I have verified that the db nodes are writing to influx on the manager node via telegraf, but the GUI is not updating the status of the nodes, nor reading any data or db node logs.
There is a little info under Jobs: name: mysql_node_1, status: healthy, comment: node status is healthy
The web log has one repeating error:
2025/02/03 15:49:30 [info] 828#828: *12889 client sent invalid request while reading client request line, client: 10.0.2.5, server: 10.0.2.3, request: "PRI * HTTP/2.0"
2025/02/03 15:49:30 [info] 831#831: *12890 client sent invalid request while reading client request line, client: 10.0.2.7, server: 10.0.2.3, request: "PRI * HTTP/2.0"
where 10.0.2.5 and 10.0.2.7 are db nodes.
The influxdb error log shows:
2025/02/03 15:23:01 [error] 831#831: *10 connect() failed (111: Connection refused) while connecting to upstream, client: 10.0.2.6, server: 10.0.2.3, request: "POST /write?db=gmd HTTP/1.1", upstream: "http://127.0.0.1:8086/write?db=gmd", host: "10.0.2.3:8081"
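"Connection refused" from nginx means nothing was accepting connections on the upstream address at that moment. A minimal sketch for checking this from the manager node, assuming the upstream shown in the log (127.0.0.1:8086); the helper name port_open is hypothetical.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Upstream address taken from the nginx error line above.
    print(port_open("127.0.0.1", 8086))
```

A False result while nginx is proxying to that address would match the error; a True result would suggest the refusal was transient, for example during a service restart.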

Unfortunately there is very little information available online to continue troubleshooting.

Alexey Yurchenko

Feb 4, 2025, 6:48:56 AM
to codership
Hi Marcin, could you please report this at https://github.com/codership/galera-manager-support/issues and describe the infrastructure you have: where and how you installed GMD, where the nodes are, whether they are VMs, and how they were provisioned. It sounds like you were not using the installer, which does all the configuration heavy lifting.

Marcin Groszek

Feb 4, 2025, 7:39:59 AM
to codership
I used the installer, version 1.8.7, downloaded from galeracluster.com, and followed the instructions in a link received via email. All nodes were created from the GUI, and the Galera cluster was successfully deployed via GMS on VMs, all running OL9. Not AWS; all VMs are on the same hardware server, private network 10.0., no firewall.
The Galera cluster was operational, but the GMS was only good for restarting the nodes: no stats, online status red, monitoring status red, nodes not synced.
