You can not select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
122 lines
5.8 KiB
122 lines
5.8 KiB
7 years ago
|
From 7fe4d007da92381c692b5ae47cec7f63e06b1a6a Mon Sep 17 00:00:00 2001
|
||
|
From: vaLentin chernoZemski <valentin@siteground.com>
|
||
|
Date: Thu, 13 Oct 2016 13:17:59 +0300
|
||
|
Subject: [PATCH 1/2] heartbeat/mysql - Fixed bug where crm_admin is never
|
||
|
called, leaving master scores to -1 in certain conditions.
|
||
|
|
||
|
Consider the following scenario:
|
||
|
|
||
|
- crm got mysql master slave resource configured without providing check_level and test_table in the config
|
||
|
- crm is put into maintenance mode
|
||
|
- mysql replication is adjusted automatically or by hand
|
||
|
- crm is restarted on all nodes
|
||
|
- crm resources are reprobed
|
||
|
- crm is put into live mode
|
||
|
- at this point all nodes are working as expected but NONE of them got any master-mysql score set thus defaulting to -1. monitor of the resource never called crm_master.
|
||
|
- master fails
|
||
|
- crm will refuse to elect any slaves with the following error
|
||
|
|
||
|
failednode.com pengine: debug: master_color: mysql:0 master score: -1
|
||
|
|
||
|
When ms_mysql resource is configured master-mysql attribute/score for each node is not set by default thus returning -1. This translates to 'never promote this service as master on this machine'
|
||
|
|
||
|
master-mysql should be set to positive value by the resource agent when RA decides that this machine is suitable for master.
|
||
|
|
||
|
In the configuration set specified above if crm never did any operations on the mysql service such as start/stop/promote/demote score on particular node score remains -1 for that node. It just never called crm_master.
|
||
|
|
||
|
When current master fails and new one needs to be promoted/elected crm is unable to choose new master with following error:
|
||
|
|
||
|
failednode.com pengine: debug: master_color: mysql:1 master score: 0 ---> because node that hosts mysql:1 is down
|
||
|
failednode.com pengine: debug: master_color: mysql:0 master score: -1 --> because the current live node got initial default valule
|
||
|
|
||
|
Respectively we fail to promote new master node for the particular service.
|
||
|
|
||
|
failednode.com pengine: info: master_color: ms_mysql: Promoted 0 instances of a possible 1 to master
|
||
|
|
||
|
When failover procedure is started crm calls resource agents (read ocfs 'init' script with action 'monitor' on all live nodes that host the have the particular master/slave resource started.
|
||
|
|
||
|
This monitor operation is expected to return master-mysql scorenum here. But it did not due to specific conditions and configurations.
|
||
|
|
||
|
To solve this issue we modified the mysql resource agent to always export master-mysql scores depending on the response if called with 'monitor'.
|
||
|
|
||
|
Scores are exported by calling:
|
||
|
|
||
|
crm_master -l reboot -v SCORE - if status is success. The higher the score, the better the chance to elect this node,
|
||
|
crm_master -l reboot -D - if monitor operation fails thus instructing the engine that the current node can not be used as master as it got some issues.
|
||
|
---
|
||
|
heartbeat/mysql | 18 +++++++++++++++++-
|
||
|
1 file changed, 17 insertions(+), 1 deletion(-)
|
||
|
|
||
|
diff --git a/heartbeat/mysql b/heartbeat/mysql
|
||
|
index be914d3b2..707bff33c 100755
|
||
|
--- a/heartbeat/mysql
|
||
|
+++ b/heartbeat/mysql
|
||
|
@@ -719,13 +719,22 @@ mysql_monitor() {
|
||
|
fi
|
||
|
|
||
|
mysql_common_status $status_loglevel
|
||
|
-
|
||
|
rc=$?
|
||
|
|
||
|
# TODO: check max connections error
|
||
|
|
||
|
# If status returned an error, return that immediately
|
||
|
if [ $rc -ne $OCF_SUCCESS ]; then
|
||
|
+ if ( ocf_is_ms ); then
|
||
|
+ # This is a master slave setup but monitored host returned some errors.
|
||
|
+ # Immediately remove it from the pool of possible masters by erasing its master-mysql key
|
||
|
+ # When new mysql master election is started and node got no or negative master-mysql attribute the following is logged
|
||
|
+ # nodename.com pengine: debug: master_color: mysql:0 master score: -1
|
||
|
+ # If there are NO nodes with positive vaule election of mysql master will fail with
|
||
|
+ # nodename.com pengine: info: master_color: ms_mysql: Promoted 0 instances of a possible 1 to master
|
||
|
+ $CRM_MASTER -D
|
||
|
+ fi
|
||
|
+
|
||
|
return $rc
|
||
|
fi
|
||
|
|
||
|
@@ -742,13 +751,20 @@ mysql_monitor() {
|
||
|
rc=$?
|
||
|
|
||
|
if [ $rc -ne 0 ]; then
|
||
|
+ # We are master/slave and test failed. Delete master score for this node as it is considered unhealthy because of this particular failed check.
|
||
|
+ ocf_is_ms && $CRM_MASTER -D
|
||
|
ocf_exit_reason "Failed to select from $test_table";
|
||
|
return $OCF_ERR_GENERIC;
|
||
|
fi
|
||
|
+ else
|
||
|
+ # In case no exnteded tests are enabled and we are in master/slave mode _always_ set the master score to 1 if we reached this point
|
||
|
+ ocf_is_ms && $CRM_MASTER -v 1
|
||
|
fi
|
||
|
|
||
|
if ocf_is_ms && ! get_read_only; then
|
||
|
ocf_log debug "MySQL monitor succeeded (master)";
|
||
|
+ # Always set master score for the master
|
||
|
+ $CRM_MASTER -v 2
|
||
|
return $OCF_RUNNING_MASTER
|
||
|
else
|
||
|
ocf_log debug "MySQL monitor succeeded";
|
||
|
|
||
|
From 8ba16bcd7ff23be983570df0afe447beabd1c682 Mon Sep 17 00:00:00 2001
|
||
|
From: vaLentin chernoZemski <valentin@siteground.com>
|
||
|
Date: Mon, 23 Jan 2017 10:46:52 +0200
|
||
|
Subject: [PATCH 2/2] heartbeat/mysql - don't run ocf_is_ms check in a subshell
|
||
|
|
||
|
---
|
||
|
heartbeat/mysql | 2 +-
|
||
|
1 file changed, 1 insertion(+), 1 deletion(-)
|
||
|
|
||
|
diff --git a/heartbeat/mysql b/heartbeat/mysql
|
||
|
index 707bff33c..9e779e4f9 100755
|
||
|
--- a/heartbeat/mysql
|
||
|
+++ b/heartbeat/mysql
|
||
|
@@ -725,7 +725,7 @@ mysql_monitor() {
|
||
|
|
||
|
# If status returned an error, return that immediately
|
||
|
if [ $rc -ne $OCF_SUCCESS ]; then
|
||
|
- if ( ocf_is_ms ); then
|
||
|
+ if ocf_is_ms ; then
|
||
|
# This is a master slave setup but monitored host returned some errors.
|
||
|
# Immediately remove it from the pool of possible masters by erasing its master-mysql key
|
||
|
# When new mysql master election is started and node got no or negative master-mysql attribute the following is logged
|