You can not select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
121 lines
5.8 KiB
121 lines
5.8 KiB
From 7fe4d007da92381c692b5ae47cec7f63e06b1a6a Mon Sep 17 00:00:00 2001 |
|
From: vaLentin chernoZemski <valentin@siteground.com> |
|
Date: Thu, 13 Oct 2016 13:17:59 +0300 |
|
Subject: [PATCH 1/2] heartbeat/mysql - Fixed bug where crm_admin is never |
|
called, leaving master scores to -1 in certain conditions. |
|
|
|
Consider the following scenario: |
|
|
|
- crm got mysql master slave resource configured without providing check_level and test_table in the config |
|
- crm is put into maintenance mode |
|
- mysql replication is adjusted automatically or by hand |
|
- crm is restarted on all nodes |
|
- crm resources are reprobed |
|
- crm is put into live mode |
|
- at this point all nodes are working as expected but NONE of them got any master-mysql score set thus defaulting to -1. monitor of the resource never called crm_master. |
|
- master fails |
|
- crm will refuse to elect any slaves with the following error |
|
|
|
failednode.com pengine: debug: master_color: mysql:0 master score: -1 |
|
|
|
When ms_mysql resource is configured master-mysql attribute/score for each node is not set by default thus returning -1. This translates to 'never promote this service as master on this machine' |
|
|
|
master-mysql should be set to positive value by the resource agent when RA decides that this machine is suitable for master. |
|
|
|
In the configuration set specified above if crm never did any operations on the mysql service such as start/stop/promote/demote score on particular node score remains -1 for that node. It just never called crm_master. |
|
|
|
When current master fails and new one needs to be promoted/elected crm is unable to choose new master with following error: |
|
|
|
failednode.com pengine: debug: master_color: mysql:1 master score: 0 ---> because node that hosts mysql:1 is down |
|
failednode.com pengine: debug: master_color: mysql:0 master score: -1 --> because the current live node got initial default valule |
|
|
|
Respectively we fail to promote new master node for the particular service. |
|
|
|
failednode.com pengine: info: master_color: ms_mysql: Promoted 0 instances of a possible 1 to master |
|
|
|
When failover procedure is started crm calls resource agents (read ocfs 'init' script with action 'monitor' on all live nodes that host the have the particular master/slave resource started. |
|
|
|
This monitor operation is expected to return master-mysql scorenum here. But it did not due to specific conditions and configurations. |
|
|
|
To solve this issue we modified the mysql resource agent to always export master-mysql scores depending on the response if called with 'monitor'. |
|
|
|
Scores are exported by calling: |
|
|
|
crm_master -l reboot -v SCORE - if status is success. The higher the score, the better the chance to elect this node, |
|
crm_master -l reboot -D - if monitor operation fails thus instructing the engine that the current node can not be used as master as it got some issues. |
|
--- |
|
heartbeat/mysql | 18 +++++++++++++++++- |
|
1 file changed, 17 insertions(+), 1 deletion(-) |
|
|
|
diff --git a/heartbeat/mysql b/heartbeat/mysql |
|
index be914d3b2..707bff33c 100755 |
|
--- a/heartbeat/mysql |
|
+++ b/heartbeat/mysql |
|
@@ -719,13 +719,22 @@ mysql_monitor() { |
|
fi |
|
|
|
mysql_common_status $status_loglevel |
|
- |
|
rc=$? |
|
|
|
# TODO: check max connections error |
|
|
|
# If status returned an error, return that immediately |
|
if [ $rc -ne $OCF_SUCCESS ]; then |
|
+ if ( ocf_is_ms ); then |
|
+ # This is a master slave setup but monitored host returned some errors. |
|
+ # Immediately remove it from the pool of possible masters by erasing its master-mysql key |
|
+ # When new mysql master election is started and node got no or negative master-mysql attribute the following is logged |
|
+ # nodename.com pengine: debug: master_color: mysql:0 master score: -1 |
|
+ # If there are NO nodes with positive vaule election of mysql master will fail with |
|
+ # nodename.com pengine: info: master_color: ms_mysql: Promoted 0 instances of a possible 1 to master |
|
+ $CRM_MASTER -D |
|
+ fi |
|
+ |
|
return $rc |
|
fi |
|
|
|
@@ -742,13 +751,20 @@ mysql_monitor() { |
|
rc=$? |
|
|
|
if [ $rc -ne 0 ]; then |
|
+ # We are master/slave and test failed. Delete master score for this node as it is considered unhealthy because of this particular failed check. |
|
+ ocf_is_ms && $CRM_MASTER -D |
|
ocf_exit_reason "Failed to select from $test_table"; |
|
return $OCF_ERR_GENERIC; |
|
fi |
|
+ else |
|
+ # In case no exnteded tests are enabled and we are in master/slave mode _always_ set the master score to 1 if we reached this point |
|
+ ocf_is_ms && $CRM_MASTER -v 1 |
|
fi |
|
|
|
if ocf_is_ms && ! get_read_only; then |
|
ocf_log debug "MySQL monitor succeeded (master)"; |
|
+ # Always set master score for the master |
|
+ $CRM_MASTER -v 2 |
|
return $OCF_RUNNING_MASTER |
|
else |
|
ocf_log debug "MySQL monitor succeeded"; |
|
|
|
From 8ba16bcd7ff23be983570df0afe447beabd1c682 Mon Sep 17 00:00:00 2001 |
|
From: vaLentin chernoZemski <valentin@siteground.com> |
|
Date: Mon, 23 Jan 2017 10:46:52 +0200 |
|
Subject: [PATCH 2/2] heartbeat/mysql - don't run ocf_is_ms check in a subshell |
|
|
|
--- |
|
heartbeat/mysql | 2 +- |
|
1 file changed, 1 insertion(+), 1 deletion(-) |
|
|
|
diff --git a/heartbeat/mysql b/heartbeat/mysql |
|
index 707bff33c..9e779e4f9 100755 |
|
--- a/heartbeat/mysql |
|
+++ b/heartbeat/mysql |
|
@@ -725,7 +725,7 @@ mysql_monitor() { |
|
|
|
# If status returned an error, return that immediately |
|
if [ $rc -ne $OCF_SUCCESS ]; then |
|
- if ( ocf_is_ms ); then |
|
+ if ocf_is_ms ; then |
|
# This is a master slave setup but monitored host returned some errors. |
|
# Immediately remove it from the pool of possible masters by erasing its master-mysql key |
|
# When new mysql master election is started and node got no or negative master-mysql attribute the following is logged
|
|
|