Linux Tricks and Techniques

For Linux Experts

Use the website for easy learning of Linux

Friday, 1 July 2016

VCS Cluster not Starting

I once faced an issue on one of our VCS nodes. The cluster had two nodes, each running a different service. Normally, if one node goes down, its services are started on the other node. But what happens if one node is already down and we reboot the other one? In that case the rebooted node tries to fail over the services, keeps trying to connect to the other node, which is down, and hangs, as shown in the figure:

The node keeps trying to connect to the other node and reports the error "Cannot connect to VCS engine". The reason is that, during cluster configuration, we set up the gabtab file, which makes GAB wait for the configured number of nodes before it seeds the cluster. The solution is to override the seeding check, but first make sure that the other node really is down. If you hit this error, follow the steps below, and if the services still do not come up, override the seeding.

hastatus -sum
VCS ERROR V-16-1-10600 Cannot connect to VCS engine
VCS WARNING V-16-1-11046 Local system not available

hasys -state
VCS ERROR V-16-1-10600 Cannot connect to VCS engine

hastop -all -force
VCS ERROR V-16-1-10600 Cannot connect to VCS engine

hastart / hastart -onenode
dmesg:
Exiting: Another copy of VCS may be running

engine_A.log:
2013/10/22 15:16:43 VCS NOTICE V-16-1-11051 VCS engine join version=4.1000
2013/10/22 15:16:43 VCS NOTICE V-16-1-11052 VCS engine pstamp=4.1 03/03/05-14:58:00
2013/10/22 15:16:43 VCS NOTICE V-16-1-10114 Opening GAB library
2013/10/22 15:16:43 VCS NOTICE V-16-1-10619 'HAD' starting on: db1
2013/10/22 15:16:45 VCS INFO V-16-1-10125 GAB timeout set to 15000 ms
2013/10/22 15:17:00 VCS CRITICAL V-16-1-11306 Did not receive cluster membership, manual intervention may be needed for seeding

# gabconfig -a
GAB Port Memberships

# lltstat -nvv
LLT node information:
Node State Link Status Address
* 0 db1 OPEN
bge1 UP 00:03:BA:15
bge2 UP 00:03:BA:15
bge1 DOWN
bge2 DOWN

bash-2.05$ lltconfig
LLT is running

ps -ef | grep had
root 826 1 0 15:16:43 ? 0:00 /opt/VRTSvcs/bin/had
root 836 1 0 15:16:45 ? 0:00 /opt/VRTSvcs/bin/hashadow

If only one of the two nodes can connect through LLT (see the lltstat -nvv output above, where one node is present and the other is down), the cluster will attempt to start but will wait until both nodes are available.

This ensures that, in a heartbeat disconnection or split-brain scenario, you do not end up with two separate clusters starting.
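The node count that GAB waits for is normally set with the -n option of the gabconfig line in /etc/gabtab (on a two-node cluster this is typically `/sbin/gabconfig -c -n2`). As a small sketch, the helper below pulls that number out of a gabtab line. The name gabtab_seed_count is my own and not a VCS command, and the sample line is an assumed typical entry, not copied from the cluster in this post.

```shell
#!/bin/sh
# Hypothetical helper (not part of VCS): reads gabtab content on stdin and
# prints the number given to gabconfig's -n option, i.e. how many nodes
# GAB waits for before it seeds the cluster.
gabtab_seed_count() {
    sed -n 's/.*gabconfig.*-n *\([0-9][0-9]*\).*/\1/p'
}

# Assumed typical gabtab line for a two-node cluster.
printf '/sbin/gabconfig -c -n2\n' | gabtab_seed_count   # prints 2
```

On a real node you could run it as `gabtab_seed_count < /etc/gabtab`. If it prints 2 and only one node is up, GAB will not seed on its own.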

If this is a known condition, you can seed the cluster manually with the command:

# gabconfig -c -x

This overrides the number of nodes needed to seed the cluster, but run it only if you are certain that the other node does not already have a running cluster. You should also diagnose why the other node's heartbeat links are not visible to LLT.
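To check whether GAB is seeded, before or after the manual seed, you can look for a port a membership line in the gabconfig -a output. The sketch below assumes the usual line format `Port a gen <id> membership <nodes>`. The function name gab_seed_state is hypothetical, and the generation id in the second sample is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of VCS): reads `gabconfig -a` output on stdin
# and reports whether GAB port a has a membership line.
gab_seed_state() {
    if grep -q '^Port a .*membership'; then
        echo seeded
    else
        echo unseeded
    fi
}

# Empty membership table, as in the output shown in this post.
printf 'GAB Port Memberships\n' | gab_seed_state

# Illustrative output after seeding (generation id is invented).
printf 'GAB Port Memberships\nPort a gen a36e0003 membership 0\n' | gab_seed_state
```

On a real node the call would be `gabconfig -a | gab_seed_state`. Once port a shows a membership, hastart should be able to bring up the VCS engine (port h).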

