Monday, 17 March 2014

VCS Interview Questions and Answers

1) What is split brain and amnesia prevention in cluster ?

2) Suppose one of the high-priority heartbeat links between nodes is lost. What is the resulting condition of the cluster called ? What action will VCS take in such a scenario ?

When one of the high-priority heartbeat links between nodes is lost and only one heartbeat link remains, VCS places the node in a special membership category known as jeopardy membership.
In this state the service group state does not change, i.e. offline or online service groups stay as they are. If the last link is then lost as well, VCS autodisables the service groups that were running on that node and does not fail them over, which prevents data corruption.

3) During patching, if we want to stop a service group from failing over, what action should we take ?
It is best practice to freeze a service group during server patching. When you freeze a group, VCS takes no action on that group or its resources, and it will not try to bring the service group online on any other node. After the maintenance is over, unfreeze the group; you can then bring the resources online, and VCS will refresh its view at that time.
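
The freeze step described above can be sketched as a small Python helper that only builds the command lines and does not run them. This is a minimal sketch, not a definitive procedure: the group name `websg` is hypothetical, and the sketch assumes a persistent freeze is wanted, which needs the configuration opened read-write first and saved afterwards. Check the exact options against the documentation for your VCS release.

```python
def freeze_commands(group, persistent=True):
    """Return the commands to freeze a service group (nothing is executed)."""
    if persistent:
        return [
            ["haconf", "-makerw"],                       # open the config read-write
            ["hagrp", "-freeze", group, "-persistent"],  # freeze survives a restart
            ["haconf", "-dump", "-makero"],              # save and close the config
        ]
    return [["hagrp", "-freeze", group]]                 # temporary freeze


def unfreeze_commands(group, persistent=True):
    """Return the commands to unfreeze a service group (nothing is executed)."""
    if persistent:
        return [
            ["haconf", "-makerw"],
            ["hagrp", "-unfreeze", group, "-persistent"],
            ["haconf", "-dump", "-makero"],
        ]
    return [["hagrp", "-unfreeze", group]]


if __name__ == "__main__":
    # "websg" is a made-up group name used only for illustration.
    for cmd in freeze_commands("websg") + unfreeze_commands("websg"):
        print(" ".join(cmd))
```

Printing the commands instead of running them keeps the sketch safe to try on a machine without VCS installed.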

4) How do you check logs of a servicegroup and a resource in VCS ? How will you troubleshoot if a resource has faulted ?
The default VCS log directory is /var/VRTSvcs/log
The main event log of VCS is /var/VRTSvcs/log/engine_A.log. This file is the best place to begin troubleshooting for a failed resource.
Individual agent types have their own log files, e.g. Mount_A.log, Apache_A.log or Weblogic_A.log. These log files contain more detailed information than engine_A.log.

Searching these logs for the word 'clean' can provide clues about the cause of a failure, since VCS calls an agent's clean entry point when a resource faults.
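
A search like the one described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the function names are made up for illustration, the default path is the engine log named earlier, and the code matches the keyword as plain text without assuming anything about the log line format.

```python
def find_keyword_lines(lines, keyword="clean"):
    """Return (line_number, text) pairs for lines containing the keyword.

    The match is case-insensitive and treats each line as plain text.
    """
    needle = keyword.lower()
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if needle in line.lower()
    ]


def scan_log(path="/var/VRTSvcs/log/engine_A.log", keyword="clean"):
    """Scan one log file; undecodable bytes are replaced, not fatal."""
    with open(path, errors="replace") as handle:
        return find_keyword_lines(handle, keyword)


if __name__ == "__main__":
    # Placeholder lines for illustration only, not real VCS log output.
    sample = [
        "placeholder: monitor reported online\n",
        "placeholder: Clean invoked for a resource\n",
    ]
    for number, text in find_keyword_lines(sample):
        print(number, text)
```

The same helper can be pointed at an agent log such as Mount_A.log by passing a different path to `scan_log`.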

5) What is the main purpose of the LLT and HAD daemons ?
LLT is the transport mechanism of VCS and is responsible for load balancing cluster communications and maintaining the heartbeat. HAD is the main VCS daemon and is responsible for taking operator input and performing the relevant actions. HAD also takes all corrective actions required.

6) What is the difference between LLT and GAB ?
LLT is a layer 2 protocol developed by Veritas. It takes care of the heartbeat connections and acts as the carrier for cluster traffic.
GAB distributes the cluster configuration among nodes. GAB uses LLT as its transport mechanism for distributing cluster configuration changes.
HAD communicates with GAB and maintains and tracks the entire cluster configuration. It uses the main.cf file to build the cluster configuration. HAD also takes all corrective actions required.

7) What is the difference between a high-priority and a low-priority link in VCS ?
A high-priority link is used for transmitting cluster communication and configuration information between nodes and to GAB, as well as for heartbeat communication.
A low-priority link is used only for heartbeat in the normal scenario, but if the high-priority links fail it can take over the task of transmitting cluster communication as well.

8) What are the components in a VCS I/O fencing setup ?
The following components are required for I/O fencing in VCS:
i)    Coordinator diskgroup with 3 disks
ii)    Data diskgroup
iii)    Dynamic multipathing software (VxDMP)

9) What is a jeopardy condition in VCS ? What happens to the service groups and resources running on a system which is under jeopardy condition ?
A jeopardy membership condition occurs when a node in a cluster has only one heartbeat connection remaining with the rest of the cluster. At this point, if the last heartbeat interconnect also fails, VCS cannot reliably distinguish between a node failure and a network failure. Hence, under jeopardy condition, VCS prevents the service groups from failing over on node loss. The applications and service groups running on the node keep running as usual and will not be failed over in case of a node failure. In case of a resource or group fault, however, the service group still fails over to available systems in the cluster. This is a safety mechanism to prevent data corruption.
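
The rules in the answer above can be written down as a small decision function. This is only an illustrative sketch of the behaviour as described here, not VCS code: the function names, the event names and the idea of passing in a link count are all assumptions made for the example.

```python
def membership_state(active_links):
    """Classify membership from the number of working heartbeat links."""
    if active_links >= 2:
        return "regular"
    if active_links == 1:
        return "jeopardy"
    return "no heartbeat"


def failover_allowed(active_links, event):
    """Decide whether a service group fails over for a given event.

    active_links is the link count just before the event.
    event is one of 'resource_fault', 'group_fault' or 'node_lost'
    (hypothetical labels used only in this sketch).
    """
    if event in ("resource_fault", "group_fault"):
        # A fault inside the group still triggers failover, even in jeopardy.
        return True
    if event == "node_lost":
        # With a single link left, node loss and link loss look the same,
        # so failover is allowed only under regular membership.
        return membership_state(active_links) == "regular"
    raise ValueError(f"unknown event: {event}")


if __name__ == "__main__":
    for links in (2, 1):
        for event in ("resource_fault", "node_lost"):
            print(links, membership_state(links), event,
                  failover_allowed(links, event))
```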

10) During a RACE condition for membership arbitration after a node or link failure, how will VCS determine the eligible host for acquiring the lock on the co-ordinator disks ? Which sub-cluster will win the RACE and based on what logic ?
During a RACE condition, the partitioned nodes form sub-clusters and each sub-cluster tries to acquire the co-ordinator disks. Among the nodes in a sub-cluster, the node with the lowest LLT ID runs the RACE on behalf of itself and the other nodes in its sub-cluster. If it is successful, it ejects the keys of the other systems (i.e. nodes which are not part of the newly formed sub-cluster) from the co-ordinator disks and sends a WON_RACE communication to the nodes in its sub-cluster. Nodes in a sub-cluster that fails to acquire the disks will panic.
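
The race described above can be modelled with a short Python simulation. This is a toy sketch and not how the fencing driver is implemented. It assumes that a racer wins by holding a majority of the co-ordinator disks, and that if no racer reaches a majority nobody survives; both points are assumptions of the sketch, as are the function name and the `disk_winners` input.

```python
def resolve_race(subclusters, disk_winners):
    """Simulate the co-ordinator disk race.

    subclusters:  list of sets of LLT node IDs, one set per sub-cluster.
    disk_winners: for each co-ordinator disk, the LLT ID of the racer
                  that reached it first (hypothetical input for the sketch).
    Returns (winning_racer, surviving_nodes, panicked_nodes).
    """
    all_nodes = set().union(*subclusters)
    # The node with the lowest LLT ID races on behalf of its sub-cluster.
    racers = {min(members): set(members) for members in subclusters}
    majority = len(disk_winners) // 2 + 1  # assumption: majority wins

    for racer, members in racers.items():
        if disk_winners.count(racer) >= majority:
            # The winner ejects the other nodes' keys; the losers panic.
            return racer, members, all_nodes - members
    # Assumption: without a majority holder, no sub-cluster survives.
    return None, set(), all_nodes


if __name__ == "__main__":
    # Four nodes split into two sub-clusters, three co-ordinator disks.
    winner, survivors, panicked = resolve_race([{0, 1}, {2, 3}], [0, 0, 2])
    print(winner, sorted(survivors), sorted(panicked))
```

Here node 0 races for the sub-cluster {0, 1} and node 2 for {2, 3}; node 0 reaches two of the three disks first, so its sub-cluster survives in this model.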
