Wednesday, May 16, 2012

VMware Fault Tolerance FAQ












Purpose

This article provides information for users of VMware Fault Tolerance (FT). The article contains Frequently Asked Questions that can help to resolve Fault Tolerance related issues.

Resolution

What is VMware Fault Tolerance?
VMware Fault Tolerance is a feature that allows a new level of guest redundancy. Information regarding this feature can be found in the vSphere Availability Guide for your version of ESX.
How do I turn it on?
The feature is enabled on a per virtual machine basis. Instructions for enabling Fault Tolerance can be found in the Turning on Fault Tolerance for Virtual Machines section of the vSphere Availability Guide for your version of ESX.
What happens when I turn on Fault Tolerance?
In very general terms, a second virtual machine is created to work in tandem with the virtual machine you have enabled Fault Tolerance on. This virtual machine resides on a different host in the cluster, and runs in virtual lockstep with the primary virtual machine. When a failure is detected, the second virtual machine takes the place of the first one with the least possible interruption of service. More specific information about how this is achieved can be found in the Protecting Mission-Critical Workloads with VMware Fault Tolerance whitepaper.
Why can't I turn Fault Tolerance on?
VMware Fault Tolerance can be enabled on any virtual machine that resides in a cluster that meets the necessary requirements. If you have difficulty enabling Fault Tolerance for a specific virtual machine, see The Turn on Fault Tolerance option is disabled (1010631).
How do I turn Fault Tolerance off?
Instructions for disabling Fault Tolerance can be found in Disabling or Turning Off VMware FT (1008026).
How do I tell if my environment is ready for Fault Tolerance?
The VMware SiteSurvey Tool is used to check your environment for compliance with VMware Fault Tolerance. It can be downloaded at http://www.vmware.com/download/shared_utilities.html.
Where do I find the product's website?
VMware has a website for the Fault Tolerance product at http://www.vmware.com/products/fault-tolerance/.
What happens during a failure?
When the host running the primary virtual machine fails, a transparent failover occurs to the corresponding secondary virtual machine. During this failover, there is no data loss or noticeable service interruption. In addition, VMware HA automatically restores redundancy by starting a new secondary virtual machine on another host. Similarly, if the host running the secondary virtual machine fails, VMware HA starts a new secondary virtual machine on a different host. In either case, the end user experiences no noticeable outage.
What is the logging time delay between the Primary and Secondary Fault Tolerance virtual machines?
The actual delay depends on the network latency between the Primary and Secondary. vLockstep executes the same instructions on both, but because they run on different hosts, the Secondary can lag slightly behind the Primary, with no loss of state. The lag is typically less than 1 ms. Fault Tolerance includes a synchronization mechanism that keeps the Primary and Secondary in step.
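The idea that the Secondary can trail the Primary slightly without losing state can be illustrated with a toy model. This is only a sketch of deterministic replay in general, not VMware's vLockstep implementation; the Replica class, the run_lockstep function, and the lag parameter are all hypothetical names introduced for illustration.

```python
from collections import deque


class Replica:
    """Toy deterministic machine: its state depends only on the inputs applied."""

    def __init__(self):
        self.state = 0
        self.applied = 0  # number of log entries applied so far

    def apply(self, value):
        # Any deterministic update works; this one is arbitrary.
        self.state = (self.state * 31 + value) % 1_000_003
        self.applied += 1


def run_lockstep(inputs, lag):
    """Primary applies each input at once; secondary trails by up to `lag` entries."""
    primary, secondary = Replica(), Replica()
    in_flight = deque()  # entries logged by the primary, not yet replayed
    max_gap = 0
    for value in inputs:
        primary.apply(value)
        in_flight.append(value)
        while len(in_flight) > lag:
            secondary.apply(in_flight.popleft())
        max_gap = max(max_gap, primary.applied - secondary.applied)
    # Drain the remaining entries, as a secondary would before taking over.
    while in_flight:
        secondary.apply(in_flight.popleft())
    return primary, secondary, max_gap


primary, secondary, max_gap = run_lockstep([5, 17, 3, 42, 8], lag=2)
print(max_gap, primary.state == secondary.state)
```

Because both replicas apply the same inputs in the same order, the secondary ends in the same state as the primary once it has consumed the outstanding log entries, even though it was briefly behind.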
In a cluster with more than 3 hosts, can you tell Fault Tolerance where to put the Fault Tolerance virtual machine, or does it choose on its own?
You can place the original (Primary) virtual machine. You have full control, using DRS or VMotion, to assign it to any node. The placement of the Secondary, when it is created, is automatic and based on the available hosts. After the Secondary has been created and placed, you can VMotion it to your preferred host.
What happens if the host containing the primary virtual machine comes back online (after a node failure)?
This node is put back in the pool of available hosts. There is no attempt to start or migrate the primary to that host.
Is the failover from the primary virtual machine to the secondary virtual machine dynamic or does Fault Tolerance restart a virtual machine?
The failover from primary to secondary virtual machine is dynamic, with the secondary continuing execution from the exact point where the primary left off. It happens automatically with no data loss, no downtime, and little delay. Clients see no interruption. After the dynamic failover, the secondary virtual machine becomes the new primary virtual machine, and a new secondary virtual machine is spawned automatically.
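The failover behavior described in the answers above (promotion of the secondary, a new secondary on another host, and a recovered host simply rejoining the pool) can be sketched as a small state model. This is a simplified illustration under stated assumptions, not how VMware HA or FT is implemented; FtPair, its fields, and the host names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FtPair:
    """Hypothetical model of one FT-protected VM and the healthy hosts in its cluster."""
    primary_host: str
    secondary_host: str
    available_hosts: set = field(default_factory=set)
    events: list = field(default_factory=list)

    def _spawn_secondary(self):
        # Pick any healthy host other than the primary's (sorted for determinism).
        candidates = sorted(self.available_hosts - {self.primary_host})
        if not candidates:
            self.secondary_host = None
            self.events.append("no host available for a new secondary")
            return
        self.secondary_host = candidates[0]
        self.events.append(f"new secondary started on {self.secondary_host}")

    def host_failed(self, host):
        self.available_hosts.discard(host)
        if host == self.primary_host:
            if self.secondary_host is None:
                self.primary_host = None
                self.events.append("primary lost with no secondary to take over")
                return
            # The secondary takes over and becomes the new primary.
            self.primary_host = self.secondary_host
            self.events.append(f"failover: {self.primary_host} is now primary")
            self._spawn_secondary()
        elif host == self.secondary_host:
            self._spawn_secondary()

    def host_recovered(self, host):
        # The host rejoins the pool; nothing is migrated back to it.
        self.available_hosts.add(host)
        self.events.append(f"{host} back in pool")


pair = FtPair("esx01", "esx02", {"esx01", "esx02", "esx03"})
pair.host_failed("esx01")
pair.host_recovered("esx01")
print(pair.primary_host, pair.secondary_host)
```

In this run the primary moves to esx02, a new secondary starts on esx03, and esx01 returns to the pool without the primary being moved back to it.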
Where are Fault Tolerance failover events logged?
All failover events are logged by vCenter.
I encountered an error message that I can't find in the knowledge base.  Where else should I check?
The vSphere Availability Guide contains a list of known errors in the Fault Tolerance Error Messages section.
Does Fault Tolerance support Intel Hyper-Threading Technology?
Yes. Fault Tolerance supports Intel Hyper-Threading Technology on systems that have it enabled. Enabling or disabling Hyper-Threading has no impact on Fault Tolerance.
What happens if vCenter Server is offline when a failover event occurs?
Once Fault Tolerance is configured for a virtual machine, vCenter Server need not be online for FT to work. Even if vCenter Server is offline, failover will still occur from the primary to the secondary virtual machine. Additionally, the spawning of a new secondary virtual machine will also occur without vCenter Server.


Fault Tolerance Checklist 

[ ] Required - ESX/ESXi Hardware: Ensure that the processors are supported: AMD Barcelona+, Intel Penryn+ (run the CPU compatibility tool to determine compatibility).

[ ] Required - ESX/ESXi Hardware: Ensure that HV (Hardware Virtualization) is enabled in the BIOS.

[ ] Optional - ESX/ESXi Hardware: Ensure that power management (also known as power-capping) is turned OFF in the BIOS (performance implications).

[ ] Optional - ESX/ESXi Hardware: Ensure that hyper-threading is turned OFF in the BIOS (performance implications).

[ ] Required - Storage: Ensure that FT protected virtual machines are on shared storage (FC, iSCSI or NFS). When using NFS, increase timeouts and have a dedicated NIC for NFS traffic.

[ ] Required - Storage: Ensure that the datastore is not using physical RDM (Raw Device Mapping). Virtual RDM is supported.

[ ] Required - Storage: Ensure that there is no requirement to use Storage VMotion for VMware FT VMs, since Storage VMotion is not supported for VMware FT VMs.

[ ] Required - Storage: Ensure that NPIV (N-Port ID Virtualization) is not used, since NPIV is not supported with VMware FT.

[ ] Optional - Storage: Ensure that virtual disks on VMFS3 are thick-eager zeroed (thin or sparsely allocated disks are converted to thick-eager zeroed when VMware FT is enabled, which requires additional storage space).

[ ] Optional - Storage: Ensure that ISOs used by the VMware FT protected VMs are on shared storage accessible to both primary and secondary VMs (otherwise the secondary reports errors as if no media were present, which might be acceptable).

[ ] Optional - Network: Ensure that at least two NICs are used (NIC teaming) for ESX management/VMotion and VMware FT logging. VMware recommends four VMkernel NICs: two dedicated for VMware VMotion and two dedicated for VMware FT.

[ ] Required - Network: Ensure that at least gigabit NICs are used (10 Gbit NICs can also be used, and jumbo frames can be enabled, for better performance).

[ ] Optional - Redundancy: Ensure that the environment does not have a single point of failure (for example, use NIC teaming, multiple network switches, and storage multipathing).

[ ] Required - vCenter Server: Ensure that the primary and secondary ESX hosts and virtual machines are in an HA-enabled cluster.

[ ] Required - vCenter Server: Ensure that there is no requirement to use DRS for VMware FT protected virtual machines; in this release VMware FT cannot be used with VMware DRS (although manual VMotion is allowed).

[ ] Required - vCenter Server: Ensure that host certificate checking is enabled (enabled by default) before you add the ESX/ESXi host to vCenter Server.

[ ] Required - ESX/ESXi: Ensure that the primary and secondary ESX/ESXi hosts are running the same build of VMware ESX/ESXi.

[ ] Required - Virtual Machines: Ensure that the virtual machines are NOT using more than 1 vCPU (SMP is not supported).

[ ] Required - Virtual Machines: Ensure that there is no user requirement to use NPT/EPT (Nested Page Tables/Extended Page Tables), since VMware FT disables NPT/EPT on the ESX host.

[ ] Required - Virtual Machines: Ensure that there is no user requirement to hot add or remove devices, since hot plugging devices cannot be done with VMware FT.

[ ] Required - Virtual Machines: Ensure that there is no user requirement to use USB (USB must be disabled) or sound devices (must not be configured), since these are not supported for Record/Replay (and VMware FT).

[ ] Required - Virtual Machines: Ensure that there is no user requirement to have virtual machine snapshots, since these are not supported for VMware FT. Delete snapshots from existing virtual machines before protecting with VMware FT.

[ ] Required - Virtual Machines: Ensure that virtual machine hardware is upgraded to v7.

[ ] Optional - Virtual Machines: Ensure that there will be no more than four (to eight) VMware FT enabled virtual machine primaries or secondaries on any single ESX/ESXi host (suggested general guideline based on ESX/ESXi host and VM size and workloads, which can vary).

[ ] Required - Guest OS: Ensure that the virtual machines do not use a paravirtualized guest OS.

[ ] Required - 3rd Party: Ensure MSCS clustered virtual machines will have MSCS clustering removed prior to protecting with VMware FT (and make sure that the virtual machines are not SMP).
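Several of the virtual machine items in the checklist above can be expressed as simple programmatic checks. The following is a minimal sketch that assumes you have already gathered the relevant facts about each virtual machine yourself; VmConfig, its field names, ft_blockers, and hosts_over_guideline are hypothetical and do not correspond to any VMware API or to the SiteSurvey tool.

```python
from dataclasses import dataclass


@dataclass
class VmConfig:
    """Hypothetical description of a VM; field names are illustrative only."""
    name: str
    vcpus: int = 1
    hardware_version: int = 7
    snapshot_count: int = 0
    on_shared_storage: bool = True
    uses_physical_rdm: bool = False
    uses_npiv: bool = False
    has_usb_or_sound: bool = False
    paravirtualized_guest: bool = False
    in_mscs_cluster: bool = False


def ft_blockers(vm):
    """Return the 'Required' checklist items (from the list above) that this VM fails."""
    checks = [
        (vm.vcpus > 1, "more than 1 vCPU (SMP is not supported)"),
        (vm.hardware_version < 7, "virtual hardware not upgraded to v7"),
        (vm.snapshot_count > 0, "snapshots present (delete them first)"),
        (not vm.on_shared_storage, "not on shared storage (FC, iSCSI or NFS)"),
        (vm.uses_physical_rdm, "physical RDM in use"),
        (vm.uses_npiv, "NPIV in use"),
        (vm.has_usb_or_sound, "USB or sound device configured"),
        (vm.paravirtualized_guest, "paravirtualized guest OS"),
        (vm.in_mscs_cluster, "MSCS clustering not removed"),
    ]
    return [reason for failed, reason in checks if failed]


def hosts_over_guideline(ft_vms_per_host, limit=4):
    """Hosts running more FT primaries/secondaries than the suggested guideline."""
    return sorted(h for h, n in ft_vms_per_host.items() if n > limit)


vm = VmConfig("db01", vcpus=2, snapshot_count=1)
for reason in ft_blockers(vm):
    print(f"{vm.name}: {reason}")
print(hosts_over_guideline({"esx01": 5, "esx02": 4}))
```

The default limit of 4 reflects the lower end of the "four (to eight)" guideline; raise it if your hosts and workloads allow. Host-level items such as CPU support, BIOS settings, and matching ESX/ESXi builds are not covered by this sketch.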










