Wednesday, May 30, 2012

VMware Fault Tolerance

  • VMware Fault Tolerance: What it is and how it works
  • New SiteSurvey utility from VMware checks for Fault Tolerance compatibility
  • More details on VMware’s Fault Tolerance feature

I. And VMware said, ‘Let there be Fault Tolerance’
Fault Tolerance was introduced as a new feature in vSphere, providing something that was missing in VMware Infrastructure 3 (VI3): continuous availability for a virtual machine in case of a host failure. High Availability (HA) was introduced in VI3 to protect against host failures, but it caused the VM to go down for a short period of time while it was restarted on another host. FT takes that to the next level and keeps the VM operational during a host failure by running a secondary copy of it on another host server. If a host fails, the secondary VM becomes the primary VM and a new secondary is created on another functional host.
The primary VM and secondary VM stay in sync with each other by using a technology called Record/Replay that was first introduced with VMware Workstation. Record/Replay works by recording the computer execution on a VM and saving it as a log file. It can then take that recorded information and replay it on another VM to have a replica copy that is a duplicate of the original VM.


II. Power to the processors
The technology behind the Record/Replay functionality is built into certain models of Intel and AMD processors. VMware calls it vLockstep. This technology required Intel and AMD to make changes to both the performance counter architecture and the virtualization hardware assists (Intel VT and AMD-V) inside the physical processors. Because of this, only newer processors support the FT feature. This includes the third-generation AMD Opteron based on the AMD Barcelona, Budapest and Shanghai processor families, and Intel Xeon processors based on the Penryn and Nehalem micro-architectures and their successors. VMware has published a knowledgebase article that provides more details on this.


III. But how does it do that?
FT works by creating a secondary VM on another ESX host that shares the same virtual disk file as the primary VM, and then transferring the CPU and virtual device inputs from the primary VM (record) to the secondary VM (replay) via an FT logging network interface card (NIC) so it is in sync with the primary VM and ready to take over in case of a failure. While both the primary and secondary VMs receive the same inputs, only the primary VM produces output such as disk writes and network transmits. The secondary VM’s output is suppressed by the hypervisor and is not on the network until it becomes a primary VM, so essentially both VMs function as a single VM.
It’s important to note that not everything that happens on the primary VM is copied to the secondary VM. Certain actions and instructions are not relevant to the secondary VM, and recording everything would take up a huge amount of disk space and processing power. Instead, only non-deterministic events are recorded, which include inputs to the VM (disk reads, received network traffic, keystrokes, mouse clicks, etc.) and certain CPU events (RDTSC, interrupts, etc.). Inputs are then fed to the secondary VM at the same execution point so it is in exactly the same state as the primary VM.
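As a rough illustration of why logging only the non-deterministic inputs is enough, here is a minimal Python sketch. The ToyVM class, its arithmetic and the record/replay helpers are invented for illustration and are not VMware's implementation; the only idea taken from the text is that feeding the same inputs at the same execution points reproduces the same state.

```python
from dataclasses import dataclass

MODULUS = 1_000_003  # arbitrary constant for the toy state update


@dataclass
class ToyVM:
    """Hypothetical stand-in for a VM whose state changes deterministically."""
    state: int = 0
    steps: int = 0

    def step(self, external_input=None):
        # Deterministic work: identical on primary and secondary, never logged.
        self.state = (self.state * 31 + 7) % MODULUS
        # A non-deterministic input (e.g. a received packet) perturbs the state.
        if external_input is not None:
            self.state = (self.state + external_input) % MODULUS
        self.steps += 1


def record(vm, inputs_by_step, total_steps):
    """Run the primary and log only (execution point, input) pairs."""
    log = []
    for point in range(total_steps):
        value = inputs_by_step.get(point)
        if value is not None:
            log.append((point, value))
        vm.step(value)
    return log


def replay(vm, log, total_steps):
    """Run the secondary, injecting each logged input at the same point."""
    pending = dict(log)
    for point in range(total_steps):
        vm.step(pending.get(point))


primary, secondary = ToyVM(), ToyVM()
log = record(primary, {3: 42, 10: 7}, total_steps=1000)
replay(secondary, log, total_steps=1000)
print(len(log), primary.state == secondary.state)
```

In this toy run only two of the 1000 steps produce log entries, yet the secondary ends in the same state as the primary.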
The information from the primary VM is copied to the secondary VM using a special logging network that is configured on each host server. A dedicated gigabit NIC for the FT Logging traffic is highly recommended, although it is not a hard requirement; you could use a shared NIC for FT Logging in small or test/dev environments and for testing the feature. The traffic sent over the FT Logging network between the hosts can be very intensive, depending on the operation of the VM.
VMware has a formula that you can use to determine this:
VMware FT logging bandwidth ~= (Avg disk reads (MB/s) x 8 + Avg network input (Mbps)) x 1.2 [20% headroom]
To get the VM statistics needed for this formula, use the performance metrics supplied in the vSphere Client. The 20% headroom allows for CPU events that also need to be transmitted and are not included in the formula. Note that disk and network writes are not used by FT, as these do not factor into the state of the virtual machine.
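The formula above is easy to turn into a small helper. The following Python sketch is only an illustration; the function and parameter names are made up here, and the sample figures (10 MB/s of disk reads, 20 Mbps of network input) are arbitrary.

```python
def ft_logging_bandwidth_mbps(avg_disk_reads_mbytes_per_s,
                              avg_network_input_mbps,
                              headroom=0.20):
    """Estimate FT logging bandwidth in Mbps using the formula quoted above."""
    if avg_disk_reads_mbytes_per_s < 0 or avg_network_input_mbps < 0 or headroom < 0:
        raise ValueError("inputs must be non-negative")
    # Disk reads are measured in megabytes per second; x 8 converts to megabits.
    disk_read_mbps = avg_disk_reads_mbytes_per_s * 8
    # The headroom covers CPU events that the formula does not count.
    return (disk_read_mbps + avg_network_input_mbps) * (1 + headroom)


estimate = ft_logging_bandwidth_mbps(10, 20)
print(f"{estimate:.0f} Mbps")  # (10 x 8 + 20) x 1.2 = 120 Mbps
```

With these sample numbers the disk reads account for 80 of the 100 Mbps before headroom, which matches the observation that disk reads usually dominate.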

As you can see, disk reads will typically take up the most bandwidth. If you have a VM that does a lot of disk reading, you can reduce the amount of disk read traffic across the FT Logging network by using a special VM parameter. Adding a replay.logReadData = checksum parameter to the VMX file of the VM causes the secondary VM to read data directly from the shared disk instead of having it transmitted over the FT logging network. For more information on this see this knowledgebase article.

IV. Every rose has its thorn
While Fault Tolerance is a useful technology, it does have many requirements and limitations that you should be aware of. Perhaps the biggest is that it currently only supports single vCPU VMs, which is unfortunate as many big enterprise applications that would benefit from FT usually need multiple vCPUs (vSMP). Don’t let this discourage you from running FT, however, as you may find that some applications will run just fine with one vCPU on some of the newer, faster processors that are available as detailed here. Also, VMware has mentioned that support for vSMP will come in a future release. Keeping even a single vCPU in lockstep between hosts is no easy task, and VMware developers need more time to develop methods for keeping multiple vCPUs in lockstep between hosts. Additional requirements for VMs and hosts are as follows:
Host requirements:
  • CPUs: Only recent HV-compatible processors (AMD Barcelona+, Intel Harpertown+), processors must be the same family
  • All hosts must be running the same build of VMware ESX
  • Storage: shared storage (FC, iSCSI, or NAS)
  • Hosts must be in an HA-enabled cluster
  • Network and storage redundancy to improve reliability: NIC teaming, storage multipathing
  • Separate VMotion NIC and FT logging NIC, each Gigabit Ethernet (10 Gb recommended). Hence, a minimum of 4 NICs (VMotion, FT Logging, two for VM traffic/Service Console)
  • CPU clock speeds between the two ESX hosts must be within 400 MHz of each other.
VM requirements:
  • VMs must be single-processor (no vSMP)
  • All VM disks must be “thick” (fully-allocated) and not thin; if a VM has a thin disk it will be converted to thick when FT is enabled.
  • No non-replayable devices (USB, sound, physical CD-ROM, physical floppy, physical Raw Device Mappings)
  • Make sure paravirtualization is not enabled by default (Ubuntu Linux 7/8 and SUSE Linux 10)
  • Most guest operating systems are supported with the following exceptions that apply only to hosts with third generation AMD Opteron processors (i.e. Barcelona, Budapest, Shanghai): Windows XP (32-bit), Windows 2000, Solaris 10 (32-bit). See this KB article for more.
In addition to these requirements your hosts must also be licensed to use the FT feature, which is only included in the Advanced, Enterprise and Enterprise Plus editions of vSphere.
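The host-pairing rules listed above (same processor family, same ESX build, clock speeds within 400 MHz) can be sketched as a simple check. This Python example is only an illustration: the Host record, its field names and the sample values are hypothetical and do not come from any VMware API.

```python
from dataclasses import dataclass

MAX_CLOCK_DELTA_MHZ = 400  # from the host requirements above


@dataclass(frozen=True)
class Host:
    """Hypothetical inventory record; field names are illustrative."""
    name: str
    cpu_family: str
    clock_mhz: int
    build: str


def ft_pair_problems(a, b):
    """Return the reasons two hosts cannot run an FT primary/secondary pair."""
    problems = []
    if a.cpu_family != b.cpu_family:
        problems.append("processor families differ")
    if a.build != b.build:
        problems.append("ESX builds differ")
    if abs(a.clock_mhz - b.clock_mhz) > MAX_CLOCK_DELTA_MHZ:
        problems.append("clock speeds differ by more than 400 MHz")
    return problems


esx01 = Host("esx01", "Nehalem", 2930, "build-A")
esx02 = Host("esx02", "Nehalem", 2400, "build-A")
print(ft_pair_problems(esx01, esx02))
```

An empty list means the pair passes these three checks only; the storage, networking and licensing requirements still have to be verified separately.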

V. How to use Fault Tolerance in your environment
Now that you know what FT does, you’ll need to decide how you will use it in your environment. Because of high overhead and limitations of FT you will want to use it sparingly. FT could be used in some cases to replace existing Microsoft Cluster Server (MSCS) implementations, but it’s important to note what FT does not do, which is to protect against application failure on a VM. It only protects against a host failure.
If protection for application failure is something you need, then a solution like MSCS would be better for you. FT is only meant to keep a VM running if there is a problem with the underlying host hardware. If protecting against an operating system failure is something you need, then VMware High Availability (HA) is what you want, as it can detect unresponsive VMs and restart them on the same host server.
FT and HA can be used together to provide maximum protection. If both the primary host and secondary host failed at the same time, HA would restart the VM on another operable host and spawn a new secondary VM.

VI. Important notes
One important thing to note: If you experience an OS failure on the primary VM, like a Windows Blue Screen Of Death (BSOD), the secondary VM will also experience the failure as it is an identical copy of the primary. However, the HA virtual machine monitor will detect this, restart the primary VM, and then spawn a new secondary VM.
Another important note: FT does not protect against a storage failure. Since the VMs on both hosts use the same storage and virtual disk file, it is a single point of failure. Therefore it’s important to have as much redundancy as possible to prevent this, such as dual storage adapters in your host servers attached to separate switches (known as multipathing). If a path to the SAN fails on one host, FT will detect this and switch over to the secondary VM, but this is not a desirable situation. Furthermore, if there were a complete SAN failure or a problem with the VM’s LUN, the FT feature would not protect against it.

VII. So should you actually use FT? Enter SiteSurvey
Now that you’ve read all this, you might be wondering if you meet the many requirements to use FT in your own environment. VMware provides a utility called SiteSurvey that will look at your infrastructure and see if it is capable of running FT. It is available as either a Windows or Linux download, and once you install and run it, you will be prompted to connect to a vCenter Server. Once it connects to the vCenter Server you can choose from your available clusters to generate a SiteSurvey report that shows whether or not your hosts support FT and if the hosts and VMs meet the individual prerequisites to use the feature.
You can also click on links in the report that will give you detailed information about all the prerequisites along with compatible CPU charts. These links go to VMware’s website and display the help document for the SiteSurvey utility, which is full of great information, including some of the following prerequisites for FT.
  • The vLockstep technology used by FT requires the physical processor extensions added to the latest processors from Intel and AMD. In order to run FT, a host must have an FT-capable processor, and both hosts running an FT VM pair must be in the same processor family.
  • When ESX hosts are used together in an FT cluster, their processor speeds must be matched fairly closely to ensure that the hosts can stay in sync. VMware SiteSurvey will flag any CPU speeds that are different by more than 400 MHz.
  • ESX hosts running the FT VM pair must be running at least ESX 4.0, and must be running the same build number of ESX.
  • FT requires each member of the FT cluster to have a minimum of two NICs with speeds of at least 1 Gb per second. Each NIC must also be on the same network.
  • FT requires each member of the FT cluster to have two virtual NICs, one for logging and one for VMotion. VMware SiteSurvey will flag ESX hosts which do not contain at least two virtual NICs.
  • ESX hosts used together as an FT cluster must share storage for the protected VMs. For this reason VMware SiteSurvey lists the shared storage it detects for each ESX host and flags hosts that do not have shared storage in common. In addition, an FT-protected VM must itself be stored on shared storage and any disks connected to it must be shared storage.
  • At this time, FT only supports single-processor virtual machines. VMware SiteSurvey flags virtual machines that are configured with more than one processor. To use FT with those VMs, you must reconfigure them as single-CPU VMs.
  • FT will not work with virtual disks backed with thin-provisioned storage or disks that do not have clustering features enabled. When you turn on FT, the conversion to the appropriate disk format is performed by default.
  • Snapshots must be removed before FT can be enabled on a virtual machine. In addition, it is not possible to take snapshots of virtual machines on which FT is enabled.
  • FT is not supported with virtual machines that have CD-ROM or floppy virtual devices backed by a physical or remote device. To use FT with a virtual machine with this issue, remove the CD-ROM or floppy virtual device or reconfigure the backing with an ISO installed on shared storage.
  • Physical RDM is not supported with FT. You may only use virtual RDMs.
  • Paravirtualized guests are not supported with FT. To use FT with a virtual machine with this issue, reconfigure the virtual machine without a VMI ROM.
  • N_Port ID Virtualization (NPIV) is not supported with FT. To use FT with a virtual machine with this issue, disable the NPIV configuration of the virtual machine.
Below is some sample output from the SiteSurvey utility showing host and VM compatibility with FT and what features and components are compatible or not:


Another method for checking to see if your hosts meet the FT requirements is to use the vCenter Server Profile Compliance tool. To use this method, select your cluster in the left pane of the vSphere Client, then in the right pane select the Profile Compliance tab. Click the Check Compliance Now link and it will begin checking your hosts for compliance including FT as shown below:


VIII. Are we there yet? Turning on Fault Tolerance
Once you meet the requirements, implementing FT is fairly simple. A prerequisite for enabling FT is that your cluster must have HA enabled. You simply select a VM in your cluster, right-click on it, select Fault Tolerance and then select “Turn On Fault Tolerance.”

A secondary VM will then be created on another host. Once it’s complete you will see a new Fault Tolerance section on the Summary tab of the VM that will display information including FT status, secondary VM location (host), CPU and memory in use by the secondary VM, the secondary VM lag time (how far behind the primary it is in seconds) and the bandwidth in use for FT logging.

Once you have enabled FT there are alarms available that you can use to check for specific conditions such as FT state, latency, secondary VM status and more.

IX. Fault Tolerance tips and tricks
Some additional tips and tidbits that will help you understand and implement FT are listed below.
  • Before you enable FT, be aware of one important limitation: VMware currently recommends that you do not use FT in a cluster that consists of a mix of ESX and ESXi hosts. The reason is that ESX hosts might become incompatible with ESXi hosts for FT purposes after they are patched, even when patched to the same level. This is a result of the patching process and will be resolved in a future release so that compatible ESX and ESXi versions are able to interoperate with FT even though patch numbers do not match exactly. Until this is resolved, take it into consideration if you plan on using FT, and adjust the clusters that will have FT-enabled VMs so they consist of only ESX or only ESXi hosts and not both.
  • VMware spent a lot of time working with Intel/AMD to refine their physical processors so VMware could implement its vLockstep technology, which replicates non-deterministic transactions between the processors by reproducing the CPU instructions on the other processor. All data is synchronized so there is no loss of data or transactions between the two systems. In the event of a hardware failure you may have an IP packet retransmitted, but there is no interruption in service or data loss as the secondary VM can always reproduce execution of the primary VM up to its last output.
  • FT does not use a specific CPU feature but requires specific CPU families to function. vLockstep is more of a software solution that relies on some of the underlying functionality of the processors. The software level records the CPU instructions at the VM level and relies on the processor to do so; it has to be very accurate in terms of timing and VMware needed the processors to be modified by Intel and AMD to ensure complete accuracy. The SiteSurvey utility simply looks for certain CPU models and families, but not specific CPU features, to determine if a CPU is compatible with FT. In the future, VMware may update its CPU ID utility to also report if a CPU is FT capable.
  • Currently there is a restriction that hosts must be running the same build of ESX/ESXi; this is a hard restriction and cannot be avoided. You can use FT between ESX and ESXi as long as they are the same build. Future releases may allow for hosts to have different builds.
  • VMotion is supported on FT-enabled VMs, but you cannot VMotion both VMs at the same time. Storage VMotion is not supported on FT-enabled VMs. FT is compatible with Distributed Resource Scheduler (DRS) but will not automatically move the FT-enabled VMs between hosts to ensure reliability. This may change in a future release of FT.
  • In a split-brain scenario (i.e. loss of network connectivity between hosts) the secondary VM may try to become the primary, resulting in two primary VMs running at the same time. This is prevented by using a lock on a special FT file; once a failure is detected both VMs will try to rename this file, and if the secondary succeeds it becomes the primary and spawns a new secondary. If the secondary fails because the primary is still running and already has the file locked, the secondary VM is killed and a new secondary is spawned on another host.
  • You can use FT on a vCenter Server running as a VM as long as it is running with a single vCPU.
  • There is no limit to the number of FT-enabled hosts in a cluster, but you cannot have FT-enabled VMs span clusters. A future release may support FT-enabled VMs spanning clusters.
  • There is an API for FT that provides the ability to script certain actions like disabling/enabling FT using PowerShell.
  • The four FT-enabled VM limit is per host, not per cluster, and is not a hard limit, but is recommended for optimal performance.
  • The current version of FT is designed to be used between hosts in the same data center, and is not designed to work over wide area network (WAN) links between data centers due to latency issues and failover complications between sites. Future versions may be engineered to allow for FT usage between external data centers.
  • Be aware that the secondary VM can slow down the primary VM if it is not getting enough CPU resources to keep up. This is noticeable by a lag time of several seconds or more. To resolve this try setting a CPU reservation on the primary VM which will also be applied to the secondary VM and will ensure they will run at the same CPU speed. If the secondary VM slows down to the point that it is severely impacting the performance of the primary VM, FT between the two will cease and a new secondary will be found on another host.
  • When FT is enabled any memory limits on the primary VM will be removed and a memory reservation will be set equal to the amount of RAM assigned to the VM. You will be unable to change memory limits, shares or reservations on the primary VM while FT is enabled.
  • Patching hosts can be tricky when using the FT feature because of the requirement that the hosts must have the same build level. There are two methods you can use to accomplish this. The simplest method is to temporarily disable FT on any VMs that are using it, update all the hosts in the cluster to the same build level and then reenable FT on the VMs. This method requires FT to be disabled for a longer period of time; a workaround if you have four or more hosts in your cluster is to VMotion your FT-enabled VMs so they are all on half your ESX hosts. Then update the hosts without the FT VMs so they are at the same build level. Once that is complete, disable FT on the VMs, VMotion them to the updated hosts and reenable FT; a new secondary will be spawned on one of the updated hosts that has the same build level. Once all the FT VMs are moved and reenabled, update the remaining hosts so they are at the same build level, and then VMotion the VMs so they are balanced among your hosts.
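The split-brain tie-breaker described in the list above (both VMs race to rename one shared file, and only one can win) can be mimicked with an ordinary file rename. This Python sketch is an analogy only: the file name and helper are hypothetical, it runs against a local temporary directory rather than shared storage, and it says nothing about how VMware actually implements the lock.

```python
import os
import tempfile

LOCK_NAME = "ft-tiebreak"  # hypothetical file name, not VMware's


def try_claim(shared_dir, claimant):
    """Try to win the race by renaming the shared file; True means we won."""
    src = os.path.join(shared_dir, LOCK_NAME)
    dst = os.path.join(shared_dir, f"{LOCK_NAME}.{claimant}")
    try:
        # The rename can succeed only once: after the first claimant renames
        # the file, the source no longer exists for anyone else.
        os.rename(src, dst)
        return True
    except FileNotFoundError:
        return False


with tempfile.TemporaryDirectory() as shared_dir:
    open(os.path.join(shared_dir, LOCK_NAME), "w").close()
    winners = [name for name in ("primary", "secondary")
               if try_claim(shared_dir, name)]
    print(winners)
```

Whichever side renames first is the only winner, so two primaries can never both believe they own the VM; the loser would then be shut down as the text describes.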

X. And there’s more! Additional resources
We’ve provided you with a lot of information on the new FT feature that should help you understand how it works, how to set it up, and how to use it. For even more information on FT, see VMware’s white papers, documentation, multimedia, utilities, VMworld sessions and KB articles.

Saturday, May 26, 2012


How do I use das.isolationaddress[x]?

by Duncan Epping

Recently I received a question on Twitter about how the vSphere HA advanced option "das.isolationaddress" should be used. This setting is used when there is a desire or a requirement to specify an additional isolation address. The isolation address is used by a host which "believes" it is isolated. In other words, if a host isn't receiving heartbeats anymore, it pings the isolation address to validate whether it still has network access. If it does still have network access (a response from the isolation address), no action is taken; if the isolation address does not respond, the "isolation response" is triggered.
Out of the box the "default gateway" is used as an isolation address. In most cases it is recommended to specify at least one extra isolation address. This would be done as follows:
  • Right click your vSphere Cluster and select "Edit settings"
  • Go to the vSphere HA section and click "Advanced options"
  • Add "das.isolationaddress0" under the option column
  • And add the "IP Address" of the device you want to use as an isolation address under the value column
Now if you want to specify another isolation address you should add "das.isolationaddress1". In total, 10 isolation addresses can be used. Keep in mind that all of these will be pinged in parallel! Many seem to be under the impression that this happens sequentially, but that is not the case!
Now if for whatever reason the default gateway should not be used, you can disable it by setting "das.usedefaultisolationaddress" to "false". A use case for this would be when the default gateway is a "non-pingable" device; in most scenarios, though, "das.usedefaultisolationaddress" is not needed.
I hope this helps when implementing your cluster.
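To make the option names and the parallel check concrete, here is a small Python sketch. It is only a model of the behaviour described above, not vSphere HA code: the helper names and the injected ping callable are hypothetical, and treating a host as isolated only when none of the addresses respond is an assumption of this sketch.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_ISOLATION_ADDRESSES = 10  # das.isolationaddress0 .. das.isolationaddress9


def isolation_options(addresses, use_default_gateway=True):
    """Build the option/value pairs to enter under "Advanced options"."""
    if len(addresses) > MAX_ISOLATION_ADDRESSES:
        raise ValueError("at most 10 isolation addresses can be specified")
    options = {f"das.isolationaddress{i}": address
               for i, address in enumerate(addresses)}
    if not use_default_gateway:
        options["das.usedefaultisolationaddress"] = "false"
    return options


def is_isolated(addresses, ping):
    """Ping every address in parallel; isolated if none of them respond."""
    if not addresses:
        raise ValueError("need at least one isolation address")
    with ThreadPoolExecutor(max_workers=len(addresses)) as pool:
        responses = list(pool.map(ping, addresses))
    return not any(responses)


reachable = {"192.168.1.254"}  # sample data for the fake ping below
addresses = ["192.168.1.1", "192.168.1.254"]
print(isolation_options(addresses, use_default_gateway=False))
print(is_isolated(addresses, ping=lambda address: address in reachable))
```

The ping function is passed in so the logic can be tried without touching the network; a real check would substitute an actual ICMP probe.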

Wednesday, May 16, 2012

VMware Fault Tolerance FAQ

Purpose

This article provides information for users of VMware Fault Tolerance (FT). The article contains Frequently Asked Questions that can help to resolve Fault Tolerance related issues.

Resolution

What is VMware Fault Tolerance?
VMware Fault Tolerance is a feature that allows a new level of guest redundancy. Information regarding this feature can be found in the vSphere Availability Guide for your version of ESX.
How do I turn it on?
The feature is enabled on a per virtual machine basis. Instructions for enabling Fault Tolerance can be found in the Turning on Fault Tolerance for Virtual Machines section of the vSphere Availability Guide for your version of ESX.
What happens when I turn on Fault Tolerance?
In very general terms, a second virtual machine is created to work in tandem with the virtual machine you have enabled Fault Tolerance on. This virtual machine resides on a different host in the cluster, and runs in virtual lockstep with the primary virtual machine. When a failure is detected, the second virtual machine takes the place of the first one with the least possible interruption of service. More specific information about how this is achieved can be found in the Protecting Mission-Critical Workloads with VMware Fault Tolerance whitepaper.
Why can't I turn Fault Tolerance on?
VMware Fault Tolerance can be enabled on any virtual machine that resides in a cluster that meets the necessary requirements. If you have difficulty enabling Fault Tolerance for a specific virtual machine, see The Turn on Fault Tolerance option is disabled (1010631).
How do I turn Fault Tolerance off?
Instructions for disabling Fault Tolerance can be found in the article Disabling or Turning Off VMware FT (1008026).
How do I tell if my environment is ready for Fault Tolerance?
The VMware SiteSurvey Tool is used to check your environment for compliance with VMware Fault Tolerance. It can be downloaded at http://www.vmware.com/download/shared_utilities.html.
Where do I find the product's website?
VMware has a website for the Fault Tolerance product, available online at http://www.vmware.com/products/fault-tolerance/.
What happens during a failure?
When a host running the primary virtual machine fails, a transparent failover occurs to the corresponding secondary virtual machine. During this failover, there is no data loss or noticeable service interruption. In addition, VMware HA automatically restores redundancy by starting a new secondary virtual machine on another host. Similarly, if the host running the secondary virtual machine fails, VMware HA starts a new secondary virtual machine on a different host. In either case there is no outage noticeable to an end user.
What is the logging time delay between the Primary and Secondary Fault Tolerance virtual machines?
The actual delay is based on the network latency between the Primary and Secondary. vLockstep executes the same instructions on the Primary and Secondary, but because this happens on different hosts, there could be a small latency, but no loss of state. This is typically less than 1 ms. Fault Tolerance includes synchronization to ensure that the Primary and Secondary are synchronized.
In a cluster with more than 3 hosts, can you tell Fault Tolerance where to put the Fault Tolerance virtual machine or does it choose on its own?
You can place the original (or Primary) virtual machine. You have full control with DRS or VMotion to assign it to any node. The placement of the Secondary, when created, is automatic based on the available hosts. But once the secondary is created and placed, you can VMotion it to the preferred host.
What happens if the host containing the primary virtual machine comes back online (after a node failure)?
This node is put back in the pool of available hosts. There is no attempt to start or migrate the primary to that host.
Is the failover from the primary virtual machine to the secondary virtual machine dynamic or does Fault Tolerance restart a virtual machine?
The failover from primary to secondary virtual machine is dynamic, with the secondary continuing execution from the exact point where the primary left off. It happens automatically with no data loss, no downtime, and little delay. Clients see no interruption. After the dynamic failover, the secondary virtual machine becomes the new primary virtual machine, and a new secondary virtual machine is spawned automatically.
Where are Fault Tolerance failover events logged?
All failover events are logged by vCenter.
I encountered an error message that I can't find in the knowledge base.  Where else should I check?
The vSphere Availability Guide contains a list of known errors in the Fault Tolerance Error Messages section.
Does Fault Tolerance support Intel Hyper-Threading Technology?
Yes, Fault Tolerance does support Intel Hyper-Threading Technology on systems that have it enabled. Enabling or disabling Hyper-Threading has no impact on Fault Tolerance.
What happens if vCenter Server is offline when a failover event occurs?
Once Fault Tolerance is configured for a virtual machine, vCenter Server need not be online for FT to work. Even if vCenter Server is offline, failover will still occur from the primary to the secondary virtual machine. Additionally, the spawning of a new secondary virtual machine will also occur without vCenter Server.


Fault Tolerance Checklist 

Required c ESX/ESXi Hardware: Ensure that the processors are supported: AMD Barcelona+, Intel Penryn+ (run the CPU compatibility tool to determine compatibility).

Required c ESX/ESXi Hardware: Ensure that HV (Hardware Virtualization) is enabled in the BIOS.

Optional c ESX/ESXi Hardware: Ensure that power management (also known as power-capping) is turned OFF in the BIOS (performance implications).

Optional c ESX/ESXi Hardware: Ensure that hyper-threading is turned OFF in the BIOS (performance implications).

Required c Storage: Ensure that FT protected virtual machines are on shared storage (FC, iSCSI or NFS). When using NFS, increase timeouts and have a dedicated NIC for NFS traffic.

Required c Storage: Ensure that the datastore is not using physical RDM (Raw Disk Mapping). Virtual RDM is supported.

Required c Storage: Ensure that there is no requirement to use Storage VMotion for VMware FT VMs since Storage VMotion is not supported for VMware FT VMs.

Required c Storage: Ensure that NPIV (N-Port ID Virtualization) is not used since NPIV is not supported with VMware FT.

Optional c Storage: Ensure that virtual disks on VMFS3 are thick-eager zeroed (thin or sparsely allocated will be converted to thick-eager zeroed when VMware FT is enabled requiring additional storage space).

Optional c Storage: Ensure that ISOs used by the VMware FT protected VMs are on shared storage accessible to both primary and secondary VMs (else errors reported on secondary as if there is no media, which might be acceptable).

Optional c Network: Ensure that at least two NICs are used (NIC teaming) for ESX  management/VMotion and VMware FT logging. VMware recommends four VMkernel NICs: two dedicated for VMware VMotion and two dedicated for VMware FT.

Required c Network: Ensure that at least gigabit NICs are used (10 Gbit NICs can be used as well as jumbo frames enabled for better performance).

Optional c Redundancy: Ensure that the environment does not have a single point of failure (i.e. use NIC teaming, multiple network switches, and storage multipathing).

Required c vCenter Server: Ensure that the primary and secondary ESX hosts and virtual machines are in an HA-enabled cluster.

Required c vCenter Server: Ensure that there is no requirement to use DRS for VMware FT protected virtual machines; in this release VMware FT cannot be used with VMware DRS (although manual VMotion is allowed).

Required c vCenter Server: Ensure that host certificate checking is enabled (enabled by default) before you add the ESX/ESXi host to vCenter Server.

Required c ESX/ESXi: Ensure that the primary and secondary ESX/ESXi hosts are running the same build of VMware ESX/ESXi.

Required c Virtual Machines: Ensure that the virtual machines are NOT using more than 1 vCPU (SMP is not supported).

Required c Virtual Machines: Ensure that there is no user requirement to use NPT/EPT (Nested Page Tables/Extended Page Tables) since VMware FT disables NPT/EPT on the ESX host.

Required c Virtual Machines: Ensure that there is no user requirement to hot add or remove devices since hot plugging devices cannot be done with VMware FT.

Required c Virtual Machines: Ensure that there is no user requirement to use USB (USB must be disabled) and sound devices (must not be configured) since these are not supported for  ecord/Replay (and VMware FT).

Required c Virtual Machines: Ensure that there is no user requirement to have virtual machine snapshots since these are not supported for VMware FT. Delete snapshots from existing virtual machines before protecting with VMware FT.

Required c Virtual Machines: Ensure that virtual machine hardware is upgraded to v7.

Optional c Virtual Machines: Ensure that there are will be no more than four (to eight) VMware FT enabled virtual machine primaries or secondaries on any single ESX/ESXi host (suggested general guideline based on ESX/ESXi host and VM size and workloads which can vary).

Required c Guest OS: Ensure that the virtual machines do not use a paravirtualized guest OS.

Required c 3rd Party: Ensure MSCS clustered virtual machines will have MSCS clustering removed prior to protecting with VMware FT (and make sure that the virtual machines are not SMP).
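
The virtual machine items in the checklist above can be sketched as a small pre-flight check. This is only an illustration in Python: the `VmConfig` fields and the `ft_blockers` function are hypothetical names made up for this example and are not part of any VMware API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VmConfig:
    """Hypothetical summary of a VM's settings (not a VMware API object)."""
    name: str
    num_vcpus: int = 1
    hardware_version: int = 7
    snapshot_count: int = 0
    has_usb: bool = False
    has_sound: bool = False
    paravirtualized_guest: bool = False
    in_mscs_cluster: bool = False


def ft_blockers(vm: VmConfig) -> List[str]:
    """Return the 'Required' checklist items that this VM fails."""
    blockers = []
    if vm.num_vcpus > 1:
        blockers.append("more than 1 vCPU (SMP is not supported)")
    if vm.hardware_version < 7:
        blockers.append("virtual hardware not upgraded to v7")
    if vm.snapshot_count > 0:
        blockers.append("snapshots present (delete them first)")
    if vm.has_usb:
        blockers.append("USB device configured")
    if vm.has_sound:
        blockers.append("sound device configured")
    if vm.paravirtualized_guest:
        blockers.append("paravirtualized guest OS")
    if vm.in_mscs_cluster:
        blockers.append("MSCS clustering still configured")
    return blockers


if __name__ == "__main__":
    candidate = VmConfig(name="db01", num_vcpus=2, snapshot_count=1)
    for problem in ft_blockers(candidate):
        print(candidate.name, "cannot be protected:", problem)
```

An empty list means none of the VM-level blockers from the checklist apply; host, storage and networking items still have to be checked separately.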











Monday, May 14, 2012

Comparison between VMFS3 & VMFS5

VMFS3

  • Single largest extent of 2TB less 512 bytes
  • Uses MSDOS partition table
  • Supports 64TB spanned volume (32 extents x 2TB)
  • Different block sizes (1MB/2MB/4MB/8MB), chosen per datastore
  • Max size of RDM in physical compatibility mode is 2TB less 512 bytes

VMFS5

  • Single largest extent of 64TB
  • Uses GPT partition table
  • Supports 64TB spanned volume (32 extents with any size combination)
  • Unified 1MB block size
  • Performance improvements in comparison with VMFS3
  • Max size of RDM in virtual compatibility mode is 2TB less 512 bytes
  • Max size of RDM in physical compatibility mode is 64TB

General notes

  • You will use VMFS3 if you have a vSphere environment with a mix of vSphere 4.x (also 3.x) and 5.x hosts. Presently, VMFS5 datastores are not accessible to hosts running an ESX/ESXi version earlier than vSphere 5.0.
  • You can upgrade from VMFS3 to VMFS5 but cannot downgrade.
  • After a VMFS3 datastore is upgraded, it starts to use the GPT partition format only once the datastore has been extended (within the extent) beyond 2TB (less 512 bytes). Not sure if the behaviour is similar for a spanned volume as well, but I believe that is how it should behave.
  • Storage vMotion (migration across datastores) between a VMFS3 and a VMFS5 volume is supported if you are using a vSphere 5.x host.
  • vMotion (migration across hosts) in vSphere 5.x is supported in either case, whether the shared storage is VMFS3 or VMFS5.
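
The size limits quoted above are easy to check with a little arithmetic. The sketch below assumes binary units (1TB = 1024^4 bytes); the helper name `extents_needed` is made up for this example.

```python
TB = 1024 ** 4  # bytes in one terabyte, assuming binary units

VMFS3_MAX_EXTENT = 2 * TB - 512   # "2TB less 512 bytes"
VMFS5_MAX_EXTENT = 64 * TB        # single largest extent on VMFS5
MAX_EXTENTS = 32                  # extents allowed in a spanned volume


def extents_needed(target_bytes, extent_bytes):
    """Smallest number of equal-size extents that covers target_bytes."""
    return -(-target_bytes // extent_bytes)  # ceiling division


if __name__ == "__main__":
    print("VMFS3 largest extent in bytes:", VMFS3_MAX_EXTENT)
    print("Extents for a 10TB volume on VMFS3:",
          extents_needed(10 * TB, VMFS3_MAX_EXTENT))
    print("Extents for a 10TB volume on VMFS5:",
          extents_needed(10 * TB, VMFS5_MAX_EXTENT))
```

Because each VMFS3 extent falls 512 bytes short of 2TB, a full 10TB volume needs six extents rather than five, while a single VMFS5 extent covers it.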
From http://virtual-drive.in

Fault Tolerance

Fault Tolerance (FT) is a feature of a vSphere HA cluster, so you need to configure an HA cluster before you can use FT. FT is enabled on those VMs that need a higher level of protection. When FT is enabled on a VM, that VM becomes the primary and a new secondary VM is created on another host in the HA cluster.
To be able to configure FT on VMs:
  • Configure a VMkernel port group enabled for FT Logging on all ESXi hosts that will host FT VMs
  • Virtual disks on FT VMs have to be “Thick Provisioned Eager Zeroed”
  • ISO/floppy images have to be stored on shared storage
  • Hosts in the cluster should meet vMotion requirements (again, I don’t think this is a requirement per se, but it would be very beneficial)
FAQs on FT VMs:
Q. Does FT support vSMP on VMs?
A. No. VMs with vSMP cannot be enabled for FT, i.e. VMs with more than 1 vCPU cannot be protected with FT.
Q. Is FT supported on AMD processors?
A. Yes, both Intel and AMD processors are supported for FT. VMware KB article 1008027 lists which processors and guest operating systems are supported for FT.
Q. How many FT VMs can be hosted on a single ESXi host?
A. A maximum of 4 FT VMs is supported per ESXi host (counting both primaries and secondaries).
Q. What is a primary VM?
A. The primary VM is the VM that serves user requests. Access to this VM is always read-write.
Q. What is a secondary VM?
A. The secondary VM is a backup VM, which is promoted to primary if the (original) primary VM fails. While it is a secondary VM, access to it is always read-only.
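
The two placement rules from the FAQ (the secondary runs on a different host from the primary, and a host carries at most 4 FT VMs) can be sketched as below. In practice placement is decided by the cluster itself; the function name and the host inventory here are hypothetical and only illustrate the constraints.

```python
MAX_FT_VMS_PER_HOST = 4  # per the FAQ above; primaries and secondaries both count


def pick_secondary_host(primary_host, ft_count_by_host):
    """Pick a host for the secondary VM, or None if no host qualifies.

    ft_count_by_host maps a host name to the number of FT VMs
    (primaries plus secondaries) it already runs.
    """
    candidates = [
        host
        for host, count in ft_count_by_host.items()
        if host != primary_host and count < MAX_FT_VMS_PER_HOST
    ]
    if not candidates:
        return None
    # Prefer the least loaded host; break ties by name for repeatability.
    return min(candidates, key=lambda host: (ft_count_by_host[host], host))


if __name__ == "__main__":
    inventory = {"esx1": 1, "esx2": 4, "esx3": 2}
    print(pick_secondary_host("esx1", inventory))
```

With this sample inventory, esx2 is already at the limit and esx1 hosts the primary, so esx3 is the only candidate.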

From: http://virtual-drive.in/

Wednesday, May 2, 2012

VMware Admin Interview Questions & Answers

1. The VMkernel is a proprietary kernel developed by VMware; it is not based on any UNIX operating system.

2. The VMkernel cannot boot by itself, so it relies on a third-party operating system. In VMware's case the kernel is booted by a Red Hat Linux based operating system, which is known as the service console.

3. The service console is based on the Red Hat Linux operating system and is used to manage the VMkernel.

4. To restart the Web Access service on VMware ESX
service vmware-webaccess restart (this restarts the Apache Tomcat application)

5. To restart ssh service on vmware 
service sshd restart 

6. To restart host agent(vmware-hostd) on vmware esx server 
service mgmt-vmware restart 

7. Path for the struts-config.xml 
/usr/lib/vmware/webAccess/tomcat/apache-tomcat-5.5.17/webapps/ui/WEB-INF/ 

8. To start a scripted install, the command is
    esx ks=nfs:111.222.333.444:/data/KS.config ksdevice=eth0
    (ks= gives the location of the kickstart file; ksdevice= gives the network device name)
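
The boot line above has two parts: the location of the kickstart file and the network device to fetch it over. A minimal sketch that assembles the line from those parts follows; the function `kickstart_boot_line` is a hypothetical helper, and the address is the same placeholder used above rather than a real IP.

```python
def kickstart_boot_line(nfs_server, ks_path, ks_device="eth0"):
    """Build the boot option line for an NFS-hosted kickstart file."""
    if not ks_path.startswith("/"):
        raise ValueError("ks_path must be an absolute path on the NFS export")
    return "esx ks=nfs:{}:{} ksdevice={}".format(nfs_server, ks_path, ks_device)


if __name__ == "__main__":
    # Placeholder server address, copied from the example above.
    print(kickstart_boot_line("111.222.333.444", "/data/KS.config"))
```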

9. Virtual networking in simple terms:
Virtual NIC(s) on virtual machine(s) ----->
Physical NIC on the ESX server (virtual switch, 56 ports) ----->
Physical switch port (should be trunked with all the VLANs to which the VMs need access)
All the ESX servers should be configured with the same number of physical NICs (vSwitches) and the same connectivity, so that vMotion succeeds.
If all the virtual machines are connected to one vSwitch on different VLANs, the physical NIC (vSwitch) needs to be trunked with those same VLANs on the physical switch port.
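
The trunking requirement amounts to a set comparison: every VLAN used by a VM on the vSwitch must also be carried on the physical switch port. A minimal sketch, with hypothetical VM names and VLAN IDs:

```python
def missing_vlans(vlan_by_vm, trunked_vlans):
    """Return the VLAN IDs that VMs need but the physical port does not carry."""
    needed = set(vlan_by_vm.values())
    return sorted(needed - set(trunked_vlans))


if __name__ == "__main__":
    vms = {"web01": 10, "db01": 20, "app01": 30}  # VM name -> VLAN ID
    print("Not trunked:", missing_vlans(vms, [10, 20]))
```

Running the same check against every ESX host's uplink ports is one way to confirm that a VM will keep its network access after vMotion.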

10. What are the three port groups present in ESX server networking?
   1. Virtual Machine Port Group - Used for Virtual Machine Network 
   2. Service Console Port Group - Used for Service Console Communications 
   3. VMKernel Port Group - Used for VMotion, iSCSI, NFS Communications

11. What is the use of a Port Group? 
The port group segregates the type of communication.

12. Which types of communication always require an IP address?
   Service Console and VMkernel (VMotion and iSCSI). These communications do not happen without an IP address (whether it is shared or dedicated).

13. In the ESX Server licensing features the VMotion license is showing as Not Used. Why?
    Even though the license box is selected, it shows as "License Not Used" until you enable the VMotion option on a specific vSwitch.

14. How does Virtual Machine port group communication work?
     All the VMs that are configured in a VM port group are able to connect to the physical machines on the network. This port group enables communication between the vSwitch and the physical switch, which connects the VMs to the physical machines.

15. What is a VLAN ? 
     A VLAN is a logical configuration on the switch port to segment the IP Traffic. For this to happen, the port must be trunked with the correct VLAN ID.

16. Do vSwitches support VLAN tagging? Why?
     Yes, vSwitches support VLAN tagging. Otherwise, if the virtual machines on an ESX host were connected to different VLANs, we would need to install a separate physical NIC (vSwitch) for every VLAN. That is the reason VMware included VLAN tagging for vSwitches. Every vSwitch supports up to 1016 ports, and BTW they can support 1016 VLANs if needed, but an ESX server doesn’t support that many VMs. :)

17. What is Promiscuous Mode on a vSwitch? What happens if it is set to Accept?
     If promiscuous mode is set to Accept, all the communication is visible to all the virtual machines; in other words, all the packets are sent to all the ports on the vSwitch.
     If promiscuous mode is set to Reject, the packets are sent only to the intended port, so that only the intended virtual machine is able to see the communication.
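
The difference between the two settings can be shown with a toy model of a vSwitch. This sketch follows the simplified description above (Accept delivers to every port, Reject only to the port that owns the destination MAC); the port numbers and MAC addresses are made up.

```python
def receiving_ports(mac_by_port, dst_mac, promiscuous_accept):
    """Return the port IDs that see a frame addressed to dst_mac."""
    if promiscuous_accept:
        # Accept: the frame is visible on every port of the vSwitch.
        return sorted(mac_by_port)
    # Reject: only the port whose MAC matches the destination sees it.
    return sorted(port for port, mac in mac_by_port.items() if mac == dst_mac)


if __name__ == "__main__":
    ports = {
        1: "00:50:56:00:00:01",
        2: "00:50:56:00:00:02",
        3: "00:50:56:00:00:03",
    }
    print("Accept:", receiving_ports(ports, "00:50:56:00:00:02", True))
    print("Reject:", receiving_ports(ports, "00:50:56:00:00:02", False))
```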

18. What is MAC Address Changes? What happens if it is set to Accept?
When we create a virtual machine, the configuration wizard generates a MAC address for that machine; you can see it in the .vmx (VM config) file. This setting controls what happens when the MAC address in the guest OS no longer matches that address. If it is set to Reject, incoming traffic to the VM is not allowed until the two MAC addresses match again. If it is set to Accept, the changed address is honored and incoming traffic is allowed to the VM.

19. What is Forged Transmits? What happens if it is set to Accept?
When we create a virtual machine, the configuration wizard generates a MAC address for that machine; you can see it in the .vmx (VM config) file. This setting compares the source MAC address of the traffic the VM sends with the MAC address of its adapter. If it is set to Reject, outgoing traffic whose source MAC address does not match is not allowed. If it is set to Accept, no comparison is made and the outgoing traffic is allowed from the VM.
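
Both policies come down to a MAC address comparison, one for inbound and one for outbound traffic. The sketch below assumes the usual behaviour: with Reject, traffic is dropped when the addresses differ, and with Accept it is allowed regardless. The function names and the sample addresses are hypothetical.

```python
def inbound_allowed(vmx_mac, guest_mac, mac_changes_accept):
    """MAC Address Changes: may the VM receive traffic?"""
    if guest_mac.lower() == vmx_mac.lower():
        return True  # guest still uses the MAC from the .vmx file
    return mac_changes_accept  # changed MAC is honored only with Accept


def outbound_allowed(adapter_mac, frame_source_mac, forged_transmits_accept):
    """Forged Transmits: may this outgoing frame leave the VM?"""
    if frame_source_mac.lower() == adapter_mac.lower():
        return True  # source MAC matches the adapter, nothing is forged
    return forged_transmits_accept  # mismatched frames pass only with Accept


if __name__ == "__main__":
    vmx_mac = "00:50:56:AA:BB:01"
    changed = "00:50:56:aa:bb:99"
    print("Inbound, Reject:", inbound_allowed(vmx_mac, changed, False))
    print("Inbound, Accept:", inbound_allowed(vmx_mac, changed, True))
    print("Outbound, Reject:", outbound_allowed(vmx_mac, changed, False))
```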

20. What are the core services of VC ? 
VM provisioning , Task Scheduling and Event Logging

21. Can we do vMotion between two datacenters? If so, how?
Yes, we can migrate a VM between two datacenters, but the mandatory requirement is that the VM must be powered off.

22. What is the VC agent, and which service does it correspond to? What are the minimum requirements for VC agent installation?
The VC agent is an agent installed on the ESX server which enables communication between VC and the ESX server.
The VC agent runs as the vpxa daemon and is controlled by the vmware-vpxa service. It works through the host agent, vmware-hostd, whose service is called mgmt-vmware. In the event of a VC agent failure, restart the services by typing the following commands at the service console
     " service mgmt-vmware restart "
     " service vmware-vpxa restart "
The VC agent is installed on the ESX server when we add the host to VC, so if you get an error like " VC Agent service failed to install " at installation time, check whether the size of /opt is sufficient.

23. How can you edit VI Client Settings and VC Server Settings ? 
Click Edit Menu on VC and Select Client Settings to change VI settings 
Click Administration Menu on VC and Select VC Management Server Configuration to Change VC Settings

24. What are the files that make a Virtual Machine  ? 
     .vmx - Virtual Machine Configuration File 
     .nvram - Virtual Machine BIOS 
     .vmdk - Virtual Machine Disk file 
     .vswp - Virtual Machine Swap File 
     .vmsd - Virtual Machine Snapshot Database 
     .vmsn - Virtual Machine Snapshot file 
     .vmss - Virtual Machine Suspended State file 
     vmware.log - Current Log File 
     vmware-#.log - Old Log file
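
The extension list above maps naturally onto a lookup table. A minimal sketch that labels the files found in a VM's directory follows; the file names in the example are made up.

```python
import os

# File extension -> role, following the list above.
VM_FILE_ROLES = {
    ".vmx": "configuration file",
    ".nvram": "BIOS",
    ".vmdk": "disk file",
    ".vswp": "swap file",
    ".vmsd": "snapshot database",
    ".vmsn": "snapshot file",
    ".vmss": "suspended state file",
    ".log": "log file",
}


def classify(filename):
    """Return the role of a VM file based on its extension."""
    extension = os.path.splitext(filename)[1].lower()
    return VM_FILE_ROLES.get(extension, "unknown")


if __name__ == "__main__":
    for name in ["web01.vmx", "web01.vmdk", "vmware.log", "vmware-3.log"]:
        print(name, "->", classify(name))
```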

25. Which devices can be added while the virtual machine is running?
In VI 3.5 we can add hard disks and NICs while the machine is running.
In vSphere 4.0 we can add memory and processors, along with hard disks and NICs, while the machine is running.

26. How to set the time delay for BIOS screen for a Virtual Machine? 
Right Click on VM, select edit settings, choose options tab and select boot option, set the delay how much you want.

27. What is a template?
We can convert a VM into a template, and it cannot be powered on once it has been changed to a template. Templates are used for quick provisioning of VMs.

23. What do we need to do to customize a Windows virtual machine clone?
Copy the sysprep files to the VirtualCenter directory on the server, so that the wizard can take advantage of them.

24. What do we need to do to customize a Linux/UNIX virtual machine clone?
VC itself includes the customization tools, as these operating systems are available as open source.

25. Can cloning from a template happen between two datacenters?
Yes, it can. If the template is in one datacenter, we can deploy a VM from that template in another datacenter without any problem.

26. What are the common issues with snapshots? What stops a snapshot from being taken, and how do we fix it?
If you configure the VM with mapped LUNs in physical compatibility mode, the snapshot fails. If the LUN is mapped in virtual compatibility mode, we can take a snapshot of it.
If you have configured the VM with mapped LUNs as physical, you need to remove them to take a snapshot.
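
The rule above can be written as a quick check over a VM's disk list. The disk labels and the mode strings ("vmdk", "rdm-virtual", "rdm-physical") are hypothetical names chosen for this sketch, not values from any VMware tool.

```python
def snapshot_blockers(disks):
    """Return the labels of disks that would stop a snapshot.

    disks is a list of (label, mode) pairs.
    """
    return [label for label, mode in disks if mode == "rdm-physical"]


if __name__ == "__main__":
    vm_disks = [
        ("Hard disk 1", "vmdk"),
        ("Hard disk 2", "rdm-virtual"),
        ("Hard disk 3", "rdm-physical"),
    ]
    blockers = snapshot_blockers(vm_disks)
    if blockers:
        print("Remove before taking a snapshot:", ", ".join(blockers))
    else:
        print("Snapshot can be taken")
```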

27. What is captured when we take a snapshot?
Virtual Machine Configuration (What hardware is attached to it) 
State of the Virtual Machine Hard Disk file ( To revert back if needed) 
State of the Virtual Machine Memory (if it is powered on)

28. What are the requirements for Converting a Physical machine to VM ? 
An agent needs to be installed on the Physical machine 
VI client needs to be installed with Converter Plug-in 
A server to import/export virtual machines

29. What is VMware Consolidated Backup?
It is a backup framework that supports third-party utilities for taking backups of ESX servers and virtual machines. It is not a backup service.

The user must be a member of the Administrators group and should have the "Log on as a service" privilege. To give a user this privilege, open Local Security Policy, select the "Log on as a service" policy and add the user. The user should also have read access to AD to send queries.