Bigger isn’t always better. We are going to explore right-sizing hardware for vSphere hosts in a failover cluster and choosing the correct settings for reserving sufficient failover capacity.
Basic HA deployment of vSphere, or any other HA technology for that matter, starts at 2 hosts: one host fails, the other takes over. This is not a scenario you will find in the wild very often, mainly due to the attractive vSphere ‘Essentials’ licensing scheme covering 3 hosts and the prohibitive penalty of reserving 50% of your resources just for failover. At 3 hosts with the default configuration we are setting aside one third.
Do we really need to be so conservative?
vSphere resources set aside for HA are defined by either ‘Host Failures Cluster Tolerates’ or ‘% of Resources Reserved for Failover’ and are enforced by HA Admission Control. Out of the box, the first setting is simply set to 1 host regardless of the host count in the cluster. Chances are you have never had to deal with HA Admission Control unless one of the admins ‘temporarily’ disabled it to power on ‘just one more VM’ and left the cluster with no guarantee of failover capacity at all.
The alternative to the defaults: gain efficiency by sacrificing uptime – is it really a viable choice?
In an HA failover scenario, you are operating in a degraded state – do you really need that ‘Test-Dev’ group of VMs to stay up? When an HA event is taking place, your team’s efforts are focused on investigating the outage, not on taking that brand-new ERP software for a spin.
Most of us already manage HA failover behavior by setting per-VM restart priority for an HA event. So what is the difference?
By changing ‘Host Failures Cluster Tolerates’ to ‘% of Resources Reserved for Failover’ you are effectively loosening the HA Admission Control policy. This has a direct impact on how many VMs can be powered on in your cluster.
What are the numbers?
It’s napkin math time. A run-of-the-mill 3 host cluster with 1 entire host set aside for failover wastes 33% of its resources at all times. Changing to ‘% of Resources Reserved for Failover’ pre-populates at 25%. You gain roughly 8 percentage points in efficiency, which equates to a 25% decrease in wasted resources.
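Here is that napkin math as a minimal Python sketch – the per-host capacity figure and the 10 GHz per-VM reservation are made-up numbers used only to make the ratios concrete:

```python
# Napkin math for a 3-host cluster with identical hosts.
# Capacity and VM reservation values are illustrative assumptions.

hosts = 3
per_host_ghz = 100.0                     # hypothetical CPU capacity per host
total_ghz = hosts * per_host_ghz

# 'Host Failures Cluster Tolerates' = 1: one full host is held in reserve.
reserved_host_policy = 1 * per_host_ghz
waste_host_policy = reserved_host_policy / total_ghz       # ~33%

# '% of Resources Reserved for Failover' pre-populated at 25%.
reserved_pct_policy = 0.25 * total_ghz
waste_pct_policy = reserved_pct_policy / total_ghz          # 25%

points_gained = (waste_host_policy - waste_pct_policy) * 100               # ~8.3 points
relative_cut = (waste_host_policy - waste_pct_policy) / waste_host_policy  # ~25%

# How many VMs with a hypothetical 10 GHz reservation each policy would admit.
vm_reservation_ghz = 10.0
vms_host_policy = int((total_ghz - reserved_host_policy) // vm_reservation_ghz)
vms_pct_policy = int((total_ghz - reserved_pct_policy) // vm_reservation_ghz)

print(f"Host policy:       {waste_host_policy:.1%} reserved, {vms_host_policy} VMs admitted")
print(f"Percentage policy: {waste_pct_policy:.1%} reserved, {vms_pct_policy} VMs admitted")
print(f"Gain: {points_gained:.1f} points, {relative_cut:.0%} less waste")
```

The extra VMs the percentage policy admits are exactly the ones that may not have guaranteed capacity after a host failure – that is the trade-off behind the next question.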
Is it worth it?
A 25% decrease in waste does sound pretty good. Understand that changing this WILL impact how many of your VMs can successfully fail over.
But is this a good idea?
It is certainly a better one than disabling HA Admission Control outright.
So what does any of this have to do with host sizing?
If you made it this far, you can clearly see how much easier this would have been with more hosts to work with. Next time you are adding or replacing hosts in your cluster, take the time to at least consider the math of going from 3 high-performance hosts to 5 midrange units. Is it more hardware, licensing, and networking? Yes.
Is it a better fit for your environment? Do the numbers make sense? You should find out.
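To get that comparison started, here is a quick sketch assuming identical hosts and a tolerance of one host failure – plug in your own host counts:

```python
# Fraction of cluster capacity idled for HA failover, assuming identical hosts
# and the 'Host Failures Cluster Tolerates' policy.

def failover_overhead(hosts: int, failures_to_tolerate: int = 1) -> float:
    return failures_to_tolerate / hosts

for hosts in (3, 5):
    print(f"{hosts} hosts: {failover_overhead(hosts):.0%} of capacity reserved for failover")

# 3 hosts -> 33% reserved; 5 hosts -> 20% reserved, still tolerating a full host failure.
```

Same tolerance for a single host failure, noticeably less capacity parked doing nothing.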
JacobR, PEI