Late on a September afternoon in 1812, outside a village on the road to Moscow, Napoleon had a problem. After nine hours of grinding battle in which both armies sustained massive losses, the Russian army was on the verge of disintegration. Napoleon’s staff was begging him to commit his elites, the Imperial Guard, and complete the victory, but Marshal Bessières asked “Will you risk your last reserves eight hundred miles from Paris?” He would not, and although the French army would march into Moscow a week later on September 14, it would also march back out five weeks later, retreating to the Polish border. Nearly five-sixths of the 685,000-man army that started the campaign had been lost, and the end of Napoleon’s control of the continent was in sight.
While my inner history geek finds this fascinating in and of itself, there is an architecturally significant moral to this story. The lack of reserves limits your options and the limited set of options you’re left with tends to range from bad to worse. Rather than reserves of soldiers, supplies, and ammunition, we deal in reserves of storage, memory, processor, and bandwidth. Exhausting these reserves can lead to catastrophe:
A system without spare capacity means that a single unexpected event can have major knock-on consequences. cc @SW_Trains
— Kris Coverdale (@kriscoverdale) November 22, 2013
It can be tempting for some to seek out high levels of utilization. In their minds, a system that spends the majority of its time at fifty percent utilization is wasting fifty percent of its resources. After all, while storage, memory, processor, and bandwidth are cheaper than in the past, the cost is still non-zero. Far better, in their opinion, to more closely manage the allocation and eliminate the waste.
The problem, of course, is that exceeding the critical level of a resource will degrade a system. Whether that degradation takes the form of a crash or is handled more gracefully via throttling, queuing, suspending functionality, etc., it is still degradation. The only way to prevent degradation is to ensure that sufficient excess capacity is in place to handle peak loads. For storage, this would be an amount sufficient to hold the data generated by peak load for the period it would take to recognize and respond to the impending shortage.
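As a rough illustration of that storage rule, here is a minimal sketch. The function name, its parameters, and the numbers are all hypothetical, and it assumes the peak write rate holds steady for the whole time it takes to notice the shortage and act on it:

```python
def storage_reserve_gb(peak_write_gb_per_hour, hours_to_detect, hours_to_respond):
    """Free space needed to absorb peak-rate writes until more capacity arrives.

    Assumes (hypothetically) a constant peak write rate over the entire
    detection and response window.
    """
    return peak_write_gb_per_hour * (hours_to_detect + hours_to_respond)


# Hypothetical figures: 50 GB/hour at peak, 2 hours to notice the trend,
# 22 hours to provision additional storage.
reserve = storage_reserve_gb(50, hours_to_detect=2, hours_to_respond=22)
print(reserve)  # 1200
```

The point of the sketch is that the reserve is driven by how slowly you can react as much as by how fast the data arrives.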
Peak load is the critical metric, as it reflects the worst-case scenario. In situations where resources are shared, peak load across all systems should be used. Average load is useless in this context because it smooths out the peaks and valleys (remember, if you stick one foot in a bucket of ice water and the other in a bucket of boiling water, on average, you’re comfortable).
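To make the difference concrete, here is a small sketch comparing the two measures for systems that share a resource. The system names, the sample values, and the capacity of 100 units are invented for illustration, and the sketch assumes every system is sampled at the same points in time:

```python
from statistics import mean


def combined_totals(loads_by_system):
    """Sum the load of every system at each sampling point."""
    return [sum(samples) for samples in zip(*loads_by_system.values())]


def combined_peak(loads_by_system):
    """Highest total load seen at any single sampling point."""
    return max(combined_totals(loads_by_system))


def combined_average(loads_by_system):
    """Mean total load across all sampling points."""
    return mean(combined_totals(loads_by_system))


# Hypothetical load samples (in capacity units) for two systems on one host.
loads = {
    "orders": [20, 25, 90, 30],
    "reports": [10, 15, 70, 5],
}

capacity = 100  # hypothetical capacity of the shared resource
print(combined_average(loads))          # 66.25
print(combined_peak(loads))             # 160
print(combined_peak(loads) > capacity)  # True
```

Judged by the average, this host looks comfortably under capacity; judged by the peak, it is badly oversubscribed at the one moment that matters.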
Maintaining reserve capacity guarantees that some allocated resources will sit idle, and that idle capacity looks like wasted money, just as backups, disaster recovery environments, and insurance all look like wasted money (until needed). Obsessing about the cost of those excess resources without factoring in the cost of an outage or slowdown is a perfect example of being penny-wise and pound-foolish.