Dentistry has made me a liar.
One of my tasks each morning is to make sure my youngest son brushes his teeth. Someone, somewhere decided that two minutes of tooth brushing will ensure optimal oral hygiene, which target has been transmitted by our dentist to our very bright, but very literal six year-old. Every morning when he has thoroughly brushed and I give him the go-ahead to rinse, he asks “Dad, was that two minutes?”, to which I reply “yes, yes it was”, regardless of how long it took. I’m a horrible person, yes, but trying to explain the nuances to him at this age would be a pain on par with trimming my nails with a chainsaw – the lie works out better for all involved.
My daily moral dilemma has a very common source – metrics are frequently signals of a condition, rather than the condition itself. When you cannot use a direct measure (e.g. number of widgets per hour), it’s usual to substitute a proxy that indicates the desired condition (at least that’s the plan). A simplistic choice, however, will lead to metrics that fail to hold true. Two minutes spent brushing well should yield very good results, however, inadequate brushing, no matter how long you spend doing it, will always be inadequate. This is a prime example of what Seth Godin termed “…measuring what’s easy to measure as opposed to what’s important”.
Examples of these types of measures in software development are well-known:
- Lines of Code: Rather than productivity, this just measure typing. Given two methods equal in every other way, would the longer one be better?
- Bug Counts: Not all bugs are created equal. One substantive bug can outweigh thousands of cosmetic ones.
- Velocity/Turn Time: Features and (again) bugs are not created equal. Complexity, both business and technical, as well as clarity of the problem, tend to be have more impact on time to complete than effort expended or size.
As John Sonmez noted in “We Can’t Measure Anything in Software Development”: “We can track the numbers, but we can’t draw any good conclusions from them.”
There are a number of reasons these measures are unreliable. First, is the tenuous ties between the measures and what they hope to represent as noted above. Second, is the phenomenon known as Goodhart’s law: “When a measure becomes a target, it ceases to be a good measure”. In essence, when people know that a certain number is wanted/expected, the system will be gamed to achieve that number. Most importantly, however, is that value is the desired result, not effort. In manufacturing, more widgets per hour means greater profits (assuming sufficient demand). For software development, excess production can likely yield excess risk.
None of this is to suggest that metrics, including those above, are useless. What is important, however, is not the number, but what the number signals (particularly when metrics are combined). Increasing lines of code over time coupled with increasing bug counts and/or decreasing velocity may signal increased complexity/technical debt in your codebase, allowing you to investigate before things reach critical mass. Capturing the numbers to use as an early warning mechanism will likely bear much more fruit than using them as a management tool, where they likely become just a lie we tell ourselves and others.