Mixed Signals and Messed-Up Metrics

yes...no...um, maybe?

Dentistry has made me a liar.

One of my tasks each morning is to make sure my youngest son brushes his teeth. Someone, somewhere decided that two minutes of tooth brushing will ensure optimal oral hygiene, and our dentist has passed that target on to our very bright, but very literal, six-year-old. Every morning when he has thoroughly brushed and I give him the go-ahead to rinse, he asks “Dad, was that two minutes?”, to which I reply “yes, yes it was”, regardless of how long it took. I’m a horrible person, yes, but trying to explain the nuances to him at this age would be a pain on par with trimming my nails with a chainsaw – the lie works out better for all involved.

My daily moral dilemma has a very common source – metrics are frequently signals of a condition, rather than the condition itself. When you cannot use a direct measure (e.g. number of widgets per hour), it’s usual to substitute a proxy that indicates the desired condition (at least that’s the plan). A simplistic choice, however, will lead to metrics that fail to hold true. Two minutes spent brushing well should yield very good results; inadequate brushing, however, will always be inadequate, no matter how long you spend doing it. This is a prime example of what Seth Godin termed “…measuring what’s easy to measure as opposed to what’s important”.

Examples of these types of measures in software development are well-known:

  • Lines of Code: Rather than productivity, this just measures typing. Given two methods equal in every other way, would the longer one be better?
  • Bug Counts: Not all bugs are created equal. One substantive bug can outweigh thousands of cosmetic ones.
  • Velocity/Turn Time: Features and (again) bugs are not created equal. Complexity, both business and technical, as well as clarity of the problem, tend to have more impact on time to complete than effort expended or size.

As John Sonmez noted in “We Can’t Measure Anything in Software Development”: “We can track the numbers, but we can’t draw any good conclusions from them.”

There are a number of reasons these measures are unreliable. First is the tenuous tie between the measures and what they hope to represent, as noted above. Second is the phenomenon known as Goodhart’s law: “When a measure becomes a target, it ceases to be a good measure”. In essence, when people know that a certain number is wanted or expected, the system will be gamed to achieve that number. Most important, however, is that value is the desired result, not effort. In manufacturing, more widgets per hour means greater profits (assuming sufficient demand). In software development, excess production is more likely to yield excess risk.

None of this is to suggest that metrics, including those above, are useless. What is important, however, is not the number, but what the number signals (particularly when metrics are combined). Increasing lines of code over time coupled with increasing bug counts and/or decreasing velocity may signal increased complexity/technical debt in your codebase, allowing you to investigate before things reach critical mass. Capturing the numbers to use as an early warning mechanism will likely bear much more fruit than using them as a management tool, where they likely become just a lie we tell ourselves and others.
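To make the early-warning idea concrete, here is a minimal sketch in Python. It assumes you already record a few numbers per sprint; the `SprintSnapshot` fields and the `worth_investigating` function are hypothetical names chosen for illustration, and a simple least-squares slope stands in for whatever trend measure suits your data. It flags only the combination described above (code growing while bugs rise and/or velocity falls) and sets no target for any single number.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class SprintSnapshot:
    """One period's readings; the field names are illustrative assumptions."""
    lines_of_code: int
    open_bugs: int
    velocity: float  # e.g. story points completed in the period


def slope(values: Sequence[float]) -> float:
    """Least-squares slope of the values against their index (0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        return 0.0  # a single reading has no trend
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    numerator = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    return numerator / denominator


def worth_investigating(history: Sequence[SprintSnapshot]) -> bool:
    """True when code is growing while bugs rise and/or velocity falls.

    A True result is a prompt to go and look, not a verdict on the team.
    """
    loc_trend = slope([s.lines_of_code for s in history])
    bug_trend = slope([s.open_bugs for s in history])
    velocity_trend = slope([s.velocity for s in history])
    return loc_trend > 0 and (bug_trend > 0 or velocity_trend < 0)
```

Fed a history such as `[SprintSnapshot(10000, 5, 30), SprintSnapshot(12000, 9, 28), SprintSnapshot(15000, 14, 22)]`, the function returns `True`. What that means is still a question for a human to answer by looking at the codebase.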

11 thoughts on “Mixed Signals and Messed-Up Metrics”

  1. Some good points raised, particularly about lines of code measuring typing rather than productivity. You may be interested in a blog post one of my previous co-workers created on metrics, “http://www.cainhopwood.com/2012/01/metrics-the-good-the-bad-the-ugly/”, as he was grappling with setting targets for and measuring his development team.

  2. Love the article by Tom DeMarco. His point that we need to first make sure we are building something useful before we worry about controlling the process is well-taken. As an aside, his book Structured Analysis and System Specification (Tom DeMarco and P. J. Plauger, May 21, 1979) was instrumental in how I have viewed software design ever since. It’s still on my must-read list and has so many ideas that transcend the specifics of top-down design and apply equally well to object-oriented architectures. Even though top-down design and Yourdon methodology are no longer taught, I would still recommend the book to every junior programmer.

    • Absolutely. Control is a very tricky thing in non-linear systems and to think that we can just tweak a knob in response to a particular signal (particularly if our understanding of the signal is shaky) and get predictable results is beyond optimistic.

  3. Pingback: What Makes a Microservice “Micro”? | Form Follows Function

  4. Somewhat more micro than your examples, but I just read an excerpt from a book claiming that you should never, under any circumstances, encapsulate more than four objects. Why four? According to him, it’s because four is the maximum number: very circular. Arbitrarily chosen because it made sense to him. The needs of the project, the appropriateness of the design, the future expectations of the application: none of these are relevant. The number is the metric.

    • nice…I never trust absolute statements (which is an absolute statement 🙂 ) Particularly without an explanation of why the theory applies.

      Something that seems universal probably means that we just haven’t found the right case where it fails to apply.
