Form Follows Function on SPaMCast 446

It’s time for another appearance on Tom Cagley’s Software Process and Measurement (SPaMCast) podcast.

This week’s episode, number 446, features Tom’s essay on questions, a powerful tool for coaches and facilitators. A Form Follows Function installment based on my post “Go-to People Considered Harmful” comes next and Kim Pries rounds out the podcast with a Software Sensei column on servant leadership.

Our conversation in this episode continues the organizations-as-systems theme: how concentrating institutional knowledge in go-to people creates a dependency management nightmare. Social systems run on relationships, and when we allow knowledge and skill bottlenecks to form, we set our organization up for failure. Specialists with deep knowledge are great, but if they don’t spread that knowledge around, we risk avoidable disasters when they’re unavailable. Redundancy aids resilience.

You can find all my SPaMCast episodes under the SPaMCast Appearances category on this blog. Enjoy!

Go-to People Considered Harmful

Okay, so the title’s a little derivative, but it’s both accurate and fits in with the “organizations as systems” theme of recent posts. Just as dependency management is important for software systems, it’s just as critical for social systems. Failures anywhere along the chain of execution can bring the whole system to a halt if resilience isn’t considered in the design (and evolution) of the system.

Dependency issues in social systems can take a variety of forms. One that comes easily to mind is the “bus factor” – how badly the team is affected if a person is lost (e.g., hit by a bus). Roy Osherove’s post from today, “A Critical Chain of Bus Factors”, expands on this. Interlocking chains of dependencies can multiply the bus factor:

A chain of bus factors happens when you have bus factors depending on bus factors:

Your one developer who knows how to configure the pipeline can’t test the changes because the agent is down. The one guy in IT who knows how to fix the agent needs to reboot it, but does not have access. The one person who has access to reboot it (in the Infra team) is sick, so now there are three people waiting, and there is nothing on this earth that can help that situation.
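
To see why chains hurt so much, assume (purely for illustration) that each of those three single points of failure is independently available 90% of the time. The chain only functions when every link does, so availability compounds multiplicatively. A quick Python sketch:

```python
# Purely illustrative numbers: each link in the chain is a single
# point of failure, independently available 90% of the time. The
# chain works only when every link does, so availability compounds.
availability_per_link = 0.9
links = 3
chain_availability = availability_per_link ** links  # 0.9 * 0.9 * 0.9
print(f"chain availability: {chain_availability:.0%}")  # prints 73%
```

Three people who are each “almost always” available add up to a chain that’s unavailable more than a quarter of the time.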

The “bus factor”, either individually or as a cascading chain, is only part of the problem, however. A column on CIO.com, “The hazards of go-to people”, identifies the potential negative impacts on the go-to person:

They may:

  • Resent that they shoulder so much of the burden for the entire group.
  • Feel underpaid.
  • Burn out from the stress of being on the never-ending-crisis treadmill.
  • Feel trapped and unable to progress in their careers since they are so important in the role that they are in.
  • Become arrogant and condescending to their peers, drunk with the glory of being important.

The same column also lists potential problems for those who are not the go-to person:

When they realize that they are not one of the go-to people they might:

  • Feel underappreciated and untrusted.
  • Lose the desire to work hard since they don’t feel that their work will be recognized or rewarded.
  • Miss out on the opportunities to work on exciting or important things, since they are not considered dedicated and capable.

A particularly nasty effect of relying on go-to people is that it’s self-reinforcing if not recognized and actively worked against. People get used to relying on the specialist (which is, admittedly, very effective right up until the bus arrives) and neglect learning to do for themselves. Osherove suggests several methods to mitigate these problems: pairing, teaching, rotating positions, etc. The key idea is to spread the knowledge around.

Having individuals with deep knowledge can be a good thing if they’re a reservoir supplying others and not a pipeline constraining the flow. Intentional management of dependencies is just as important in social systems as in software systems.

Why does software development have to be so hard?

A series of 8 tweets by Dan Creswell paints a familiar, if depressing, picture of the state of software development:

(1) Developers growing up with modern machinery have no sense of constrained resource.

(2) Thus these developers have not developed the mental tools for coping with problems that require a level of computational efficiency.

(3) In fact they have no sensitivity to the need for efficiency in various situations. E.g. network services, mobile, variable rates of change.

(4) Which in turn means they are prone to delivering systems inadequate for those situations.

(5) In a world that is increasingly networked & demanding of efficiency at scale, we would expect to see substantial polarisation.

(6) The small number of successful products and services built by a few and many poor attempts by the masses.

(7) Expect commodity dev teams to repeatedly fail to meet these challenges and many wasted dollars.

(8) Expect smart startups to limit themselves to hiring a few good techies that will out-deliver the big orgs and define the future.
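
Tweet (2) is easy to nod along with in the abstract, so here is a minimal (and admittedly contrived) Python sketch of the kind of sensitivity Dan is describing: an innocuous-looking choice of data structure that changes the cost of a routine operation by orders of magnitude.

```python
import timeit

# Probe 1,000 values against a collection of 100,000 items. A list
# scan is O(n) per lookup; a set lookup is O(1) on average. Same
# answers, very different cost, and the gap only shows up at scale.
items = list(range(100_000))
as_set = set(items)
probes = list(range(0, 100_000, 100))  # 1,000 probe values

print("list:", timeit.timeit(lambda: [p in items for p in probes], number=1))
print("set: ", timeit.timeit(lambda: [p in as_set for p in probes], number=1))
```

On modern developer hardware the slow version still feels instant at small sizes, which is exactly the trap the tweets describe: nothing pushes back until production-scale data arrives.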

The Fallacies of Distributed Computing are more than twenty years old, but Arnon Rotem-Gal-Oz’s observations (five years after he first made them) still apply:

With almost 15 years since the fallacies were drafted and more than 40 years since we started building distributed systems – the characteristics and underlying problems of distributed systems remain pretty much the same. What is more alarming is that architects, designers and developers are still tempted to wave some of these problems off thinking technology solves everything.
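
The first fallacy alone (“the network is reliable”) shows the difference between waving the problem off and designing for it. A minimal sketch, assuming an illustrative URL and an arbitrary retry policy rather than prescribing one:

```python
import time
import urllib.error
import urllib.request

# Waving the problem off looks like this, assuming the network is
# reliable and latency is zero:
#     data = urllib.request.urlopen("https://example.com/").read()

def fetch(url, attempts=3, timeout=2.0):
    """Fetch with a bounded timeout, bounded retries, and backoff."""
    for attempt in range(1, attempts + 1):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                return resp.read()
        except (urllib.error.URLError, TimeoutError):
            if attempt == attempts:
                raise  # retry budget exhausted; let the caller decide
            time.sleep(2 ** attempt)  # simple exponential backoff

data = fetch("https://example.com/")  # illustrative target
```

None of this makes the network reliable; it just turns the unreliability into an explicit design decision instead of a production surprise.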

Why?

Is it really this hard to get it right?

More importantly, how do we change this?

In order to determine a solution, we first have to understand the nature of the problem. Dan’s tweets point to the machines developers are used to, although in fairness, those of us who lived through the bad old days of personal computing can attest that developers were getting it wrong back then too. In “Most software developers are not architects”, Simon Brown points out that too many teams are ignorant of, or downright hostile to, the need for architectural design. Uncle Bob Martin, in “Where is the Foreman?”, suggests that the lack of a gatekeeper to enforce standards and quality is why “our floors squeak”. Are we over-emphasizing education and underestimating training? Has increasing complexity, and the amount of abstraction used to manage it, left us with too generalized a knowledge base relative to our needs?

As with any wicked problem, I suspect that the answer to “why?” lies not in any one aspect but in the combination. Likewise, no one aspect is likely, in my opinion, to hold the answer in any given case, much less in all cases.

People can be spoiled by the latest and greatest equipment as well as by the optimal performance that comes from working and testing on the local network. However, reproducing real-world conditions is a bit more complicated than giving someone an older machine. You can simulate load and traffic on your site, but understanding and accounting for competing traffic on the local network and the internet is a bit more difficult. We cannot say “application x will handle y number of users”, only that it will handle that number of users under the exact same conditions and environment as we have simulated – a subtle, but critical difference.
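
As a concrete illustration, here is a minimal load-simulation sketch (the target URL, worker count, and request count are assumptions for the example). The caveat above applies directly to its output: the percentiles describe the conditions this script creates, from this client, over this network path, at this moment, and nothing more.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

URL = "https://example.com/"  # illustrative target

def timed_get(_):
    """Fetch URL once and return the elapsed time in seconds."""
    start = time.perf_counter()
    with urllib.request.urlopen(URL, timeout=5) as resp:
        resp.read()
    return time.perf_counter() - start

# 100 requests through 10 concurrent workers: a model of load, not a
# reproduction of real traffic competing on real networks.
with ThreadPoolExecutor(max_workers=10) as pool:
    latencies = sorted(pool.map(timed_get, range(100)))

print(f"median: {latencies[50]:.3f}s")
print(f"p95:    {latencies[95]:.3f}s")
```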

Obviously, I’m partial to Simon Brown’s viewpoint. The idea of a coherent, performant design just “emerging” from doing the simplest thing that could possibly work is ludicrous. The analogy would be walking into an auto parts store, buying components individually, and expecting them to “just work” – you have to have some idea of the end product in mind. On the other hand, attempting to specify too much up front is as bad as too little – the knowledge needed is not there, and even if it were, a single designer doesn’t scale to any system built by a team of any real size.

Uncle Bob’s idea of a “foreman” could work under some circumstances. Like Big Design Up Front, however, it doesn’t scale. Collaboration is as important to the team leader as it is to the architect. The consequences of an all-knowing, all-powerful personality can be just as dire in this role as for an architect.

In “Hordes of Novices”, Bob Martin observed, “When it’s possible to get a degree in computer science without writing any code, the quality of the graduates is questionable at best”. The problem here is that universities are geared to educate, not train. Just because training is more useful to an employer (at least in the short term) does not make education unimportant. Training deals with this tool at this time; determining which tool is right for a given situation is more the province of education. It’s the difference between how to do and how to figure out how to do. Both are necessary.

As I’ve already noted, it’s a thorny issue. Rather than offering an answer, I’d rather offer the opportunity for others to add to the conversation in the comments below. How did we get here and how do we go forward?

Specialists, Generalists and “Blame the Victim”

An article I read this weekend on TechRepublic caught my eye with its title: “Insist medical practices hire specialists rather than one generalist”. The author (Donovan Colbert) discusses a post by another author (Erik Eckel), who had been called in to clean up a mess left by a previous IT provider. The previous provider had incurred a significant cost for the client and failed to correct the issue. Donovan summed it up thusly:

What kind of business throws a Raid 10, 32 GB, SSD, dual CPU/multi-core server at a performance problem and yet has a network that some hack threw together with daisy-chained D-Link unmanaged switches? A healthcare provider.

Wow. Blame the victim much? He goes on to say:

Erik claims the problem is amateur IT consultants, but I think that is a symptom. The problem is healthcare professionals; more specifically, the doctors who run the medical practices cause many of these issues.

I wonder what doctor confronted with a specific and serious medical ailment would consider the services of a one-stop general practice to perform the necessary services to restore her to health. No MD would go in for surgery if one guy claimed he was capable of doing it all — the cutting, the clamping, the anesthetics, and any other specialties required to operate; and yet, doctors will hire one consultant to handle their entire IT solution, including systems, applications, networking, wireless, hardware, and printers.

I’m not sure what it is about health care professionals that pushes Donovan’s buttons, but it seems to me that Erik is right on the money. No doctor (or anyone else, for that matter) is going to depend on a generalist for specialist procedures, but most people don’t have a stable full of specialists on call, either. They start out with a general practitioner, who will refer them out to specialists as appropriate. That same model should work for IT as well.

Just as the patient (in most cases) isn’t a doctor, the doctor isn’t an IT person. Expecting a doctor, or for that matter any non-IT business person, to independently manage their IT services is as ridiculous as expecting a patient without any medical background to independently manage their own care.

The issue in this instance appears to be one of professionalism. The prior provider lacked the requisite knowledge and failed their customer by not calling in someone who had the skills needed. Worse, that provider caused the client to pay for something that wasn’t needed. Ironically, it was Donovan himself who identified the answer:

My gift as an IT professional is not how outrageously skilled and knowledgeable I am from one end of the IT spectrum to another; my gift is that I know enough to realize when I am out of my scope, and to say, “we need to bring in someone else who specializes in this area to deal with this issue.”