Organizations as Systems – Kurosawa, Clausewitz, and Chess

16th Century Market Scene

In order to respond appropriately to the context we find ourselves in, it helps to be able to define that context correctly. That is something humans aren’t always good at.

Not too long ago, Sun Tzu’s The Art of War was all the rage among executives. While the book contains some excellent lessons that have applications beyond the purely military, as someone in my Twitter feed noted recently, “Business is not war”.

[Had I realized that the tweet, in combination with another article, would trigger something in my byzantine thought processes, I would have bookmarked it to give them credit – sorry!]

Business is, indeed, not war. In fact, one of the nuggets of wisdom to be found in Clausewitz’s treatise, On War, is that war is often not war. Specifically, what he is saying is that the reality of a concept often diverges from our (mis)understanding of that concept. Our perception is colored by factors such as our experience, beliefs, and interests. Additionally, our tendency to employ abstraction can be both tool and trap. Ignoring irrelevant detail can simplify reasoning about something, assuming that the detail ignored is actually irrelevant. Ignoring relevant detail can quickly lead to problems.

The game of chess illustrates this. Chess involves strategy and has its origins as an abstract simulation of war. Beyond promoting a very rudimentary type of strategic thought, chess is far from capable of simulating the complex social system of warfare. Perhaps if all the pieces were sentient and had both agency and agenda (bonus points for contradictory ones potentially conflicting with the player’s agenda), it might come closer. Perhaps if the boundaries of the arena were indeterminate, it might come closer. Perhaps if the state of the terrain, the composition and disposition of forces (friend, as well as foe), and the goals of the opponent were less transparent, it might come closer.

In short, the more certainty there is, the less accuracy there is. Where the human aspect is ignored or minimized, you may gain certainty, but it comes at the cost of losing contact with reality. Social systems are highly complex and treating them otherwise is like looking for a gas leak with a lighter – you may be able to do so, but your chances of liking the results are pretty small.

This post was originally planned for last week, but I stumbled into a Twitter conversation that illustrates my point (specifically regarding leadership and management), so I wrote about that first as a preamble. Systems of practice designed for a context where value equals effort expended are unlikely to work well in a knowledge work context where the relationship between effort and value is less direct (where, in fact, the value curve may invert past a certain point). Putting an updated veneer on the technique with data and algorithms won’t improve the results if the technique is fundamentally mismatched to the context (or if there is a disconnect between what you can measure and what you actually want). Sometimes, the most important thing to learn about management is when not to manage.

Disconnects between complex contexts and simplistic practices transcend the management of an organization, reaching into the very architecture of the enterprise itself (both in the organization and its relationship to its ecosystem). Poorly designed organizations (including those with no intentional design) can leave their employees facing perverse incentives to act against the best interests of the organization. When the employee is actually under pressure from the organization to sabotage the organization, the problem is not with the employee.

Just as with a software system, social systems have both problem and solution architectures. Likewise, in both cases the quality of the solution architecture depends on how well (or not) it addresses the architecture of the problem. Recognizing the various contexts in play and then resolving the conflicts between them (including the challenges that arise from resolving the original conflicts) is the essence of architectural design, regardless of the type of system (software or social). Rather than a static, one-time activity, it is an ongoing process of sensing system health and responding appropriately throughout the lifecycle of the system (in fact, stopping that process will likely hasten the end of the lifecycle by allowing the system to reach a state where it can no longer be corrected).

Form Follows Function on SPaMCast 411


This week’s episode of Tom Cagley’s Software Process and Measurement (SPaMCast) podcast, number 411, features Tom’s essay on Servant Leadership (which I highly recommend), John Quigley on managing requirements as a part of product management, a Form Follows Function installment based on my post “Organizations as Systems – ‘Uneasy Lies the Head that Wears the Crown'”, and Kim Pries on software craftsmanship.

Tom and I discuss the danger of trying to use simplistic explanations for the interactions that make up complex human systems. No one has the power to force things in a particular direction; rather, the direction comes about as a result of the actions and interactions of everyone involved. It might be comforting to believe that there’s one single lever for change, but it’s wrong.

You can find all my SPaMCast episodes under the SPAMCast Appearances category on this blog. Enjoy!

Organizations as Systems – “Uneasy Lies the Head that Wears the Crown”

Bavarian Crown and Regalia, Royal Treasury Munich


One of the benefits of having a (very) wide range of interests is that every so often a flash of insight gets dropped into my lap. In this case, it was a matter of “We must recognise that single events have multiple causes” showing up as a suggested read from Aeon on the same day that Thomas Power retweeted this:

The image in the tweet is an excerpt from an interview with Rory Stewart, Conservative Member of Parliament for Penrith in the UK. The collision of themes between the two articles struck me.

“You get there and you pull the lever, and nothing happens.”

The behavior of a system is determined not by the structure of the components of that system, but by the relationships and interactions between those components. Moreover, those relationships and interactions are dynamic and complex, even when that’s contrary to the designer’s intent. In fact, the gap between the behavior as intended and as experienced introduces a tension. I would argue that it’s less a matter of nothing happening when the “lever” is pulled and more that something different from what’s expected happens. Rather than simple cause and effect, “if this, then that”, multiple factors are in play.

In mechanical systems, parts wear, subtly changing the physics of the mechanism. Foreign objects invading the system can impose change in a more dramatic fashion. Context, both that of the system’s internals and its environment, influences its operation.

As was noted in the Aeon article, agency adds to the complexity. In social systems, all of the “components” are individuals with agency, making those systems chaotic in at least the colloquial sense of the word. Using Tom Graves’ sense-making framework, SCAN, these interactions fall into the more uncertain quadrants, either “Ambiguous but Actionable” or “Not-known, None-of-the-above”. Attempting to deal with them as though they fell into the “Simple and Straightforward” quadrant increases the likelihood of getting unexpected results.

Learning/sense-making is critical to dealing with change, whether internal or external (or both). The manner in which change is appreciated and reacted to affects the health of the system. Consider three boilers: one where pressure is continuously monitored and adjusted, one equipped with a pressure relief valve that will open prior to a catastrophic failure, and one where problems are signaled by an explosion. It’s a trivial exercise to come up with examples of social systems, from businesses all the way up to political systems, using the third method. It’s probably a more interesting exercise to consider why that’s the case for so many.
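To make the boiler comparison concrete, here is a toy simulation. Everything in it is a hypothetical assumption chosen for illustration (the burst pressure, the valve threshold, the set point, and the steady rise of 15 units per step); it is not a model of any real boiler or organization. It shows the trade-off in miniature: continuous sensing makes frequent small corrections, the relief valve does nothing until it makes a large, disruptive one, and the unmonitored boiler only reports its problem by failing.

```python
from dataclasses import dataclass

# All numbers are hypothetical, chosen only to illustrate the three strategies.
BURST_AT = 100.0   # pressure at which the boiler fails
VALVE_AT = 90.0    # pressure at which the relief valve opens
SET_POINT = 70.0   # target pressure for the continuously monitored boiler


@dataclass
class Outcome:
    exploded: bool
    corrections: int           # how many times anything intervened
    largest_correction: float  # size of the biggest single pressure drop


def run(strategy: str, inflows: list[float]) -> Outcome:
    """Apply a sequence of pressure increases to one boiler.

    strategy is "monitor", "relief", or "explode" (no sensing at all).
    """
    pressure = 0.0
    corrections = 0
    largest = 0.0
    for inflow in inflows:
        pressure += inflow
        drop = 0.0
        if strategy == "monitor" and pressure > SET_POINT:
            # Sense every step and trim back to the set point: small, frequent fixes.
            drop = pressure - SET_POINT
        elif strategy == "relief" and pressure >= VALVE_AT:
            # Nothing happens until the valve trips, then it vents to half the set point.
            drop = pressure - SET_POINT / 2
        # "explode" never sets drop, so pressure only ever rises.
        if drop > 0:
            pressure -= drop
            corrections += 1
            largest = max(largest, drop)
        if pressure >= BURST_AT:
            return Outcome(True, corrections, largest)
    return Outcome(False, corrections, largest)


if __name__ == "__main__":
    steady_rise = [15.0] * 10
    for strategy in ("monitor", "relief", "explode"):
        print(strategy, run(strategy, steady_rise))
```

With that steady rise over ten steps, the monitored boiler makes six corrections of at most 15, the relief-valve boiler makes two corrections of 55 and 60, and the third boiler bursts on the seventh step.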

In a recent post, “Architecting the shadows”, Tom Graves discussed the phenomenon of ad hoc, unofficial “shadow” organizational interactions that arise in order to get work done:

In SCAN terms, we could summarise the generic positioning of all ‘shadow’ functions – shadow-IT, shadow-business-models, shadow-management and more – much as follows:

Scan Diagram: Official vs. Shadow

In other words, the ‘shadow’-world exists to deal with and resolve all the uncertainties and over-simplifications that overly-mechanistic management models tend to overlook. Even in more aware management-models, in which some exploration of the uncertain is officially sanctioned and allowed, the shadow-world will still always need to exist – particularly whenever the work gets closer towards real-time action:

Scan Diagram: Official vs. Shadow showing sanctioned Shadow Activity

In closing the post, Tom makes the following observation:

As the literal ‘the architecture of the enterprise’, a real enterprise-architecture must, by definition, cover every aspect of the enterprise – including all of the ‘shadow’-elements. And yet, also by definition, those ‘shadow’-elements cannot be brought ‘under control’ – not least because they deal with the themes and factors that are beyond the reach of conventional concepts of ‘control’.

The “conventional concepts of ‘control'”, the deluded belief that complex interactions can be managed as though they were simple, poses immense risks to organizations. Even attempting to treat those interactions as merely complicated, rather than complex, introduces a gap between reality and perception, between “the way we do things” and the way things actually get done. When the concept and reality of the system’s interactions differ, it’s more likely that the components of the system will wind up working at cross-purposes.

In a comment on Tom’s post, I noted that where the shadow elements are a “French Resistance”, flouting the rules in order to actually get work done, that’s a red flag.

The most important thing to learn about management and governance is knowing when and how to manage or govern and, more importantly, when not to. Knowing what can actually be controlled is an important first step.

Form Follows Function on SPaMCast 389


This week’s episode of Tom Cagley’s Software Process and Measurement (SPaMCast) podcast, number 389, features Tom’s essay on Agile acceptance testing, Kim Pries talking about soft skills, and a Form Follows Function installment on sense-making and decision-making in the practice of software architecture.

Tom and I discuss my post “OODA vs PDCA – What’s the Difference?”. We talk about what differentiates John Boyd’s Observe-Orient-Decide-Act loop from the Plan-Do-Check-Act cycle made famous by Deming.

You can find all my SPaMCast episodes under the SPAMCast Appearances category on this blog. Enjoy!

NPM, Tay, and the Need for Design

Take a couple of seconds and watch the clip in the tweet below:

While it would be incredibly difficult to predict that exact outcome, it is also incredibly easy to foresee that it’s a possibility. As the saying goes, “forewarned is forearmed”.

Being forewarned and forearmed is an important part of what an architect does. An architect is supposed to focus on the architecturally significant aspects of a system. I like to use Ruth Malan’s definition of architectural significance due to its flexibility:

Decisions (both those that were made and those that were left unmade) that end up taking systems offline and causing very public embarrassment are, in my opinion, architecturally significant.

Last week, two very public, very foreseeable failures took place: first, the chaos caused by a developer removing his modules from NPM; then, Microsoft having to pull the plug on its Tay chatbot after it was “trained” to spew offensive comments in less than 24 hours. In my opinion, both were design failures resulting from a lack of due consideration of the context in which these systems would operate.

After all, can anyone really claim that no one would expect people on the internet to try to “corrupt” a chatbot? According to Azeem Azhar, as quoted in Business Insider, not really:

“Of course, Twitter users were going to tinker with Tay and push it to extremes. That’s what users do — any product manager knows that.

“This is an extension of the Boaty McBoatface saga, and runs all the way back to the Hank the Angry Drunken Dwarf write in during Time magazine’s Internet vote for Most Beautiful Person. There is nearly a two-decade history of these sort of things being pushed to the limit.”

The current claim, as reported in, is that Tay was developed with filtering built-in, but there was a “critical oversight” for a specific kind of attack. According to the article, it’s believed that the attack vector involved asking Tay to “repeat after me”.

Or, as Matt Ballantine put it:

Likewise, who could imagine issues with a centralized repository of cascading dependencies? Failing to consider what would happen if someone suddenly pulled one of the bottom blocks out led to a huge inconvenience to anyone depending on that module or any downstream module. There’s plenty of blame to go around: the developer who took his toys and went home, those responsible for NPM’s design, and those who depended on it without understanding its weaknesses.
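The cascade is easy to reason about once the dependency graph is written down. The sketch below assumes a simple, hypothetical registry (the package names are invented, and this is not how NPM itself works internally): it inverts a “depends on” map and walks it breadth-first to find every package that directly or transitively depends on a removed one.

```python
from collections import deque


def affected_by_removal(depends_on: dict[str, list[str]], removed: str) -> set[str]:
    """Return every package that directly or transitively depends on `removed`.

    `depends_on` maps each package name to the packages it depends on.
    """
    # Invert the graph: for each package, record who depends on it.
    dependents: dict[str, set[str]] = {}
    for pkg, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(pkg)

    # Breadth-first walk "up" the graph from the removed package.
    affected: set[str] = set()
    queue = deque([removed])
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, ()):
            if parent not in affected:
                affected.add(parent)
                queue.append(parent)
    return affected


if __name__ == "__main__":
    # Hypothetical registry with one tiny package at the bottom of the stack.
    registry = {
        "web-app": ["http-framework", "logger"],
        "http-framework": ["string-utils"],
        "cli-tool": ["logger"],
        "logger": ["string-utils"],
        "string-utils": ["tiny-pad"],
        "tiny-pad": [],
        "unrelated": [],
    }
    print(sorted(affected_by_removal(registry, "tiny-pad")))
```

Running it prints `['cli-tool', 'http-framework', 'logger', 'string-utils', 'web-app']`: removing one tiny package at the bottom breaks five others, four of which never reference it directly, while the unrelated package is untouched. Knowing who sits downstream of what is the starting point for the risk mitigation discussed next.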

“The Iron Law of Tools” is “that which does for you will also do to you”. Understanding the trade-offs allows you to plan for risk mitigation in advance. Ignoring them merely ensures that they will have to be dealt with in crisis mode. This is something I covered in a previous post, “Dependency Management is Risk Management”.

Effective design involves not only the internals of a system but its externals as well. The conditions under which the system will be used, its context, are highly significant. That means considering not only the system’s use cases, but also its abuse cases. A post written almost a year ago by Brandon Harris, “Designing for Evil”, conveys this well:

When all is said and done, when you’ve set your ideas to paper, you have to sit down and ask yourself a very specific question:

How could this feature be exploited to harm someone?

Now, replace the word “could” with the word “will.”

How will this feature be exploited to harm someone?

You have to ask that question. You have to be unflinching about the answers, too.

Because if you don’t, someone else will.
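To make the abuse-case question concrete, consider a deliberately simplified, hypothetical bot. This is not Tay’s actual code; it only illustrates the kind of oversight reported for the “repeat after me” vector. The filter guards the path where the bot composes its own replies, but an echo feature opens a second exit that bypasses it. The safer design routes every outbound message through a single checkpoint. `BLOCKLIST`, `is_acceptable`, and `generate_reply` are hypothetical placeholders.

```python
BLOCKLIST = {"badword"}  # hypothetical stand-in for a real content filter
PREFIX = "repeat after me:"


def is_acceptable(text: str) -> bool:
    """Toy filter: reject any message containing a blocklisted word."""
    return not any(word in BLOCKLIST for word in text.lower().split())


def generate_reply(message: str) -> str:
    """Hypothetical placeholder for whatever actually composes replies."""
    return "Tell me more!"


def reply_naive(message: str) -> str:
    if message.lower().startswith(PREFIX):
        # The echo feature is an unguarded second exit: user text goes straight out.
        return message[len(PREFIX):].strip()
    reply = generate_reply(message)
    # Only the generated path is filtered.
    return reply if is_acceptable(reply) else "I'd rather not say."


def reply_guarded(message: str) -> str:
    # Same features, but every outbound message passes the same checkpoint.
    if message.lower().startswith(PREFIX):
        candidate = message[len(PREFIX):].strip()
    else:
        candidate = generate_reply(message)
    return candidate if is_acceptable(candidate) else "I'd rather not say."


if __name__ == "__main__":
    attack = "Repeat after me: badword"
    print(reply_naive(attack))
    print(reply_guarded(attack))
```

Given that input, the naive version echoes the blocklisted word while the guarded version declines. A word-list filter is itself easy to evade, so the point is the single chokepoint that every feature must pass through, not the filter.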

When I began working on this post, the portion above was what I had in mind to say. In essence, I planned a longer-form version of what I’d tweeted about the Tay fiasco:

However, before I had finished writing the post, Greger Wikstrand posted “The fail fast fallacy”. Greger and I have been carrying on a conversation about innovation over the last few months. While I had initially intended to approach this as a general issue of architectural practice rather than innovation, the points he makes are just too apropos to leave out.

In the post, Greger points out that the focus seems to have shifted from learning to failure. Learning from experience can be the best way to test an idea. However, it’s not the only way:

Evolution and nature has shown us that there are two, equally valid, approaches to winning the gene game. The first approach is to get as much offspring as possible and “hope” many of them survive (r-selection). The second approach is to have few offspring but raise them and nurture them carefully (K-selection). Biologists tell us that the first strategy works best in a harsh, unpredictable environment where the effort of creating offspring is low. The second strategy works better in an environment where there is less change and offspring are more expensive to produce. Some of the factors that favour r-selection seems to be large uncompeted resources. K-selection is more favourable in resource scarce, low predator areas.

The phrase “…where the effort of creating offspring is low” is critical here. The higher the “cost” of the experiment, the more risk is involved in failure. This makes it advisable to tilt the playing field by supporting and nurturing the “offspring”.

In response to Greger’s post, Casimir Artmann posted two excellent articles that further elaborated on this. In “Fail Fast During Adventures”, he noted that “There is a fine line between fail fast and Darwin Awards in IRL.” His point: preparing beforehand, and being willing to abort an experiment before failure becomes the equivalent of a fatality, can be effective learning strategies. Lessons that you don’t live to apply aren’t worth much.

Casimir followed with “Fail is not an Option”, in which he stated:

I want the project to succeed, but I plan for things going wrong so that the consequences wouldn’t be to huge. Some risk are manageable, as walking alone, but not alone and off-trail. That’s to risky. If you doing outdoor adventures, you are probably more prepared and skilled than a ordinarie project member, and thats a huge benefit.

I guess the best advice, when doing completely new things with IT, is to start really small so that the majority of your business is not impacted if there is a failure. When something goes wrong, be sure that you could go back to safe place. Point of no return is like being on a sailing boot in the middle of the Atlantic where you can’t go back.

That’s excellent advice. “Fail Fast” has the advantage of being able to fit on a bumper sticker, but the longer, more nuanced version is more likely to serve you well.

OODA vs PDCA – What’s the Difference?


In my post “Architecture and OODA Loops – Fast is not Enough”, I stated that sense-making and decision-making were critical skills for the practice of software architecture. I further stated that I found the theories of John Boyd, particularly his OODA loop, useful in understanding and describing effective sense-making and decision-making. My conclusion was that in order to decide and act in the most effective manner possible, one must observe the context as effectively as possible and orient, or make sense of those observations, as effectively as possible. In other words, the quality of a decision depends on the quality of the cognition behind it.

One of the many things that I enjoy about blogging and engaging on media like Twitter and LinkedIn is the feedback. The questions and comments I get in response to one post ensure that I don’t have to worry about writer’s block getting in the way of my next one. In this instance, Greger Wikstrand obliged, providing the topic for this post:

As I replied to Greger, I was familiar with the PDCA cycle, which has some similarities to Boyd’s OODA loop, but I preferred OODA for a variety of reasons. Before I get into those reasons, however, it would be useful to lay out what distinguishes the two methods. In a guest post on the blog Slightly East of New, “PDCA vs. OODA — Why not take both?”, Deane Lenane does just this:

Some people who are familiar with the canon of both of these men’s work, often fall into the error of seeing the O-O-D-A loop as a function of the P-D-C-A loop or vice versa. I think this is a mistake.

The P-D-C-A cycle or loop is primarily an analytical approach that can be used with great success in a completely internal manner. One does not need to consult the external environment or adjust to unfolding circumstances to make the P-D-C-A loop work. P-D-C-A can be used with great success on the shop floor with the data that is available. Analysis which involves the use of a more or less complete data set to reach a conclusion. We use the data to make a decision about how to proceed, we than check and act to confirm or reject the hypothesis that our analysis has led us to.

O-O-D-A is more concerned with synthesizing an action out of an incomplete data set. Since we can never recognize all of the variables that we are forced to deal with in any environment, we must be able to make a decision that we believe will give us the highest probability for success. The synthesis of an action from the observation and orientation of a complex and mysterious environment, subject to frequent and unpredictable change, is the essence of the O-O-D-A loop.

My conclusion is that P-D-C-A is primarily involved with analysis perhaps using some synthesis and that O-O-D-A is primarily involved with synthesis using all of the analytical data points possible but considering that the data set will always be largely incomplete.

Three aspects of Lenane’s differentiation point me toward OODA: “environment” (i.e. the evaluation of the external context we must deal with, rather than PDCA’s more inward-looking focus), “synthesizing an action”, and “incomplete data set”. Systems exist in an environment, an ecosystem with a back-story. Failing to account for those ensures problems, if not outright failure. Likewise, our aim is to make the best decision we can under uncertainty. In my opinion, these concepts mesh nicely with the practice of architecture.

Another aspect relates back to the PCA acronym in Greger’s tweet. Finding a good resource on it took us some research, because:

However, Greger did find one (slide 3 of this deck). Essentially, it states that Perception of the world leads to Cognition, which leads to Action that changes the world. Perception is the key word here. How we perceive reality is arguably more important than reality itself, because that perception colors the thinking that drives our action. The Orient portion of the OODA loop explicitly recognizes that our observations are filtered through our experience, education, biases, etc. Understanding and adjusting for this (to the extent we can) is important. As Tom Graves observed in “Enterprise-architect – applied-scientist, or alchemist?”:

Perhaps the most important thing here is to notice things that don’t fit our expectations – they’re often very easy to miss, especially given Gooch’s Paradox, that “things not only have to be seen to believed, but also have to be believed to be seen”.

Software architecture involves making decisions in the presence of uncertainty. In order to make the best decisions possible, we need to have the best possible grasp of our context (n.b. remembering that we are part of that context). We also need to remember that the context is not static. Each action (or inaction, for that matter) can lead to the emergence of some new issue. We can’t operate with a “one and done” philosophy.

[PDCA Loop diagram by Tagimaguiter via Wikimedia Commons]

First Do No Harm – the Practice of Software Development

Medieval Anatomy Illustration

Analogies are never perfect, but reading Erik Dietrich’s “Do Programmers Practice Computer Science?” brought one to mind. Software development has much in common with the practice of medicine. Software development, like medicine, involves the application of knowledge. Also like medicine, this application is made complex by considerations of context. Yet another commonality is that in both disciplines there are (or, at least, should be) limits on experimentation.

Erik’s post used the following comparison of developers to electricians:

Let’s consider three actors in the realm of physics, as a science.

  1. A physicist, who runs electricity through things to see if they explode.
  2. An electrical engineer, who takes the knowledge of what explodes from the physicist and designs circuitry for houses.
  3. An electrician, who builds houses using the circuits designed by the electrical engineer.

I list these out to illustrate that there are layers of abstraction on top of actual science. Is an electrician a scientist, and does the electrician use science? Well, no, not really. His work isn’t advancing the cause of physics, even if he is indirectly using its principles.

Let’s do a quick exercise that might be a bit sobering when we think of “computer science.” We’ll consider another three actors.

  1. Discrete mathematician, looking to win herself a Fields medal for a polynomial time factoring algorithm.
  2. R&D programmer, taking the best factoring algorithms and turning them into RSA libraries.
  3. Line of business programmer, securing company’s Sharepoint against script kiddies uploading porn.

Programming is knowledge work and non-repetitive, so the comparison is unfair in some ways. But, nevertheless, what we do is a lot more like what an electrician does than what a scientist does. We’re not getting paid to run experiments — we’re getting paid to build things.

There is definitely some validity in this. The three roles in each example have many similarities. His observation that development work is “non-repetitive”, however, is key. Electricians work in a more certain context than doctors, who may need to account for body chemistry or metabolism. Likewise, developers may find that environmental factors (e.g. memory usage profile, network load, etc.) introduce uncertainty into their work. Whereas the plumbing and electrical systems in a house are mostly separate, biological systems and information systems tend to be more intertwined.

Another similarity between software development and the practice of medicine is the feedback loop. The physicist will never hear back from the electrician, but physicians doing research are not similarly removed from practitioners. Practice and theory in medicine have a chicken-and-egg relationship where neither is clearly dominant, but each influences the other. The same is true of software development. In both cases, ethics and practicality constrain pure research.

As Erik noted, developers are “…not getting paid to run experiments — we’re getting paid to build things”. That being said, the uncertainties mean that, like physicians, we can’t be positive about the exact outcome without trying a particular course of action (which isn’t really an experiment):

Like doctors, those of us involved in software development have an ethical obligation to let our “patients” know when we’re learning on the job and what the risks are (not to mention the obligation to try things that are in their best interests and not just something we want to test drive). Beyond considerations of professionalism, more open communication has its benefits: we can solve problems and advance the practice at the same time.