Back to the OODA – Making Design Decisions

OODA Loop Diagram

A few weeks back, in my post “Enterprise Architecture and the Business of IT”, I mentioned that I was finding myself drawn more and more toward Enterprise Architecture (EA) as a discipline, given its impact on my work as a software architect. Rather than a top-down approach, seeking to design an enterprise as a whole, I find myself backing into it from the bottom-up, seeking to ensure that the systems I design fit the context in which they will live. This involves understanding not only the technology, but also how it interacts, or not, with multiple social systems in order to (ideally) carry out the purpose of the enterprise.

Tom Graves is currently in the middle of a series on whole-enterprise architecture on his Tetradian blog. The third post of the series, “Towards a whole-enterprise architecture standard – 3: Method”, focuses on the need for a flexible design method:

But as we move towards whole-enterprise architecture, we need architecture-methods that can self-adapt for any scope, any scale, any level, any domains, any forms of implementation – and that’s a whole new ball-game…

He further states:

the methods we need for the kind of architecture-work we’re facing now and, even more, into the future, will need not only to work well with any scope, any scale and so on, but must have deep support for non-linearity built right into the core – yet in a way that fully supports formal rigour and discipline as well.

To begin answering the question “But where do we start?”, Tom looks at the Plan-Do-Check-Act (PDCA) cycle, which he finds wanting because:

… the reality is that PDCA doesn’t go anything like far enough for what we need. In particular, it doesn’t really touch all that much on the human aspects of change

This is a weakness that I mentioned in “OODA vs PDCA – What’s the Difference?”. PDCA, by starting with “Plan” and without any reference to context, is flawed in my opinion. Even if one argues that assessing the context is implied, leaving it to an implication fails to give it the prominence it deserves. In his post, Tom refers to ‘the squiggle’, a visualization of the design process:

Damien Newman's squiggle diagram

In an environment of uncertainty (which pretty much includes anything humans are even peripherally involved with), exploration of the context space will be required to understand the gross architecture of the problem. In reconciling the multiple contexts involved, challenges will emerge and will need to be integrated into the architecture of the solution as well. This fractal and iterative exploratory process is well represented by ‘the squiggle’ and, in my opinion, well described by the OODA (Observe, Orient, Decide, and Act) loop.

In “Architecture and OODA Loops – Fast is not Enough”, I discussed how the OODA loops maps to this kind of messy, multi-level process of sense-making and decision-making. The “and” is extremely important in that while decision-making with little or no sense-making may be quick, it’s less likely to effective due to the disconnect from reality. On the other hand, filtering and prioritizing (parts of the Orient component of the loop) is also needed to prevent analysis paralysis.

In my opinion, its recognition and handling of the tension between informed decision-making and quick decision-making makes OODA an excellent candidate for a design meta-method. It is heavily subjective, relying on context to guide appropriate response. That being said, an objective method that’s divorced from context imposes a false sense of simplicity with no real benefit (and very real risks).

Reality is messy; our design methodology should work with, not against that.

[OODA Loop diagram by Patrick Edwin Moran via Wikimedia Commons]

Advertisements

Microservices, Monoliths, and Conflicts to Resolve

Two tweets, opposites in position, and both have merit. Welcome to the wonderful world of architecture, where the only rule is that there is no rule that survives contact with reality.

Enhancing resilience via redundancy is a technique with a long pedigree. While microservices are a relatively recent and extreme example of this, they’re hardly groundbreaking in that respect. Caching, mirroring, load-balancing, etc. has been with us a long, long time. Redundancy is a path to high availability.

Centralization (as exemplified by monolithic systems) can be a useful technique for simplification, increasing data and behavioral integrity, and promoting cohesion. Like redundancy, it’s a system design technique older than automation. There was a reason that “all roads lead to Rome”. Centralization provides authoritativeness and, frequently, economies of scale.

The problem with both techniques is that neither comes without costs. Redundancy introduces complexity in order to support distributing changes between multiple points and reconciling conflicts. Centralization constrains access and can introduce a single point of failure. Getting the benefits without the incurring the costs remains a known issue.

The essence of architectural design is decision-making. Given that those decisions will involve costs as well as benefits, both must be taken into account to ensure that, on balance, the decision achieves its aims. Additionally, decisions must be evaluated in the greater context rather than in isolation. As Tom Graves is fond of saying “things work better when they work together, on purpose”.

This need for designs to not only be internally optimal, but also optimized for their ecosystem means that these, as well as other principles, transcend the boundaries between application architecture, enterprise IT architecture, and even enterprise architecture. The effectiveness of this fractal architecture of systems of systems (both automated and human) is a direct result of the appropriateness of the decisions made across the full range of the organization to the contexts in play.

Since there is no one context, no rule can suffice. The answer we’re looking for is neither “microservice” nor “monolith” (or any other one tactic or technique), but fit to purpose for our context.

When Reality Gets in the Way – Applying Systems Thinking to Design

It’s easy to sympathize with this:

It’s also more than a little dangerous if our desire for simplicity moves us to act as if reality isn’t as complex as it is. Take, for example, a recent tweet from John Allspaw about over-simplification:

My observation in return:

As I noted in my previous post, it’s part of human nature to gravitate towards easy answers. We are conditioned to try to impose rules on reality, even when those rules are mistaken. Sometimes this is the result of treating symptoms in an ad hoc manner, as evidenced by this recent twitter exchange:

This goes by the name of the “balloon effect”, pressure on one area of the problem just pushes it into another in the way that squeezing a balloon displaces the air inside.

Sometimes our response is born of bias. In sociology, for example, this phenomenon has its own name: “normative sociology”:

The whole “normative sociology” concept has its origins in a joke that Robert Nozick made, in Anarchy, State and Utopia, where he claimed, in an offhand way, that “Normative sociology, the study of what the causes of problems ought to be, greatly fascinates us all”(247). Despite the casual manner in which he made the remark, the observation is an astute one. Often when we study social problems, there is an almost irresistible temptation to study what we would like the cause of those problems to be (for whatever reason), to the neglect of the actual causes. When this goes uncorrected, you can get the phenomenon of “politically correct” explanations for various social problems – where there’s no hard evidence that A actually causes B, but where people, for one reason or another, think that A ought to be the explanation for B.

Some historians likewise have a tendency to over-simplify, fixating on aspects that “ought to be” rather than determining what is (which is another way of saying what can be reasonably defended).

Decision-making is the essence of design. Thought processes that poorly match reality, whether due to bias or insufficient analysis or both, are unlikely to yield optimal results. Systems thinking, “…viewing ‘problems’ as parts of an overall system, rather than reacting to specific parts, outcomes or events, and thereby potentially contributing to further development of unintended consequences”, is an approach more likely to achieve a successful outcome.

When the end result will be a software system integrated into a social system (i.e. a system that is a component of an ecosystem), it makes sense to understand the problem space as the as-is system to be remediated. This holds true whether that as-is system is an automated one or not. While it is not feasible to minutely analyze the problem space, much less design in detail the solution, failing to appreciate the full context on a high level presents risks. These risks include not only those inherent in satisfying the needs of the overlooked context(s), but also those challenges that emerge from the interactions of the various contexts that make up the problem space.

Deciding on a particular design direction is, obviously, a decision. Deferring that determination is, likewise, a decision. Refusing to make a definite decision is a decision as well. The answer is not to push all decisions off to as late a date as possible, but to make decisions in the moment that are defensible given the information at hand. Looking at the problem space as a whole in the context of its ecosystem provides the perspective required to make the optimal decision.

Law of Unintended Consequences – Security Edition

Bank Vault

More isn’t always better. When it comes to security, more can even be worse.

As the use of encryption has increased, management of encryption keys has emerged as a pain point for many organizations. The amount of encrypted data passing through corporate firewalls, which has doubled over the last year, poses a severe challenge to security professionals responsible for protecting corporate data. The mechanism that’s intended to protect information in transit does so regardless of whether the transmission is legitimate or not.

Greater complexity, which means greater inconvenience, can lead to decreased security. Usability increases security by increasing compliance. Alarm fatigue means that as the number of warnings increase, so does the likelihood of their being ignored

Like any design issue, security should be approached from a systems thinking viewpoint (at least in my opinion). Rather than a one-dimensional, naive approach, a holistic one that recognizes and deals with the interrelationships is more likely to get it right. Thinking solely in terms of actions while ignoring the reactions that result from them hampers effective decision-making.

To be effective, security should be comprehensive, coordinated, collaborative, and contextual.

Comprehensive security is security that involves the entire range of security concerns: application, network, platform (OS, etc.), and physical. Strength in one or more of these areas means little if only one of the others is fatally compromised. Coordination of the efforts of those responsible for these aspects is essential to ensure that the various security enhance rather than hinder security. This coordination is better achieved via a collaborative process that reconciles the costs and benefits systemically than a prescriptive one imposed without regard to those factors. Lastly, practices should be tailored to the context of the problem at hand. Value at risk and amount of exposure are two factors that should help determine the effort expended. Putting a bank vault door on the garden shed not only wastes money, but also hinders security by taking those resources away from an area of greater need.

As with most quality of service concerns, security is not a binary toggle but a continuum. Matching the response to the need is a good way to stay on the right side of the law of unintended consequences.

Of Blind Men and Elephants and Excessive Certainty

Blind men and the elephant

There’s an old poem about six blind men and an elephant, where each in turn declare that an elephant is like a wall, a spear, a snake, a tree, a fan, and a rope. Each accurately described what he was able to discern from his own limited point of view, yet all were wrong about the subject as a whole. As the poet noted:

Moral:

So oft in theologic wars,
The disputants, I ween,
Rail on in utter ignorance
Of what each other mean,
And prate about an Elephant
Not one of them has seen!

Sometimes our attitudes color our perception of others:

Management is often the butt of our disdain, expressed in cartoon form:

However, as Sandro Mancuso related in “Not all managers are stupid”:

I still remember the day when our managers in a large organisation told us we should still go live after we reported a major problem a couple of months before the deadline…There was a problem in a couple of unfinished flows, which would cause hundreds of thousands of trades to be misreported to the regulators. After we explained the situation, managers told us to work harder go ahead with the release anyway.

How could they tell us to go live in a situation like that? They should all be fired. Arrested. How could they ask us to drop the quality and go live with a known problem of that size?…

More than once we made it clear that focusing our time on getting the system ready to production would not gives us any time to finish the automation for the problematic flows and thousands of trades would be misreported. But they did not listen. Or so we thought.

After a few meetings with the business, we discovered a few things. They were not being irresponsible or stupid, as we developers thought. The deadline was set by the regulators and could not be moved. The cost of not reporting the trades was far higher than misreporting them. Not reporting the trades would not only be followed by heavy fines, but also by possible reputation damage. Companies would have extra time to correct any misreported trades before being fined.

For us, in the development team, it was the first time we realised that going live with a few known issues would be better than not going live at all.

Designing the architecture of a solution, at its core, is an exercise in decision-making. Whether the system in question is a software system or a human system, effective decision-making must be preceded by sense-making to identify the architecture of the problem. Contexts need to be identified in order to be synthesized into the architecture of the problem.

Bias, being too certain of our understanding to make the effort to validate it, is a good way to miss out on what’s in front of us. Failing to recognize our potential for bias makes it harder to overcome that bias. That failure restricts our ability to appreciate the full range of contexts to be synthesized and puts us in the same position as the blind men with the elephant.

It’s extremely difficult to solve a problem you don’t understand.

Architecture in Context – Part 2

By Charlie Alfred and Gene Hughson

Up until this point, we’ve discussed what it means for Architecture to have context.  Contexts enable us to reason about the behavior of a group of stakeholders, whether they be buyers, end users, support staff, distributors, manufacturers or suppliers.

As we’ve pointed out, virtually all products or services are multi-context.  This means that the architecture of these products or services, if they expect to be effective, must also be multi-context.  A multi-context architecture is a lot like juggling.  You must balance your attention across an array of concerns and try to satisfy an array of stakeholders.  The following sidebar illustrates this point:

In the early 1960’s, the United States found itself engaged in a space race with the USSR.  Government officials in both countries  believed strongly that the first country to travel successfully in space and land men on the moon would have a significant military advantage.

President Kennedy collaborated with NASA to initiate a space program designed to put men on the moon’s surface and return them safely to earth, and achieve this by the end of the 1960’s.  On July 20, 1969, the Apollo 11 mission was successful at landing Neil Armstrong and Buzz Aldrin on the moon, and four days later, on July 24th, 1969, the crew safely splashed down in  the Pacific Ocean, where they were brought to safety by the USS Hornet.

The Apollo 11 mission is a high point of 20th Century US history for many.  It also does an excellent job of highlighting several concepts related to Architecture in Context.

  • Contexts:
    • U.S. Executive Branch,
    • U.S. Legislative Branch,
    • NASA astronauts
    • NASA scientists and engineers
    • U.S. Taxpayers, etc.
  • Value Expectations:
    • Land astronauts safely on the moon during 1960’s
    • Return the astronauts to earth safely
  • Pain Points:
    • Ceding space to the USSR would put the US Military at a strategic disadvantage
  • Priorities:
    • US President had more clout in this mission than Legislative Branch and NASA
    • Return the astronauts safely to earth is higher priority than landing on the moon
    • Landing on the moon is more important for NASA and US Executive Branch than for US Taxpayers as a whole
  • Challenges:
    • Difficulty of lunar landing complicated by moon’s craters and hills
    • Difficulty of space capsule reentry is driven by speed, atmospheric friction, and earth’s orbit (i.e.land in water, close enough to battleship)
    • Size and design of the space capsule depends on the power of the booster rockets that are needed to launch the capsule beyond the earth’s gravitational pull

[http://en.wikipedia.org/wiki/List_of_Apollo_missions]

Due to space constraints (pun intended), this sidebar merely scratches the surface of the interesting aspects of the Apollo 11 mission.  Another critical point is that the Apollo and Gemini missions that preceded Apollo 11 demonstrated feasibility of technical approaches to address important challenges.  For example, Apollo 7 was the first manned mission to orbit the earth and splashdown safely, and Apollo 8 was the first manned mission to orbit the moon and return safely.  It is worth noting how NASA recognized how difficult and important the reentry challenge was, and how early mission decisions were made to overcome it.

Balancing importance, difficulty and centrality (cross-dependency among challenges) can be a daunting problem.  What best practices exist to help you solve it?

Best Practice 1:  Identify and understand your key contexts.

Understanding your contexts means identifying them on the basis of similar goals, priorities, external forces, and pain points.  When doing this, be sure to focus on  the “why” questions:

  • Why do stakeholder groups A and B have the same pain point?
  • Do they have the same priority for relieving this pain point, or is this a much higher priority for one group vs. the other?
  • Is the pain point caused by the same external forces?
  • Is the pain point obstructing the same goals, or different goals?

Questions like these will enable you to cleanse your context notions so that the behaviors and external forces in each context are as consistent as possible.

Best Practice 2:  Understand your key contexts as soon as possible

One tendency in multi-context system development is to leave future contexts as some other day’s problem.  The tendency is to focus on the stakeholders for the first release as the only ones who matter:

  • We won’t be marketing to those groups (countries) for years, why waste time thinking about them now?
  • Those stakeholders are so new, they won’t know what they need.
  • Why spend all the money and  effort interviewing or doing other forms of research when their needs are likely to change?

These objections are based on the perception that identifying and understanding a context requires a lot of effort.  In reality, it requires much less than it appears.  A context is a generalization of behavior and the quality of this generalization can vary with how imminently it is needed.

I realize this sounds like it violates the YAGNI (“you ain’t going to need it”) principle, but it doesn’t.  There is a big difference between anticipating a future context and doing a little defensive design to accommodate its variations than there is of building in full support for it.  The first case is one of defining good interfaces to encapsulate variation.  The other is an implementation and testing burden.   Additionally, it is better to be aware of potential conflicts between contexts before resolving them involves significant design and code changes.

Best Practice 3:  Understand the challenges in satisfying each context’s pain points

Pain points and challenges are similar, and may be confused.  Challenges are concerns of the solution provider.  A challenge is one or more forces that must be overcome to provide value.
A pain point is similar, however it is linked to stakeholders within a context.  This difference shows up in goals, priorities and external factors are considered important.  For example, a pain point might be the need to keep trade secrets confidential from hackers, while the challenge might deal with specific types of cyber threats.

As the solution architect, we derive challenges from pain points in contexts.   Contexts are important here, as similar pain points may exist in related contexts, but with different goals, priorities, or even external challenges.  For example, within an investment management firm, response time needs differ for portfolio managers, traders and compliance officers.

Challenges are framed as problem statements — specific issues that the solution provider must overcome in order to provide value.  It is important to keep the relationships between contexts, pain points and challenges.  In general, contexts have many-to-many relationships to contexts and to pain points.  In other words, there could be several challenges for certain pain points.  Some challenges may be quite similar across several pain points, potentially spanning many contexts.

Best Practice 4:  Carefully consider the nature of challenges when combining over contexts

Challenges are like chemical elements.  Each has its own structure and properties.  However, in a system (especially a multi-context system), challenges combine and form molecules.  Molecules have their own chemical properties, aside from their constituent elements.  For example, Carbon and Oxygen independently are staples of life, while their combination, Carbon Monoxide, is an odorless, tasteless, and most notably, deadly gas.

As mentioned above, challenges take a common form:

  • How does the challenge create or detract from value expectations in a context?
  • Which external forces cause the challenge’s impact to be magnified or compounded

These two properties make it relatively easy to look at a challenge as a chemical element, or combine it with other challenges to view it as a  molecule.  Here are a few ways to characterize challenges that can be useful in examining their impact:

Some useful dimensions for categorizing challenges are:

Compatibility (of two challenges)

  • Compatible – independent problems, no issues solving simultaneously
  • Friction – partially dependent problems, some tradeoffs and/or risks
  • Antagonistic – highly dependent, serious tradeoffs and risks

Breadth (of a challenge)

  • Pervasive – scope of challenges reaches throughout solution
  • Regional – scope of challenges pervades an area but encapsulated
  • Local – scope of challenges limited to a narrow area

Occurrence (of a challenge)

  • Persistent – challenge occurs all or most of the time
  • Intermittent – challenge occurs with a predictable frequency
  • Conditional – challenge occurs in certain situations or conditions

These dimensions and categories can be useful for determining how to manage or aggregate challenges:

  • Within a context, as in how the challenge interacts and combines with others
  • Across context, as in whether similar challenges in two or more contexts are sufficiently alike to merge, or whether they combine into a more significant one

Best Practice 5: Prioritize challenges to identify the most important ones to address
 
Software architecture, just as software engineering, is a discipline of deciding among alternative approaches to reach a decision.  While some decisions get made in small sets (e.g. this system will be 3-Tier, .NET, IIS with a SqlServer database), many, if not most, decisions are made independently.

One important thing to remember is that virtually every decision made reduces the degrees of freedom for the rest.  Sometimes this reduction is desirable, but many times it paints the subsequent decisions into a corner.

If the goal of a software architect is to make good decisions for the system, it makes sense that s/he should address the highest priority challenges first (while the degrees of freedom are more available).  But how should challenges be prioritized?  Three criteria need to be balanced:

  1. Importance:
    • How many contexts does the challenge affect?
    • How much of an  impact does the challenge make on each one?
    • What is the weighted average of this impact, given the relative importance of each context?
  2. Difficulty:
    • The more difficult a challenge, the more degrees of freedom it is likely to need
    • The more difficult a challenge, the more degrees of freedom it will consume
    • The more degrees a challenge will consume, the earlier it should be tackled.
  3. Centrality (also called Core):
    • Challenges are also dependent on the solutions to other challenges
    • The dependent challenge should be addressed before or concurrently with the challenges that depend on them

Architectural design takes place in an environment of constraints.  Constraint should be understood as a neutral concept, because it both prevents and enables.  The same cup that constrains the flow of water also enables you to bring that water to your lips.  Managed appropriately, constraints provide the structure and form that yield the desired function(s).  Part of that management is understanding that design decisions are constraints.  Decisions made in isolation risk inappropriately constraining a design in terms of the whole.

Remediation of architectural constraints is, by its very nature, expensive.  Rewiring is more involved than repainting; replacing a foundation is far more extensive still.  Understanding and accounting for all of the contexts involved allows you to see the architecture of the problem as a whole.  The architecture of the problem then becomes the skeleton upon which the architecture of the solution can be built, incrementally, iteratively, and most important, effectively.

Architecture in Context – Part 1

By Charlie Alfred and Gene Hughson

It’s a common occurrence on online forums to see someone ask what architectural style is the right one. Likewise, it’s common to see a reply of “it depends” because “context is king”. Many will nod sagely at this; of course it is, context is king. But, what is context? Specifically, what do we mean by the term “context” in terms of software and solution architecture?

At a high level, context defines the problem environment for which a solution is intended to provide value. That definition, however, is far too superficial to suffice, because only the most trivial of systems will have a single context. Refining our definition, we can state that a context is a grouping of stakeholders sharing similar goals, priorities, and perceptions. In practice, unless you’re dealing with a system that you are developing yourself for your sole use, you will be dealing with multiple contexts, and they are likely to overlap like the Olympic logo. If you are puzzled by this statement, consider silverware. We need knives, forks and spoons and we have different types of each of them.

Identifying contexts and understanding how they interrelate gives definition to the architecture of the problem, which is a necessary prerequisite to designing the architecture of a solution (note the use of “a solution” instead of “the solution” – it’s intentional). This involves considerable work to identify the goals, priorities, and perceptions in that the natural tendency is for a stakeholder to ask for what they think will solve their problem rather than enumerate their problems (which is the point of the apocryphal “faster horses” quote frequently attributed to Henry Ford). Nonetheless, breaking out those raw needs is crucial to success.

Where you have multiple contexts, chances are the goals, priorities, and perceptions diverge rather than complement. Take, for example, a pickup truck. One context may be those who haul bulky and/or heavy goods daily. Another context may be made up of those who frequently tow trailers. A third context might be those who haul or tow from time to time, but mainly use the truck for transportation. The value of any given feature of the truck: miles per gallon of fuel, size of the bed, whether or not the bed is covered, quality of the sound system, torque, etc. will vary based on the context we’re considering. Additionally, external forces (such as the laws of physics) will come into play and complicate making design decisions. For example, long, open truck beds maximize hauling capacity, but will also harm fuel efficiency.

Another complicating factor is the presence of peripheral stakeholders, those whose contexts will affect the direct stakeholders. In the sense of our pickup truck example, we can consider mechanics as an example of this. Trucks that have more room in the engine compartment are easier (therefore cheaper) to service but will suffer in terms of fuel efficiency due to increased size and weight. In the IT realm, development and support teams would fall into this category. Although their contexts would not be primary drivers, ignoring them could well impose a cost on the direct stakeholders in terms of turnaround time for enhancements and fixes.

Examples of multi-context software systems can be found many places. The more generic a system, the more contexts it will have. Commodity office software (word processors, spreadsheets, etc.) are a prime example. Consider Microsoft Excel, which is a simple row/column tabulator, a statistical analysis tool, a database client, and, with its macro language, is an application development platform. Stretching to cover this many needs turns the Excel User Experience into a game of Twister. Platforms, such as Android are another example. Highly configurable applications like Salesforce.com (which has arguably crossed the line between product and platform) span multiple contexts as well. In the enterprise IT space, first SOA and now microservice architectures are nothing if not an attempt to address the plethora of contexts in play via separation of concerns.

A holistic consideration of the contexts at hand is an important success factor when evolving a system iteratively and incrementally. Without that consideration, decisions can be made that are conducive to one context but antagonistic to another. Deferring the reconciliation of the two contextual conflicts until later may result in considerable architectural refactoring. This refactoring will involve time and expense at a minimum, and if time is constrained, the likelihood of incurring technical debt is high. Each additional conflict that is blundered into due to the lack of up front consideration increases the risk to the system. This is not to endorse a detailed Big Design Up Front (BDUF), but a recognition that problem awareness and problem avoidance is cheaper than rework. If the context has been identified, then it’s YAGNI – not “You Ain’t Gonna Need It” but “You Are Gonna Need It”.

The next post will in this series will address concrete practices to integrate multiple contexts into a unified architecture of the problem that can serve as the foundation for a coherent architecture of a solution.