Beware Premature Certainty – Embracing Ambiguous Requirements

You have to measure it to manage it!

Many see ambiguity as antithetical to software requirements. However, as Ruth Malan has observed:

Pick your battles with ambiguity carefully. She is a wily foe. Not to be dominated. Rather invited to reveal the clarities we can act on. Make decisions with the understanding that we need to be watchful, for our assumptions will, sooner or later, become again tenuous in that fog of ambiguity and uncertainty that change churns up.

Premature certainty, locking in too soon to a particular quality of service metric or approach to a functional requirement, can cause as many problems as requirements that are too vague. For quality of service (also known as “non-functional”) requirements, this manifests as metrics that lack a basis in reality, metrics that do not account for differing circumstances, and/or metrics that fail to align with their objective. For functional requirements, premature certainty presents as designs masquerading as requirements (specifying “how” instead of “what”) and/or contradictory requirements. For both functional and quality of service requirements, this turns what should be an aid to understanding into an impediment, as well as a source of conflict.

Quality of service requirements are particularly susceptible to this problem. In order to manage quality, metrics must be defined to measure against. Unfortunately, validity of those metrics is not always a priority. Numbers may be pulled from thin air, or worse, marketing materials. Without an understanding of the costs, five 9s availability looks like a no-brainer.
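Translating an availability percentage into a downtime budget is one quick way to expose what “five 9s” actually commits to. The sketch below assumes a 365-day year and counts every minute of unavailability against the budget; real service-level agreements often exclude scheduled maintenance, so treat the figures as illustrative.

```python
# Minutes in a non-leap year: 365 days * 24 hours * 60 minutes = 525,600.
MINUTES_PER_YEAR = 365 * 24 * 60


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the yearly downtime budget implied by an availability target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


for target in (99.0, 99.9, 99.99, 99.999):
    print(f"{target}% -> {allowed_downtime_minutes(target):,.1f} minutes/year")
```

At 99.999%, the yearly budget is roughly five minutes, which leaves little room for a single problematic deployment, let alone a failure that needs a human to respond.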

Committing to a metric without understanding the qualitative value of that measure is risky. A three-second response time sounds quick, but will it feel quick? Is it reasonable given the work to be performed? Is it reasonable across the range of environmental conditions that can be expected? How exactly does one measure maintainability? As Tom Graves noted in “Metrics for qualitative requirements”:

To put it at perhaps its simplest, there’s a qualitative difference between quantitative-requirements and qualitative ones: and the latter cannot and must not be reduced solely to some form of quantitative metric, else the quality that makes it ‘qualitative’ will itself be lost.

In another post, “Requisite Fuzziness”, Tom points out that measures are proxies for qualities, not the qualities themselves. A naive approach can fail to meet the intended objective: “The reality is that whilst vehicle-speed can be measured quite easily, often to a high degree of precision, ‘safe speed’ is highly-variable and highly-contextual.” This context-bound nature begets a concept he refers to as “requisite fuzziness”:

It’s sort-of related to probability, or uncertainty, but it’s not quite the same: more an indicator of how much we need to take that uncertainty into account in system-designs and system-governance. If there’s low requisite-fuzziness in the context, we can use simple metrics and true/false rules to guide decision-making for that context; but if there’s high requisite-fuzziness, any metrics must be interpreted solely as guidance, not as mandatory ‘rules’.
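Tom’s distinction can be made concrete by letting the context, rather than the number, decide how a measurement is used. In this hypothetical sketch (the `Fuzziness` levels, the `assess` function, and the three-second threshold are illustrative assumptions, not something from his post), the same response-time measurement is a binding verdict in a low-fuzziness context and only a prompt for judgment in a high-fuzziness one.

```python
from dataclasses import dataclass
from enum import Enum


class Fuzziness(Enum):
    LOW = "low"    # simple metrics and true/false rules are adequate
    HIGH = "high"  # metrics are guidance, to be interpreted in context


@dataclass(frozen=True)
class Assessment:
    within_target: bool
    binding: bool  # True: a rule that must be met; False: advisory only
    note: str


def assess(measured_s: float, target_s: float, fuzziness: Fuzziness) -> Assessment:
    """Compare a measured response time with its target, honouring context."""
    within = measured_s <= target_s
    if fuzziness is Fuzziness.LOW:
        return Assessment(within, True, "meets rule" if within else "violates rule")
    note = ("within guidance" if within
            else "outside guidance; review in context before acting")
    return Assessment(within, False, note)
```

A batch job with a contractual deadline might reasonably be `Fuzziness.LOW`; a user-facing interaction that should “feel quick”, where acceptable latency varies with network conditions and the work being done, is a better fit for `Fuzziness.HIGH`.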

The benefits of requisite fuzziness for functional requirements are somewhat counter-intuitive. Most would argue against ambiguity, as indeed Jeff Sutherland did in “Enabling Specifications: The Key to Building Agile Systems”:

The lawyers pointed out that a patent application is an “enabling specification.” This is a legal term that describes a document that allows the average person knowledgeable in the domain to create the feature without having any discussion with the originators of the enabling specification.

In general, requirements are NOT enabling specifications. On a recent project at a large global company we discovered that hundreds of pages of requirements were not enabling specifications. On the average 60% of what was in the documents was useless to developers. It caused estimates to double in size. Even worse, 10% of what was needed by developers to implement the software was not in the requirements.

The enabling specifications used at PatientKeeper provided a global description of a feature set framed as lightweight user stories with screen shots, business logic, and data structures. The enabling specification was used to generate user stories which then formed the product backlog. The global feature description was updated regularly by the Product Owner team and was a reference to the state of the system that allowed developers to see where the user stories in the product backlog came from.

A user story needs to be a mini-enabling specification for agile teams to operate at peak performance. If it is not, there will be the need for continued dialogue with the Product Owner during the sprint to figure out what the story means. This will reduce story process efficiency and cripple velocity.

While this level of detail may well enable a team to develop efficiently, it may cripple the team’s ability to develop effectively. Given the interdependent relationship between architecture and requirements, overly prescriptive requirements can introduce risk. When design elements (such as “data structures” above) find their way into requirements, it may be that the requirement cannot be implemented as specified within the constraints of the architecture. Without the dialog Jeff referred to, either the requirement will be ignored or the integrity of the architecture will be violated. Neither of these options is advisable.

Another danger inherent in this situation is that of solving the wrong problem. This was addressed by Charlie Alfred in “Invisible Requirements”:

Coming to solutions (requirements) too quickly often times overlooks potentially more beneficial solutions. To illustrate this, consider the Jefferson Memorial.

Several years ago, excessive erosion of the Jefferson Memorial was noticed. A brief investigation identified excessive cleaning as the cause. Since the memorial must be kept clean, more investigation was necessary. Bird droppings were identified as the culprit, so actions were taken to have fewer birds around. Eventually, however, someone asked why the birds were such a problem with the Jefferson Memorial and not the others. Another study determined that the birds frequented the memorial, not for love of Jefferson, but for love of the many tasty spiders that made their home there. Probing further, the spiders were thriving because the insect population was proliferating. Finally, understanding that the insects were attracted by the memorial lights at dusk and dawn identified the ultimate solution. Turn off the lights. Initial solutions driven by quick decisions by memorial managers (i.e. powerful stakeholders) provided expensive ill-suited solutions for an ill-understood problem. The root cause and final solution requirement were well hidden, only brought to light by extensive and time consuming trial and error solutions. Each required solution inappropriately framed the problem, missing the associated hidden causes and final necessary requirement.

“Expensive”, coupled with “extensive and time consuming”, should give pause, particularly when used to describe failures. Naively implementing a set of requirements without technical analysis may well harm the customer. In “Changing Requirements: You Have a Role to Play!”, Raja Bavani noted:

You have to understand what business analysts or product owners provide you. You have to ask questions as early as you can. You have to think in terms of test scenarios and test data. You have to validate your thoughts and assumptions whenever you are in doubt. You have to think about related user stories and conflicting requirements. Instead of doing all these, if you are going to remain a passive consumer of the inputs received from business analysts or product owners, I am sure you are seeding bug issues.

While we don’t want requirements that are deliberately undecipherable, neither can we expect requirements that are both fully developed and cohesive with the architecture as a whole. Rather, we should hope for something along the lines of Robert Galen’s suggestion: a requirement that “communicates…goals while leaving the flexibility for my architect to do their job”. Requirements should be, according to J. B. Rainsberger, a ticket to a conversation.

Lisa Crispin captured the reason for this conversation in “Helping the Customer Stick to the Purpose of a User Story”:

Make sure you understand the *purpose* of a user story or feature. Start with the “why”. You can worry later about the “how”. The customers get to decide on the business value to be delivered. They generally aren’t qualified to dictate the technical implementation of that functionality. We, the technical team, get to decide the best way to deliver the desired feature through the software. Always ask about the business problem to be solved. Sometimes, it’s possible to implement a “solution” that doesn’t really solve the problem.

Likewise, Roman Pichler observed:

If I say, for instance, that booking a training course on our website should be quick, then that’s a first step towards describing the attribute. But it would be too vague to characterise the desired user experience, to help the development team make the right architecture choices, and to validate the constraint. I will hence have to iterate over it, which is best done together with the development team.

Rather than passively implementing specifications and hoping that a coherent architecture “emerges”, iteratively refining requirements with those responsible for the architecture stacks the deck in favor of success by surfing ambiguity:

“When you ask a question, what two word answer distinguishes the architect?” And they didn’t miss a beat, answering “It depends” in an instant. “That’s my litmus test for architects.” I told them. “So, how do I tell a good architect?”…I said “They tell you what it depends on.” Yes, “it depends” is a hat tip to multiple simultaneous possibilities and even truths, which is the hallmark of ambiguity (of the kind that shuts down those who are uncomfortable with it). The good architect can sense what the dependencies are, and figure what to resolve and what to live with, to make progress.

Ruth Malan, A Trace in the Sand


Selling SOA

(Mirrored from the Iasa Blog)

Just what the doctor ordered? More snake oil?

In his recent post “No one wants SOA”, Kevin Orbaker identifies one of the issues with infrastructure in general and Service-Oriented Architecture (SOA) in particular:

SOA is like plumbing in a home/office. No one thinks about plumbing. No one cares about plumbing. They just want water to come out of the faucet when they turn the handle. The faucet is like the UI of your application. It is the final output from which the user interacts with. They see it. They touch it. They interact with it. They don’t know where the water comes from. They don’t really care. They just expect it to work.

To make matters worse, SOA had the misfortune to be one of technology’s silver bullets. Sellers and buyers alike gleefully conflated “can connect” with “will connect”, indulging in an expensive bacchanalia of “build it and they will come”. Broken promises and the Great Recession combined to end the reign of SOA as “the answer”. Yet, as Anne Thomas Manes noted in “SOA is Dead; Long Live Services”, services remain a viable strategy for creating systems of systems.

So, given the negatives of a relatively high start-up cost and a history of disappointments, how do you justify a service-oriented solution?

Solutions, ideally, solve problems. This may seem trite, but the “solution in search of a problem” is an all too common occurrence. Letting the problem space justify the solution is key:

  • Do you have one or more systems needing to communicate with multiple other systems?
  • Do your systems have multiple implementations of the same type of integration (e.g. several flavors of “Submit Order”)?
  • Do outages in one system degrade others?
  • Do changes in one system break others?
  • Do integrations force parallel development efforts and releases across more than one team?

Each “yes” answer above furthers the case for investing in the infrastructure and development needed for a service-oriented architecture, provided you can communicate the benefits. Those benefits should be expressed in business terms – removing duplicate integrations isn’t a business benefit, but saving development time for features instead of plumbing is; implementing an enterprise service bus will likely mean nothing, but being able to add new integrations with little or no delay will mean a lot; “store and forward message-oriented middleware” will yield glassy stares, but the analogy of email versus a phone call should make the scalability and response time benefits clear. Being able to quantify those benefits (for example, providing hours spent on developing and maintaining duplicate integrations) enhances the case.
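For the technical audience, the email analogy maps onto the store-and-forward idea sketched below. This is an in-memory illustration under stated assumptions (the class and its methods are hypothetical); real message-oriented middleware adds durable storage, acknowledgements, and redelivery, none of which is shown here.

```python
from collections import deque


class StoreAndForwardChannel:
    """In-memory illustration of store-and-forward delivery (hypothetical).

    send() stores the message and returns at once, like sending an email;
    forward() delivers stored messages whenever the receiver is reachable.
    A synchronous call, like a phone call, fails outright if the receiver
    is not there to answer.
    """

    def __init__(self, receiver):
        # receiver: a callable that raises ConnectionError when unreachable
        self._receiver = receiver
        self._pending = deque()

    def send(self, message):
        # The sender never waits on the receiver.
        self._pending.append(message)

    def forward(self):
        """Deliver stored messages in order; stop at the first failure."""
        delivered = 0
        while self._pending:
            try:
                self._receiver(self._pending[0])
            except ConnectionError:
                break  # keep the message for the next attempt
            self._pending.popleft()
            delivered += 1
        return delivered

    @property
    def backlog(self):
        return len(self._pending)
```

The sender’s throughput no longer depends on the receiver’s availability or speed, which is the scalability and response-time benefit the analogy is meant to convey.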

When making the case for a “plumbing” change, the benefits are a big part of the picture, but not the whole picture. Assessing and communicating the costs is critical to credibility. Although it might seem counter-intuitive, a higher up-front price is less of a problem than unexpected costs that show up later, particularly when those costs could have been predicted. Accounting for the costs beyond hardware and licenses (e.g. training of developers, training of administrators, etc.) that can be reasonably expected demonstrates thoroughness.

Another common obstacle to infrastructure investments is a variation of the tragedy of the commons. Business owners of individual systems may well balk at the costs involved, even if the benefits are tangible. If multiple systems will benefit, it’s understandable that one owner may be unwilling to subsidize the rest. Joint sponsorship by those multiple owners may be appropriate. If the pool of potential beneficiaries is sufficiently broad, another alternative may be to solicit the sponsorship of a higher level of the organization. Strategic initiatives are generally best championed by stakeholders with a strategic focus.

Once the facts have been marshaled, they must be presented. Alan Inglis, in “What story does your architecture tell?”, provides an excellent template for “selling” a solution (based on Nigel Watts’s “How to Structure A Story: The Eight-Point Arc”):

  1. Stasis – The As-Is state.
  2. Trigger – Why now? What is the trigger event that means that we should act now? What is the imperative to change at this point?
  3. The quest – These are the problems with the current state that need to be addressed. What is our motivation to change?
  4. Surprise – This is where we determine the goals to be met by the change. We may wish to solve the problems or we may wish to go further.
  5. Critical choice – The key decisions that need to be made to achieve the goals, or to compromise on the goals. This is where we define our options for change, the change impacts and the decision principles that we will apply in making a decision.
  6. Climax – This is the big decision where we commit to a way forward with the resources necessary to get there and a change owner to drive the change through.
  7. Reversal – We reverse the problems and create an improved future state through a roadmap of coherent actions.
  8. Resolution – The To-Be state.

Alan’s framework provides a very powerful tool to present a solution in a complete, concise, and coherent manner. The narrative of an architecture illustrates its value to the stakeholders. They may not want a service-oriented architecture, but they may well want the benefits it brings. Telling the story of that value is the job of an architect.

Architecture as Narrative

(Mirrored from the Iasa Blog)


Architectural design as a form of storytelling is an established theme. In January, Ruth Malan’s “A Trace in the Sand” highlighted a quote from “The 22 rules of storytelling, according to Pixar” that reinforces the analogy:

#11: Putting it on paper lets you start fixing it. If it stays in your head, a perfect idea, you’ll never share it with anyone.

Getting a design out into the light of day where it can be polished via collaboration and critique is something I definitely value. I decided to see which other Pixar rules spoke to me as well.

#2: You gotta keep in mind what’s interesting to you as an audience, not what’s fun to do as a writer. They can be v. different.

If the technology is driving the architecture rather than the stakeholders’ needs, you have a problem.

#3: Trying for theme is important, but you won’t see what the story is actually about til you’re at the end of it. Now rewrite.

You can shape your solution to the problem, or you can try to impose your solution on the problem. The problem will most likely be fluid, so how robust will a rigid solution be?

#5: Simplify. Focus. Combine characters. Hop over detours. You’ll feel like you’re losing valuable stuff but it sets you free.

Good design is as much about editing as it is about composition. There will be plenty of complexity inherent in the domain, so you don’t need to add your own.

#7: Come up with your ending before you figure out your middle. Seriously. Endings are hard, get yours working up front.

It’s tempting to start solving the problem right away, but that can lead to dead ends. You have to know where you want to go before you pick your route to get there. Just remember that today’s ending will be tomorrow’s beginning.

#8: Finish your story, let go even if it’s not perfect. In an ideal world you have both, but move on. Do better next time.

Your purpose is to deliver value, which can’t happen until there’s an actual delivery. Perfection that never sees the light of day is worthless, and today’s perfection is tomorrow’s “not quite”.

#22: What’s the essence of your story? Most economical telling of it? If you know that, you can build out from there.

Simplicity and focus (#5) are important enough to repeat.

The narrative of an architecture should be an epic, moving from birth to maturity and ultimately to closure. Like Scheherazade, we must take care that our stories do not end too soon. Keeping the principles above in mind can help.

Applications as Platforms – Supporting the Enterprise

(Mirrored from the Iasa Blog)

Carrying the weight of the world

You have a limited number of resources that can work on any endeavor. Call it headcount. People who are excellent at creating user experiences, working with design, and polishing pixels are almost never excellent at building easy to use, scalable, and flexible APIs. And of course the reverse is even more true: plumbers can’t paint worth beans. If you choose to be both an app and a platform you will have half the great plumbers and half the great painters you’d have otherwise.

Charlie Kindel, in his post “Be Either an App or a Platform, Not Both”, points out the difficulties of trying to be both an application and a platform, much of which comes down to the difficulties of designing a platform, period. As he notes in the quote above, trying to satisfy both objectives dilutes your focus. Additionally, once your system is supporting external clients (where external is defined as any client that isn’t built and deployed contemporaneously with your system, i.e. external apps can be built by your co-workers), you now have additional constraints on your ability to change. As he states in another post, “Don’t Build APIs”:

  • Principle 1: All successful APIs will be abused in ways you cannot anticipate.
  • Principle 2: You will support a successful API forever.

In many cases, I would have to agree that choosing between the application path and the platform path is good advice. As is so often the case, however, I can think of an exception – enterprise applications. Many of the issues facing corporate IT departments stem from an isolationist architectural mindset and the attempt to graft interoperability onto those systems after the fact: data islands, inconsistent duplicate data, poor integration, and functionality gaps, not to mention the costs involved in attempting to resolve these myriad issues. This is not to say that these systems are immune to the agility and focus issues Charlie Kindel warned of, just that those dangers are far outweighed by the costs imposed by poorly connected applications in an enterprise environment that are simultaneously redundant and lacking in key capabilities.

Roger Sessions has addressed this at the macro level in a recent series of posts outlining his Snowman Architecture concept (Overview, Economic Benefits, Technical Benefits). In a nutshell, this architecture partitions the application and information architectures of an enterprise along the lines of the business architecture to create capability packages. Each of these packages (the snowmen) incorporates an integral services architecture to provide interoperability, as with any service-oriented architecture. The difference, in my opinion, is that partitioning according to business capabilities should highlight both accidental redundancy and gaps in service, as well as ensure coherent packages.

Whether or not an organization has achieved, or even embraced, this type of enterprise-level IT architecture, an application’s architecture will determine how quickly it can adapt to changes in the business environment. Using layers to provide horizontal partitions is, in my opinion, a good strategy to promote flexibility. Combining layers with vertical partitioning by high-level concern, even if only logically rather than physically, will yield designs that are easier to refactor as needed. I favor using message-oriented designs over which a service facade can be added if and when needed. Ideally, each increment of complexity that’s added should only occur in order to enable additional capability and should carry more benefit than risk.
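A minimal sketch of that approach follows, assuming an in-process design: business logic handles plain message objects, a dispatcher routes them, and the service facade is a thin translation layer that can be added later without touching the handler. All class names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubmitOrder:
    """Message: a request to place an order."""
    customer_id: str
    sku: str
    quantity: int


@dataclass(frozen=True)
class OrderAccepted:
    """Message: the result returned to the caller."""
    order_id: str


class OrderHandler:
    """Business layer: knows nothing about how a message arrived."""

    def __init__(self):
        self._next_id = 1

    def handle(self, msg: SubmitOrder) -> OrderAccepted:
        if msg.quantity <= 0:
            raise ValueError("quantity must be positive")
        order_id = f"ORD-{self._next_id}"
        self._next_id += 1
        return OrderAccepted(order_id)


class Dispatcher:
    """Routes each message to the handler registered for its type."""

    def __init__(self):
        self._handlers = {}

    def register(self, message_type, handler):
        self._handlers[message_type] = handler

    def dispatch(self, message):
        return self._handlers[type(message)].handle(message)


class OrderServiceFacade:
    """Optional service layer, added if and when external clients need it.

    It only translates a transport payload (e.g. parsed JSON) into a message
    and back; the business logic is untouched.
    """

    def __init__(self, dispatcher: Dispatcher):
        self._dispatcher = dispatcher

    def submit_order(self, payload: dict) -> dict:
        message = SubmitOrder(payload["customerId"], payload["sku"],
                              int(payload["quantity"]))
        result = self._dispatcher.dispatch(message)
        return {"orderId": result.order_id}
```

In-process callers can use `Dispatcher.dispatch` directly from day one; the facade only costs something once there is a client that needs it.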

As was noted above, supporting external clients poses greater challenges. Once an API is available, changes should be handled via strict versioning in order to avoid taking down its consumers. Attempting to coordinate synchronized releases (required if you make a breaking change to an existing service rather than adding a new one) is begging for problems, particularly if the clients are external to your organization. Coupling a strict versioning strategy with the Canonical Data Model pattern enables making changes internal to your application without disrupting external clients.
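Here is one way the combination might look, sketched under assumptions: the application works internally on a canonical `Customer` model, and each published API version is a separate, frozen mapping from that model. The names and fields are hypothetical. A breaking change becomes a new version with a new mapper, while existing mappers keep emitting exactly what their consumers expect.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    """Canonical model used inside the application; free to evolve."""
    customer_id: str
    given_name: str
    family_name: str
    email: str


def to_v1(c: Customer) -> dict:
    # v1 contract: published with a single combined name; its shape never changes.
    return {"id": c.customer_id, "name": f"{c.given_name} {c.family_name}"}


def to_v2(c: Customer) -> dict:
    # v2 contract: added alongside v1 rather than replacing it.
    return {"id": c.customer_id, "givenName": c.given_name,
            "familyName": c.family_name, "email": c.email}


CONTRACTS = {"v1": to_v1, "v2": to_v2}


def get_customer(version: str, customer: Customer) -> dict:
    """Render the canonical model in the contract version the client asked for."""
    mapper = CONTRACTS.get(version)
    if mapper is None:
        raise ValueError(f"unsupported API version: {version}")
    return mapper(customer)
```

If the canonical model had previously held a single name field, splitting it would only require `to_v1` to recombine the parts; v1 consumers would see no difference.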

Designing applications that can also serve as components in an enterprise platform is doubtless more complex than designing a standalone one. The relevant question, however, is not whether complexity is added, but whether the complexity is necessary. Having many simple systems that play together poorly, if at all, can complicate the IT architecture of the enterprise and fail to adequately support business operations. Systems that are structured to enable interoperability will fit into an enterprise environment far more easily than ones that have interoperability bolted on after the fact.

Avoiding Platform Rot

(Mirrored from the Iasa Blog)

just never had the time to keep up with the maintenance

Is your OS the latest version? How about your web server? Database server? If not now, when?

A “no” answer to the first three questions is likely not that big a deal. There can be advantages to staying off the bleeding edge. That being said, the last question is the key one. If the answer to that is “I haven’t thought about it”, then there’s potential for problems.

“Technical Debt” is currently a hot topic. Although the term normally brings to mind hacks and quick fixes, more subtle issues can be technical debt as well. A slide from a recent Michael Feathers presentation (slide 5) is particularly applicable to this:

Technical Debt is: the Effect of Unavoidable Growth and the Remnants of Inertia

New features tend to be the priority for systems, particularly early in their lifecycle. The plumbing (that which no one cares about until it quits working) tends to be neglected. Plumbing that is outside the responsibility of the development team (such as operating systems and database management systems) is likely to get the least attention. This can lead to systems running on platforms that are past their end of support date or a scramble to verify that the system can run on a later version. The former carries significant security risks while the latter is hardly conducive to adequately testing that the system will function identically on the updated platform. Additionally, if no one is paying attention to the platform, new capabilities as well as operational and performance improvements may be missed.
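Answering “If not now, when?” starts with knowing when each platform component leaves support. The sketch below assumes the team keeps a simple inventory populated from each vendor’s published lifecycle dates; the component names and dates used here are placeholders, not real vendor data.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class PlatformComponent:
    name: str
    version: str
    end_of_support: date  # taken from the vendor's published lifecycle


def review_platform(components, today, warning_window_days=365):
    """Return (expired, expiring): components already out of support, and
    those whose support ends within the warning window."""
    horizon = today + timedelta(days=warning_window_days)
    expired = [c for c in components if c.end_of_support <= today]
    expiring = [c for c in components if today < c.end_of_support <= horizon]
    return expired, expiring


# Placeholder inventory; substitute real components and vendor dates.
inventory = [
    PlatformComponent("example-os", "1.0", date(2023, 12, 31)),
    PlatformComponent("example-db", "2.0", date(2024, 6, 30)),
    PlatformComponent("example-web", "3.0", date(2027, 1, 1)),
]
expired, expiring = review_platform(inventory, today=date(2024, 1, 1))
print("Out of support:", [c.name for c in expired])
print("Plan upgrades for:", [c.name for c in expiring])
```

Run on a schedule, a check like this turns the expiring list into backlog items long before the scramble the paragraph describes.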

One method to help avoid these types of issues is adoption of a DevOps philosophy, such as Amazon’s:

Amazon applies the motto “You build it, you run it”. This means the team that develops the product is responsible for maintaining it in production for its entire life-cycle. All products are services managed by the teams that built them. The team is dedicated to each product throughout its lifecycle and the organization is built around product management instead of project management.

This blending of responsibilities within a single team and focus on the application as a product (something I consider extremely beneficial) lessens the chance that housekeeping tasks fall between the cracks by removing the cracks. The operations aspect is enhanced by ensuring that its concerns are visible to those developing and the development aspect is enhanced by increased visibility into new capabilities of the platform components. The Solutions Architect role, spanning application, infrastructure, and business, is well placed to lead this effort.

Dependency Management is Risk Management

It depends

How well-managed are your dependencies? Are you aware of all of them? Which ones can fail gracefully? Which ones allow the application to continue in a degraded state in the event of a failure? How many dependencies would cause your application to become unavailable in the event of a failure?
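Those questions are easier to answer, and to keep answered, when the dependencies are written down. The sketch below assumes a hand-maintained inventory (the names, categories, and `summarize` helper are hypothetical); the entries under `UNAVAILABLE` answer the last question.

```python
from dataclasses import dataclass
from enum import Enum


class OnFailure(Enum):
    GRACEFUL = "fails gracefully"            # failure is handled; users unaffected
    DEGRADED = "continues degraded"          # some functionality is lost
    UNAVAILABLE = "application unavailable"  # a hard dependency


@dataclass(frozen=True)
class Dependency:
    name: str
    kind: str  # e.g. "code", "infrastructure", "service", "configuration"
    on_failure: OnFailure


def summarize(dependencies):
    """Group dependency names by the effect their failure has on the application."""
    summary = {mode: [] for mode in OnFailure}
    for dep in dependencies:
        summary[dep.on_failure].append(dep.name)
    return summary
```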

It’s instructive that the Latin root of the word “depend” means “to hang”. Although we may rely on them, dependencies also hang over our heads like the sword of Damocles, an ever-present threat to the well-being of our systems and our peace of mind. Since we cannot live without them, we must find a way to harness the usefulness while minimizing the risk.

It’s common to think of code when the subject of dependencies comes up. Issues around how to organize an application into packages, reuse of common components, use of third-party components, and even proper usage of framework classes are all dependency management issues. For these types of dependencies, the anti-patterns tend to be well known, as are techniques for managing them:

My Four Principles of Dependency Management have an order of precedence.

  1. Minimise Dependencies – the simpler our code, the less “things” we have referring to other “things”
  2. Localise Dependencies – for the code we have to write, as much as possible, “things” should be packaged – in units of code organisation – together with the “things” they depend on
  3. Stabilise Dependencies – of course, we can’t put our entire dependency network in the same function (that would be silly). For starters, it’s at odds with minimising our dependencies, since modularity is the mechanism for removing duplication, and modularisation inevitably requires some dependencies to cross the boundaries between modules (using the most general meaning of “module” to mean a unit of code reuse – which could be a function or could be an entire system in a network of systems). When dependencies have to cross those boundaries, they should point towards things that are less likely – e.g., harder – to change. This can help to localise the spread of changes across our network of dependencies, in much the same way that a run on the banks is less likely if banks only lend to other banks that are less likely to default.
  4. Abstract Dependencies – when we have to depend on something, but still need to accommodate change into the system somehow, the easiest way to do that is to make things that are depended upon easier to substitute. It’s for much the same reason that we favour modular computer hardware. We can evolve and improve our computer by swapping out components with newer ones. To make this possible, computer components need to communicate through standard interfaces. These industry abstractions make it possible for me to swap out my memory with larger or faster memory, or my hard drive, or my graphics card. If ATI graphics cards had an ATI-specific interface, and NVidia cards had NVidia-specific interfaces, this would not be possible.

Jason Gorman, “Revisiting Unified Principles of Dependency Management (In Lieu Of 100 Tweets)”
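In code, the fourth principle usually means depending on an interface rather than on a concrete vendor, much as the graphics-card example depends on a standard slot. A sketch using Python’s `typing.Protocol` (structural typing, so implementations need not inherit from it); the payment gateway and vendor adapters are hypothetical:

```python
from typing import Protocol


class PaymentGateway(Protocol):
    """The 'standard interface' the application depends on."""

    def charge(self, account: str, amount_cents: int) -> str: ...


class VendorAGateway:
    def charge(self, account: str, amount_cents: int) -> str:
        # A real adapter would call vendor A's SDK here.
        return f"A-{account}-{amount_cents}"


class VendorBGateway:
    def charge(self, account: str, amount_cents: int) -> str:
        # A real adapter would call vendor B's SDK here.
        return f"B-{account}-{amount_cents}"


class Checkout:
    def __init__(self, gateway: PaymentGateway):
        self._gateway = gateway  # depends on the abstraction, not a vendor

    def pay(self, account: str, amount_cents: int) -> str:
        if amount_cents <= 0:
            raise ValueError("amount must be positive")
        return self._gateway.charge(account, amount_cents)
```

Swapping vendors is then a one-line change where `Checkout` is constructed, in the same way a graphics card is swapped without rewiring the motherboard.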

Infrastructure dependencies are another common dependency management issue. Database and web servers, middleware, and distributed storage all fall into this category, as does network connectivity. While the focus around code dependencies was on complexity and API stability, the main concern for infrastructure dependencies will be availability. Monitoring, clustering and/or mirroring are common tactics for mitigating risks with these. Other tactics include retries and queuing requests until communications are restored. In some cases, optional functionality can be disabled when unavailable.
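Two of those tactics, retries and disabling optional functionality, are sketched below under assumptions: failures surface as Python’s built-in `ConnectionError`, and the function names are hypothetical. Exponential backoff is our choice of retry policy, not something the text prescribes; it spaces out retries so that a recovering dependency is not flooded.

```python
import time


def call_with_retries(operation, attempts=3, base_delay_s=0.1, sleep=time.sleep):
    """Call operation(), retrying on ConnectionError with exponential backoff.

    Waits base_delay_s, then 2 * base_delay_s, and so on between attempts,
    and re-raises the last ConnectionError once all attempts are used.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of retries: let the caller queue, degrade, or fail
            sleep(base_delay_s * 2 ** attempt)


def optional_feature(fetch, fallback=None, **retry_options):
    """Disable an optional feature instead of failing the whole request."""
    try:
        return call_with_retries(fetch, **retry_options)
    except ConnectionError:
        return fallback
```

Queuing requests until communications are restored would sit on top of `call_with_retries`, holding the request rather than discarding it when the retries run out.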

Services, particularly third-party services, combine the features of both code and infrastructure dependencies in that API stability and availability are equal concerns. While this presents extra challenges, it also means that both sets of mitigation strategies are available for use. For example, assuming that two providers host an equivalent service, an abstraction layer can be combined with queuing and retries to remain up even when one provider is out of service. Where appropriate, an enterprise service bus can be used to handle translation across multiple message formats, taking that complexity out of the client application.
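A sketch of the abstraction-layer half of that example, assuming a hypothetical shipping-quote service offered by two providers: each adapter normalizes its provider’s response into a common format, and the client tries providers in order of preference. Retries and queuing could wrap `quote_cents` and are omitted for brevity.

```python
class ProviderUnavailable(Exception):
    """Raised by an adapter when its provider cannot be reached."""


class ProviderAAdapter:
    """Hypothetical adapter; a real one would call provider A's API."""

    def __init__(self, up=True):
        self.up = up

    def quote_cents(self, kg):
        if not self.up:
            raise ProviderUnavailable("provider A timed out")
        return 150 * kg  # provider A already answers in cents


class ProviderBAdapter:
    """Hypothetical adapter translating provider B's format to cents."""

    def __init__(self, up=True):
        self.up = up

    def quote_cents(self, kg):
        if not self.up:
            raise ProviderUnavailable("provider B timed out")
        native = f"{1.25 * kg:.2f}"        # provider B answers in dollars, as text
        return round(float(native) * 100)  # normalize to the common format


class FailoverShippingClient:
    """Single interface over equivalent providers, tried in order of preference."""

    def __init__(self, providers):
        self._providers = list(providers)

    def quote_cents(self, kg):
        errors = []
        for provider in self._providers:
            try:
                return provider.quote_cents(kg)
            except ProviderUnavailable as exc:
                errors.append(str(exc))
        raise ProviderUnavailable("all providers failed: " + "; ".join(errors))
```

The translation step inside each adapter is the job an enterprise service bus would take over when there are many formats to reconcile.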

Third-party services pose a special case of availability concerns – supplier continuity. You must be prepared to deal with the contingency that the provider will either go out of business or discontinue their offering, leaving you to find a permanent replacement. This applies to third-party infrastructure (aka “the Cloud”) as well.

Configuration data is a dependency that doesn’t come to mind as readily. However, as systems become more complex and more redundant (purposely, for availability), configuration issues can cripple. Jason Gorman’s first two principles (minimize and localize) can help. Additionally, automating changes and additions to configuration data will also help ensure that items aren’t forgotten or poorly formatted.
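A minimal sketch of such automation, assuming each node’s configuration has already been loaded into a dictionary (the keys and rules are hypothetical), checks both kinds of mistake: forgotten or badly formatted entries, and redundant nodes whose shared settings have drifted apart.

```python
# Hypothetical rules: required keys with their accepted types, and keys that
# must be identical on every redundant node.
REQUIRED = {"db_url": (str,), "queue_name": (str,), "timeout_s": (int, float)}
SHARED = ("db_url", "queue_name")


def validate_configs(configs):
    """Return a list of problems found across all node configurations."""
    problems = []
    for node, cfg in sorted(configs.items()):
        for key, types in REQUIRED.items():
            if key not in cfg:
                problems.append(f"{node}: missing '{key}'")
            elif not isinstance(cfg[key], types):
                expected = "/".join(t.__name__ for t in types)
                problems.append(f"{node}: '{key}' should be {expected}, "
                                f"got {type(cfg[key]).__name__}")
    for key in SHARED:
        values = {cfg.get(key) for cfg in configs.values()}
        if len(values) > 1:
            problems.append(f"'{key}' differs between nodes: "
                            f"{sorted(map(repr, values))}")
    return problems
```

Running a check like this as a deployment gate means a node added for redundancy cannot silently go live with a forgotten or mistyped setting.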

A side effect of increased integration is increased reliance on shared data values and mapping from one value to another. If the same item is a “Gadget” on the web site and a “Widget” in the inventory system, there is a potential for problems. By the same token, even when items are named identically, issues can arise when changes are made for business reasons. When the systems involved cross organizational boundaries, the potential for problems increases further. It is critical to identify and understand these dependencies so that you have a plan in place to manage them prior to their becoming an issue.
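Making such a mapping explicit, and failing loudly when a value has no counterpart, turns a silent data problem into a visible one. A sketch, assuming a simple in-code dictionary (the names are hypothetical; in practice the mapping might live in shared reference data):

```python
class UnmappedValueError(KeyError):
    """Raised when a value has no counterpart in the target system."""


# Hypothetical cross-system mapping: web site item names -> inventory names.
WEB_TO_INVENTORY = {"Gadget": "Widget"}


def to_inventory_name(web_name, mapping=WEB_TO_INVENTORY):
    """Translate a web site item name; fail loudly rather than pass it through."""
    try:
        return mapping[web_name]
    except KeyError:
        raise UnmappedValueError(web_name) from None


def unmapped(values, mapping=WEB_TO_INVENTORY):
    """Report values with no mapping, e.g. after a business-driven rename."""
    return sorted(set(values) - set(mapping))
```

Running `unmapped` over the values currently in use, before a rename goes live, surfaces the dependency while it is still cheap to fix.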

Understanding and managing dependencies contributes to both the reliability and maintainability of a system. Potential issues can be identified, making them both quicker and easier to debug when a problem occurs. This allows issues to be evaluated for what level of risk is acceptable and which measures are appropriate to mitigate that risk. Where appropriate, the system’s architecture can then be structured to handle them with minimal manual intervention. Failing to do the work up front can “leave you hanging” when things go wrong.