Design for Life

Soundview, Bronx, NY


The underlying theme of my last post, “Babies, Bathwater, and Software Architects”, was that it’s necessary to understand the role of a software architect in order to understand the need for that role. If our understanding of the role is flawed, not just missing aspects of what the role should be focusing on, but also largely consisting of things the role should not be concerned with, then we can’t really effectively determine whether the role is needed or not. If a person drowning is told that an anvil is a life-preserver, that doesn’t mean that they don’t need a life-preserver. It does mean they need a better definition of “life-preserver”.

Ruth Malan, answering the question “Do we still need architects?”, captures the essence of what is unique about the concerns of the software architect role:

There’s paying attention to the structural integrity of the code when that competes with the urge to deliver value and all we’ve been able to accomplish in terms of the responsiveness of continuous integration and deployment. We don’t intend to let the code devolve as we respond to unfolding demands, but we have to intend not to, and that takes time — possibly time away from the next increment of user-perceived value. There’s watching for architecturally impactful, structurally and strategically significant decisions, and making sure they get the attention, reflection, expertise they require. Even when the team periodically takes stock, and spends time reflecting learning back into the design/code, the architect’s role is to facilitate, to nurture, the design integrity of the system as a system – as something coherent. Where coherence is about not just fit and function, but system properties. Non-trivial, mutually interacting properties that entail tradeoffs. Moreover, this coherence must be achieved across (microservice, or whatever, focused) teams. This takes technical know-how and know-when, and know-who to work with, to bring needed expertise to bear.

These system properties are crucial, because without a cohesive set of system properties (aka quality of service requirements), the quality of the system suffers:

Even if the parts are perfectly implemented and fly in perfect formation, the quality is still lacking.

A tweet from Charles T. Betz points out the missing ingredient:

Now, even though Frederick Brooks was writing about the architecture of hardware, and the nature of users has drastically evolved over the last fifty-four years, his point remains: “Architecture must include engineering considerations, so that the design will be economica1 and feasible; but the emphasis in architecture is upon the needs of the user, whereas in engineering the emphasis is upon the needs of the fabricator.”

In other words, habitability, the quality of being “livable”, is a critical condition for an architecture. Two systems providing the exact same functionality may not be equivalent. The system providing the “better” (from the user’s perspective) quality of service will most likely be seen as the superior system. In some cases, quality of service concerns can even outweigh functional concerns.

Who, if not the software architect, is looking out for the livability of your system as a whole?

“Microservice Mistakes – Complexity as a Service” on Iasa Global

I’m pleased to announce that I’ve been asked to continue as a contributor to the Iasa Global site. I’m planning to post original content there on at least a monthly basis. In the interim, please enjoy a re-post of “Microservice Mistakes – Complexity as a Service”

Law of Unintended Consequences – Security Edition

Bank Vault

More isn’t always better. When it comes to security, more can even be worse.

As the use of encryption has increased, management of encryption keys has emerged as a pain point for many organizations. The amount of encrypted data passing through corporate firewalls, which has doubled over the last year, poses a severe challenge to security professionals responsible for protecting corporate data. The mechanism that’s intended to protect information in transit does so regardless of whether the transmission is legitimate or not.

Greater complexity, which means greater inconvenience, can lead to decreased security. Usability increases security by increasing compliance. Alarm fatigue means that as the number of warnings increase, so does the likelihood of their being ignored

Like any design issue, security should be approached from a systems thinking viewpoint (at least in my opinion). Rather than a one-dimensional, naive approach, a holistic one that recognizes and deals with the interrelationships is more likely to get it right. Thinking solely in terms of actions while ignoring the reactions that result from them hampers effective decision-making.

To be effective, security should be comprehensive, coordinated, collaborative, and contextual.

Comprehensive security is security that involves the entire range of security concerns: application, network, platform (OS, etc.), and physical. Strength in one or more of these areas means little if only one of the others is fatally compromised. Coordination of the efforts of those responsible for these aspects is essential to ensure that the various security enhance rather than hinder security. This coordination is better achieved via a collaborative process that reconciles the costs and benefits systemically than a prescriptive one imposed without regard to those factors. Lastly, practices should be tailored to the context of the problem at hand. Value at risk and amount of exposure are two factors that should help determine the effort expended. Putting a bank vault door on the garden shed not only wastes money, but also hinders security by taking those resources away from an area of greater need.

As with most quality of service concerns, security is not a binary toggle but a continuum. Matching the response to the need is a good way to stay on the right side of the law of unintended consequences.

Microservice Mistakes – Complexity as a Service

Michael Feathers’ tweet about technical empathy packs a lot of wisdom into 140 characters. Lack of technical empathy can lead to a system that is harder to both implement and maintain since no thought was given to simplifying things for the caller. Maintainability is one of those quality of service requirements that appears to be a purely technical consideration right up to the point that it begins significantly affecting development time. If this sounds suspiciously like technical debt to you, then move to the head of the class.

The issue of technical empathy is particularly on point given the popular interest in microservice architectures (MSAs). The granular nature of the MSA style brings many benefits, but also comes with the cost of increased complexity (among others). Michael Feathers referred to this in a post from last summer titled “Microservices Until Macro Complexity”:

It is going to be interesting to see how this approach scales. Some organizations have a relatively low number of microservices. Others are pushing higher, around the 600 mark. This is a bit beyond the point where people start seeking a bigger picture. If services are often bigger than classes in OO, then the next thing to look for is a level above microservices that provides a more abstract view of an architecture. People are struggling with that right now and it was foreseeable. Along with that concern, we have the general issue of dealing with asynchrony and communication patterns between services. I strongly believe that there is a law of conservation of complexity in software. When we break up big things into small pieces we invariably push the complexity to their interaction.

The last sentence bears repeating: “When we break up big things into small pieces we invariably push the complexity to their interaction”. Breaking a monolith into microservices simplifies each individual component, however, as Eran Hammer observed in “The Fallacy of Tiny Modules”, “…at some point, someone has to put it all together and then, all the complexity is going to surface…”. As Yamen Sader illustrated in his slide deck “Microservices 101 – The Big Why” (slide #26), the structure of the organization, domain, and system will diverge in an MSA. The implication of this is that for a given domain task, we need to know more about the details of that task (i.e. have less encapsulation of the internals) in a microservice environment.

The further removed the consumer of these services are from the providers, the harder and less convenient it will be to transfer that knowledge. To put this in perspective, consider two fast food restaurants: one operates in a traditional manner where money is exchanged for burgers, the other is a microservice style operation where you must obtain the beef, lettuce, pickles, onion, cheese, and buns separately, after which the cooking service combines them (provided the money is there also). The second operation will likely be in business for a much shorter period of time in spite of the incredible flexibility it offers. Additionally, it that flexibility only truly exists when the contracts between two or more providers are swappable. As Ben Morris noted in “Do Microservices create extra challenges for distributed development?”:

Each team will develop its own interpretation of the system and view of the data model. Any service collaboration will have to involve some element of translation or mapping and will be vulnerable to subtle bugs that can arise from semantic differences.

Adopting a canonical model across the platform can help to address this problem by asserting a common vocabulary. Each service is expected to exchange information based on a shared definition of the main entities. However, this requires a level of cross-team governance which is very much at odds with the decentralised nature of microservices.

None of this is meant to say that the microservice architecture style is “bad” or “wrong”. It is my opinion, however, that MSA is first and foremost an architectural style of building applications rather than systems of systems. This isn’t an absolute; I could see multiple MSA applications within an organization sharing component microservices. Where those microservices transcend the organizational boundary, however, making use of them becomes more complex. Actions at a higher level of granularity, such as placing an order for some product or service, involves coordinating multiple services in the MSA realm rather than consuming one “chunkier” service. While the principles behind microservice architectures are relevant at higher levels of abstraction, a mismatch in granularity between the concept and implementation can be very troublesome. Maintainability issues are both technical debt and a source of customer dissatisfaction. External customers are far easier to lose than internal ones.

“Design? Security and Privacy? YAGNI” on Iasa Global

Two of my favorite “bumper sticker philosophies” (i.e. short, pithy, and incredibly simplistic sayings) are “the simplest thing that could possibly work” and YAGNI. Avoiding unnecessary complexity and unneeded features are good ideas at first glance. The problem is determining what is unnecessary and unneeded. Just meeting the functional requirements is unlikely to be sufficient.

Read “Design? Security and Privacy? YAGNI” on the Iasa Global site for a post about how it’s important to have someone responsible for Quality of Service requirements in general and security in particular.

“Beware Premature Certainty – Embracing Ambiguous Requirements” on Iasa Global Blog

You have to measure it to manage it!

Many see ambiguity as antithetical to software requirements. However, as Ruth Malan has observed:

Pick your battles with ambiguity careful. She is a wily foe. Not to be dominated. Rather invited to reveal the clarities we can act on. Make decisions with the understanding that we need to watchful, for our assumptions will, sooner or later, become again tenuous in that fog of ambiguity and uncertainty that change churns up.

See the full post on the Iasa Global Blog (a re-post, originally published here).

Beware Premature Certainty – Embracing Ambiguous Requirements

You have to measure it to manage it!

Many see ambiguity as antithetical to software requirements. However, as Ruth Malan has observed:

Pick your battles with ambiguity careful. She is a wily foe. Not to be dominated. Rather invited to reveal the clarities we can act on. Make decisions with the understanding that we need to watchful, for our assumptions will, sooner or later, become again tenuous in that fog of ambiguity and uncertainty that change churns up.

Premature certainty, locking in too soon to a particular quality of service metric or approach to a functional requirement can cause as many problems as requirements that are too vague. For quality of service, also known as “non-functional”, requirements, this manifests as metrics that lack a basis in reality, metrics that do not account for differing circumstances, and/or metrics that fail to align with their objective. For functional requirements, premature certainty presents as designs masquerading as requirements (specifying “how” instead of “what”) and/or contradictory requirements. For both functional and quality of service requirements, this turns what should be an aid to understanding into an impediment, as well as a source of conflict.

Quality of service requirements are particularly susceptible to this problem. In order to manage quality, metrics must be defined to measure against. Unfortunately, validity of those metrics is not always a priority. Numbers may be pulled from thin air, or worse, marketing materials. Without an understanding of the costs, five 9s availability looks like a no-brainer.

Committing to a metric without understanding the qualitative value of that measure is risky. A three-second response time sounds quick, but will it feel quick? Is it reasonable given the work to be performed? Is it reasonable across the range of environmental conditions that can be expected? How exactly does one measure maintainability? As Tom Graves, in “Metrics for qualitative requirements” noted:

To put it at perhaps its simplest, there’s a qualitative difference between quantitative-requirements and qualitative ones: and the latter cannot and must not be reduced solely to some form of quantitative metric, else the quality that makes it ‘qualitative’ will itself be lost.

In another post, “Requisite Fuzziness”, Tom points out that measures are proxies for qualities, not the qualities themselves. A naive approach can fail to meet the intended objective: “The reality is that whilst vehicle-speed can be measured quite easily, often to a high degree of precision, ‘safe speed’ is highly-variable and highly-contextual.” This context-bound nature begets a concept he refers to as “requisite fuzziness”:

It’s sort-of related to probability, or uncertainty, but it’s not quite the same: more an indicator of how much we need to take that uncertainty into account in system-designs and system-governance. If there’s low requisite-fuzziness in the context, we can use simple metrics and true/false rules to guide decision-making for that context; but if there’s high requisite-fuzziness, any metrics must be interpreted solely as guidance, not as mandatory ‘rules’.

The benefits of requisite fuzziness for functional requirements is somewhat counter-intuitive. Most would argue against ambiguity, as indeed Jeff Sutherland did in “Enabling Specifications: The Key to Building Agile Systems”:

The lawyers pointed out that a patent application is an “enabling specification.” This is a legal term that describes a document that allows the average person knowledgeable in the domain to create the feature without having any discussion with the originators of the enabling specification.

In general, requirements are NOT enabling specifications. On a recent project at a large global company we discovered that hundreds of pages of requirements were not enabling specifications. On the average 60% of what was in the documents was useless to developers. It caused estimates to double in size. Even worse, 10% of what was needed by developers to implement the software was not in the requirements.

The enabling specifications used at PatientKeeper provided a global description of a feature set framed as lightweight user stories with screen shots, business logic, and data structures. The enabling specification was used to generate user stories which then formed the product backlog. The global feature description was updated regularly by the Product Owner team and was a reference to the state of the system that allowed developers to see where the user stories in the product backlog came from.

A user story needes to be a mini-enabling specification for agile teams to operate at peak performance. If it is not, there will be the need for continued dialogue with the Product Owner during the sprint to figure out what the story means. This will reduce story process efficiency and cripple velocity.

While this level of detail may well enable a team to develop efficiently, it may cripple the team’s ability to develop effectively. Given the interdependent relationship between architecture and requirements, overly prescriptive requirements can introduce risk. When design elements (such as “data structures” above) find their way into requirements, it may be that the requirement cannot be implemented as specified within the constraints of the architecture. Without the dialog Jeff referred to, either the requirement will be ignored or the integrity of the architecture will be violated. Neither of these options are advisable.

Another danger inherent in this situation is that of solving the wrong problem. This was addressed by Charlie Alfred in “Invisible Requirements”:

Coming to solutions (requirements) too quickly often times overlooks potentially more beneficial solutions. To illustrate this, consider the Jefferson Memorial.

Several years ago, excessive erosion of the Jefferson Memorial was noticed. A brief investigation identified excessive cleaning as the cause. Since the memorial must be kept clean, more investigation was necessary. Bird droppings were identified as the culprit, so actions were taken to have fewer birds around.Eventually, however, someone asked why the birds were such a problem with the Jefferson Memorial and not the others. Another study determined that the birds frequented the memorial, not for love of Jefferson, but for love of the many tasty spiders that made their home there. Probing further, the spiders were thriving because the insect population was proliferating.Finally, understanding that the insects were attracted by the memorial lights at dusk and dawn identified the ultimate solution. Turn off the lights. Initial solutions driven by quick decisions by memorial managers (i.e. powerful stakeholders) provided expensive ill-suited solutions for an ill-understood problem. The root cause and final solution requirement were well hidden, only brought to light by extensive and time consuming trial and error solutions. Each required solution inappropriately framed the problem, missing the associated hidden causes and final necessary requirement.

“Expensive”, coupled with “extensive and time consuming”, should give pause, particularly when used to describe failures. Naively implementing a set of requirements without technical analysis may well harm the customer. In “Changing Requirements: You Have a Role to Play!”, Raja Bavani noted:

You have to understand what business analysts or product owners provide you. You have to ask questions as early as you can. You have to think in terms of test scenarios and test data. You have to validate your thoughts and assumptions whenever you are in doubt. You have to think about related user stories and conflicting requirements. Instead of doing all these, if you are going to remain a passive consumer of the inputs received from business analysts or product owners, I am sure you are seeding bug issues.

While we don’t want requirements that are deliberately undecipherable, neither can we expect requirements that are both fully developed and cohesive with the architecture as a whole. Rather, we should hope for something like what Robert Galen suggested, that “communicates…goals while leaving the flexibility for my architect to do their job”. They should be, according to J. B. Rainsberger, a ticket to a conversation.

Lisa Crispin captured the reason for this conversation in “Helping the Customer Stick to the Purpose of a User Story”:

Make sure you understand the *purpose* of a user story or feature. Start with the “why”. You can worry later about the “how”. The customers get to decide on the business value to be delivered. They generally aren’t qualified to dictate the technical implementation of that functionality. We, the technical team, get to decide the best way to deliver the desired feature through the software. Always ask about the business problem to be solved. Sometimes, it’s possible to implement a “solution” that doesn’t really solve the problem.

Likewise, Roman Pichler observed:

If I say, for instance, that booking a training course on our website should be quick, then that’s a first step towards describing the attribute. But it would be too vague to characterise the desired user experience, to help the development team make the right architecture choices, and to validate the constraint. I will hence have to iterate over it, which is best done together with the development team.

Rather than passively implementing specifications and hoping that a coherent architecture “emerges”, iteratively refining requirements with those responsible for the architecture stacks the deck in favor of success by surfing ambiguity:

“When you ask a question, what two word answer distinguishes the architect?” And they didn’t miss a beat, answering “It depends” in an instant. “That’s my litmus test for architects.” I told them. “So, how do I tell a good architect?”…I said “They tell you what it depends on.” Yes, “it depends” is a hat tip to multiple simultaneous possibilities and even truths, which is the hallmark of ambiguity (of the kind that shuts down those who are uncomfortable with it). The good architect can sense what the dependencies are, and figure what to resolve and what to live with, to make progress.

Ruth Malan, A Trace in the Sand