Design Follies – ‘Why can’t I do that?’

Man in handcuffs

It’s ironic, but the very traits we think of as making a good developer can also get in the way of design and testing. Think of how many times you’ve heard (or perhaps said) “no one would ever do that”. Yet, given the event-driven, non-linear nature of modern systems, if a given execution path can occur, it will occur. Our cognitive biases can blind us to the issues that arise when our product is used in ways we did not intend. As Thomas Wendt observed in “The Broken Worldview of Experience Design”:

To a certain extent, the designer’s intent is irrelevant once the product launches. That is, intent can drive the design process, but that’s not the interesting part; the ways in which users adopt the product to their own needs is where the most insight comes from. Designer intent is a theoretical, speculative formulation even when based on the most rigorous research methods and valid interpretations. That is not to say intention and strategic positioning is not important, but simply that we need to consider more than idealized outcomes.

Abhi Rele, in “APIs and Data: Journey to the Center of the Customer Experience”, put it in more concrete terms:

If you think you’re in full control of your customers’ experience, you’re wrong.

Customers increasingly have taken charge—they know what they want, when they want it, and how they want it. They are using their mobile phones more often for an ever-growing list of tasks—be it searching for information, looking up directions, or buying products. According to Google, 34% of consumers turn to the device that’s closest to them. More often than not, they’re switching from one channel or device mid-transaction; Google found that 67% of consumers do just that. They might start their product research on the web, but complete the purchase on a smartphone.

Switch device in mid-transaction? No one would ever do that! Oops.

We could, of course, decide to block those paths that we don’t consider “reasonable” (as opposed to stopping actual error conditions). The problem with that approach is that our definition of “reasonable” may conflict with the customer’s definition. When “conflict” and “customer” are in the same sentence, there’s generally a problem.

These conflicts, in the right domain, can even have deadly results. While investigating the Asiana Airlines crash from July 2013, the National Transportation Safety Board (NTSB) found that the crew’s beliefs about what the autopilot system would do did not coincide with what it actually did (my emphasis):

The NTSB found that the pilots had “misconceptions” about the plane’s autopilot systems, specifically what the autothrottle would do in the event that the plane’s airspeed got too low.

In the setting that the autopilot was in at the time of the accident, the autothrottles that are used to maintain specific airspeeds, much like cruise control in a car, were not programmed to wake up and intervene by adding power if the plane got too slow. The pilots believed otherwise, in part because in other autopilot modes on the Boeing 777, the autothrottles would in fact do this.

“NTSB Blames Pilots in July 2013 Asiana Airlines Crash” on Mashable.com

Even if it doesn’t contribute to a tragedy, a poor user experience (inconsistent, unstable, or overly restrictive) can lead to unintended consequences, customer dissatisfaction, or both. Basing that user experience on assumptions instead of research and/or testing increases the risk. As I’ve stated previously, risky assumptions are an assumption of risk.


Design Follies – ‘I paid for it so you have to use it’

Sinking of the SMS Ostfriesland
Sunk Costs

The Daily WTF is almost always good for a laugh (or a facepalm, at least). It’s even been the inspiration for posts on error handling and dependency management. The June 24th post, “The System”, is yet another one that I have to comment on. You have to read the full post to appreciate the nature of the “system” in question (really, read it – it’s so bad I can’t even begin to do it justice via an excerpt), but the punch line is that it’s so wretchedly convoluted because one of its components was a “significant expense”.

The term “sunk cost” refers to a “cost that has already been incurred and cannot be recovered”. Although economic theory asserts that sunk costs should not influence future decisions, letting them do so is a common enough phenomenon. The sunk cost fallacy (aka escalation of commitment) is the academic term for “throwing good money after bad”. It’s common in endeavors as disparate as business, gambling, and military operations. Essentially, a person operating under this fallacy justifies spending more money on the grounds that they’ve already invested so much and to abandon it would be a waste. Needless to say, this can easily become a death spiral where ever larger sums are wasted to avoid acknowledging the original mistake.

In the IT realm, the sunk cost fallacy can lead to real problems. Weaving a poorly chosen platform/component/service/etc. into your applications not only wastes money and effort up front, but also complicates future efforts. Quality of service impairments (e.g. maintainability, performance, scalability, and even security) can quickly add up, eclipsing the original cost of the bad decision.

One area where sunk costs are quasi-relevant is when an existing capability is adequate. Theoretically, you’re not really considering the prior cost, but deciding to avoid a new one. In practical terms, it works out the same – you have something that works, so there’s no need to pay for a redundant capability.

Beyond being aware of this particular anti-pattern, there’s not much that can be done about it in many instances. Typically it’s imposed from above, so getting your reservations on the record may be the most you can do. If, however, you’re in a managerial position, reducing the cultural factors that promote this fallacy could serve you well.

Faster Horses – Henry Ford and Customer Development on Iasa Global Blog

A faster horse

Henry (“Any customer can have a car painted any color that he wants so long as it is black”) Ford was not an advocate of customer development. Although there’s no evidence that he actually said “If I had asked people what they wanted, they would have said faster horses”, it’s definitely congruent with other statements he did make.

See the full post on the Iasa Global Blog (a re-post, originally published on CitizenTekk and mirrored here).

Microservices and Data Architecture – Who Owns What Data?

Medieval Master and Scholars

An important consideration with microservice architectures is their effect on an enterprise’s data architecture. In “Carving it up – Microservices, Monoliths, & Conway’s Law”, I touched on the likelihood of there being data fragmentation and redundancy with this style (not that monoliths have a good record regarding these attributes either). More important than data fragmentation and redundancy is the notion of authoritativeness, which was mentioned in “More on Microservices – Boundaries, Governance, Reuse & Complexity”. When redundant data is locked up in monoliths with little or no interoperability and little or no governance, it’s very easy to have conflicting data without an approved method of determining which copy represents the true state. The answer lies not in removing all redundancy, but in recognizing that systems will share concepts and managing that sharing.

Multiple systems routinely share conceptual entities. In his post on bounded contexts, Martin Fowler used the example of Customer and Product concepts spanning Sales and Support contexts. In all likelihood, these would be shared by other contexts as well, Customer perhaps also appearing in an Accounts Receivable context while Product might also be part of the Catalog and Inventory contexts. While these contexts might share concepts, the details will differ from one to another (e.g. the Price attribute of a Product may be relevant to the Sales and Catalog contexts, but irrelevant to Support and Inventory). Control over these details will likely be isolated to a single context (e.g. while Price may be relevant to both the Sales and the Catalog contexts, it’s likely that the setting of the price is only the responsibility of the Catalog context).
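To make that concrete, here’s a rough sketch (the class and field names are mine and purely illustrative, and the choice of Java is incidental) of how the same conceptual Product might look in three different contexts, with only the Catalog context exposing a way to change the price:

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

// Catalog context: the authoritative source for product identity and price.
class CatalogProduct {
    private final String sku;
    private BigDecimal price;

    CatalogProduct(String sku, BigDecimal initialPrice) {
        this.sku = sku;
        this.price = initialPrice;
    }

    // Only the Catalog context exposes a way to change the price.
    void changePrice(BigDecimal newPrice) {
        this.price = newPrice;
    }

    BigDecimal price() {
        return price;
    }
}

// Sales context: needs the price to quote an order, but treats it as read-only
// data owned by (and copied from) the Catalog context.
class SalesLineItem {
    private final String sku;
    private final BigDecimal unitPrice;
    private final int quantity;

    SalesLineItem(String sku, BigDecimal unitPrice, int quantity) {
        this.sku = sku;
        this.unitPrice = unitPrice;
        this.quantity = quantity;
    }

    BigDecimal total() {
        return unitPrice.multiply(BigDecimal.valueOf(quantity));
    }
}

// Support context: shares the Product concept but has no interest in price at all.
class SupportedProduct {
    private final String sku;
    private final List<String> knownIssues = new ArrayList<>();

    SupportedProduct(String sku) {
        this.sku = sku;
    }

    void recordIssue(String issue) {
        knownIssues.add(issue);
    }
}
```

The point isn’t the particular fields, but that each context holds only the details it needs and defers to the owning context for anything it doesn’t control.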

Why the discussion about bounded contexts? The “Organized around Business Capabilities” characteristic of microservices elaborated in the Lewis and Fowler post is analogous to bounded contexts within a system. A microservice style of architecture brings the idea of a bounded context to a system of systems.

A common characteristic of monoliths is that, due to a lack of integration, they may be owned by one organizational unit yet contain several contexts that are rightly the domain of another. Re-keying or scripting extracted data from one system to another can arguably be called eventual consistency, but it’s a poor alternative to what can be done with a service-based architecture. The section “Data duplication over Events” in Jeppe Cramon’s “Microservices: It’s not (only) the size that matters, it’s (also) how you use them – part 4” illustrates how cooperating services can share data in a controlled manner, with the service owning the data broadcasting changes out to those consuming that data, whether they be other transactional systems or data stores used for analytics and reporting.
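A minimal sketch of that pattern might look something like the following (the event and class names are hypothetical, not taken from Jeppe’s post, and a real implementation would broadcast over a message broker rather than to in-process subscribers):

```java
import java.math.BigDecimal;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Event broadcast by the owning service whenever a price changes. Consumers get
// the data they need pushed to them instead of querying the owner synchronously.
record ProductPriceChanged(String sku, BigDecimal newPrice, Instant occurredAt) {}

interface PriceChangeConsumer {
    void on(ProductPriceChanged event);
}

// The Catalog service is the only place a price is ever changed; after updating
// its own store, it broadcasts the new state to whoever has subscribed (other
// transactional systems, reporting stores, etc.).
class CatalogService {
    private final List<PriceChangeConsumer> subscribers = new ArrayList<>();

    void subscribe(PriceChangeConsumer consumer) {
        subscribers.add(consumer);
    }

    void changePrice(String sku, BigDecimal newPrice) {
        // ...persist the change in the Catalog's own store first...
        ProductPriceChanged event = new ProductPriceChanged(sku, newPrice, Instant.now());
        subscribers.forEach(subscriber -> subscriber.on(event));
    }
}

// A consuming service keeps its own copy of the price, updated from the events,
// and never changes it on its own authority.
class SalesPriceCache implements PriceChangeConsumer {
    @Override
    public void on(ProductPriceChanged event) {
        System.out.printf("Sales copy of %s now priced at %s%n",
                event.sku(), event.newPrice());
    }
}
```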

The “bounded context writ large” nature of a microservices style allows you to use Conway’s Law to improve your data architecture. When the technology reflects the communication and governance model of the business it is supposed to support, data conflicts can be reduced, if not eliminated. Don’t underestimate, however, the effects that legacy systems may have had on the business’s communication and governance model. Years of working around the existing systems may have influenced (officially and/or unofficially) that model. As Udi Dahan noted in “People, Politics, and the Single Responsibility Principle”:

This just makes it that much harder to decide how to structure our software – there is no map with nice clean borders. We need to be able to see past the organizational dysfunction around us, possibly looking for how the company might have worked 100 years ago if everything was done by paper. While this might be possible in domains that have been around that long (like banking, shipping, etc) but even there, given the networked world we now live in, things that used to be done entirely within a single company are now spread across many different entities taking part in transnational value networks.

In short – it’s freakin’ hard.

But it’s still important.

Just don’t buy too deeply into the idea that by getting the responsibilities of your software right, that you will somehow reduce the impact that all of that business dysfunction has on you as a software developer. Part of the maturation process for a company is cleaning up its’ business processes in parallel to cleaning up its’ software processes.

It should be obvious that some governance will be needed to untangle the monoliths and keep them untangled. The good news is that this particular style provides the tools to do it incrementally.

“Finding the Balance” on Iasa Global Blog

Evening it out

One of my earliest posts on Form Follows Function, “There is no right way (though there are plenty of wrong ones)”, dealt with the subject of trade-offs. Whether dealing with the architecture of a solution or the architecture of an enterprise, there will be competing forces at work. Resolving these conflicts in an optimal manner involves finding the balance between individual forces and the system as a whole (consistent with the priorities of the stakeholders).

See the full post on the Iasa Global Blog (a re-post, originally published here).

Coordinating Microservices – Playing Well with Others

Eugene Ormandy Conducting

In “More on Microservices – Boundaries, Governance, Reuse & Complexity”, I made the statement that I loved feedback on my posts. Thomas Cagley and Alexander Samarin submitted two comments that reinforced that sentiment and led directly to this post.

Thomas’ comment asked about the risks inherent in microservice architectures. It was a good, straightforward question that was right on point with the post. It also foreshadowed Alexander’s comment that “An explicit coordination between services is still missing…Coordination should be externalized…”, because coordination of microservices is a significant area of risk.

In his comment, Alexander provided links to two of his own posts “Ideas for #BPMshift – Delenda est “vendor-centric #BPM” – How to modernise a legacy ERP” and “Enterprise patterns: eclipse”. These posts deal with decomposing monoliths into services and then composing services into larger coordinating services. They support his position that the coordination should be external to the various component services, a position that I agree with for the most part. However, according to my understanding of those posts, his position rests on considerations of dependency management and ease of composition. While these are very important, other factors are equally important to consider when designing how the components of a distributed system work together.

There is a tendency for people to design and implement distributed applications in the same manner they would a monolith, resulting in a web of service dependencies. Services are not distributed objects. Arnon Rotem-Gal-Oz’s “Fallacies of Distributed Computing Explained” details why treating them as such introduces risk (all of these fallacies affect the quality of coordination of collaborating services). That people are still making these mistaken assumptions so many years later (Peter Deutsch contributed the first 7 fallacies in 1994 and James Gosling added the 8th in 1997) is mind-boggling:

  1. The network is reliable.
  2. Latency is zero.
  3. Bandwidth is infinite.
  4. The network is secure.
  5. Topology doesn’t change.
  6. There is one administrator.
  7. Transport cost is zero.
  8. The network is homogeneous.

In addition to the issues illustrated by the fallacies, coupling in distributed systems becomes more of an area of operational concern than just an element of “good” design. Ben Morris’ “How loose coupling can be undermined in service-orientated architectures” is a good resource on types of coupling that can be present in service architectures.

Synchronous request/response communication is familiar to most developers because it mimics the communication pattern between objects in object-oriented software systems, and it is a simple style to comprehend. That familiarity and simplicity, however, make it particularly troublesome, since it is subject to many of the issues listed above (items 1 through 3 and 7 especially). The synchronous nature introduces a great deal of problematic coupling, as noted by Jeppe Cramon in “Micro services: It’s not (only) the size that matters, it’s (also) how you use them – part 1”:

Coupling has a tendency of creating cascading side effects: When a service changes its contract it becomes something ALL dependent services must deal with. When a service is unavailable, all services that depend upon the service are also unavailable. When a service fails during a data update, all other services involved in the same coordinated process / update also have to deal with the failed update (process coupling)

Systems using the synchronous request/response style can be structured to minimize the effects of some of the fallacies, but there is a cost for doing so. The more provision one makes for reliability, for example, the more complicated the client system becomes. Additionally, using distributed transactions to improve reliability further aggravates the coupling, which Jeppe Cramon addresses in “Micro services: It’s not (only) the size that matters, it’s (also) how you use them – part 2”.
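To put a concrete face on that cost, here’s a deliberately simplified sketch (the service, URL, and method names are hypothetical) of what “making provision for reliability” looks like on the client side of a synchronous call, where timeouts, retries, and a fallback all become the caller’s problem:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// A client forced to cope with an unreliable network (fallacy #1) and non-zero
// latency (fallacy #2) when calling a downstream service synchronously.
class PricingClient {
    private final HttpClient http = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(2))
            .build();

    String getPrice(String sku) {
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://pricing.example.internal/prices/" + sku))
                .timeout(Duration.ofSeconds(2))
                .GET()
                .build();

        // None of this retry/fallback logic exists for an in-process method call
        // in a monolith, but the synchronous style forces the caller to own it.
        for (int attempt = 1; attempt <= 3; attempt++) {
            try {
                HttpResponse<String> response =
                        http.send(request, HttpResponse.BodyHandlers.ofString());
                if (response.statusCode() == 200) {
                    return response.body();
                }
            } catch (Exception e) {
                // a real client would log, back off, and perhaps trip a circuit breaker
            }
        }
        return fallbackPrice(sku); // last resort when the dependency is unavailable
    }

    private String fallbackPrice(String sku) {
        return "0.00"; // placeholder; a cached value would be more realistic
    }
}
```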

In the Lewis and Fowler post, orchestration was dealt with in the section named “Smart endpoints and dumb pipes”. Their approach emphasized pipe and filter composition (with a nod to reducing the chattiness of the communication compared to that within the process space of a monolith) and/or lightweight messaging systems instead of Enterprise Service Bus (ESB) products. While complex ESBs may be overkill, at least initially, I would not necessarily counsel avoiding them. Once the need moves beyond simple composition into routing and transformation, the value proposition for these types of products becomes clearer (especially where they include management, monitoring and logging features). The routing and transformation capabilities in particular can allow you to decouple from a specific service implementation, provided that the potential providers have similar data profiles.
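As a rough (and admittedly toy) illustration of that decoupling, consider the sketch below; the provider names and normalization logic are invented, but the idea is that consumers address a logical capability while routing picks the concrete provider and transformation adapts its response to a canonical shape:

```java
import java.util.Map;
import java.util.function.Function;

// A tiny stand-in for what an ESB's routing and transformation offer: consumers ask
// for a logical capability ("pricing"), the router decides which concrete provider
// handles the request, and the transformer adapts the provider's response to the
// canonical shape consumers expect.
class PricingCapability {
    private final Map<String, Function<String, String>> providers;
    private final String activeProvider;

    PricingCapability(Map<String, Function<String, String>> providers, String activeProvider) {
        this.providers = providers;
        this.activeProvider = activeProvider;
    }

    String priceFor(String sku) {
        // Routing: swapping providers is a configuration change, not a change to
        // every consumer of the capability.
        String rawResponse = providers.get(activeProvider).apply(sku);
        // Transformation: normalize whatever shape this provider returns.
        return normalize(rawResponse);
    }

    private String normalize(String providerResponse) {
        return providerResponse.trim();
    }
}
```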

Asynchronous communication methods are more resilient to the issues posed by the eight fallacies and can also reduce some types of coupling (temporal coupling at a minimum). As Jeppe Cramon states in “Microservices: It’s not (only) the size that matters, it’s (also) how you use them – part 3”, asynchronous communication can be either one-way (events) or still two-way (request/reply as opposed to request/response). Jeppe’s position is that true one-way communication is superior and, in many cases, I would agree. There will still be many situations, however, where a degree of process coupling, however reduced, must be lived with.
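The distinction can be sketched roughly as follows (the interfaces and names are hypothetical; the point is the shape of each interaction, not the messaging technology): a one-way event is fire-and-forget, while asynchronous request/reply leaves the caller with a pending outcome to handle, i.e. a residue of process coupling:

```java
import java.util.concurrent.CompletableFuture;

// One-way (event): the sender publishes a fact and moves on. No reply is expected,
// so the sender is not coupled to consumers' availability or processing time.
interface EventPublisher {
    void publish(String topic, String payload);
}

// Two-way asynchronous (request/reply): the caller doesn't block, but it still has
// a pending outcome to deal with, so some process coupling remains.
interface InventoryGateway {
    CompletableFuture<Boolean> reserveStock(String sku, int quantity);
}

class OrderService {
    private final EventPublisher events;
    private final InventoryGateway inventory;

    OrderService(EventPublisher events, InventoryGateway inventory) {
        this.events = events;
        this.inventory = inventory;
    }

    void placeOrder(String sku, int quantity) {
        // Request/reply: we must still decide what happens if the reply is
        // negative, late, or never arrives.
        inventory.reserveStock(sku, quantity)
                 .thenAccept(reserved -> {
                     if (reserved) {
                         // One-way: announce the fact; interested services react
                         // on their own schedule.
                         events.publish("orders", "OrderPlaced:" + sku + ":" + quantity);
                     } else {
                         events.publish("orders", "OrderRejected:" + sku);
                     }
                 });
    }
}
```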

In summary, composing services is far more complex than composing method calls within the single process space of a monolithic application. A microservice architecture that looks like a traditional layered monolith with services at the layer boundaries betrays a poor understanding of the constraints that distributed applications operate under. The cost of going out of process should not be a surprise to architects and developers. Even with custom protocols narrowly tailored to their function, database accesses are a recognized source of performance issues and are managed accordingly. We shouldn’t expect equivalent, much less better, performance from services running over HTTP.

“It Depends” and “I Don’t Know” on Iasa Global Blog

Getting good advice?

When Croesus of Lydia considered going to war against the Persian Empire, he sought advice from the best available source of his day – the Oracle at Delphi. He was told that attacking the Persians would mean the end of a mighty empire. Armed with this “knowledge”, he attacked and a mighty empire, his own, was destroyed.

While the Oracle’s advice was arguably accurate, it definitely wasn’t helpful. The ambiguous answer conveyed more certainty than was warranted. While Croesus was complicit in his downfall (what’s that saying about “assumptions”?), the Oracle must also accept some blame. Failing to convey the uncertainty was a betrayal.

See the full post on the Iasa Global Blog (a re-post, originally published here).

#4U2U – Canned Competency, Values & Pragmatism

home canned food

Not quite two years ago, I put up a quick post entitled “The Iron Law of Tools”, which in its essence was: “that which does for you, can do to you” (whence comes the #4U2U in the title of this post). That particular post focused on ORMs (Entity Framework to be specific), but the warning equally applies to libraries and frameworks for other technical issues as well as process, methodology and techniques.

Libraries, frameworks, and processes (“tools” from this point forward) can make things easier by allowing you to concentrate on what to do rather than how to do it (via high-level abstractions and/or conventions). However, tools are not a substitute for understanding. Neither the Law of Unintended Consequences nor Murphy’s Law has been repealed. Without an adequate understanding of how something works, you cannot assess the costs of the trade-offs being made (and there are trade-offs involved; you can rely on that). Understanding is likewise necessary to recognize and fix those situations where the tool causes more problems than it solves. As Pawel Brodzinski observed in his post “A Fool With a Tool Is Still a Fool”:

Any time a discussion goes toward tools, any tools really, it’s a good idea to challenge the understanding of a tool itself and principles behind its successes. Without that shared success stories bear little value in other contexts, thus end result of applying the same tools will frequently result in yet another case of a cargo cult. While it may be good for training and consulting businesses (aka prophets) it won’t help to improve our organizations.

A fool with a tool will remain a fool, only more dangerous since now they’re armed.

Pawel’s point regarding cargo cults is particularly important. Lack of understanding of how a particular effect proceeds from a given cause often manifests as dogmatic assertions in defense of some “universal truth”. The closest thing I’ve found to a universal truth of software development is that it’s very unlikely that anything is universally applicable to every context.

It’s dangerous to conflate adherence to a tool with one’s core values, such that anyone who disagrees is “wrong” or “deluded” or “unprofessional”. That being said, values can provide a frame of reference in understanding someone’s position in regard to a tool. In “The TDD Divide: Everyone is Right”, Cory House addresses the current dispute over Test-Driven Development and notes (rightly, in my opinion):

The world is a messy place. Deadlines loom, team skills vary widely, and the impact of bugs varies greatly by industry. Ultimately, we write software to make money and solve problems. Tests are a tool that help us do both. Consider the context to determine which testing style fits for your project.

Uncle Bob is right. Quality matters. Separation of concerns and unit testing help assure the utmost quality, speed, and flexibility.

DHH is right. Sometimes the cost of unit tests exceed their benefit. Some of the benefit of automated testing can be achieved through automated integration testing instead.

You need to understand what a tool offers, what it costs, and how that equation works out in relation to what’s important in your context. With that understanding, you can make a rational choice.

More on Microservices – Boundaries, Governance, Reuse & Complexity

The Gerry-Mander

One of the things I love most about blogging (and never get enough of) is the feedback. My previous post, “Carving it up – Microservices, Monoliths, & Conway’s Law”, generated several comments/discussions that, taken together, warranted a follow-up post.

One such discussion was a Twitter exchange with Ruth Malan and Jeff Sussna regarding governance. Jeff regarded the concept of decentralized governance to be “controversial”, although he saw centralized governance as “problematic” and as “rigidity”. Ruth observed “Or at least an axis of tradeoffs and (organizational) design…? (Even “federated” has wide interpretation)”. I defended the need for some centralized governance, but with restraint, noting “governing deeper than necessary will cause problems, e.g. problem w/ BDUF (IMO) is the B, Not UF”.

When applications span multiple teams, all deploying independently, some centralized governance will be necessary. The key is to set rules of the road that allow teams to work independently without interfering with each other and not to attempt to micro-manage. This need for orchestration extends beyond just the technical. As a comment by Tom Cagley on my previous post pointed out “Your argument can also be leveraged when considering process improvements across team and organizational boundaries”. Conflicting process models between teams could easily complicate distributed solutions.

Another comment on the “Carving it up – Microservices, Monoliths, & Conway’s Law” post, this time from Robert Tanenbaum, discussed reuse and questioned whether libraries might be a better choice in some cases. Robert also observed “Agile principles tell us to implement the minimal architecture that satisfies the need, and only go to a more complex architecture when the need outgrows the minimal implementation”. I view microservice and SOA architectures as more of a partitioning mechanism than a tool for reuse. Because reuse carries more costs and burdens than most give it credit for, I tend to be less enthusiastic about that aspect. I definitely agree, however, that distributed architectures may be better suited to a later step in the evolutionary process rather than as a starting point. A quote from “Microservices: Decomposing Applications for Deployability and Scalability” by Chris Richardson puts it nicely (emphasis is mine):

Another challenge with using the microservice architecture is deciding at what point during the lifecycle of the application you should use this architecture. When developing the first version of an application, you often do not have the problems that this architecture solves. Moreover, using an elaborate, distributed architecture will slow down development.

Michael Brunton-Spall’s “What Are Microservices and Why Are They Important” provides more context on the nature of the trade-offs involved with this architectural style:

Microservices trade complicated monoliths for complex interacting systems of simple services.

A large monolith might be so complicated that it is hard to reason about certain behaviours, but it is technically possible to reason about it. Simple services interacting with each other will exhibit emergent behaviours, and these can be impossible to fully reason about.

Cascading failure is significantly more of a problem with complex systems, where failures in one simple system cause failures in the other systems around them. Luckily there are patterns, such as back-pressure, dead-man’s switch and more that can help you mitigate this, but you need to consider this.

Finally, microservices can become the solution through which you see the problem, even when the problem changes. So once built, if your requirements change such that your bounded contexts adjust, the correct solution might be to throw away two current services and create three to replace them, but it’s often easier to simply attempt to modify one or both. This means microservices can be easy to change in the small, but require more oversight to change in the large.

The difference between “complicated” and “complex” in the quote is particularly important. The emergent behaviors mentioned mean that the actions of complex systems will only be completely explainable in retrospect. Distributed applications, therefore, cannot be treated as monoliths with space between the parts, but must be designed according to their nature. This is a dominant theme for Jeppe Cramon, whose ongoing work around microservices on his TigerTeam site is well worth the read.

Jeppe posted a pair of comments on a LinkedIn discussion of my post. In that discussion, Jeppe pointed out that smaller services focused on business capabilities were preferable to monoliths, but that is not the same as connecting several monoliths with web services. The main focus, however, was on the nature of data “owned” by the services. I touched on that very briefly in my previous post, mentioning that some notion of authoritativeness was needed (central governance again). Jeppe concurred, noting that monoliths lead to a multi-master data architecture where multiple systems contain redundant, potentially conflicting, data on the same entity.

The bottom line? In my opinion, this architectural style has tremendous potential to enhance an enterprise’s operations, provided it’s not over-sold (“SOA cures baldness, irritable bowel syndrome, and promotes world peace…for free!!!”). Services are not a secret sauce to effortlessly transform legacy systems into the technology of the twenty-second century. There are caveats, trade-offs, and costs. A pragmatic approach that takes those into account along with the potential benefits should be much more likely to succeed than yesterday’s “build it and they will come” philosophy.

“Beware Premature Certainty – Embracing Ambiguous Requirements” on Iasa Global Blog

You have to measure it to manage it!

Many see ambiguity as antithetical to software requirements. However, as Ruth Malan has observed:

Pick your battles with ambiguity careful. She is a wily foe. Not to be dominated. Rather invited to reveal the clarities we can act on. Make decisions with the understanding that we need to watchful, for our assumptions will, sooner or later, become again tenuous in that fog of ambiguity and uncertainty that change churns up.

See the full post on the Iasa Global Blog (a re-post, originally published here).