Design Follies – ‘Why can’t I do that?’

It’s ironic that the traits we think of as making a good developer are also those that can get in the way of design and testing, but that’s just the case. Think of how many times you’ve heard (or perhaps, said) “no one would ever do that”. Yet, given the event-driven, non-linear nature of modern systems, if a given execution path can occur, it will occur. Our cognitive biases can blind us to potential issues that arise when our product is used in ways we did not intend. As Thomas Wendt observed in “The Broken Worldview of Experience Design”:

To a certain extent, the designer’s intent is irrelevant once the product launches. That is, intent can drive the design process, but that’s not the interesting part; the ways in which users adopt the product to their own needs is where the most insight comes from. Designer intent is a theoretical, speculative formulation even when based on the most rigorous research methods and valid interpretations. That is not to say intention and strategic positioning is not important, but simply that we need to consider more than idealized outcomes.

Abhi Rele, in “APIs and Data: Journey to the Center of the Customer Experience”, put it in more concrete terms:

If you think you’re in full control of your customers’ experience, you’re wrong.

Customers increasingly have taken charge—they know what they want, when they want it, and how they want it. They are using their mobile phones more often for an ever-growing list of tasks—be it searching for information, looking up directions, or buying products. According to Google, 34% of consumers turn to the device that’s closest to them. More often than not, they’re switching from one channel or device mid-transaction; Google found that 67% of consumers do just that. They might start their product research on the web, but complete the purchase on a smartphone.

Switch device in mid-transaction? No one would ever do that! Oops.

We could, of course, decide to block those paths that we don’t consider “reasonable” (as opposed to stopping actual error conditions). The problem with that approach is that our definition of “reasonable” may conflict with the customer’s definition. When “conflict” and “customer” are in the same sentence, there’s generally a problem.

These conflicts, in the right domain, can even have deadly results. In its investigation of the July 2013 Asiana Airlines crash, the National Transportation Safety Board (NTSB) found that the crew’s belief about what the autopilot system would do did not match what it actually did (my emphasis):

The NTSB found that the pilots had “misconceptions” about the plane’s autopilot systems, specifically what the autothrottle would do in the event that the plane’s airspeed got too low.

In the setting that the autopilot was in at the time of the accident, the autothrottles that are used to maintain specific airspeeds, much like cruise control in a car, were not programmed to wake up and intervene by adding power if the plane got too slow. The pilots believed otherwise, in part because in other autopilot modes on the Boeing 777, the autothrottles would in fact do this.

“NTSB Blames Pilots in July 2013 Asiana Airlines Crash” on Mashable.com

Even if it doesn’t contribute to a tragedy, a poor user experience (inconsistent, unstable, or overly restrictive) can lead to unintended consequences, customer dissatisfaction, or both. Basing that user experience on assumptions instead of research and/or testing increases the risk. As I’ve stated previously, risky assumptions are an assumption of risk.

Design Follies – ‘I paid for it so you have to use it’

The Daily WTF is almost always good for a laugh (or a facepalm, at least). It’s even been the inspiration for posts on error handling and dependency management. The June 24th post, “The System”, is yet another one that I have to comment on. You have to read the full post to appreciate the nature of the “system” in question (really, read it – it’s so bad I can’t even begin to do it justice via an excerpt), but the punch line is that it’s so wretchedly convoluted because one of its components was a “significant expense”.

The term “sunk cost” refers to a “cost that has already been incurred and cannot be recovered”. Although economic theory asserts that sunk costs should not influence future decisions, letting them do so is a common enough phenomenon. The sunk cost fallacy (aka escalation of commitment) is the academic term for “throwing good money after bad”. It’s common in endeavors as disparate as business, gambling, and military operations. Essentially, a person operating under this fallacy justifies spending more money on the grounds that they’ve already invested so much that abandoning the effort would be a waste. Needless to say, this can easily become a death spiral where ever larger sums are wasted to avoid acknowledging the original mistake.
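To make the reasoning concrete, here is a minimal sketch of a decision that ignores sunk costs. The option names and dollar figures are invented purely for illustration; the only point is that the amount already spent never enters the comparison.

```python
def cheaper_option(options):
    """Return the name of the option with the lowest remaining cost.

    Each option is a dict with hypothetical keys:
      name        - label for the option
      sunk        - money already spent (unrecoverable, so deliberately ignored)
      future_cost - money still required to reach the goal
    """
    return min(options, key=lambda o: o["future_cost"])["name"]


# Illustrative numbers only.
options = [
    {"name": "keep patching the expensive component", "sunk": 500_000, "future_cost": 300_000},
    {"name": "replace the component", "sunk": 0, "future_cost": 120_000},
]

choice = cheaper_option(options)
print(choice)
```

Raising or lowering the `sunk` figure changes nothing about the outcome, which is exactly what the economic argument says should happen.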

In the IT realm, the sunk cost fallacy can lead to real problems. Weaving a poorly chosen platform/component/service/etc. into your applications not only wastes money and effort up front, but also complicates future efforts. Quality of service impairments (e.g. maintainability, performance, scalability, and even security) can quickly add up, eclipsing the original cost of the bad decision.

One area where sunk costs are quasi-relevant is when a given capability is adequate. Theoretically, you’re not really considering the prior cost, but deciding to avoid a new cost. In practical terms, it works out the same – you have something that works, no need to pay for a redundant capability.

Beyond being aware of this particular anti-pattern, there’s often not much that can be done about it. Typically it is imposed from above, so getting on the record with your reservations may be all you can do. If, however, you’re in a managerial position, reducing the cultural factors that promote this fallacy could serve you well.

Faster Horses – Henry Ford and Customer Development on Iasa Global Blog

Henry (“Any customer can have a car painted any color that he wants so long as it is black”) Ford was not an advocate of customer development. Although there’s no evidence that he actually said “If I had asked people what they wanted, they would have said faster horses”, it’s definitely congruent with other statements he did make.

See the full post on the Iasa Global Blog (a re-post, originally published on CitizenTekk and mirrored here).

Microservices and Data Architecture – Who Owns What Data?

An important consideration with microservice architectures is their effect on an enterprise’s data architecture. In “Carving it up – Microservices, Monoliths, & Conway’s Law”, I touched on the likelihood of there being data fragmentation and redundancy with this style (not that monoliths have a good record regarding these attributes either). More important than data fragmentation and redundancy is the notion of authoritativeness, which was mentioned in “More on Microservices – Boundaries, Governance, Reuse & Complexity”. When redundant data is locked up in monoliths with little or no interoperability and little or no governance, it’s very easy to have conflicting data without an approved method of determining which copy represents the true state. The answer lies not in removing all redundancy, but in recognizing that systems will share concepts and in managing that sharing.

Multiple systems routinely share conceptual entities. In his post on bounded contexts, Martin Fowler used the example of Customer and Product concepts spanning Sales and Support contexts. In all likelihood, these would be shared by other contexts as well, Customer perhaps also appearing in an Accounts Receivable context while Product might also be part of the Catalog and Inventory contexts. While these contexts might share concepts, the details will differ from one to another (e.g. the Price attribute of a Product may be relevant to the Sales and Catalog contexts, but irrelevant to Support and Inventory). Control over these details will likely be isolated to a single context (e.g. while Price may be relevant to both the Sales and the Catalog contexts, it’s likely that the setting of the price is only the responsibility of the Catalog context).
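A minimal sketch of that idea follows. The class and method names are hypothetical, chosen only to mirror the Catalog, Sales, and Support contexts described above: each context gets its own view of Product, and only the Catalog context exposes a way to set the price.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogProduct:
    """Product as the Catalog context sees it; Catalog owns the price."""
    sku: str
    name: str
    price: float


@dataclass(frozen=True)
class SalesProduct:
    """Product as Sales sees it: the price is readable but cannot be set here."""
    sku: str
    price: float


@dataclass(frozen=True)
class SupportProduct:
    """Product as Support sees it; price is irrelevant, so it is absent."""
    sku: str
    name: str


class Catalog:
    """The single context with authority to change a price (an assumption of this sketch)."""

    def __init__(self):
        self._products = {}

    def add(self, sku, name, price):
        self._products[sku] = CatalogProduct(sku, name, price)

    def set_price(self, sku, price):
        current = self._products[sku]
        self._products[sku] = CatalogProduct(current.sku, current.name, price)

    def for_sales(self, sku):
        p = self._products[sku]
        return SalesProduct(p.sku, p.price)

    def for_support(self, sku):
        p = self._products[sku]
        return SupportProduct(p.sku, p.name)


catalog = Catalog()
catalog.add("A-1", "Widget", 9.99)
catalog.set_price("A-1", 12.50)

sales_view = catalog.for_sales("A-1")
support_view = catalog.for_support("A-1")
```

The shared concept (a product identified by `sku`) appears in every context, but the details differ, and the views handed to Sales and Support are immutable.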

Why the discussion about bounded contexts? The “Organized around Business Capabilities” characteristic of microservices elaborated in the Lewis and Fowler post is analogous to bounded contexts within a system. A microservice style of architecture brings the idea of a bounded context to a system of systems.

A common characteristic of monoliths is that they may be owned by one organizational unit but, due to a lack of integration, contain several contexts that are rightly the domain of another organizational unit. Re-keying or scripting extracted data from one system to another can arguably be called eventual consistency, but it’s a poor alternative to what can be done with a service-based architecture. The section “Data duplication over Events” in Jeppe Cramon’s “Microservices: It’s not (only) the size that matters, it’s (also) how you use them – part 4” illustrates how cooperating services can share data in a controlled manner, with the service that owns the data broadcasting changes to those consuming it, whether they be other transactional systems or data stores used for analytics and reporting.
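The shape of that arrangement can be sketched in a few lines. This is not Cramon’s implementation; the service names and the event format are hypothetical, and the in-process callback list stands in for whatever messaging infrastructure a real system would use.

```python
class CustomerService:
    """Owns customer data and publishes every change to its subscribers."""

    def __init__(self):
        self._addresses = {}
        self._subscribers = []

    def subscribe(self, handler):
        self._subscribers.append(handler)

    def update_address(self, customer_id, address):
        # The owning service is the only place the authoritative copy changes.
        self._addresses[customer_id] = address
        event = {
            "type": "CustomerAddressChanged",
            "customer_id": customer_id,
            "address": address,
        }
        for handler in self._subscribers:
            handler(event)


class ShippingService:
    """Keeps a local copy that is updated only from the owner's events."""

    def __init__(self):
        self.addresses = {}

    def on_event(self, event):
        if event["type"] == "CustomerAddressChanged":
            self.addresses[event["customer_id"]] = event["address"]


customers = CustomerService()
shipping = ShippingService()
reporting_log = []  # stand-in for an analytics or reporting store

customers.subscribe(shipping.on_event)
customers.subscribe(reporting_log.append)

customers.update_address("c-42", "1 Main St")
```

The data is duplicated, but there is never any doubt about which copy is authoritative: consumers hold what the owner last told them.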

The “bounded context writ large” nature of a microservices style allows you to use Conway’s Law to improve your data architecture. When the technology supports the same communication and governance model as the business it is supposed to support, data conflicts can be reduced, if not eliminated. Don’t underestimate, however, the effects that legacy systems may have had on the business’ communication and governance model. Years of working around the existing systems may have influenced (officially and/or unofficially) that model. As Udi Dahan noted in “People, Politics, and the Single Responsibility Principle”:

This just makes it that much harder to decide how to structure our software – there is no map with nice clean borders. We need to be able to see past the organizational dysfunction around us, possibly looking for how the company might have worked 100 years ago if everything was done by paper. While this might be possible in domains that have been around that long (like banking, shipping, etc) but even there, given the networked world we now live in, things that used to be done entirely within a single company are now spread across many different entities taking part in transnational value networks.

In short – it’s freakin’ hard.

But it’s still important.

Just don’t buy too deeply into the idea that by getting the responsibilities of your software right, that you will somehow reduce the impact that all of that business dysfunction has on you as a software developer. Part of the maturation process for a company is cleaning up its’ business processes in parallel to cleaning up its’ software processes.

It should be obvious that some governance will be needed to untangle the monoliths and keep them untangled. The good news is that this particular style provides the tools to do it incrementally.

“Finding the Balance” on Iasa Global Blog

One of my earliest posts on Form Follows Function, “There is no right way (though there are plenty of wrong ones)”, dealt with the subject of trade-offs. Whether dealing with the architecture of a solution or the architecture of an enterprise, there will be competing forces at work. Resolving these conflicts in an optimal manner involves finding the balance between individual forces and the system as whole (consistent with the priorities of the stakeholders).

See the full post on the Iasa Global Blog (a re-post, originally published here).

Coordinating Microservices – Playing Well with Others

In “More on Microservices – Boundaries, Governance, Reuse & Complexity”, I made the statement that I loved feedback on my posts. Thomas Cagley and Alexander Samarin submitted two comments that reinforced that sentiment and led directly to this post.

Thomas’ comment asked about the risks inherent in microservice architectures. It was a good, straight-forward question that was right on point with the post. It also foreshadowed Alexander’s comment that “An explicit coordination between services is still missing…Coordination should be externalized…” because coordination of microservices is a significant area of risk.

In his comment, Alexander provided links to two of his own posts, “Ideas for #BPMshift – Delenda est “vendor-centric #BPM” – How to modernise a legacy ERP” and “Enterprise patterns: eclipse”. These posts deal with decomposing monoliths into services and then composing services into larger coordinating services. They support his position that the coordination should be external to the various component services, a position that I agree with for the most part. However, according to my understanding of those posts, his position rests on considerations of dependency management and ease of composition. While these are very important, other factors are equally important to consider when designing how the components of a distributed system work together.

There is a tendency for people to design and implement distributed applications in the same manner they would a monolith, resulting in a web of service dependencies. Services are not distributed objects. Arnon Rotem-Gal-Oz’s “Fallacies of Distributed Computing Explained” explains in detail why treating them as such introduces risk (all of these fallacies affect the quality of coordination of collaborating services). That people are still making these mistaken assumptions so many years later (Peter Deutsch contributed the first 7 fallacies in 1994 and James Gosling added the 8th in 1997) is mind-boggling:

  1. The network is reliable.
  2. Latency is zero.
  3. Bandwidth is infinite.
  4. The network is secure.
  5. Topology doesn’t change.
  6. There is one administrator.
  7. Transport cost is zero.
  8. The network is homogeneous.

In addition to the issues illustrated by the fallacies, coupling in distributed systems becomes more of an area of operational concern than just an element of “good” design. Ben Morris’ “How loose coupling can be undermined in service-orientated architectures” is a good resource on types of coupling that can be present in service architectures.

Synchronous request/response communication is a style familiar to most developers in that it mimics the communication pattern between objects in object-oriented software systems. It is a simple style to comprehend. That familiarity and simplicity, however, make it a particularly troublesome style in that it is subject to many of the issues listed above (items 1 through 3 and 7 especially). The synchronous nature introduces a great deal of problematic coupling, noted by Jeppe Cramon in “Micro services: It’s not (only) the size that matters, it’s (also) how you use them – part 1”:

Coupling has a tendency of creating cascading side effects: When a service changes its contract it becomes something ALL dependent services must deal with. When a service is unavailable, all services that depend upon the service are also unavailable. When a service fails during a data update, all other services involved in the same coordinated process / update also have to deal with the failed update (process coupling)

Systems using the synchronous request/response style can be structured to minimize the effects of some of the fallacies, but there is a cost for doing so. The more provision one makes for reliability, for example, the more complicated the client system becomes. Additionally, one can further aggravate the amount of coupling via the use of distributed transactions to improve reliability, which Jeppe Cramon addresses in “Micro services: It’s not (only) the size that matters, it’s (also) how you use them – part 2”.
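As a rough illustration of that cost, consider what a client has to add just to survive a service that is briefly unavailable. The sketch below is a generic retry-with-backoff wrapper; `ServiceUnavailable`, `call_with_retries`, and the flaky lookup are all hypothetical names invented for this example, not part of any particular library.

```python
import time


class ServiceUnavailable(Exception):
    """Raised by the hypothetical remote call when the service cannot answer."""


def call_with_retries(remote_call, attempts=3, base_delay=0.01, sleep=time.sleep):
    """Invoke remote_call, retrying with exponential backoff on failure.

    Returns the first successful result and re-raises the last error if every
    attempt fails. The caller stays blocked for the whole duration.
    """
    for attempt in range(attempts):
        try:
            return remote_call()
        except ServiceUnavailable:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))


# Hypothetical flaky service: fails twice, then answers.
calls = {"count": 0}


def flaky_price_lookup():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ServiceUnavailable("pricing service did not respond")
    return 12.50


price = call_with_retries(flaky_price_lookup)
```

Even this small wrapper leaves open questions (how long to wait in total, whether the call is safe to repeat, what to do when retries are exhausted), and every answer adds more logic to the client while it remains blocked on the service.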

In the Lewis and Fowler post, orchestration was dealt with in the section named “Smart endpoints and dumb pipes”. Their approach emphasized pipe and filter composition (with a nod to reducing the chattiness of the communication compared to that within the process space of a monolith) and/or lightweight messaging systems instead of Enterprise Service Bus (ESB) products. While complex ESBs may be overkill, at least initially, I would not necessarily counsel avoiding them. Once the need moves beyond simple composition into routing and transformation, the value proposition for these types of products becomes clearer (especially where they include management, monitoring and logging features). The message routing and transformation capabilities in particular can allow you to decouple from a particular service implementation, provided that the potential providers have similar data profiles.

Asynchronous communication methods are more resilient to the issues posed by the eight fallacies and can also reduce some types of coupling (temporal coupling at a minimum). As Jeppe Cramon states in “Microservices: It’s not (only) the size that matters, it’s (also) how you use them – part 3”, asynchronous communication can be either one way (events) or it can still be two way (request/reply as opposed to request/response). Jeppe’s position is that true one way communication is superior and in many cases, I would agree. There will still be many situations, however, where a degree of process coupling, however reduced, must be lived with.
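The reduction in temporal coupling can be shown with a toy example. The `MessageQueue` class below is a hypothetical in-memory stand-in for a durable broker, and the event names are invented; the point is only that the producer finishes its work without the consumer being available.

```python
from collections import deque


class MessageQueue:
    """Stand-in for a durable broker: holds messages until a consumer asks."""

    def __init__(self):
        self._messages = deque()

    def publish(self, message):
        self._messages.append(message)

    def drain(self):
        # Yield messages in the order they were published.
        while self._messages:
            yield self._messages.popleft()


queue = MessageQueue()

# The producer publishes one-way events while no consumer is running.
queue.publish({"type": "OrderPlaced", "order_id": 1})
queue.publish({"type": "OrderPlaced", "order_id": 2})

# Later, the consumer comes online and works through the backlog.
processed = [event["order_id"] for event in queue.drain()]
print(processed)
```

With a synchronous call, the producer would have failed or blocked while the consumer was down; here the only shared dependency is the queue itself.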

In summary, composing services is far more complex than composing method calls within the single process space of a monolithic application. A microservice architecture that looks like a traditional layered monolith with services at the layer boundaries betrays a poor understanding of the constraints that distributed applications operate under. The cost of going out of process should not be a surprise to architects and developers. Even with custom protocols narrowly tailored to their function, database accesses are a recognized source of performance issues and are managed accordingly. We shouldn’t expect equivalent, much less better, performance from services running over HTTP.

“It Depends” and “I Don’t Know” on Iasa Global Blog

When Croesus of Lydia considered going to war against the Persian Empire, he sought advice from the best available source of his day – the Oracle at Delphi. He was told that attacking the Persians would mean the end of a mighty empire. Armed with this “knowledge”, he attacked and a mighty empire, his own, was destroyed.

While the Oracle’s advice was arguably accurate, it definitely wasn’t helpful. The ambiguous answer conveyed more certainty than was warranted. While Croesus was complicit in his downfall (what’s that saying about “assumptions”?), the Oracle must also accept some blame. Failing to convey the uncertainty was a betrayal.

See the full post on the Iasa Global Blog (a re-post, originally published here).