In “More on Microservices – Boundaries, Governance, Reuse & Complexity”, I made the statement that I loved feedback on my posts. Thomas Cagley and Alexander Samarin submitted two comments that reinforced that sentiment and led directly to this post.
Thomas’ comment asked about the risks inherent in microservice architectures. It was a good, straight-forward question that was right on point with the post. It also foreshadowed Alexander’s comment that “An explicit coordination between services is still missing…Coordination should be externalized…” because coordination of microservices is a significant area of risk.
In his comment, Alexander provided links to two of his own posts “Ideas for #BPMshift – Delenda est “vendor-centric #BPM” – How to modernise a legacy ERP” and “Enterprise patterns: eclipse”. These posts deal with decomposing monoliths into services and then composing services into larger coordinating services. They support his position that the coordination should be external to the various component services, a position that I agree with for the most part. However, according to my understanding of those posts, his position rests on considerations of dependency management and ease of composition. While these are very important, other factors are equally important to consider when designing how the components of a distributed system work together.
There is a tendency for people to design and implement distributed applications in the same manner they would a monolith, resulting in a web of service dependencies. Services are not distributed objects. Arnon Rotem-Gal-Oz’s “Fallacies of Distributed Computing Explained” explains in detail why treating them as a such introduces risk (all of these fallacies affect the quality of coordination of collaborating services). That people are still making these mistaken assumptions so many years later (Peter Deutsch contributed the first 7 fallacies in 1994 and James Gosling added the 8th in 1997) is mind-boggling:
- The network is reliable.
- Latency is zero.
- Bandwidth is infinite.
- The network is secure.
- Topology doesn’t change.
- There is one administrator.
- Transport cost is zero.
- The network is homogeneous.
In addition to the issues illustrated by the fallacies, coupling in distributed systems becomes more of an area of operational concern than just an element of “good” design. Ben Morris’ “How loose coupling can be undermined in service-orientated architectures” is a good resource on types of coupling that can be present in service architectures.
Synchronous request/response communication is a style familiar to most developers in that it mimics the communication pattern between objects in object-oriented software systems. It is a simple style to comprehend. That familiarity and simplicity, however, make it a particularly troublesome style in that it is subject to many of the issues listed above (items 1 through 3 and 7 especially). The synchronous nature introduces a great deal of problematic coupling, noted by Jeppe Cramon in “Micro services: It’s not (only) the size that matters, it’s (also) how you use them – part 1”:
Coupling has a tendency of creating cascading side effects: When a service changes its contract it becomes something ALL dependent services must deal with. When a service is unavailable, all services that depend upon the service are also unavailable. When a service failsduring a data update, all other services involved in the same coordinated process / update also have to deal with the failed update (process coupling)
Systems using the synchronous request/response style can be structured to minimize the effects of some of the fallacies, but there is a cost for doing so. The more provision one makes for reliability, for example, the more complicated the client system becomes. Additionally, one can further aggravate the amount of coupling via the use of distributed transactions to improve reliability, which Jeppe Cramon addresses in “Micro services: It’s not (only) the size that matters, it’s (also) how you use them – part 2”.
In the Lewis and Fowler post, orchestration was dealt with in the section named “Smart endpoints and dumb pipes”. Their approach emphasized pipe and filter composition (with a nod to reducing the chattiness of the communication compared to that within the process space of a monolith) and/or lightweight messaging systems instead of Enterprise Service Bus (ESB) products. While complex ESBs may be overkill, at least initially, I would not necessarily counsel avoiding them. Once the need moves beyond simple composition into routing and transformation, then the value proposition for these types of products becomes clearer (especially where they include management, monitoring and logging features). The message routing and transformation capabilities in particular can allow you to decouple from a particular service implementation providing that the potential providers have similar data profiles.
Asynchronous communication methods are more resilient to the issues posed by the eight fallacies and can also reduce some types of coupling (temporal coupling at a minimum). As Jeppe Cramon states in “Microservices: It’s not (only) the size that matters, it’s (also) how you use them – part 3”, asynchronous communication can be either one way (events) or it can still be two way (request/reply as opposed to request/response). Jeppe’s position is that true one way communication is superior and in many cases, I would agree. There will still be many situations, however, where a degree of process coupling, however reduced, must be lived with.
In summary, composing services is far more complex than composing method calls within the single process space of a monolithic application. A microservice architecture that looks like a traditional layered monolith with services at the layer boundaries betrays a poor understanding of the constraints that distributed applications operate under. The cost of going out of process should not be a surprise to architects and developers. Even with custom protocols narrowly tailored to their function, database accesses are a recognized source of performance issues and managed accordingly. We shouldn’t expect equivalent, much less better performance from services running over HTTP.