Microservice Principles and Enterprise IT Architecture

Julia Set Fractal

Ruth Malan is fond of noting that “design is fractal”. In a comment on her post “We Just Stopped Talking About Design”, she observed:

We need to get beyond thinking of design as just a do once, up-front sort of thing. If we re-orient to design as something we do at different levels (strategic, system-in-context, system, elements and mechanisms, algorithms, …), at different times (including early), and iteratively and throughout the development and evolution of systems, then we open up the option that we (can and should) design in different media.

This fractal nature is illustrated by the fact that software systems and systems of systems belonging to an organization exist within an ecosystem dominated by that organization which is itself a system of systems of the social kind operating within a larger ecosystem (i.e. the enterprise). Just as structure follows strategy then becomes a constraint on strategy going forward, the architectures of the systems that make up the IT architecture of the enterprise influence its character (for good or bad) and vice versa. Likewise, the IT architecture of the enterprise and the architecture of the enterprise itself are mutually influencing. Ruth again:

Of course, I don’t mean we design the (entire business) ecosystem…We can, though, design interventions in the ecosystem, to shift value flows and support the value network. You know, like supporting the development community that will build apps on your platform with tooling and APIs, or creating relationships with content providers, that sort of thing. And more.

So what does any of this have to do with the principles behind the microservice architectural style?

Separation of concerns, modularity, scalability, DRY-ness, high cohesion, low coupling, etc. are all recognized virtues at the application architecture level. As we move into the levels of solution architecture and enterprise IT architecture (EITA), these qualities, in my opinion, remain valuable. Many of the attributes of the microservice style embody these qualities. In particular, componentization at the application level (i.e. systems as components in a system of systems) and focus on a particular business capability both enhance agility at the solution architecture and EITA levels of abstraction.

Conventionally, the opposite of a microservice has been termed a “monolith” when discussing microservice architecture. Robert Annett, in “What is a Monolith?”, takes an expansive view of the term, listing three ways in which an application’s architecture can be monolithic. He notes that the term need not be pejorative. Chris Carroll, in “A Single Deployment Target is not a Monolith”, prefers the traditional definition, an application suffering from insufficient modularity. He notes that an application whose components run in process on the same machine can be loosely coupled and modular. This holds true when considering the application’s architecture, but begins to falter when considering the architecture of a solution and more so at the EITA level.

In my opinion, applications that encompass multiple business capabilities, even when well designed internally, can be described as monolithic at higher levels of abstraction. This need not be considered a bad thing; where those multiple capabilities are organizationally cohesive, more granular componentization may not pass a cost/benefit analysis. However, where the capabilities are functionally disparate, the potential for redundancy in function, lack of organizational alignment, process mismatch, and data integrity issues all become significant. In a previous post, “Making and Taming Monoliths”, I presented a hypothetical solution architecture illustrating most of these issues. This was coupled with an example of how that could be remedied in an incremental manner (it should be noted that the taming of a solution architecture monolith is most likely to succeed where the individual applications are internally well modularized).

Higher level modularity and DRY-ness enhance both solution architecture and EITA. This applies in terms of code:

This also applies in terms of data:

Modularity at the level of solution architecture and EITA is also important in terms of process. Whether the model is Bi-Modal IT or Pace-Layered, it is becoming more and more apparent that no one process will fit the entire enterprise (and for what it’s worth, I agree with Simon Wardley that Bi-Modal is a mode short). Having disparate business capabilities reside within the same application increases the risk of collisions where the process used is inappropriate to the domain. When dealing with Conway’s Law, it’s useful to remember “I Fought the Law, and the Law Won”.

Even without adopting a pure microservice architecture for any application, adopting some of the principles behind the style can be useful. Reducing redundant code and data reduces risk and allows teams to concentrate on the core capabilities of their application. Modular solution and enterprise IT architectures have more flexibility in terms of deployment, release schedules, and process. Keeping applications tightly focused on business capabilities allows you to use Conway’s Law to your advantage. Not every organization is a Netflix, but you may be able to profit by their example.


Microservices – The Too Good to be True Parts

Label for Clark Stanley's Snake Oil Liniment

Over the last several months, I’ve written several posts about microservices. My attitude toward this architectural style is one of guarded optimism. I consider the “purist” version of it to be overkill for most applications (are you really creating something Netflix-scale?), but see a lot of valuable ideas developing out of it. Smaller, focused, service-enabled applications are, in my opinion, an excellent way to increase systems agility. Where the benefits outweigh the costs and you’ve done your homework, systems of systems make sense.

However, the history of Service-Oriented Architecture (SOA) is instructive. A tried and true method of discrediting yourself is to over-promise and under-deliver. Microservice architectures, as the latest hot topic, currently receive a lot of uncritical press, just as SOA did a few years back. An article on ZDNet, “How Nike thinks about app development: Lots of micro services”, illustrates this (emphasis is mine):

Nike is breaking down all the parts of its apps to crate (sic) building blocks that can be reused and tweaked as needed. There’s also a redundancy benefit: Should one micro service fail the other ones will work in the app.

Reuse and agility tend to be antagonists. The governance needed to promote reuse impedes agility. Distribution increases complexity on its own; reuse adds additional complexity. This complexity comes not only from communication issues but also from coordination and coupling. Rationalization, reuse, and the ability to compose applications from individual services are absolutely features of this style. The catch is the cost involved in achieving them.

A naive reading of Nike’s strategy would imply that breaking everything up “auto-magically” yields reuse and agility. Without an intentional design, this is very unlikely. Cohesion of the individual services, rather than their size, is the important factor in achieving those goals. As Stefan Tilkov notes in “How Small Should Your Microservice Be?”:

In other words, I think it’s not a goal to make your services as small as possible. Doing so would mean you view the separation into individual, stand-alone services as your only structuring mechanism, while it should be only one of many.

Redundancy and resilience are likewise issues that need careful consideration. The quote from the Nike article might lead you to believe that resilience and redundancy are a by-product of deploying microservices. Far from it. Resilience and distribution are orthogonal concepts; in fact, breaking up a monolith can have a negative impact on resilience if resilience is not specifically accounted for in the design. Coupling, in all its various forms, reduces resilience. Jeppe Cramon, in “SOA: synchronous communication, data ownership and coupling”, has shown that distribution, in and of itself, does not eliminate coupling. This means that “Should one micro service fail the other ones will work in the app” may prove false if the service that fails is coupled with and depended on by other services. Decoupling is unlikely to happen accidentally. Likewise, redundant instances of the same service will do little good if a resource shared by those instances (e.g. the data store) is down.
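To make the point about designed resilience concrete, here is a minimal sketch in Python. Everything in it is hypothetical: `product_page`, `fetch_product`, `fetch_recommendations`, and `ServiceUnavailable` are names invented for illustration and do not come from the Nike article or from any particular library. It assumes one dependency is essential and one is optional, and shows that the page survives the optional service failing only because a fallback was written in deliberately.

```python
class ServiceUnavailable(Exception):
    """Raised by a (hypothetical) client when a remote service cannot be reached."""


def product_page(product_id, fetch_product, fetch_recommendations):
    """Assemble a page from two hypothetical services.

    fetch_product is treated as essential: if it fails, the page fails.
    fetch_recommendations is treated as optional: if it fails, we degrade.
    """
    # No fallback here: a failure in the essential dependency propagates,
    # which is exactly the coupling the paragraph above warns about.
    product = fetch_product(product_id)

    try:
        recommendations = fetch_recommendations(product_id)
    except ServiceUnavailable:
        # Designed-in fallback: render the page without the extras.
        recommendations = []

    return {"product": product, "recommendations": recommendations}
```

Swap the two roles and the same outage takes the whole page down. Which dependencies can be degraded is a design decision, not a side effect of distribution.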

Even where a full-blown microservice architecture is inappropriate, many of the principles behind the style are useful. Swimming with the tide of Conway’s Law, rather than against it, is more likely to yield successful application architectures and enterprise IT architectures. The coherence that makes it successful is a product of design, however, and not serendipity. Microservices are most definitely not snake oil. Selling the style like it is snake oil is a really bad idea.

Design Away Error Handling?

Evil Monkey Pointing

Writing is an interesting process. Some posts spring to life; ignited by some inspiration, they swiftly flow from fingertips to (virtual) page. Other posts simmer. An idea is half-conceived, then languishes incomplete. It sits in the corner staring at you balefully, a reproach for your lack of commitment. In the case of this one, it sat for the better part of a year because I wasn’t quite sure which side I wanted to come down on.

It started with a fairly uncontroversial tweet from Michael Feathers: “Spend more time designing away errors so that you don’t have to handle them.” On its face, this is reasonable; eliminating error vectors should lead to a more robust product. Some warning flags appear, however, when you read the stream leading up to that tweet:

Francis Fish’s point about context (“…medical equipment and (say) android app have totally diff needs…”) certainly applies, as does Feathers’s reply. Whenever I see the word “best” devoid of context, the credibility detector bottoms out. It’s the response to Brian Knapp (“Yeah, they are better not used at all. :)”) that is worrisome. Under some circumstances, throwing an exception when an error condition occurs is the right answer.

Having default values for parameters is one technique for designing away errors. Checking for problem conditions such as disk space or network connectivity prior to use can be used as well. The key thing to remember is that these techniques assume that the problem is an expected one and that something can be done about it. Checking for space or connectivity is useless if you don’t have an alternate location to write to or if you lack the ability to restore the connection. Likewise, use of a default value is only appropriate when there is a meaningful default.
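As a rough illustration of the pre-check technique and its limit, consider the following Python sketch. The function `pick_write_dir`, the exception `NoUsableLocation`, and the `free_bytes` parameter are all hypothetical names made up for this example; the only real library call is `shutil.disk_usage`, whose result has a `free` field. The check only designs the error away when an alternate location actually exists. When none does, the sketch raises rather than pretending the problem is solved.

```python
import shutil


class NoUsableLocation(Exception):
    """Raised when no candidate directory has enough free space."""


def _default_free_bytes(path):
    # shutil.disk_usage returns a named tuple (total, used, free) in bytes.
    return shutil.disk_usage(path).free


def pick_write_dir(candidates, needed_bytes, free_bytes=_default_free_bytes):
    """Return the first candidate directory with at least needed_bytes free.

    free_bytes has a meaningful default (ask the operating system), but it
    can be replaced, for example in tests.
    """
    for path in candidates:
        if free_bytes(path) >= needed_bytes:
            return path
    # Nothing can be done about it here, so fail noisily.
    raise NoUsableLocation(
        f"none of {list(candidates)} has {needed_bytes} bytes free"
    )
```

With a single candidate, the check buys nothing beyond a clearer error message. That is the sense in which checking is useless without an alternative.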

The thing to remember is that avoiding an exception is not the goal, correct execution/valid state is. If you’re transferring money between accounts, you want to be able to trust that either the transaction completed and the balances are adjusted or that you know something went wrong. Silent failures are much more of a problem than noisy errors. As Jef Claes noted in “Tests as part of your code”, silent failures can put you in the newspaper (and not in a good way).
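The money transfer example can be sketched in a few lines of Python. This is an in-memory toy under stated assumptions, not how a real banking system would do it: `transfer` and `TransferError` are hypothetical names, balances are whole numbers of cents held in a plain dictionary, and there is no persistence or concurrency. What it does show is the property that matters: the caller gets either a fully adjusted set of balances or an exception, never a half-applied change and never a silent failure.

```python
class TransferError(Exception):
    """Raised when a transfer cannot be completed; no balances are changed."""


def transfer(balances, source, target, amount):
    """Return a new balances dict with amount moved from source to target.

    The input dict is never modified, so a failure leaves state valid.
    """
    if amount <= 0:
        raise TransferError("amount must be positive")
    if source not in balances or target not in balances:
        raise TransferError("unknown account")
    if balances[source] < amount:
        raise TransferError("insufficient funds")

    # Work on a copy so both adjustments become visible together or not at all.
    updated = dict(balances)
    updated[source] -= amount
    updated[target] += amount
    return updated
```

Returning a quietly unchanged dictionary when funds are short would avoid the exception, but it would also leave the caller believing the money had moved.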

A more recent Twitter exchange involving Feathers returned to this same issue:

The short answer is yes, it’s a bug. Otherwise things found in code reviews would not count as defects because they had not happened “live”. The last tweet in that stream summed it up nicely:

We cannot rely on design alone to eliminate error conditions because we cannot foresee all potential issues. Testing shares this same dilemma. I believe Arlo Belshee strikes the right balance in “Treat bugs as fires”. Fire departments concentrate foremost on preventing fires while still extinguishing those that fall through the cracks. Where one occurs, it’s treated as a learning experience. So too should we treat error conditions. Dan Cresswell put it nicely:

Handling exceptions is tedious, but critical. Where we can remove risks, or at least reduce them via design, so much the better. We cannot, however, rely on our ability to foresee every circumstance. Chaos Monkey chooses you.

[Hat tip to Lorrie MacVittie for tweeting the evil monkey image above]

Form Follows Function on SPaMCAST 315

SPaMCAST logo

Have you ever wanted to put a voice with the words that you’re reading here? Now you have the chance.

Tom Cagley, who publishes one of my favorite blogs, Software Process and Measurement (SPaMCAST), has invited me to become a regular on his podcast. We decided to name the segment “Form Follows Function” (after all, what else would we call it?). We kick it off this month with a brief discussion of my post “Quick Fixes That Last a Lifetime”.

You can listen to the podcast here.

I first appeared on Tom’s “SPaMCAST 268 – Gene Hughson, Architecture, Management, Software Development” last December and I’m honored to be invited to be a regular part of his excellent podcast.

Innovation, Agility, and the Big Ball of Mud in Meatspace

French infantry in a trench, Verdun 1916

Although the main focus of my blog is application and solution architecture, I sometimes write about process and management issues as well. Conway’s law dictates that the organizational environment strongly influences software systems. While talking with a colleague recently, I stated that I see organizations as systems – social systems operating on “hardware” that’s more complex and less predictable than that which hosts software systems (i.e. people, hence the use of “Meatspace” in the title). The entangling of social and software systems means that we should be aware of the architecture of the enterprise, at least insofar as it will affect the IT architecture of the enterprise.

Innovation and agility are hot topics. Large corporations, by virtue of their very size, are at a disadvantage in this respect. In a recent article for Harvard Business Review, “The Core Incompetencies of the Corporation”, Gary Hamel discussed this issue. Describing corporations as “inertial”, “incremental” and “insipid”, he notes that “As the winds of creative destruction continue to strengthen, these infirmities will become even more debilitating”.

The wonderful thing (at least in my mind) about Twitter is that it makes it very easy for two people in the UK and two people in the US to hold an impromptu discussion of enterprise architecture in general and leadership and management issues in particular (on a Saturday morning, no less). Dan Cresswell started the ball rolling with a quote from Hamel’s article: “most leaders still over-value alignment and conformance and under-value heterodoxy and heresy”. Tom Graves replied that he would suggest that “…heresy is a _necessary_ element of ‘working together’…”. My contribution was that I suspect that “together” is part of the problem; they don’t know how to integrate rebels and followers, therefore the heretics are relegated to a skunkworks or given the sack. Ruth Malan cautioned about limits, noting that “A perpetual devil’s advocate can hold team in perpetual churn; that judgment thing…by which I simply mean, sometimes dissent can be pugnacious, sometimes respectful and sometimes playful; so it depends”.

Ruth’s points re: “…that judgment thing…” and “…it depends” are, in my opinion, extremely important to understanding the issue. I noted that “that judgment thing” was a critical part of management and leadership. This is not in the sense that managers and leaders should be the only ones to exercise judgment, but that they should use their judgment to integrate, rather than eliminate, the heresies so that the organization does not stagnate. There is a need for a “predator”, someone to challenge assumptions, in the management realm as much as there is a need for one in the design and development realm. Likewise, an understanding of “it depends” is key. Neither software systems nor social systems are created and maintained via following a recipe.

While management practices are part of the problem, it’s naive to concentrate on that to the exclusion of all else. Tom Graves is fond of saying, “Things work better when they work together, on purpose”. This is a fundamental point. As he observed in “Dotting the joins (the JEA version)”:

Every enterprise is a system – an ‘ecosystem with purpose’ – constrained mainly by its core vision, values and other drivers. Within that system, everything ultimately connects with everything else, and depends on everything else: if it’s in the system, it’s part of the system, and, by definition, the system can’t operate without it.

The system must be structured to manage, not ignore, complexity. Without an intentional design, things fall through the cracks. Tom again, from the same post:

To do something – to do anything, really – we need to know enough to get it to work right down in the detail of real-world practice. When there’s a lot of detail to learn, or a lot of complexity, we specialise: we choose one part of the problem, one part of the context, and concentrate on that. We get better at doing that one thing; and then better; and better again. And everyone can be a specialist in something – hence, given enough specialists, it seems that between us we should be able to do anything. In that sense, specialisation seems to be the way to get things done – the right way, the only way.

Yet there’s a catch. What specialisation really does is that it clusters all of its attention in one small area, and all but ignores the rest as Somebody Else’s Problem. It makes a dot, somewhere within what was previously a joined-up whole. And then someone else makes their own dot, and someone else carves out a space to claim to make their dot. But there’s nothing to link those dots together, to link between the dots – that’s the problem here.

Hamel’s use of the word “incremental” points the way to diagnosing the problem – enterprises have grown organically, rather than springing to life fully formed. Like a software system that has grown by sticking on bits and pieces without refactoring, social systems can become an example of Foote and Yoder’s “Big Ball of Mud” as well. Uncoordinated changes made without considering the larger system lead to a sclerotic mess, regardless of whether the system in question is social or software. My very first post on this blog, “Like it or not, you have an architecture (in fact, you may have several)”, sums it up. The question is whether that architecture is intentional or not.