Fixing IT – Credible or Cassandra?

Portrait of Cassandra in front of Troy

In Greek mythology, Cassandra was a princess of Troy who possessed the gift of prophecy, but was cursed never to be believed. I was reminded of the story by a tweet from Peter Kretzman:

Credibility is a precious commodity in the business world. It’s hard to earn and yet can be lost astonishingly easily. Poor customer service, regardless of whether it’s due to malice, apathy, or just plain ignorance, damages trust.

Likewise, commitment mismatches can leave customers (internal and/or external) disaffected. The project that will be “out of sight, out of mind” for IT is the product that they will be saddled with for years to come. IT’s perceived lack of commitment (justified or not) is a source of conflict and mistrust.

Lack of a business focus is a credibility killer as well. Things like indulging faddish practices, essentially engaging in one-sided experiments with the enterprise’s money, are seen as evidence of immaturity. A dictatorial attitude toward technology issues is typically resented (regardless of whether the opinion is correct). Failure to communicate business value, whether out of arrogance or ignorance, can lead to ill-advised decisions on the part of the business. When you’re asking for seven figures and “trust me” is your sole justification, you cannot complain when you get turned down.

Things that might seem purely technical can damage the relationship with the customer. Technology for technology’s sake, putting your vision ahead of the customer’s needs, ignoring user experience, and inadequate attention to quality can all lead to a loss of trust.

Quality and reliability can be particularly problematic. Stepping into the breach and heroically fixing issues can be perceived as admirable in some organizations. All a customer sees is an outage. When dial tone services go down, so too does credibility. As Matt Ballantine observed in his post “Firefighting”:

But if your world is one where you can only justify your own existence through the solving of problems that are of your own creation, you’re in trouble long term. That’s where IT has been – and why commodity services have become so pervasive so quickly. The IT team wins no points for fixing stuff that’s gone wrong when someone else can be providing that stuff without it failing all the time.

Working with the business, IT can serve as a powerful force multiplier. Opportunities can be seized and risks averted. For that to happen, however, IT has to be heard. The less we shoot ourselves in the foot, the better chance we have.

Advertisement

“Dealing with “Technical Debt”” on Iasa Global

The term “technical debt” has a tendency to evoke an emotional response. Some people react puritanically – “technical debt” means sloppy code; sloppy code is sin; failing to call sin “sin” is condoning sin; to the stake with the heretic! Others will contend that technical debt solely refers solely to conscious trade-offs and by definition excludes code that is poorly written or designed.

Read “Dealing with “Technical Debt”” on Iasa Global for my take on how broadly “technical debt” should be defined.

Quick Fixes That Last a Lifetime

Move Fast and Break Things on xkcd

“Move fast and break things.”

“Fail fast.”

“YAGNI.”

“Go with the simplest thing that can possibly work.”

I’ve written previously about my dislike for simplistic sound-bite slogans. Ideas that have real merit under the appropriate circumstances can be deadly when stripped of context and touted as universal truths. As Tom Graves noted in his recent post “Fail, to learn”, it’s not about failing, it’s about learning. We can’t laugh at cargo cultists building faux airports to lure the planes back while we latch on to naive formulas for success in complex undertakings without a clue as to how they’re supposed to work.

The concepts of emergent design and emergent architecture are cases in point. Some people contend that if you do the simplest thing that could possibly work, “The architecture follows immediately from that: the architecture is just the accumulation of these small steps”. It is trivially true that an architecture will emerge under those circumstances. What is unclear (and unexplained) is how a coherent architecture is supposed to emerge without any consideration for the higher levels of scope. Perhaps the intent is to replicate Darwinian evolution. If so, that would seem to ignore the fact that Darwinian evolution occurs over very long time periods and leaves a multitude of bodies in its wake. While the species (at least those that survive) ultimately benefit, individuals may find the process harsh. If the fittest (most adaptable, actually) survive, that leaves a bleaker future for those that are less so. Tipping the scales by designing for more than the moment seems prudent.

Distributed systems find it even more difficult to evolve. Within the boundary of a single application, moving fast and breaking things may not be fatal (systems dealing with health, safety, or finance are likely to be less tolerant than social networks and games). With enough agility, unfavorable mutations within an application can be responded to and remediated relatively quickly. Ill-considered design decisions that cross system boundaries can become permanent problems when cost and complexity outweigh the benefits of fixing them. There is a great deal of speculation that the naming of Windows 10 was driven by the number of potential issues that would be created by naming it Windows 9. Allegedly, Microsoft based its decision on not triggering issues caused by short-sighted decisions on the part of developers external to Microsoft. As John Cook noted:

Many think this is stupid. They say that Microsoft should call the next version Windows 9, and if somebody’s dumb code breaks, it’s their own fault.

People who think that way aren’t billionaires. Microsoft got where it is, in part, because they have enough business savvy to take responsibility for problems that are not their fault but that would be perceived as being their fault.

It is naive, particularly with distributed applications, to act as if there are no constraints. Refactoring is not free, and consumers of published interfaces create inertia. While it would be both expensive and ultimately futile to design for every circumstance, no matter how improbable, it is foolish to ignore foreseeable issues and allow a weakness to become a “standard”. There is a wide variance between over-engineering/gold-plating (e.g. planting land mines in my front yard just in case I get attacked by terrorists) and slavish adherence to a slogan (e.g. waiting to install locks on my front door until I’ve had something stolen because YAGNI).

I can move fast and break things by wearing a blindfold while driving, but that’s not going to get me anywhere, will it?

Microservices Under the Microscope

Microscope

Buzz and backlash seems to describe the technology circle of life. Something (language, process, platform, etc.; it doesn’t seem to matter) gets noticed, interest increases, then the reaction sets in. This was seen with Service-Oriented Architecture (SOA) in the early 2000s. More was promised than could ever be realistically attained and eventually the hype collapsed under its own weight (with a little help from the economic downturn). While the term SOA has died out, services, being useful, have remained.

2014 appears to be the year of microservices. While neither the term nor the architectural style itself are new, James Lewis and Martin Fowler’s post from earlier this year appears to have significantly raised the level of interest in it. In response to the enthusiasm, others have pointed out that the microservices architectural style, like any technique, involves trade offs. Michael Feathers pointed out in “Microservices Until Macro Complexity”:

If services are often bigger than classes in OO, then the next thing to look for is a level above microservices that provides a more abstract view of an architecture. People are struggling with that right now and it was foreseeable. Along with that concern, we have the general issue of dealing with asynchrony and communication patterns between services. I strongly believe that there is a law of conservation of complexity in software. When we break up big things into small pieces we invariably push the complexity to their interaction.

Robert, “Uncle Bob”, Martin has recently been a prominent voice questioning the silver bullet status of microservices. In “Microservices and Jars”, he pointed out that applications can achieve separation of concerns via componentization (using jars/Gems/DLLs depending on the platform) without incurring the overhead of over-the-wire communication. According to Uncle Bob, by using a plugin scheme, components can be as independently deployable as a microservice.

Giorgio Sironi responded with the post “Microservices are not Jars”. In it, Giorgio pointed out independent deployment is only part of the equation, independent scalability is possible via microservices but not via plugins. Giorgio questioned the safety of swapping out libraries, but I can vouch for the fact that plugins can be hot-swapped at runtime. One important point made was in regard to this quote from Uncle Bob’s post:

If I want two jars to get into a rapid chat with each other, I can. But I don’t dare do that with a MS because the communication time will kill me.

Sironi’s response:

Of course, chatty fine-grained interfaces are not a microservices trait. I prefer accept a Command, emit Events as an integration style. After all, microservices can become dangerous if integrated with purely synchronous calls so the kind of interfaces they expose to each other is necessarily different from the one of objects that work in the same process. This is a property of every distributed system, as we know from 1996.

Remember this for later.

Uncle Bob’s follow-up post, “Clean Micro-service Architecture”, concentrated on scalability. It made the point that microservices are not the only method for scaling an application (true); and stated that “the deployment model is a detail” and “details are never part of an architecture” (not true, at least in my opinion and that of others):

While Uncle Bob may consider the idea of designing for distribution to be “BDUF Baloney”, that’s wrong. That’s not only wrong, but he knows it’s wrong – see his quote above re: “a rapid chat”. In the paper that’s referenced in the Sironi quote above, Waldo et al. put it this way:

We argue that objects that interact in a distributed system need to be dealt with in ways that are intrinsically different from objects that interact in a single address space. These differences are required because distributed systems require that the programmer be aware of latency, have a different model of memory access, and take into account issues of concurrency and partial failure.

You can design a system with components that can run in the same process, across multiple processes, and across multiple machines. To do so, however, you must design them as if they were going to be distributed from the start. If you begin chatty, you will find yourself jumping through hoops to adapt to a coarse-grained interface later. If you start with the assumption of synchronous and/or reliable communications, you may well find a lot of obstacles when you need to change to a model that lacks one or both of those qualities. I’ve seen systems that work reasonably well on a single machine (excluding the database server) fall over when someone attempts to load balance them because of a failure to take scaling into account. Things like invalidating and refreshing caches as well as event publication become much more complex starting with node number two if a “simplest thing that can work” approach is taken.

Distributed applications in general and microservice architectures in particular are not universal solutions. There are costs as well as benefits to every architectural style and sometimes having everything in-process is the right answer for a given point in time. On the other hand, you can’t expect to scale easily if you haven’t taken scalability into consideration previously.

Service Versioning Illustrated

Hogarth painting the muse

My last post, “No Structure Services”, generated some discussion on LinkedIn regarding service versioning and how an application’s architecture can enable exposing services that are well-defined, stable, internally simple and DRY. I’ve discussed these topics in the past: “Strict Versioning for Services – Applying the Open/Closed Principle” detailed the versioning process I use to design and maintain services that can evolve while maintaining backwards compatibility and “On the plane or in the plane?” covered how I decouple the service from the underlying implementation. Based on the discussion, I decided that some visuals would probably provide some additional clarity to the subject.

Note: The diagrams below are meant to simplify understanding of these two concepts (versioning and the structure to support it) and not be a 100% faithful representation of an application. If you look at them as a blueprint, rather than a conceptual outline, you’ll find a couple SRP violations, etc. Please ignore the nits and focus on the main ideas.

Internal API Diagram

In the beginning, there was an application consisting of a class named Greeter, which had the job of constructing a greeting for some named person from another person. A user interface was created to collect the necessary information from the end-user, pass it to Greeter and display the results. The input to the Greet method is an object of type GreetRequest, which has members identifying the sender and recipient. Greeter.Greet() returns a GreetResponse, the sole member of which is a string containing the Greeting.

And it was good (actually, it was Hello World which until recently was just a cheesy little sample program but is now worth boatloads of cash – should you find yourself using these diagrams to pitch something to a VC, sending me a cut would probably be good karma 😉 ).

At some point, the decision was made to make the core functionality available to external applications (where external is defined as client applications built and deployed separately from the component in question, regardless of whether the team responsible for the client is internal or external to the organization). If the internal API were exposed directly, the ability to change Greeter, GreetRequest and GreetResponse would be severely constrained. Evolving that functionality could easily lead to non-DRY code if backwards compatibility is a concern.

Note: Backwards compatibility is always a concern unless you’re ditching the existing client. The option is synchronized development and deployment which is slightly more painful than trimming your fingernails with a chainsaw – definitely not recommended.

The alternative is to create a facade/adapter class (GreetingsService) along with complementary message classes (GreetingRequest and GreetingResponse) that can serve as the published interface. The GreetingsService exists to receive the GreetingRequest, manage its transformation to a GreetRequest, delegate to Greeter and manage the transformation of the GreetResponse into a GreetingResponse which is returned to the caller (this is an example of the SRP problem I mentioned above, in actual practice, some of those tasks would be handled by other classes external to GreetingsService – an example can be found here).

Internal API with External Service Adapter Diagram

Later, someone decided that the application should have multilingual capability. Wouldn’t it be cool if you could choose between “Hello William, from your friend Gene” and “Hola Guillermo, de su amigo Eugenio”? The question, however, is how to enable this without breaking clients using GreetingsService. The answer is to add the Language property to the GreetRequest (Language being of the GreetingLanguage enumeration type) and making the default value of Language be English. We can now create GreetingsServiceV1 which does everything GreetingsService does (substituting GreetingRequestV1 and GreetingResponseV1 for GreetingRequest and GreetingResponse) and adds the new language capability. The result is like this:

Internal API with External Service Adapters (2 versions) Diagram

Because Language defaults to English, there’s no need to modify GreetingsService at all. It should continue to work as-is and its clients will continue to receive the same results. The same type of results can be obtained using a loose versioning scheme (additions, which should be ignored by existing clients, are okay; you only have to add a new version if the change is something that would break the interface, like a deletion). The “can” and “should” raise flags for me – I have control issues (which is incredibly useful when you support published services).

Control is the best reason for preferring a strict versioning scheme. If, for example, we wanted to change the default language to Spanish going forward while maintaining backward compatibility, we could not do that under a loose regime without introducing a lot of kludgy complexity. With the strict scheme, it would be trivial (just change the default on GreetingRequestV1 to Spanish and you’re done). With the strict scheme I can even retire the GreetingService once GreetingServiceV1 is operational and the old clients have had a chance to migrate to the new version.

Our last illustration is just to reinforce what’s been said above. This time a property has been added to control the number of times the greeting is generated. GreetingsServiceV2 and its messages support that and all prior functionality, while GreetingsService and GreetingsServiceV1 are unchanged.

Internal API with External Service Adapters (3 versions) Diagram

As noted above, being well-defined, stable, internally simple and DRY are all positive attributes for published services. A strict versioning scheme provides those attributes and control over what versions are available.

No Structure Services

Amoeba sketch

Some people seem to think that flexibility is universally a virtue. Flexibility, in their opinion, is key to interoperability. Postel’s Principle, “…be conservative in what you do, be liberal in what you accept from others”, is often used to justify this belief. While this sounds wonderful in theory, in practice it’s problematic. As Tom Stuart pointed out in “Postel’s Principle is a Bad Idea”:

Postel’s Principle is wrong, or perhaps wrongly applied. The problem is that although implementations will handle well formed messages consistently, they all handle errors differently. If some data means two different things to different parts of your program or network, it can be exploited—Interoperability is achieved at the expense of security.

These problems exist in TCP, the poster child for Postel’s principle. It is possible to make different machines see different input, by building packets that one machine accepts and the other rejects. In Insertion, Evasion, and Denial of Service: Eluding Network Intrusion Detection, the authors use features like IP fragmentation, corrupt packets, and other ambiguous bits of the standard, to smuggle attacks through firewalls and early warning systems.

In his defense, the environment in which Postel proposed this principle is far different from what we have now. Eric Allman, writing for the ACM Queue, noted in “The Robustness Principle Reconsidered”:

The Robustness Principle was formulated in an Internet of cooperators. The world has changed a lot since then. Everything, even services that you may think you control, is suspect.

Flexibility, often sold as extensibility, too often introduces ambiguity and uncertainty. Ambiguity and uncertainty are antithetical to APIs. This is why 2 of John Sonmez’s “3 Simple Techniques to Make APIs Easier to Use and Understand” are “Using enumerations to limit choices” and “Using default values to reduce required parameters”. Constraints provide structure and structure simplifies.

Taken to the extreme, I’ve seen flexibility used to justify “string in, string out” service method signatures. “Send us a string containing XML and we’ll send you one back”. There’s no need to worry about versioning, etc. because all the versions for all the clients are handled by a single endpoint. Of course, behind the scenes there’s a lot of conditional logic and “hope for the best” parsing. For the client, there’s no automated generation of messages nor even guarantee of structure. Validation of the structure can only occur at runtime.

Does this really sound robust?

I often suspect the reluctance to tie endpoints to defined contracts is due to excessive coupling between the code exposing the service and the code performing the function of the service. If domain logic is intermingled with presentation logic (which a service is), then a strict versioning scheme, an application of the Open/Closed Principle to services, now violates Don’t Repeat Yourself (DRY). If, however, the two concerns are kept separate within the application, multiple endpoints can be handled without duplicating business logic. This provides flexibility for both divergent client needs and client migrations from one message format to another with less complexity and ambiguity.

Stable interfaces don’t buy you much when they’re achieved by unsustainable complexity on the back end. The effect of ambiguity on ease of use doesn’t help either.