Design Away Error Handling?

Evil Monkey Pointing

Writing is an interesting process. Some posts spring to life; ignited by some inspiration, they swiftly flow from fingertips to (virtual) page. Other posts simmer. An idea is half-conceived, then languishes incomplete. It sits in the corner staring at you balefully, a reproach for your lack of commitment. In the case of this one, it sat for the better part of a year because I wasn’t quite sure which side I wanted to come down on.

It started with a fairly uncontroversial tweet from Michael Feathers: “Spend more time designing away errors so that you don’t have to handle them.” On its face, this is reasonable; eliminating error vectors should lead to a more robust product. Some warning flags appear, however, when you read the stream leading up to that tweet:

Francis Fish’s point about context (“…medical equipment and (say) android app have totally diff needs…”) certainly applies, as does Feather’s reply. Whenever I see the word “best” devoid of context, the credibility detector bottoms out. It’s the response to Brian Knapp (“Yeah, they are better not used at all. :)”) that is worrisome. Under some circumstances, throwing an exception when an error condition occurs is the right answer.

Having default values for parameters is one technique for designing away errors. Checking for problem conditions such as disk space or network connectivity prior to use can be used as well. The key thing to remember is that these techniques assume that the problem is an expected one and that something can be done about it. Checking for space or connectivity is useless if you don’t have an alternate location to write to or if you lack the ability to restore the connection. Likewise, use of a default value is only appropriate when there is a meaningful default.

The thing to remember is that avoiding an exception is not the goal, correct execution/valid state is. If you’re transferring money between accounts, you want to be able to trust that either the transaction completed and the balances are adjusted or that you know something went wrong. Silent failures are much more of a problem than noisy errors. As Jef Claes noted in “Tests as part of your code”, silent failures can put you in the newspaper (and not in a good way).

A more recent Twitter exchange involving Feathers returned to this same issue:

The short answer, is yes, it’s a bug. Otherwise things found in code reviews would not count as defects because they had not happened “live”. The last tweet in that stream summed it up nicely:

We cannot rely on design alone to eliminate error conditions because we cannot foresee all potential issues. Testing shares this same dilemma. I believe Arlo Belshee strikes the right balance in “Treat bugs as fires”. Fire departments concentrate foremost on preventing fires while still extinguishing those that fall through the cracks. Where one occurs, it’s treated as a learning experience. So too should we treat error conditions. Dan Cresswell put it nicely:

Handling exceptions is tedious, but critical. Where we can remove risks, or at least reduce them via design, so much the better. We cannot, however, rely on our ability to foresee every circumstance. Chaos Monkey chooses you.

[Hat tip to Lorrie MacVittie for tweeting the evil monkey image above]

Service Versioning Illustrated

Hogarth painting the muse

My last post, “No Structure Services”, generated some discussion on LinkedIn regarding service versioning and how an application’s architecture can enable exposing services that are well-defined, stable, internally simple and DRY. I’ve discussed these topics in the past: “Strict Versioning for Services – Applying the Open/Closed Principle” detailed the versioning process I use to design and maintain services that can evolve while maintaining backwards compatibility and “On the plane or in the plane?” covered how I decouple the service from the underlying implementation. Based on the discussion, I decided that some visuals would probably provide some additional clarity to the subject.

Note: The diagrams below are meant to simplify understanding of these two concepts (versioning and the structure to support it) and not be a 100% faithful representation of an application. If you look at them as a blueprint, rather than a conceptual outline, you’ll find a couple SRP violations, etc. Please ignore the nits and focus on the main ideas.

Internal API Diagram

In the beginning, there was an application consisting of a class named Greeter, which had the job of constructing a greeting for some named person from another person. A user interface was created to collect the necessary information from the end-user, pass it to Greeter and display the results. The input to the Greet method is an object of type GreetRequest, which has members identifying the sender and recipient. Greeter.Greet() returns a GreetResponse, the sole member of which is a string containing the Greeting.

And it was good (actually, it was Hello World which until recently was just a cheesy little sample program but is now worth boatloads of cash – should you find yourself using these diagrams to pitch something to a VC, sending me a cut would probably be good karma 😉 ).

At some point, the decision was made to make the core functionality available to external applications (where external is defined as client applications built and deployed separately from the component in question, regardless of whether the team responsible for the client is internal or external to the organization). If the internal API were exposed directly, the ability to change Greeter, GreetRequest and GreetResponse would be severely constrained. Evolving that functionality could easily lead to non-DRY code if backwards compatibility is a concern.

Note: Backwards compatibility is always a concern unless you’re ditching the existing client. The option is synchronized development and deployment which is slightly more painful than trimming your fingernails with a chainsaw – definitely not recommended.

The alternative is to create a facade/adapter class (GreetingsService) along with complementary message classes (GreetingRequest and GreetingResponse) that can serve as the published interface. The GreetingsService exists to receive the GreetingRequest, manage its transformation to a GreetRequest, delegate to Greeter and manage the transformation of the GreetResponse into a GreetingResponse which is returned to the caller (this is an example of the SRP problem I mentioned above, in actual practice, some of those tasks would be handled by other classes external to GreetingsService – an example can be found here).

Internal API with External Service Adapter Diagram

Later, someone decided that the application should have multilingual capability. Wouldn’t it be cool if you could choose between “Hello William, from your friend Gene” and “Hola Guillermo, de su amigo Eugenio”? The question, however, is how to enable this without breaking clients using GreetingsService. The answer is to add the Language property to the GreetRequest (Language being of the GreetingLanguage enumeration type) and making the default value of Language be English. We can now create GreetingsServiceV1 which does everything GreetingsService does (substituting GreetingRequestV1 and GreetingResponseV1 for GreetingRequest and GreetingResponse) and adds the new language capability. The result is like this:

Internal API with External Service Adapters (2 versions) Diagram

Because Language defaults to English, there’s no need to modify GreetingsService at all. It should continue to work as-is and its clients will continue to receive the same results. The same type of results can be obtained using a loose versioning scheme (additions, which should be ignored by existing clients, are okay; you only have to add a new version if the change is something that would break the interface, like a deletion). The “can” and “should” raise flags for me – I have control issues (which is incredibly useful when you support published services).

Control is the best reason for preferring a strict versioning scheme. If, for example, we wanted to change the default language to Spanish going forward while maintaining backward compatibility, we could not do that under a loose regime without introducing a lot of kludgy complexity. With the strict scheme, it would be trivial (just change the default on GreetingRequestV1 to Spanish and you’re done). With the strict scheme I can even retire the GreetingService once GreetingServiceV1 is operational and the old clients have had a chance to migrate to the new version.

Our last illustration is just to reinforce what’s been said above. This time a property has been added to control the number of times the greeting is generated. GreetingsServiceV2 and its messages support that and all prior functionality, while GreetingsService and GreetingsServiceV1 are unchanged.

Internal API with External Service Adapters (3 versions) Diagram

As noted above, being well-defined, stable, internally simple and DRY are all positive attributes for published services. A strict versioning scheme provides those attributes and control over what versions are available.

No Structure Services

Amoeba sketch

Some people seem to think that flexibility is universally a virtue. Flexibility, in their opinion, is key to interoperability. Postel’s Principle, “…be conservative in what you do, be liberal in what you accept from others”, is often used to justify this belief. While this sounds wonderful in theory, in practice it’s problematic. As Tom Stuart pointed out in “Postel’s Principle is a Bad Idea”:

Postel’s Principle is wrong, or perhaps wrongly applied. The problem is that although implementations will handle well formed messages consistently, they all handle errors differently. If some data means two different things to different parts of your program or network, it can be exploited—Interoperability is achieved at the expense of security.

These problems exist in TCP, the poster child for Postel’s principle. It is possible to make different machines see different input, by building packets that one machine accepts and the other rejects. In Insertion, Evasion, and Denial of Service: Eluding Network Intrusion Detection, the authors use features like IP fragmentation, corrupt packets, and other ambiguous bits of the standard, to smuggle attacks through firewalls and early warning systems.

In his defense, the environment in which Postel proposed this principle is far different from what we have now. Eric Allman, writing for the ACM Queue, noted in “The Robustness Principle Reconsidered”:

The Robustness Principle was formulated in an Internet of cooperators. The world has changed a lot since then. Everything, even services that you may think you control, is suspect.

Flexibility, often sold as extensibility, too often introduces ambiguity and uncertainty. Ambiguity and uncertainty are antithetical to APIs. This is why 2 of John Sonmez’s “3 Simple Techniques to Make APIs Easier to Use and Understand” are “Using enumerations to limit choices” and “Using default values to reduce required parameters”. Constraints provide structure and structure simplifies.

Taken to the extreme, I’ve seen flexibility used to justify “string in, string out” service method signatures. “Send us a string containing XML and we’ll send you one back”. There’s no need to worry about versioning, etc. because all the versions for all the clients are handled by a single endpoint. Of course, behind the scenes there’s a lot of conditional logic and “hope for the best” parsing. For the client, there’s no automated generation of messages nor even guarantee of structure. Validation of the structure can only occur at runtime.

Does this really sound robust?

I often suspect the reluctance to tie endpoints to defined contracts is due to excessive coupling between the code exposing the service and the code performing the function of the service. If domain logic is intermingled with presentation logic (which a service is), then a strict versioning scheme, an application of the Open/Closed Principle to services, now violates Don’t Repeat Yourself (DRY). If, however, the two concerns are kept separate within the application, multiple endpoints can be handled without duplicating business logic. This provides flexibility for both divergent client needs and client migrations from one message format to another with less complexity and ambiguity.

Stable interfaces don’t buy you much when they’re achieved by unsustainable complexity on the back end. The effect of ambiguity on ease of use doesn’t help either.