Rules are rules…or are they?

Elizabeth: Wait! You have to take me to shore. According to the Code of the Order of the Brethren…

Barbossa: First, your return to shore was not part of our negotiations nor our agreement so I must do nothing. And secondly, you must be a pirate for the pirate’s code to apply and you’re not. And thirdly, the code is more what you’d call “guidelines” than actual rules. Welcome aboard the Black Pearl, Miss Turner.

(from “Pirates of the Caribbean: The Curse of the Black Pearl”)

Over the course of a career in technology, you collect “rules” that guide the way you work. However, as a blogger noted: “Good design principles are usually helpful. But, are they always applicable?”

We know redundant data is bad, except for caching, when it’s good. Database normalization is a good thing until it affects performance, then it’s not so good…same with transactions. Redundant code, now there’s an absolute! Except we find that the Open/Closed Principle dictates that new code be added for changes, rather than modifying the existing code. What to do?

The Open/Closed Principle always bothered me. I agree with it philosophically–good designs make it possible to add functionality without disturbing existing features–but in my experience there are no permanently closed abstractions. Superclasses or APIs might be stable for a (relatively) long time, but eventually even the most fundamental classes and interfaces need updating to meet emerging needs.

(Kent Beck, “The Open/Closed/Open Principle”)

Even Bob Martin, who stated “In many ways this principle is at the heart of object oriented design”, recognized its limits:

It should be clear that no significant program can be 100% closed… In general, no matter how “closed” a module is, there will always be some kind of change against which it is not closed.

(from “The Open-Closed Principle”)

The takeaway is that an architect should be pragmatic, not dogmatic. Understanding the “why” behind a “rule” allows you to know when it doesn’t apply. Knowing the trade-offs allows you to make informed, rational decisions that are consistent with the needs at hand. Blindly adhering to received wisdom is just magical thinking, and its value is more a function of chance than reality. By the same token, understanding why you’re acting contrary to common practice marks the difference between boldness and gambling.

A foolish consistency is the hobgoblin of little minds, adored by little statesmen and philosophers and divines. With consistency a great soul has simply nothing to do. He may as well concern himself with his shadow on the wall. Speak what you think now in hard words, and to-morrow speak what to-morrow thinks in hard words again, though it contradict every thing you said to-day.

(Ralph Waldo Emerson, “Self Reliance”)


Strict Versioning for Services – Applying the Open/Closed Principle

In a previous post, I mentioned that I preferred a strict model of service versioning for the safety and control that it provides. In the strict model, any change results in a new contract. This is in contrast to the flexible model which allows changes that do not break backwards compatibility and the loose model which supports both backwards and forwards compatibility (by eliminating any concept of contract).

The loose model generally comes in two flavors: string in/string out and generic XML. Both share numerous disadvantages:

… sending such arbitrary messages in a SOAP envelope often requires additional processing by the SOAP engine. The wire format of a message might not be very readable once it gets encoded. Moreover, you must write code manually to deal with the payload of a message. Since there is no clear definition of the message in WSDL, the web services tooling cannot generate this code, which can make such a solution more error prone. Validating messages cannot take place. If a message format changes, it might be easier to update the service interface and regenerate binding code than ensuring all consumers and providers properly handle the new format.

In the loose model, a slight advantage in terms of governance (not having to manage multiple endpoints) is far outweighed by the additional complexity and effort required to compensate for its weaknesses.

The flexible model initially seems to be a compromise. Adding an optional message element with a default value arguably allows you to make a backward compatible change without having a new endpoint. But what happens if the default value is not appropriate to all of your original consumers? Blank or null defaults may work, but only if blank or null is otherwise meaningless for the service. Additionally, changes which break backwards compatibility will require a new contract anyway. Lastly, because multiple versions share the same physical artifacts, it will be impossible to determine which versions are still in use by monitoring log files.
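To make the default-value problem concrete, here is a minimal Python sketch. The contract, field names, and tax rate are all hypothetical: a v1 request had no `include_tax` element, and a “backward compatible” change adds it as an optional element defaulting to `True`. An original consumer that relied on pre-tax totals now gets a different answer without changing a line of its own code.

```python
from dataclasses import dataclass

TAX_RATE = 0.08  # hypothetical flat tax rate, for illustration only


@dataclass
class QuoteRequest:
    subtotal: float
    # Added later as an optional element with a default. Original consumers
    # never send it, so they always receive the default behavior.
    include_tax: bool = True


def quote_total(request: QuoteRequest) -> float:
    """Return the quoted total; before the change this was always pre-tax."""
    if request.include_tax:
        return round(request.subtotal * (1 + TAX_RATE), 2)
    return round(request.subtotal, 2)


# An original consumer builds its request exactly as it always has...
legacy_request = QuoteRequest(subtotal=100.0)
# ...but now receives the taxed total instead of the 100.0 it was built for.
print(quote_total(legacy_request))  # 108.0
```

Defaulting to `False` would protect these consumers only because `False` happens to reproduce the old behavior; the next optional element may have no such safe value.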

The strict model I prefer is essentially an application of Bertrand Meyer’s Open/Closed Principle. This principle states that “software entities (classes, modules, functions, etc.) should be open for extension, but closed for modification”. In other words, new functionality should be implemented via new code (which may build on existing code) rather than by changing the existing code. In the words of Bob Martin:

When a single change to a program results in a cascade of changes to dependent modules, that program exhibits the undesirable attributes that we have come to associate with “bad” design. The program becomes fragile, rigid, unpredictable and unreusable. The open-closed principle attacks this in a very straightforward way. It says that you should design modules that never change. When requirements change, you extend the behavior of such modules by adding new code, not by changing old code that already works.

(“The Open-Closed Principle”, Robert C. Martin)

Applied to services, this means that all changes (with the exception of bug fixes that don’t affect the signature) result in a new service contract: endpoint, messages, entities. Assuming the service is a facade for components of the business layer, this principle can be applied (or not) to those underlying components based on whether the risk of change outweighs the redundancy introduced. This allows the impact to existing consumers of the service to be managed.
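A minimal Python sketch of what this can look like, with every name hypothetical. Each published version keeps its own endpoint, message, and entity types and is never edited after release; both versions are thin facades over a single business component. Adding `carrier` to the response therefore meant adding `V2` types rather than modifying the `V1` ones.

```python
from dataclasses import dataclass


# --- Business layer (shared, not exposed directly) ---
class OrderComponent:
    """Hypothetical business component that every service version delegates to."""

    def get_status(self, order_id: str) -> dict:
        # Stand-in for real lookup logic.
        return {"order_id": order_id, "status": "SHIPPED", "carrier": "ACME"}


# --- Version 1 contract: closed for modification once published ---
@dataclass(frozen=True)
class OrderStatusV1:
    order_id: str
    status: str


class OrderServiceV1:
    def __init__(self, component: OrderComponent) -> None:
        self._component = component

    def get_order_status(self, order_id: str) -> OrderStatusV1:
        data = self._component.get_status(order_id)
        return OrderStatusV1(order_id=data["order_id"], status=data["status"])


# --- Version 2 contract: the change is made by extension (new types) ---
@dataclass(frozen=True)
class OrderStatusV2:
    order_id: str
    status: str
    carrier: str  # the change that triggered a new contract


class OrderServiceV2:
    def __init__(self, component: OrderComponent) -> None:
        self._component = component

    def get_order_status(self, order_id: str) -> OrderStatusV2:
        data = self._component.get_status(order_id)
        return OrderStatusV2(order_id=data["order_id"], status=data["status"],
                             carrier=data["carrier"])


# Both versions are hosted side by side; the routes are hypothetical.
component = OrderComponent()
endpoints = {
    "/orders/v1": OrderServiceV1(component),
    "/orders/v2": OrderServiceV2(component),
}
print(endpoints["/orders/v1"].get_order_status("42"))
print(endpoints["/orders/v2"].get_order_status("42"))
```

Because each version has its own endpoint, request logs show directly which versions still have callers, which is exactly what the flexible model gives up.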

Some general rules for governing service versions (from SOA World Magazine, “Design Strategies for Web Services Versioning”):

1. Determine how often versions are to be released. When considering frequency, you should consider how many versions of the Web service you want to support in parallel.

2. Understand the timeframe within which you expect consumers to move to a new version of a service. The Web services management platform may be able to provide guidance on service usage to determine the appropriate time to phase out older versions.

3. Consider releasing a pilot or an early release of a new version. Give consumers an opportunity to test compatibility and determine potential code impacts.

4. Approach Web services versioning the same way software packages might be released. Changes to your service, either as a result of bug fixes, partner requests, or specification upgrades, should follow a specific release cycle.

5. Clearly communicate your Web services versioning strategy to users of your Web service.

It should also be noted that chunkier services designed around business processes will be less likely to change frequently than fine-grained CRUD services. Additionally, such services will generally be more cohesive, making them easier to understand and use.

Like any rule, the Open/Closed Principle has its exceptions. Applying it universally to an application soon leads to a proliferation of duplicate classes and methods, some of which may no longer be used. However, when dealing with code that is directly consumed by external applications (i.e., a service you expose), the Open/Closed Principle provides a way to avoid the pain of forcing every consumer to change in lockstep with the service.

Dependency Management – Anti-Patterns as Art from The Daily WTF

The Daily WTF has a great collection of IT oddities. It seems as though it’s also a gallery of modern art created by architects who haven’t quite understood that the question is not “can I?” but “should I?”.  

While the pictures are worth more than a thousand words, each one has a back story on The Daily WTF:

Enterprise Dependency:  The Next Generation

Enterprise Dependency: Big Ball of Yarn

The Enterprise Dependency

Leadership, Strategy and Tactics: Vision or Hallucination?

As I began reading through John McKee’s Stop with the “vision” stuff on TechRepublic, I found myself agreeing whole-heartedly. This quote, in particular, was a real howler:

The last speaker who came to talk about leadership told them that if they had ‘vision statements’ for their departments and teams, then things would be more successful. Consequently, a lot of time and energy went into creating departmental vision statements. But in the end, her advice didn’t work. Things didn’t improve much – if at all. I think it was a real turnoff for many of our staff.

Magical thinking like this may give a confidence boost to those who want to believe in the power of the ritual, but lacking any basis in reality, it fails for those with even a trace of skepticism. Developing a vision statement no more ensures success than collecting four-leaf clovers.

John lost me, however, when he began to describe why popular leadership concepts fail:

1. Many of them were never expected to. The concepts, models, philosophies, were created for strategic value. Not tactical. Strategy is long-term in focus; tactics get us through tough situations we’re facing now.

2. Strategic planning was invented by the military, probably first used in ancient Greece as a methodical thinking approach for army leaders who needed to have a longer term perspective in addition to winning the next battle. But it was never intended to determine how to take the next bridge or town.

3. By definition, vision statements are intended to define the way an organization will look in the future. It is long term in perspective. Mission statements are more about describing what an organization does to achieve the vision. They can be helpful for those who need some clear direction on a big picture basis.

4. Most people get confused by terms and words like these, preferring to be given fairly clear direction. But I’m not saying you need to spell out each detailed step a team member should take to get his/her tasks accomplished – that mistake could result in a loss of your best talent who resent such detail.

I’ve been an avid student of history, particularly military history, since childhood, so item number 2 was like waving a red flag. Both strategy and tactics deal with plans, the difference being one of scope (today’s battle vs the campaign/war as a whole). They are inextricably linked in that tactics will be constrained by strategic considerations and strategy may be affected by tactical outcomes. To build on the metaphor, strategic planning may not only determine how to take the next bridge or town (“we have no reinforcements, so avoid heavy casualties”) but whether to attempt it in the first place.

I would argue that the problem lies not with the military heritage of strategic planning, but in straying from that heritage. US Army Field Manual FM 3-0 outlines the principles by which operations are conducted. While many of those are also relevant to business, the one most applicable to this topic is Unity of Command (emphasis is mine):

Unity of Command – For every objective, seek unity of command and unity of effort. At all levels of war, employment of military forces in a manner that masses combat power toward a common objective requires unity of command and unity of effort. Unity of command means that all the forces are under one responsible commander. It requires a single commander with the requisite authority to direct all forces in pursuit of a unified purpose.

The civilian version can be found in Harold Koontz and Cyril O’Donnell’s Principles of Management: An Analysis of Managerial Functions. There it is referred to as Unity of Command and Unity of Objectives. Regardless of the name or the source, however, the idea is the same: a direction is decided upon and the constituent units execute based on that strategy. Each subordinate unit’s strategy (or tactics, as the scale reduces) is a refinement of the parent’s, rather than an independently sourced plan. The US military codifies this planning method in the Operations Order (OPORD):

An Operations Order, often abbreviated as OPORD, is an executable plan that directs a unit to conduct a military operation. An Operations Order will describe the situation facing the unit, the mission of the unit, and what activities the unit will conduct to achieve the mission goals. Normally an Operations Order will be generated at a regiment, brigade, division, or corps headquarters and then given to lower echelons to implement. Each lower echelon as they receive an operations order will in turn develop their own Operations Order which removes extraneous detail and adds details focused on what and how that subunit will implement the higher level OPORD. So an Operations Order at a particular level of the military organization will trigger units involved in the operation to develop their own Operations Order which will borrow from the Operations Order given them so far as the situation and mission but will then add additional details for the activities a specific unit is to conduct.

It’s important to note that having unity of objectives is not the same as micro-managing. The parent sets the outcome to be attained by the constituent units, each of which determines the best way to execute within the constraints it’s given. This provides both flexibility and coordination.

A vision that’s not executed is nothing more than a dream. Likewise, a strategic plan that isn’t developed into an actionable operation is nothing more than a what-if game. Effective leadership develops a unified strategy and then ensures that the strategy filters down to the various components of the enterprise so that all are pulling in the same direction.

Adding Value

“Value” is a subject that seems to keep coming up. Whether discussing enterprise architecture or even information technology itself (IT’s chicken or egg problem on Tech Republic), demonstrating value is a recurrent theme. Even in a recent post discussing chargebacks, the value of in-house IT versus an external provider was an underlying concern.

The Tech Republic article captures the problem thusly:

The interesting aspect to these discussions is that people in IT management seem to fall into two camps in their proposed solution to the problem of IT/business alignment, resulting in a bit of a chicken or egg problem. One camp contends that IT needs to earn its place in strategic discussions, toiling away thanklessly until its contributions are recognized and the CIO receives a cordial invitation to the next board meeting. The other camp suggests that IT’s strategic potential should be recognized by the CEO, and that it is incumbent upon executive leadership to embrace IT and drive it toward executing corporate strategy.

What I find interesting about these arguments is that they assume too passive a role for IT leadership. In the first case, IT is the thankless soldier, hoping to one day be recognized by the General while heading into the gunfire, and in the second case, IT is the child athlete, sitting on the bench lamenting that the coach never calls him into the game.

That IT (or EA) should provide demonstrable value to an organization is a bit of a no-brainer. It’s also a fact of life that you have to be your own advocate. Therefore, it seems like an affirmation of the geek stereotype to resent having to articulate the value you bring to the table (after all, you are in the best position to describe that value).

One argument is that the value of IT should be self-evident: no organization of any size would attempt to operate without IT any more than it would attempt to do without accounting. This is true, but trivially so. The value proposition is not whether the business needs a particular service, but whether you are the provider that best meets its needs.

The strategies listed in the Tech Republic article are a good start: “Get the utility aspect perfect, and then shut up about it”, “Talk the same language” and “State your case”. Network access and applications are now in the same category as telephone and lights: nobody will notice them unless they’re unavailable. Unfair, perhaps, but it’s little use trying to tout something that’s taken for granted. Shifting from being technology-centric to being business-centric is likewise key. As I noted in another post, the business is not interested in your skill; it is interested in what your skill can do to improve its operation. Thinking in terms of the business will help both in choosing the right time to state your case and in what to say. There’s a difference between proposing a technology solution and proposing a business solution that happens to make use of technology.

Ultimately, it comes down to building and maintaining a relationship between the business and the IT infrastructure that supports it. In How CIOs Build Bridges With Other C-Level Execs, Diane Frank quotes Robert Webb of Hilton Worldwide:

“In my world, the way you build trust is by making promises and keeping promises—repeatedly—and then there’s the opportunity to build a deeper relationship,” he says.

It is crucial to understand that the building of relationships is not a passive process. Rather, IT needs to contribute its own expertise. From the same article, there’s a passage about Zack Hicks, VP and CIO of Toyota Motor Sales USA:

For example, when Toyota was developing its Entune “connected vehicle” initiative, Hicks acknowledged the growing importance of telematics in the automotive industry but also pointed out the potential risks of what is essentially “a device driving down the road.” Hicks asked the other executives whether Toyota could handle the public outcry if proper security and privacy controls were not built into the in-car multimedia system.

The executives got the point. The result is that Toyota’s Entune initiative is moving ahead but Hicks is now fully involved in the project and is head of the senior management task force that’s ensuring the company is creating a safe and secure service.

This stands in stark contrast to an experience detailed in Run IT as a business — why that’s a train wreck waiting to happen on InfoWorld:

Adam Hartung, author of “Create Marketplace Disruption: How to Stay Ahead of the Competition,” tells the tale:

“I had an experience with the head of field services for a very large pharmaceutical company. He was working himself ragged, and complaining about insufficient budget to build all the Web applications his internal customers were asking for. So I suggested that instead of trying to deliver on ‘customer needs,’ why didn’t he go back to the business with a set of recommendations for how he thought he could deliver a superior set of solutions that would meet their needs in 2012 — and beyond.

“Instead of reacting to users, he should be their peer. Primarily, I asked him why he didn’t transition from building Web apps to instead creating a solution using cloud technology and true mobile devices like BlackBerrys, iPods, and emerging tablets. He could offer a better solution, at about a quarter of the cost.

“He told me he had never thought of dealing with the situation that way, but it sure made a lot more sense than letting his ‘customers’ run him ragged to deliver stuff with a short life.”

What Zack Hicks understood is that to “deliver on ‘customer needs’” means more than blindly saying “yes” to every request. Providing input based on the unique experience IT brings to the table (such as network security issues in the Toyota example) is the essence of adding value. A history of adding value is the best weapon in your arsenal when you need to show that your operation contributes to the success of the enterprise.

Coping with change using the Canonical Data Model

According to the Greek philosopher Heraclitus of Ephesus, “Nothing endures but change”. It’s not just a common theme, but also a fundamental principle of architecture. Any design that does not allow for change is doomed from the start. At the same time, when exposing services to other applications, particularly applications that cross organizational boundaries, the signature of those services (methods, messages and data) becomes a contract that should not be broken. The Canonical Data Model provides a way to resolve this paradox.

When using a message bus, a common pattern is to use a message translator at the endpoints to transform messages to and from the canonical format. This allows the messaging system to maintain versioned services as receive locations and communicate with connected applications in their native format, while avoiding the n-squared problem. Instead of requiring up to 12 translations for a four-endpoint integration, the maximum number needed would be 8. As the numbers grow, the savings quickly become more significant (for 6 endpoints, 12 instead of 30; for 8 endpoints, 16 instead of 56). Internal operations (orchestration, etc.) are simplified because only one format is dealt with.
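The arithmetic behind those figures: with direct point-to-point integration, each of the n endpoints may need a translation to each of the other n - 1, giving n(n - 1); with a canonical format, each endpoint needs at most one translator in and one out, giving 2n. A short Python check reproduces the numbers quoted above:

```python
def point_to_point(n: int) -> int:
    """Directed translations when every endpoint talks directly to every other."""
    return n * (n - 1)


def canonical(n: int) -> int:
    """One translator into and one out of the canonical format per endpoint."""
    return 2 * n


for n in (4, 6, 8):
    print(n, point_to_point(n), canonical(n))
# 4 12 8
# 6 30 12
# 8 56 16
```

The same formulas show the break-even point: at n = 3 both approaches need 6 translations, and at n = 2 the canonical approach actually needs more (4 vs. 2), so the savings only appear once there are more than three endpoints.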

The same pattern can be applied to service-enabled applications. As I noted in a previous post, messages and data entities will change from release to release according to the needs of the system (new features as well as changes and fixes to existing functionality). As long as all consumers of these classes are built and deployed simultaneously, all is good. Once that no longer applies, such as when another application is calling your service, then a versioning scheme becomes necessary.

While a full treatment of versioning is beyond the scope of this post, my preference is for a strict versioning scheme:

Strategy #1: The Strict Strategy (New Change, New Contract)

The simplest approach to Web service contract versioning is to require that a new version of a contract be issued whenever any kind of change is made to any part of the contract.

This is commonly implemented by changing the target namespace value of a WSDL definition (and possibly the XML Schema definition) every time a compatible or incompatible change is made to the WSDL, XML Schema, or WS-Policy content related to the contract. Namespaces are used for version identification instead of a version attribute because changing the namespace value automatically forces a change in all consumer programs that need to access the new version of the schema that defines the message types.

This “super-strict” approach is not really that practical, but it is the safest and sometimes warranted when there are legal implications to Web service contract modifications, such as when contracts are published for certain inter-organization data exchanges. Because both compatible and incompatible changes will result in a new contract version, this approach supports neither backwards or forwards compatibility.

Pros and Cons

The benefit of this strategy is that you have full control over the evolution of the service contract, and because backwards and forwards compatibility are intentionally disregarded, you do not need to concern yourself with the impact of any change in particular (because all changes effectively break the contract).

On the downside, by forcing a new namespace upon the contract with each change, you are guaranteeing that all existing service consumers will no longer be compatible with any new version of the contract. Consumers will only be able to continue communicating with the Web service while the old contract remains available alongside the new version or until the consumers themselves are updated to conform to the new contract.

Therefore, this approach will increase the governance burden of individual services and will require careful transitioning strategies. Having two or more versions of the same service co-exist at the same time can become a common requirement for which the supporting service inventory infrastructure needs to be prepared.
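Because the quoted strategy carries the version in the target namespace, a service host can tell versions apart by inspecting the namespace of an incoming message’s root element. A minimal Python sketch using the standard library’s `xml.etree.ElementTree`; the namespace URIs, element names, and handlers are hypothetical stand-ins for real WSDL/XML Schema definitions:

```python
import xml.etree.ElementTree as ET

# Hypothetical target namespaces, one per published contract version.
NS_V1 = "http://example.com/orders/2011/01"
NS_V2 = "http://example.com/orders/2011/06"


def handle_v1(root: ET.Element) -> str:
    order_id = root.findtext(f"{{{NS_V1}}}OrderId")
    return f"v1 handled order {order_id}"


def handle_v2(root: ET.Element) -> str:
    order_id = root.findtext(f"{{{NS_V2}}}OrderId")
    carrier = root.findtext(f"{{{NS_V2}}}Carrier")
    return f"v2 handled order {order_id} via {carrier}"


HANDLERS = {NS_V1: handle_v1, NS_V2: handle_v2}


def dispatch(message: str) -> str:
    """Route a message to the handler for the contract version it was built against."""
    root = ET.fromstring(message)
    # ElementTree reports qualified tags as "{namespace}localname".
    namespace = root.tag[1:].split("}", 1)[0] if root.tag.startswith("{") else ""
    handler = HANDLERS.get(namespace)
    if handler is None:
        raise ValueError(f"unsupported contract version: {namespace!r}")
    return handler(root)


msg = f'<GetOrder xmlns="{NS_V1}"><OrderId>42</OrderId></GetOrder>'
print(dispatch(msg))  # v1 handled order 42
```

A consumer still built against the old namespace keeps reaching the v1 handler, and a message with no recognized namespace is rejected outright rather than being interpreted under the wrong contract.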

In short, my reasons for preferring the strict model are summed up by the words “safety” and “control” above. Once you have exposed a service, your ability to control its evolution becomes severely limited. Changes to the signature, if meaningful, either break older clients or introduce a risk of semantic confusion. The only way around this is to have synchronized releases of both service and consumers. This is a painful process when the external consumer is developed in-house. It is doubly so when that consumer is developed by an entirely different organization. Using a strict approach decouples the service from the client. New functionality is added via new endpoints, and clients, shielded from the change until they are ready, upgrade on their own schedule (within reason).

Using the Canonical Data Model, strictly versioned services can be set up as facades over the business layer for use by external applications (whether in-house or third party). Incoming requests are translated to the canonical (internal-only) format and responses are translated from the canonical format to that required by the endpoint. Internal-only services (such as those to support a Smart Client) can use the canonical format directly.

This architecture allows for preprocessing or post processing for external calls if needed. Individual versions of services, messages and data can be exposed, tracked, and ultimately retired in a controlled manner. The best part is that it provides these benefits while still supporting a unified business layer. This combination of both flexibility and uniformity makes for a robust design.
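The facade arrangement described above can be sketched in a few lines of Python. The canonical `Order` type, the two external message shapes, and the discount rule are all hypothetical; the point is that only the translators know about external versions, while the business layer is written once against the canonical format.

```python
from dataclasses import dataclass


@dataclass
class Order:
    """Canonical (internal-only) representation."""
    order_id: str
    customer_name: str
    total_cents: int


# Business layer: written once, against the canonical format only.
def apply_discount(order: Order) -> Order:
    # Hypothetical rule: 10% off orders of $100 or more.
    if order.total_cents >= 10_000:
        return Order(order.order_id, order.customer_name, order.total_cents * 9 // 10)
    return order


# Translators for external contract v1 (dollars as a string, split name).
def v1_to_canonical(msg: dict) -> Order:
    return Order(msg["id"], f'{msg["first"]} {msg["last"]}',
                 round(float(msg["total"]) * 100))


def canonical_to_v1(order: Order) -> dict:
    first, _, last = order.customer_name.partition(" ")
    return {"id": order.order_id, "first": first, "last": last,
            "total": f"{order.total_cents / 100:.2f}"}


# Translators for external contract v2 (integer cents, single name field).
def v2_to_canonical(msg: dict) -> Order:
    return Order(msg["orderId"], msg["customer"], msg["totalCents"])


def canonical_to_v2(order: Order) -> dict:
    return {"orderId": order.order_id, "customer": order.customer_name,
            "totalCents": order.total_cents}


# Each versioned facade: translate in, call the business layer, translate out.
def facade_v1(msg: dict) -> dict:
    return canonical_to_v1(apply_discount(v1_to_canonical(msg)))


def facade_v2(msg: dict) -> dict:
    return canonical_to_v2(apply_discount(v2_to_canonical(msg)))


print(facade_v1({"id": "7", "first": "Ada", "last": "Lovelace", "total": "120.00"}))
print(facade_v2({"orderId": "8", "customer": "Grace Hopper", "totalCents": 5000}))
```

Retiring v1 later means deleting `facade_v1` and its two translators; `apply_discount` and the canonical type are untouched.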

Automating a bad process doesn’t make it better

I ran across an article the other day that gave the excellent advice of evaluating processes before moving them into the cloud. According to the author:

That process assessment should at minimum answer the following questions.

  • Do we know how effective this process is?
  • Do we know its costs and benefits?
  • Do we know how well it is supported and enforced?
  • Do we know that this process is kept current with changing needs and conditions?

The same advice applies to any development effort, whether cloud-based or not. Considering the expense involved in either new development or major updates, it makes little sense to neglect the opportunity to evaluate the underlying business processes. Otherwise, you risk wasting time and money implementing a system that fails to meet the users’ needs from the outset. Even if some measure of process flexibility is built in, the expense of the wasted effort will almost certainly dwarf what could have been spent on analysis.

Too often, a new application becomes the clone of an older one with just a technology facelift. Frequently the reason given is that it will avoid re-training existing users. While this is a valid concern, both in terms of optimizing user adoption and avoiding service disruption during the cutover, it must be balanced against the cost of training new users going forward. Depending on employee turnover rates, a more intuitive system that is easier to bring new employees up to speed on may be a better choice. Additionally, current users may not find the transition as difficult as anticipated. As people adjust to dealing with clunky systems, the complaints may die out, but this is not the same thing as satisfaction. The experienced user base may well appreciate the innovations as much as newcomers. Storyboarding and prototyping up front, working with those users, can help greatly in determining if this is the case.

Periodic re-evaluation is critical to any process if it is to remain relevant. Those that hang on past any point of utility become either a hindrance or, worse, a policy that is routinely ignored. Socrates felt that “the unexamined life is not worth living”. Dealing with a process that has remained unexamined for long may have you wishing for the hemlock as well.

Think asynch for performance

It’s fairly natural to think in terms of synchronous steps when defining processes to be automated. You do this, then that, followed by a third thing, and then you’re done. When applied to applications, this paradigm provides a relatively simple, easy-to-understand flow. The problem is that each step takes time, and as steps accumulate (due to enhancements added over time), the duration of the process increases. As the duration of the process increases, the probability that a user takes exception to the wait approaches 1.

There are several approaches to dealing with this dilemma. Hoping it goes away is a non-starter. Removing functionality, unless it’s blatant gold-plating, is probably not going to be an easy sell either. One workable option is scaling up and/or out to reduce the impact of user load on the time to completion. Another is to adjust the user’s perception of performance by executing selected steps, if not entire processes, asynchronously. Combining scale up/scale out and asynchronous operation can provide further performance gains, both actual and perceived.

When scaling up, the trade-off for increased performance is normally the cost of the hardware resources (processor/cores, memory, and/or storage). Either price or physical constraints will prove the limiting factor. For scaling out, the main trade-off will again be price, with a secondary consideration of some added complexity (e.g. session state considerations when load balancing a web application). Price would generally provide more of a limit to this method than any physical considerations. Greater complexity will be the main trade-off for asynchronous operations. Costs can also be incurred if coupled with one or more of the scaling methods above and/or if using commercial tools to host the asynchronous steps.

While a full treatment of the tools available to support asynchronous execution is beyond the scope of this post, they include everything from full-blown messaging systems to queuing to home-grown solutions. High-end tools can be expensive, complex and have considerable hardware requirements. However, when evaluating methods, it is important to include maintenance considerations. A complex system developed in-house may soon equal the cost of a third party product when you factor in both initial development and on-going maintenance.

Steps from a synchronous process can be carved out for asynchronous execution provided they meet certain criteria. They must not change the semantics of the process in the event of a failure, nor affect the response to the caller. Additionally, such steps must be amenable to retry logic (either automated or manual) or compensating transactions in order to deal with any failures. Feedback to the user must be handled either by periodically checking for messages or by the caller providing a callback mechanism. These considerations are a significant source of the additional complexity incurred.
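A minimal Python sketch of the retry side of this, with all names hypothetical and no particular messaging product assumed: a carved-out step is attempted a fixed number of times with a growing delay, and if it still fails it is handed to a fallback, which could queue it for manual retry or run a compensating transaction.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def run_with_retry(step: Callable[[], T],
                   on_give_up: Callable[[Exception], None],
                   attempts: int = 3,
                   base_delay: float = 0.5) -> Optional[T]:
    """Run `step`, retrying with exponential backoff between attempts.

    Returns the step's result, or None after passing the last error to
    `on_give_up` (e.g. queue for manual retry, or compensate) once every
    attempt has failed.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[Exception] = None
    for attempt in range(attempts):
        try:
            return step()
        except Exception as error:  # a real system would catch narrower types
            last_error = error
            if attempt < attempts - 1:
                time.sleep(base_delay * 2 ** attempt)
    assert last_error is not None
    on_give_up(last_error)
    return None


# Usage: a hypothetical email step that fails twice, then succeeds.
calls = {"count": 0}


def flaky_send() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("mail server unavailable")
    return "sent"


manual_queue: list[str] = []
print(run_with_retry(flaky_send, lambda e: manual_queue.append(str(e)), base_delay=0.01))  # sent
```

Automatic retries absorb transient failures, while `on_give_up` is the hook for the manual path; which one a given step needs depends on whether repeating it is actually safe.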

As an example, consider a process where:

  • the status of an order is updated
  • a document is created and emailed to the participant assigned to fulfill the order
  • an accounting entry is made reflecting the expected cost of fulfilling the order
  • an audit entry is made reflecting the status change
  • the current state of the order is returned to the caller

Over a number of releases, the process is enhanced so that now:

  • the status of an order is updated
  • a document is created and emailed to the participant assigned to fulfill the order
  • an accounting entry is made reflecting the expected cost of fulfilling the order
  • an audit entry is made reflecting the status change
  • a snapshot of the document data is saved
  • a copy of the document is forwarded to the document management system
  • the current state of the order is returned to the caller

The creation and emailing of the document, as well as sending it to the document management system, are prime candidates for asynchronous execution. Both tasks are reproducible in the event of a failure and neither leaves the system in an unstable state should they fail to complete. Because the interaction with the document management system crosses machine boundaries and involves a large message, the improvement should be noticeable. As noted above, the trade-offs for this will include the additional complexity in dealing with failures (retry logic: manual, automatic, or both) as well as the costs and complexity associated with whatever method is used to accomplish the step out of band.
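A minimal sketch of that split in Python. The standard library’s `queue` and a background thread stand in for whatever messaging or queuing tool is actually chosen, and the step functions are hypothetical placeholders that only record that they ran. The status update, accounting entry, audit entry, and snapshot stay synchronous; the document email and the document management forward are queued.

```python
import queue
import threading

log: list[str] = []                # records which steps ran, in order
work: queue.Queue = queue.Queue()  # stand-in for a real message queue


def make_step(name: str):
    """Build a hypothetical step that just records that it ran."""
    def step(order_id: str) -> None:
        log.append(f"{name} {order_id}")
    return step


update_status = make_step("status")
post_accounting = make_step("accounting")
write_audit = make_step("audit")
save_snapshot = make_step("snapshot")
email_document = make_step("email")   # create the document and email it
forward_to_dms = make_step("dms")     # send a copy to document management


def worker() -> None:
    """Drain the queue, running each carved-out step out of band."""
    while True:
        step, order_id = work.get()
        try:
            step(order_id)
        except Exception as error:
            # Placeholder: hand off to retry logic or a manual-recovery list.
            log.append(f"failed {order_id}: {error}")
        finally:
            work.task_done()


def change_order_status(order_id: str) -> dict:
    # Steps that affect the response or the meaning of the process stay synchronous.
    update_status(order_id)
    post_accounting(order_id)
    write_audit(order_id)
    save_snapshot(order_id)
    # Reproducible steps that leave no unstable state on failure are queued.
    work.put((email_document, order_id))
    work.put((forward_to_dms, order_id))
    return {"order_id": order_id, "status": "ASSIGNED"}


threading.Thread(target=worker, daemon=True).start()
print(change_order_status("42"))  # returns without waiting on email or DMS
work.join()  # only so this demo can show that the queued steps ran
print(log)
```

The caller’s wait now covers only the four synchronous steps; the cross-machine DMS call, the slowest part, no longer contributes to response time.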

Implementing a process that is entirely asynchronous can have a broader range of complexity. Simple actions that require little feedback to the user, such as re-sending the email from the example above, should be no more difficult than the carved out step. Processes that require greater user interaction will require more design and development effort to accomplish. Processes that were originally implemented in a synchronous manner will require the most effort.

Converting a process from a synchronous communications model to an asynchronous one will require substantial refactoring across all layers. When this refactoring crosses application boundaries (as in the case of a service application supporting other client applications), then the complexity increases. Crossing organizational boundaries (e.g. company to company) will entail the greatest complexity. In these cases, providing a new asynchronous method while maintaining the old synchronous one will simplify the migration. Trying to coordinate multiple releases, particularly across enterprises, is asking for an ulcer.
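A minimal Python sketch of that migration path, with hypothetical names: the existing synchronous operation keeps its signature, and a new pair of operations lets callers submit work, get a ticket back immediately, and poll for the result. Polling is shown rather than a callback because it asks nothing of the caller’s infrastructure; `concurrent.futures.ThreadPoolExecutor` stands in for a real asynchronous back end.

```python
import uuid
from concurrent.futures import Future, ThreadPoolExecutor


class OrderService:
    def __init__(self) -> None:
        self._executor = ThreadPoolExecutor(max_workers=4)
        self._pending: dict[str, Future] = {}

    # Existing synchronous operation: unchanged, so current clients keep working.
    def process_order(self, order_id: str) -> dict:
        # Stand-in for the full (slow) synchronous process.
        return {"order_id": order_id, "status": "PROCESSED"}

    # New asynchronous operation: returns a ticket right away.
    def submit_order(self, order_id: str) -> str:
        ticket = str(uuid.uuid4())
        self._pending[ticket] = self._executor.submit(self.process_order, order_id)
        return ticket

    # Companion operation that clients poll with their ticket. A real service
    # would also expire finished tickets and report failures explicitly.
    def get_result(self, ticket: str) -> dict:
        future = self._pending[ticket]
        if not future.done():
            return {"ticket": ticket, "status": "PENDING"}
        return {"ticket": ticket, **future.result()}


service = OrderService()
print(service.process_order("42"))   # old clients: call and wait
ticket = service.submit_order("43")  # migrated clients: submit...
print(service.get_result(ticket))    # ...then poll (may still be PENDING)
```

Both entry points share the same underlying process, so external consumers can move from one to the other on their own schedule instead of in a coordinated release.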

In spite of the number of times I’ve used the word “complexity” in this post, my intent is not to discourage the use of the technique. Where appropriate, it is an extremely powerful tool that can allow you to meet the needs of the customer without requiring them to pick between function and performance. They tend to like the “have your cake and eat it too” scenario.