Canonical Data Models, ESBs, and a Reuse Trap

The original uphill battle

In “Reduce, Reuse, Recycle”, I discussed how reuse introduces additional complexity and cost as its scope expands (from a single application, to a family of applications, to consumers outside the organization). In “Coping with change using the Canonical Data Model”, I illustrated how that technique could be used to manage API changes without breaking existing clients. Last week on LinkedIn, an interesting question was posted that combined both reuse and the Canonical Data Model pattern: how do you create a service-oriented architecture around a canonical data model when the teams for the component applications are unaware of the benefits of reuse?

Initially, the benefit seems to be self-evident. If you have ten applications publishing messages to each other, this can involve up to 90 mappings per service method [n(n – 1)], 180 if the services are request-response. If, however, all ten applications use the same set of messages, you have one format per call (two for request-response). Things get sticky when you realize that you must now coordinate ten development teams to code an interface to your service bus (we’re not even considering the case of a third-party product). Assuming you manage to achieve that feat, things get stickier still when the inevitable change is needed to the canonical format. Do you violate the one true message layout rule or do you try to get ten application teams to deploy an update simultaneously?

Using the canonical data model pattern inside the service bus allows you to vary the external formats while reducing the number of mappings to a maximum of 20 [2n] (40 if using request-response): the incoming native message is transformed to the canonical format on the way in, and then to the appropriate outgoing native format on the way out. Any logic inside the bus (e.g. orchestrations) retains the benefit of operating on the canonical format, without the constraint of keeping all clients constantly synchronized. Obviously, nothing prevents multiple applications from using the same format, so this method can reap the benefits of reuse without constraining change.
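The arithmetic behind those counts is easy to check. The following is a small sketch rather than anything tied to a particular bus product; the function names are made up for illustration, and the request-response case is simply assumed to double the one-way count, as described above.

```python
def point_to_point_mappings(n: int, request_response: bool = False) -> int:
    """Mappings needed when each of n applications maps directly to
    every other application: n(n - 1) per service method."""
    one_way = n * (n - 1)
    return one_way * 2 if request_response else one_way


def canonical_mappings(n: int, request_response: bool = False) -> int:
    """Mappings needed when each application is translated once into and
    once out of a canonical format inside the bus: 2n per service method."""
    one_way = 2 * n
    return one_way * 2 if request_response else one_way


if __name__ == "__main__":
    n = 10
    print("point-to-point:", point_to_point_mappings(n),
          "/", point_to_point_mappings(n, request_response=True))
    print("canonical:     ", canonical_mappings(n),
          "/", canonical_mappings(n, request_response=True))
```

The gap widens quickly because the first count grows with the square of the number of applications while the second grows linearly.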

A successful service-oriented architecture will need to balance two potentially contradictory forces: stable service contracts for existing consumers and the ability to quickly accommodate new consumers. Using a canonical data model internally while allowing for a variety of external formats can allow you to meet both objectives with minimal complexity. Attempting to enforce a canonical model externally pushes changes onto all consumer applications regardless of need and will slow the pace of change.

Coping with change using the Canonical Data Model

According to the Greek philosopher Heraclitus of Ephesus, “Nothing endures but change”. Change is not just a common theme, but also a fundamental principle of architecture. Any design that does not allow for change is doomed from the start. By the same token, when exposing services to other applications, particularly applications that cross organizational boundaries, the signature of those services (methods, messages and data) becomes a contract that should not be broken. The Canonical Data Model provides a way to resolve this paradox.

When using a message bus, a common pattern is to use a message translator at the endpoints to transform messages to and from the canonical format. This allows the messaging system to maintain versioned services as receive locations and communicate with connected applications in their native format, while avoiding the n-squared problem. Instead of requiring up to 12 translations [n(n – 1)] for a four-endpoint integration, the maximum number needed would be 8 [2n]. As the numbers grow, the savings quickly become more significant (for 6 endpoints, 12 instead of 30; for 8 endpoints, 16 instead of 56). Internal operations (orchestration, etc.) are simplified because only one format is dealt with.
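As a rough illustration of translators at the endpoints, here is a minimal sketch. Everything in it is hypothetical: the "billing" and "warehouse" applications, their field names, and the `CanonicalOrder` type are invented for the example, and a real bus would supply its own translator mechanism.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Callable, Dict


@dataclass(frozen=True)
class CanonicalOrder:
    """Hypothetical canonical (bus-internal) representation of an order."""
    order_id: str
    amount_cents: int


# One inbound and one outbound translator per endpoint (2n in total).
def from_billing(msg: dict) -> CanonicalOrder:
    cents = int(Decimal(str(msg["Total"])) * 100)
    return CanonicalOrder(order_id=str(msg["InvoiceNo"]), amount_cents=cents)


def to_billing(order: CanonicalOrder) -> dict:
    return {"InvoiceNo": order.order_id,
            "Total": f"{order.amount_cents / 100:.2f}"}


def from_warehouse(msg: dict) -> CanonicalOrder:
    return CanonicalOrder(order_id=msg["ref"], amount_cents=msg["value_cents"])


def to_warehouse(order: CanonicalOrder) -> dict:
    return {"ref": order.order_id, "value_cents": order.amount_cents}


INBOUND: Dict[str, Callable[[dict], CanonicalOrder]] = {
    "billing": from_billing,
    "warehouse": from_warehouse,
}
OUTBOUND: Dict[str, Callable[[CanonicalOrder], dict]] = {
    "billing": to_billing,
    "warehouse": to_warehouse,
}


def route(source: str, target: str, native_msg: dict) -> dict:
    """Translate native -> canonical on the way in, canonical -> native on the way out."""
    canonical = INBOUND[source](native_msg)
    # Orchestration logic inside the bus would operate on `canonical` here.
    return OUTBOUND[target](canonical)


if __name__ == "__main__":
    print(route("billing", "warehouse", {"InvoiceNo": "A-17", "Total": "19.99"}))
```

Adding a third application means writing two more translators, not a new mapping for every existing endpoint.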

The same pattern can be applied to service-enabled applications. As I noted in a previous post, messages and data entities will change from release to release according to the needs of the system (new features as well as changes and fixes to existing functionality). As long as all consumers of these classes are built and deployed simultaneously, all is good. Once that no longer applies, such as when another application is calling your service, then a versioning scheme becomes necessary.

While a full treatment of versioning is beyond the scope of this post, my preference is for a strict versioning scheme:

Strategy #1: The Strict Strategy (New Change, New Contract)

The simplest approach to Web service contract versioning is to require that a new version of a contract be issued whenever any kind of change is made to any part of the contract.

This is commonly implemented by changing the target namespace value of a WSDL definition (and possibly the XML Schema definition) every time a compatible or incompatible change is made to the WSDL, XML Schema, or WS-Policy content related to the contract. Namespaces are used for version identification instead of a version attribute because changing the namespace value automatically forces a change in all consumer programs that need to access the new version of the schema that defines the message types.

This “super-strict” approach is not really that practical, but it is the safest and sometimes warranted when there are legal implications to Web service contract modifications, such as when contracts are published for certain inter-organization data exchanges. Because both compatible and incompatible changes will result in a new contract version, this approach supports neither backwards or forwards compatibility.

Pros and Cons

The benefit of this strategy is that you have full control over the evolution of the service contract, and because backwards and forwards compatibility are intentionally disregarded, you do not need to concern yourself with the impact of any change in particular (because all changes effectively break the contract).

On the downside, by forcing a new namespace upon the contract with each change, you are guaranteeing that all existing service consumers will no longer be compatible with any new version of the contract. Consumers will only be able to continue communicating with the Web service while the old contract remains available alongside the new version or until the consumers themselves are updated to conform to the new contract.

Therefore, this approach will increase the governance burden of individual services and will require careful transitioning strategies. Having two or more versions of the same service co-exist at the same time can become a common requirement for which the supporting service inventory infrastructure needs to be prepared.
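To make the namespace-as-version idea concrete, the sketch below routes an incoming XML message according to the namespace of its root element, using Python's standard `xml.etree.ElementTree`. The namespace URIs, element names, and handler functions are assumptions made up for the example; the point is only that a message in an unregistered namespace is rejected outright rather than interpreted loosely.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespaces: under the strict strategy, each contract
# version gets its own target namespace.
NS_V1 = "http://example.com/orders/v1"
NS_V2 = "http://example.com/orders/v2"


def handle_v1(root: ET.Element) -> tuple:
    return ("v1", root.findtext(f"{{{NS_V1}}}id"))


def handle_v2(root: ET.Element) -> tuple:
    return ("v2", root.findtext(f"{{{NS_V2}}}orderId"))


HANDLERS = {NS_V1: handle_v1, NS_V2: handle_v2}


def namespace_of(element: ET.Element) -> str:
    """ElementTree stores qualified tags as '{namespace}localname'."""
    tag = element.tag
    if tag.startswith("{"):
        return tag[1:].split("}", 1)[0]
    return ""


def dispatch(xml_text: str) -> tuple:
    root = ET.fromstring(xml_text)
    namespace = namespace_of(root)
    handler = HANDLERS.get(namespace)
    if handler is None:
        # No compatibility is assumed between versions: unknown means rejected.
        raise ValueError(f"unsupported contract version: {namespace!r}")
    return handler(root)


if __name__ == "__main__":
    print(dispatch(f'<order xmlns="{NS_V1}"><id>42</id></order>'))
    print(dispatch(f'<order xmlns="{NS_V2}"><orderId>42</orderId></order>'))
```

Keeping both handlers registered is what lets old and new contract versions co-exist until the old one is retired.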

In short, my reasons for preferring the strict model are summed up by the words “safety” and “control” above. Once you have exposed a service, your ability to control its evolution becomes severely limited. Changes to the signature, if meaningful, either break older clients or introduce risk of semantic confusion. The only way around this is to have synchronized releases of both service and consumers. This is a painful process when the external consumer is developed in-house. It is doubly so when that consumer is developed by an entirely different organization. Using a strict approach decouples the service from the client. New functionality is added via new endpoints; clients are shielded from the change until they are ready and upgrade on their own schedule (within reason).

Using the Canonical Data Model, strictly versioned services can be set up as facades over the business layer for use by external applications (whether in-house or third party). Incoming requests are translated to the canonical (internal-only) format and responses are translated from the canonical format to that required by the endpoint. Internal-only services (such as those to support a Smart Client) can use the canonical format directly.
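A minimal sketch of that arrangement follows, assuming a hypothetical customer lookup. The class names, field names, and the `find_customer` stand-in for the business layer are all invented for illustration; each facade owns the translation between its own frozen contract and the canonical type.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CanonicalCustomer:
    """Hypothetical canonical (internal-only) customer entity."""
    customer_id: int
    given_name: str
    family_name: str
    email: Optional[str] = None


def find_customer(customer_id: int) -> CanonicalCustomer:
    """Stand-in for the unified business layer; returns canned data."""
    return CanonicalCustomer(customer_id, "Ada", "Lovelace", "ada@example.com")


class CustomerServiceV1:
    """Facade for the original contract: a single 'name' field, no email."""

    def get_customer(self, request: dict) -> dict:
        customer = find_customer(int(request["id"]))
        return {
            "id": customer.customer_id,
            "name": f"{customer.given_name} {customer.family_name}",
        }


class CustomerServiceV2:
    """Facade for a later contract: split name fields plus email."""

    def get_customer(self, request: dict) -> dict:
        customer = find_customer(int(request["customerId"]))
        return {
            "customerId": customer.customer_id,
            "givenName": customer.given_name,
            "familyName": customer.family_name,
            "email": customer.email,
        }


if __name__ == "__main__":
    print(CustomerServiceV1().get_customer({"id": "7"}))
    print(CustomerServiceV2().get_customer({"customerId": "7"}))
```

The canonical type can grow (as it did here with `email`) without touching the version 1 contract, and an internal-only client could call `find_customer` directly.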

This architecture allows for preprocessing or postprocessing of external calls if needed. Individual versions of services, messages and data can be exposed, tracked, and ultimately retired in a controlled manner. The best part is that it provides these benefits while still supporting a unified business layer. This combination of flexibility and uniformity makes for a robust design.