Legacy Systems – Constraints, Conflicts, and Customers

Cradle to the grave (in 6.2 seconds)

As I was reading Roger Sessions’ latest white paper, “The Thirteen Laws of Highly Complex IT Systems”, Laws 1 and 2 immediately caught my eye:

Law 1. There are three categories of complexity: business, architectural and implementation.

Law 2. The three categories of complexity are largely independent of each other.

That complexity in these categories can vary independently (e.g. complex business processes can be designed and implemented simply, just as simple processes can be designed and implemented in an extremely complex manner) is important to the understanding of complexity in IT. Likewise, it is a reminder that the function of the architecture and implementation of a system can vary independently from the underlying business process(es) it was intended to enable. That variance is an insidious form of technical debt, whether it occurs over time or was a foundational aspect of the system. In either case (though perhaps more so in the latter), customer satisfaction is going to be negatively affected.

Poor customer service, particularly in the form of ignoring (or being perceived as ignoring) the needs of the business, is a prime trigger for rogue IT implementations. The uncoordinated nature of these implementations leads to an overly complex “accidental architecture”. These accidental architectures pose problems not only in that they tend to be fragmented and more expensive than a well-designed solution, but also in that their existence constrains future architectures. Structure follows strategy when building anew, but then strategy will find itself constrained by structure.

The antithesis of this IT dystopia is the “fluid enterprise”, described by Brenda Michelson as one where “…assets in our portfolios are no longer sole-purposed applications or databases; they are also potential multiuse components and triggers to be exploited in the new architecture”. In order to evolve applications that come together as an enterprise platform, it is necessary to start from a base of applications that meet the needs of their users. While rationalizing a collection of shadow IT components is likely to be a long and expensive task, that does not mean that gluing together a bunch of inadequate (albeit “official”) systems will be a better solution.

No, Uncle Bob, No – the Obligatory healthcare.gov Post

Good for what ails you?

I tried to avoid this one. First of all, I don’t do politics on this site and this topic has way too much political baggage. Second, a great many people have already written about it, so I didn’t think I really had anything to add.

Then, Uncle Bob Martin chimed in.

I agree with some of what he has to say. I have no doubt that this particular debacle has harmed the image of software development in the eyes of the general public. Then he falls over the edge, comparing the launch of healthcare.gov with the Challenger disaster. After all, in both cases, political considerations overrode technical concerns. Regardless of this, Bob puts the blame on those far down the ladder:

Perhaps you disagree. Perhaps you think this was a failure of government, or of management. Of course I agree. Government failed and management failed. But government and management don’t know how to build software. We do. We were hired because of that knowledge. And we are expected to use that knowledge to communicate to the managers and administrators who don’t have it.

The thing is, the Centers for Medicare and Medicaid Services (CMS) is both a government agency and the system integrator on the healthcare.gov project. While there’s plenty of evidence of really poor code across the various parts, the integration of those parts is where the project fell down. Had the various contractors hired numerous Bob Martin clones and obtained the cleanest of clean code, the result would have still been the same.

Those with the technical knowledge and experience are, without a doubt, obligated to provide their best advice to the managers and administrators. When those managers and administrators ignore that advice, it is incorrect to allege that the fault lies elsewhere.

The end of the post, however, is the worst:

So, if I were in government right now, I’d be thinking about laws to regulate the Software Industry. I’d be thinking about what languages and processes we should force them to use, what auditing should be done, what schooling is necessary, etc. etc. I’d be thinking about passing laws to get this unruly and chaotic industry under some kind of control.

If I were the President right now, I might even be thinking about creating a new Czar or Cabinet position: The Secretary of Software Quality. Someone who could regulate this misbehaving industry upon which so much of our future depends.

Considering that all indications are that the laws and regulations around government purchasing and contracting contributed to this mess, I’m not sure how additional regulation is supposed to fix it. Likewise, it’s a little boneheaded to suggest that those responsible for this debacle (by attempting to manage what they should have known they were unqualified to manage) should now regulate the entire software development industry. In fact, the very diversity of the industry should make it obvious that a one-size-fits-all mandate would make matters irretrievably worse.

Handing out aspirin to treat Ebola is just bad medicine.

Technical Debt – What it is and what to do about it

This is gonna cost you

In spite of all that’s been written on the subject of technical debt, it’s still a common occurrence to see it defined as simply “bad code”. Likewise, it’s still common to see the solution offered as “stop writing bad code”. Technical debt encompasses much more than that simplistic definition, so while “stop writing bad code” is good advice, it’s wholly insufficient to deal with the subject.

Steve McConnell’s definition is much more comprehensive (and, in my opinion, closer to the mark):

A design or construction approach that’s expedient in the short term but that creates a technical context in which the same work will cost more to do later than it would cost to do now (including increased cost over time)

While it’s a better definition, I’d differ with it in three ways. Technical debt may incur costs not only through rework of the original item, but also by making changes that depend on the original item more difficult. Technical debt may also end up costing nothing extra over time (due to a risk not materializing or because the feature associated with the debt is eliminated). Lastly, it should be noted that the cost of technical debt can extend beyond just effort by also affecting customer satisfaction.

In short, I define technical debt as any technical deficit that involves a risk of greater cost and/or end user dissatisfaction.

This definition encompasses debts that are taken on deliberately and rationally, those that are taken on impulsively, and those that are taken on unconsciously.

Code that is brittle, redundant, unnecessary, unclear, insecure, and/or untested is, of course, a type of technical debt. Although Bob Martin argues otherwise, the risk of costs to be paid clearly makes it so. Likewise, aspects of design can be considered technical debt, whether in the form of poor decisions, intentional shortcuts, decisions deferred too long, or architectural “drift” (losing design coherence via new features being added using new technologies/techniques without bringing older components up to date, or failing to evolve the system as the needs of the business change). Deferring bug fixes is a form of technical debt as is deferring automation of recurring support tasks. Dependencies can be a source of technical debt, both in regard to debt they carry and in terms of their fitness to your purpose. The platform that hosts your application is yet another potential source of technical debt if not maintained.

As noted above, the “interest” on technical debt can manifest as the need for rework and/or more effort in implementing changes over time. This increase in effort can come through the proliferation of code to compensate for the effects of unresolved debt or even just through increased time to comprehend the existing code base prior to attempting a change. As Ruth Malan has noted, strategy may drive architecture, but once the initial architecture is in place, it serves to both enable and constrain the strategy of the system going forward (strategies requiring major architectural changes typically must offer extremely high ROI to get approval). Time spent on manual maintenance tasks (e.g. running scripts to add new reference values) can also be a form of interest, considering that time spent there is time that could be spent on other tasks.

Costs associated with technical debt are not always a gradual payback over time as with an ordinary loan. Some can be like a debt to the mob: “they come at night, pointing a gun to your head, and they want their money NOW”. Security issues are a prime example of this type of debt. Obviously, debts that carry the danger of coming due with little or no notice should be considered too risky to take on.

Having proposed a definition for the term “technical debt” and identified the risks that it entails, it remains to discuss what to do about it. The first step is to recognize it when it’s incurred (or as soon as possible thereafter). For debt taken on deliberately, recognition should be trivial going forward. Recognition of existing debt in an established system may require discovery if it has not been cataloged previously. Debt that has been taken on unconsciously will always require effort to discover. In all cases, the goal is to maintain a technical debt backlog that is as comprehensive as possible. Maintaining this backlog provides insight into the current state of the system and can inform risk assessments for future decisions.
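To make the idea of a backlog a little more concrete, here is one way an entry might be recorded. This is only a sketch: the field names, the `Origin` categories (borrowed from the deliberate/impulsive/unconscious distinction above), and the `needs_urgent_review` helper are my own illustrative assumptions, not an established format.

```python
from dataclasses import dataclass
from enum import Enum


class Origin(Enum):
    """How the debt was taken on (hypothetical categories for illustration)."""
    DELIBERATE = "deliberate"
    IMPULSIVE = "impulsive"
    UNCONSCIOUS = "unconscious"


@dataclass
class DebtItem:
    summary: str
    origin: Origin
    sudden_payback: bool  # True if the debt could come due with little or no notice


def needs_urgent_review(backlog):
    """Return the items that could come due without warning."""
    return [item for item in backlog if item.sudden_payback]


# Example entries; the contents are made up for illustration.
backlog = [
    DebtItem("Reference values added by hand-run script", Origin.DELIBERATE, False),
    DebtItem("Unpatched dependency with known vulnerability", Origin.UNCONSCIOUS, True),
]

for item in needs_urgent_review(backlog):
    print(item.summary)
```

Even something this small makes the “debt to the mob” items easy to pull out of the list when a risk assessment is needed.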

Becoming aware of existing debt is a critical first step, but is insufficient in itself. Taking steps to actively manage the system’s debt portfolio is essential. The first step should be to stop unconsciously taking on new debt. Latent debt tends to fit into the immediate, unexpected payback model mentioned above. Likewise, steps taken to improve the quality up front (unit testing, code review, static analysis, process changes, etc.) should reduce the effort needed for detection and remediation on the back end. Architectural and design practices should also be examined. Too little design can be as counter-productive as too much. Striking the right balance can yield savings over the life of the application.

Deciding whether or not to take on intentional technical debt is less black and white. Often this type of debt is taken on for rational reasons. An example of this is what Ruth Malan characterizes as “…trading technical learning and code design improvement for market learning (getting features into user hands to improve them)”. Other times, the balance between risk and reward (whether time to market or budget) may tilt in favor of taking on a debt. When this is the case, it is critical that the owner(s) of the system make the decision in possession of the best possible information you can provide. An impulsive decision taken on the basis of “feel” rather than information will likely carry more risk.

Retiring old debt should be the final link in the chain. Just as the taking on of new debt should be done in a rational manner, so should the retirement of old debt. Not all debt carries the same risk/reward ratio and efforts that carry more bang for the buck will be an easier sell. Although some may disagree, I firmly believe that better outcomes will result from making those who own the system active partners in its development and evolution.
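As a rough illustration of “bang for the buck”, debt items could be ranked by the expected cost avoided per unit of remediation effort. The scoring formula, the dictionary keys, and the numbers below are all assumptions made for the sake of the example; real estimates would come from conversations with the system’s owners.

```python
def rank_for_retirement(items):
    """Sort debt items by expected cost avoided per unit of remediation effort."""
    def payoff(item):
        # Expected cost = chance the debt comes due * what it costs if it does.
        expected_cost = item["probability"] * item["cost_if_due"]
        return expected_cost / item["effort_to_fix"]

    return sorted(items, key=payoff, reverse=True)


# Hypothetical backlog entries; costs and effort are in the same arbitrary unit.
items = [
    {"name": "manual reference-data script", "probability": 0.9,
     "cost_if_due": 10, "effort_to_fix": 5},
    {"name": "unvalidated input on login form", "probability": 0.3,
     "cost_if_due": 200, "effort_to_fix": 8},
    {"name": "outdated UI framework", "probability": 0.5,
     "cost_if_due": 40, "effort_to_fix": 30},
]

for item in rank_for_retirement(items):
    print(item["name"])
```

With these made-up figures the security issue ranks first, even though it is the least likely to come due, because its cost dwarfs the effort needed to fix it.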

It’s highly unlikely that a system will be free of technical debt. Perversely, being free of such debt could actually be a liability. That being said, there is a world of difference between the two poles of debt-free and technical anarchy. Effort spent to rationally manage a system’s debt load will free up time to be put to better use.