Building a Legacy

Greek Trireme image from Deutsches Museum, Munich, Germany

 

Over the last few weeks, I’ve run across a flurry of articles dealing with the issue of legacy systems used by the U.S. government.

An Associated Press story on the findings from the Government Accountability Office (GAO) issued in May reported that roughly three-fourths of the $80 billion IT budget was used to maintain legacy systems, some more than fifty years old and without an end of life date in sight. An article on CIO.com about the same GAO report detailed seven of the oldest systems. Two were over 56 years old, two 53, one 51, one 35, and one 31. Four of the seven have plans to be replaced, but the two oldest have no replacement yet planned.

Cost was not the only issue, reliability is a problem as well. An article on Timeline.com noted:

Then there’s the fact that, up until 2010, the Secret Service’s computer systems were only operational about 60% of the time, thanks to a highly outdated 1980s mainframe. When Senator Joe Lieberman spoke out on the issue back in 2010, he claimed that, in comparison, “industry and government standards are around 98 percent generally.” It’s alright though, protecting the president and vice president is a job that’s really only important about 60 percent of the time, right?

It would be easy to write this off as just another example of public-sector inefficiency, but you can find these same issues in the private sector as well. Inertia can, and does, affect systems belonging to government agencies and business alike. Even a perfectly designed implemented system (we’ve all got those, right?) is subject to platform rot if ignored. Ironically, our organizations seem designed to do just that by being project-centric.

In philosophy, there’s a paradox called the Ship of Theseus, that explores the question of identity. The question arises, if we maintain something by replacing its constituent parts, does it remain the same thing? While many hours could be spent debating this, to those whose opinion should matter most, those who use the system, the answer is yes. To them, the identity of the system is bound up in what they do with it, such that it ceases to be the same thing, not when we maintain it but when its function is degraded through neglect.

Common practice, however, separates ownership and interest. Those with the greatest interest in the system typically will not own the budget for work on it. Those owning the budget, will typically be biased towards projects which add value, not maintenance work that represents cost.

Speaking of cost, is 75% of the budget an unreasonable amount for maintenance? How well are the systems meeting the needs of their users? Is quality increasing, decreasing, or holding steady? Was more money spent because of deferred maintenance than would have been spent with earlier intervention? How much business risk is involved? Without this context, it’s extremely difficult to say. It’s understandable that someone outside an organization might lack this information, but even within it, would a centralized IT group have access to it all? Is the context as meaningful at a higher, central level as it is “at the pointy end of the spear”?

Maintaining systems bit by bit, replacing them gradually over time, is likely to be more successful and less expensive, than letting them rot and then having a big-bang re-write. In my opinion, having an effective architecture for the enterprise’s IT systems is dependent on having an effective architecture for the enterprise itself. If the various systems (social and software) are not operating in conjunction, drift and inertia will take care of building your legacy (system).

[Greek Trireme image from Deutsches Museum, Munich, Germany via Wikimedia Commons]

Dealing with Technical Debt Like We Mean it

What’s the biggest problem with technical debt?

In my opinion, the biggest problem is that it works. Just like the electrical outlet pictured above, systems with technical debt get the job done, even when there’s a hidden surprise or two waiting to make life interesting for us at some later date. If it flat-out failed, getting it fixed would be far easier. Making the argument to spend time (money) changing something that “works” can be difficult.

Failing to make the argument, however, is not the answer:

Brenda Michelson‘s observation is half the battle. The argument for paying down technical debt needs to be made in business-relevant terms (cost, risk, customer impact, etc.). We need more focus on the “debt” part and remember “technical” is just a qualifier:

The other half of the battle is communicating, in the same business-relevant manner, the costs and/or risks involved when taking on technical debt is considered:

Tracking what technical debt exists and managing the payoff (or write off, removing failed experiments is a reduction technique) is important. Likewise, managing the assumption of technical debt is critical to avoid being swamped by it.

Of course, one could take the approach that the only acceptable level of technical debt is zero. This is equivalent to saying “if we can’t have a perfect product, we won’t have a product”. That might be a difficult position to sell to those writing the checks.

Even if you could get an agreement for that position, reality will conspire to frustrate you. Entropy emerges. Even if the code is perfected and then left unchanged, the system can rot as its platform ages and the needs of the business change. When a system is actively maintained over time without an eye to maintaining a coherent, intentional architecture, then the situation becomes worse. In his post “Enterprise Modernization – The Next Big Thing!”, David Sprott noted:

The problem with modernization is that it is widely perceived as slow, very expensive and high risk because the core business legacy systems are hugely complex as a result of decades of tactical change projects that inevitably compromise any original architecture. But modernization activity must not be limited to the old, core systems; I observe all enterprises old and new, traditional and internet based delivering what I call “instant legacy” [Note 1] generally as outcomes of Agile projects that prioritize speed of delivery over compliance with a well-defined reference architecture that enables ongoing agility and continuous modernization.

Kellan Elliot-McCrea, in “Towards an understanding of technical debt”, captured the problem:

All code is technical debt. All code is, to varying degrees, an incorrect bet on what the future will look like.

This means that assessing and managing technical debt should be an ongoing activity with a responsible owner rather than a one-off event that “somebody” will take care of. The alternative is a bit like using a credit card at every opportunity and ignoring the statements until the repo-man is at the door.

Technical Debt – Why not just do the work better?

Soap or oil press ruins in Panayouda, Lesvos

A comment from Tom Cagley was the catalyst for my post “Design Communicates the Solution to a Problem”, and he’s done it again. I recently re-posted “Technical Debt – What it is and what to do about it” to the Iasa Global blog, and Tom was kind enough to tweet a link to it. In doing so, he added a little (friendly) barb: “Why not just do the work better?” My reply was “It’s certainly the best strategy, assuming you can“.

Obviously, we have a great deal of control over intentional coding shortcuts. It’s not absolute control; business considerations can intrude, despite our best advice to the contrary. There are a host of other factors that we have less control over: aging platforms, issues in dependencies, changing runtime circumstances (changes in load, usage patterns, etc.), and increased complexity due to organic growth are all factors that can change a “good” design into a “poor” one. As Tom has noted, some have extended the metaphor to refer to these as “Quality Debt, Design Debt, Configuration Management Debt, User Experience Debt, Architectural Debt”, etc., but in essence, they’re all technical deficits affecting user satisfaction.

Unlike coherent architectural design, these other forms of technical debt emerge quite easily. They can emerge singly or in concert. In many cases, they can emerge without your doing anything “wrong”.

Compromise is often a source of technical debt, both in terms of the traditional variety (code shortcuts) and in terms of those I listed above. While it’s easy to say “no compromises”, reality is far different. While I’m no fan of YAGNI, at least the knee-jerk kind, over-engineering is not the answer either. Not every application needs to be built for hundreds of thousands of concurrent users. This is compromise. Having insufficient documentation is a form of technical debt that can come back to haunt you if personnel changes result in loss of knowledge. How many have made a compromise in this respect? While it is a fundamental form of documentation, code is insufficient by itself.

A compromise that I’ve had to make twice in my career is the use of the shared database EAI pattern when an integration had to be in place and the target application did not have a better mechanism in place. While I am absolutely in agreement that this method is sub-standard, the business need was too great (i.e. revenue was at stake). The risk was identified and the product owner was able to make an informed decision with the understanding that the integration would be revised as soon as possible. Under the circumstances, both compromises were the right decision.

In my opinion, taking the widest possible view of technical debt is critical. Identification and management of all forms of technical debt is a key component of sustainable evolution of a product over its lifetime. Maintaining a focus on the product as a whole, rather than projects, provides a long-term focus should help make better compromises – those that are tailored to getting the optimal value out of the product. Having tunnel vision on just the code leaves too much room for surprises.

[Soap or oil press ruins in Panayouda, Lesvos Image by Fallacia83 via Wikimedia Commons.]

“Avoiding Platform Rot” on Iasa Global Blog

just never had the time to keep up with the maintenance

Is your OS the latest version? How about your web server? Database server? If not now, when?

A “no” answer to the first three questions is likely not that big a deal. There can be advantages to staying off the bleeding edge. That being said, the last question is the key one. If the answer to that is “I haven’t thought about it”, then there’s potential for problems.

See the full post on the Iasa Global Blog (a re-post, originally written for the old Iasa blog).

Technical Debt – What it is and what to do about it

This is gonna cost you

In spite of all that’s been written on the subject of technical debt, it’s still a common occurrence to see it defined as simply “bad code”. Likewise, it’s still common to see the solution offered being “stop writing bad code”. Technical debt encompasses much more than that simplistic definition, so while “stop writing bad code” is good advice, it’s wholly insufficient to deal with the subject.

Steve McConnell’s definition is much more comprehensive (and, in my opinion, closer to the mark):

A design or construction approach that’s expedient in the short term but that creates a technical context in which the same work will cost more to do later than it would cost to do now (including increased cost over time)

While it’s a better definition, I’d differ with it in three ways. Technical debt may not only incur costs due to rework of the original item, but also by making more difficult changes that are dependent on the original item. Technical debt may also end up costing nothing extra over time (due to a risk not materializing or because the feature associated with the debt is eliminated). Lastly, it should be noted that the cost of technical debt can extend beyond just effort by also affecting customer satisfaction.

In short, I define technical debt as any technical deficit that involves a risk of greater cost and/or end user dissatisfaction.

This definition encompasses debts that are taken on deliberately and rationally, those that are taken on impulsively, and those that are taken on unconsciously.

Code that is brittle, redundant, unnecessary, unclear, insecure, and/or untested is, of course, a type of technical debt. Although Bob Martin argues otherwise, the risk of costs to be paid clearly makes it so. Likewise, aspects of design can be considered technical debt, whether in the form of poor decisions, intentional shortcuts, decisions deferred too long, or architectural “drift” (losing design coherence via new features being added using new technologies/techniques without bringing older components up to date, or failing to evolve the system as the needs of the business change). Deferring bug fixes is a form of technical debt as is deferring automation of recurring support tasks. Dependencies can be a source of technical debt, both in regard to debt they carry and in terms of their fitness to your purpose. The platform that hosts your application is yet another potential source of technical debt if not maintained.

As noted above, the “interest” on technical debt can manifest as the need for rework and/or more effort in implementing changes over time. This increase in effort can come through the proliferation of code to compensate for the effects of unresolved debt or even just through increased time to comprehend the existing code base prior to attempting a change. As Ruth Malan has noted, strategy may drive architecture, but once the initial architecture is in place, it serves to both enable and constrain the strategy of the system going forward (strategies requiring major architectural changes typically must offer extremely high ROI to get approval). Time spent on manual maintenance tasks (e.g. running scripts to add new reference values) can also be a form of interest, considering that time spent there is time that could be spent on other tasks.

Costs associated with technical debt are not always a gradual payback over time as with an ordinary loan. Some can be like a debt to the mob: “they come at night, pointing a gun to your head, and they want their money NOW”. Security issues are a prime example of this type of debt. Obviously, debts that carry the danger of coming due with little or no notice should be considered too risky to take on.

Having proposed a definition for the term “technical debt” and identified the risks that it entails, it remains to discuss what to do about it. The first step is to recognize it when it’s incurred (or as soon as possible thereafter). For debt taken on deliberately, recognition should be trivial going forward. Recognition of existing debt in an established system may require discovery if it has not been cataloged previously. Debt that has been taken on unconsciously will always require effort to discover. In all cases, the goal is to maintain a technical debt backlog that is as comprehensive as possible. Maintaining this backlog provides insight into both the current state of the system and can inform risk assessments for future decisions.

Becoming aware of existing debt is a critical first step, but is insufficient in itself. Taking steps to actively manage the system’s debt portfolio is essential. The first step should be to stop unconsciously taking on new debt. Latent debt tends to fit into the immediate, unexpected payback model mentioned above. Likewise, steps taken to improve the quality up front (unit testing, code review, static analysis, process changes, etc.) should reduce the effort needed for detection and remediation on the back end. Architectural and design practices should also be examined. Too little design can be as counter-productive as too much. Striking the right balance can yield savings over the life of the application.

Deciding whether or not to take on intentional technical debt is less black and white. Often this type of debt is taken on for rational reasons. An example of this is what Ruth Malan characterizes as “…trading technical learning and code design improvement for market learning (getting features into user hands to improve them)”. Other times, the balance between risk and reward (whether time to market or budget) may tilt in favor of taking on a debt. When this is the case, it is critical that the owner(s) of the system make the decision in possession of the best possible information you can provide. An impulsive decision taken on the basis of “feel” rather than information will likely carry more risk.

Retiring old debt should be the final link in the chain. Just as the taking on of new debt should be done in a rational manner, so should the retirement of old debt. Not all debt carries the same risk/reward ratio and efforts that carry more bang for the buck will be an easier sell. Although some may disagree, I firmly believe that better outcomes will result from making those who own the system active partners in its development and evolution.

It’s highly unlikely that a system will be free of technical debt. Perversely, being free of such debt could actually be a liability. That being said, there is a world of difference between the two poles of debt-free and technical anarchy. Effort spent to rationally manage a system’s debt load will free up time to be put to better use.

Avoiding Platform Rot

(Mirrored from the Iasa Blog)

just never had the time to keep up with the maintenance

Is your OS the latest version? How about your web server? Database server? If not now, when?

A “no” answer to the first three questions is likely not that big a deal. There can be advantages to staying off the bleeding edge. That being said, the last question is the key one. If the answer to that is “I haven’t thought about it”, then there’s potential for problems.

“Technical Debt” is a currently a hot topic. Although the term normally brings to mind hacks and quick fixes, more subtle issues can be technical debt as well. A slide from a recent Michael Feathers presentation (slide 5) is particularly applicable to this:

Technical Debt is: the Effect of
Unavoidable Growth and the Remnants
of Inertia

New features tend to be the priority for systems, particularly early in their lifecycle. The plumbing (that which no one cares about until it quits working), tends to be neglected. Plumbing that is outside the responsibility of the development team (such as operating systems and database management systems) is likely to get the least attention. This can lead to systems running on platforms that are past their end of support date or a scramble to verify that the system can run on a later version. The former carries significant security risks while the latter is hardly conducive to adequately testing that the system will function identically on the updated platform. Additionally, new capabilities as well as operational and performance improvements may be missed out on if no one is paying attention to the platform.

One method to help avoid these types of issues is adoption of a DevOps philosophy, such as Amazon’s:

Amazon applies the motto “You build it, you run it”. This means the team that develops the product is responsible for maintaining it in production for its entire life-cycle. All products are services managed by the teams that built them. The team is dedicated to each product throughout its lifecycle and the organization is built around product management instead of project management.

This blending of responsibilities within a single team and focus on the application as a product (something I consider extremely beneficial) lessens the chance that housekeeping tasks fall between the cracks by removing the cracks. The operations aspect is enhanced by ensuring that its concerns are visible to those developing and the development aspect is enhanced by increased visibility into new capabilities of the platform components. The Solutions Architect role, spanning application, infrastructure, and business, is well placed to lead this effort.

Products, not Projects

I’m sure the person who asked Benjamin Mitchell that question truly believed that “faster, better, cheaper” was valuable. I’d imagine that person values his or her work and considers anything that enhances that work to be beneficial. Unfortunately, that’s not always the case. If I create something that fails to meet your needs, my ability to do so “faster, better, cheaper” is supremely irrelevant. From the customer’s viewpoint, value will come from the capabilities that your efforts enable, not from the work itself. The product is the value, not the project.

It’s no secret that I’m skeptical of emergent architecture. Solving various small problems in isolation is not the same as providing a coherent solution to the overall business issue. In fact, focusing exclusively on lower level details may impede our ability to solve the larger issue:

Intuition does not adapt quickly to new situations and resists changes in scale. A retail cashier who is exceptional at serving customers and processing purchases finds new, perhaps overwhelming, challenges in managing the store. The cashiers natural intuitive ability to diligently manage a single customers purchases is overwhelmed when presented with many customers and many purchases. The culprit is the complexity of detail. Often referred to as an inability to “see the bigger picture”.

Gary W. Kenward , “The Systems Perspective”

Just as a system is more than a collection of components, the product is more than a series of projects. Tom Graves, in the slides for his BCS-EA 2012 presentation noted (slide #8):

Products always imply a service…

  • Whom do you serve, and how?
  • How will you know you’ve served?
  • How will you know you’ve served well?
  • Who decides?

The answer to the first and fourth questions, are obviously “the customer”. Questions two and three, however, can be trickier. Feedback is required to answer those. The advantage of frequent incremental delivery lies in the fact that feedback can be gained, both by the development team and the customer/product owner:

We all know the world is not flat, and Scrum is not a linear process. But some Scrum teams use feedback only to improve the process, and forget to apply it to the product. User stories are consequently turned into software without learning from the stakeholders and adapting the backlog. This wastes a massive opportunity: to ensure that the right product with the right features is created.

Roman Pichler, “The Scrum Cycle”

If process improvement is emphasized to the point of excluding product improvement, the feedback is wasted. Each release should be seen as an opportunity to reduce the divergence between the product delivered and the product desired. The smaller the divergence, the greater the value.

Focusing on the product aspect, which meshes well with DevOps practices, enhances application lifecycle management by tying budgets to business value:

Using a product management approach, the organization now has budgets for product teams instead of new development or maintenance. Budgets are no longer lumped together, but are instead dedicated to specific product lines. Each product team needs to make the case why their product needs funding based on the business value that it generates compared to total life-cycle cost. Using this approach, organization can now make business decisions as to which products to innovate on and which to cut or retire.

Fadi Stephan, “You Build It, You Run It”

This kind of holistic approach encourages customer participation in the process, setting up a virtuous circle of increasing value, increasing ownership and increasing satisfaction. Products with a high level of customer satisfaction (not to mention those that work on the teams developing those products) tend to do better over time than those without.