Building a Legacy

Over the last few weeks, I’ve run across a flurry of articles dealing with the issue of legacy systems used by the U.S. government.

An Associated Press story on the findings from the Government Accountability Office (GAO) issued in May reported that roughly three-fourths of the $80 billion IT budget was used to maintain legacy systems, some more than fifty years old and without an end of life date in sight. An article on CIO.com about the same GAO report detailed seven of the oldest systems. Two were over 56 years old, two 53, one 51, one 35, and one 31. Four of the seven have plans to be replaced, but the two oldest have no replacement yet planned.

Cost was not the only issue; reliability is a problem as well. An article on Timeline.com noted:

Then there’s the fact that, up until 2010, the Secret Service’s computer systems were only operational about 60% of the time, thanks to a highly outdated 1980s mainframe. When Senator Joe Lieberman spoke out on the issue back in 2010, he claimed that, in comparison, “industry and government standards are around 98 percent generally.” It’s alright though, protecting the president and vice president is a job that’s really only important about 60 percent of the time, right?

It would be easy to write this off as just another example of public-sector inefficiency, but you can find these same issues in the private sector as well. Inertia can, and does, affect systems belonging to government agencies and businesses alike. Even a perfectly designed and implemented system (we’ve all got those, right?) is subject to platform rot if ignored. Ironically, our organizations seem designed to do just that by being project-centric.

In philosophy, there’s a paradox called the Ship of Theseus that explores the question of identity. The question arises: if we maintain something by replacing its constituent parts, does it remain the same thing? While many hours could be spent debating this, to those whose opinion should matter most, those who use the system, the answer is yes. To them, the identity of the system is bound up in what they do with it, such that it ceases to be the same thing, not when we maintain it but when its function is degraded through neglect.

Common practice, however, separates ownership and interest. Those with the greatest interest in the system typically will not own the budget for work on it. Those owning the budget will typically be biased towards projects which add value, not maintenance work that represents cost.

Speaking of cost, is 75% of the budget an unreasonable amount for maintenance? How well are the systems meeting the needs of their users? Is quality increasing, decreasing, or holding steady? Was more money spent because of deferred maintenance than would have been spent with earlier intervention? How much business risk is involved? Without this context, it’s extremely difficult to say. It’s understandable that someone outside an organization might lack this information, but even within it, would a centralized IT group have access to it all? Is the context as meaningful at a higher, central level as it is “at the pointy end of the spear”?

Maintaining systems bit by bit, replacing them gradually over time, is likely to be more successful and less expensive than letting them rot and then having a big-bang re-write. In my opinion, having an effective architecture for the enterprise’s IT systems is dependent on having an effective architecture for the enterprise itself. If the various systems (social and software) are not operating in conjunction, drift and inertia will take care of building your legacy (system).

[Greek Trireme image from Deutsches Museum, Munich, Germany via Wikimedia Commons]

What’s Innovation Worth?

Animated GIF of Sherman Tank Variants

What does an old World War II tank have to do with innovation?

I’ve mentioned it before, but it bears repeating – one of the benefits of having a blog is the ability to interact with and learn from people all over the world. For example, Greger Wikstrand and I have been trading blog posts on innovation for six months now. His latest post, “Switcher’s curse and legacy decisions”, is the 18th installment in the series. In this post, Greger discusses switcher’s curse, “a trap in which a decision maker systematically switches too often”.

Just as the sunk cost fallacy can keep you holding on to a legacy system long past its expiration date, switcher’s curse can cause you to waste money on too-frequent changes. As Greger points out in his post, the net benefit of the new system must outweigh the net benefit of the old plus the cost of switching (with a significant safety margin to account for estimation errors in assessing the costs and benefits). Newer isn’t automatically better.
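The comparison Greger describes can be written down as a simple rule of thumb. The following is only a rough sketch: the function name, the idea of expressing the safety margin as a fraction of the hurdle, and the 25% default are all my own assumptions for illustration, and the inputs are assumed to be non-negative estimates in the same units.

```python
def should_switch(new_net_benefit, old_net_benefit, switching_cost, safety_margin=0.25):
    """Return True only if the new system clearly beats staying put.

    The hurdle is the old system's net benefit plus the cost of switching,
    padded by a fractional safety margin (a hypothetical 25% by default)
    to allow for errors in the estimates.
    """
    if safety_margin < 0:
        raise ValueError("safety_margin must not be negative")
    hurdle = (old_net_benefit + switching_cost) * (1 + safety_margin)
    return new_net_benefit > hurdle


# Illustrative numbers only: the old system is worth 100, switching costs 20.
print(should_switch(130, 100, 20, safety_margin=0.0))  # no margin: 130 > 120
print(should_switch(130, 100, 20))                     # with margin: 130 vs. 150
```

With no margin the switch looks worthwhile; with the margin applied, the same estimates no longer clear the bar, which is the point of building in the cushion.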

“Disruption” is a two-edged sword when it comes to innovation. As Greger notes regarding legacy systems:

Existing software is much more than a series of decisions to keep it. It embodies a huge number of decisions on how the business of the company should work. The software is full of decisions about business objects and what should be done with them. These decisions, embodied in the software, forms the operating system of the company. The decision to switch is bigger than replacing some immaterial asset with another. It is a decision about replacing a proven way of working with a new way of working.

Disruption involves risk. Change involves cost; disruptive change involves higher costs. In “Innovate or Execute?”, Earl Beede asked:

So, do our employers really want us taking the processes they have paid dearly to implement and products they have scheduled out for the next 15 quarters and, individually, do something disruptive? Every team member taking a risk to see what they can learn and then build on?

Wouldn’t that be chaos?

Beede’s answer to the dilemma:

Now, please don’t think I am completely cynical. I do think that the board of directors and maybe even the C-level officers want to have innovative companies. I really believe that there needs to be parts of a company whose primary mission is to make the rest of the company obsolete. But those disruptive parts need to be small, isolated groups, kept out of the day-to-day delivery of the existing products or services.

What employers should be asking for is for most of the company to be focused on executing the existing plans and for some of the company to be trying to put the executing majority into a whole new space.

This meshes well with Greger’s recommendations:

Conservatism is often the best approach. But it needs to be a prudent conservatism. Making changes smaller and more easily reversible decreases the need for caution. We should consider a prudent application of fail fast mentality in our decision-making process. (But I prefer to call it learn fast.)

Informed decision-making (i.e. making decisions that make sense in light of your context) is critical. The alternative is to rely on blind luck. Being informed requires learning, and as Greger noted, fast turn-around on that learning is to be preferred. Likewise, limiting risk during learning is to be preferred as well. Casimir Artmann, in his post “Fail is not an option”, discussed this concept in relation to hiking in the wilderness. Assessing and controlling risk in that environment can be a matter of life and death. In a business context, it’s the same (even if the “death” is figurative, it’s not much comfort considering the lives impacted). Learning is only useful if you survive to put it to use.

Lastly, it must be understood that decision-making is not a one-time activity. Context is not static; neither should your decision-making process be. An iterative cycle of sense-making and decision-making is required to maintain the balance between innovation churn and stagnation.

So, why the tank?

The M-4 Sherman, in addition to being the workhorse of the U.S. Army’s armor forces in World War II, is also an excellent illustration of avoiding the switcher’s curse. When it was introduced, it was a match for existing German armored vehicles. Shortly afterwards, however, it was outclassed as newer, heavier, better armed German models came online. The U.S. stuck with their existing design, and were able to produce almost three times the number of tanks as Germany (not counting German tanks inferior to the Sherman). As the saying goes, “quantity has a quality all its own”, particularly when paired with other weapon systems in a way that did not disrupt production. The German strategy of producing multiple models hampered their ability to produce in quantity, negating their qualitative advantage. In this instance, progressive enhancement and innovating on the edges was a winning strategy for the U.S.

The Seductive Myth of Greenfield Development

Greger Wikstrand’s tweet from earlier this week packed a wealth of inspiration into one image:

The second statement particularly resonated with me: “The present is built on the past.”

How often do we, or those around us, long for a chance to do things “from scratch”? The idea being, without the constraints of “legacy” code, we could do things “right”. While it’s a nice idea, it has no basis in reality.

Rewrites, of course, will involve dealing with existing data. I’ve yet to encounter a system where no one was interested in the data when it was replaced. I’ve shut down a few where there was no interest, but that’s a different story. The need for that existing data will serve as a potent influence on what can or cannot be done with the replacement system. Likewise, its structure. It’s not reasonable to assume that the data will be any less “legacy” than the code.

We might be tempted to believe that brand new systems escape this pitfall. In doing so, we fail to consider that new systems still must deal with the wants, needs, and attitudes of their stakeholders. People, processes, and organization form the ecosystem that new systems must fit into as surely as replacement systems must.

A crucial part of problem solving is having an adequate understanding of the problem. Everything has a backstory. Understanding the backstory is dependent on understanding the ecosystem the thing fits into. This is what Sullivan was talking about when he said “…form ever follows function”.

Nothing’s Ex Nihilo.

Technical Debt and Rolling Re-writes (Who Needs Architects?)

If you think building a system is challenging, try maintaining one.

Tom Cagley’s recent post “Plan to Throw One Away Re-Read Saturday: The Mythical Man-Month, Part 11” was a good reminder that while “technical debt” may be something currently on the radar for many, it’s far from a new phenomenon. The concept of instant legacy applications was in place forty years ago when Frederick Brooks wrote his masterpiece, even if they weren’t called that. As Tom observed in the post:

Rarely is the first attempt useful to the end consumer, and the usefulness of that first attempt is less in the code than in the feedback it generates. Software development is no different. The initial conceptual design and anticipated technical architecture of a large project rarely stands up to the rigors of the discovery process, and those designs should be learned from and then thrown away.

The faulty assumptions and design flaws accumulate not only from sprint to sprint leading up to the initial release, but also from release to release. In spite of the fact that a product can be so seriously flawed, throwing it away and starting over is easier said than done. While sunk costs cannot be recovered, too sanguine an attitude towards them may not enhance your credibility with the customer. Having to pay for the same thing over and over can make them grumpy.

This sets up a dilemma, one that frequently leads to living with technical debt and attempting to incrementally patch it up. There are limits, however, to the number of band-aids that can be applied. This might make it tempting to propose a rewrite, but as Erik Dietrich stated in “The Myth of the Software Rewrite”:

Sure, they know things now that they didn’t know when they started on this code 3 years ago. But won’t the same thing be true in 3 years? Won’t the developers then be looking at the code and saying, “this is a mess — if only we knew in 2015 what we now know in 2018!” And, beyond that, what makes you think that giving the same group of people the same marching orders won’t result in the same kind of code?

The “big rewrite from scratch because this is a mess” is a losing strategy.

Fortunately, there is an alternative. Quoting Tom Cagley again from the same post as above:

If change is both inevitable and good (within limits), then both systems and organizations (a type of system) need to be engineered to support and facilitate change. Architecturally, techniques such as modularization, object-oriented design and other processes that foster simplification and incremental change create an environment in which change isn’t avoided, but rather encouraged.

While we may laugh at the image of changing a tire while the vehicle is in motion, it is an accurate metaphor. Customers expect flexibility and change on the go; waiting equals lost business. The keys to evolving in place are having an intentionally designed, modular architecture and an understanding of where the weaknesses lie. Both of these are concerns that reside squarely on the architect’s plate.

Modularity not only makes an application more easily maintainable via separation of concerns, but it also embraces change by making components replaceable. This is one of the qualities that has made microservices such a hot topic, although it would be a mistake to think that microservices are the only way (or best way in all cases) to achieve modularity.
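To make “replaceable” a little more concrete, here is a minimal sketch of a component hidden behind a narrow interface inside a single application (no microservices required). Every name in it (`TaxCalculator`, `Checkout`, the rates) is hypothetical and chosen purely for illustration.

```python
from typing import Protocol


class TaxCalculator(Protocol):
    """The seam: callers depend on this interface, not on an implementation."""

    def tax_for(self, amount_cents: int) -> int: ...


class LegacyTaxCalculator:
    """Stand-in for the old module: a flat 7% rate (hypothetical)."""

    def tax_for(self, amount_cents: int) -> int:
        return amount_cents * 7 // 100


class ReplacementTaxCalculator:
    """Stand-in for the rewritten module: rate supplied in basis points."""

    def __init__(self, rate_basis_points: int) -> None:
        self._rate = rate_basis_points

    def tax_for(self, amount_cents: int) -> int:
        return amount_cents * self._rate // 10_000


class Checkout:
    """Knows only the interface, so either calculator can be plugged in."""

    def __init__(self, tax: TaxCalculator) -> None:
        self._tax = tax

    def total(self, amount_cents: int) -> int:
        return amount_cents + self._tax.tax_for(amount_cents)


old_total = Checkout(LegacyTaxCalculator()).total(1000)
new_total = Checkout(ReplacementTaxCalculator(725)).total(1000)
```

Swapping the calculator touches one constructor call; `Checkout` itself is unchanged. That is the small-scale version of throwing a system away a piece at a time.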

Modularity brings benefits beyond the purely technical as well. Rewrites of a fraction of an application are more easily sold than big-bang efforts. Demonstrating forethought (while you can’t predict what the change will be, predicting the need for change is more of a sure thing) demonstrates concern for the customer’s welfare, which should make for a better relationship.

Being able to throw a system away a little at a time allows us to keep the car on the road while it changes and adapts to changing conditions.

Microservice Principles, Technical Debt, and Legacy Systems

Is there a circumstance where the answer to Architect Clippy’s question is “yes”? In “Microservice Architectures aren’t for Everyone” I used this tweet to underscore the observation that a team that can’t produce a well-modularized monolith is unlikely to be helped by trying to distribute the problem. On the other hand, a team (or teams) tasked with rehabilitating a “Big Ball of Mud” might well find some value in the principles behind microservice architectures.

Some of the relevant principles are cohesion and replaceability. As Dan North noted in “Microservices: software that fits in your head”:

One way to manage the mess is to maximise the likelihood that everyone knows what’s going on in the codebase. This requires two things: consistency and replaceability. Consistency implies you can make reasonable assumptions about unfamiliar parts of the application. Replaceability means you can kill code easily and replace it with something better.

Without achieving separation of concerns, any architectural refactoring effort will be an exercise in chasing fires across the codebase. A divide and conquer strategy that applies the single responsibility principle at a macro level will be more likely to facilitate identification and remediation of lower-level technical debt. Monoliths can benefit from being carved up, not because small is inherently better, but because they reach a point where independence of their components becomes beneficial, even crucial. Components that share fewer dependencies (such as a shared data store) and have independent release cycles offer a great deal of flexibility in structuring an application and the team(s) that develop it.

In “Microservices allow for localized tech debt”, Jim Plush stated: “It’s much easier mentally to tackle $10,000 of debt across 4 credit cards at $2500 each than 1 card at the full $10,000.” Even more to the point, it’s much easier to tackle that debt when you split it with three other people (teams) each working independently.

Re-writes have a well-deserved bad reputation. Shared platforms and shared data stores will often mean that the transition from the legacy system to the re-written one will be a high-risk “big bang” affair. As Edmond Lau observed in “How to Avoid One of the Costliest Mistakes in Software Engineering”, you want to “…get as quickly as possible to a state where you’re again making incremental improvements”. Getting to this state may well happen more quickly when the parts are separated.

Making and Taming Monoliths

Rock me, Amadeus

In Meditation XVII of Devotions upon Emergent Occasions, John Donne wrote the immortal line “No man is an island, entire of itself; every man is a piece of the continent, a part of the main.” If we skip ahead nearly 400 years and change our focus from mortality to information systems (lots of parallels there), we might re-word it like this: No system is an island, entire of itself; every system is a piece of the enterprise, a part of the main.

In “Carving it up – Microservices, Monoliths, & Conway’s Law”, I discussed how the new/old idea of microservice architectures related to the more monolithic classic style of application architecture. An important take-away was the applicability of Conway’s Law to application and solution architectures. Monoliths are monolithic for a reason; a great many business activities are dependent on other business activities performed by actors from other business units. In Ruth Malan’s words: “The organizational divides are going to drive the true seams in the system.” Rather than a miscellaneous collection of business capabilities, monoliths typically represent related capabilities stitched together in one application.

The problem with monoliths is that there is a tendency for capabilities/data to be duplicated across multiple systems. The dominant concern of one system (e.g. products in an inventory system) will typically be a subordinate concern in another (e.g. products in an order fulfillment system). This leads to different units using different systems based on their needs vis-a-vis a particular concern. Without adequate governance, this leads to potential conflicts over which system is authoritative regarding data about those concerns.

Consider the following diagram of a hypothetical family of systems consisting of an order management system and a system to track the outsourced activities of the subset of orders that are fulfilled by vendors. Both use miscellaneous data (such as the list of US states and counties) that may or may not be managed by a separate system. They also use and maintain duplicate data from other systems. The order system does this with data from the pricing, customer management, and product management systems. It also exports data for manual import into the accounts receivable system. The outsourced fulfillment system does this with data from the product management and vendor management systems as well as the order system. It exports data for manual import into the accounts payable system. Because of the lack of integration, there is redundant data with no clear ability to declare any one set as the master one. Additionally, there is redundant code as the functions of maintaining this data are repeated (and likely repeated in an inconsistent manner) across multiple systems.

Monolithic Systems

There are different responses to the issue of uncontrolled data duplication, ranging from “hope for the best” (not recommended) to various database replication schemes. Jeppe Cramon’s “Microservices: It’s not (only) the size that matters, it’s (also) how you use them – part 4” outlines the advantages of publishing data as events from authoritative source systems for use by consumer systems. This provides near real-time updates and avoids the pitfall of trying to treat services like a database (hint, latency will kill you). Caching this data locally should not only improve performance but also allows for the consumer applications to manage attributes of those concerns that are unique to them with referential integrity (if desired).
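As a rough illustration of the publish-and-cache idea, the sketch below has an authoritative product system publish change events and an order system keep a local copy, plus a note attribute that only the order system cares about. It is a toy under stated assumptions: the `EventBus` here is a synchronous, in-process stand-in for whatever real distribution mechanism would be used, and all class, field, and method names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ProductChanged:
    """Event published by the authoritative product system."""
    product_id: str
    name: str


class EventBus:
    """In-process stand-in for a real event distribution mechanism."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._subscribers[event_type].append(handler)

    def publish(self, event):
        for handler in self._subscribers[type(event)]:
            handler(event)


class ProductSystem:
    """Authoritative source: the only place product names are edited."""

    def __init__(self, bus):
        self._bus = bus
        self._products = {}

    def rename(self, product_id, name):
        self._products[product_id] = name
        self._bus.publish(ProductChanged(product_id, name))


class OrderSystem:
    """Consumer: caches product data locally and adds its own attributes."""

    def __init__(self, bus):
        self.product_cache = {}
        self.fulfillment_notes = {}  # data unique to this consumer
        bus.subscribe(ProductChanged, self._on_product_changed)

    def _on_product_changed(self, event):
        self.product_cache[event.product_id] = event.name

    def add_note(self, product_id, note):
        # Local referential integrity: only annotate products we know about.
        if product_id not in self.product_cache:
            raise KeyError(f"unknown product: {product_id}")
        self.fulfillment_notes[product_id] = note


bus = EventBus()
products = ProductSystem(bus)
orders = OrderSystem(bus)
products.rename("p1", "Widget")
orders.add_note("p1", "ships from vendor")
```

The order system never calls the product system to read a name; it reads its own cache, and it has no code for editing product names at all.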

In the diagram below, the architecture of this family of systems has been rationalized by the introduction of different types of services appropriate to the different use cases. The order system now makes use of the pricing system solely as a service (with its very narrow focus, the pricing system is definitely amenable to the microservice architectural style) and no longer contains data or code related to that concern. It contains only enough code to trigger asynchronous messages to the accounts receivable system. Likewise, the outsourced fulfillment system contains only a thin shell for the accounts payable system. Other concerns (customers, vendors, products, etc.) are now managed by their respective owning systems and updates are published asynchronously as events. The mechanism for distributing these events (the cloud element in the diagram) is purposely not named as there are multiple ways to accomplish this (ESBs are one mechanism, but far from the only one). While the consuming systems cache the data locally, much of the code that previously was required to maintain them is no longer needed. In many cases, entire swathes of “administration” UI can be done away with. Where necessary, it can be pared down to only edit data related to those concerns that is both unique to and restricted to the consuming application.

Monolithic Systems

I will take the time to re-emphasize the point that not all communication need be, nor should it be, the same. Synchronous communication (shown as a solid line in the diagram) may be appropriate for some usages (particularly where an end user is waiting on an immediate success/fail type of response). An event-driven style (shown as a dotted line in the diagram) will be much more performant and scalable for many more situations. As I noted in “Coordinating Microservices – Playing Well with Others”, designing and coding distributed applications in the same manner as a monolith will result in a web of service dependencies, a distributed big-ball of mud. Coupling that is a code smell in a monolith can become a crippling issue in distributed systems.
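The difference between the two styles can be sketched in a few lines. In this hypothetical example, placing an order needs a price immediately (a synchronous call), while notifying accounts receivable is simply queued for later processing. The price table, function names, and the use of an in-memory `queue.Queue` in place of a real message channel are all assumptions made for illustration.

```python
import queue

# Hypothetical price table standing in for the pricing service.
PRICES_CENTS = {"widget": 250}


def get_price(sku):
    """Synchronous call: the caller waits because it needs the answer now."""
    return PRICES_CENTS[sku]


def place_order(sku, quantity, receivable_queue):
    """Price the order synchronously, then notify receivables asynchronously."""
    total = get_price(sku) * quantity
    # Fire-and-forget: enqueue the message and return without waiting
    # for accounts receivable to process it.
    receivable_queue.put({"sku": sku, "amount_cents": total})
    return total


def drain_receivables(receivable_queue, ledger):
    """Accounts receivable processes queued messages on its own schedule."""
    while True:
        try:
            message = receivable_queue.get_nowait()
        except queue.Empty:
            return
        ledger.append(message["amount_cents"])


receivables = queue.Queue()
order_total = place_order("widget", 3, receivables)
```

The user gets the order total back as soon as pricing answers; whether accounts receivable picks up its message a millisecond or a minute later does not hold up the order.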

Disassembling monoliths in this manner is complex. It can be done incrementally, removing the need for expensive “big bang” initiatives. There is, however, a need for strategy and governance at the right level of detail (micromanagement and too-granular design are still unlikely to yield the best results). Done well, you can wind up with both better data integrity and less code to maintain.

“Legacy Systems – Constraints, Conflicts, and Customers” on Iasa Global Blog

Cradle to the grave (in 6.2 seconds)

As I was reading Roger Sessions’ latest white paper, “The Thirteen Laws of Highly Complex IT Systems”, Laws 1 and 2 immediately caught my eye:

Law 1. There are three categories of complexity: business, architectural and implementation.

Law 2. The three categories of complexity are largely independent of each other.

That complexity in these categories can vary independently (e.g. complex business processes can be designed and implemented simply just as simple processes can be designed and implemented in an extremely complex manner) is important to the understanding of complexity in IT.

See the full post on the Iasa Global Blog (a re-post, originally published here).