Architecture Corner: No Money – Lights Out! – Seven Deadly Sins of IT

Episode 5 of this season of Architecture Corner is out (I made a guest appearance in episode 1, “Good at Innovation”). In this installment, Chris the CEO falls victim to yet another temptation.

The CEO has decided that no more time and money will be “wasted” on old systems. Can Joakim Lindbom convince him that sloth breeds technical debt, leading to expensive outages?

Advertisement

Stopping Accidental Technical Debt

Buster Keaton looking at a poorly constructed house

In one of my earlier posts about technical debt, I differentiated between intentional debt (that taken on deliberately and purposefully) and accidental debt (that which just accrues over time without rhyme or reason or record). Dealing with (in the sense of evaluating, tracking, and resolving it) technical debt is obviously a consideration for someone in an application architect role. While someone in that role absolutely should be aware of the intentional debt, is there a way to be more attuned to the accidental debt as well?

Last summer, I published a post titled “Distance…is the one true enemy…”. The post started with a group of tweets from Gregory Brown talking about the corrosive effects of distance on software development (distance between compile and run, between failure and correction, between development and feedback, etc.). I then extended the concept to management, talking about how distance between sense-maker and decision-maker could negatively affect the quality of the decisions being made.

There’s also a distance that neither Greg nor I covered at the time, design distance. Design distance is the distance between the design and the outcome. Reducing design distance makes it easier to keep a handle on the accidental debt as well as the intentional.

Distance between the architectural decisions and the implementation can introduce technical debt. This distance can come from remote decision-makers, architecture pigeons who swoop in, deposit their “wisdom”, and then fly away home. It can come from failing to communicate the design considerations effectively across the entire team. It can also come from failing to monitor the system as it evolves. The design and the implementation need to be in alignment. Even more so, the design and the implementation need to align with particular problems to be solved/jobs to be done. Otherwise, the result may look like this:

Distance between development of the system and keeping the system running can introduce technical debt as well. The platform a system runs on is a vital part of the system, as critical as the code it supports. As with the code, the design, implementation, and context all need to be kept in alignment.

Alignment of design, implementation, and context can only be maintained by on-going architectural assessment. Stefan Dreverman’s “Using Philosophy in IT architecture” identified four questions to be asked as part of an assessment:

  1. “What is my purpose?”
  2. “What am I composed of?”
  3. “What’s in my environment?”
  4. “What do I communicate?”

These questions are applicable not only to the beginning of a system, but throughout its life-cycle. Failing to re-evaluate the architecture as a whole as the system evolves can lead to inconsistencies as design distance grows. We can get so busy dealing with the present that we create a future of pain:

At first glance, this approach might seem to be expensive, but rewriting legacy systems is expensive as well (assuming the rewrite would be successful, which is a tenuous assumption). Building applications with a one-and-done mindset is effectively building a legacy system.

Building a Legacy

Greek Trireme image from Deutsches Museum, Munich, Germany

 

Over the last few weeks, I’ve run across a flurry of articles dealing with the issue of legacy systems used by the U.S. government.

An Associated Press story on the findings from the Government Accountability Office (GAO) issued in May reported that roughly three-fourths of the $80 billion IT budget was used to maintain legacy systems, some more than fifty years old and without an end of life date in sight. An article on CIO.com about the same GAO report detailed seven of the oldest systems. Two were over 56 years old, two 53, one 51, one 35, and one 31. Four of the seven have plans to be replaced, but the two oldest have no replacement yet planned.

Cost was not the only issue, reliability is a problem as well. An article on Timeline.com noted:

Then there’s the fact that, up until 2010, the Secret Service’s computer systems were only operational about 60% of the time, thanks to a highly outdated 1980s mainframe. When Senator Joe Lieberman spoke out on the issue back in 2010, he claimed that, in comparison, “industry and government standards are around 98 percent generally.” It’s alright though, protecting the president and vice president is a job that’s really only important about 60 percent of the time, right?

It would be easy to write this off as just another example of public-sector inefficiency, but you can find these same issues in the private sector as well. Inertia can, and does, affect systems belonging to government agencies and business alike. Even a perfectly designed implemented system (we’ve all got those, right?) is subject to platform rot if ignored. Ironically, our organizations seem designed to do just that by being project-centric.

In philosophy, there’s a paradox called the Ship of Theseus, that explores the question of identity. The question arises, if we maintain something by replacing its constituent parts, does it remain the same thing? While many hours could be spent debating this, to those whose opinion should matter most, those who use the system, the answer is yes. To them, the identity of the system is bound up in what they do with it, such that it ceases to be the same thing, not when we maintain it but when its function is degraded through neglect.

Common practice, however, separates ownership and interest. Those with the greatest interest in the system typically will not own the budget for work on it. Those owning the budget, will typically be biased towards projects which add value, not maintenance work that represents cost.

Speaking of cost, is 75% of the budget an unreasonable amount for maintenance? How well are the systems meeting the needs of their users? Is quality increasing, decreasing, or holding steady? Was more money spent because of deferred maintenance than would have been spent with earlier intervention? How much business risk is involved? Without this context, it’s extremely difficult to say. It’s understandable that someone outside an organization might lack this information, but even within it, would a centralized IT group have access to it all? Is the context as meaningful at a higher, central level as it is “at the pointy end of the spear”?

Maintaining systems bit by bit, replacing them gradually over time, is likely to be more successful and less expensive, than letting them rot and then having a big-bang re-write. In my opinion, having an effective architecture for the enterprise’s IT systems is dependent on having an effective architecture for the enterprise itself. If the various systems (social and software) are not operating in conjunction, drift and inertia will take care of building your legacy (system).

[Greek Trireme image from Deutsches Museum, Munich, Germany via Wikimedia Commons]

Dealing with Technical Debt Like We Mean it

What’s the biggest problem with technical debt?

https://twitter.com/jetpack/status/724637235459547136

In my opinion, the biggest problem is that it works. Just like the electrical outlet pictured above, systems with technical debt get the job done, even when there’s a hidden surprise or two waiting to make life interesting for us at some later date. If it flat-out failed, getting it fixed would be far easier. Making the argument to spend time (money) changing something that “works” can be difficult.

Failing to make the argument, however, is not the answer:

Brenda Michelson‘s observation is half the battle. The argument for paying down technical debt needs to be made in business-relevant terms (cost, risk, customer impact, etc.). We need more focus on the “debt” part and remember “technical” is just a qualifier:

The other half of the battle is communicating, in the same business-relevant manner, the costs and/or risks involved when taking on technical debt is considered:

Tracking what technical debt exists and managing the payoff (or write off, removing failed experiments is a reduction technique) is important. Likewise, managing the assumption of technical debt is critical to avoid being swamped by it.

Of course, one could take the approach that the only acceptable level of technical debt is zero. This is equivalent to saying “if we can’t have a perfect product, we won’t have a product”. That might be a difficult position to sell to those writing the checks.

Even if you could get an agreement for that position, reality will conspire to frustrate you. Entropy emerges. Even if the code is perfected and then left unchanged, the system can rot as its platform ages and the needs of the business change. When a system is actively maintained over time without an eye to maintaining a coherent, intentional architecture, then the situation becomes worse. In his post “Enterprise Modernization – The Next Big Thing!”, David Sprott noted:

The problem with modernization is that it is widely perceived as slow, very expensive and high risk because the core business legacy systems are hugely complex as a result of decades of tactical change projects that inevitably compromise any original architecture. But modernization activity must not be limited to the old, core systems; I observe all enterprises old and new, traditional and internet based delivering what I call “instant legacy” [Note 1] generally as outcomes of Agile projects that prioritize speed of delivery over compliance with a well-defined reference architecture that enables ongoing agility and continuous modernization.

Kellan Elliot-McCrea, in “Towards an understanding of technical debt”, captured the problem:

All code is technical debt. All code is, to varying degrees, an incorrect bet on what the future will look like.

This means that assessing and managing technical debt should be an ongoing activity with a responsible owner rather than a one-off event that “somebody” will take care of. The alternative is a bit like using a credit card at every opportunity and ignoring the statements until the repo-man is at the door.

What We Have Here is a Failure to Communicate

https://twitter.com/jetpack/status/467353553552416769

“What we’ve got here is failure to communicate” and it appears to be epidemic. My own personal grand unified theory of everything is that most problems stem from or are aggravated by a lack of communication. Whether the topic is process, governance, planning, estimates, or design, chances are it’s easier to find opinions (and worse, policy and practices) based on one-sided viewpoints than a balanced understanding of the contexts involved. This is dangerous due to the simple fact that organizations are social systems (frequently fractal systems of systems) and as Ruth Malan has noted:

Russell Ackoff urged that to design a system, it must be seen in the context of the larger system of which it is part. Any system functions in a larger system (various larger systems, for that matter), and the boundaries of the system — its interaction surfaces and the capabilities it offers — are design negotiations. That is, they entail making decisions with trade-off spaces, with implications and consequences for the system and its containing system of systems. We must architect across the boundaries, not just up to the boundaries. If we don’t, we naively accept some conception of the system boundary and the consequent constraints both on the system and on its containing systems (of systems) will become clear later. But by then much of the cast will have set. Relationships and expectations, dependencies and interdependencies will create inertia. Costs of change will be higher, perhaps too high.

In other words, systems exist within an ecosystem, not a vacuum. Failure to take context into account harms systems (whether software or social) by baking in harmful structures and behaviors and we cannot take into account contexts that are not communicated and appreciated. This, by the way, is why you find posts about management and process on a site with the tagline “All Things Architectural”.

This is why I believe that successfully managing technical debt can’t happen without successfully communicating to the customer when it’s being taken on, what the costs involved are (or may be) and how it’s affecting the evolution of the product.

This is why I believe the answer to problems related to estimation lies in communication and collaboration, rather than #NoEstimates on the one hand or rigid authoritarianism on the other. This, in my opinion, holds true for all the social system issues (management, process, governance, planning, quality, and architectural design) that affect software development. Without understanding (which does not happen without communication) the goals behind the practices and what results are being achieved, it’s unlikely that the system will work to the satisfaction of anyone.

Local “optimizations” won’t fix systemic problems. We need to bridge the gaps. Can we talk?

Updated 4/8/2016 to fix a broken link.

Technical Debt and Rolling Re-writes (Who Needs Architects?)

If you think building a system is challenging, try maintaining one.

Tom Cagley‘s recent post “Plan to Throw One Away Re-Read Saturday: The Mythical Man-Month, Part 11”, was a good reminder that while “technical debt” may be something currently on the radar for many, it’s far from a new phenomenon. The concept of instant legacy applications was in place when forty years ago when Frederick Brooks wrote his masterpiece, even if they weren’t called that. As Tom observed in the post:

Rarely is the first attempt useful to the end consumer, and the usefulness of that first attempt is less in the code than in the feedback it generates. Software development is no different. The initial conceptual design and anticipated technical architecture of a large project rarely stands up to the rigors of the discovery process, and those designs should be learned from and then thrown away.

The faulty assumptions and design flaws accumulate not only from sprint to sprint leading up to the initial release, but also from release to release. In spite of the fact that a product can be so seriously flawed, throwing it away and starting over is easier said than done. While sunk costs cannot be recovered, too sanguine an attitude towards them may not enhance your credibility with the customer. Having to pay for the same thing over and over can make them grumpy.

This sets up a dilemma, one that frequently leads to living with technical debt and attempting to incrementally patch it up. There are limits, however, to the number of band-aids that can be applied. This might make it tempting to propose a rewrite, but as Erik Dietrich stated in “The Myth of the Software Rewrite”:

Sure, they know things now that they didn’t know when they started on this code 3 years ago. But won’t the same thing be true in 3 years? Won’t the developers then be looking at the code and saying, “this is a mess — if only we knew in 2015 what we now know in 2018!” And, beyond that, what makes you think that giving the same group of people the same marching orders won’t result in the same kind of code?

The “big rewrite from scratch because this is a mess” is a losing strategy.

Fortunately, there is an alternative. Quoting Tom Cagley again from the same post as above:

If change is both inevitable and good (within limits), then both systems and organizations (a type of system) need to be engineered to support and facilitate change. Architecturally, techniques such as modularization, object-oriented design and other processes that foster simplification and incremental change create an environment in which change isn’t avoided, but rather encouraged.

While we may laugh at the image of changing a tire while the vehicle is in motion, it is an accurate metaphor. Customers expect flexibility and change on the go; waiting equals lost business. The keys to evolving in place are having an intentionally designed, modular architecture and an understanding of where the weaknesses lie. Both of these are concerns that reside squarely on the architect’s plate.

Modularity not only makes an application more easily maintainable via separation of concerns, but it also embraces change by making components replaceable. This is one of the qualities that has made microservices such a hot topic, although it would be a mistake to think that microservices are the only way (or best way in all cases) to achieve modularity.

Modularity brings benefits beyond the purely technical as well. Rewrites of a fraction of an application are more easily sold than big-bang efforts. Demonstrating forethought (while you can’t predict what the change will be, predicting the need for change is more of a sure thing) demonstrates concern for the customer’s welfare, which should make for a better relationship.

Being able to throw a system away a little at a time allows us to keep the car on the road while it changes and adapts to changing conditions.

“Laziness as a Virtue in Software Architecture” on Iasa Global

Laziness may be one of the Seven Deadly Sins, but it can be a virtue in software development. As Matt Osbun observed:

https://twitter.com/jetpack/status/563712928097251328

Robert Heinlein noted the benefits of laziness:

https://twitter.com/jetpack/status/562440343334170624

See the full post on the Iasa Global Site (a re-post, originally published here).

“Microservice Mistakes – Complexity as a Service” on Iasa Global

I’m pleased to announce that I’ve been asked to continue as a contributor to the Iasa Global site. I’m planning to post original content there on at least a monthly basis. In the interim, please enjoy a re-post of “Microservice Mistakes – Complexity as a Service”

Laziness as a Virtue in Software Architecture

Laziness may be one of the Seven Deadly Sins, but it can be a virtue in software development. As Matt Osbun observed:

https://twitter.com/jetpack/status/563712928097251328

Robert Heinlein noted the benefits of laziness:

https://twitter.com/jetpack/status/562440343334170624

Even in the military, laziness carries potential greatness (emphasis mine):

I divide my officers into four groups. There are clever, diligent, stupid, and lazy officers. Usually two characteristics are combined. Some are clever and diligent — their place is the General Staff. The next lot are stupid and lazy — they make up 90 percent of every army and are suited to routine duties. Anyone who is both clever and lazy is qualified for the highest leadership duties, because he possesses the intellectual clarity and the composure necessary for difficult decisions. One must beware of anyone who is stupid and diligent — he must not be entrusted with any responsibility because he will always cause only mischief.

Generaloberst Kurt von Hammerstein-Equord

The lazy architect will ignore slogans like YAGNI and the Rule of Three when experience and/or information tells them that it’s far more likely that the need will arise than not. As Matt stated in “Foreseeable and Imaginary Design”, architects must ask “What changes are possible and which of those changes are foreseeable”. The slogans point out that engineering costs but the reality is that so does re-work resulting from decisions deferred. Avoiding that extra work (laziness) avoids the cost associated with it (frugality).

Likewise, lazy architects are less likely cave when confronted with the sayings of some notable person. Rather than resign themselves to the extra work, they’re more likely to examine the statement as Kevlin Henney did:

It’s far cheaper to design and build a system according to its context than re-build a system to fix issues that were foreseeable. The re-work born of being too focused on the tactical to the detriment of the strategic is as much a form of technical debt as cutting corners. The lazy architect knows that time spent identifying and reconciling contexts can allow them to avoid the extra work caused by blind incremental design.

Microservice Mistakes – Complexity as a Service

Michael Feathers’ tweet about technical empathy packs a lot of wisdom into 140 characters. Lack of technical empathy can lead to a system that is harder to both implement and maintain since no thought was given to simplifying things for the caller. Maintainability is one of those quality of service requirements that appears to be a purely technical consideration right up to the point that it begins significantly affecting development time. If this sounds suspiciously like technical debt to you, then move to the head of the class.

The issue of technical empathy is particularly on point given the popular interest in microservice architectures (MSAs). The granular nature of the MSA style brings many benefits, but also comes with the cost of increased complexity (among others). Michael Feathers referred to this in a post from last summer titled “Microservices Until Macro Complexity”:

It is going to be interesting to see how this approach scales. Some organizations have a relatively low number of microservices. Others are pushing higher, around the 600 mark. This is a bit beyond the point where people start seeking a bigger picture. If services are often bigger than classes in OO, then the next thing to look for is a level above microservices that provides a more abstract view of an architecture. People are struggling with that right now and it was foreseeable. Along with that concern, we have the general issue of dealing with asynchrony and communication patterns between services. I strongly believe that there is a law of conservation of complexity in software. When we break up big things into small pieces we invariably push the complexity to their interaction.

The last sentence bears repeating: “When we break up big things into small pieces we invariably push the complexity to their interaction”. Breaking a monolith into microservices simplifies each individual component, however, as Eran Hammer observed in “The Fallacy of Tiny Modules”, “…at some point, someone has to put it all together and then, all the complexity is going to surface…”. As Yamen Sader illustrated in his slide deck “Microservices 101 – The Big Why” (slide #26), the structure of the organization, domain, and system will diverge in an MSA. The implication of this is that for a given domain task, we need to know more about the details of that task (i.e. have less encapsulation of the internals) in a microservice environment.

The further removed the consumer of these services are from the providers, the harder and less convenient it will be to transfer that knowledge. To put this in perspective, consider two fast food restaurants: one operates in a traditional manner where money is exchanged for burgers, the other is a microservice style operation where you must obtain the beef, lettuce, pickles, onion, cheese, and buns separately, after which the cooking service combines them (provided the money is there also). The second operation will likely be in business for a much shorter period of time in spite of the incredible flexibility it offers. Additionally, it that flexibility only truly exists when the contracts between two or more providers are swappable. As Ben Morris noted in “Do Microservices create extra challenges for distributed development?”:

Each team will develop its own interpretation of the system and view of the data model. Any service collaboration will have to involve some element of translation or mapping and will be vulnerable to subtle bugs that can arise from semantic differences.

Adopting a canonical model across the platform can help to address this problem by asserting a common vocabulary. Each service is expected to exchange information based on a shared definition of the main entities. However, this requires a level of cross-team governance which is very much at odds with the decentralised nature of microservices.

None of this is meant to say that the microservice architecture style is “bad” or “wrong”. It is my opinion, however, that MSA is first and foremost an architectural style of building applications rather than systems of systems. This isn’t an absolute; I could see multiple MSA applications within an organization sharing component microservices. Where those microservices transcend the organizational boundary, however, making use of them becomes more complex. Actions at a higher level of granularity, such as placing an order for some product or service, involves coordinating multiple services in the MSA realm rather than consuming one “chunkier” service. While the principles behind microservice architectures are relevant at higher levels of abstraction, a mismatch in granularity between the concept and implementation can be very troublesome. Maintainability issues are both technical debt and a source of customer dissatisfaction. External customers are far easier to lose than internal ones.