Architecture Corner: No Money – Lights Out! – Seven Deadly Sins of IT

Episode 5 of this season of Architecture Corner is out (I made a guest appearance in episode 1, “Good at Innovation”). In this installment, Chris the CEO falls victim to yet another temptation.

The CEO has decided that no more time and money will be “wasted” on old systems. Can Joakim Lindbom convince him that sloth breeds technical debt, leading to expensive outages?

Trash or Treasure – What’s Your Legacy?

Pirates burying treasure

The topic of legacy systems is something of a contentious one. In most cases, a legacy is understood to be a good thing. What makes a system “legacy”? Is it a technical or business decision?

A little over a year ago, Greger Wikstrand took a stab at clarifying the term with his post “Legacy systems, a definition”. In the post, he looked at different definitions of what constituted a legacy system, ranging from “any code that is in use” to “outdated technology” to “high technical debt”. The definition he went with, in my opinion, is the most useful:

It should be clear that legacy systems are not about technical considerations. It is about how well the existing system meets and is able to adapt to business needs.

A pair of tweets from Joanna Young that I saw yesterday brought this to mind:

Whether or not a system has crossed the line into legacy territory is not a technical decision but a business one. As Greger and Joanna both noted, it’s about fitness for purpose. Technical considerations absolutely have immense bearing on whether the system is able to meet needs. However, they are not the sole determinant.

The standard narrative is for a system to start out “clean” and then rot via neglect and/or ad hoc enhancement. This is certainly a common scenario, but it overlooks the obvious. While failure to maintain a system and its platform will certainly degrade it, keeping the technology components up to date does not ensure that the system will continue to match the needs of those who depend on it. For that matter, it’s easy enough to build a brand new system using the latest and greatest technology that is legacy right out of the gate due to its failure to meet the needs of its stakeholders.

Age of the platform is not a problem; an inability to get support or find people knowledgeable about the platform is a problem. Technical debt in and of itself is not a problem; being impeded or prevented from maintaining/enhancing the system due to technical debt is a problem. This works for any given technical issue – substitute the tangible, stakeholder-oriented result of that technical issue and the point becomes clearer to those with the ability to address it.

The key is to focus neither solely on functional aspects nor solely on quality of service and/or technical aspects, but on the system as a whole. This requires the participation of the entire set of social systems involved in the creation, maintenance, and usage of the software system. Communication and collaboration across all elements of those social systems are critical to effectively maintaining the software system and the social systems that it enables.

A critically important part of promoting that communication and collaboration is maintaining the cohesion of the social systems involved in creating and maintaining the software system. Where those social systems are ad hoc and episodic, the potential for forming the relationships necessary for effective situational awareness is minimal. IT won’t know about functional gaps until it is too late, and the stakeholders won’t know what their options are for addressing them, nor will they have advance notice of impending technical issues.

Social systems create, maintain, and use software systems. Systems that are designed to work together have a better chance of doing so than those that are just thrown together and wished well.

Capability Now, Capability Later

Mock tank, British Army in Italy, WWII

In my post “Strategic Tunnel Vision”, I touched on the concept of capability. I discussed how focusing on new capabilities can crowd out existing capabilities and the detrimental effects of that when those existing capabilities are still necessary. I also spoke to how choices about strategic capabilities can trickle down to affect tactical capabilities.

What I failed to do, however, was define what was meant by the term “capability”. That’s a pretty big oversight on my part, because, in my opinion, understanding the concept is critical across all levels of architectural concerns.

Tom Graves, in his “Definitions on capability”, defines the term (along with some related concepts):

— Capability: the ability to do something.

— Capability-based planning: planning to do something, based on capabilities that already exist, and/or that will be added to the existing suite of capabilities; also, identifying the capabilities that would be needed to implement and execute a plan.

— Capability increment: an extension to an existing capability; also, a plan to extend a capability.

— Capability map: a visual and/or textual description of (usually) an organisation’s capabilities.

Yes, I do know that those definitions are terribly bland and generic – and they need to be that way. That’s the whole point: they need to be generic enough to be valid and usable at every possible level and in every possible context – otherwise we’ll introduce yet more confusion to something that’s often way too confused already.

That last paragraph is critical. The concept of “capability” is a high-level one that is useful across multiple levels of architectural concern (i.e., application, solution, enterprise IT, and the enterprise itself). Quoting Tom again:

Note what else is intentionally not in that definition of ‘capability‘:

  • there’s no actual doing – it’s just an ability to do something, not the usage of that ability
  • there’s no ‘how’ – we don’t assume anything about how that capability works, or what it’s made up of
  • there’s no ‘why‘ – we don’t assume any particular purpose
  • there’s no ‘who‘ – we don’t assume anything about who’s responsible for this capability, or where it sits in an organisational hierarchy or suchlike

We do need all of those items, of course, as we start to flesh out the details of how the capabilities would be implemented and used in real-world practice. But in the core-definition itself, we very carefully don’t – they must not be included in the definition itself.

The reason why we have to be so careful and pedantic about this is because the relationship between service, capability, function and the rest is inherently recursive and fractal: each of them contains all of the others, which in turn each contain all of the others, and so on almost to infinity. If we don’t use deliberately-generic definitions for all of those items, we get ourselves into a tangle very quickly indeed – as can be seen all too easily in the endless definitional-battles about the relationships between ‘business-function’ versus ‘business-process’ versus ‘business-service’ versus ‘business-capability’ and so on.

In short, it’s a crucial building block in our designs and plans (which is redundant, since design is a form of planning). If we don’t have and can’t get the ability to do something, it’s game over. However, as Tom noted, we need to move beyond the raw ability in order to make effective use of capabilities. We need to think about timing and personnel (which will probably largely drive timing anyway). A capability later may well not be as valuable as the same capability right now.

This was brought to mind while skimming a book review on a military strategy site (emphasis added by me):

In March 2015, then-Chief of Staff of the U.S. Army General Raymond T. Odierno admitted to the British newspaper The Telegraph that the so-called special relationship between the United States and Great Britain isn’t what it used to be. “In the past we would have a British Army division working alongside an American army division,” he said, but he feared that in the future British battalions and brigades would have to operate “inside” American units. “What has changed,” Odierno declared, “is the level of capability.”

Later that week, I asked a senior British general about Odierno’s remarks. He replied, deadpan, that although Odierno’s candor was appreciated, his statement was factually incorrect. “We can still field a division,” the general insisted. “It is just a question of how long it takes us to field one.” Potential tanks, he seemed to think, were just as relevant as actual ones.

The highlighted portion of the quote illustrates my point. Having the capability to do something immediately and the capability to do that same thing at some point in the future are not equivalent (just to be fair to the British Army, the US Army was in this same position during Operation Desert Shield – the initial ground forces that could be deployed were extremely thin). Treating them as equivalent potentially risks disaster.

It should be noted, however, that level of concern will color the perception of the value of a future capability versus a current one. At the tactical level, in business as well as in war, “…first with the most…” is likely a winning move. At the strategic level, however, where resources must be budgeted across multiple initiatives, priorities should dictate which capabilities get preference. Tactical leaders may have to be satisfied with “on time with just enough”.

Regardless of level, a clear assessment of capabilities, what’s available when, is key to making effective decisions.
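One way to make “what’s available when” concrete is to record a lead time alongside each capability and filter by deadline. The sketch below is only an illustration: the `Capability` class, the `available_by` function, and the lead-time figures are all hypothetical and are not taken from the book review or from Tom’s definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capability:
    name: str
    lead_time_days: int  # hypothetical: 0 means available right now


def available_by(capabilities, deadline_days):
    """Return the names of capabilities that can be fielded within the deadline."""
    return [c.name for c in capabilities if c.lead_time_days <= deadline_days]


# Invented figures, purely for illustration.
inventory = [
    Capability("brigade", lead_time_days=0),
    Capability("division", lead_time_days=180),
]

print(available_by(inventory, deadline_days=30))   # ['brigade']
print(available_by(inventory, deadline_days=365))  # ['brigade', 'division']
```

Both capabilities “exist” in the inventory, but only one of them counts for a decision that has to be made this month.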

Building a Legacy

Greek Trireme image from Deutsches Museum, Munich, Germany

Over the last few weeks, I’ve run across a flurry of articles dealing with the issue of legacy systems used by the U.S. government.

An Associated Press story on the findings from the Government Accountability Office (GAO) issued in May reported that roughly three-fourths of the $80 billion IT budget was used to maintain legacy systems, some more than fifty years old and without an end of life date in sight. An article on CIO.com about the same GAO report detailed seven of the oldest systems. Two were over 56 years old, two 53, one 51, one 35, and one 31. Four of the seven have plans to be replaced, but the two oldest have no replacement yet planned.

Cost is not the only issue; reliability is a problem as well. An article on Timeline.com noted:

Then there’s the fact that, up until 2010, the Secret Service’s computer systems were only operational about 60% of the time, thanks to a highly outdated 1980s mainframe. When Senator Joe Lieberman spoke out on the issue back in 2010, he claimed that, in comparison, “industry and government standards are around 98 percent generally.” It’s alright though, protecting the president and vice president is a job that’s really only important about 60 percent of the time, right?

It would be easy to write this off as just another example of public-sector inefficiency, but you can find these same issues in the private sector as well. Inertia can, and does, affect systems belonging to government agencies and businesses alike. Even a perfectly designed and implemented system (we’ve all got those, right?) is subject to platform rot if ignored. Ironically, our organizations seem designed to do just that by being project-centric.

In philosophy, there’s a paradox called the Ship of Theseus that explores the question of identity. The question is this: if we maintain something by replacing its constituent parts, does it remain the same thing? While many hours could be spent debating this, to those whose opinion should matter most, those who use the system, the answer is yes. To them, the identity of the system is bound up in what they do with it, such that it ceases to be the same thing not when we maintain it, but when its function is degraded through neglect.

Common practice, however, separates ownership and interest. Those with the greatest interest in the system typically will not own the budget for work on it. Those owning the budget will typically be biased towards projects which add value, not maintenance work that represents cost.

Speaking of cost, is 75% of the budget an unreasonable amount for maintenance? How well are the systems meeting the needs of their users? Is quality increasing, decreasing, or holding steady? Was more money spent because of deferred maintenance than would have been spent with earlier intervention? How much business risk is involved? Without this context, it’s extremely difficult to say. It’s understandable that someone outside an organization might lack this information, but even within it, would a centralized IT group have access to it all? Is the context as meaningful at a higher, central level as it is “at the pointy end of the spear”?

Maintaining systems bit by bit, replacing them gradually over time, is likely to be more successful and less expensive than letting them rot and then having a big-bang re-write. In my opinion, having an effective architecture for the enterprise’s IT systems is dependent on having an effective architecture for the enterprise itself. If the various systems (social and software) are not operating in conjunction, drift and inertia will take care of building your legacy (system).

[Greek Trireme image from Deutsches Museum, Munich, Germany via Wikimedia Commons]

Can you afford microservices?

Check

Much has been written about the potential benefits of designing applications using microservices. A fair amount has also been written about the potential pitfalls. On this blog, there’s been a combination of both. As I noted in “Are Microservices the Next Big Thing?”: It’s not the technique itself that makes or breaks a design, it’s how applicable the technique is to the problem at hand.

It’s important, however, to understand that “applicable to the problem at hand” isn’t strictly a technical question. The diagram in Philippe Kruchten’s tweet below captures the full picture of a workable solution:

As Kruchten pointed out in his post ‘Three “-tures”: architecture, infrastructure, and team structure’, the architecture of the system, the system’s infrastructure, and the structure of the team developing the system are mutually supporting. These aspects of the architecture of the solution must be kept aligned in order for the solution to work. In my opinion, it should be taken as a given that this architecture of the solution must also align with the architecture of the problem as a minimum condition to be considered fit for purpose.

Martin Fowler alluded to the need to align architecture, infrastructure, and team structure in “MicroservicePrerequisites” when he listed rapid provisioning, basic monitoring, and rapid deployment as pre-conditions for microservices. These capabilities not only represent infrastructure requirements, but also “…imply an important organizational shift – close collaboration between developers and operations: the DevOps culture”. Permanent product teams building and operating applications are, in my opinion, an extremely effective way to deliver IT. It must be realized, however, that effectiveness comes with a price tag, in terms of people, tools, and infrastructure.

In “MicroservicePremium”, Fowler further stated “don’t even consider microservices unless you have a system that’s too complex to manage as a monolith”, identifying “sheer size” as the biggest source of complexity. Size will encompass both technical and organizational concerns:

The microservice approach to division is different, splitting up into services organized around business capability. Such services take a broad-stack implementation of software for that business area, including user-interface, persistant storage, and any external collaborations. Consequently the teams are cross-functional, including the full range of skills required for the development: user-experience, database, and project management.

Expanding on this, the ideal organization will be one cross-functional team per microservice/bounded context. Even with very small teams, this requires either significant expenditure or a compromise of how the architectural and social aspects (i.e. Conway’s Law) work together in this architectural style.

Other requirements inherent in a microservice architecture are things like API governance and infrastructure services to support distributed processing (e.g. a service registry). Data facilities that come almost for free in a monolithic environment, such as transactions, referential integrity, and complex queries, are absent in a distributed environment, and replacements may need to be bought or built to compensate. In a distributed environment, even error logging requires special consideration to avoid drowning in complexity:

The overhead in terms of organization, infrastructure, and tooling, whether in ideal or compromised form, will introduce complexity and cost. I would, in fact, expect compromises to avoid costs to introduce even more complexity. If the profile of the system in terms of business value and necessary complexity (i.e. complexity inherent in the business function) warrants the additional overhead, then that overhead can represent a valid solution to the problem at hand. If, however, the complexity is solely created by the overhead, without an underlying need, the solution becomes suspect. Adding cost and complexity without offsetting benefits will likely lead to problems. Matching the solution to the problem and balancing those costs and benefits requires the attention of an architectural role at the application level, rather than relying on each team to work independently and hope for coherence and economy.

Let’s Talk Value (Who Needs Architects?)

Gold Bars

Value is a term that’s heard often these days, but I wonder how well it’s understood. Too often, it seems, value is taken to mean raw benefit rather than its actual meaning, benefit after cost (i.e. “bang for the buck”). An even better understanding of the concept can be had from Tom Cagley’s “Breaking Down Value”: “Value = (Benefit + Perception) – (Cost + Perception)”.
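To see how the perception terms can swing the result, here is a rough sketch that plugs made-up numbers into Cagley’s formula. The function name `perceived_value`, its parameter names, and every figure are hypothetical. The formula itself doesn’t say how perception should be quantified, so treating it as a signed adjustment in the same units as benefit and cost is an assumption on my part.

```python
def perceived_value(benefit, benefit_perception, cost, cost_perception):
    """Value = (Benefit + Perception) - (Cost + Perception)."""
    return (benefit + benefit_perception) - (cost + cost_perception)


# Hypothetical figures in arbitrary units: same raw benefit and cost,
# different perception of each.
well_received = perceived_value(
    benefit=100, benefit_perception=20, cost=60, cost_perception=0
)
poorly_received = perceived_value(
    benefit=100, benefit_perception=-20, cost=60, cost_perception=30
)

print(well_received)    # 60
print(poorly_received)  # -10
```

With identical raw benefit and cost, the work that is perceived as less beneficial and more costly ends up with negative value in the eyes of whoever is paying for it.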

The point being?

Change involves costs and, one would hope, benefits. Not only does the magnitude of the cost matter, but also its perception matters. Where a cost is seen as unnecessary or incurred to benefit someone other than the one paying the bills, its perception will likely be unfavorable. Changes that come about due to unforeseen circumstances are more likely to be seen as necessary than those stemming from foreseeable ones. Changes to accommodate known needs are the least likely to be seen as reasonable. This is why I’ve always maintained that YAGNI doesn’t scale beyond low-level design into the realm of architectural decisions. Where cost of change determines architectural significance, decision churn is problematic.

After I posted “Who Needs Architects? Who’s Minding the Architecture?”, Charlie Alfred tweeted:

One way to be seen as an asset is to provide value. As Cesare Pautasso put it:

This is not to say that architectural refactoring is without value, but that refactoring will be seen as redundant work. When that work is for foreseeable needs, it will be perceived as costlier and less beneficial than strictly new functionality. Refactoring to accommodate known needs will suffer an even greater perception problem.

YAGNI presumes that the risk of flexible design being unnecessary outweighs the risk of refactoring being unnecessary. In my opinion, that is far too simplistic a view. Some functionality and qualities may be speculative, but the need for others (e.g. security) will be more certain.

Studies have shown that the ability to modify an application is a prime quality concern for stakeholders. Flexible design enables easier and cheaper change. “Big bang” changes (expensive and painful) are more likely where coherent design is lacking. Holistic design based on context seems to provide more value (both tangible and perceived) than a dogma-driven process of stringing together tactical decisions and hoping for the best.

“Laziness as a Virtue in Software Architecture” on Iasa Global

Laziness may be one of the Seven Deadly Sins, but it can be a virtue in software development. As Matt Osbun observed:

Robert Heinlein noted the benefits of laziness:

See the full post on the Iasa Global Site (a re-post, originally published here).