Capability Now, Capability Later

Mock tank, British Army in Italy, WWII

In my post “Strategic Tunnel Vision”, I touched on the concept of capability. I discussed how focusing on new capabilities can crowd out existing capabilities and the detrimental effects of that when those existing capabilities are still necessary. I also spoke to how choices about strategic capabilities can trickle down to affect tactical capabilities.

What I failed to do, however, was define what was meant by the term “capability”. That’s a pretty big oversight on my part, because, in my opinion, understanding the concept is critical across all levels of architectural concern.

Tom Graves, in his “Definitions on capability”, defines the term (along with some related concepts):

— Capability: the ability to do something.

— Capability-based planning: planning to do something, based on capabilities that already exist, and/or that will be added to the existing suite of capabilities; also, identifying the capabilities that would be needed to implement and execute a plan.

— Capability increment: an extension to an existing capability; also, a plan to extend a capability.

— Capability map: a visual and/or textual description of (usually) an organisation’s capabilities.

Yes, I do know that those definitions are terribly bland and generic – and they need to be that way. That’s the whole point: they need to be generic enough to be valid and usable at every possible level and in every possible context – otherwise we’ll introduce yet more confusion to something that’s often way too confused already.

That last paragraph is critical. The concept of “capability” is a high-level one that is useful across multiple levels of architectural concern (i.e., application, solution, enterprise IT, and the enterprise itself). Quoting Tom again:

Note what else is intentionally not in that definition of ‘capability’:

  • there’s no actual doing – it’s just an ability to do something, not the usage of that ability
  • there’s no ‘how’ – we don’t assume anything about how that capability works, or what it’s made up of
  • there’s no ‘why’ – we don’t assume any particular purpose
  • there’s no ‘who’ – we don’t assume anything about who’s responsible for this capability, or where it sits in an organisational hierarchy or suchlike

We do need all of those items, of course, as we start to flesh out the details of how the capabilities would be implemented and used in real-world practice. But in the core-definition itself, we very carefully don’t – they must not be included in the definition itself.

The reason why we have to be so careful and pedantic about this is because the relationship between service, capability, function and the rest is inherently recursive and fractal: each of them contains all of the others, which in turn each contain all of the others, and so on almost to infinity. If we don’t use deliberately-generic definitions for all of those items, we get ourselves into a tangle very quickly indeed – as can be seen all too easily in the endless definitional-battles about the relationships between ‘business-function’ versus ‘business-process’ versus ‘business-service’ versus ‘business-capability’ and so on.

In short, capability is a crucial building block in our designs and plans (a redundant pairing, since design is a form of planning). If we don’t have, and can’t get, the ability to do something, it’s game over. However, as Tom noted, we need to move beyond the raw ability in order to make effective use of capabilities. We need to think about timing and personnel (which will probably largely drive timing anyway). A capability later may well not be as valuable as the same capability right now.

This was brought to mind while skimming a book review on a military strategy site (emphasis added by me):

In March 2015, then-Chief of Staff of the U.S. Army General Raymond T. Odierno admitted to the British newspaper The Telegraph that the so-called special relationship between the United States and Great Britain isn’t what it used to be. “In the past we would have a British Army division working alongside an American army division,” he said, but he feared that in the future British battalions and brigades would have to operate “inside” American units. “What has changed,” Odierno declared, “is the level of capability.”

Later that week, I asked a senior British general about Odierno’s remarks. He replied, deadpan, that although Odierno’s candor was appreciated, his statement was factually incorrect. “We can still field a division,” the general insisted. “It is just a question of how long it takes us to field one.” Potential tanks, he seemed to think, were just as relevant as actual ones.

The highlighted portion of the quote illustrates my point. Having the capability to do something immediately and the capability to do that same thing at some point in the future are not equivalent (just to be fair to the British Army, the US Army was in this same position during Operation Desert Shield – the initial ground forces that could be deployed were extremely thin). Treating them as equivalent potentially risks disaster.

It should be noted, however, that level of concern will color the perception of the value of a future capability versus a current one. At the tactical level, in business as well as in war, “…first with the most…” is likely a winning move. At the strategic level, however, where resources must be budgeted across multiple initiatives, priorities should dictate which capabilities get preference. Tactical leaders may have to be satisfied with “on time with just enough”.

Regardless of level, a clear assessment of capabilities (what’s available and when) is key to making effective decisions.

Building a Legacy

Greek Trireme image from Deutsches Museum, Munich, Germany

 

Over the last few weeks, I’ve run across a flurry of articles dealing with the issue of legacy systems used by the U.S. government.

An Associated Press story on the findings from the Government Accountability Office (GAO) issued in May reported that roughly three-fourths of the $80 billion IT budget was used to maintain legacy systems, some more than fifty years old and with no end-of-life date in sight. An article on CIO.com about the same GAO report detailed seven of the oldest systems. Two were over 56 years old, two 53, one 51, one 35, and one 31. Four of the seven have plans to be replaced, but the two oldest have no replacement planned yet.

Cost is not the only issue; reliability is a problem as well. An article on Timeline.com noted:

Then there’s the fact that, up until 2010, the Secret Service’s computer systems were only operational about 60% of the time, thanks to a highly outdated 1980s mainframe. When Senator Joe Lieberman spoke out on the issue back in 2010, he claimed that, in comparison, “industry and government standards are around 98 percent generally.” It’s alright though, protecting the president and vice president is a job that’s really only important about 60 percent of the time, right?

It would be easy to write this off as just another example of public-sector inefficiency, but you can find these same issues in the private sector as well. Inertia can, and does, affect systems belonging to government agencies and businesses alike. Even a perfectly designed and implemented system (we’ve all got those, right?) is subject to platform rot if ignored. Ironically, our organizations seem designed to do just that by being project-centric.

In philosophy, there’s a paradox called the Ship of Theseus that explores the question of identity. The question it raises: if we maintain something by replacing its constituent parts, does it remain the same thing? While many hours could be spent debating this, to those whose opinion should matter most, those who use the system, the answer is yes. To them, the identity of the system is bound up in what they do with it, such that it ceases to be the same thing not when we maintain it, but when its function is degraded through neglect.

Common practice, however, separates ownership and interest. Those with the greatest interest in the system typically will not own the budget for work on it. Those owning the budget will typically be biased towards projects that add value, not maintenance work that represents cost.

Speaking of cost, is 75% of the budget an unreasonable amount for maintenance? How well are the systems meeting the needs of their users? Is quality increasing, decreasing, or holding steady? Was more money spent because of deferred maintenance than would have been spent with earlier intervention? How much business risk is involved? Without this context, it’s extremely difficult to say. It’s understandable that someone outside an organization might lack this information, but even within it, would a centralized IT group have access to it all? Is the context as meaningful at a higher, central level as it is “at the pointy end of the spear”?

Maintaining systems bit by bit, replacing them gradually over time, is likely to be more successful and less expensive than letting them rot and then attempting a big-bang rewrite (one common way to structure that gradual replacement is sketched below). In my opinion, having an effective architecture for the enterprise’s IT systems is dependent on having an effective architecture for the enterprise itself. If the various systems (social and software) are not operating in conjunction, drift and inertia will take care of building your legacy (system).
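By way of illustration only, here is a minimal sketch of one way to do that piecemeal replacement: a routing facade (the “strangler fig” approach, my label, not the original post’s) that sends already-migrated paths to a new service while everything else still goes to the legacy system. The language (Go), host names, and the migrated path are all assumptions made for the example.

```go
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
	"strings"
)

// Hypothetical endpoints; in a real migration these would be the legacy
// system and the service gradually replacing it.
var (
	legacyURL, _ = url.Parse("http://legacy.internal:8080")
	modernURL, _ = url.Parse("http://orders.internal:8080")
	legacy       = httputil.NewSingleHostReverseProxy(legacyURL)
	modern       = httputil.NewSingleHostReverseProxy(modernURL)
)

// route sends migrated paths to the new service and everything else to the
// legacy system, so the old system can be retired one slice at a time.
func route(w http.ResponseWriter, r *http.Request) {
	if strings.HasPrefix(r.URL.Path, "/orders") { // one migrated slice of functionality
		modern.ServeHTTP(w, r)
		return
	}
	legacy.ServeHTTP(w, r)
}

func main() {
	log.Fatal(http.ListenAndServe(":80", http.HandlerFunc(route)))
}
```

The point of the sketch is the shape of the migration, not the particular proxy: functionality moves over a piece at a time, and the legacy system shrinks until it can be retired, rather than being neglected until only a big-bang rewrite remains.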

[Greek Trireme image from Deutsches Museum, Munich, Germany via Wikimedia Commons]

Can you afford microservices?

Check

Much has been written about the potential benefits of designing applications using microservices. A fair amount has also been written about the potential pitfalls. On this blog, there’s been a combination of both. As I noted in “Are Microservices the Next Big Thing?”: it’s not the technique itself that makes or breaks a design, it’s how applicable the technique is to the problem at hand.

It’s important, however, to understand that “applicable to the problem at hand” isn’t strictly a technical question. The diagram in Philippe Kruchten‘s tweet below captures the full picture of a workable solution:

As Kruchten pointed out in his post ‘Three “-tures”: architecture, infrastructure, and team structure’, the architecture of the system, the system’s infrastructure, and the structure of the team developing the system are mutually supporting. These aspects of the architecture of the solution must be kept aligned in order for the solution to work. In my opinion, it should be taken as a given that this architecture of the solution must also align with the architecture of the problem as a minimum condition to be considered fit for purpose.

Martin Fowler alluded to the need to align architecture, infrastructure, and team structure in “MicroservicePrerequisites” when he listed rapid provisioning, basic monitoring, and rapid deployment as pre-conditions for microservices. These capabilities not only represent infrastructure requirements, but also “…imply an important organizational shift – close collaboration between developers and operations: the DevOps culture”. Permanent product teams building and operating applications are, in my opinion, an extremely effective way to deliver IT. It must be realized, however, that effectiveness comes with a price tag, in terms of people, tools, and infrastructure.
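To make one of those prerequisites concrete, here is a minimal, hypothetical sketch of the kind of liveness endpoint that “basic monitoring” might poll. It is not from Fowler’s post; the language (Go), the path, and the port are my own assumptions.

```go
package main

import (
	"encoding/json"
	"net/http"
)

// healthHandler exposes a minimal liveness check that an external monitor
// could poll; a real service would also report dependency health, version, etc.
func healthHandler(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(map[string]string{"status": "ok"})
}

func main() {
	http.HandleFunc("/healthz", healthHandler) // path name is an assumption
	http.ListenAndServe(":8080", nil)          // port is arbitrary for the example
}
```

Even something this small has to be built, deployed, and watched for every service in every environment, which is part of the price tag mentioned above.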

In “MicroservicePremium”, Fowler further stated “don’t even consider microservices unless you have a system that’s too complex to manage as a monolith”, identifying “sheer size” as the biggest source of complexity. Size will encompass both technical and organizational concerns:

The microservice approach to division is different, splitting up into services organized around business capability. Such services take a broad-stack implementation of software for that business area, including user-interface, persistent storage, and any external collaborations. Consequently the teams are cross-functional, including the full range of skills required for the development: user-experience, database, and project management.

Expanding on this, the ideal organization will be one cross-functional team per microservice/bounded context. Even with very small teams, this requires either significant expenditure or a compromise in how the architectural and social aspects (i.e., Conway’s Law) work together in this architectural style.

Other requirements inherent in a microservice architecture are things like API governance and infrastructure services to support distributed processing (e.g. a service registry). Data considerations that are trivial in a monolithic environment, like transactions, referential integrity, and complex queries, are absent in a distributed environment, and facilities may need to be bought or built to compensate. In a distributed environment, even error logging requires special consideration to avoid drowning in complexity.
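As one illustration of that special consideration (my sketch, not from the original post), a common technique is to tag every request with a correlation ID so that log entries from different services can be stitched back together. The header name, log format, and example path below are assumptions.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"log"
	"net/http"
)

// withCorrelationID reuses an upstream request ID if one arrived, otherwise
// generates one, and writes it into both the response and every log line.
func withCorrelationID(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		id := r.Header.Get("X-Correlation-ID") // propagate an ID set by a caller
		if id == "" {
			buf := make([]byte, 8)
			rand.Read(buf)
			id = hex.EncodeToString(buf)
		}
		w.Header().Set("X-Correlation-ID", id)
		log.Printf("correlation_id=%s method=%s path=%s", id, r.Method, r.URL.Path)
		next.ServeHTTP(w, r)
	})
}

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/orders", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("ok"))
	})
	log.Fatal(http.ListenAndServe(":8080", withCorrelationID(mux)))
}
```

Every service in the chain needs this kind of plumbing, plus the aggregation and search tooling behind it, which is exactly the sort of overhead the next paragraph weighs.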

The overhead in terms of organization, infrastructure, and tooling, whether in ideal or compromised form, will introduce complexity and cost. I would, in fact, expect compromises made to avoid costs to introduce even more complexity. If the profile of the system in terms of business value and necessary complexity (i.e. complexity inherent in the business function) warrants the additional overhead, then that overhead can represent a valid solution to the problem at hand. If, however, the complexity is solely created by the overhead, without an underlying need, the solution becomes suspect. Adding cost and complexity without offsetting benefits will likely lead to problems. Matching the solution to the problem and balancing those costs and benefits requires the attention of an architectural role at the application level, rather than relying on each team to work independently and hope for coherence and economy.

Let’s Talk Value (Who Needs Architects?)

Gold Bars

Value is a term that’s heard often these days, but I wonder how well it’s understood. Too often, it seems, value is taken to mean raw benefit rather than its actual meaning, benefit after cost (i.e. “bang for the buck”). An even better understanding of the concept can be had from Tom Cagley’s “Breaking Down Value”: “Value = (Benefit + Perception) – (Cost + Perception)”.

The point being?

Change involves costs and, one would hope, benefits. Not only does the magnitude of the cost matter, but so does its perception. Where a cost is seen as unnecessary, or as incurred to benefit someone other than the one paying the bills, its perception will likely be unfavorable. Changes that come about due to unforeseen circumstances are more likely to be seen as necessary than those stemming from foreseeable ones. Changes to accommodate known needs are the least likely to be seen as reasonable. This is why I’ve always maintained that YAGNI doesn’t scale beyond low-level design into the realm of architectural decisions. Where cost of change determines architectural significance, decision churn is problematic.

After I posted “Who Needs Architects? Who’s Minding the Architecture?”, Charlie Alfred tweeted:

One way to be seen as an asset is to provide value. As Cesare Pautasso put it:

This is not to say that architectural refactoring is without value, but that refactoring will be seen as redundant work. When that work is for foreseeable needs, it will be perceived as costlier and less beneficial than strictly new functionality. Refactoring to accommodate known needs will suffer an even greater perception problem.

YAGNI presumes that the risk of flexible design turning out to be unnecessary outweighs the risk of refactoring turning out to be necessary. In my opinion, that is far too simplistic a view. Some functionality and qualities may be speculative, but the need for others (e.g. security) will be more certain.

Studies have shown that the ability to modify an application is a prime quality concern for stakeholders. Flexible design enables easier and cheaper change. “Big bang” changes (expensive and painful) are more likely where coherent design is lacking. Holistic design based on context seems to provide more value (both tangible and perceived) than a dogma-driven process of stringing together tactical decisions and hoping for the best.

“Laziness as a Virtue in Software Architecture” on Iasa Global

Laziness may be one of the Seven Deadly Sins, but it can be a virtue in software development. As Matt Osbun observed:

Robert Heinlein noted the benefits of laziness:

See the full post on the Iasa Global Site (a re-post, originally published here).

Laziness as a Virtue in Software Architecture

Laziness may be one of the Seven Deadly Sins, but it can be a virtue in software development. As Matt Osbun observed:

Robert Heinlein noted the benefits of laziness:

Even in the military, laziness carries potential greatness (emphasis mine):

I divide my officers into four groups. There are clever, diligent, stupid, and lazy officers. Usually two characteristics are combined. Some are clever and diligent — their place is the General Staff. The next lot are stupid and lazy — they make up 90 percent of every army and are suited to routine duties. Anyone who is both clever and lazy is qualified for the highest leadership duties, because he possesses the intellectual clarity and the composure necessary for difficult decisions. One must beware of anyone who is stupid and diligent — he must not be entrusted with any responsibility because he will always cause only mischief.

Generaloberst Kurt von Hammerstein-Equord

The lazy architect will ignore slogans like YAGNI and the Rule of Three when experience and/or information tells them that it’s far more likely than not that the need will arise. As Matt stated in “Foreseeable and Imaginary Design”, architects must ask “What changes are possible and which of those changes are foreseeable?”. The slogans point out that engineering costs, but the reality is that so does rework resulting from deferred decisions. Avoiding that extra work (laziness) avoids the cost associated with it (frugality).

Likewise, lazy architects are less likely to cave when confronted with the sayings of some notable person. Rather than resign themselves to the extra work, they’re more likely to examine the statement as Kevlin Henney did:

It’s far cheaper to design and build a system according to its context than to rebuild a system to fix issues that were foreseeable. The rework born of being too focused on the tactical to the detriment of the strategic is as much a form of technical debt as cutting corners. The lazy architect knows that time spent identifying and reconciling contexts can allow them to avoid the extra work caused by blind incremental design.

Design Follies – ‘I paid for it so you have to use it’

Sinking of the SMS Ostfriesland
Sunk Costs

The Daily WTF is almost always good for a laugh (or a facepalm, at least). It’s even been the inspiration for posts on error handling and dependency management. The June 24th post, “The System”, is yet another one that I have to comment on. You have to read the full post to fully appreciate the nature of the “system” in question (really, read it – it’s so bad I can’t even begin to do it justice via an excerpt), but the punch line is that it’s so wretchedly convoluted because one of its components was a “significant expense”.

The term “sunk cost” refers to a “cost that has already been incurred and cannot be recovered”. Although economic theory asserts that sunk costs should not influence future decisions, it’s a common enough phenomenon. The sunk cost fallacy (aka escalation of commitment) is the academic term for “throwing good money after bad”. It’s common in endeavors as disparate as business, gambling, and military operations. Essentially, a person operating under this fallacy justifies spending more money based on the fact that they’ve already invested so much and to abandon it would be a waste. Needless to say, this can easily become a death spiral where ever larger sums are wasted to avoid acknowledging the original mistake.

In the IT realm, the sunk cost fallacy can lead to real problems. Weaving a poorly chosen platform/component/service/etc. into your applications not only wastes money and effort up front, but also complicates future efforts. Quality of service impairments (e.g. maintainability, performance, scalability, and even security) can quickly add up, eclipsing the original cost of the bad decision.

One area where sunk costs are quasi-relevant is when a given capability is adequate. Theoretically, you’re not really considering the prior cost, but deciding to avoid a new cost. In practical terms, it works out the same – you have something that works, no need to pay for a redundant capability.

Beyond being aware of this particular anti-pattern, there’s not much else that can be done about it in many instances. Typically this is something imposed from above, so beyond getting on the record with your reservations, there may be little else you can do. If, however, you’re in a managerial position, reducing the cultural factors that promote this fallacy could serve you well.