Take a couple of seconds and watch the clip in the tweet below:
The Darwin Awards (@AwardsDarwin) March 25, 2016
While it would be incredibly difficult to predict that exact outcome, it is also incredibly easy to foresee that it’s a possibility. As the saying goes, “forewarned is forearmed”.
Being forewarned and forearmed is an important part of what an architect does. An architect is supposed to focus on the architecturally significant aspects of a system. I like to use Ruth Malan‘s definition of architectural significance due to its flexibility:
"What decisions does the architect make?" Architecturally significant ones. "What is architecturally significant?" The architect decides!—
VisArch (@ruthmalan) January 12, 2015
Decisions (both those that were made and those that were left unmade) that end up taking systems offline and causing very public embarrassment are, in my opinion, architecturally significant.
Last week, two very public, very foreseeable failures took place: first was the chaos caused by a developer removing his modules from NPM, which was followed by Microsoft having to pull the plug on its Tay chatbot when it was “trained” to spew offensive comments in less than 24 hours. In my opinion, these both represented design failures resulting from a lack of due consideration of the context in which these systems would operate.
After all, can anyone really claim that no one would expect that people on the internet would try to “corrupt” a chatbot? According to Azeem Azhar as quoted in Business Insider, not really:
“Of course, Twitter users were going to tinker with Tay and push it to extremes. That’s what users do — any product manager knows that.
“This is an extension of the Boaty McBoatface saga, and runs all the way back to the Hank the Angry Drunken Dwarf write in during Time magazine’s Internet vote for Most Beautiful Person. There is nearly a two-decade history of these sort of things being pushed to the limit.”
The current claim, as reported in CIO.com, is that Tay was developed with filtering built-in, but there was a “critical oversight” for a specific kind of attack. According to the article, it’s believed that the attack vector involved asking Tay to “repeat after me”.
Or, as Matt Ballantine put it:
Yet another one for my "it's basically just Eliza, innit" file... businessinsider.com/microsoft-dele…—
Matt Ballantine (@ballantine70) March 24, 2016
Likewise, who could imagine issues with a centralized repository of cascading dependencies? Failing to consider what would happen if someone suddenly pulled one of the bottom blocks out led to a huge inconvenience to anyone depending on that module or any downstream module. There’s plenty of blame to go around: the developer who took his toys and went home, those responsible for NPM’s design, and those who depended on it without understanding its weaknesses.
Does your code depend on fewer than 72 NPM packages? Your code is lonely and you need to read this book. https://t.co/O0pFQ5H9jF—
Practical Developer (@ThePracticalDev) March 24, 2016
“The Iron Law of Tools” is “that which does for you will also do to you”. Understanding the trade-offs allows you to plan for risk mitigation in advance. Ignoring them merely ensures that they will have to be dealt with in crisis mode. This is something I covered in a previous post, “Dependency Management is Risk Management”.
Effective design involves not only the internals of a system but its externals as well. The conditions under which the system will be used, it’s context, is highly significant. That means considering not only the system’s use cases, but also its abuse cases. A post written almost a year ago by Brandon Harris, “Designing for Evil”, conveys this well:
When all is said and done, when you’ve set your ideas to paper, you have to sit down and ask yourself a very specific question:
How could this feature be exploited to harm someone?
Now, replace the word “could” with the word “will.”
How will this feature be exploited to harm someone?
You have to ask that question. You have to be unflinching about the answers, too.
Because if you don’t, someone else will.
When I began working on this post, the portion above was what I had in mind to say. In essence, I planned a longer-form version of what I’d tweeted about the Tay fiasco:
Gene Hughson (@GeneHughson) March 24, 2016
However, before I had finished writing the post, Greger Wikstrand posted “The fail fast fallacy”. Greger and I have been carrying on a conversation about innovation over the last few months. While I had initially intended to approach this as a general issue of architectural practice rather than innovation, the points he makes are just too apropos to leave out.
In the post, Greger points out that the focus seems to have shifted from learning to failure. Learning from experience can be the best way to test an idea. However, it’s not the only way:
Evolution and nature has shown us that there are two, equally valid, approaches to winning the gene game. The first approach is to get as much offspring as possible and “hope” many of them survive (r-selection). The second approach is to have few offspring but raise them and nurture them carefully (K-selection). Biologists tell us that the first strategy works best in a harsh, unpredictable environment where the effort of creating offspring is low. The second strategy works better in an environment where there is less change and offspring are more expensive to produce. Some of the factors that favour r-selection seems to be large uncompeted resources. K-selection is more favourable in resource scarce, low predator areas.
The phrase “…where the effort of creating offspring is low” is critical here. The higher the “cost” of the experiment, the more risk is involved in failure. This makes it advisable to tilt the playing field by supporting and nurturing the “offspring”.
In response to Greger’s post, Casimir Artmann posted two excellent articles that further elaborated on this. In “Fail Fast During Adventures”, he noted that “There is a fine line between fail fast and Darwin Awards in IRL.” His point, preparation beforehand and being willing to abort during an experiment before failure is equivalent to suffering a fatality can be effective learning strategies. Lessons that you don’t live to apply aren’t worth much.
Casimir followed with “Fail is not an Option”, in which he stated:
I want the project to succeed, but I plan for things going wrong so that the consequences wouldn’t be to huge. Some risk are manageable, as walking alone, but not alone and off-trail. That’s to risky. If you doing outdoor adventures, you are probably more prepared and skilled than a ordinarie project member, and thats a huge benefit.
I guess the best advice, when doing completely new things with IT, is to start really small so that the majority of your business is not impacted if there is a failure. When something goes wrong, be sure that you could go back to safe place. Point of no return is like being on a sailing boot in the middle of the Atlantic where you can’t go back.
That’s excellent advice. “Fail Fast” has the advantage of being able to fit on a bumper sticker, but the longer, more nuanced version is more likely to serve you well.