I’ve often come across a situation where a team needs to undertake a major refactoring or some other technical change, which may take in the order of days or weeks, and we need to discuss how to approach it.
In years gone by, before continuous deployment was the norm and microservices were popular, it usually concerned refactoring a single codebase. The question was how best to tackle it, as I don’t like teams to spend many days on a change without being able to deploy working code. In more recent years the problem often concerns making interdependent changes to multiple subsystems or microservices, and a team working with continuous deployment wonders how to undertake these time-consuming changes while still deploying frequently.
In both situations the problem is that the only obvious way to make the changes is the most direct route, which is fairly lengthy and leaves the system broken from the very start of the changes right up until the work is completed. Most people with some experience also realise that the work will probably take longer than originally estimated, which means we’ll have a non-working (and non-deployable) system for a very long time.
To help us find other solutions I use an analogy of a word puzzle. You have two words of the same length; you want to transform the first word into the second word, but you’re only allowed to change one letter at a time, and with each change you must still have a valid word.
There’s an example online which changes HARD to SOFT. Without the constraint of always requiring a valid word, the obvious solution lets us make the changes in four steps: HARD -> SARD -> SORD -> SOFD -> SOFT. In terms of the analogy, we have a broken system throughout the process except at the start and end.
But if we insisted on always having a valid word at every step, how would you do it…?
It takes a lot more thought, and it’s clear it will require more steps than the obvious, direct solution. We need to get creative. But it is possible, and at every step we have a working system (which in this analogy means a valid word).
Here’s a solution for this particular pair of words:
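One such route can be found mechanically with a breadth-first search over a word list. The sketch below is only an illustration: the function name word_ladder and the tiny hand-picked word list are assumptions made for this example, and the ladder it finds is one valid answer under that list rather than the only possible one.

```python
from collections import deque
from string import ascii_uppercase

# Hypothetical mini-dictionary, hand-picked for this illustration only.
WORDS = {
    "HARD", "CARD", "LARD", "WARD", "CORD", "LORD", "WORD",
    "CORE", "SORE", "SORT", "SOFT",
}


def word_ladder(start, goal, words):
    """Return a shortest ladder from start to goal as a list of words.

    Each step changes exactly one letter and must land on a word in
    `words`. Returns None if no such ladder exists.
    """
    queue = deque([[start]])  # each queue entry is a path so far
    seen = {start}
    while queue:
        path = queue.popleft()
        current = path[-1]
        if current == goal:
            return path
        # Try every single-letter change of the current word.
        for i in range(len(current)):
            for letter in ascii_uppercase:
                candidate = current[:i] + letter + current[i + 1:]
                if candidate in words and candidate not in seen:
                    seen.add(candidate)
                    queue.append(path + [candidate])
    return None


ladder = word_ladder("HARD", "SOFT", WORDS)
print(" -> ".join(ladder) if ladder else "no ladder found")
```

With this word list the search prints HARD -> CARD -> CORD -> CORE -> SORE -> SORT -> SOFT: six steps instead of the direct route's four, but a valid word at every step.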
When it comes to making system changes, whether to a single codebase or a complex system, the teams I’ve worked with have always found a route that allows frequent points of stability and deployability. The first time is always the most difficult, and it can demand a lot of imagination and cause some frustration. But they tend to get there in the end, and it does get much easier with practice.