One of the questions that arises with many teams I work with is: Is it worth spending an iteration (or more) not delivering any features but just working on our technical debt?

This is by no means the only way to deal with tech debt—most people (and I) favour addressing it a bit at a time alongside the usual feature work—but even so, sometimes things just get so difficult that it’s natural to ask whether one big, sustained effort could really help.

So I’ve spent a little bit of time doing some simple modelling to provide some kind of framework to help answer the question. But the question I’m really answering here is a slightly narrower one: If we stop for a period to achieve some kind of boost in productivity, when will we break even, and catch up from the hiatus in feature work? If we can answer this then we can have a more objective debate about whether it’s worthwhile.

Now we can see the answer to the question is really based on two variables: What length of pause are we talking about? And what do we get for that?

The quick outcome of this work is this:

If we spend *T* time doing nothing but addressing the tech debt, and if it will give us a boost of *B* (where *B* = 0.1 if we get a 10% boost, or 0.2 if we get a 20% boost, etc.) then we break even *T* / *B* after restarting our feature work.
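To get a feel for the numbers, here is a minimal Python sketch of that rule. The function name and the sample two-week pause are illustrative assumptions, not part of the original analysis.

```python
def break_even_after_restart(pause: float, boost: float) -> float:
    """Time after feature work restarts until the pause has paid for itself (T / B).

    `pause` is T in whatever time unit you like; `boost` is B as a fraction (0.1 = 10%).
    """
    if boost <= 0:
        raise ValueError("with no boost we never break even")
    return pause / boost


# Illustrative only: a two-week pause under a few different boosts.
for boost in (0.1, 0.2, 0.5):
    weeks = break_even_after_restart(2, boost)
    print(f"{boost:.0%} boost: break even about {weeks:.0f} weeks after restarting")
```

The smaller the boost, the longer the wait: halving *B* doubles the time to break even.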

However, this is based on a very simple (possibly over-simple) premise. So let’s look at the detail, which demonstrates use of a cost-of-delay way of looking at things…

Let’s start by assuming we are routinely delivering a steady stream of value, week by week or month by month, up to the present time, like this:

*[Update 8 Aug 2017: Over on LinkedIn Richard Wild commented that if we are delivering value at a constant rate then our tech issues don’t really meet the definition of tech debt. I’m undecided on how true that is—tech debt is something that costs more to pay off the longer we leave it, and it doesn’t necessarily slow us down increasingly over time. Either way, I’m going to stick to the current model just to maintain simplicity. Meanwhile you may like to see this as being about “tech issues” rather than “tech debt”. And you might like to look at my follow-up: The cost of delay of technical debt.]*

Let’s call that one unit of value per week/month/whatever, continuously. The x-axis is time, of course. Now we could continue delivering like this, in which case our steady stream continues and our cumulative value is a straight line (shown here in blue):

But what if, instead, we stopped for a period to deal with tech debt and achieve some kind of boost compared to our previous rate of delivery? It would look a bit like this:

Here we can see we’ve delivered no tangible value for a period, but are back delivering more after our time tackling our issues.

Let’s be specific about what we’ve got here. *T* is the time we spent tackling the tech debt, and *B* is the proportional boost we’ve achieved as a result. Since we said our previous rate of delivery is one unit of value per whatever, a 10% boost in productivity means *B* would be 10% of 1, which is 0.1. A 15% boost in productivity would mean *B* = 0.15, and so on:

Now, we’ve clearly lost some value by spending *T* dealing with technical debt. But on the other hand we’re more productive as a result. When do we make back that loss? In other words, at which point will we have delivered the same value from our pause-and-then-productive approach as if we had continued with our ignore-the-tech-debt approach? The answer can be seen with the help of the next diagram:

If the two shaded rectangles are the same area then that’s when we’ve recovered from our pause. The lower left shaded rectangle is the value we lost from our pause. The upper right shaded rectangle is the extra value we’ve achieved from our improvements. If the two rectangles are equal in area then that’s when we’ve recovered from our concentrated tech debt exercise.

What is the width of the upper right rectangle? The lower left rectangle has width *T* and height 1 (because we said we’re regularly delivering one unit of value), so it has area *T*. The upper right rectangle has height *B* and it has the same area as the other one, which means it has area *T*. And therefore its width must be *T* / *B*, because height *B* times width *T* / *B* gives area *T*:
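The same argument written as equations, using *W* as a label (introduced here, not in the diagrams) for the unknown width of the upper right rectangle:

```latex
\begin{align*}
\text{value lost during the pause} &= 1 \times T = T \\
\text{extra value after the pause} &= B \times W \\
B \times W = T \quad &\Longrightarrow \quad W = \frac{T}{B}
\end{align*}
```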

And in this diagram we can see in green the cumulative value we’re delivering in this pause-and-then-productive scenario:

The green slope is slightly steeper after our tech debt work, reflecting the improved productivity.

Overlaying that onto the earlier blue line, showing our cumulative value in the ignore-the-tech-debt scenario, we get this:

The two lines cross at the circled point *T* / *B* after we’ve finished dealing with the tech debt. Or equivalently, it’s *T* + *T* / *B* after we’ve *started* dealing with the tech debt.
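The crossing point can be checked numerically. This is only a sketch of the simple model described here; the function names and the sample values of *T* and *B* are assumptions chosen for illustration, and time is measured from the start of the pause.

```python
import math


def cumulative_steady(t: float) -> float:
    """Blue line: cumulative value if we ignore the tech debt (one unit per period)."""
    return t


def cumulative_with_pause(t: float, pause: float, boost: float) -> float:
    """Green line: nothing delivered during the pause, then rate 1 + boost afterwards."""
    return max(0.0, t - pause) * (1.0 + boost)


def crossover_from_start(pause: float, boost: float) -> float:
    """Time from the start of the pause until the two lines meet: T + T / B."""
    return pause + pause / boost


T, B = 2.0, 0.1  # illustrative: a 2-period pause for a 10% boost
t_cross = crossover_from_start(T, B)

# At the crossover both approaches have delivered the same cumulative value.
same = math.isclose(cumulative_steady(t_cross), cumulative_with_pause(t_cross, T, B))
print(f"lines cross {t_cross:.1f} periods after the pause starts; values equal: {same}")
```

Before the crossover the blue line is ahead, and after it the green line stays ahead for as long as the boost holds.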

Now, there are a lot of caveats to this…

Most obviously, it doesn’t tell us whether it’s worthwhile to stop feature development and tackle tech debt. But it does at least give us some reasonable numbers to have a more objective discussion about it, which is a whole lot better than using gut instinct, which is what usually happens.

Another thing is that it doesn’t address the secondary advantages of being able to deliver faster. Johanna Rothman gives one dramatic example where the value of being able to deliver faster meant the organisation was able to keep pace with the market more effectively, and allowed the teams to make additional improvements to their working practices.

And finally, this is a *very* simple model. It assumes value delivery is constant and goes on forever. In reality we can expect many more complexities: failing to tackle tech debt tends to decrease our rate of delivering value; and on the other hand products don’t maintain their value indefinitely, even if we continually evolve them. Ben Godfrey suggests some other models.
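As one hedged illustration of the first of those complexities, suppose that ignoring the tech debt makes our delivery rate fall linearly, as 1 - *d* × *t*, while the pause still gives a constant rate of 1 + *B* afterwards. The linear decay, the parameter *d* and the function name are all assumptions invented for this sketch; nothing in the model above prescribes them.

```python
import math


def break_even_with_decay(pause: float, boost: float, decay: float) -> float:
    """Time from the start of the pause until we catch up, under an ASSUMED decaying baseline.

    Without a pause: rate 1 - decay * t, so cumulative value is t - decay * t**2 / 2.
    With a pause: rate 0 for `pause` periods, then a constant 1 + boost.
    Setting the two cumulative values equal gives a quadratic in t.
    """
    if boost <= 0:
        raise ValueError("boost must be positive")
    if decay == 0:
        return pause + pause / boost  # reduces to the simple model: T + T / B
    t = (-boost + math.sqrt(boost**2 + 2 * decay * (1 + boost) * pause)) / decay
    if t > 1 / decay:
        raise ValueError("baseline rate reaches zero before break-even; sketch does not apply")
    return t


# Illustrative values: a 2-period pause, a 10% boost, the baseline losing 1% per period.
print(f"no decay:   {break_even_with_decay(2, 0.1, 0.0):.1f} periods")
print(f"with decay: {break_even_with_decay(2, 0.1, 0.01):.1f} periods")
```

Under these assumptions the break-even point arrives sooner than *T* + *T* / *B*, because the ignore-the-tech-debt line flattens over time.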

But for the moment I think that’s secondary. What we have here is at least the start of a useful conversation. We’ve had some basic practice using cost of delay-like tools to turn an intuitive discussion into something a bit more objective. And we can evolve it from there.

The linked post to aftnn.org has some good points for increasing the complexity of the model by better modeling of tech debt.

I would also like to consider looking at better models for value.

Assuming v(t) = constant and addressing only tech debt, we can extract the same value at t = 0, t = 1 week, and t = 1 year. What about t = 1 decade? Is that how it works?

What if v(t) is actually parabolic (or a complex series of positive and negative exponentials)? A feature is not valuable at the start, grows in value as it is rolled out, and eventually diminishes (heat death of the universe etc).

Doing work, intervening in systems based on human judgement, is the process of changing v(t). For example, by adding a feature, we can increase (or decrease) the gradient of V, the cumulative value. V is always (we hope) increasing; what we can change is how fast.

Maybe we need to rethink the tech debt vs feature balance too? By addressing tech debt, we adjust the parameters that control v, just like we do when working to deliver features. Maybe there isn’t a difference at all?

Many thanks for the thoughts, @BenCord0.

Additionally Lukas Oberhuber has done some work at https://youtu.be/_i53gQen1rs on showing that tackling tech debt at an appropriate rate prevents the codebase running away to the point of technical bankruptcy—when it costs more to make a change than the change is worth.