Tech debt in depth: What is it, how is it measured, and what to do about it?
Nearly every Engineering Manager would agree that they need to keep their team’s tech debt low. Beyond that, you’ll find little agreement on how to identify or address tech debt. This guide provides data-backed examples to address a few of the most important questions faced by pragmatic Engineering Managers: “Where is our tech debt?”, “How much debt is there?”, and “When must the debt be addressed?”
Before diving into why tech debt matters, it should be noted that every useful real-world product, even the really good ones, has tech debt. In fact, they usually have lots of it, because "more code" is inextricably entwined with "more tech debt." If a code base is lucky enough to survive five years, its developers might consider 30-50% of it “tech debt.” After five years, it tends to get even worse.
But we’re getting ahead of ourselves. Many teams are extremely effective in spite of their tech debt. If you're an Engineering Manager who knows what to pay attention to, you can both contain your tech debt and pay it down when advantageous.
📖 "Tech debt" defined
Since you found your way to this article, it can be safely assumed that you have at least a passing familiarity with the concept of “tech debt.” The term was originally coined by Ward Cunningham. For the purposes of this guide, tech debt is any code that is “disproportionately time-consuming to maintain.”
Often, what makes code "time-consuming to maintain" is that changing it has repeatedly caused bugs. When a developer is tasked with working on a bug-prone legacy system, tech debt is felt as "code that requires deep, time-intensive testing after any change is made."
For non-technical product team members, a good analogy for "tech debt" is a house with an old, leaky roof. You can still live in the house, but the longer you do, the more you’re getting dripped on. When it rains, you have to set up buckets to catch the drips. If you’re too busy to empty the buckets that day, then your carpet gets wet. If an overflowing bucket isn't addressed promptly, then the carpet begins to rot. The “roof” in this analogy still "works" in some sense, but every day you let it remain unfixed, the problem gets a little bit worse.
Much like emptying buckets to service a leaky roof, tech debt saps energy and willpower that could be spent on more satisfying tasks. But there are still cases where tech debt can be deployed thoughtfully to get software out the door faster. Martin Fowler used this graphic to describe the assorted origins of tech debt:
This graphic concisely shows that sometimes, you settle for a "leaky roof" because you have a deadline and that is the only type of roof that could be built before the deadline expires. In such cases, it's essential for management to realize the tradeoff they are making, and to have at least a rough plan for when the tech debt will be repaid.
🧐 How does tech debt show up in real-world projects?
The answer depends on the phase of the project, but experienced technical leaders will recognize a few common forms of tech debt:
Duplicated code. Arguably the most ubiquitous problem in software development. When all execution paths run through the same systems, the maintenance of those execution paths is simplified. In the real world, there are usually tens of thousands of files within a large Git repo, so the odds are stacked against shared code paths. The result is similar or redundant systems that must each be maintained, and that cause hard-to-debug problems reported by customers.
Devops. Tech debt from Devops is common at all phases of a company’s growth. The faster the customer base grows, the more that an imperfect server configuration will contribute to possible downtime. In a perfect world, developers would learn all of the config settings for the database and the web server, and carefully choose the parameters that best match the project's use case. In the real world, people usually don't have time to learn about server configuration, so they use the default configuration and hope for the best. 🤷
Lack of tests. This is one of the more interesting flavors of tech debt, because if a company stays small enough (three or fewer developers), it can weather this type of tech debt almost indefinitely. But for the average real-world dev team, whenever a new feature is implemented, the countdown until it will break is underway. There are many factors that feed into why code "decays" as it ages, but the end result is that every new feature launched without unit and integration tests is a type of tech debt that will later need to be revisited.
Lack of documentation. Similar to lack of tests, this problem can be largely sidestepped until the team grows past three developers. At that point, a lack of documentation (and its evil sibling, outdated documentation) slows down updates to the system. The problem is especially bad for new developers, who bear the brunt of code discovery costs. Without documentation, new developers will invest hours or days to learn the unwritten rules that govern repo contributions.
Lots of code. Every new line of code added to a project increases the scope of what must be maintained by future developers on the project. That’s why a large repo is inherently a debt-filled repo. Before any new developer can contribute to it, they must understand the existing system well enough to avoid duplications and adhere to code conventions. The larger the repo, the longer that learning process takes.
Some managers find it comforting to realize that even successful projects are swimming in some flavor of their own tech debt. Every week, engineering leaders make decisions about how much to build up or pay down their team's tech debt. Debt-ridden code is faster to write, so when prototyping, there’s strong incentive to set aside tech debt as a consideration. That works for a while, but at some point, the piper must be paid.
🛠️ When to take action against tech debt?
Using a tool like GitClear's Tech Debt Inspector, it's possible to quantify the extent to which tech debt is slowing down the evolution of a particular system. However, what no tool can measure is the morale impact of developers being pulled away from fulfilling work in service of debt triage. Compared to creating features, paying down tech debt is less intrinsically satisfying. This helps explain why new developers are often directed to start by triaging bugs: it’s unfulfilling work, and it flows downhill. Experienced developers are tempted to avoid it in favor of more satisfying tasks (e.g., new features ✨).
Engineering Managers often don’t think much about tech debt until they have a disaster, like a big release getting deferred or a system outage that costs revenue. At that point, they conclude that they waited too long to address their repo's tech debt. A more proactive approach, employed by seasoned Engineering Managers, is to keep a finger on the pulse of certain "action signals" that foreshadow the debt coming due.
Action signal: slow velocity in an oft-used directory
Recall that tech debt is any code that is “disproportionately time-consuming to maintain.” Expressed in terms of velocity, code in a tech debt directory evolves more slowly per hour of developer time than code in an average directory.
Calculating empirically where tech debt resides in a project therefore requires knowing the velocity of code changes in an average directory and in a tech debt directory.
To find the directories that currently harbor the greatest volume of tech debt (the "worst case" tech debt directories), it is possible to sort a master list of leaf directories in a git repo, in order of velocity. As an example, here is the sorted list of highest tech debt directories during the past year in our GitClear repo:
For the numerator of the “velocity” equation, GitClear calculates the amount of code evolution via Line Impact. For the denominator, we estimate how much time a particular change took. If you're curious, you can read how GitClear estimates time used per commit here.
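As a rough sketch of the ranking described above, the following Python snippet computes velocity as Line Impact divided by estimated hours, then sorts directories from slowest to fastest. The directory names and figures are made up, and `DirectoryStats` and `rank_by_velocity` are illustrative names rather than anything exposed by GitClear. The `min_hours` cutoff is likewise an assumption, included so that only oft-used directories are ranked.

```python
from dataclasses import dataclass


@dataclass
class DirectoryStats:
    path: str           # leaf directory in the repo (hypothetical names below)
    line_impact: float  # amount of code evolution (the numerator)
    hours: float        # estimated time spent on changes (the denominator)

    @property
    def velocity(self) -> float:
        """Line Impact per hour; zero when no time was recorded."""
        return self.line_impact / self.hours if self.hours > 0 else 0.0


def rank_by_velocity(stats: list[DirectoryStats], min_hours: float = 1.0) -> list[DirectoryStats]:
    """Return directories sorted slowest-first, skipping rarely touched ones."""
    active = [s for s in stats if s.hours >= min_hours]
    return sorted(active, key=lambda s: s.velocity)


# Made-up sample data, for illustration only.
sample = [
    DirectoryStats("app/views/reports", line_impact=900, hours=10),
    DirectoryStats("app/models/billing", line_impact=270, hours=18),
    DirectoryStats("lib/importers", line_impact=40, hours=0.5),
]

ranked = rank_by_velocity(sample)
for s in ranked:
    print(f"{s.path}: {s.velocity:.1f} Line Impact/hour")
```

The directories at the top of the resulting list are the ones where each hour of work produced the least code evolution, which makes them the first candidates to inspect for tech debt.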
A proactive manager can use the Tech Debt Inspector to make an empirical estimate of how much progress a particular directory's tech debt has cost. For example, in the screenshot above, we see 18 hours were spent on the
This reference guide helps to convey the magnitude of 1,170 in wasted Line Impact. It's about as much forward progress as the entire Facebook React team (~10 developers) makes in one day. A manager doesn't want to see too many directories like that in their project.
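The arithmetic behind an estimate like this can be sketched as follows. The sketch assumes that the cost of tech debt is the gap between an average directory's velocity and the slow directory's velocity, multiplied by the hours spent in the slow directory. That formula is an assumption about how such an estimate could be made, not GitClear's documented calculation. The 18 hours comes from the example above; the two velocities are hypothetical values chosen only so that the result lines up with the 1,170 figure.

```python
def wasted_line_impact(avg_velocity: float, dir_velocity: float, hours: float) -> float:
    """Estimate Line Impact lost by working in a slower-than-average directory.

    Velocities are in Line Impact per hour. A directory that is faster than
    average is treated as having no waste rather than negative waste.
    """
    gap = max(avg_velocity - dir_velocity, 0.0)
    return gap * hours


# Hypothetical velocities; only the 18 hours is taken from the text.
wasted = wasted_line_impact(avg_velocity=80.0, dir_velocity=15.0, hours=18.0)
print(wasted)  # 1170.0
```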
Action signal: bug reports
Another signal that can be watched for tech debt is the volume of bug reports received by the support team. While tech debt is not to blame for every bug report, it is generally true that unwieldy, debt-ridden code is a breeding ground for bugs.
If the customer support team reports ongoing bug submissions in a particular system, it's worth checking whether the directories in which that system's code resides may suffer from tech debt.
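One simple way to run that check is to tally bug reports per system and look up the directories that house each heavily reported system. The sketch below uses made-up report data, a hypothetical `SYSTEM_DIRECTORIES` mapping, and an arbitrary threshold of three reports; a real support tool would supply its own fields and a team would pick its own cutoff.

```python
from collections import Counter

# Hypothetical mapping from a product system to the directories holding its code.
SYSTEM_DIRECTORIES = {
    "billing": ["app/models/billing", "app/controllers/billing"],
    "reports": ["app/views/reports"],
}

# Hypothetical bug reports, each tagged with a system by the support team.
bug_reports = [
    {"id": 1, "system": "billing"},
    {"id": 2, "system": "billing"},
    {"id": 3, "system": "reports"},
    {"id": 4, "system": "billing"},
]


def directories_to_review(reports, threshold=3):
    """Return directories for every system with at least `threshold` reports."""
    counts = Counter(r["system"] for r in reports)
    flagged = []
    for system, count in counts.most_common():
        if count >= threshold:
            flagged.extend(SYSTEM_DIRECTORIES.get(system, []))
    return flagged


to_review = directories_to_review(bug_reports)
print(to_review)
```

The flagged directories can then be compared against the low-velocity list to see whether the bug-prone systems and the slow-moving code overlap.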
Action signal: developer attrition
One of the best reasons to act on tech debt sooner rather than later is developer morale, a topic researched by Besker, Ghanbari, Martini and Bosch.
One good way to assess the impact that tech debt is having on a developer team is to survey them about it. Developers, especially experienced ones, often have a usable intuition for the extent to which tech debt is chasing out morale and pushing down productivity.
⚔️ Reducing existing tech debt
If you've identified important directories with tech debt, how do you go about fixing it?
While it's a good starting point to group tech debt by directory, it's more precise to say that tech debt is accrued per file. Thus, to remedy the debt requires a per-file strategy. A good starting point is to use the Tech Debt Inspector to click into the directory that was identified as low velocity, and check which specific files and commits made developers struggle most.
Because the right antidote to tech debt is very situational, it is outside the scope of this document to outline every possible remedy. That said, it is possible to invert the patterns that produce tech debt to create a list of actions that reduce tech debt:
- DRY code. Reusing an existing method saves time re-inventing the wheel, while also improving the extent to which the original method is exercised.
- But not too DRY 🙃. Be wary of reusing a pattern when it means adding new parameters to the method.
- More tests. Especially when it comes time to upgrade libraries, teams with thorough test coverage can evolve their repo more quickly. Tests also serve to reduce time-intensive research into customer-reported bugs.
- Better, more up-to-date documentation. When experienced developers share their knowledge with new developers, everyone wins. At a minimum, we recommend that each project consider adopting the best practice of including an architecture.md file.
- Use modern libraries. Far from a hard-and-fast rule, but teams that keep up-to-date with modern libraries spend less time triaging dependency and security issues.
- Reduce the rate at which new developers are added to the project. Not practical advice for many teams, but it still makes the list because the single greatest contributor to tech debt is developers who break project conventions and rewrite methods that already existed. These habits are most often found in new developers and Junior Developers who have less experience wrestling with tech debt.
- Better devops monitoring, earlier notice when regressions occur. Good devops monitoring reduces time-intensive research into system issues. It also reduces downtime.
Depending on the starting state of the code, these steps could be simple or complex to undertake. As a general rule, the earlier you begin to tackle them, the simpler the path to remediation will be.
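To make the first two items in the list above concrete, here is a small Python sketch. The function names, the tax rate, and the pricing rules are all invented for illustration. The first half shows duplicated logic folded into one shared helper. The second half shows where reuse stops paying off: each new caller adds another parameter to the shared method.

```python
TAX_RATE = 0.08  # hypothetical rate, used only for this example


def total_with_tax(subtotal: float) -> float:
    """Shared helper: one code path for every caller that needs a taxed total."""
    return round(subtotal * (1 + TAX_RATE), 2)


# DRY: both callers reuse the helper instead of repeating the tax math.
def invoice_total(subtotal: float) -> float:
    return total_with_tax(subtotal)


def cart_total(subtotal: float) -> float:
    return total_with_tax(subtotal)


# Too DRY: stretching one function to cover every case adds parameters
# that every caller must understand, even though most never use them.
def total_overloaded(subtotal, apply_tax=True, is_refund=False, gift_wrap_fee=0.0):
    total = subtotal + gift_wrap_fee
    if apply_tax:
        total *= 1 + TAX_RATE
    if is_refund:
        total = -total
    return round(total, 2)


# Often clearer: a small, separately named function built on the helper.
def refund_total(subtotal: float) -> float:
    return -total_with_tax(subtotal)
```

Both `total_overloaded(100.0, is_refund=True)` and `refund_total(100.0)` produce the same number, but the second states its intent in its name and leaves the shared helper's signature untouched.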
Since “the company that ships fastest” is usually “the company that wins,” keeping tech debt low is arguably the biggest predictor of whether a product will remain on the cutting edge as it ages and fights against newer, less debt-ridden competitors.