How many errors does the average application have? And, how much do these errors cost companies each year?
We all know application issues cost money, but quantifying that cost is a harder task. Many organizations struggle to understand not only the cost of downtime, but also how to quantify the quality of their software and the cost of poor quality code. A new report from the Consortium for IT Software Quality (CISQ) covers the cost of poor software quality, shedding light on those topics.
— OverOps (@overopshq) March 26, 2019
According to the report, developers make, on average, 100 to 150 errors for every thousand lines of code. Of those, about 10% are serious. Those serious errors have a direct influence on performance failures and slowdowns, thus directly impacting the company’s bottom line.
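To get a feel for what those rates mean for a single codebase, here is a minimal sketch in Python. The 100 to 150 errors per thousand lines and the 10% serious share come from the report as quoted above; the 50,000-line codebase and the `estimate_errors` function are hypothetical, and the result is a rough estimate of errors made during development, not a count of bugs that actually ship.

```python
# Rates quoted from the report: 100-150 errors per 1,000 lines, about 10% serious.
ERRORS_PER_KLOC_LOW = 100
ERRORS_PER_KLOC_HIGH = 150
SERIOUS_FRACTION = 0.10


def estimate_errors(lines_of_code):
    """Return ((low, high) total errors, (low, high) serious errors)."""
    kloc = lines_of_code / 1000
    total = (kloc * ERRORS_PER_KLOC_LOW, kloc * ERRORS_PER_KLOC_HIGH)
    serious = tuple(count * SERIOUS_FRACTION for count in total)
    return total, serious


# Hypothetical codebase size, chosen only for illustration.
total, serious = estimate_errors(50_000)
print(f"Total errors:   {total[0]:,.0f} to {total[1]:,.0f}")
print(f"Serious errors: {serious[0]:,.0f} to {serious[1]:,.0f}")
```

Swap in your own line count to see how quickly even the "serious" 10% adds up.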
In the industry as a whole, poor quality software in the US cost an estimated $2.84 TRILLION in 2018. The most obvious contributing factors are the massive application failures that become public relations fiascos for companies like Wells Fargo, United Airlines and their friends. In fact, software failures accounted for more than a trillion dollars on their own.
That still leaves a lot of the cost unaccounted for, though. Some of the other contributing factors to this obscene number are issues with legacy systems ($596 billion), time and investment wasted on cancelled projects ($54 billion) and, the factor that we’ll focus on in this post, technical debt ($511 billion).
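For a sense of scale, the figures above can be turned into shares of the $2.84 trillion total. This is a quick sketch using only the numbers quoted in this post; software failures are left out because the post gives only "more than a trillion dollars" for them, and the variable and function names are made up for illustration.

```python
# All amounts in billions of US dollars, as quoted in this post.
TOTAL_COST = 2840  # $2.84 trillion

categories = {
    "legacy systems": 596,
    "cancelled projects": 54,
    "technical debt": 511,
}


def share_of_total(amount, total=TOTAL_COST):
    """Return the amount as a percentage of the total."""
    return amount / total * 100


for name, amount in categories.items():
    print(f"{name}: ${amount} billion ({share_of_total(amount):.1f}% of total)")
```

By this arithmetic, technical debt alone works out to roughly 18% of the total.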
What is Technical Debt?
Technical debt, like most terms in the technology sector, has a different meaning depending on who you ask. It’s come to mean just about anything from bugs or defects to legacy code or undefined workflows. Generally speaking, anything that we do or have done in the past that prioritized speed over quality is a contributor to our technical debt.
Herb Krasner, the author of the report on the cost of poor software quality, defined the term more narrowly as, “a forward-looking metric that represents the effort required to fix problems that remain in the code when an application is released.” He further specifies that it “includes only those problems that are highly likely to cause severe business disruption; it does not include all problems, just the most serious ones.”
Problems with a high likelihood of crashing an application or causing serious performance issues clearly contribute to technical debt. But with this definition, we may overlook the contribution of massive volumes of “inconsequential” errors.
Every application has errors. Any team lead or engineer who tried to introduce the goal of eliminating 100% of application errors would be laughed out of the room. Most of those errors are more or less harmless. If the failure rate of a particular transaction is 0.0002%, it would likely go unnoticed by users and engineers alike as it wouldn’t have a noticeable impact on overall performance.
On their own, these errors may not be considered “serious”, but consider them as a whole and more hidden costs come to light. Log files get bigger, CPU usage goes up and, more than that, some of those errors may eventually contribute to more serious problems down the road.
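To see how a "harmless" failure rate adds up, here is a small sketch. The 0.0002% rate is the example from above; the daily transaction volume and the `expected_failures` helper are assumptions picked purely for illustration.

```python
def expected_failures(transactions, failure_rate_percent):
    """Expected number of failed transactions at a given failure rate (in percent)."""
    return transactions * failure_rate_percent / 100


FAILURE_RATE_PERCENT = 0.0002      # the rate used as an example in this post
DAILY_TRANSACTIONS = 10_000_000    # hypothetical volume

per_day = expected_failures(DAILY_TRANSACTIONS, FAILURE_RATE_PERCENT)
per_year = per_day * 365

print(f"Expected failures per day:  {per_day:,.0f}")
print(f"Expected failures per year: {per_year:,.0f}")
```

That is for a single transaction type. Multiply it by every low-rate error in the application, each one writing to the logs, and the overhead stops looking negligible.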
So, what is technical debt? I’d argue that it’s more than just the most serious errors. It’s the combination of critical errors that are deployed to production (either knowingly or unknowingly) and the growing mass of “benign” errors that ultimately contribute to increased overhead costs.
The Cost of “Maintaining” Technical Debt
The whole idea of technical debt is that it more or less comes from us taking shortcuts or building software in a way that doesn’t easily scale. It’s fairly well known that the majority of yearly IT budgets go to just “keeping the lights on”, meaning just operating and maintaining existing software. Since technical debt is inherently part of that existing software, let’s look at how that budget is distributed.
Krasner lays out 4 main kinds of maintenance in his report, explained below:
- Adaptive Maintenance: Like corrective maintenance, this involves modifying existing software, but here the changes are driven by a changing business environment, such as a change in goals. This is the biggest chunk of our maintenance costs at about 50%.
- Perfective Maintenance: Aside from general fixes and changes to our business plan or product roadmap, we also spend a fair amount of money to improve or “perfect” our software and its overall performance and usability. This is generally about 25% of our maintenance costs.
- Corrective Maintenance: The modification of software to correct issues that are found after deployment to production. This generally accounts for about 20% of software maintenance costs.
- Preventive Maintenance: A relatively small portion of the budget, about 5% of maintenance costs, is spent on modifications to the software after deployment to identify and fix errors that have the potential to cause software failures.
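Applied to a concrete number, the split above might look like the following sketch. The percentages are the approximate figures listed above; the $1,000,000 annual maintenance budget and the `split_budget` function are hypothetical.

```python
# Approximate shares of maintenance cost, as listed above.
MAINTENANCE_SHARES = {
    "adaptive": 0.50,
    "perfective": 0.25,
    "corrective": 0.20,
    "preventive": 0.05,
}


def split_budget(budget):
    """Split a maintenance budget across the four kinds of maintenance."""
    return {kind: budget * share for kind, share in MAINTENANCE_SHARES.items()}


# Hypothetical annual maintenance budget in dollars.
for kind, amount in split_budget(1_000_000).items():
    print(f"{kind:<11} ${amount:>10,.0f}")
```

Note how little of that hypothetical budget goes to preventive work compared with the corrective work that follows it.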
Now, technical debt can most easily be identified as contributing to the need for corrective maintenance. Any time we rush to build a new feature with shortcuts in our code or deploy to production without running our full suite of tests, we’re introducing technical debt that will potentially need to be dealt with through corrective maintenance.
We can also consider how adaptive maintenance relates to technical debt. There’s a famous adage, “change is the only constant,” and businesses are no exception (ha!). Things like company goals, workflows and success metrics are changing all the time, and the majority of our maintenance costs are dedicated to making sure the software keeps up with those changes.
Although not directly related, the existence of technical debt makes keeping up with a changing business environment increasingly difficult. Just imagine the challenges involved in transitioning a 15-year-old legacy monolith application to microservices without any of the team members who originally built it.
So What Can We Do?
The “what” is actually pretty clear here: we need to stop deploying poor quality code to production. Easier said than done. Of course, the “what” isn’t the issue here; it’s the “how.” We aren’t deploying these errors on purpose or for lack of trying. We have our review flows, QA and automated testing suites… Those errors just keep getting through. After all, we can’t foresee every possible scenario that the code will experience in production.
In order to reduce the technical debt afflicting your application, you need access to better data. The OverOps Platform integrates into your entire software delivery lifecycle, from development and QA to production, and uses AI to analyze your code as it’s executing to automatically detect and prioritize issues. By combining static and dynamic code analysis, OverOps captures unique code-aware insight about every error and exception (both caught and uncaught) in any environment, including production.
This deep visibility into the functional quality of applications and services helps developers deliver higher quality code and empowers them to quickly resolve issues, even in production, as soon as they appear.
Interested in learning more about companies that use OverOps to reduce technical debt and overhead costs? Check out this post about using OverOps to block bad releases from deployment with our Jenkins integration.