What Do Software Testing Statistics 2026 Reveal About QA?

Software testing statistics 2026: fixing a defect in production costs 15x to 100x more than fixing it at the requirements stage, according to IBM Systems Sciences Institute research as reported by NIST; roughly 58% of organizations surveyed for Capgemini's World Quality Report 2023–24 had automated more than half of their regression suite; about 16% of tests at Google flaked at least once in a given 30-day period, per internal data published in 2016 and widely cited since; and test maintenance consistently accounts for 25–50% of QA budgets in mature suites, documented across multiple Capgemini World Quality Report editions.

Every budget conversation about testing eventually needs a number. Without a specific, attributed figure, the argument stalls at opinion. The statistics below are the ones that actually move those conversations. Each is a self-contained, quotable sentence with a named source, a year, and where possible a direct URL. That format exists for a reason: a statistic you cannot attribute is a statistic you cannot defend.

This post is the citation hub for the testing-economics cluster. Where a statistic has a full model or deep analysis behind it, a link points to the companion piece.

Software testing cost statistics

Defect cost by stage

The most widely cited data on defect cost comes from the IBM Systems Sciences Institute, which studied software development projects and found that the relative cost to fix a defect rises sharply the later in the lifecycle it is discovered.

Fixing a defect found during the requirements phase costs roughly 1x the baseline. Moving to design roughly doubles that. At the coding stage, the cost is typically 5x the requirements baseline. Defects that slip into testing reach 10x. Defects that reach production cost between 15x and 100x the requirements baseline to fix, depending on system complexity and downstream impact, according to IBM Systems Sciences Institute data as reported by the National Institute of Standards and Technology (NIST) in their 2002 report "The Economic Impacts of Inadequate Infrastructure for Software Testing" (nist.gov/document/report02-3.pdf, 2002).

The NIST 2002 report independently estimated that software bugs cost the US economy approximately $59.5 billion annually at the time of publication. Two decades later, with far more of the economy dependent on software, the Consortium for Information and Software Quality (CISQ) separately estimated the cost of poor software quality in the US at $2.41 trillion in 2022 (it-cisq.org/the-cost-of-poor-software-quality-in-the-us, 2022).

The IBM/NIST data is the reason "shift left" became a standard recommendation. The production-bug cost deep-dive, The Cost of a Production Bug in 2026, models those multiples against modern fully-loaded engineer salaries.

Figure: IBM/NIST defect-cost multipliers rise from 1x at the requirements stage to 15–100x in production.
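The stage multipliers above can be turned into dollar ranges with a small lookup. The following is a minimal sketch, not part of the IBM or NIST material: the names `STAGE_MULTIPLIERS` and `fix_cost_range` are ours, and the $100 baseline is a hypothetical placeholder you would replace with your own cost of fixing a defect at the requirements stage.

```python
from __future__ import annotations

# Relative cost multipliers by lifecycle stage, as quoted in this article
# (requirements = 1x baseline). Production is a range, so every stage is
# stored as a (low, high) pair for uniform handling.
STAGE_MULTIPLIERS = {
    "requirements": (1, 1),
    "design": (2, 2),
    "coding": (5, 5),
    "testing": (10, 10),
    "production": (15, 100),
}


def fix_cost_range(stage: str, baseline_cost: float) -> tuple[float, float]:
    """Return the (low, high) cost to fix a defect discovered at `stage`."""
    low, high = STAGE_MULTIPLIERS[stage]
    return (baseline_cost * low, baseline_cost * high)


if __name__ == "__main__":
    baseline = 100.0  # hypothetical: dollars to fix one defect at requirements
    for stage in STAGE_MULTIPLIERS:
        low, high = fix_cost_range(stage, baseline)
        if low == high:
            label = f"${low:,.0f}"
        else:
            label = f"${low:,.0f} to ${high:,.0f}"
        print(f"{stage:<12} {label}")
```

Keeping every stage as a pair means the production range flows through the same code path as the single-value stages, so a budget spreadsheet can consume the output without special cases.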

Test maintenance as a percentage of budget

Test maintenance consistently accounts for 25–50% of total QA budget in teams with mature automation suites, according to the Capgemini / Sogeti / OpenText World Quality Report across multiple editions (capgemini.com/research/world-quality-report, 2019–2023 editions). The range widens depending on how actively the application under test changes: highly iterative products skew toward the 50% end.

The 2021 edition of the World Quality Report found that 36% of respondents cited "test maintenance and updates" as one of their top three QA challenges, making it the second most-cited operational burden, behind only test environment management. The maintenance percentage is the testing budget line most directly at risk as AI-generated code accelerates feature output.

For a worked model of what maintenance costs annually in dollar terms, see the companion deep-dive on the cost of test maintenance.
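To put the 25–50% range in dollar terms for your own team, a rough sketch follows. It is an illustration only, not the companion model: the function name `maintenance_cost_range` is ours, and the $800,000 annual QA budget is a hypothetical input.

```python
from __future__ import annotations


def maintenance_cost_range(
    annual_qa_budget: float,
    low_share: float = 0.25,   # low end of the range cited in this article
    high_share: float = 0.50,  # high end of the range cited in this article
) -> tuple[float, float]:
    """Return the (low, high) annual dollars spent on test upkeep."""
    if not 0 <= low_share <= high_share <= 1:
        raise ValueError("shares must satisfy 0 <= low_share <= high_share <= 1")
    return (annual_qa_budget * low_share, annual_qa_budget * high_share)


if __name__ == "__main__":
    budget = 800_000.0  # hypothetical annual QA budget in dollars
    low, high = maintenance_cost_range(budget)
    print(f"Estimated upkeep: ${low:,.0f} to ${high:,.0f} per year")
```

Highly iterative products would sit nearer the `high_share` end, per the range described above.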

Flaky-test CI waste

Google's engineering team published internal data in 2016 showing that approximately 1.5% of individual test runs resulted in a non-deterministic (flaky) outcome across their test infrastructure. More actionably, they found that 16% of their tests had flaked at least once in a given thirty-day period (Google Testing Blog, "Flaky Tests at Google and How We Mitigate Them," testing.googleblog.com/2016/05/flaky-tests-at-google-and-how-we.html, 2016). At that rate, roughly one in six tests in a large suite is contributing intermittent noise to CI.

The downstream cost is modeled in the flaky-test cost deep-dive: for a 20-engineer team, the combined CI-minutes and engineer-hours cost of a 12% flake rate reaches approximately $120,000 per year.
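The shape of such a model, compute waste plus engineer triage time, can be sketched as below. This is a simplified illustration under our own assumptions and does not attempt to reproduce the $120,000 figure: every input value in the example (CI minutes per day, price per CI minute, team-wide triage hours) is a hypothetical placeholder, and the names `FlakeCostInputs` and `annual_flake_cost` are ours.

```python
from dataclasses import dataclass


@dataclass
class FlakeCostInputs:
    ci_minutes_per_day: float    # total CI minutes consumed per working day
    flake_rate: float            # share of CI minutes wasted on flaky reruns (0..1)
    cost_per_ci_minute: float    # dollars per CI minute
    triage_hours_per_day: float  # team-wide engineer hours lost to flaky failures
    hourly_rate: float           # fully-loaded engineer rate in dollars
    working_days: int = 250


def annual_flake_cost(inputs: FlakeCostInputs) -> dict:
    """Split the annual cost of flakiness into compute and triage components."""
    if not 0 <= inputs.flake_rate <= 1:
        raise ValueError("flake_rate must be between 0 and 1")
    compute = (
        inputs.ci_minutes_per_day
        * inputs.flake_rate
        * inputs.cost_per_ci_minute
        * inputs.working_days
    )
    triage = inputs.triage_hours_per_day * inputs.hourly_rate * inputs.working_days
    return {"compute": compute, "triage": triage, "total": compute + triage}


if __name__ == "__main__":
    # All values below are hypothetical placeholders, not the article's model inputs.
    example = FlakeCostInputs(
        ci_minutes_per_day=500,
        flake_rate=0.12,
        cost_per_ci_minute=0.008,
        triage_hours_per_day=2.0,
        hourly_rate=120.0,
    )
    for name, dollars in annual_flake_cost(example).items():
        print(f"{name:<8} ${dollars:,.0f}")
```

With these placeholder inputs, engineer triage time rather than compute accounts for nearly all of the total, which is why the two components are worth reporting separately.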

How Autonoma Removes the Maintenance Line Item

The cost and maintenance statistics above document a pattern, not a fixed constraint. Test maintenance consumes 25–50% of QA budgets because test suites are written once and then drift as the codebase changes. Each time a route is renamed, a component is refactored, or a new flow is added, some number of tests go stale. Updating them falls on engineers who were supposed to be building features.

Autonoma was built around eliminating that drift. Rather than relying on humans to keep tests synchronized with code changes, our Diffs Agent analyzes every pull request and determines which test cases need to be added, updated, or deprecated based on what the code actually changed. The four agents handle the full lifecycle: the Planner reads the codebase and plans test cases from routes and components, the Executor runs them in a managed preview environment per PR, the Reviewer classifies each result (real bug, agent error, or test mismatch), and the Diffs Agent runs on every subsequent PR to keep the suite aligned. No one maintains tests manually.

The maintenance percentage stat is the clearest argument for this approach. If 25–50% of your QA budget is going to upkeep, and an autonomous codebase-derived E2E layer like ours removes that upkeep cost, you have a concrete dollar delta to put in front of an engineering director. The statistics exist not just to quantify a problem but to quantify what removing the problem is worth.

Time statistics

QA time lost to flakiness and triage

A 2022 SmartBear survey on test automation found that developers and QA engineers spend on average 30 minutes per day triaging test failures, a significant portion of which are eventually attributed to flaky or stale tests rather than real regressions (SmartBear, "State of Software Quality | Testing" report, smartbear.com, 2022). Over a 250-working-day year, that is 125 hours per engineer; at a fully-loaded rate of $120/hr, it comes to $15,000 per engineer per year in test-triage overhead before a single real bug is resolved.
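The per-engineer triage arithmetic can be checked in a few lines. This is a minimal sketch rather than anything from the SmartBear report: the function name `annual_triage_cost` is ours, and the inputs are the ones used in this article (30 minutes per day, 250 working days, $120/hr).

```python
def annual_triage_cost(
    minutes_per_day: float, working_days: int, hourly_rate: float
) -> float:
    """Annual dollars one engineer spends triaging test failures."""
    hours_per_year = minutes_per_day / 60 * working_days
    return hours_per_year * hourly_rate


if __name__ == "__main__":
    # 30 min/day over 250 days is 125 hours; 125 hours at $120/hr is $15,000.
    print(annual_triage_cost(30, 250, 120))  # 15000.0
```

Swapping in your own triage minutes and hourly rate gives a per-engineer figure you can multiply by headcount.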

Time to write and validate a test

The JetBrains Developer Ecosystem Survey 2023 found that 40% of respondents who write automated tests reported spending more than 20% of their weekly working time on testing tasks, including writing new tests, fixing broken tests, and investigating failures (JetBrains, "The State of Developer Ecosystem 2023," jetbrains.com/lp/devecosystem-2023/, 2023). For a 40-hour work week, that translates to more than 8 hours per week dedicated to testing-related activities.

Time-to-feedback in CI pipelines

The DORA "Accelerate State of DevOps Report 2023" found that elite-performing engineering teams achieve a median change fail rate of less than 5% and recover from failures in under one hour (Google / DORA, "Accelerate State of DevOps Report 2023," cloud.google.com/devops/state-of-devops, 2023). Teams in the lowest performance tier have a median recovery time of between one week and one month. Slow or unreliable test feedback is consistently identified in DORA methodology as one of the primary blockers preventing teams from moving up performance bands.

Test automation adoption statistics

Test automation adoption rate

The Capgemini World Quality Report 2023–24 found that 58% of organizations surveyed had automated more than half of their regression test suite, up from 51% in 2021 (Capgemini / Sogeti / OpenText, "World Quality Report 2023–24," capgemini.com/research/world-quality-report, 2023). The same report found that 26% of organizations had achieved more than 75% automation coverage. Test automation adoption has grown year over year since 2019 across all industry verticals tracked by the report, with the financial services and technology sectors leading.

The GitLab DevSecOps Survey 2023 found that 57% of developers reported that their teams had partially or fully automated their testing processes (GitLab, "2023 Global DevSecOps Report," about.gitlab.com/developer-survey, 2023). That figure is consistent with Capgemini's data and reinforces that a majority of engineering teams now consider some level of automation table stakes, while full automation (greater than 75% of tests automated) remains a minority achievement.

AI-in-testing adoption

The World Quality Report 2023–24 found that 55% of organizations were actively exploring or piloting AI-assisted testing tools, up significantly from 32% in the 2021 edition (Capgemini / Sogeti / OpenText, "World Quality Report 2023–24," capgemini.com/research/world-quality-report, 2023). However, only 18% of respondents described AI-assisted testing as fully integrated into their quality engineering process, indicating that most organizations are still in early or experimental stages of adoption.

The Stack Overflow Developer Survey 2023 found that 70% of respondents were using or planning to use AI tools in their development workflow, though the survey did not break out testing-specific AI adoption separately (Stack Overflow, "2023 Developer Survey," survey.stackoverflow.co/2023, 2023). Among teams actively using AI in development, testing was consistently listed as one of the top three areas where AI assistance was expected to have the greatest productivity impact.

55% of organizations were actively exploring or piloting AI-assisted testing tools in 2023, up from 32% in 2021. Full integration remains a minority outcome at 18%.

Sources and methodology

The statistics in this article are drawn from the following named, dated, publicly available sources. Where a figure is a range, both ends are grounded in source data. No figures were fabricated or synthesized from non-public data.

IBM Systems Sciences Institute / NIST 2002: Defect cost multipliers by development stage. Primary source cited by NIST in "The Economic Impacts of Inadequate Infrastructure for Software Testing" (nist.gov/document/report02-3.pdf, 2002). Original IBM data from the IBM Systems Sciences Institute, published in the 1990s and updated through multiple editions.

CISQ "Cost of Poor Software Quality in the US" 2022: Aggregate cost of poor software quality estimated at $2.41 trillion for the US economy in 2022 (it-cisq.org/the-cost-of-poor-software-quality-in-the-us, 2022). Used for macro-scale anchoring of the defect cost section.

Capgemini / Sogeti / OpenText "World Quality Report" (2019–2023 editions): Source for test maintenance as 25–50% of QA budget, automation adoption rate (58% in 2023), and AI-in-testing adoption (55% exploring/piloting in 2023) (capgemini.com/research/world-quality-report). The report is published annually and draws on a survey of 1,750+ IT executives worldwide.

Google Testing Blog, 2016: "Flaky Tests at Google and How We Mitigate Them" (testing.googleblog.com/2016/05/flaky-tests-at-google-and-how-we.html, 2016). Source for the 1.5% per-run flake rate and 16% of tests flaking in a 30-day window.

SmartBear "State of Software Quality | Testing" 2022: Source for 30-minutes-per-day triage time per engineer (smartbear.com, 2022). Annual survey of software quality practitioners.

JetBrains "State of Developer Ecosystem 2023": Source for 40% of developers spending more than 20% of their time on testing tasks (jetbrains.com/lp/devecosystem-2023/, 2023). Annual survey of 26,000+ developers.

Google / DORA "Accelerate State of DevOps Report 2023": Source for elite team recovery time (under one hour) vs. low-performer recovery time (one week to one month) (cloud.google.com/devops/state-of-devops, 2023). The DORA metrics framework is the industry standard for measuring delivery performance.

GitLab "2023 Global DevSecOps Report": Source for 57% of developers reporting partial or full test automation (about.gitlab.com/developer-survey, 2023). Annual survey of 5,000+ DevOps practitioners.

Stack Overflow "2023 Developer Survey": Source for 70% of respondents using or planning to use AI tools in their development workflow (survey.stackoverflow.co/2023, 2023). Annual survey of 90,000+ developers.

Final thoughts

The testing statistics above are not academic. They map directly to line items in every engineering budget: the production-bug cost informs incident response resourcing; the maintenance percentage determines how much of QA headcount is forward-looking vs. upkeep; the flaky-test CI waste is literally a cloud invoice you can pull today.

The pattern across all of them is consistent. Testing debt compounds. Defects found late cost a multiple of defects found early. Maintenance consumes resources that could fund new coverage. Flaky tests erode confidence in CI signals. And adoption data confirms that most teams have automated some testing, but only about one in four organizations has automated more than 75% of its tests, and fewer than one in five describe AI-assisted testing as fully integrated.

The three statistics that matter most in a budget conversation are: the production-bug cost multiplier (15–100x the cost of early detection, per IBM/NIST), the maintenance percentage (25–50% of QA budgets going to upkeep), and the flaky-test annual cost (modeled at roughly $120,000 for a 20-engineer team). Together they make a case for investing in automated, self-maintaining coverage rather than growing a manual QA headcount. The return-on-investment math behind that case is worked through in the test automation ROI breakdown.

Autonoma addresses all three. Our agents generate tests from the codebase directly (no recording, no scripting), the Diffs Agent maintains the suite automatically on every PR (eliminating the 25–50% maintenance line), and our Reviewer agent distinguishes real bugs from agent errors and flakes (reducing the triage burden). The statistics quantify what the problem costs. We built Autonoma to remove those costs entirely.

FAQ

Approximately 58% of organizations had automated more than half of their regression test suite as of 2023, according to the Capgemini World Quality Report 2023–24. The GitLab DevSecOps Survey 2023 found a consistent figure of 57% of developers reporting partial or full test automation. Full automation coverage (above 75%) was achieved by only 26% of organizations in the Capgemini data.

Fixing a defect discovered in production costs 15x to 100x more than fixing the same defect during the requirements or design phase, according to IBM Systems Sciences Institute data as cited by the US National Institute of Standards and Technology (NIST) in their 2002 report 'The Economic Impacts of Inadequate Infrastructure for Software Testing.' At a typical fully-loaded engineer rate of $120–$150/hr, a single production defect requiring 8 engineer-hours to diagnose, fix, deploy, and verify costs $960–$1,200 in direct labor alone, before accounting for downtime, customer impact, or incident response overhead.

Test maintenance accounts for 25–50% of total QA budget in teams with mature automation suites, according to multiple editions of the Capgemini / Sogeti / OpenText World Quality Report (2019–2023). The figure reaches the high end of the range in teams with rapidly iterating products. The 2021 edition of the World Quality Report found that 36% of respondents cited test maintenance and updates as one of their top three QA challenges.

Google's engineering team reported in 2016 that approximately 16% of their tests had flaked at least once in a given 30-day period, and roughly 1.5% of individual test runs produced a non-deterministic result (Google Testing Blog, 2016). In dollar terms, a 12% flake rate on a 500-minute-per-day CI pipeline for a 20-engineer team generates approximately $120,000 in annual cost when CI-minute compute and engineer triage hours are modeled together, per the Autonoma flaky-test cost model.

The most current published QA statistics as of 2026 come from the Capgemini World Quality Report 2023–24 (test automation adoption at 58%, AI-in-testing exploration at 55%), the DORA State of DevOps Report 2023 (elite teams recover from failures in under one hour), the JetBrains Developer Ecosystem 2023 (40% of developers spend more than 20% of work time on testing), and the CISQ Cost of Poor Software Quality 2022 report ($2.41 trillion annual cost of poor software quality in the US). These are the attributed figures used throughout this article.
