Questions About the “Cost of Poor Data Quality” : The Practitioner's Guide to Data Quality Improvement

Questions About the “Cost of Poor Data Quality”

July 25, 2011 by
Filed under: Data Quality, Performance Measures 

I was reading through one of Jim Harris’s blog entries about his reinterpretation of Pascal’s wager in terms of data quality, and the posting made reference to an email he had received from Gordon Hamilton about the estimated costs of poor information quality. I noted that Richard Ordowich’s comments from the LinkedIn group were incorporated regarding the claims that 15-45% of the operating expense of virtually all organizations is WASTED due to data quality issues, so I thought it would be worth spending a little time exploring the sources for some of the more popular claims about the costs of poor data quality.

But just in case you are interested, I spend a lot of time in my book talking about assessing the real opportunities for increased value from data quality improvement.

First off, I came across a posting referring to Larry English’s recent book that had the quote “Poor information quality costs organizations 20-35% of operating revenue wasted in recovery from process failure and information scrap and rework.”

Actually, additional context is provided in this quote, but the emphasis is mine: “In the Costs of Poor Quality Information analyses that we have conducted, combined with the anecdotal evidence we have collected over the past twenty years, the evidence is clear. The Costs of Poor Quality Information as a percent of operating revenue or budget (for government and not-for-profit) is roughly equivalent to the costs of poor quality in the manufacturing and service sectors.”

In 2003, TDWI produced a report that is often quoted, estimating “that data quality problems cost U.S. businesses more than $600 billion a year.” Actually, when you read the report and look at the footnotes, you see them qualify that statement: “TDWI estimate based on cost-savings cited by survey respondents and others who have cleaned up name and address data, combined with Dunn (sic) & Bradstreet counts of U.S. businesses by number of employees.” So in fact, that figure is really an extrapolation from survey respondents, who may be “self-selected” to some extent; it might be interesting to go back and review the survey responses.

Further back in time (2002), we see Tom Redman claim that “Poor data quality costs the typical company up to twenty percent of revenue.”

Earlier, in 1999, Larry English’s previous book suggests that the costs are actually lower: “Based on numerous cost analyses, the typical organization may see from 15 to 25 percent of its revenue go to pay the costs of information scrap and rework.”

However, in an article published a year before that book was released, we see his claim that “If early data assessments are an indicator, the business costs of non-quality data, including non-recoverable costs, rework of products and services, workarounds, lost and missed revenue, may be as high as 10-20 percent of revenue or total budget of an organization.”

And Tom Redman’s 1998 article in Communications of the ACM comments that his “article would be enhanced with an estimate of the total cost of poor data quality, but studies to produce such estimates have proven difficult to perform.” However, he then notes that he is “aware of three proprietary studies that yielded estimates in the 8–12% of revenue range.”

So where are we? Here are some additional qualifying notes: First of all, I am a firm believer that the value gap attributable to poor data quality is real and can be estimated; see my series of articles; here is a link to one specific paper. I think that if effort is invested in understanding where value is impacted as a result of data issues, you can estimate the value of improvements.

However (and second), I do want to point out that I am not biasing my research here – I am quoting directly from published sources. Third, there are a lot of papers and articles (including my own) with less-refined methods that suggest scenarios in which one could estimate a cost but do not show hard numbers. I am definitely open to notes regarding actual costs, savings, and added value. Fourth, there are some more academic attempts to collect a bunch of theories and provide a unified approach to estimating costs, if you are willing to invest the time to read through them.

From the sources I found, all of whom are reputed to be experts in the data quality space, we can conclude the following:

1)      There are few (if any) published papers on actual case studies providing tangible details about the cost of poor data quality.

2)      The academic notes and books that do exist and attempt to suggest the costs of poor data quality base their numbers on estimates, “proprietary studies,” accumulations from survey responses, or extrapolation from other estimates of the “cost of quality.”

3)      Even in the absence of tangible evidence of actual costs, according to the experts willing to state cost estimates, the costs seem to be rising, from a low of 8% of revenue in 1998 to a high of 35% of operating revenue in 2009 (does that mean that the costs of poor data quality more than quadrupled over roughly a ten-year period?).
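The rhetorical question in item 3 comes down to simple arithmetic. Here is a minimal sketch in Python that uses only the percentages quoted above. The function names are illustrative, and the like-for-like pairings (low end to low end, high end to high end) are an assumption about how the quoted ranges might fairly be compared, not something any of the cited sources propose:

```python
def relative_increase(old_pct, new_pct):
    """Percent increase going from old_pct to new_pct."""
    return (new_pct - old_pct) / old_pct * 100.0


def multiple(old_pct, new_pct):
    """How many times larger new_pct is than old_pct."""
    return new_pct / old_pct


# Figures quoted in the post (percent of revenue / operating revenue).
low_1998, high_1998 = 8, 12    # Redman's "three proprietary studies"
low_2009, high_2009 = 20, 35   # English's more recent book

# Comparison used in item 3: lowest early figure vs. highest recent figure.
print(f"8% -> 35%: {multiple(low_1998, high_2009):.3f}x, "
      f"an increase of {relative_increase(low_1998, high_2009):.1f}%")

# Assumed like-for-like comparisons of the same two ranges.
print(f"low to low   (8% -> 20%):  +{relative_increase(low_1998, low_2009):.1f}%")
print(f"high to high (12% -> 35%): +{relative_increase(high_1998, high_2009):.1f}%")
```

Comparing the bottom of one range with the top of another overstates the apparent growth. Even so, any of these comparisons implies a steep rise, and none of them rests on published, reviewable case data.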

As I mentioned before, I am very open to suggestions about actual case studies or reports that provide researchable numbers (that means the numbers are published and can be reviewed) for evaluating the costs of poor data quality. Having access to these types of articles, reports, etc. will enable people like me to refine our approaches to evaluating the value of data quality improvement and help us come up with a model that defines a clear return on your data quality investment.


3 Comments on Questions About the “Cost of Poor Data Quality”

  1. Tom Lovell on Mon, 8th Aug 2011 9:34 AM

    This is the $64 Billion question (inflation). I have done research in this area and know that if it were as easy as providing a number we wouldn’t be talking about the cost so much. My belief is that the cost/benefit is industry, role in that industry, and audience specific. Therefore talking about the cost with a CFO for a Retailer is a completely different conversation than speaking to a Supply Chain Person at a Foodservice Manufacturer.

    I do find that the Data Crunches produced by GS1 & IBM have created some interesting sets of costs/benefits for the UK, Australia, and India when it comes to Product Data.

    […] his recent blog post Questions about the “Cost of Poor Data Quality”, David Loshin examined a common characteristic of estimates about the costs of poor data quality, […]

    […] was nice to see that Jim Harris referred to my earlier post questioning the experts’ pronouncements of the costs of poor data quality, and it triggered yet another thought about the perception of the value of data quality improvement […]