I made up that word, and it stands for “funding conundrum.” I actually started thinking about this after my other recent post on data quality and dieting. It is the problem of maintaining continuity in good data quality practices long after the original issues are gone.
Let’s say that our company is directly impacted by a number of data quality issues where errors lead to unexpected or undesirable results. After quantifying the negative impact and doing a cost/benefit analysis for an investment in data quality consultation, processes, and tools, we are able to determine that there is value in finding and fixing the failed processes that introduce errors into the environment. We are good to go…
I was scanning the IAIDQ group listings at LinkedIn and came across a reference to a blog posting about the “Top 10 Reasons Data Quality Projects Fail.” I read the blog posting, and while it refers to data quality projects that “fail,” it does not specify what is meant by “failure.” The list details some technical and/or operational aspects of not completing tasks associated with data quality techniques, but it is difficult to say that a project has failed unless one can precisely define what is meant by success. And success has got to be clearly defined in terms of the degree to which the data meets expectations for achieving business objectives.
Anyone who regularly reads this web site as well as my other media outlets knows that I am an advocate of clearly defining measures for assessing how poor data quality impacts the business. In fact, one of the main challenges of establishing a data quality program is effectively communicating the value of improved data quality to senior managers. You might think that showing some specific failures and their impacts would be enough to make the argument to invest in improvements, but unfortunately this is often not the case.
Management’s attention is more dramatically grabbed by catastrophic events. Acute disasters linked to data issues, ongoing horror stories, and even entertaining anecdotes will probably resonate with many people in the organization because of the drama as well as the chance for motivated individuals to react to the problem with a heroic effort that appears to save the day.
I have just finished a paper sponsored by Informatica titled “Understanding the Financial Value of Data Quality Improvement,” which looks at bridging the communications gap between technologists and business people regarding the value of data quality improvements. Here is a summary:
As opposed to the technical aspects of data validation and cleansing, often the biggest challenge in beginning a data quality program is effectively communicating the business value of data quality improvement. But using a well-defined process for considering the different types of costs and risks of low-quality data not only provides a framework for putting data quality expectations into a business context, it also enables the definition of clear metrics linking data quality to business performance. For example, it is easy to speculate that data errors impede up-selling and cross-selling, but to really justify the need for a data quality improvement effort, a more comprehensive quantification of the number of sales impacted or of the total dollar amount for the missed opportunity can be much more effective at showing the value gap.
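To make the point concrete, here is a minimal sketch of the kind of quantification the paragraph describes: turning “data errors impede cross-selling” into a dollar figure. All of the function name, parameters, and input figures are invented for illustration; they are not taken from the paper.

```python
# Hypothetical illustration (all figures invented): estimating the
# "value gap" from data errors that block cross-sell offers.

def missed_opportunity_value(total_prospects, error_rate,
                             conversion_rate, avg_deal_value):
    """Estimate annual revenue lost to records too flawed to act on."""
    blocked_prospects = total_prospects * error_rate   # offers never sent
    lost_sales = blocked_prospects * conversion_rate   # sales that would have closed
    return lost_sales * avg_deal_value                 # dollars left on the table

# Assumed inputs, for illustration only:
gap = missed_opportunity_value(
    total_prospects=500_000,   # customers eligible for a cross-sell offer
    error_rate=0.04,           # fraction with unusable contact/account data
    conversion_rate=0.02,      # expected offer acceptance rate
    avg_deal_value=250.0,      # average revenue per accepted offer
)
print(f"Estimated annual value gap: ${gap:,.0f}")  # prints $100,000
```

A single dollar figure like this, even a rough one, frames the data quality conversation in business terms rather than in counts of bad records.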
This article looks at different classifications of financial impacts and corresponding performance measures that enable a process for evaluating the relationship between acceptable performance and quality information. It is targeted at analysts looking to connect high-quality information with optimal business performance in order to make a quantifiable case for data quality improvement.
Here is an old riddle: what is the difference between major surgery and minor surgery? It’s minor surgery when it happens to *you* but it’s major surgery when it happens to *me*.
How often is the same principle applied when it comes to tolerance of data errors? Let’s change the joke a little: what is the impact when one customer’s identity has been compromised and other people’s charges show up on that customer’s statement? It is a minimal statistical blip if you are the credit card company, but it can be devastating if you are the customer. Admittedly, the joke is not so funny anymore, but let’s use it to consider the tolerance levels for data errors.
Presume that some internal process for identity resolution incorrectly linked two individual records and merged them into a single representation. This might account for other people’s charges showing up on one customer’s statement. From the customer’s perspective, this could be a catastrophe – without an understanding of the root cause of the problem, this could indicate identity theft, triggering a significant amount of work to shut down credit card accounts, change bank accounts, contact the many businesses that were charging those closed credit cards to change the automatic billing, contact the credit scoring agencies, and so on. In other words, this is a significant effort.
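The kind of false merge described above is easy to produce with an over-aggressive matching rule. The sketch below is hypothetical – the records, the rule, and the helper functions are all invented for illustration – but it shows how “same name, same ZIP” can collapse two distinct customers into one statement.

```python
# Minimal sketch (invented records and rule) of how a naive identity
# resolution match can wrongly merge two distinct customers.

records = [
    {"id": 1, "name": "J. Smith", "zip": "10001", "charges": [42.50]},
    {"id": 2, "name": "J. Smith", "zip": "10001", "charges": [980.00]},
]

def naive_match(a, b):
    # Over-aggressive rule: same name + same ZIP => assumed same person.
    return a["name"] == b["name"] and a["zip"] == b["zip"]

def merge(a, b):
    # Collapses both records into one, pooling their charges.
    return {"id": a["id"], "name": a["name"], "zip": a["zip"],
            "charges": a["charges"] + b["charges"]}

merged = merge(*records) if naive_match(*records) else None
print(merged["charges"])  # both customers' charges land on one statement
```

Real identity resolution uses far richer attributes and scoring, but any rule that links records has some nonzero false-positive rate – and each false positive is one of these merged statements.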
From the credit card company’s perspective, if it has 20 million customers and one customer’s statement is out of whack, that is but a minuscule fraction of its total number of customers. In fact, if that same company had 100,000 erroneous statements, that is still only one-half of 1% – still a statistical drop in the bucket.
Impact on revenue? Minuscule.
Impact on analytics? Lost in the wash.
Impact on risk of exposure of private data? More of a public relations thing.
Impact on customer satisfaction? OK, this might be an issue.
Actually, think about each of these impact vectors – they all refer to different considerations in the business. Note that the tolerance from a revenue generation perspective is much greater than the tolerance from the customer satisfaction view, which is also different from the privacy and security arena.
But who in the company sets the standard? A recent conversation with some folks at a conference suggested that what is tolerable to one business function (say, analytics) is less tolerable to another (such as security or compliance). I would be curious to spend some time with the leaders of a set of individual business functions and solicit their opinions about their tolerance for data errors. I suspect that their tolerance levels reflect their own views on the risk/reward of data validation, as well as their own “pareto point” – the level of effort beyond which diminishing returns set in.
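The idea of function-specific tolerance can be sketched as a simple table of thresholds. The threshold values here are invented purely to illustrate the point; the error rate is the one-half of 1% from the credit card example.

```python
# Hypothetical sketch: one error rate judged against different tolerance
# thresholds set by each business function (all threshold values invented).

tolerances = {
    "revenue":               0.01,    # 1% bad records barely dents revenue
    "analytics":             0.005,   # aggregates absorb small error rates
    "customer satisfaction": 0.0001,  # each affected customer may be lost
    "privacy/compliance":    0.0,     # a single exposure may be intolerable
}

error_rate = 100_000 / 20_000_000    # 0.5%: the credit card example above

for function, threshold in sorted(tolerances.items(), key=lambda kv: kv[1]):
    verdict = "acceptable" if error_rate <= threshold else "unacceptable"
    print(f"{function}: {verdict}")
```

The same 0.5% comes out “acceptable” for revenue and “unacceptable” for customer satisfaction and compliance – which is exactly why no single business function can set the standard alone.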