I Know You’re in There…

May 17, 2011 by
Filed under: Data Quality 

Sorry for the long hiatus – travel has gotten the best of me. But I have been thinking a lot about an issue and thought it would be worth sharing: predictability of errors.

Let’s do a little thought experiment. Let’s say we are managing a product catalog that is composed of part and product items submitted by a set of suppliers. Of course, most of our suppliers regularly update their lists of products. Some old products are dropped, some new products are introduced, and some product entries are updated. Our suppliers like to make sure that our view of their products is up to date as well, so they provide updates on a semi-regular basis. In addition, some of the updates are provided because of errors in the existing records.

While there are different reasons for updates, I can monitor the frequency with which updates are made, and I can also track the types of updates that are made (deletes, new products, and corrections). And given those numbers, I should (over time) be able to profile specific quality characteristics of my product catalog, particularly with respect to currency and correctness.
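To make that monitoring concrete, here is a minimal sketch of how the update counts could be tallied. The log format, the supplier names, and the `profile_updates` function are all hypothetical, made up for illustration; a real catalog would pull this from whatever change history it keeps.

```python
from collections import Counter, defaultdict

# Hypothetical update log: one (supplier, month, update_type) tuple per change.
# The three update types mirror the ones named above.
UPDATE_LOG = [
    ("acme", "2011-03", "correction"),
    ("acme", "2011-03", "new"),
    ("acme", "2011-03", "correction"),
    ("acme", "2011-04", "delete"),
    ("bolt", "2011-03", "new"),
]


def profile_updates(log):
    """Count updates per (supplier, month), broken down by update type."""
    profile = defaultdict(Counter)
    for supplier, month, update_type in log:
        profile[(supplier, month)][update_type] += 1
    return profile


profile = profile_updates(UPDATE_LOG)
for (supplier, month), counts in sorted(profile.items()):
    print(supplier, month, dict(counts))
```

Dividing the correction count by the supplier's total record count for the month gives the correction rate discussed next.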

Simply put, I will know that if a supplier corrects 3% of their records every month, that establishes two data points. First, it says that the data is at its most current immediately after the update (duh). Second, it says that by the end of every month (before their corrections) I would expect that 3% of their records are incorrect (also duh).
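The arithmetic behind that second point can be sketched in a few lines. The function names are hypothetical, and the mid-cycle estimate rests on an assumption the post does not make: that errors accumulate evenly between updates.

```python
def expected_incorrect(record_count, correction_rate):
    """Expected number of incorrect records just before the next update."""
    return record_count * correction_rate


def expected_incorrect_so_far(record_count, correction_rate,
                              days_since_update, days_in_cycle=30):
    """Estimate partway through a cycle.

    Assumption (not from the post): errors accrue linearly over the cycle,
    starting from zero right after the supplier's update.
    """
    fraction_elapsed = min(days_since_update / days_in_cycle, 1.0)
    return record_count * correction_rate * fraction_elapsed


# 1,000 records at a 3% monthly correction rate: roughly 30 bad records
# by month end, and roughly 15 halfway through under the linear assumption.
print(round(expected_incorrect(1000, 0.03), 2))
print(round(expected_incorrect_so_far(1000, 0.03, days_since_update=15), 2))
```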

But that second expectation has some more important subtleties. For example, while I reasonably expect that 3 out of 100 records are incorrect, I don’t know which ones they are. Also, if the historical trend is 3% and next month the supplier’s correction rate is only 2%, then perhaps the supplier missed some errors that still exist in the data set.
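One rough way to put a number on that suspicion is to compare the latest correction rate against the historical average. This is only a sketch: the `suspected_residual_errors` name is hypothetical, and a plain mean is a simplification, since a lower rate could also mean the supplier's data genuinely got better.

```python
from statistics import mean


def suspected_residual_errors(historical_rates, latest_rate, record_count):
    """Estimate how many errors may have gone uncorrected this cycle.

    Uses the mean of past correction rates as the baseline (an assumption).
    Returns 0.0 when the latest rate meets or exceeds that baseline.
    """
    baseline = mean(historical_rates)
    shortfall = max(0.0, baseline - latest_rate)
    return shortfall * record_count


# A supplier who usually corrects 3% but corrected only 2% of 1,000 records
# this month: roughly 10 records may still carry uncorrected errors.
print(round(suspected_residual_errors([0.03, 0.03, 0.03], 0.02, 1000), 2))
```

The estimate says how many records to worry about, not which ones, which is exactly the subtlety described above.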

The idea that I can be aware of the existence of errors suggests that there might be strategies to address the situation. More on this in an upcoming post…


One Comment on I Know You’re in There…

  1. The Persistence of Error | The Data Roundtable on Tue, 2nd Aug 2011 10:03 AM
     […] few weeks ago I posted a blog entry about analyzing the existence of errors in a data set as a way of anticipating data quality flaws/failures prior to their incurring any business impact. […]
