Lather, Rinse, Repeat, Repeat, Repeat, …? Repetitive Data Correction
In a recent discussion with a client, I was told about a situation in which there is a flip-flopping of automated data corrections. One day a record is identified as having an error (as part of an identity resolution process); the matching records are compared, and a survival rule is applied that essentially deletes the old record and creates a new one. The next day, the new record is determined to be in error, again as part of a matching process, and a different survival rule is applied that, for all intents and purposes, reverts the record to its original form.
This has become commonplace in the organization, so much so that they are already aware of these repeat offenders and can track how many corrections are being done for the first time and how many have been done before.
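Tracking repeat offenders like this can be as simple as counting how often each record has been corrected. A minimal sketch (the record ids and the log structure are hypothetical, purely for illustration):

```python
# Hypothetical log of applied corrections: one record id per correction event.
from collections import Counter

correction_log = [17, 23, 17, 42, 17, 23]
counts = Counter(correction_log)

# Records corrected exactly once vs. the repeat offenders.
first_time = [rid for rid, n in counts.items() if n == 1]
repeat_offenders = {rid: n for rid, n in counts.items() if n > 1}

print(first_time)        # [42]
print(repeat_offenders)  # {17: 3, 23: 2}
```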
One might call the automation into question: how can it continue to go back and forth like that every day? I think there is a deeper issue involved having to do with the way the data is collected. For some reason a correction rule is triggered by some set of value combinations, but the rule-based correction has not been properly vetted. The result is that the corrected version still does not comply with some other set of expectations.
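The flip-flop can be sketched concretely. Below is a minimal, hypothetical example (the field names and rules are invented, not from the client's system) in which two independently written survival rules each treat the other's "corrected" output as an error, so the record oscillates indefinitely:

```python
# Rule A: a matching process that treats the two-letter state code as an
# error and "corrects" it to the full state name.
def rule_a(record):
    if record["state"] == "NY":
        return {**record, "state": "New York"}
    return record

# Rule B: a different matching process that expects two-letter codes and
# "corrects" the full name back to the abbreviation.
def rule_b(record):
    if record["state"] == "New York":
        return {**record, "state": "NY"}
    return record

# Alternate the two processes day by day, as in the client's scenario.
rules = [rule_a, rule_b]
record = {"id": 17, "state": "NY"}
history = [record["state"]]
for day in range(4):
    record = rules[day % 2](record)
    history.append(record["state"])

# The record never reaches a stable state; each correction is undone.
print(history)  # ['NY', 'New York', 'NY', 'New York', 'NY']
```

Because neither rule was vetted against the other's expectations, each one's output re-triggers its counterpart, which is exactly the repetitive correction pattern described above.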
Recognizing repetitive correction indicates opportunities for increasing the maturity of data quality management. Relying on automation is good, but less so if checks and balances are not in place to validate the applied rules.