Filed under: Business Intelligence, Business Rules, Data Quality
The other day I had a conversation about product master data, and one of the participants, almost as an aside, mentioned the concept of a “virtual product.” More specifically, he was referring to an operational context in which a maintenance team needed to look for a type of part to replace an existing worn machine part. The curious aspect of this was that they were not looking for a specific part. Rather, they needed to describe the characteristics of the part and then see which available parts matched those characteristics. If none were available, they’d either need to create a new one or search other suppliers for a matching part.
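To make the idea concrete, here is a minimal sketch of how such a lookup might work, assuming a part catalog held in memory. The `Part` class, the attribute names, and the sample catalog are all hypothetical illustrations of mine, not anything the maintenance team described. The "virtual product" is just a set of required characteristics, where a value is either an exact match or an inclusive (low, high) range.

```python
from dataclasses import dataclass, field


@dataclass
class Part:
    """A hypothetical catalog entry; the attribute names are illustrative only."""
    part_id: str
    supplier: str
    attributes: dict = field(default_factory=dict)


def matches(part, required):
    """Return True when the part satisfies every required characteristic."""
    for name, wanted in required.items():
        actual = part.attributes.get(name)
        if actual is None:
            return False
        if isinstance(wanted, tuple):
            # A tuple is read as an inclusive (low, high) range.
            low, high = wanted
            if not (low <= actual <= high):
                return False
        elif actual != wanted:
            return False
    return True


def find_candidates(catalog, required):
    """Return all catalog parts that fit the described characteristics."""
    return [part for part in catalog if matches(part, required)]


catalog = [
    Part("P-100", "in-house", {"type": "bearing", "bore_mm": 25.0, "material": "steel"}),
    Part("P-200", "in-house", {"type": "bearing", "bore_mm": 30.0, "material": "steel"}),
    Part("S-900", "supplier-a", {"type": "bearing", "bore_mm": 25.1, "material": "ceramic"}),
]

# The "virtual product": a description of characteristics, not a part number.
virtual_product = {"type": "bearing", "bore_mm": (24.9, 25.2)}
candidate_ids = [part.part_id for part in find_candidates(catalog, virtual_product)]
```

An empty result from `find_candidates` corresponds to the case where no available part fits, and the team would either create a new part or widen the search to other suppliers.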
I had done some talks on health care business intelligence and data quality. This morning I was pointed to a short article about health care and data in which the claim was made that the “health care field is fertile ground for semantic tech.” In the article, a reference to NIEM (the National Information Exchange Model) was made suggesting a “greater use of semantics.” Former Fed and now CTO at Accelerated Information Management Michael Daconta is quoted as saying that “the federal government’s ‘meaningful use’ directive, which focuses on the adoption of electronic health records, calls for decision support.”
Filed under: Business Rules, Data Profiling, Data Quality, Metrics
Yesterday our company was approached to provide a proposal for a data quality assessment project as part of a more comprehensive data quality assurance effort. When we get these types of requests, I am always amused by the fact that key pieces of information necessary for determining the amount of work are missing. We typically have some basic questions in order to scope the level of effort, including:
• What data sets are to be used as the basis for analysis?
• How many tables?
• How many data elements?
• How many records in each table?
• Are reference data sets available for the common value domains?
• How many business processes source data into the target data set?
• How many processes use the data in the target data set?
• What documentation is available for the data sets and the business processes?
• What tools are in place to analyze the data?
• Will the client provide access to the sources for analysis?
• How is the organization prepared to take actions based on the resultant findings?
In general, I like to think that my company is pretty good at doing these types of assessments – of course, I wrote the book (or at least, a book) on the topic.
In a recent discussion with a client, I was told about a situation in which there is a flip-flopping of automated data corrections. One day a record is identified as having an error (as part of an identity resolution process), the matching records are compared and a survival rule is applied that essentially deletes the old record and creates a new record. The next day, the new record is determined to be in error, again as part of a matching process, and a different survival rule is applied that, for all intents and purposes, reverts the record back to its original form.
This has become commonplace in the organization, so much so that they are already aware of these repeat offenders and can track how many corrections are done for the first time and how many have been done before.
One might call the automation into question – how can it continue to go back and forth like that every day? I think there is a deeper issue involved, having to do with the way the data is collected. For some reason, a correction rule is triggered by some set of value combinations, but the rule-based correction has not been properly vetted. The result is that the corrected version still does not comply with some set of expectations.
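One way to picture what vetting a correction rule could involve is the sketch below. It is my own illustration, not the client's process: the function `vet_correction`, the expectation names, and the sample records are all hypothetical. The idea is to run a rule over sample records and flag any case where the corrected record still violates an expectation, or where applying the rule a second time changes the record again.

```python
def vet_correction(rule, expectations, samples):
    """Return the samples whose corrected form is non-compliant or unstable.

    rule:         function taking a record (dict) and returning the corrected dict
    expectations: mapping of expectation name -> predicate over a record
    samples:      list of records to try the rule against
    """
    failures = []
    for record in samples:
        corrected = rule(dict(record))  # work on a copy of the sample
        broken = [name for name, check in expectations.items() if not check(corrected)]
        # A vetted rule should leave an already-corrected record unchanged.
        stable = rule(dict(corrected)) == corrected
        if broken or not stable:
            failures.append({"input": record, "output": corrected,
                             "failed": broken, "stable": stable})
    return failures


# Hypothetical expectations for an address record.
expectations = {
    "zip_is_5_digits": lambda r: len(r["zip"]) == 5 and r["zip"].isdigit(),
    "state_is_2_letters": lambda r: len(r["state"]) == 2 and r["state"].isalpha(),
}


def normalize_state(record):
    """A rule whose output meets the expectations for these samples."""
    record["state"] = record["state"].strip().upper()
    return record


def expand_state(record):
    """An unvetted rule: its output conflicts with the two-letter expectation."""
    record["state"] = {"MD": "Maryland"}.get(record["state"], record["state"])
    return record


samples = [{"zip": "20901", "state": "md "}, {"zip": "20901", "state": "MD"}]

good_failures = vet_correction(normalize_state, expectations, samples)
bad_failures = vet_correction(expand_state, expectations, samples)
```

A rule like `expand_state` is the kind that would feed a flip-flop: its output fails another expectation, so a different rule is likely to undo it the next day.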
Recognition of repetitive correction indicates opportunities for increasing the levels of maturity for data quality management. Relying on automation is good, but less so if checks and balances are not in place to validate the applied rules.
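As a rough illustration of how repeat corrections might be counted, the sketch below assumes a chronological correction log of (record id, value before, value after) entries. The log format and the name `classify_corrections` are my assumptions, not the client's actual system. A correction counts as a repeat when it puts a record back into a state that the record has already held.

```python
from collections import defaultdict


def classify_corrections(log):
    """Split corrections into first-time fixes and reversions to an earlier state.

    log: iterable of (record_id, before, after) tuples in chronological order.
    Returns (counts, repeat_ids).
    """
    seen_states = defaultdict(set)  # record_id -> every state seen so far
    counts = {"first_time": 0, "repeat": 0}
    repeat_ids = set()
    for record_id, before, after in log:
        history = seen_states[record_id]
        history.add(before)
        if after in history:
            # The "correction" restores a value the record already had.
            counts["repeat"] += 1
            repeat_ids.add(record_id)
        else:
            counts["first_time"] += 1
        history.add(after)
    return counts, repeat_ids


# Hypothetical log: cust-1 flips back and forth, cust-2 is corrected once.
log = [
    ("cust-1", "J. Smith", "John Smith"),
    ("cust-2", "ACME", "Acme Corp"),
    ("cust-1", "John Smith", "J. Smith"),
    ("cust-1", "J. Smith", "John Smith"),
]

counts, repeat_ids = classify_corrections(log)
```

The set of repeat offenders is a natural starting list for reviewing which pair of survival rules is fighting over the same records.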
Filed under: Business Rules, Data Profiling, Data Quality, Metrics, Performance Measures
Data profiling can be an excellent approach to identifying latent issues and errors hidden in your data. We have seen a number of clients using data profiling as the first step in defining data quality metrics and using those metrics for reporting via scorecards and dashboards.
And if I can identify a problem and I can define a rule for determining that the problem exists, should I not be able to fix the problem? Here is a question, though: once I fix the root cause of the problem, do I still need to keep checking whether the problem has occurred?
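To show what turning a profiling finding into a metric might look like, here is a small sketch. The rule, the batch names, and the 95% threshold are hypothetical choices of mine. A rule is a predicate over a record, the metric is the share of records that pass, and the same check can keep running after the root cause is fixed.

```python
def conformance(records, rule):
    """Return the fraction of records that satisfy the rule (1.0 for an empty batch)."""
    if not records:
        return 1.0
    passed = sum(1 for record in records if rule(record))
    return passed / len(records)


def quantity_is_non_negative(record):
    """Hypothetical rule derived from a profiling finding: no negative quantities."""
    return record["qty"] >= 0


# Hypothetical batches loaded before and after a root-cause fix.
batches = {
    "before_fix": [{"qty": 5}, {"qty": -1}, {"qty": 3}, {"qty": -2}],
    "after_fix": [{"qty": 4}, {"qty": 7}],
}

THRESHOLD = 0.95  # assumed acceptability level for the scorecard

scorecard = {name: conformance(rows, quantity_is_non_negative)
             for name, rows in batches.items()}
breaches = [name for name, score in scorecard.items() if score < THRESHOLD]
```

This does not settle the question of whether the check is still needed after the fix; it only shows that once the rule exists, keeping it on the scorecard is cheap.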
More on this in an upcoming post; contact me if you have thoughts…