Using Data Replication to Enable Operational Synchronization

In my last post, we looked at some common use cases for operational synchronization. Each of those examples was effectively an abstraction of a scenario in which there is benefit in establishing consistency and currency among logically or physically distinct data assets. For example, creating a holistic and complete view of shared data entities is critical to any distributed master data management repository or distributed identity management service.

Partial Entity Resolution

The other day I had a conversation about product master data, and one of the participants, almost as an aside, mentioned the concept of a “virtual product.” More specifically, he was referring to an operational context in which a maintenance team needed to look for a type of part to replace an existing worn machine part. The curious aspect was that they were not looking for a specific part. Rather, they needed to describe the characteristics of the part and then see which available parts matched those characteristics. If none were available, they would either need to create a new one or search other suppliers for a matching part.
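To make that concrete, here is a minimal sketch (the part attributes and catalog entries are invented for illustration) of what searching by described characteristics, rather than by a specific part number, might look like:

```python
# Candidate parts on hand, each described by a set of attributes (hypothetical data).
catalog = [
    {"part_no": "B-1021", "type": "bearing", "bore_mm": 25, "material": "steel"},
    {"part_no": "B-2087", "type": "bearing", "bore_mm": 30, "material": "steel"},
    {"part_no": "G-0440", "type": "gasket",  "bore_mm": 25, "material": "rubber"},
]

# The "virtual product": a description of the needed part, not an actual part.
needed = {"type": "bearing", "bore_mm": 25, "material": "steel"}

# Keep only the catalog parts that satisfy every described characteristic.
matches = [
    part for part in catalog
    if all(part.get(attr) == value for attr, value in needed.items())
]

if matches:
    print("candidate replacements:", [p["part_no"] for p in matches])
else:
    print("no match in stock; create a new item or search other suppliers")
```

The "needed" dictionary plays the role of the virtual product: it exists only as a description until a matching physical part is found or created.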


Standards for “Meaningful Use” Before Semantics

March 23, 2011 by · 2 Comments
Filed under: Business Rules, Data Governance, Recommendations 

I have given some talks on health care business intelligence and data quality. This morning I was pointed to a short article about health care and data in which the claim was made that the “health care field is fertile ground for semantic tech.” The article referenced NIEM (the National Information Exchange Model) as suggesting a “greater use of semantics.” Michael Daconta, a former federal government official and now CTO at Accelerated Information Management, is quoted as saying that “the federal government’s ‘meaningful use’ directive, which focuses on the adoption of electronic health records, calls for decision support.”


Data Quality Profiling and Assessment – Some Questions for the Client

March 18, 2011 by · Leave a Comment
Filed under: Business Rules, Data Profiling, Data Quality, Metrics 

Yesterday our company was approached to provide a proposal for a data quality assessment project as part of a more comprehensive data quality assurance effort. When we get these types of requests, I am always amused by the fact that key pieces of information necessary for determining the amount of work are missing. We typically ask some basic questions in order to scope the level of effort (a small profiling sketch follows the list), including:

• What data sets are to be used as the basis for analysis?
• How many tables?
• How many data elements?
• How many records in each table?
• Are reference data sets available for the common value domains?
• How many business processes source data into the target data set?
• How many processes use the data in the target data set?
• What documentation is available for the data sets and the business processes?
• What tools are in place to analyze the data?
• Will the client provide access to the sources for analysis?
• How is the organization prepared to take actions based on the resultant findings?
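
Several of these questions (record counts, element counts, and the common values in each element) can be answered with a quick first-pass profile. A minimal sketch, assuming pandas and a hypothetical single-table CSV extract:

```python
import pandas as pd

# Hypothetical extract of one table supplied by the client.
df = pd.read_csv("customer_extract.csv")

print("records:", len(df))
print("data elements:", len(df.columns))

# Basic column-level statistics: completeness, cardinality, frequent values.
for column in df.columns:
    series = df[column]
    print(
        f"{column}: {series.isna().sum()} nulls, "
        f"{series.nunique()} distinct values, "
        f"top values: {series.value_counts().head(3).to_dict()}"
    )
```

A commercial profiling tool collects the same kinds of statistics across many tables at once, but even this level of summary helps bound the scope of the assessment.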

In general, I like to think that my company is pretty good at doing these types of assessments – of course, I wrote the book (or at least, a book) on the topic ;-).

Lather, Rinse, Repeat, Repeat, Repeat, …? Repetitive Data Correction

December 22, 2010 by · Leave a Comment
Filed under: Business Rules, Data Quality 

In a recent discussion with a client, I was told about a situation in which automated data corrections flip-flop. One day a record is identified as having an error (as part of an identity resolution process); the matching records are compared, and a survival rule is applied that essentially deletes the old record and creates a new one. The next day, the new record is determined to be in error, again as part of a matching process, and a different survival rule is applied that, for all intents and purposes, reverts the record back to its original form.
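Here is a minimal sketch (with made-up records and rules) of how two independently defined survival rules can undo each other on successive matching runs:

```python
# Two matched records for the same entity, from different source systems (hypothetical).
record_a = {"name": "ACME Corp",        "source": "CRM",     "updated": "2010-12-01"}
record_b = {"name": "Acme Corporation", "source": "BILLING", "updated": "2010-12-20"}

def survive_most_recent(r1, r2):
    # Rule applied by the nightly match job: the most recently updated record survives.
    return max(r1, r2, key=lambda r: r["updated"])

def survive_trusted_source(r1, r2):
    # Rule applied by a second correction process: the CRM record survives.
    return r1 if r1["source"] == "CRM" else r2

# Alternate the two processes day by day and watch the surviving record oscillate.
for day in range(1, 5):
    rule = survive_most_recent if day % 2 == 1 else survive_trusted_source
    surviving = rule(record_a, record_b)
    print(f"day {day}: surviving record is {surviving['name']!r}")
```

Each run is locally correct by its own rule, yet taken together the two rules never converge on a single surviving record.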

This has become commonplace in the organization, so much so that the team is already aware of these repeat offenders and can track how many corrections are being done for the first time and how many have been done before.

One might call the automation into question: how can it continue to go back and forth like that every day? I think there is a deeper issue involved having to do with the way the data is collected. For some reason a correction rule is triggered by some set of value combinations, but the rule-based correction has not been properly vetted. The result is that the corrected version still does not comply with some set of expectations.
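One way to surface that problem, sketched below with illustrative expectation and correction rules, is to re-validate the corrected record before committing it, so a correction that still violates the expectations is flagged instead of being silently queued up to be “fixed” again tomorrow:

```python
def expectations(record):
    # Return the names of the expectations the record violates (illustrative rules).
    failed = []
    if not record.get("postal_code"):
        failed.append("postal_code_present")
    elif record.get("country") == "US" and len(record["postal_code"]) != 5:
        failed.append("us_postal_code_length")
    return failed

def correct(record):
    # Example correction rule: default a missing postal code.
    fixed = dict(record)
    if not fixed.get("postal_code"):
        fixed["postal_code"] = "UNKNOWN"   # naive default that itself violates the rules
    return fixed

record = {"country": "US", "postal_code": ""}
corrected = correct(record)
still_failing = expectations(corrected)
if still_failing:
    print("correction not vetted; still violates:", still_failing)
else:
    print("correction accepted")
```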

Recognizing repetitive corrections points to opportunities for increasing the maturity of data quality management. Relying on automation is good, but less so if checks and balances are not in place to validate the applied rules.
