Filed under: Data Integration, Metrics, Performance Measures, Replication
In my last post, I introduced the need for operational synchronization, focusing on the characteristics necessary for a reasonable methodology for implementation. In this post, it is worth examining some example use cases that demonstrate the utility of operational synchronization in a more concrete way.
Filed under: Business Rules, Data Profiling, Data Quality, Metrics
Yesterday our company was approached to provide a proposal for a data quality assessment project as part of a more comprehensive data quality assurance effort. When we get these types of requests, I am always amused by the fact that key pieces of information necessary for determining the amount of work are missing. We typically have some basic questions in order to scope the level of effort, including:
• What data sets are to be used as the basis for analysis?
• How many tables?
• How many data elements?
• How many records in each table?
• Are reference data sets available for the common value domains?
• How many business processes source data into the target data set?
• How many processes use the data in the target data set?
• What documentation is available for the data sets and the business processes?
• What tools are in place to analyze the data?
• Will the client provide access to the sources for analysis?
• How is the organization prepared to take actions based on the resultant findings?
In general, I like to think that my company is pretty good at doing these types of assessments – of course, I wrote the book (or at least, a book) on the topic ;-).
You are probably familiar with the fact that over the past few years there have been some new laws passed regarding health care reform. One law in the HITECH legislation requires “HIPAA covered entities and their business associates to provide notification following a breach of unsecured protected health information.” What this effectively means is that if there is a case where some amount of protected health information (PHI) is inadvertently released, the organization that allowed that release is mandated to “provide notification of the breach to affected individuals, the Secretary (of Health and Human Services), and, in certain circumstances, to the media.”
In other words, you lose protected data and you have to report it to the Department of Health and Human Services.
I have just finished a paper sponsored by Informatica titled “Understanding the Financial Value of Data Quality Improvement,” which looks at bridging the communications gap between technologists and business people regarding the value of data quality improvements. Here is a summary:
As opposed to the technical aspects of data validation and cleansing, often the biggest challenge in beginning a data quality program is effectively communicating the business value of data quality improvement. But using a well-defined process for considering the different types of costs and risks of low-quality data not only provides a framework for putting data quality expectations into a business context, it also enables the definition of clear metrics linking data quality to business performance. For example, it is easy to speculate that data errors impede up-selling and cross-selling, but to really justify the need for a data quality improvement effort, a more comprehensive quantification of the number of sales impacted or of the total dollar amount for the missed opportunity can be much more effective at showing the value gap.
This article looks at different classifications of financial impacts and corresponding performance measures that enable a process for evaluating the relationship between acceptable performance and quality information. This article is targeted to analysts looking to connect high quality information and optimal business performance to make a quantifiable case for data quality improvement.
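As a rough illustration of the quantification described in the summary, here is a minimal Python sketch that turns a count of flawed records into an estimate of missed sales and missed revenue. The class name, the field names, and every number in it are hypothetical placeholders rather than figures from the paper; the conversion rate in particular is an assumption that would have to come from your own sales history.

```python
from dataclasses import dataclass


@dataclass
class ImpactEstimate:
    """Hypothetical model of a missed cross-sell/up-sell opportunity."""
    affected_records: int      # customer records with the data error
    conversion_rate: float     # assumed share of those customers who would have bought
    average_sale_value: float  # assumed average value of one sale, in dollars

    def missed_sales(self) -> float:
        # Expected number of sales lost because the records were unusable.
        return self.affected_records * self.conversion_rate

    def missed_revenue(self) -> float:
        # Expected dollar amount of the missed opportunity.
        return self.missed_sales() * self.average_sale_value


# Illustrative inputs only; replace with measured values.
estimate = ImpactEstimate(
    affected_records=12_000,
    conversion_rate=0.02,
    average_sale_value=150.0,
)

print(f"Estimated missed sales: {estimate.missed_sales():.0f}")
print(f"Estimated missed revenue: ${estimate.missed_revenue():,.2f}")
```

Even a simple model like this moves the conversation from "data errors hurt sales" to a number that a business sponsor can challenge or accept.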
After having led or participated in a number of data quality assessments, I continue to think about good ways to present results of the analysis that both convey the severity of specific issues and allow the reader to compare the different issues. I will admit that I am not a “visualization” person, nor do I advocate creating dashboards and scorecards as the end product of a data quality activity. Rather, the scorecard is the means to an end, which is the prioritization of the issues so that the most effective use of resources can get the maximum benefit.
That being said, I do think that radar charts are one good visualization paradigm. A radar chart allows you to map multiple variables in a 2-dimensional view that conveys comparative information. Here is an example:
This example portrays the measures of severity for four different value driver areas for a single data quality issue. By looking at this graph, you can quickly see that incomplete dates have a high financial impact, but relatively low risk and productivity impacts. I am still experimenting with these types of images, and tinkering with Excel to figure out how to get multiple axes represented in a single graph so that I can overlay the impact dimension with a “remediation suitability” dimension that presents the time to value, cost to resolve, and staff effort. Together that would provide a summary of the severity of the issue and the feasibility of its resolution. If you have some suggestions, let me know, and when I figure it out I will post a follow-up.
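One possible way to combine the two dimensions, offered as a sketch rather than a finished answer, is to put the impact measures and the remediation measures on a single set of evenly spaced axes with a shared scale. The Python below computes only the geometry of such a chart. The function name, the 0 to 10 scale, and all of the scores are hypothetical, chosen only to match the pattern described above (high financial impact, low risk and productivity impacts).

```python
import math


def radar_vertices(scores, max_score=10.0):
    """Return (label, x, y) for each axis of a radar chart.

    Axes are spaced evenly around a circle, starting at 12 o'clock and
    moving clockwise. Each score is scaled so max_score lands on the
    unit circle.
    """
    n = len(scores)
    if n < 3:
        raise ValueError("a radar chart needs at least three axes")
    vertices = []
    for i, (label, value) in enumerate(scores.items()):
        angle = math.pi / 2 - 2 * math.pi * i / n
        radius = value / max_score
        vertices.append((label, radius * math.cos(angle), radius * math.sin(angle)))
    return vertices


# Hypothetical scores for the "incomplete dates" issue on a 0-10 scale.
incomplete_dates = {
    # Impact dimension
    "financial impact": 8,
    "risk impact": 3,
    "productivity impact": 2,
    # Remediation suitability dimension
    "time to value": 6,
    "cost to resolve": 4,
    "staff effort": 5,
}

polygon = radar_vertices(incomplete_dates)
for label, x, y in polygon:
    print(f"{label:20s} x={x:+.3f} y={y:+.3f}")
```

The resulting points can be handed to whatever plotting tool you prefer. Keeping the impact axes on one side of the chart and the remediation axes on the other makes the shape of the polygon show both severity and feasibility at a glance, although the scores for the two dimensions need a consistent direction (for example, higher always meaning "more favorable to fix now") for the picture to be readable.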