Filed under: Business Rules, Data Profiling, Data Quality, Metrics
Yesterday our company was approached to provide a proposal for a data quality assessment project as part of a more comprehensive data quality assurance effort. When we get these types of requests, I am always amused by the fact that key pieces of information necessary for determining the amount of work are rarely provided. We typically ask some basic questions in order to scope the level of effort, including:
• What data sets are to be used as the basis for analysis?
• How many tables?
• How many data elements?
• How many records in each table?
• Are reference data sets available for the common value domains?
• How many business processes source data into the target data set?
• How many processes use the data in the target data set?
• What documentation is available for the data sets and the business processes?
• What tools are in place to analyze the data?
• Will the client provide access to the sources for analysis?
• How is the organization prepared to take actions based on the resultant findings?
In general, I like to think that my company is pretty good at doing these types of assessments – of course, I wrote the book (or at least, a book) on the topic ;-).
Filed under: Business Impacts, Data Governance, Data Quality
Coincidentally, similar issues came up with different clients over the last week or so, focusing on ensuring the quality of elements of a specific data domain. In both environments, information flowed from a number of source systems in the organization, usually starting with some customer-facing process, and in other cases with data feeds coming from external sources. But what was obvious was that as these different systems had come online, there had been little documentation of how the processes acquired, read, or modified the data. As a result, when an error occurred, it manifested itself in a downstream application, and it took a long time to figure out where the error had originated and how it was related to the negative impact(s).
This morning I was looking at a spreadsheet documenting data validation scores for a number of data sets at a particular client. The report provided basic measures of quality based on validity and completeness rules applied to a variety of largely location-oriented data elements. What I found interesting was that the coding formula for the error codes incorporated a degree of ambiguity.
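To make the idea of validity and completeness scores concrete, here is a minimal sketch of how such measures might be computed for a single location-oriented element. The field name, the sample records, and the US ZIP code pattern are all my own illustrative assumptions, not the client's actual rules or report format:

```python
import re

# Hypothetical sample records; "zip" stands in for a location-oriented element
records = [
    {"zip": "30301"},
    {"zip": ""},        # missing value -> counts against completeness
    {"zip": "ABCDE"},   # populated but malformed -> counts against validity
    {"zip": "10001"},
]

# Assumed validity rule: 5-digit US ZIP, optionally with a +4 extension
ZIP_RE = re.compile(r"^\d{5}(-\d{4})?$")

total = len(records)
populated = [r["zip"] for r in records if r["zip"].strip()]
valid = [z for z in populated if ZIP_RE.match(z)]

# Completeness: fraction of records with a populated value
completeness = len(populated) / total
# Validity: fraction of populated values that satisfy the rule
validity = len(valid) / len(populated)

print(f"completeness={completeness:.2f} validity={validity:.2f}")
```

The key design point is that validity is scored only over populated values, so the two measures report distinct problems instead of double-counting missing data as invalid data.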
Anyone who regularly reads this web site as well as my other media outlets knows that I am an advocate of clearly defining measures for assessing how poor data quality impacts the business. In fact, one of the main challenges of establishing a data quality program is effectively communicating the value of improved data quality to senior managers. You might think that showing some specific failures and their impacts would be enough to make the argument to invest in improvements, but unfortunately this is often not the case.
Management’s attention is more dramatically grabbed by catastrophic events. Acute disasters linked to data issues, ongoing horror stories, and even entertaining anecdotes will probably resonate with many people in the organization because of the drama as well as the chance for motivated individuals to react to the problem with a heroic effort that appears to save the day.