Data Validation, Data Validity Codes, and Mutual Exclusion

March 11, 2011 by · Leave a Comment
Filed under: Data Analysis, Data Governance, Data Quality 

This morning I was looking at a spreadsheet documenting data validation scores for a number of data sets at a particular client. The report provided basic measures of quality based on validity and completeness rules applied to a variety of largely location-oriented data elements. What I found interesting was that the coding formula for the error codes incorporated a degree of ambiguity.
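
As a rough illustration of the kind of ambiguity I mean (the client's actual coding formula is not reproduced here, and the codes below are invented), consider a hypothetical scoring routine in which the validity codes are not mutually exclusive, so two reviewers could legitimately assign different codes to the same failed value:

```python
# Hypothetical validity codes for a location field; these do not reflect
# the client's actual coding scheme.
MISSING = "V01"            # value is absent
INVALID_FORMAT = "V02"     # value present but malformed (e.g., bad ZIP pattern)
NOT_IN_REFERENCE = "V03"   # value well-formed but not in the reference list

def validity_code(zip_code, reference_zips):
    """Assign a single validity code to a ZIP value.

    The ambiguity: an empty string could arguably be coded V01 (missing) or
    V02 (malformed), and a malformed value is also, trivially, not in the
    reference list (V03). Unless the rules are mutually exclusive and applied
    in a fixed order, two reviewers can score the same data set differently.
    """
    if zip_code is None or zip_code.strip() == "":
        return MISSING
    if not (zip_code.isdigit() and len(zip_code) == 5):
        return INVALID_FORMAT
    if zip_code not in reference_zips:
        return NOT_IN_REFERENCE
    return "OK"
```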

Initial Thoughts: Data Quality of Non-Persistent Data Elements

February 22, 2011 by · 1 Comment
Filed under: Data Quality, Performance Measures 

Last week I attended the Data Warehousing Institute’s World Conference in Las Vegas, teaching a class on Practical Data Quality Management. As part of the discussion on critical data elements, I suggested some alternatives for qualifying a data element as “critical,” including “presence on a published report.” That led me to an interesting notion that often does not occur to many data analysts but does deserve some attention: monitoring the quality of non-persistent data.
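
To make the idea concrete, here is a minimal, hypothetical sketch (the metric, names, and rules are mine, not from the class): a derived element such as a ratio on a published report is computed on the fly and never stored, so the only place to check its quality is at the point of computation:

```python
def days_sales_outstanding(receivables, credit_sales, days_in_period=90):
    """Compute a report-only (non-persistent) metric and validate it in flight.

    Because the value is derived at report time and never written to a table,
    a downstream profiling job will never see it; any quality check has to
    live here, in the calculation itself.
    """
    if credit_sales <= 0:
        raise ValueError("credit_sales must be positive to compute DSO")
    dso = (receivables / credit_sales) * days_in_period
    # Hypothetical reasonableness rule: flag values outside the expected band.
    if not (0 <= dso <= days_in_period):
        raise ValueError(f"DSO {dso:.1f} outside plausible range 0-{days_in_period}")
    return dso
```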

Data Quality, Data Cleansing, Data Migration: Some Questions

February 1, 2011 by · 2 Comments
Filed under: Data Quality 

The other day I spoke with a prospective client whose company is looking at replacing its key processing system and was told by one of the potential vendors that the data had to be cleaned up before it could be migrated into the new system. Intrigued by this comment, the client did a fair amount of research on data cleansing and asked me whether it made sense. After a few questions, I learned that the vendor claimed that unless the data were “clean,” the new system would not work properly. Of course, this piqued my curiosity, since in my opinion, before you “clean” (or rather, in this case, transform/normalize) the data for a target system, don’t you need to know which system you are planning to migrate to? And if they had not yet selected a vendor system, how would they know what needed to be “cleaned”?

This got me thinking about the link between data migration and data quality. In a number of client situations, the company is considering a large investment in a new system – a new contract administration system, a new pricing system, a new sales system – each requiring a significant financial commitment. And in each of these cases, the quality of the legacy data is raised as a technical hurdle to be cleared rather than as a key component of making the new system meet the business needs of the organization. So this has triggered a few more questions about system replacement, data migration, and data cleansing (a rough sketch of the source-to-target mapping and remediation idea follows the list):

• What is the intent of the new system?
• What features of the old system were inadequate? How were they related to the quality of the data?
• What are the features of the new system that are expected to alleviate those shortcomings? What are the dependencies on the existing data?
• What other business processes will derive value from the data created or modified within the new system?
• What is the target model? Is metadata available at the data element level?
• Who is assessing the target system data requirements?
• What process is in place for source to target mapping?
• What process is in place for programming the transformations?
• What do you do with data instances that do not transform properly? Is there a remediation process?
• What cleansing needs to be done? Is that different from transformation?
• What processes are in place for validating source data against target model expectations?
• What is the data migration plan?
• Will both systems need to run at the same time until the new system is validated?
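
As a loose, hypothetical sketch of what the mapping, transformation, and remediation questions above are getting at (the field names and rules are invented for illustration), a migration step typically maps each source record to the target model, applies the agreed transformations, and routes anything that fails into a remediation queue rather than silently dropping it:

```python
def migrate_record(source, state_codes):
    """Map one hypothetical legacy customer record to a target model.

    Returns (target_record, None) on success, or (None, reason) when the
    record needs remediation instead of loading.
    """
    target = {}
    # Source-to-target mapping: legacy 'cust_nm' becomes target 'customer_name'.
    name = (source.get("cust_nm") or "").strip()
    if not name:
        return None, "missing customer name"
    target["customer_name"] = name.title()
    # Transformation: legacy free-text state to a standardized code.
    state = (source.get("state") or "").strip().upper()
    if state not in state_codes:
        return None, f"unrecognized state '{state}'"
    target["state_code"] = state
    return target, None

def migrate(records, state_codes):
    """Split a batch into loadable records and a remediation queue."""
    loaded, remediation = [], []
    for rec in records:
        target, reason = migrate_record(rec, state_codes)
        if target is not None:
            loaded.append(target)
        else:
            remediation.append({"record": rec, "reason": reason})
    return loaded, remediation
```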

Any thoughts on additions to the list? Please feel free to post further questions by adding a comment…

Quality of Master Data and Data Governance Maturity

November 9, 2010 by · Leave a Comment
Filed under: Data Quality, Master Data, Performance Measures 

For a current project I am looking at performance criteria associated with implementing data governance for master data management. One aspect is defining performance measures relating to the lifecycle of master data, so one area under consideration is the quality of master data itself, and so far I see two aspects:

1) The extent to which the data in a master repository meets the needs of the downstream consumers, and

2) The degree to which the data in the systems of entry meets the needs of the downstream consumers.

To some extent, these focus on the same requirements, but the difference lies in where data validation is overseen. Compliance of the master data with defined business rules can be assessed after the fact – that is, after the data has been integrated into a unified master view. However, enforcing compliance with those rules before the data has been “integrated” into the unified view implies a more comprehensive agreement regarding enterprise validation, because each owner of a system of entry must agree to validate the data against the enterprise requirements (and not just his or her own application’s expectations). And that implies a more sophisticated level of data governance maturity across the organization.
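
As a loose illustration of that governance point (the rules and record layouts here are hypothetical), the same enterprise-level validation rules would have to be applied both at each system of entry and again on the consolidated master record, which only works if every entry-point owner agrees to enforce them:

```python
# Hypothetical enterprise-level rules for a customer master record.
ENTERPRISE_RULES = {
    "customer_name": lambda v: bool(v and v.strip()),
    "country_code":  lambda v: isinstance(v, str) and len(v) == 2 and v.isalpha(),
    "tax_id":        lambda v: v is None or (isinstance(v, str) and v.isalnum()),
}

def validate(record, rules=ENTERPRISE_RULES):
    """Return the list of enterprise rules a record violates."""
    return [field for field, ok in rules.items() if not ok(record.get(field))]

# Aspect 2: each system of entry validates against the *enterprise* rules
# before handing the record off, not just its own application's checks.
def accept_at_entry(record):
    violations = validate(record)
    if violations:
        raise ValueError(f"rejected at system of entry: {violations}")
    return record

# Aspect 1: the same rules can be re-checked after integration into the
# unified master view, which is where compliance is measured after the fact.
def audit_master(master_records):
    return {rec["master_id"]: validate(rec) for rec in master_records}
```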

As I flesh out the performance criteria, I will update this site with more information. In the meantime, you can read more about data quality maturity by following the link at the right to the free chapter from my book.