Data Quality, Data Cleansing, Data Migration: Some Questions

February 1, 2011
Filed under: Data Quality 

The other day I had a conversation with a prospective client who mentioned that the company is looking at changing its key processing system, and that one of the potential vendors had said the data would have to be cleaned up before it could be migrated into the new system. Intrigued by this comment, this person did a bunch of research about data cleansing and asked me whether it made sense. After a few questions, I learned that the vendor claimed that unless the data were “clean,” the new system would not work right. Of course, this comment piqued my curiosity, since in my opinion, before you “clean” (or rather in this case, transform/normalize) the data for a target system, don’t you need to know which system you are planning to migrate to? And if they had not yet selected a vendor system, how would they know what they needed to “clean”?

This got me thinking about the link between data migration and data quality. In a number of client situations, the company is considering a new system (a new contract administration system, a new pricing system, a new sales system) that requires a significant $$$ investment. Yet in each of these cases, the quality of the legacy data is raised as a technical hurdle that must be jumped, as opposed to a key component of making the new system meet the business needs of the organization. So this has triggered a few more questions about system replacement, data migration, and data cleansing:

• What is the intent of the new system?
• What features of the old system were inadequate? How were they related to the quality of the data?
• What are the features of the new system that are expected to alleviate those shortcomings? What are the dependencies on the existing data?
• What other business processes will derive value from the data created or modified within the new system?
• What is the target model? Is metadata available at the data element level?
• Who is assessing the target system data requirements?
• What process is in place for source-to-target mapping?
• What process is in place for programming the transformations?
• What do you do with data instances that do not transform properly? Is there a remediation process?
• What cleansing needs to be done? Is that different from transformation?
• What processes are in place for validating source data against target model expectations?
• What is the data migration plan?
• Will both systems need to run at the same time until the new system is validated?
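
To make a few of these questions concrete (source-to-target mapping, transformation, validation against target model expectations, and remediation), here is a minimal Python sketch. Everything in it is hypothetical: the field names, the target rules, and the sample records are assumptions for illustration, not taken from any particular vendor system. The point is only that none of it can be written until the target model is known.

```python
from datetime import datetime

# Hypothetical source-to-target mapping: legacy field name -> target field name.
FIELD_MAP = {"cust_nm": "customer_name", "st": "state", "eff_dt": "effective_date"}


def to_iso_date(value):
    """Transform a legacy MM/DD/YYYY date string into ISO YYYY-MM-DD."""
    return datetime.strptime(value.strip(), "%m/%d/%Y").date().isoformat()


# Hypothetical transformations, keyed by target field.
TRANSFORMS = {
    "customer_name": lambda v: " ".join(v.split()),  # collapse stray whitespace
    "state": lambda v: v.strip().upper(),
    "effective_date": to_iso_date,
}

# Hypothetical target model expectations, keyed by target field.
RULES = {
    "customer_name": lambda v: len(v) > 0,
    "state": lambda v: len(v) == 2 and v.isalpha(),
    "effective_date": lambda v: len(v) == 10,
}


def migrate(records):
    """Split source records into those ready to load and those needing remediation."""
    loadable, remediation = [], []
    for record in records:
        target, problems = {}, []
        for source_field, target_field in FIELD_MAP.items():
            raw = record.get(source_field)
            if raw is None:
                problems.append(f"{source_field}: missing")
                continue
            try:
                value = TRANSFORMS[target_field](raw)
            except ValueError as exc:
                # The instance did not transform properly; keep the reason.
                problems.append(f"{source_field}: {exc}")
                continue
            if not RULES[target_field](value):
                problems.append(f"{source_field}: fails target rule for {target_field}")
                continue
            target[target_field] = value
        if problems:
            remediation.append({"source": record, "problems": problems})
        else:
            loadable.append(target)
    return loadable, remediation


legacy = [
    {"cust_nm": "  acme   corp ", "st": "md", "eff_dt": "02/01/2011"},
    {"cust_nm": "Widget LLC", "st": "Maryland", "eff_dt": "2011-02-01"},
]
loadable, remediation = migrate(legacy)
```

In this sketch the first record transforms cleanly, while the second is set aside with its reasons attached rather than being dropped or force-fit. Whether a spelled-out state name is a "cleansing" problem or just a transformation the mapping should handle is exactly the kind of question that depends on the target system.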

Any thoughts on adding to the list? Please feel free to post additional questions by adding a comment…