Managing Information Consistency and Trust During System Migrations and Data Migrations

November 29, 2012 by
Filed under: Data Integration, Data Quality, Metadata 

If you have been following this series of articles about data validation and testing, you will (hopefully) have come to the conclusion that there are many scenarios in which large volumes of data are moved using a variety of methods, and that in each of them the choices made in developing a framework for data movement can introduce errors. One of our discussions (both in the article and in conversations with Informatica’s Ash Parikh) focused on data integration testing for production data sets, while another centered on verification of existing extraction/transformation/loading methods for data integration (you can also listen to that conversation).

In practice, though, both of these cases are specific instances of a more general notion of migration. There are basically two kinds of migrations: data migrations and system migrations. A data migration involves moving the data from one environment to another similar environment, while a system migration involves transitioning from one instance of an application to what is likely a completely different application.

An example of the first type of migration occurs when two businesses merge, and both are running the same underlying ERP software. As the corporate merger proceeds, the ERP functionality will be combined as well, and this implies that the data in one instance of the ERP environment will be migrated into the surviving ERP environment. An example of the second type of migration occurs in a similar scenario, except that the merging companies are running completely different ERP systems. In this case, the data from the system to be retired needs to be extracted and migrated into the new system – essentially migrating off of the old system and onto the new system.

These are not the only situations for either type of migration. In general, enlightened enterprises often take the opportunity to review the existing infrastructure and seek ways to renovate the environment in anticipation of future needs. Hardware renovations require one set of skills, and consolidation of the application environment requires additional care in ensuring that the system migration is verified. That being said, in most cases of application renovation, the data sets from the existing systems need to be migrated as well. And because most migration efforts focus on the system rather than on the data, the migration process is prone to introducing errors if it is not properly monitored.

In fact, the need for comprehensive inspection of data validity is even greater in migration situations, specifically because of the precision necessary to ensure business continuity. Almost any migration must maintain the integrity of the existing systems that are being retired while enabling the new ones. Transactions logged in the legacy system that are not properly migrated to the new system pose a serious risk to the business. Consider a famous case from 2001, when athletic shoe manufacturer Nike rolled out a new supply chain system while the previous one was slated for retirement. At the time (as reported in Information Week of March 1, 2001), “Some orders were placed twice, by the old and new systems, and the new system let orders for new shoe models fall through the cracks.” Over-ordering of some shoe models and under-ordering of others forced Nike to manufacture some types of shoes at the last minute and ship them via expensive air freight instead of the usual, more cost-effective means. Reading between the lines, one can infer a serious data validation issue related to data and system migration. The issue resulted in a sales projection shortfall of $80-$100 million, as well as a 25% drop in the value of Nike’s stock: a significant real-world example of a severe business impact directly related to data validity.

Managing consistency and accuracy of both data and system migrations cannot be left to chance. As we have seen in our previous articles, manual data review for the purpose of validation is tedious, sleep-inducing, and generally prone to error. The alternative is to employ automated methods for managing consistency of migrated data. When a migration involves straight copies of data, the copies can be validated using direct comparisons of the source and target data sets. In more complex system migrations, some data transformations may have been introduced; in this case, use automated validation and verification tools that can be configured with the same business rules that drove the transformations, to ensure that those transformations were applied correctly.
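To make the idea of automated comparison concrete, here is a minimal sketch in Python. It is not taken from any particular validation product. The record layout, the `order_id` key, the status codes, and the function names (`reconcile`, `legacy_to_new`) are all hypothetical, chosen only for illustration. The sketch reconciles a source and a target data set by key and reports records that are missing, unexpected, duplicated, or transformed incorrectly. For a straight copy the default identity transform is used; for a system migration the business rule is passed in as a function.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Tuple

Row = Dict[str, object]


def index_by_key(rows: Iterable[Row], key: str) -> Tuple[Dict[Hashable, Row], List[Hashable]]:
    """Index rows by key value; also return any key values seen more than once."""
    indexed: Dict[Hashable, Row] = {}
    duplicates: List[Hashable] = []
    for row in rows:
        k = row[key]
        if k in indexed:
            duplicates.append(k)
        else:
            indexed[k] = row
    return indexed, duplicates


def reconcile(
    source: Iterable[Row],
    target: Iterable[Row],
    key: str,
    transform: Callable[[Row], Row] = lambda row: row,
) -> Dict[str, List[Hashable]]:
    """Compare source and target by key, applying the business rule to source rows."""
    src, _ = index_by_key(source, key)
    tgt, tgt_dups = index_by_key(target, key)
    return {
        # In the legacy system but never arrived in the new one.
        "missing": sorted(k for k in src if k not in tgt),
        # In the new system with no legacy counterpart.
        "unexpected": sorted(k for k in tgt if k not in src),
        # Loaded into the new system more than once.
        "duplicated_in_target": sorted(set(tgt_dups)),
        # Present on both sides, but the target does not match the expected transformation.
        "mismatched": sorted(k for k in src if k in tgt and transform(src[k]) != tgt[k]),
    }


# Hypothetical business rule: legacy one-letter status codes map to full names.
STATUS_CODES = {"O": "OPEN", "S": "SHIPPED"}


def legacy_to_new(row: Row) -> Row:
    return {"order_id": row["order_id"], "status": STATUS_CODES[row["status"]]}


# Made-up sample data for illustration only.
legacy_orders = [
    {"order_id": 101, "status": "O"},
    {"order_id": 102, "status": "S"},
    {"order_id": 103, "status": "O"},
]
migrated_orders = [
    {"order_id": 101, "status": "OPEN"},
    {"order_id": 101, "status": "OPEN"},     # loaded twice
    {"order_id": 103, "status": "SHIPPED"},  # rule applied incorrectly
]

report = reconcile(legacy_orders, migrated_orders, key="order_id", transform=legacy_to_new)
print(report)
```

In this made-up example, order 102 is reported as missing, order 101 as duplicated, and order 103 as mismatched. A real migration would run the same kind of check against database extracts rather than in-memory lists, and would likely compare checksums or aggregates first to narrow down where record-level comparison is needed.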

It is also worth reviewing three key points suggested in this series of articles. First, while the degree of maturity in software testing has increased over the years, we are still at an early stage of maturity when it comes to data testing. Second, there are many situations that would benefit from introducing best practices for data validation and testing: reconciling production data assets, extract/transform/load processes that move data from a variety of sources into a data warehouse or set of data marts, and, the subject of this article, migrations. Third, and most important, manual attempts at data validation are going to be less and less effective as data variety expands and volumes grow. The conclusion is that employing automated data validation and verification, in concert with good metadata management and with best practices and disciplines for process oversight, will result in increased levels of trust in data across the enterprise.

