Criteria for a Data Replication Solution

April 30, 2013 by
Filed under: Data Integration, Replication 

In my last post we looked at the environmental drivers for the assessment criteria for a data replication solution, including environmental complexity, the need for application availability, the need to accommodate different types of systems and models, the growing volumes of data, and going beyond a point-to-point set of solutions. As I suggested, these frame the dimensions by which one might scope a data replication solution, and in conversations with both Ash Parikh and Terry Simonds from Informatica (here is the third installment of that conversation), we shared some thoughts about how they are approaching these dimensions in ways that reduce costs, speed delivery, and limit risk:

  • Ease of implementation – Instead of defaulting to complex systems that are managed via command-line operation and parameter-based scripts, look for a data replication solution that offers simple configuration through GUI-based tools and reusable components that can be deployed directly as services. This approach reduces the effort spent programming and configuring scripts, both speeding delivery and reducing costs.
  • Non-intrusiveness – The process for initial synchronization of a replica should connect to the sources and extract data rapidly without introducing any kind of performance drag on the source system. Thereafter, using a data replication technology based on log-based change data capture for continuous incremental delivery of data minimizes the degree of intrusiveness into the production environment. With a nonintrusive data replication solution, the work necessary for maintaining a consistent set of replicas is amortized over time and, once initially configured, places a relatively small demand on resources.
  • Heterogeneity – This is critical to enable a seamless range of data availability. As I have noted in the last two posts, there is bound to be a wide variety of hardware, software, databases, and data models that need to be made available, so a desirable feature of a data replication solution is broad support for heterogeneous systems.
  • Scalability – Many solutions can be scaled with enough application of elbow grease. However, automating the capabilities that make a solution scalable (such as automatic determination of optimal methods of data loading, automated parallelization, and deployment across commodity components) reduces effort and decreases costs.
  • End-to-end interoperability – Lastly, there is a growing recognition that data integration in general is becoming more of a fundamental infrastructure requirement (as opposed to a supporting technology on a project-by-project basis). Replication itself should support the full spectrum of data availability, from a standardized set of methods for accessing data sources to standards for data delivery, and be part of a holistic strategy for data integration. Look for vendors whose data replication solutions are not dissociated from an end-to-end approach to data integration.
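
To make the non-intrusiveness criterion a little more concrete, here is a minimal Python sketch of the idea behind log-based change data capture: after the initial synchronization, the replica is kept current by applying only the changes recorded in a log, rather than by re-querying the source tables. This is an illustration under stated assumptions, not how Informatica or any other product implements it. The record layout (`lsn`, `op`, `key`, `row`), the `apply_changes` function, and the in-memory dictionaries are all hypothetical, and the change log is a simulated list rather than a real database transaction log.

```python
from typing import Any, Dict, List

# Hypothetical change record: "lsn" is an increasing log sequence number,
# "op" is one of "insert", "update", or "delete".
ChangeRecord = Dict[str, Any]


def apply_changes(replica: Dict[int, dict],
                  log: List[ChangeRecord],
                  last_lsn: int) -> int:
    """Apply log entries newer than last_lsn to the replica, in LSN order.

    Returns the new checkpoint (the highest LSN applied), so the next run
    can resume from there instead of rescanning the source.
    """
    for entry in sorted(log, key=lambda e: e["lsn"]):
        if entry["lsn"] <= last_lsn:
            continue  # already applied in an earlier run
        if entry["op"] in ("insert", "update"):
            replica[entry["key"]] = entry["row"]
        elif entry["op"] == "delete":
            replica.pop(entry["key"], None)
        else:
            raise ValueError(f"unknown operation: {entry['op']!r}")
        last_lsn = entry["lsn"]
    return last_lsn


# Replica state after a (simulated) initial synchronization at LSN 100.
replica = {1: {"name": "Ada"}, 2: {"name": "Grace"}}

# Simulated changes captured from the source's log since the initial sync.
change_log = [
    {"lsn": 101, "op": "update", "key": 2, "row": {"name": "Grace H."}},
    {"lsn": 102, "op": "insert", "key": 3, "row": {"name": "Edsger"}},
    {"lsn": 103, "op": "delete", "key": 1, "row": None},
]

checkpoint = apply_changes(replica, change_log, last_lsn=100)
```

Because the function skips any entry at or below the checkpoint, replaying the same log a second time leaves the replica unchanged. That is the property that lets incremental delivery resume after an interruption without another full extract from the source.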

While data replication has long been deployed to ensure predictable performance in geographically dispersed environments, or as part of a general business continuity strategy for continuous availability, these criteria also address newer usage scenarios. These include data warehouse population, continuous refresh and synchronization of data to ensure consistency across different operational environments, and support for master data management, data federation, and virtualization. Keeping these criteria in mind during evaluation will help decision-makers determine the solutions that best meet their holistic operational and analytical data availability requirements.

