Characteristics Driving a Data Replication Solution

April 27, 2013 by
Filed under: Data Integration, Replication 

In my last post, we discussed two (presumably) complementary business drivers for instituting a standard enterprise-wide strategy for data availability: the desire to absorb massive amounts of data for analytical purposes (AKA “big data”) while simultaneously enabling access to internal data stored across a variety of siloed systems that have evolved organically over the years. The push to decrease the latency of data access, often to the point of what is fuzzily referred to as “real-time,” drives the expectation of immediate access to all data sets. Before chasing that expectation, though, it is worth taking a step back to consider the characteristics of the environment that need to be effectively addressed:

  • Complexity of the de facto environment for implementation: Siloed application development carries along a silo mentality for operations and maintenance, in which the patterns associated with system management reflect the idiosyncrasies of the original approaches to deployment. For example, older systems may be configured using command-line requests and parameter-based scripts, with little or no oversight for ensuring consistency. Instituting data replication within this type of organizational model takes additional time and carries increased costs.
  • Maintaining high availability for production applications: This is a recurring theme in any environment in which a fundamental capability needs to be modernized or improved while it remains in production. Companies cannot afford to take their systems down for months at a time while they are augmented with new functionality.
  • Variety of data systems and data representation: There are few environments that have completely standardized along a particular hardware and software vendor for data management, and over a 30-40 year time frame, there are differences in the models, approaches, and even sophistication of the different data subsystems. Data replication applications that are limited to a small coterie of vendor approaches pose a risk to maintaining data availability.
  • Scaling to accommodate large data volumes: Growing data volumes for analytics create a common performance roadblock that replication is supposed to alleviate. However, you don’t want to manually engineer the necessary scalability (including manually parallelizing database access and distributing data) into your implementation.
  • The need for interoperability: Any modern data integration application must not only account for the islands of data that exist across the organization, but also avoid forcing developers to create new interfaces for delivering the accessed data. Replication solutions must contribute to interoperability.
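To make the scaling bullet concrete, here is a minimal sketch of the kind of hand-engineered, range-partitioned parallel copy you would rather not build and maintain yourself. It uses SQLite and a hypothetical `orders` table purely for illustration; a real replication product would also handle retries, conflict resolution, and heterogeneous endpoints.

```python
# A hand-rolled "parallel replication" sketch: partition the source table's
# key space into ranges and copy each range on its own worker thread.
# The `orders` table and its columns are hypothetical examples.
import os
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor

src_path = os.path.join(tempfile.mkdtemp(), "src.db")
dst_path = os.path.join(tempfile.mkdtemp(), "dst.db")

# Populate a source table with 1,000 rows.
with sqlite3.connect(src_path) as src:
    src.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
    src.executemany("INSERT INTO orders VALUES (?, ?)",
                    [(i, i * 1.5) for i in range(1, 1001)])

# Create an empty replica table at the target.
with sqlite3.connect(dst_path) as dst:
    dst.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")

def copy_range(lo, hi):
    # Each worker opens its own connections: sqlite3 connection objects
    # must not be shared across threads.
    with sqlite3.connect(src_path) as s, sqlite3.connect(dst_path) as d:
        rows = s.execute(
            "SELECT id, amount FROM orders WHERE id BETWEEN ? AND ?",
            (lo, hi)).fetchall()
        d.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    return len(rows)

# Manually partition the key space and copy the ranges in parallel.
ranges = [(1, 250), (251, 500), (501, 750), (751, 1000)]
with ThreadPoolExecutor(max_workers=4) as pool:
    copied = sum(pool.map(lambda r: copy_range(*r), ranges))

with sqlite3.connect(dst_path) as dst:
    count = dst.execute("SELECT COUNT(*) FROM orders").fetchone()[0]

print(copied, count)  # → 1000 1000
```

Even this toy version forces you to choose partition boundaries, manage per-thread connections, and deal with write contention at the target; at enterprise scale, across dozens of vendor systems, that engineering burden is exactly what a replication solution should absorb for you.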

These characteristics influence the definition of different criteria for a data replication solution, and we will examine those criteria in my next entry. But you can learn more by listening to the second part of my conversation with Terry Simonds at Informatica.

