Ensuring That Data Availability Meets the Business Needs

April 24, 2013
Filed under: Data Integration, Replication 

Almost everywhere you look these days, there is talk about big data, big data analytics, and the value of massive data volumes. Underlying the demand for exploiting big data, however, is the need to manage it, and that need becomes critical when the desire for analytical systems dovetails with real-time demands for operational decision-making. Whether your company is looking to streamline supply chain management and inventory control, or to derive insight for enhancing customer experiences by linking numerous data streams with existing customer profiles, the greatest advantage comes from integrating analytics with operational systems in real time, or at least within a defined (and typically short) time frame.

At the same time, even with all the talk about big data analytics and the potential value of analyzing massive volumes of data from a variety of external sources, there is a risk of losing sight of some of the more challenging aspects of data accessibility and management that plague existing infrastructures, which have grown over time through organic, siloed application development. The most prevalent manifestation is often described as “islands of data,” in which the data systems associated with various and sundry applications (including transaction processing systems, operations management systems, and analytical environments) all struggle to coexist and satisfy the needs of the data consumers (figuratively speaking, of course). The greater the degree of systemic variety and isolation, the greater the cost of managing heterogeneous access, the greater the complexity of system integration, and the greater the risk of failing to deliver actionable information to the right individual when it is needed.

Whether we are forward-facing and looking to scale up to absorb many streams carrying massive amounts of data, or backward-facing and looking to enable systemic interoperability and timely information delivery, the criteria for managing data demand are similar: provide predictability in the timeliness of data delivery, provide a level of trust in the consistency of the data, operate through a standardized mechanism with limited impact on existing production systems, reduce the number of point-to-point solutions, and scale in relation to both data size and variety, among others.

There are a number of technical approaches. One example is stream processing, which incorporates filtering, business rules, and triggers within the information flow network to help manage real-time events. Another, data federation and virtualization, smooths over the differences across heterogeneous systems, while embedded caching improves access speed. Both are valuable techniques, but there is a third that is not only regularly used in production to address long-standing demands for rapid yet consistent data accessibility, but that also easily satisfies the criteria identified in the previous paragraph. Data replication enables high-speed data access, and when coupled with trickle feeds and change data capture, it retains a level of consistency with the original source that engenders trust in the data.
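To make the change-data-capture idea concrete, here is a minimal, illustrative sketch in Python: every mutation on a source table is appended to a change log, and a replica stays consistent by replaying only the events it has not yet seen (the trickle feed). The names `SourceTable`, `Replica`, and `ChangeEvent` are hypothetical, invented for this example; they do not correspond to any particular product's API.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ChangeEvent:
    """One captured change: the operation, the row key, and the new row image
    (None for deletes)."""
    op: str  # "insert", "update", or "delete"
    key: Any
    row: Optional[dict] = None


class SourceTable:
    """System of record: every mutation is also appended to a change log,
    so downstream consumers never need to re-scan the full table."""

    def __init__(self):
        self.rows: dict = {}
        self.log: list[ChangeEvent] = []

    def upsert(self, key, row):
        op = "update" if key in self.rows else "insert"
        self.rows[key] = dict(row)
        self.log.append(ChangeEvent(op, key, dict(row)))

    def delete(self, key):
        del self.rows[key]
        self.log.append(ChangeEvent("delete", key))


class Replica:
    """Read-optimized copy kept consistent by replaying the source's log
    from its last applied position."""

    def __init__(self):
        self.rows: dict = {}
        self.applied = 0  # position in the source log already replayed

    def sync(self, source: SourceTable):
        # Apply only the changes that arrived since the last sync.
        for event in source.log[self.applied:]:
            if event.op == "delete":
                self.rows.pop(event.key, None)
            else:
                self.rows[event.key] = event.row
            self.applied += 1
```

For example, after `src.upsert(1, {"sku": "A", "qty": 10})` and a `rep.sync(src)`, the replica's rows match the source's; a later update and delete on the source propagate on the next `sync` call without reprocessing earlier events. Production replication tools add durability, ordering guarantees, and conflict handling, but the incremental log-replay structure is the same.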

To hear more about this topic, check out this conversation I had with Terry Simonds at Informatica.

In my next two blog entries, I will look at two aspects of data replication. First, we will drill further into the capabilities worth evaluating when considering a data replication solution; then we will examine the potential pitfalls to watch out for when comparing replication solutions.
