In my last post, we looked at the environmental drivers behind the assessment criteria for a data replication solution: environmental complexity, the need for application availability, the need to accommodate different types of systems and models, growing data volumes, and the push to move beyond point-to-point solutions. As I suggested, these frame the dimensions along which one might scope a data replication solution. In conversations with Ash Parikh and Terry Simonds from Informatica (here is the third installment of that conversation), we shared some thoughts about how they are approaching these dimensions in ways that reduce costs, speed delivery, and limit risk.
In my last post, we discussed two (presumably) complementary business drivers for instituting a standard, enterprise-wide strategy for data availability: the desire to absorb massive amounts of data for analytical purposes (AKA “big data”) and the need to enable access to internal data stored across a variety of siloed systems that have evolved organically over the years. Yet while the desire to decrease the latency of data access, often to the point of what is fuzzily referred to as “real time,” drives the expectation of immediate access to all data sets, it is worth stepping back to consider the characteristics of the environment that need to be effectively addressed.
Almost everywhere you look these days there is talk about big data, big data analytics, and the value of massive data volumes, and underscoring the demand for exploiting big data is the need to manage it. Managing big data becomes critical when dovetailing the desire to institute analytical systems with real-time needs for operational decision-making. Whether your company is looking to streamline supply chain management and inventory control, or to derive insight for enhancing customer experiences by linking numerous data streams with existing customer profiles, the best advantage comes from integrating analytics with operational systems in real time, or at least within a defined (typically short) time window.
If you have been following this series of articles about data validation and testing, you will (hopefully) have come to the conclusion that there are a healthy number of scenarios in which large volumes of data are being moved (using a variety of methods), and that in each of these scenarios the choices made in developing a framework for data movement can introduce errors. One of our discussions (both in the article and in conversation with Informatica’s Ash Parikh) focused on data integration testing for production data sets, while another centered on verification of existing extraction/transformation/loading methods for data integration (you can listen to that conversation as well).
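To make the flavor of that kind of data integration testing concrete, here is a minimal sketch (my own illustration, not something drawn from those conversations) of a reconciliation check that compares row counts and per-row checksums between a source extract and the loaded target; the field names and sample records are invented for the example.

    import hashlib

    def row_fingerprint(row, columns):
        """Build a stable checksum for one record from the columns under test."""
        joined = "|".join(str(row.get(col, "")) for col in columns)
        return hashlib.md5(joined.encode("utf-8")).hexdigest()

    def reconcile(source_rows, target_rows, key, columns):
        """Compare a source extract against the loaded target data.

        Returns row-count drift plus the keys whose checksums disagree,
        i.e., the places where the movement or transformation introduced an error.
        """
        src = {row[key]: row_fingerprint(row, columns) for row in source_rows}
        tgt = {row[key]: row_fingerprint(row, columns) for row in target_rows}

        missing = sorted(set(src) - set(tgt))        # extracted but never loaded
        unexpected = sorted(set(tgt) - set(src))     # loaded but not in the source
        mismatched = sorted(k for k in set(src) & set(tgt) if src[k] != tgt[k])

        return {
            "source_count": len(src),
            "target_count": len(tgt),
            "missing_keys": missing,
            "unexpected_keys": unexpected,
            "checksum_mismatches": mismatched,
        }

    # Hypothetical sample: one row dropped in flight, one value subtly reformatted.
    source = [
        {"customer_id": 1, "name": "Acme", "balance": "100.00"},
        {"customer_id": 2, "name": "Globex", "balance": "250.50"},
        {"customer_id": 3, "name": "Initech", "balance": "75.25"},
    ]
    target = [
        {"customer_id": 1, "name": "Acme", "balance": "100.00"},
        {"customer_id": 2, "name": "Globex", "balance": "250.5"},
    ]

    print(reconcile(source, target, key="customer_id", columns=["name", "balance"]))

In practice the same comparison would run against the actual production tables, but the shape of the check, counts plus fingerprints keyed by a business identifier, stays the same.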
In practice, though, both of these cases are specific instances of a more general notion of migration. There are basically two kinds of migrations: data migrations and system migrations. A data migration involves moving the data from one environment to another similar environment, while a system migration involves transitioning from one instance of an application to what is likely a completely different application.
What is now generally referred to as “data integration” is a set of disciplines that evolved from the methods used to populate the data systems powering business intelligence: extracting data from one or more operational systems, transferring it to a staging area for cleansing, consolidation, transformation, and reorganization, and then loading it into the target data warehouse. This process is usually referred to as ETL: extraction, transformation, and loading.
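As a rough illustration of that flow, here is a minimal ETL sketch; the source records, the cleansing rules, and the in-memory SQLite database standing in for a target data warehouse are all assumptions made purely for the example.

    import sqlite3

    # Extract: pull records from a (hypothetical) operational system.
    def extract():
        return [
            {"order_id": "1001", "customer": " acme corp ", "amount": "100.00"},
            {"order_id": "1002", "customer": "GLOBEX", "amount": "250.5"},
        ]

    # Transform: cleanse, standardize, and reshape in the staging step.
    def transform(rows):
        staged = []
        for row in rows:
            staged.append((
                int(row["order_id"]),
                row["customer"].strip().title(),   # standardize customer names
                round(float(row["amount"]), 2),    # normalize amounts
            ))
        return staged

    # Load: write the consolidated records into the target warehouse table.
    def load(staged, conn):
        conn.execute(
            "CREATE TABLE IF NOT EXISTS fact_orders "
            "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
        )
        conn.executemany("INSERT OR REPLACE INTO fact_orders VALUES (?, ?, ?)", staged)
        conn.commit()

    if __name__ == "__main__":
        warehouse = sqlite3.connect(":memory:")    # stand-in for the target warehouse
        load(transform(extract()), warehouse)
        print(warehouse.execute("SELECT * FROM fact_orders").fetchall())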
In the early days of data warehousing, ETL scripts were, as one might politely say, “hand-crafted.” More colloquially, each script was custom-coded for its originating source, the transformation tasks to be applied, and the subsequent consolidation, integration, and loading. And despite the evolution of rule-driven and metadata-driven ETL tools that automate the development of ETL scripts, much time is still spent writing (and rewriting) data integration scripts to extract data from different sources, apply transformations, and load the results into a target data warehouse or analytical appliance.
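To show what “metadata-driven” means in contrast to hand-crafted scripts, here is a simplified sketch in which a made-up mapping specification, rather than bespoke code, drives the transformation; the field names and rules are invented for illustration.

    # A made-up mapping specification: metadata describing how each target field
    # is derived from the source, rather than logic hard-coded per source system.
    ORDER_MAPPING = {
        "order_id": {"source": "id", "transform": int},
        "customer": {"source": "cust_nm", "transform": lambda v: v.strip().title()},
        "amount": {"source": "amt", "transform": lambda v: round(float(v), 2)},
    }

    def apply_mapping(record, mapping):
        """Drive the transformation from the mapping metadata instead of bespoke code."""
        return {target: spec["transform"](record[spec["source"]])
                for target, spec in mapping.items()}

    # A hypothetical record from one particular source system.
    source_record = {"id": "1003", "cust_nm": "  initech ", "amt": "75.250"}
    print(apply_mapping(source_record, ORDER_MAPPING))

Pointing a new source at the same engine then becomes a matter of writing new mapping metadata rather than another one-off script, which is roughly the economy such tools aim to provide.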