Despite my clear understanding that the world’s data volumes are growing by leaps and bounds, I sometimes wonder whether the information management industry’s hyperfocus on unstructured data is a bit over the top. Yes, I know that social media channels such as Twitter, LinkedIn, and Facebook are pushing out mounds of what we want to believe is valuable content, ready to be mined for targeted marketing, upselling, and cross-selling. But when you actually sit down and read a series of tweets, for example, you might notice a few things. First, a lot of the activity is not original, but is merely a repeat of something someone else said. Second, the ability to follow a thread based on hashtags is limited by the absence of any metadata; the same tag may be used for any number of concepts, and presuming they can be conflated is somewhat naïve. Third, much of the content is formulaic, or even automatically generated as part of a corporate initiative designed to maintain a social media presence, even at the expense of publishing anything with significant content.
In my last post we looked at the environmental drivers behind the assessment criteria for a data replication solution: environmental complexity, the need for application availability, the need to accommodate different types of systems and models, growing data volumes, and the need to go beyond a point-to-point set of solutions. As I suggested, these frame the dimensions by which one might scope a data replication solution, and in conversations with Ash Parikh and Terry Simonds from Informatica (here is the third installment of that conversation), we shared some thoughts about how they are approaching these dimensions in ways that reduce costs, speed delivery, and limit risk.
In my last post, we discussed two (presumably) complementary business drivers for instituting a standard enterprise-wide strategy for data availability: the desire to absorb massive amounts of data for analytical purposes (a.k.a. “big data”) while simultaneously enabling access to internal data stored across a variety of siloed systems that have evolved organically over the years. Yet while the desire to decrease the latency of data access, often to the point of what is fuzzily referred to as “real time,” drives the expectation of immediate access to all data sets, it is valuable to take a step back and consider the characteristics of the environment that need to be effectively addressed.
Almost everywhere you look these days, there is talk about big data, big data analytics, and the value of massive data volumes, and underscoring the demand for exploiting big data is the need to manage it. This becomes critical when dovetailing the desire to institute analytical systems with the real-time needs of operational decision-making. Whether your company is looking to streamline supply chain management and inventory control, or to derive insight for enhancing customer experiences by linking numerous data streams with existing customer profiles, the best advantage comes from integrating analytics with operational systems in real time, or at least within a defined (typically short) time window.
If you have been following this series of articles about data validation and testing, you will (hopefully) have come to the conclusion that there is a healthy number of scenarios in which large volumes of data are moved, using a variety of methods, and that in each of these scenarios the choices made in developing a framework for data movement can introduce errors. One of our discussions (both in the article and in conversation with Informatica’s Ash Parikh) focused on data integration testing for production data sets, while another centered on verifying existing extraction/transformation/loading (ETL) methods for data integration (you can listen to that conversation as well).
In practice, though, both of these cases are specific instances of a more general notion of migration. There are basically two kinds of migrations: data migrations and system migrations. A data migration involves moving the data from one environment to another similar environment, while a system migration involves transitioning from one instance of an application to what is likely a completely different application.
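Whichever kind of migration is involved, the errors that data movement can introduce are often caught by a simple reconciliation pass that compares the target data set against the source. The sketch below is one illustrative way to do this, not a method described in the series: the record layout, the `id` key field, and the in-memory comparison are all hypothetical assumptions for small extracts.

```python
import hashlib

def row_fingerprint(row):
    """Hash a row's values in a fixed field order, so the same record
    yields the same fingerprint regardless of load or column order."""
    canonical = "|".join(str(row[field]) for field in sorted(row))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def reconcile(source_rows, target_rows, key="id"):
    """Compare source and target extracts keyed on a (hypothetical)
    primary key; report missing, unexpected, and altered records."""
    src = {row[key]: row_fingerprint(row) for row in source_rows}
    tgt = {row[key]: row_fingerprint(row) for row in target_rows}
    return {
        "missing_in_target": sorted(src.keys() - tgt.keys()),
        "unexpected_in_target": sorted(tgt.keys() - src.keys()),
        "mismatched": sorted(k for k in src.keys() & tgt.keys()
                             if src[k] != tgt[k]),
    }

# Toy extracts: record 2 was altered somewhere in the move.
source = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
target = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Rob"}]
print(reconcile(source, target))  # flags record 2 as mismatched
```

For production-scale data sets the same idea is usually applied with aggregate checks (row counts, column sums, sampled fingerprints) rather than a full row-by-row comparison.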