Big Data, Sensors, and Data Integration as Part of the Machinery : The Practitioner's Guide to Data Quality Improvement

June 4, 2013 by
Filed under: Data Integration 

Despite my clear understanding that the world’s data volumes are growing by leaps and bounds, I sometimes wonder whether the information management industry’s hyperfocus on unstructured data is a bit over the top. Yes, I know that social media channels such as Twitter, LinkedIn, and Facebook are pushing mounds of what we want to believe is valuable content that can be mined for targeted marketing, upselling, and cross-selling. But when you actually sit down and read a series of tweets, for example, you might notice a few things. First of all, a lot of the activity is not original, but is merely a repeat of something someone else said. Second, the ability to follow a thread based on hashtags is limited by the absence of any metadata; the same tag may be used for any number of concepts, and presuming they can be converged is somewhat naïve. Third, much of the content is formulaic and even automatically generated as part of a corporate initiative designed to maintain a social media presence, even at the expense of publishing anything with significant content.

On the other hand, I am a proponent of big data and big data analytics, so these comments might seem somewhat contrarian. However, I have a continued fascination with what I think will be the most relevant sources of big data in the near-to-long-term future: sensors. Actually, the relevance of machine-generated data from a broad network of interacting nodes is not new, especially in the world of computer networking (hint: think about how email actually works). But more and more things are being outfitted with sensors, to the point where mounds of devices are constantly generating streams of information that can be subjected to analysis.

And yet the data integration challenges remain, particularly if you are relying on a single landing pad or staging area for lots and lots of data. As the number of devices and sensors generating data increases, there is a corresponding need to push data integration and transformation down into the individual devices. As devices are connected together, embedding data transformations at strategic points across the network can not only reduce the computation bottleneck at the ultimate target destination; it can also optimize the computation through data distribution and task parallelization.
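To make the idea of embedding transformations across the network a bit more concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions: the `Summary` type, the `summarize` and `merge` functions, and the sample readings are all hypothetical, and none of it reflects any vendor's actual API. Each device reduces its raw readings to a small mergeable summary, and intermediate nodes combine summaries, so the final destination never has to process the raw stream.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Summary:
    """Small, mergeable summary of a batch of sensor readings (hypothetical type)."""
    count: int
    total: float
    low: float
    high: float

    @property
    def mean(self) -> float:
        return self.total / self.count


def summarize(readings: Iterable[float]) -> Summary:
    """Transformation that runs on the device: reduce raw readings to a summary."""
    values = list(readings)
    if not values:
        raise ValueError("cannot summarize an empty batch of readings")
    return Summary(len(values), sum(values), min(values), max(values))


def merge(a: Summary, b: Summary) -> Summary:
    """Transformation that runs at an intermediate node: combine two summaries."""
    return Summary(
        a.count + b.count,
        a.total + b.total,
        min(a.low, b.low),
        max(a.high, b.high),
    )


# Hypothetical readings from two sensors.
sensor_a = [10.0, 12.0, 14.0]
sensor_b = [20.0, 22.0]

# Each sensor summarizes locally; only the summaries travel upstream.
combined = merge(summarize(sensor_a), summarize(sensor_b))

# The merged result matches what a central staging area would have computed
# from all of the raw readings, without ever shipping the raw readings to it.
centralized = summarize(sensor_a + sensor_b)
```

Because `merge` is associative, summaries can be combined in any grouping as they flow through the network, which is what allows the work to be distributed and parallelized rather than concentrated at a single landing pad.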

I see two specific values in Informatica’s announcement of Vibe. First, because your transformations and integration directives can be developed on top of Vibe in one environment and deployed to any other platform running Vibe, you have effectively defined a standard for development and implementation. It allows you to develop within a controlled environment but deploy anywhere. Second, if I understand correctly, Vibe has a small footprint that allows it to be embedded. Informatica has embedded it into some applications via OEM relationships, and it powers most of the company’s existing products. The roadmap includes shrinking the footprint even more for devices and sensors. This addresses the expectation for the active network, in which computations and transformations can be layered into the interconnectivity of devices.

Last, if you consider the various dynamic topologies of these interconnections, you begin to see how embeddability really can add value. For example, smart devices generating location data can sync up in self-organizing networks and perform transformations as aggregated statistics are sent to mobile towers. Road sensors can go beyond transmitting and begin to incorporate logic in relation to aggregated traffic data. Device topologies can also be connected with cloud applications that manage device profiles. There are many different examples, and it is clear that the product roadmap is intended to accommodate a wide variety of sensor-based big data applications.

