Comments for The Practitioner's Guide to Data Quality Improvement http://dataqualitybook.com Wed, 02 Sep 2015 09:20:36 +0000 hourly 1 http://wordpress.org/?v=4.3.3 Comment on Data Quality, Data Cleansing, Data Migration: Some Questions by Marteen Siddle http://dataqualitybook.com/?p=137&cpage=1#comment-35618 Wed, 02 Sep 2015 09:20:36 +0000 http://dataqualitybook.com/?p=137#comment-35618 Great share. I was searching the internet for information related to data cleansing and came across your blog post. It proved very helpful for me. Thank you so much for sharing such a useful post with us.

Comment on Data Modeling and Fractal Fairy Tales by robert http://dataqualitybook.com/?p=261&cpage=1#comment-24575 Thu, 09 Oct 2014 20:24:10 +0000 http://dataqualitybook.com/?p=261#comment-24575 Once the model is built for a specific set of requirements, we normally implement that model, but what we should do is generalize the model to allow for an evolving business over time. When the data model is generalized it becomes extensible, because there are now places to add the additional information the business needs as those needs change. This also forces reuse within our code, because behavior related to the generalization is inherited by the entity sub-types, and thus by the code used to manage those sub-types. The model starts to take on the characteristics of a class/sub-class structure that you would see in an OO design. For example, the model has an Employee and a Customer, and both have an address. The process used to manage the employee inherits the process that manages the address, as does the customer. With this approach the generalized model starts to become fractal: if each entity is a doorway into the model, any doorway can provide the same information.
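robert's class/sub-class analogy can be sketched in code. A minimal, hypothetical illustration (the Party, Employee, and Customer class names and the relocate method are assumptions for this sketch, not taken from the post):

```python
# Hypothetical sketch of the generalization robert describes: Employee and
# Customer are sub-types of a generalized Party, and both inherit the same
# address-management behavior rather than each implementing its own.

class Address:
    def __init__(self, street, city):
        self.street = street
        self.city = city

class Party:
    """Generalized super-type: every sub-type inherits address management."""
    def __init__(self, name, address):
        self.name = name
        self.address = address

    def relocate(self, new_address):
        # One shared process manages the address for all sub-types.
        self.address = new_address

class Employee(Party):
    pass

class Customer(Party):
    pass

# Any "doorway" (sub-type) exposes the same address information.
e = Employee("Pat", Address("1 Main St", "Springfield"))
c = Customer("Lee", Address("9 Elm St", "Shelbyville"))
e.relocate(Address("2 Oak St", "Springfield"))
```

Adding a new party sub-type later (say, Supplier) then requires no new address-handling code, which is the extensibility robert points to.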

Comment on Proposing a Master Data Domain for Collaborative Alignment by Karen Piper http://dataqualitybook.com/?p=215&cpage=1#comment-11086 Sun, 24 Nov 2013 17:22:23 +0000 http://dataqualitybook.com/?p=215#comment-11086 I am taking an enterprise architecture course, and we are discussing data domains and aligning them to business capabilities.

Comment on Data Modeling and Fractal Fairy Tales by admin http://dataqualitybook.com/?p=261&cpage=1#comment-10220 Tue, 15 Oct 2013 20:07:16 +0000 http://dataqualitybook.com/?p=261#comment-10220 Thanks for the comment, Jane! Yes, you are correct: data requirements gathered for specific purposes within a multi-faceted enterprise do not provide the level of insight necessary to understand what needs to be done for enterprise integration. It is a little like the parable described in Edwin Abbott's famous book “Flatland” (see http://www.amazon.com/gp/product/1492793051/ref=as_li_ss_tl?ie=UTF8&camp=1789&creative=390957&creativeASIN=1492793051&linkCode=as2&tag=wwwknowledgei-20), in which geometric shapes of a lower dimension cannot grasp the concept of higher dimensions.

On the other hand, the approach that we often take for justifying a more comprehensive discussion about enterprise data requirements and data governance in particular is weighing the value of information against the costs of ignoring compliance with business policies and how they are translated into information policies. Gaps in this process can lead to missed opportunities, increased operational costs, or even a variety of regulatory, compliance, or customer satisfaction risks.
If you can articulate those potential negative impacts to the right people in the organization, that might help in generating support for a data summit!

Comment on Data Modeling and Fractal Fairy Tales by Jane Caddel http://dataqualitybook.com/?p=261&cpage=1#comment-10219 Tue, 15 Oct 2013 19:52:46 +0000 http://dataqualitybook.com/?p=261#comment-10219 The above page describes our situation precisely: trying to predict future business needs in order to model for flexibility. I was hoping for more information on why this looks like a data quality issue to you. I have been thinking there is an approach to the data modeling that would make a difference, and I liken our previous modeling efforts to Lewis and Clark mapping by walking the terrain, as opposed to mapping from satellite images. I'm seeing that crawling through the data requirements and modeling for a particular purpose doesn't work well for long-term enterprise integration. Gaining support for conducting a data summit to detail all business needs for a subject area is complex and difficult, and who can predict the business requirements of a merger or acquisition?

Thank you for any additional insight you can provide.

Jane Caddel
Data Architect

Comment on Use Cases for Operational Synchronization by Using Data Replication to Enable Operational Synchronization : The Practitioner's Guide to Data Quality Improvement http://dataqualitybook.com/?p=389&cpage=1#comment-8614 Thu, 25 Jul 2013 16:55:22 +0000 http://dataqualitybook.com/?p=389#comment-8614 […] my last post, we looked at some common use cases for operational synchronization, and each of those examples […]

Comment on The Need for Operational Synchronization by Use Cases for Operational Synchronization : The Practitioner's Guide to Data Quality Improvement http://dataqualitybook.com/?p=386&cpage=1#comment-8510 Mon, 22 Jul 2013 13:33:47 +0000 http://dataqualitybook.com/?p=386#comment-8510 […] my last post, I introduced the need for operational synchronization, focusing on the characteristics necessary […]

Comment on Using Data Integration Testing for Reconciling Production Data Assets by Managing Information Consistency and Trust During System Migrations and Data Migrations : The Practitioner's Guide to Data Quality Improvement http://dataqualitybook.com/?p=362&cpage=1#comment-2247 Thu, 29 Nov 2012 21:12:30 +0000 http://dataqualitybook.com/?p=362#comment-2247 […] our discussions (both in the article and in discussions with Informatica’s Ash Parikh) focused on data integration testing for production data sets, while another centered on verification of existing extraction/transformation/loading methods for […]

Comment on Best Practices for Data Integration Testing Series – Instituting Good Practices for Data Testing by John http://dataqualitybook.com/?p=358&cpage=1#comment-645 Fri, 02 Nov 2012 17:51:18 +0000 http://dataqualitybook.com/?p=358#comment-645 Thanks for sharing these thoughts; very useful.

From my experience as an ETL tester, below are my thoughts.
Complete validation of the whole test data set, or even testing production data, can miss defects. The reason is that ETL development is usually done for new requirements or changes, so there is a chance that the data required to test some new transformations will not be present in the test environment. In that sense, we cannot say that even a production cut of the data has “integrity”. Integrity (with respect to testing) is achieved only when the data is sufficient to exercise all of the business rules. Sometimes 100 records that cover all the business rules are more useful than one billion junk records.

Most of the time, a scrambled production cut is loaded into the testing environment for security reasons (financial, healthcare, or HR system data). This scrambling is inevitable because of security requirements and privacy laws in different countries. Even testing one billion records with automation tools or with minus queries will not help. For example, in my current project we have 50 million records, but after filtering at several levels we could find only 300 eligible records. So even if we automate or run minus queries against the whole data set, our actual testing covers just those 300 records. In those records, some columns hold default or null values for every row, so to validate those columns we need to manipulate data at the source.

Hence my thought is that test data identification is very important. This can be achieved by creating a Test Data Identification document during the requirements-gathering stage itself, and keeping it updated through the execution stage.
This creates more accountability for testing and test data, and it also helps with test and test-data auditing, which is an important step in quality.

Developing an automation tool will help. But in my experience, using built-in automation tools, or tools built for other requirements or projects, does not help, because business rules and the complexity of table joins vary.
This can be avoided by developing framework tools suited to the project's needs. An ETL automation tool should only be used where it is actually needed; building automation tools, or training people on the wrong tools, at the cost of quality is not a good idea!

Automation tools can be used for test data set-up, test documentation, test execution, and generating SQL queries. I have used some automation tools from HP and Informatica, but they rarely suit project needs, even though there are success stories.

In my view, the amount of manual verification in ETL is shrinking if we write minus queries and handle the tooling well. For example, manual verification in Excel can be avoided very simply by inserting the data into a database and using minus queries; most of the latest DB query tools have this option.
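John's minus-query approach can be sketched with SQLite, whose EXCEPT operator is the equivalent of Oracle's MINUS. The table and column names here are illustrative assumptions, not from the comment:

```python
# Sketch of the minus-query idea: instead of eyeballing rows in Excel, load
# the source and target extracts into a database and diff them with a minus
# (EXCEPT in SQLite) query. Table/column names are made up for illustration.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE source (id INTEGER, amount REAL)")
cur.execute("CREATE TABLE target (id INTEGER, amount REAL)")
cur.executemany("INSERT INTO source VALUES (?, ?)",
                [(1, 10.0), (2, 20.0), (3, 30.0)])
cur.executemany("INSERT INTO target VALUES (?, ?)",
                [(1, 10.0), (2, 25.0)])  # row 2 mutated, row 3 dropped

# Rows present in source but absent (or different) in target:
missing = cur.execute(
    "SELECT id, amount FROM source EXCEPT SELECT id, amount FROM target"
).fetchall()
print(missing)
```

Running the same query in the other direction (target EXCEPT source) would catch rows the load invented or corrupted, which together give the two halves of a reconciliation check.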

Comment on Using Data Integration Testing for Reconciling Production Data Assets by Diane http://dataqualitybook.com/?p=362&cpage=1#comment-472 Thu, 11 Oct 2012 07:24:39 +0000 http://dataqualitybook.com/?p=362#comment-472 Great point indeed. Testing the data should be incorporated into the system and data readiness checks before go-live! The problem is that it is difficult :)