<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments for The Practitioner&#039;s Guide to Data Quality Improvement</title>
	<atom:link href="http://dataqualitybook.com/?feed=comments-rss2" rel="self" type="application/rss+xml" />
	<link>http://dataqualitybook.com</link>
	<description></description>
	<lastBuildDate>Thu, 29 Nov 2012 21:12:30 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.4.2</generator>
	<item>
		<title>Comment on Using Data Integration Testing for Reconciling Production Data Assets by Managing Information Consistency and Trust During System Migrations and Data Migrations : The Practitioner&#039;s Guide to Data Quality Improvement</title>
		<link>http://dataqualitybook.com/?p=362&#038;cpage=1#comment-2247</link>
		<dc:creator>Managing Information Consistency and Trust During System Migrations and Data Migrations : The Practitioner&#039;s Guide to Data Quality Improvement</dc:creator>
		<pubDate>Thu, 29 Nov 2012 21:12:30 +0000</pubDate>
		<guid isPermaLink="false">http://dataqualitybook.com/?p=362#comment-2247</guid>
		<description>[...] our discussions (both in the article and in discussions with Informatica’s Ash Parikh) focused on data integration testing for production data sets, while another centered on verification of existing extraction/transformation/loading methods for [...]</description>
		<content:encoded><![CDATA[<p>[...] our discussions (both in the article and in discussions with Informatica’s Ash Parikh) focused on data integration testing for production data sets, while another centered on verification of existing extraction/transformation/loading methods for [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Best Practices for Data Integration Testing Series &#8211; Instituting Good Practices for Data Testing by John</title>
		<link>http://dataqualitybook.com/?p=358&#038;cpage=1#comment-645</link>
		<dc:creator>John</dc:creator>
		<pubDate>Fri, 02 Nov 2012 17:51:18 +0000</pubDate>
		<guid isPermaLink="false">http://dataqualitybook.com/?p=358#comment-645</guid>
		<description>Thanks for sharing these thoughts; very useful.

As an ETL tester, here are my thoughts.
Complete validation of the whole test data set, or even testing against production data, can still miss defects. The reason is that ETL development is usually done for new requirements or changes, so the data required for testing some new transformations may not be present in the test environment. In that sense we cannot say that even a production cut of the data has &quot;integrity&quot;. Integrity (with respect to testing) can be achieved only when the data is sufficient to test all the business rules. Sometimes even 100 records that cover all the business rules are more consistent than one billion junk records.

Most of the time, scrambled production-cut data is put into the testing environment for security reasons (financial, healthcare, or HR system data). This scrambling is unavoidable because of security requirements and privacy laws in different countries. Even testing one billion records with automation tools or MINUS queries will not help. (For example, my current project has 50 million records, but after several levels of filtering we could find only 300 eligible records. So if we automate or run MINUS queries against the whole data set, our actual testing covers just 300 records. Even in those records, some columns hold default or null values for every record, so to validate those columns we need to manipulate data at the source.)

Hence, in my view, test data identification is very important. It can be achieved by creating a Test Data Identification document during the requirements-gathering stage and keeping it updated through the execution stage.
This creates more accountability for testing and test data. It also helps in auditing tests and test data, which is an important step in quality.

Developing an automation tool will help. But in my experience, built-in automation tools, or automation tools built for other requirements or projects, will not help, because business rules and the complexity of table joins vary from project to project.
This can be avoided by developing framework tools suited to the needs of the project. An ETL automation tool should be used only where it fits the need (that is, building automation tools, or training people on the wrong ones, that are not productive and come at the cost of quality is not a good idea).

Automation tools can be used for test data setup, test documentation, test execution, and generating SQL queries. I have used some automation tools from HP and Informatica, but they rarely suit project needs, even though there are success stories.

In my view, manual verification in ETL is decreasing as we write MINUS queries and use tools. (For example, manual verification in Excel can be avoided very simply by inserting the data into a database and using MINUS queries. Most of the latest DB query tools now have this option.)</description>
		<content:encoded><![CDATA[<p>Thanks for sharing these thoughts; very useful.</p>
<p>As an ETL tester, here are my thoughts.<br />
Complete validation of the whole test data set, or even testing against production data, can still miss defects. The reason is that ETL development is usually done for new requirements or changes, so the data required for testing some new transformations may not be present in the test environment. In that sense we cannot say that even a production cut of the data has &#8220;integrity&#8221;. Integrity (with respect to testing) can be achieved only when the data is sufficient to test all the business rules. Sometimes even 100 records that cover all the business rules are more consistent than one billion junk records.</p>
<p>Most of the time, scrambled production-cut data is put into the testing environment for security reasons (financial, healthcare, or HR system data). This scrambling is unavoidable because of security requirements and privacy laws in different countries. Even testing one billion records with automation tools or MINUS queries will not help. (For example, my current project has 50 million records, but after several levels of filtering we could find only 300 eligible records. So if we automate or run MINUS queries against the whole data set, our actual testing covers just 300 records. Even in those records, some columns hold default or null values for every record, so to validate those columns we need to manipulate data at the source.)</p>
<p>Hence, in my view, test data identification is very important. It can be achieved by creating a Test Data Identification document during the requirements-gathering stage and keeping it updated through the execution stage.<br />
This creates more accountability for testing and test data. It also helps in auditing tests and test data, which is an important step in quality.</p>
<p>Developing an automation tool will help. But in my experience, built-in automation tools, or automation tools built for other requirements or projects, will not help, because business rules and the complexity of table joins vary from project to project.<br />
This can be avoided by developing framework tools suited to the needs of the project. An ETL automation tool should be used only where it fits the need (that is, building automation tools, or training people on the wrong ones, that are not productive and come at the cost of quality is not a good idea).</p>
<p>Automation tools can be used for test data setup, test documentation, test execution, and generating SQL queries. I have used some automation tools from HP and Informatica, but they rarely suit project needs, even though there are success stories.</p>
<p>In my view, manual verification in ETL is decreasing as we write MINUS queries and use tools. (For example, manual verification in Excel can be avoided very simply by inserting the data into a database and using MINUS queries. Most of the latest DB query tools now have this option.)</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Using Data Integration Testing for Reconciling Production Data Assets by Diane</title>
		<link>http://dataqualitybook.com/?p=362&#038;cpage=1#comment-472</link>
		<dc:creator>Diane</dc:creator>
		<pubDate>Thu, 11 Oct 2012 07:24:39 +0000</pubDate>
		<guid isPermaLink="false">http://dataqualitybook.com/?p=362#comment-472</guid>
		<description>Great point indeed.  Testing the data should be incorporated in the system and data readiness checks before it is live!  The problem is that it is difficult :)</description>
		<content:encoded><![CDATA[<p>Great point indeed.  Testing the data should be incorporated in the system and data readiness checks before it is live!  The problem is that it is difficult <img src='http://dataqualitybook.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on iOS6, Apple Maps, and the Biggest Data Quality Story This Year by Link Roundup &#8211; October 1, 2012 &#124; Enterprise Information Management in the 21st Century</title>
		<link>http://dataqualitybook.com/?p=366&#038;cpage=1#comment-443</link>
		<dc:creator>Link Roundup &#8211; October 1, 2012 &#124; Enterprise Information Management in the 21st Century</dc:creator>
		<pubDate>Mon, 01 Oct 2012 17:32:55 +0000</pubDate>
		<guid isPermaLink="false">http://dataqualitybook.com/?p=366#comment-443</guid>
		<description>[...] iOS6, Apple Maps, and the Biggest Data Quality Story This Year (The Practitioner&#8217;s Guide to Data Quality) &#8211; You know I just had to have at least one link on this story&#8230; [...]</description>
		<content:encoded><![CDATA[<p>[...] iOS6, Apple Maps, and the Biggest Data Quality Story This Year (The Practitioner&#8217;s Guide to Data Quality) &#8211; You know I just had to have at least one link on this story&#8230; [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Best Practices for Data Integration Testing Series &#8211; Instituting Good Practices for Data Testing by Using Data Integration Testing for Reconciling Production Data Assets : The Practitioner&#039;s Guide to Data Quality Improvement</title>
		<link>http://dataqualitybook.com/?p=358&#038;cpage=1#comment-289</link>
		<dc:creator>Using Data Integration Testing for Reconciling Production Data Assets : The Practitioner&#039;s Guide to Data Quality Improvement</dc:creator>
		<pubDate>Fri, 14 Sep 2012 13:41:56 +0000</pubDate>
		<guid isPermaLink="false">http://dataqualitybook.com/?p=358#comment-289</guid>
		<description>[...] my last post, we started to discuss the need for fundamental processes and tools for institutionalizing data [...]</description>
		<content:encoded><![CDATA[<p>[...] my last post, we started to discuss the need for fundamental processes and tools for institutionalizing data [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Data Governance and Quality: Data Reuse vs. Data Repurposing by Max Gano</title>
		<link>http://dataqualitybook.com/?p=349&#038;cpage=1#comment-151</link>
		<dc:creator>Max Gano</dc:creator>
		<pubDate>Thu, 01 Mar 2012 00:35:24 +0000</pubDate>
		<guid isPermaLink="false">http://dataqualitybook.com/?p=349#comment-151</guid>
		<description>Great topic, David, especially with regard to how you are defining data re-purposing. I have often used the Fit-For-Use analysis method from the Data Governance Institute. It really helps unravel the complex mix of challenges raised when multiple downstream consumers re-purpose data from a common source. Too often this occurs as a sort of &quot;secret second life&quot; of data that only comes to light when things go wrong. What works for reuse may be entirely different from what works for re-purposing. And the greatest challenge of all is that no one perspective is more correct than another. Keeps things VERY interesting. But one way to stay in front is to understand the need to be prepared to support Fit-For-Use early on.</description>
		<content:encoded><![CDATA[<p>Great topic, David, especially with regard to how you are defining data re-purposing. I have often used the Fit-For-Use analysis method from the Data Governance Institute. It really helps unravel the complex mix of challenges raised when multiple downstream consumers re-purpose data from a common source. Too often this occurs as a sort of &#8220;secret second life&#8221; of data that only comes to light when things go wrong. What works for reuse may be entirely different from what works for re-purposing. And the greatest challenge of all is that no one perspective is more correct than another. Keeps things VERY interesting. But one way to stay in front is to understand the need to be prepared to support Fit-For-Use early on.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Data Governance and Quality: Data Reuse vs. Data Repurposing by Fit for repurposing &#171; Liliendahl on Data Quality</title>
		<link>http://dataqualitybook.com/?p=349&#038;cpage=1#comment-150</link>
		<dc:creator>Fit for repurposing &#171; Liliendahl on Data Quality</dc:creator>
		<pubDate>Thu, 23 Feb 2012 15:38:39 +0000</pubDate>
		<guid isPermaLink="false">http://dataqualitybook.com/?p=349#comment-150</guid>
		<description>[...] by a blog post by David Loshin called Data Governance and Quality: Data Reuse vs. Data Repurposing I was, perhaps a bit off topic, inspired to pose the question about, if data are of high quality if [...]</description>
		<content:encoded><![CDATA[<p>[...] by a blog post by David Loshin called Data Governance and Quality: Data Reuse vs. Data Repurposing I was, perhaps a bit off topic, inspired to pose the question about, if data are of high quality if [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Data Governance and Quality: Data Reuse vs. Data Repurposing by Henrik Liliendahl Sørensen</title>
		<link>http://dataqualitybook.com/?p=349&#038;cpage=1#comment-148</link>
		<dc:creator>Henrik Liliendahl Sørensen</dc:creator>
		<pubDate>Wed, 22 Feb 2012 20:11:27 +0000</pubDate>
		<guid isPermaLink="false">http://dataqualitybook.com/?p=349#comment-148</guid>
		<description>Good musings David. Makes me pose the question: Is data of high quality if they are “fit for purpose of use” or “fit for repurposing”?</description>
		<content:encoded><![CDATA[<p>Good musings David. Makes me pose the question: Is data of high quality if they are “fit for purpose of use” or “fit for repurposing”?</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Things I Want to Know About Customers by Clarke Patterson</title>
		<link>http://dataqualitybook.com/?p=337&#038;cpage=1#comment-138</link>
		<dc:creator>Clarke Patterson</dc:creator>
		<pubDate>Fri, 20 Jan 2012 21:37:34 +0000</pubDate>
		<guid isPermaLink="false">http://dataqualitybook.com/?p=337#comment-138</guid>
		<description>Great post, as always, David!  To build on Henrik&#039;s thought, relationships that determine who influences whom are also something to keep an eye out for.  Particularly in the social world, the more we can connect the dots across social networks, the more effective we can be in tailoring outreach efforts.</description>
		<content:encoded><![CDATA[<p>Great post, as always, David!  To build on Henrik&#8217;s thought, relationships that determine who influences whom are also something to keep an eye out for.  Particularly in the social world, the more we can connect the dots across social networks, the more effective we can be in tailoring outreach efforts.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Things I Want to Know About Customers by Prashant C</title>
		<link>http://dataqualitybook.com/?p=337&#038;cpage=1#comment-136</link>
		<dc:creator>Prashant C</dc:creator>
		<pubDate>Mon, 16 Jan 2012 15:58:52 +0000</pubDate>
		<guid isPermaLink="false">http://dataqualitybook.com/?p=337#comment-136</guid>
		<description>Good list of things to know, David.

Another aspect that is given high importance is the relationship information of a customer. Along with this, we also store relationship roles, which help in determining who is more influential in a given relationship.

Thanks
Prashant</description>
		<content:encoded><![CDATA[<p>Good list of things to know, David.</p>
<p>Another aspect that is given high importance is the relationship information of a customer. Along with this, we also store relationship roles, which help in determining who is more influential in a given relationship.</p>
<p>Thanks<br />
Prashant</p>
]]></content:encoded>
	</item>
</channel>
</rss>
