On the other hand, the approach we often take to justify a more comprehensive discussion of enterprise data requirements, and of data governance in particular, is to weigh the value of information against the cost of ignoring compliance with business policies and with the information policies they are translated into. Gaps in this process can lead to missed opportunities, increased operational costs, or even a variety of regulatory, compliance, or customer satisfaction risks.
If you can articulate those potential negative impacts to the right people in the organization, that might help in generating support for a data summit!
Thank you for any additional insight you can provide.
Based on my experience as an ETL tester, below are my thoughts…
Complete validation of the whole test data set, or even testing against production data, can still miss defects. The reason is that ETL development is usually done for new requirements or changes, so there is a good chance that the data required to test the new transformations is simply not present in the test environment. In that sense, we cannot say that even a production-cut data set has "integrity." Integrity (with respect to testing) is achieved only when the data is sufficient to exercise all of the business rules. Sometimes even 100 records that cover all the business rules are more useful than one billion junk records.
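Just to illustrate the idea (the table and column names below are placeholders I am inventing, not from any real project), a small rule-driven sample can be pulled with an analytic query that keeps a few rows per combination of the attributes that drive the business rules, instead of comparing millions of rows blindly:

    -- Hypothetical sketch: keep ~5 rows per rule-driving combination
    -- (src_customer, account_type, region_code are illustrative names only)
    SELECT *
    FROM (
      SELECT s.*,
             ROW_NUMBER() OVER (PARTITION BY s.account_type, s.region_code
                                ORDER BY s.updated_date DESC) AS rn
      FROM   src_customer s
    ) t
    WHERE  t.rn <= 5;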
Most of the time, a scrambled production cut is loaded into the test environment for security reasons (financial, healthcare, or HR system data). This scrambling is unavoidable because of security requirements and privacy laws in different countries. Even if we test one billion records with automation tools or with MINUS queries, it will not help. For example, in my current project we have 50 million records, but after filtering at a few levels we could find only 300 eligible records. So if we automate or run a MINUS against the whole data set, our actual testing is with just 300 records. Within those records, some columns hold default or null values for every row, so to validate those columns we have to manipulate the data at the source.
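As a minimal sketch of what I mean by running a MINUS only over the eligible rows (Oracle-style MINUS; it is EXCEPT in some other databases, and the names and filter here are illustrative, not my actual project's):

    -- Source-minus-target on the rows that pass the eligibility filters
    SELECT cust_id, balance_amt, status_cd
    FROM   src_account
    WHERE  status_cd = 'ACTIVE'          -- the same filter levels used to find the eligible rows
    MINUS
    SELECT cust_id, balance_amt, status_cd
    FROM   tgt_account
    WHERE  status_cd = 'ACTIVE';
    -- Zero rows means the compared columns match for that set;
    -- the reverse direction (target MINUS source) should be run as well.

Even when this comes back clean, the coverage is only as wide as the eligible rows, which is exactly the problem described above.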
Hence my view is that test data identification is very important. It can be achieved by creating a Test Data Identification document during the requirement-gathering stage itself and keeping it updated through the execution stage.
This creates more accountability for testing and for the test data. It also helps with auditing of tests and test data, which is an important step for quality.
Developing an automation tool will help. But in my experience, using built-in automation tools, or tools built for other requirements or projects, does not help, because the business rules and the complexity of the table joins will vary.
This can be avoided by developing framework tools suited to the project's needs. An ETL automation tool should only be used where it is actually needed (building automation tools, or providing training on the wrong automation tools, at the cost of quality is not a good idea!).
Automation tools can be used for test data set-up, test documentation, test execution, and generating SQL queries. I have used some automation tools from HP and Informatica, but they rarely suit project needs, even though there are success stories.
In my view, manual verification is becoming less common in ETL if we are writing MINUS queries and handling the tools well. For example, manual verification in Excel can be avoided in a very simple way by inserting the data into a database table and using MINUS queries; most of the latest DB query tools have this import option.
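Roughly, once the spreadsheet of expected results has been imported into a staging table through the query tool's import wizard, the manual comparison reduces to two MINUS queries (again a sketch only; stg_expected_from_excel and tgt_orders are made-up names):

    -- Expected rows missing from the target
    SELECT cust_id, order_amt FROM stg_expected_from_excel
    MINUS
    SELECT cust_id, order_amt FROM tgt_orders;

    -- Loaded rows that were not expected
    SELECT cust_id, order_amt FROM tgt_orders
    MINUS
    SELECT cust_id, order_amt FROM stg_expected_from_excel;
    -- Both queries returning zero rows indicates the Excel expectations
    -- and the loaded target agree on the compared columns.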