Is Data Profiling a Commodity?

Here are some quick thoughts about the basic functionality of data profiling that make me wonder about the degree to which it has become a commodity capability. If so, then I have a few observations at the end to make folks think about what they are using profiling for.

Hierarchy Data Completeness and Semantic Convergence

Yesterday, Henrik Liliendahl Sørensen posted an interesting entry about data profiling, data values, and the quality and completeness of the hierarchies associated with the value domains that populate any particular data element in a data set. I’d like to jam along with that concept with respect to a conversation I had the other day that was essentially about capturing and tracking spend data, although the context was capturing and reporting the aggregate physician payments made by a pharmaceutical (or other covered) manufacturer to specific practitioners.

Data Quality Profiling and Assessment – Some Questions for the Client

Yesterday our company was approached to provide a proposal for a data quality assessment project as part of a more comprehensive data quality assurance effort. When we get these types of requests, I am always amused by the fact that key pieces of information necessary for determining the amount of work are missing. We typically have some basic questions in order to scope the level of effort, including:

• What data sets are to be used as the basis for analysis?
• How many tables?
• How many data elements?
• How many records in each table?
• Are reference data sets available for the common value domains?
• How many business processes source data into the target data set?
• How many processes use the data in the target data set?
• What documentation is available for the data sets and the business processes?
• What tools are in place to analyze the data?
• Will the client provide access to the sources for analysis?
• How is the organization prepared to take actions based on the resultant findings?

In general, I like to think that my company is pretty good at doing these types of assessments – of course, I wrote the book (or at least, a book) on the topic ;-).

March 1, 2, 3 David Loshin Events – Strategic Business Value from Enterprise Data

I have been invited by data quality and MDM tool company Ataccama to be the guest speaker at a series of breakfast seminar events in early March at the following locations:

March 1 Bridgewater, NJ

March 2 Chicago, IL

March 3 Charlotte, NC

The topic is “Strategic Business Value from your Enterprise Data,” and I will be discussing aspects of business value drivers for Data Quality and MDM. I believe that attendees will also get a copy of my book “Master Data Management.”

I participated in a few similar events at the end of 2010 and found that some of the attendees posed some extremely interesting challenges, and I hope to share some new insights at these upcoming events!

Recommended Data Quality Books

I am putting together a list of book recommendations for the data quality practitioner, and have added a page link on the web site’s Page Link bar. Click here for a direct link.

