Changing Assumptions About Data Use

September 8, 2011 by
Filed under: Data Quality 

I was reading Tom Redman’s blog posting at the Data Roundtable about the customer-supplier model for data, and thought it would be worth raising a question about one long-standing assumption regarding data use. Tom’s posting suggests that individuals should cast themselves into a supplier or customer role. Suppliers should consider who their customers are and ask whether they are meeting those customers’ needs. Customers should consider who their suppliers are and “act like customers.”

This model may work in a constrained environment where those roles can be clearly identified. But that assumption may be changing rapidly…

We recently worked on a project for a federal government agency, and one of its objectives was to find more effective channels for sharing data across agencies. This agency produces a lot of data, and traditionally has been pretty good at publishing that data as well. When we met with the data producers, they understood that the data created by their application was potentially being used by other consumers both within and outside the agency.

However, the sharing model for the agency (and practically for other federal agencies, as embodied by the data.gov model) is to provide the data to the “community” and allow its members to do whatever they want with it. Scientists could analyze some of the data, lobbyists could use the data to recommend policy changes, reporters could use the data to rake up some muck and generate news stories, and aggregators could restructure the data and resell it. But there was one common thread in this speculation about the data consumers: it was largely speculation.

In essence, the release of the data is unconstrained: anyone who finds it can use it for whatever purpose they want. That being said, it would be a challenge for the agency to cast itself in any supplier role other than that of a supplier of raw materials. Certainly, the quality constraints for one set of data consumers are going to be largely different from those of other sets, and the agency was not in the business of providing data; the data was just a byproduct of its operational activities. In other words, the agency might have thousands of different “customers” for the data, but there was no way it was going to take on the responsibility and accountability of ensuring the quality of that data for thousands of purposes.

Consider this in light of another conversation I had at the June Data Governance conference. One attendee commented that the data his group acquired did not meet its needs. A data quality guru suggested that the attendee send the data back to the producer and insist that it be fixed. The attendee responded (as I had secretly predicted) that the producer did not care, and was not about to allocate resources to ensure the quality of the data for some other group’s purposes.

This suggests a different set of assumptions: once data sets are created and “findable,” they might be reused and repurposed, but that reuse does not establish a supplier-customer relationship. Each new purpose creates new expectations of use, each with its own set of data quality requirements. Until the perception of data availability changes from “reusability” to “utility,” we might lower our expectation that a supplier management model can be imposed to ensure that all data quality needs are met.
