Filed under: Business Impacts, Business Intelligence, Data Analysis, Data Quality, Master Data, Metrics, Performance Measures
It would be unusual to find a company that does not use some physical facility from which business is conducted. Even the leaders and managers of home-based and virtual businesses have to sit down at some point, whether to access the internet, make a phone call, check email, or pack an order and arrange for its delivery. Consequently, every company eventually incurs overhead and administrative costs associated with running the business, such as rent and facility maintenance, as well as telephones, internet, furniture, hardware, and software purchase/leasing and maintenance.
Today’s thoughts are about that last item: the costs associated with building, furniture, machinery, software, and grounds maintenance. Effective asset maintenance requires a balance – essentially, one would like to optimize the program so that the most judicious allocation of resources provides the longest lifetime for acquired or managed assets.
As an example, how often do offices need to be painted? When you deal with one or two rooms, that is not a significant question, but when you manage a global corporation with hundreds of office buildings in scores of countries, the “office painting schedule” influences a number of other decisions: bulk purchasing of required materials (e.g., paint and brushes), competitive engagement of contractors to do the work, temporary office space for staff while offices are being painted, etc., all of which provide a wide opportunity for cost reduction and increased productivity.
And data quality fits in through the data associated with both the inventory of assets requiring maintenance and the information used for managing the maintenance program. In fact, this presents an interesting master data management opportunity, since it involves the consolidation of a significant amount of data from potentially many sources regarding commonly-used and shared data concepts such as “Asset.” The “Asset” concept can be hierarchically organized in relation to the different types of assets, each of which exists in a variety of representations and each of which is subject to analysis for maintenance optimization. Here are some examples:
- Fixed assets (real property, office buildings, grounds, motor vehicles, large manufacturing machinery, other plant/facility items)
- Computer assets (desktops, printers, laptops, scanners)
- Telephony (PBX, handsets, mobile phones)
- Furniture (desks, bookcases, chairs, couches, tables)
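To make the hierarchy idea a little more concrete, here is a minimal sketch of an “Asset” taxonomy built from the categories above. The structure and the helper function are my own illustration, not a prescribed master data model:

```python
# A minimal sketch of a hierarchical "Asset" taxonomy using the example
# categories from the post; the structure is illustrative, not prescriptive.
ASSET_TAXONOMY = {
    "Asset": {
        "Fixed": ["Real property", "Office building", "Grounds",
                  "Motor vehicle", "Manufacturing machinery"],
        "Computer": ["Desktop", "Printer", "Laptop", "Scanner"],
        "Telephony": ["PBX", "Handset", "Mobile phone"],
        "Furniture": ["Desk", "Bookcase", "Chair", "Couch", "Table"],
    }
}

def asset_types(taxonomy):
    """Flatten the hierarchy into (category, asset type) pairs."""
    pairs = []
    for category, types in taxonomy["Asset"].items():
        for asset_type in types:
            pairs.append((category, asset_type))
    return pairs
```

A consolidated representation like this is what makes it possible to run maintenance analyses per category or across the whole portfolio.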
I think you see where I am going here: errors in asset data lead to improper analyses with respect to maintenance of those assets, such as arranging for a delivery truck’s oil to be changed twice in the same week, or painting some offices twice in a six-month period while other offices remain unpainted for years. Therefore, there is a direct dependence between the quality of asset data and the costs associated with asset maintenance.
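As a rough sketch of how that dependence shows up in practice, consider how an unconsolidated asset identifier can produce the duplicate-oil-change scenario. The record layout, the identifiers, and the seven-day window below are all assumptions for illustration:

```python
from datetime import date

# Hypothetical maintenance log; "TRK-104" and "TRK104" are really the same
# truck, recorded inconsistently in two source systems.
events = [
    {"asset": "TRK-104", "service": "oil change", "date": date(2009, 3, 2)},
    {"asset": "TRK104",  "service": "oil change", "date": date(2009, 3, 5)},
    {"asset": "TRK-233", "service": "oil change", "date": date(2009, 3, 4)},
]

def normalize(asset_id):
    """Crude identity resolution: strip hyphens and upper-case the ID."""
    return asset_id.replace("-", "").upper()

def suspect_duplicates(events, window_days=7):
    """Flag pairs of identical services on the same (normalized) asset
    performed within window_days of each other."""
    flagged = []
    for i, a in enumerate(events):
        for b in events[i + 1:]:
            same_asset = normalize(a["asset"]) == normalize(b["asset"])
            same_service = a["service"] == b["service"]
            close = abs((a["date"] - b["date"]).days) <= window_days
            if same_asset and same_service and close:
                flagged.append((a, b))
    return flagged
```

Without the normalization step, the scheduling analysis sees two distinct trucks and happily books both oil changes.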
Having used a number of data profiling products, I often think about ways the products could be improved. Here is one:
After the profiling scan is completed, I examine each data attribute’s value frequency histogram, looking for outliers and frequently appearing values. I will typically want to drill through those values to see the specific records containing them.
But sometimes I would like to drill through the frequency analysis on more than one value. If through my observation I see a few values that are suspicious, I’d like to look at the records that have any of those values. For example, if I am examining a SEX field and see values other than M and F (such as U and N), I’d like to look at all records that have those values (in this case, either U or N) to see if there are any obvious patterns associated with any of the unusual values. That would require a “multi-select” feature in which you could ctrl-click on a number of values and then request a drill-through.
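Here is a minimal sketch of the behavior that multi-select drill-through would provide, over an assumed in-memory record set (the field names and sample records are illustrative):

```python
# Assumed sample records; in a real tool these would come from the
# profiled data source.
records = [
    {"id": 1, "SEX": "M"},
    {"id": 2, "SEX": "U"},
    {"id": 3, "SEX": "N"},
    {"id": 4, "SEX": "F"},
    {"id": 5, "SEX": "U"},
]

def drill_through(records, field, selected_values):
    """Return all records whose field value matches ANY selected value,
    mimicking a ctrl-click multi-select on the frequency histogram."""
    selected = set(selected_values)
    return [r for r in records if r.get(field) in selected]
```

With a single call – `drill_through(records, "SEX", ["U", "N"])` – you get back every record carrying any of the suspicious values, ready for pattern inspection.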
Right now, most tools I have used are configured to let you drill through only one value at a time. I have asked some vendors about this; they have basically said that their tool can do it, but the way to do it is to write a rule, perform an extract, and then look at the resulting records – certainly not a simple process.
I have a number of other suggestions that I plan to post. Meanwhile, if you have comments, send them to me through the Contact Us page! I will post the best comments and suggestions.
Filed under: Business Rules, Data Profiling, Data Quality, Metrics, Performance Measures
Data profiling can be an excellent approach to identifying latent issues and errors hidden in your data. We have seen a number of clients using data profiling as the first step in defining data quality metrics and using those metrics for reporting via scorecards and dashboards.
And if I can identify a problem and define a rule for determining that the problem exists, should I not be able to fix the problem? Here is a question, though: once I fix the root cause of the problem, do I still need to keep checking whether the problem has occurred?
More on this in an upcoming post; contact me if you have thoughts…
Filed under: Business Rules, Data Quality, Metadata, Metrics
There are many different dimensions of data quality that can be “configured” to measure and monitor compliance with data consumer expectations. We could classify a subset of the data quality dimensions that can be mapped to assertions at different levels of data precision, such as:
- Data value, in which a rule is used to validate a specific value. An example is a format specification for any ZIP code (no matter which data element is storing it) that says the value must be a character string that has 5 digits, a hyphen, then 4 digits.
- Data element, in which a value is validated in the context of the assignment of a value domain to a data element. An example is an assertion that the value of the SEX field must be either M or F.
- Record, in which the assertion refers to more than one data element within a record. An example would specify that the START_DATE must be earlier in time than the END_DATE.
- Column, which is some qualitative measure of the collection of values in one column. An example would assert that no value appears more than 5% of the time across the entire column.
- Table, which measures compliance over a collection of records. An example is a rule that says the table’s percentage of valid records must be greater than 85%.
- Cross-table, which looks at the relationships across tables. An example could specify that there is a one-to-one relationship between each customer record and its primary address record.
- Aggregate, which provides rules about aggregate functions. An example would apply a validation rule to averages and sums calculated in business intelligence reports.
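The first several levels above can be sketched as executable checks. The field names and thresholds come from the examples in the list, but the function signatures are my own illustration (cross-table and aggregate rules are omitted, since they depend on the surrounding data architecture):

```python
import re
from collections import Counter

# Data value rule: ZIP+4 format, no matter which element stores it.
def valid_zip(value):
    return re.fullmatch(r"\d{5}-\d{4}", value) is not None

# Data element rule: SEX must come from its assigned value domain.
def valid_sex(value):
    return value in {"M", "F"}

# Record rule: START_DATE must precede END_DATE (ISO strings compare OK).
def valid_record(record):
    return record["START_DATE"] < record["END_DATE"]

# Column rule: no single value may account for more than 5% of the column.
def valid_column(values, max_share=0.05):
    counts = Counter(values)
    return all(c / len(values) <= max_share for c in counts.values())

# Table rule: more than 85% of records must pass the record rule.
def valid_table(records, threshold=0.85):
    passing = sum(1 for r in records if valid_record(r))
    return passing / len(records) > threshold
```

Note how each level composes the one below it: the table rule is defined in terms of the record rule, which is the kind of layering the service model discussion below depends on.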
I have been thinking about ways to map these rules to metadata concepts, in order to understand how a services model could be implemented and invoked at different locations within the information production flow. For example, one could validate data values as they are created, but you would have to wait until you have many records to validate a table rule. This suggests that value rules can be mapped to value domains, while table rules are mapped to entities. As this mapping gets fleshed out, I will begin to assemble a service model for data validation that ultimately links through the metadata to the original definitions associated with business policies. Given that model, we can spec out an operational governance framework to manage quality as it pertains to those business policies.
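A first cut at that mapping might look like the sketch below. The metadata concept names and invocation points are my own assumptions for illustration, not an established model:

```python
# Hypothetical binding of rule levels to metadata concepts and to the
# points in the information production flow where each can be invoked.
RULE_BINDINGS = {
    "value":       {"metadata": "value domain",    "invoke_at": "data entry"},
    "element":     {"metadata": "data element",    "invoke_at": "data entry"},
    "record":      {"metadata": "entity instance", "invoke_at": "record commit"},
    "column":      {"metadata": "attribute",       "invoke_at": "batch load"},
    "table":       {"metadata": "entity",          "invoke_at": "batch load"},
    "cross-table": {"metadata": "relationship",    "invoke_at": "integration"},
    "aggregate":   {"metadata": "derived measure", "invoke_at": "reporting"},
}

def invocation_point(rule_level):
    """Look up where in the flow a rule of this level should be validated."""
    return RULE_BINDINGS[rule_level]["invoke_at"]
```

A validation service could consult a table like this to decide which checks to run at each stage, rather than hard-wiring the rules into each application.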