Data Modeling and Fractal Fairy Tales

June 16, 2011 by
Filed under: Data Analysis, Data Governance, Data Quality 

Back during my college days I was reading about fractals, defined by Benoit Mandelbrot as “a rough or fragmented geometric shape that can be split into parts, each of which is (at least approximately) a reduced-size copy of the whole.” Lately I have been thinking about the same concept in relation to data modeling.

Many good examples show up in nature, such as this picture of a fern from freefoto.com:

Note how the shape of the main branch is mimicked in a smaller version in each of the offshoots, and how, in turn, each leaf on each offshoot approximately repeats that same shape. It is the “approximately” part of the definition that is interesting with respect to models, since that is basically what a model is: a reduced-size approximation of some other thing.

The design of a data model is a reaction to an application requirement, based on a functional specification intended to satisfy some business objective. A data model attempts to capture a representation of the characteristics of the real-world item that are necessary within the business context. For simplicity, you typically don’t include all the attributes – that would be too much. On the other hand, you do want to capture those attributes that carry the information needed to achieve the business goal. So in fact, the process of modeling is constrained on both sides by necessity. One other thing: once we hammer out the details of the model, it is pretty much set in stone. Of course we can tweak it over time, but for the most part, the data model is relatively static.
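To make that “reduced-size approximation” concrete, here is a minimal sketch; the Product entity and its attributes are hypothetical, chosen only to show how a model deliberately captures some characteristics of the real-world item and omits others:

    from dataclasses import dataclass

    # A deliberately reduced model of a real-world product: only the attributes
    # that carry information for the current business goal (say, order
    # fulfillment) are captured; everything else about the physical item is
    # intentionally left out.
    @dataclass
    class Product:
        sku: str            # identifier used within the business context
        description: str    # needed for pick lists and invoices
        unit_price: float   # needed to price an order
        weight_kg: float    # needed to calculate shipping
        # Not modeled: supplier history, country of origin, packaging, ...
        # If a future analysis needs one of those, the model has no place for it.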

However, business needs are not static; they are dynamic. And since it is difficult to foresee the future, it is also difficult to foresee the future requirements for a data model. As an example, one of our clients had been collecting data for over 50 years, but had only started to collect a particular attribute in recent times (the past 20 years). Now that they wanted to do some new analyses, the absence of 30 years’ worth of those values turned out to be a barrier.

So the situation is this: the data model approximates a view of the real thing, but cannot completely capture everything that can be known about it. Because models are static, it is difficult to evolve a model with the same dynamism as the business. And in turn, when we do reengineer models because they can’t keep up with business needs, the models are (yet again) adapted to the existing needs and don’t take into account the (as yet unknown) future needs. Sounds like a data quality issue to me…

Comments

3 Comments on Data Modeling and Fractal Fairy Tales

  1. Jane Caddel on Tue, 15th Oct 2013 3:52 PM

    The above page describes our situation precisely! We are trying to predict future business needs in order to model for flexibility. I was hoping for more information on why this looks like a data quality issue to you. I have been thinking there is an approach to data modeling that would make a difference, and I liken the previous modeling efforts to Lewis and Clark mapping by walking the terrain, as opposed to using satellite images to map. I’m seeing that crawling through the data requirements and modeling for a particular purpose doesn’t work well for long-term enterprise integration. Gaining support for conducting a data summit to detail all business needs for a subject area is complex and difficult, and who can predict the business requirements of a merger or acquisition?

    Thank you for any additional insight you can provide.

    Jane Caddel
    Data Architect

  2. admin on Tue, 15th Oct 2013 4:07 PM

    Thanks for the comment, Jane! Yes, you are correct: gathering data requirements for specific purposes within a multi-faceted enterprise does not provide the level of insight necessary to understand what needs to be done for enterprise integration. It is a little like the parable described in Edwin Abbott’s famous book “Flatland” (see http://www.amazon.com/gp/product/1492793051/ref=as_li_ss_tl?ie=UTF8&camp=1789&creative=390957&creativeASIN=1492793051&linkCode=as2&tag=wwwknowledgei-20), in which shapes living in a two-dimensional world cannot conceive of higher dimensions.

    On the other hand, the approach we often take for justifying a more comprehensive discussion about enterprise data requirements, and data governance in particular, is to weigh the value of information against the costs of ignoring compliance with business policies and the way they are translated into information policies. Gaps in this process can lead to missed opportunities, increased operational costs, or even a variety of regulatory, compliance, or customer satisfaction risks.
    If you can articulate those potential negative impacts to the right people in the organization, that might help in generating support for a data summit!

  3. robert on Thu, 9th Oct 2014 4:24 PM

    Once the model is built for a specific set of requirements, we normally implement that model as-is, but what we should do is generalize the model to allow for an evolving business model over time. When the data model is generalized it becomes extensible, because there are now places to add the additional information that the business needs as those needs change. This also encourages reuse within our code, because information related to the generalization is inherited by the entity sub-types, and thus by the code used to manage those sub-types. The model starts to take on the characteristics of a class/sub-class structure that you would see in an OO design. For example, the model has an Employee and a Customer, and both have an address. The process used to manage the employee inherits the process that manages the address, as does the Customer. With this approach the generalized model starts to become fractal, in that if each entity is a doorway into the model, any doorway can provide the same information.
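As an aside, a minimal sketch of the kind of super-type/sub-type generalization robert describes might look like the following; the Party, Address, Employee, and Customer names are illustrative assumptions, not part of any specific model:

    from dataclasses import dataclass

    # Illustrative sketch only: a generalized "Party" super-type owns the
    # address, so anything modeled as a sub-type of Party inherits the same
    # address handling.
    @dataclass
    class Address:
        street: str
        city: str
        postal_code: str

    @dataclass
    class Party:
        name: str
        address: Address

        def relocate(self, new_address: Address) -> None:
            # One shared process for managing addresses, reused by every sub-type.
            self.address = new_address

    @dataclass
    class Employee(Party):
        employee_number: str

    @dataclass
    class Customer(Party):
        account_id: str

    # Either "doorway" into the model reaches the same generalized information:
    emp = Employee(name="Pat",
                   address=Address("1 Main St", "Springfield", "01101"),
                   employee_number="E-42")
    emp.relocate(Address("9 Elm St", "Springfield", "01101"))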
