Data Modeling and Fractal Fairy Tales
Back during my college days I was reading about fractals, defined by Benoit Mandelbrot as “a rough or fragmented geometric shape that can be split into parts, each of which is (at least approximately) a reduced-size copy of the whole.” Lately I have been thinking about the same concept in relation to data modeling.
Many good examples show up in nature, such as this picture of a fern from freefoto.com:
Note how the shape of the main branch is mimicked in a smaller version by each of the offshoots, and how in turn each leaf on each offshoot again approximately shares the same shape. It is the “approximately” part of the definition that is interesting with respect to models, since that is essentially what a model is: a reduced-size approximation of some other thing.
A data model is designed in reaction to an application requirement, which is based on a functional specification, which is in turn intended to satisfy some business objective. The data model attempts to capture a representation of those characteristics of the real-world item that are necessary within the business context. For simplification purposes, you typically don’t include all of the attributes; that would be too much. On the other hand, you do want to capture those attributes that carry the information needed to achieve the business goal. So in fact, the process of modeling is constrained on both sides by necessity. One other thing: once we hammer out the details of the model, it is pretty much set in stone. Of course we can tweak it over time, but for the most part, the data model is relatively static.
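To make that two-sided constraint a little more concrete, here is a minimal sketch in Python (the entity and its attributes are hypothetical, not taken from any particular model): the real-world customer has an effectively unbounded set of characteristics, but the model keeps only the few that the current business objective needs.

```python
from dataclasses import dataclass
from datetime import date

# A real-world customer has countless characteristics (height, hobbies,
# commute route, ...). The model deliberately leaves those out and keeps
# only the attributes the current business objective requires.
@dataclass
class Customer:
    customer_id: int   # needed to identify the record
    name: str          # needed for correspondence
    signup_date: date  # needed for tenure reporting
    region: str        # needed for regional analysis
    # Anything not listed here simply cannot be recovered from the model
    # later, no matter how relevant it becomes to a future analysis.
```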
Business needs, on the other hand, are not static but dynamic. And since it is difficult to foresee the future, it is also difficult to foresee the future requirements of a data model. As an example, one of our clients had been collecting data for over 50 years, but it was only in recent times (the past 20 years) that they had started to collect a particular attribute. Now that they wanted to do some new analyses, the absence of 30 years’ worth of those values turned out to be a barrier.
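A rough sketch of how that gap surfaces in practice (the years, records, and attribute name here are invented, not the client’s actual data): a simple coverage check over the collection history makes the barrier visible.

```python
# Hypothetical yearly observations over a 50-year collection history:
# the newer attribute only starts being populated in the last 20 years.
history = {year: None for year in range(1960, 1990)}                 # 30 years, never captured
history.update({year: "observed" for year in range(1990, 2010)})     # 20 years, captured

covered = sum(1 for value in history.values() if value is not None)
missing = len(history) - covered
print(f"{covered} years usable, {missing} years missing")  # 20 usable, 30 missing
```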
So the situation is this: the data model approximates a view of the real thing, but cannot completely capture everything that can be known about it. Because models are static, it is difficult to evolve a model as dynamically as the business does. And when we do reengineer models because they can’t keep up with business needs, the models are (yet again) adapted to the existing needs and don’t take into account the (as yet unknown) future needs. Sounds like a data quality issue to me…