Partial Entity Resolution

May 20, 2011 by
Filed under: Business Intelligence, Business Rules, Data Quality 

The other day I had a conversation about product master data, and one of the participants, almost as an aside, mentioned the concept of a “virtual product.” More specifically, he was referring to an operational context in which a maintenance team needed to find a type of part to replace an existing worn machine part. The curious aspect of this was that they were not looking for a specific part. Rather, they needed to describe the characteristics of the part and then see which available parts matched those characteristics. If none were available, they’d need to either create a new one or search other suppliers for a matching part.

To achieve this, I expect they needed to blend a number of approaches. First, they must have needed a data model for a “part” that allowed critical attributes (such as “partNumber” or “partName”) to be null, even though the entry itself would still need a unique identifier. Second, they would have needed to allow for incremental completion of the record, so that as they learned more about the necessary characteristics they’d be able to fill them in. Third, they might have needed the ability to loosen the precision of stored values (to support a requirement such as “We need a bolt that is between 3 and 5 inches long”). Finally, they would have needed a method for searching for matches based on those loose definitions. I could refer to that as partial entity resolution: I am looking for the set of entities that most closely resemble my query entity.
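To make the four pieces concrete, here is a minimal Python sketch of how they might fit together. It is only an illustration under my own assumptions: the class names (`VirtualPart`, `Part`, `Range`), the attributes other than part number and part name, and the sample catalog are all hypothetical and not taken from the team's actual system.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Range:
    """A loosened value, e.g. 'between 3 and 5 inches' (hypothetical type)."""
    low: float
    high: float

    def contains(self, value: float) -> bool:
        return self.low <= value <= self.high


@dataclass
class VirtualPart:
    """A partially described part: every descriptive attribute may be None,
    but the record itself always gets a unique identifier."""
    part_number: Optional[str] = None
    part_name: Optional[str] = None
    part_type: Optional[str] = None
    material: Optional[str] = None
    length_in: Optional[Range] = None
    record_id: str = field(default_factory=lambda: uuid.uuid4().hex)


@dataclass
class Part:
    """A fully specified catalog entry (assumed shape)."""
    part_number: str
    part_type: str
    material: str
    length_in: float


def matches(query: VirtualPart, part: Part) -> bool:
    """True when the part satisfies every attribute filled in so far.
    Attributes that are still None place no constraint on the match."""
    if query.part_number is not None and query.part_number != part.part_number:
        return False
    if query.part_type is not None and query.part_type != part.part_type:
        return False
    if query.material is not None and query.material != part.material:
        return False
    if query.length_in is not None and not query.length_in.contains(part.length_in):
        return False
    return True


# Made-up catalog for illustration only.
catalog = [
    Part("B-100", "bolt", "steel", 2.5),
    Part("B-200", "bolt", "steel", 4.0),
    Part("B-300", "bolt", "brass", 4.5),
    Part("N-100", "nut", "steel", 0.5),
]

# "We need a bolt that is between 3 and 5 inches long."
query = VirtualPart(part_type="bolt", length_in=Range(3.0, 5.0))
candidates = [p.part_number for p in catalog if matches(query, p)]

# Incremental completion: a newly learned characteristic narrows the set.
query.material = "steel"
narrowed = [p.part_number for p in catalog if matches(query, p)]
```

In this sketch an empty `candidates` list would be the signal to create a new part or search other suppliers. Treating unknown attributes as "no constraint" is a design choice of mine; a real system might instead want to distinguish "unknown" from "does not matter."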

I am pretty sure that most record linkage tools are capable of handling this, given some level of flexibility in defining rules and weights. I also suspect that a number of data mining tools are pretty good at this as well. I can also think of a number of different applications of this idea, such as employment recruiting, searching for persons of interest, and identifying partnership or investment opportunities.
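As a rough illustration of what “rules and weights” could mean here, the following Python sketch scores each candidate by the share of the query's weighted attributes it satisfies, then ranks the candidates. This is a simplified stand-in of my own, not the algorithm of any particular record linkage product; the function name `score`, the weights, and the sample data are all assumptions.

```python
def score(query: dict, candidate: dict, weights: dict) -> float:
    """Return the fraction of specified-attribute weight the candidate satisfies.

    Query values may be None (unknown, ignored), a (low, high) tuple
    (a loose numeric range), or an exact value to compare for equality.
    """
    total = 0.0
    matched = 0.0
    for attr, wanted in query.items():
        if wanted is None:
            continue  # unspecified attributes do not count for or against
        weight = weights.get(attr, 1.0)
        total += weight
        actual = candidate.get(attr)
        if actual is None:
            continue  # candidate is missing the value, so no credit
        if isinstance(wanted, tuple):
            low, high = wanted
            if low <= actual <= high:
                matched += weight
        elif wanted == actual:
            matched += weight
    return matched / total if total else 0.0


# Hypothetical weights: part type matters most, material least.
weights = {"part_type": 3.0, "length_in": 2.0, "material": 1.0}

query = {"part_type": "bolt", "length_in": (3.0, 5.0), "material": "steel"}

# Made-up catalog for illustration only.
catalog = {
    "B-100": {"part_type": "bolt", "length_in": 2.5, "material": "steel"},
    "B-200": {"part_type": "bolt", "length_in": 4.0, "material": "steel"},
    "B-300": {"part_type": "bolt", "length_in": 4.5, "material": "brass"},
    "N-100": {"part_type": "nut", "length_in": 0.5, "material": "steel"},
}

# Rank candidates from closest to least close resemblance.
ranked = sorted(
    catalog,
    key=lambda number: score(query, catalog[number], weights),
    reverse=True,
)
```

Unlike a strict filter, this ranking still surfaces near misses (a brass bolt of the right length, say), which seems closer to the idea of finding the entities that “most closely resemble” the query.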

More on this topic in upcoming posts…


