Chapter 5: IMPORTANCE OF FIELD DATA DEPENDENCE

In record linkage we compare the values in the fields of the query with the values in the corresponding fields of the record being compared. This chapter and the next address in detail the problems that arise because the values in the various fields within either of these records are sometimes dependent on one another. Dependence matters in two places. The first is in estimating outcome probabilities. We mentioned (cf. ¶ 4-3.1) that we do not have a model for the frequency density distribution of the outcome weights. Without such a model, we must take dependence into account in order to estimate the probability that a particular comparison weight will occur for a pair of records, whether matched or unmatched. The second is in actually calculating the record comparison weight. The principle of adding together the agreement, disagreement (and missing) weights of the fields of the record to get a record comparison weight (cf. ¶ 4-1.5) depends critically on the probabilities represented being independent (cf. ¶ 2-2.6), so that we may multiply their corresponding odds (above zero) together to get a value proportional to the probability of the combination occurring. Yet we will see that there are situations in which the field values are not independent, and in which ignoring that dependence will mislead us into accepting incorrect record comparison weights.
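The additive principle described above can be illustrated with a small numerical sketch. It is not taken from this manual: the field names, the agreement probabilities, and the use of base-2 logarithms are all assumptions chosen for illustration. The sketch shows that summing field weights is the same as multiplying the field odds, and that when one field's agreement implies another's (here, a hypothetical postal code that determines the city), the summed weight overstates the evidence.

```python
import math

def agreement_weight(p_matched, p_unmatched):
    """Weight of an agreement: log2 of the odds ratio (base 2 is an assumption)."""
    return math.log2(p_matched / p_unmatched)

# Hypothetical probabilities that a field agrees among matched and unmatched pairs.
fields = {
    "surname": (0.95, 0.01),
    "city": (0.90, 0.05),
    "postal_code": (0.90, 0.02),
}

weights = {name: agreement_weight(m, u) for name, (m, u) in fields.items()}

# Adding the weights is equivalent to multiplying the odds ratios,
# which is only justified if the fields are independent.
record_weight = sum(weights.values())
product_of_odds = math.prod(m / u for m, u in fields.values())

# Assumed dependence: whenever the postal code agrees, the city agrees too.
# The joint agreement of (city, postal_code) is then exactly as likely as
# agreement on the postal code alone, so the city adds no further evidence.
joint_matched, joint_unmatched = fields["postal_code"]
dependent_weight = weights["surname"] + agreement_weight(joint_matched, joint_unmatched)

print(f"weight assuming independence: {record_weight:.2f}")
print(f"weight allowing for dependence: {dependent_weight:.2f}")
print(f"overstatement: {record_weight - dependent_weight:.2f}")
```

In this sketch the overstatement equals the city agreement weight, because under the assumed dependence the city agreement is counted once on its own and again, implicitly, through the postal code.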

Section 5-1 OUTCOME PROBABILITIES
Section 5-2 ACCOUNTING FOR PRESENCE DEPENDENCE
Section 5-3 ACCOUNTING FOR VALUE DEPENDENCE
Section 5-4 VALUE SPECIFICITY & CO-DEPENDENCE