Chapter 4: SELECTING A WEIGHTING SCHEME & WEIGHTING THRESHOLDS

Blocking efficiency is measured by recall and precision, and the same holds true of weighting efficiency. Before measuring that efficiency, however, we must describe how to derive the weights themselves. Weighting is the process by which we determine how strongly a comparison indicates a match. The more reliable a field is, the more likely that a disagreement in that field indicates a non-match. The higher the field's coincidence value (that is, the more confident we can be that an agreement is not coincidental), the more likely that an agreement in that field indicates a match. The record comparison weight calculations allow us to express this relationship more precisely.

Depending on the agreement or disagreement of each field, we assign the appropriate weight to the comparison. A sufficiently high weight should indicate a high probability that the comparison represents a matched pair. Ideally, an unmatched pair should be assigned a low weight. The threshold is the comparison weight above which the pair is classed as linked and below which the pair is classed as unlinked. The proportion of matched pairs that we thereby class as linked is the weighting recall. The proportion of linked pairs that are actually matched is the weighting precision.
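
The definitions of threshold, weighting recall, and weighting precision can be illustrated with a minimal Python sketch. The function name, the sample weights, the true match statuses, and the threshold values are all hypothetical and chosen only for illustration. The text does not say how to class a pair whose weight equals the threshold exactly; the sketch assumes such a pair is unlinked.

```python
from typing import List, Tuple


def weighting_recall_precision(
    weights: List[float], is_match: List[bool], threshold: float
) -> Tuple[float, float]:
    """Return (recall, precision) for a given threshold comparison weight.

    weights   -- comparison weight assigned to each record pair
    is_match  -- True where the pair is actually a matched pair
    threshold -- pairs with weight above this value are classed as linked
                 (assumption: a weight equal to the threshold is unlinked)
    """
    linked = [w > threshold for w in weights]

    # Matched pairs that were classed as linked.
    true_links = sum(1 for l, m in zip(linked, is_match) if l and m)
    n_matched = sum(is_match)
    n_linked = sum(linked)

    # Recall: proportion of matched pairs classed as linked.
    recall = true_links / n_matched if n_matched else 0.0
    # Precision: proportion of linked pairs that are actually matched.
    precision = true_links / n_linked if n_linked else 0.0
    return recall, precision


# Hypothetical comparison weights and true match statuses for six pairs.
weights = [12.5, 9.0, 7.5, 3.0, -2.0, -6.5]
is_match = [True, True, False, True, False, False]

high_recall, high_precision = weighting_recall_precision(weights, is_match, 5.0)
low_recall, low_precision = weighting_recall_precision(weights, is_match, 0.0)
print(high_recall, high_precision)
print(low_recall, low_precision)
```

In this made-up sample, lowering the threshold from 5.0 to 0.0 links one more matched pair, so both recall and precision rise. In general the two measures need not move together when the threshold changes.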

Section 4-1 USING ODDS FOR WEIGHTED PROBABILITIES
Section 4-2 WEIGHTING EFFICIENCY
Section 4-3 SELECTING A THRESHOLD COMPARISON WEIGHT