To estimate precision we need to estimate the noise, and the noise depends on how many duplicates there are in the test database. Duplicates come in groups. Most often a duplicate group is a pair, but sometimes it is a triple, a quadruple, or an even larger group. The distribution of pairs, triples, and so on depends on the duplication rate and the size of the file: the larger the sample file and the higher the duplication rate, the more likely it is that larger duplicate groups will occur.
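
The dependence of group sizes on file size and duplication rate can be illustrated with a small simulation. This is only a sketch under a toy model that is not taken from this text: each record is assumed to be drawn uniformly at random, with replacement, from a pool of underlying entities. The names `duplicate_group_sizes`, `surplus_rate`, and `n_entities` are hypothetical, and the "surplus" rate used here (extra copies divided by total records) is just one possible definition of duplication rate.

```python
import random
from collections import Counter


def duplicate_group_sizes(n_records, n_entities, seed=0):
    """Count duplicate groups by size under a toy uniform-sampling model.

    Assumption: each record belongs to one of `n_entities` hypothetical
    entities, chosen uniformly at random and independently.
    """
    rng = random.Random(seed)
    # Number of records that landed on each entity.
    records_per_entity = Counter(
        rng.randrange(n_entities) for _ in range(n_records)
    )
    # Keep only true duplicate groups: pairs (2), triples (3), and so on.
    return Counter(size for size in records_per_entity.values() if size >= 2)


def surplus_rate(n_records, group_sizes):
    """Fraction of records that are extra copies of some entity.

    Assumption: this is one illustrative definition of duplication rate.
    """
    if n_records == 0:
        return 0.0
    surplus = sum((size - 1) * count for size, count in group_sizes.items())
    return surplus / n_records


if __name__ == "__main__":
    # (file size, entity pool): a smaller pool relative to the file
    # means a higher duplication rate.
    for n_records, n_entities in [(1000, 20000), (1000, 2000), (10000, 20000)]:
        sizes = duplicate_group_sizes(n_records, n_entities)
        rate = surplus_rate(n_records, sizes)
        print(n_records, n_entities, round(rate, 3), dict(sorted(sizes.items())))
```

Running the script prints, for each configuration, the file size, the entity pool size, the simulated surplus rate, and the number of groups of each size. Under this model, shrinking the pool relative to the file (a higher duplication rate) or enlarging the file at a fixed ratio should tend to produce more triples and larger groups, which is the qualitative behavior described above.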
| 3-2.1 | Two definitions for duplication rate. |
| 3-2.2 | Proportions of records. |
| 3-2.3 | Calculating an entity duplication rate. |
| 3-2.4 | Estimating numbers of duplicate n-tuples. |
| 3-2.5 | Anomalous duplication rates. |