This experiment proposes a diagram for displaying the results of a data matching (aka record linkage) problem. In this kind of problem, two different datasets A and B are automatically compared in order to find pairs of records that refer to the same real-world entity. We assume that the results come from a constrained matching problem, where the matching is a one-to-one mapping: each record in A is matched to at most one record in B and vice versa, so the matching is a bijection between the matched records of A and the matched records of B. The implication is that the number of matched records in A is equal to the number of matched records in B.
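As a minimal sketch of the one-to-one constraint, the following Python snippet checks that a list of matched pairs uses each record at most once and confirms that both datasets then contribute the same number of matched records. The function names and record identifiers are hypothetical and chosen only for illustration; they are not part of the experiment.

```python
def is_one_to_one(matches):
    """Return True if no record of A or of B appears in more than one pair."""
    a_ids = [a for a, _ in matches]
    b_ids = [b for _, b in matches]
    return len(set(a_ids)) == len(a_ids) and len(set(b_ids)) == len(b_ids)


def matched_counts(matches):
    """Count the distinct records of A and of B that take part in a match."""
    return len({a for a, _ in matches}), len({b for _, b in matches})


# Hypothetical record identifiers, for illustration only.
matches = [("a1", "b7"), ("a2", "b3"), ("a5", "b1")]

assert is_one_to_one(matches)
print(matched_counts(matches))  # (3, 3)
```

When the constraint holds, the two counts are always equal, which is what lets a single aligned portion of the diagram stand for the matches in both datasets.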
The number of elements in A and B is represented by the lengths of two juxtaposed, “misaligned” bars. A is depicted in brown-orange, while B is depicted in different shades of cyan. The number of matches is proportional to the length of the aligned portion of the bars (drawn in more vivid colors). The remaining parts show the unmatched records in A (brown) and B (darker cyan).
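The geometry described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the bars are taken to be horizontal, the unmatched part of A is placed to the left of the aligned portion and the unmatched part of B to the right, and the names `Segment`, `misaligned_layout` and `px_per_record` are hypothetical rather than taken from the actual implementation.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    label: str
    x: float      # left edge of the segment
    width: float  # length of the segment
    row: int      # 0 = bar for A, 1 = bar for B


def misaligned_layout(size_a, size_b, n_matches, px_per_record=1.0):
    """Compute the four segments of the two misaligned bars.

    Assumption: A's unmatched records sit to the left of the aligned
    portion and B's unmatched records to the right.
    """
    if not 0 <= n_matches <= min(size_a, size_b):
        raise ValueError("n_matches must be between 0 and min(size_a, size_b)")

    unmatched_a = size_a - n_matches
    unmatched_b = size_b - n_matches
    s = px_per_record

    return [
        Segment("A unmatched", 0, unmatched_a * s, 0),
        # The matched segments share the same x and width: the aligned portion.
        Segment("A matched", unmatched_a * s, n_matches * s, 0),
        Segment("B matched", unmatched_a * s, n_matches * s, 1),
        Segment("B unmatched", (unmatched_a + n_matches) * s, unmatched_b * s, 1),
    ]


# Hypothetical dataset sizes, for illustration only.
layout = misaligned_layout(size_a=120, size_b=90, n_matches=70)
for segment in layout:
    print(segment)
```

Each bar's total length stays proportional to the size of its dataset, while the shared position and width of the two matched segments produce the aligned portion.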
The diagram could have been made more theoretically correct by aligning the two bars on the same y coordinate, thus keeping both segments representing unmatched records on the right. This would have made the quantities encoded in the diagram easier to evaluate, in particular when comparing the number of unmatched records found in the two datasets. However, we feel that our design presents a more intuitive depiction of a matching process and yields a better metaphor than the theoretical approach: a sort of quantitative version of the classic two-set Venn diagram.
Terminology about data matching tasks is taken mainly from Cohen et al. 2002.