Adaptive Matching Algorithm

The Adaptive Matching Algorithm, introduced in OpenEMPI version 4.4.0, is an advanced feature that works alongside the Probabilistic Matching Algorithm (PMA). Its core function is to use machine learning (AI) techniques to automatically resolve "probable matches". These are record pairs that the PMA could not definitively classify as matches or non-matches, and which it therefore typically sends to a manual review queue.
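The three possible outcomes described above (match, non-match, or probable match sent to review) can be pictured as a decision rule with two thresholds on a pair's match score. This is a minimal sketch that assumes a Fellegi-Sunter style score. It is not taken from OpenEMPI, and the threshold values and scores are hypothetical.

```python
from enum import Enum


class Decision(Enum):
    MATCH = "match"
    PROBABLE_MATCH = "probable match"  # would go to the manual review queue
    NON_MATCH = "non-match"


def classify_pair(score: float, lower: float, upper: float) -> Decision:
    """Three-way decision rule on a pair's match score.

    `lower` and `upper` are hypothetical thresholds. Scores strictly between
    them cannot be decided confidently and become probable matches.
    """
    if lower > upper:
        raise ValueError("lower threshold must not exceed upper threshold")
    if score >= upper:
        return Decision.MATCH
    if score <= lower:
        return Decision.NON_MATCH
    return Decision.PROBABLE_MATCH


# Hypothetical scores for three record pairs.
for s in (14.2, 3.1, -6.5):
    print(s, classify_pair(s, lower=0.0, upper=10.0).value)
```

The Adaptive Matching Algorithm targets only the middle band, the pairs for which this kind of rule cannot reach a decision.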

The key features of the Adaptive Matching Algorithm are:

  1. Resolves the Bottleneck: It dramatically increases the efficiency of the manual review of probable matches, saving time and resources for organizations with high data volumes.

  2. Self-Learning: Unlike the PMA, which only uses agreement/disagreement patterns (vector patterns), the Adaptive Matcher learns from actual user classifications and specific data values. This allows it to make more context-aware and accurate decisions.

  3. Automation Potential: Once trained and validated, it can be used to automatically process and clear the entire probable match review queue, accelerating record linkage and data quality efforts.
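The difference described in the Self-Learning item can be made concrete by comparing the agreement/disagreement vector that a PMA-style matcher sees with value-aware features computed from the actual field values. In this sketch, the field names, the records, and the use of Python's `difflib.SequenceMatcher` as a similarity measure are illustrative assumptions, not OpenEMPI's feature set. Both pairs produce the same agreement vector, (1, 0, 1), but the family-name similarity separates a likely typo from a genuinely different surname.

```python
from difflib import SequenceMatcher

FIELDS = ("given_name", "family_name", "date_of_birth")  # hypothetical field names


def agreement_vector(a: dict, b: dict) -> tuple:
    """Binary agree (1) / disagree (0) pattern per field, as a PMA would see it."""
    return tuple(int(a[f] == b[f]) for f in FIELDS)


def value_features(a: dict, b: dict) -> tuple:
    """Value-aware features: a string similarity ratio in [0, 1] per field."""
    return tuple(round(SequenceMatcher(None, a[f], b[f]).ratio(), 2) for f in FIELDS)


typo_pair = (
    {"given_name": "Maria", "family_name": "Johnson", "date_of_birth": "1980-04-12"},
    {"given_name": "Maria", "family_name": "Jonhson", "date_of_birth": "1980-04-12"},
)
different_family_pair = (
    {"given_name": "Maria", "family_name": "Johnson", "date_of_birth": "1980-04-12"},
    {"given_name": "Maria", "family_name": "Garcia", "date_of_birth": "1980-04-12"},
)

for a, b in (typo_pair, different_family_pair):
    print(agreement_vector(a, b), value_features(a, b))
```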

Using the Adaptive Matching Algorithm involves the following steps:

  1. It must first be enabled for a particular entity. It is designed to work alongside a properly configured Probabilistic Matching Algorithm rather than as an independent algorithm.

  2. It should then be initialized: the algorithm analyzes a sample of the full dataset of records and selects a set of record pairs for a user to label.

  3. It must then be trained: a user is presented with the carefully selected record pairs and classifies each one as a match or a non-match.

  4. Finally, a classification model is built that can either assist with classifying probable matches as matches or non-matches, or automatically classify the complete set of pairs currently in the review queue.
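The four steps above, together with the states this page reports as the algorithm progresses (“INITIALIZING”, “LABELING”, “GENERATING”, “MATCHING”), can be modeled as a small state machine. This is a hypothetical sketch: the DISABLED and ENABLED labels and all action names are invented for illustration, and OpenEMPI's internal states may differ.

```python
from enum import Enum


class State(Enum):
    DISABLED = "DISABLED"          # hypothetical label for the default, disabled state
    ENABLED = "ENABLED"            # hypothetical label: enabled but not yet initialized
    INITIALIZING = "INITIALIZING"
    LABELING = "LABELING"
    GENERATING = "GENERATING"
    MATCHING = "MATCHING"


# (current state, action) -> next state; the action names are illustrative only.
TRANSITIONS = {
    (State.DISABLED, "enable"): State.ENABLED,
    (State.ENABLED, "initialize_configuration"): State.INITIALIZING,
    (State.INITIALIZING, "initialization_complete"): State.LABELING,
    (State.LABELING, "generate_model"): State.GENERATING,
    (State.GENERATING, "model_ready"): State.MATCHING,
}


def advance(state: State, action: str) -> State:
    """Return the next state, rejecting actions that arrive out of order."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"cannot {action!r} while in state {state.value}") from None


state = State.DISABLED
for action in ("enable", "initialize_configuration", "initialization_complete",
               "generate_model", "model_ready"):
    state = advance(state, action)
    print(f"{action:>25} -> {state.value}")
```

Encoding the order as a lookup table makes out-of-sequence requests (for example, generating a model before initialization) fail explicitly instead of silently.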

The Adaptive Matching Algorithm is initially disabled for every entity, so the first step in using it is to enable it. The figure below shows the Adaptive Matching page, found under the Matching menu option in the main navigation menu, with the algorithm disabled by default. To enable the algorithm, move the toggle button to the right.

[Figure: Adaptive Matching page with the algorithm disabled by default (image-20251026-193410.png)]

Before moving on to initializing the algorithm, you can optionally change its configuration settings. The default values should work in most cases; we arrived at them after spending hundreds of hours experimenting with these parameters. The figure below shows the configuration page, which exposes the parameters that control the operation of the algorithm.

[Figure: Adaptive Matching configuration parameters (image-20251027-115438.png)]

The next step is to initialize the algorithm for the dataset that is currently loaded in the system for this entity. To do so, select the “Initialize Configuration” option from the “More Options” menu, as shown below. This step may take a while, since the algorithm selects an appropriate sample of the records in the system and analyzes it to identify the best pairs to present to the user for labeling in the next step. The state of the algorithm changes automatically from “INITIALIZING” to “LABELING” when initialization has completed.

[Figure: “Initialize Configuration” option in the “More Options” menu (image-20251027-115712.png)]
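This page does not say how the “best pairs” for labeling are chosen. One common active-learning strategy, shown here purely as an assumption, is uncertainty sampling: draw a random sample of records, score the candidate pairs, and present the pairs whose scores lie closest to the decision boundary, because labels on those pairs are the most informative. Sampling keeps the work manageable, since the number of candidate pairs grows quadratically with the number of records. The names, the seed, and the string-similarity score are stand-ins.

```python
import random
from difflib import SequenceMatcher
from itertools import combinations


def sample_records(records, sample_size, seed=7):
    """Draw a reproducible random sample from the full dataset."""
    rng = random.Random(seed)
    return rng.sample(records, min(sample_size, len(records)))


def pair_score(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1]; a real system would compare records field by field."""
    return SequenceMatcher(None, a, b).ratio()


def select_pairs_for_labeling(records, k, boundary=0.5):
    """Uncertainty sampling: the k candidate pairs whose score lies closest to `boundary`."""
    scored = [(a, b, pair_score(a, b)) for a, b in combinations(records, 2)]
    scored.sort(key=lambda t: abs(t[2] - boundary))
    return scored[:k]


names = ["Maria Johnson", "Maria Jonhson", "Mario Johnston",
         "Ann Lee", "Anne Leigh", "Tom Brown"]
sample = sample_records(names, sample_size=5)
for a, b, score in select_pairs_for_labeling(sample, k=3):
    print(f"{score:.2f}  {a!r} vs {b!r}")
```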

You can now move on to the next phase: labeling record pairs as matches or non-matches. To do this, select the “Training” tab. Each pair is presented as two records side by side, and field values that differ between the two records are highlighted to draw your attention to the differences. In the example below, the two records differ only in the Family Name field; record 5242 appears to contain a typo in that field. After reviewing the two records, classify the pair as either a match or a non-match by pressing the appropriate button at the bottom. Once you have classified the pair, the next pair is presented automatically.

[Figure: “Training” tab showing a record pair side by side (image-20251027-120320.png)]
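Highlighting the differing fields comes down to comparing the two records field by field. This is a minimal sketch that assumes each record is a plain dictionary of field name to value; the field names and values are hypothetical. Differing fields are marked with `*` in place of on-screen highlighting.

```python
def ordered_fields(left: dict, right: dict) -> list:
    """All field names, in the order they first appear in either record."""
    return list(left) + [f for f in right if f not in left]


def differing_fields(left: dict, right: dict) -> list:
    """Names of fields whose values differ (a missing field is compared as None)."""
    return [f for f in ordered_fields(left, right) if left.get(f) != right.get(f)]


def side_by_side(left: dict, right: dict) -> str:
    """Render two records in columns, marking differing fields with '*'."""
    diff = set(differing_fields(left, right))
    rows = []
    for f in ordered_fields(left, right):
        marker = "*" if f in diff else " "
        rows.append(f"{marker} {f:<14}{str(left.get(f, '')):<14}{right.get(f, '')}")
    return "\n".join(rows)


record_a = {"given_name": "Maria", "family_name": "Johnson", "date_of_birth": "1980-04-12"}
record_b = {"given_name": "Maria", "family_name": "Jonhson", "date_of_birth": "1980-04-12"}
print(side_by_side(record_a, record_b))
```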

In the “Configuration” tab, you can review how many pairs have already been labeled and how many remain to be labeled before the algorithm has the minimum number of labeled samples it needs. When the number of pairs left to label reaches zero, you have labeled the minimum number of pairs required before a classification model can be generated. In the example below, we have labeled 84 pairs and there are 0 pairs left to label, so we can move on to the next phase, in which a classification model is generated to help us process the probable matches in the review queue. Note that you can continue labeling more pairs in order to improve the model. To generate the model, select the “Generate Model” option from the “More Options” menu. The state of the algorithm changes from “LABELING” to “GENERATING” while the model is being generated, and then to “MATCHING” once the model has been generated and is ready for use.

[Figure: “Configuration” tab showing the counts of labeled pairs and pairs left to label (image-20251027-121420.png)]
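The counts in the “Configuration” tab follow simple arithmetic. The pairs left to label are the required minimum minus the pairs already labeled, floored at zero, and model generation becomes possible once that number reaches zero. In this sketch, the minimum of 50 is a hypothetical value, not OpenEMPI's default. Labeling beyond the minimum is still accepted, which matches the note that further labeling can improve the model.

```python
from dataclasses import dataclass, field


@dataclass
class LabelingProgress:
    minimum_labeled: int                          # hypothetical configured minimum
    labels: dict = field(default_factory=dict)    # pair id -> True (match) / False (non-match)

    def label(self, pair_id: str, is_match: bool) -> None:
        """Record a user's decision; labeling the same pair again overwrites it."""
        self.labels[pair_id] = is_match

    @property
    def labeled(self) -> int:
        return len(self.labels)

    @property
    def left_to_label(self) -> int:
        return max(0, self.minimum_labeled - self.labeled)

    def can_generate_model(self) -> bool:
        return self.left_to_label == 0


progress = LabelingProgress(minimum_labeled=50)
for i in range(84):
    # Alternating labels, only so that the example has some data.
    progress.label(f"pair-{i}", is_match=(i % 2 == 0))
print(progress.labeled, progress.left_to_label, progress.can_generate_model())
```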

Once the classification model of the Adaptive Matching algorithm is ready, go to the “Review Probable Links” page to review the probable matches in the queue. When you select a pair to review, the classification decision recommended by the Adaptive Matching model now appears in the top right corner. In the example shown below, the model predicts that this probable match, identified by the Probabilistic Matching algorithm, should be classified as a match.

[Figure: Recommended classification shown while reviewing a probable match (image-20251027-122110.png)]
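This page does not specify which machine learning model the Adaptive Matching Algorithm builds. As a stand-in, this sketch trains a plain logistic regression (batch gradient descent, standard library only) on labeled pairs described by per-field similarity features. It then produces a match/non-match recommendation, with a probability, for a new probable pair. The features, labels, and hyperparameters are all hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic_regression(X, y, lr=1.0, epochs=3000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def recommend(w, b, x, threshold=0.5):
    """Return ('match' or 'non-match', probability of match) for one pair's features."""
    p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
    return ("match" if p >= threshold else "non-match"), p


# Hypothetical labeled pairs: similarity of (given name, family name, date of birth).
X = [(1.0, 0.86, 1.0), (1.0, 1.0, 1.0), (0.9, 1.0, 1.0),   # labeled as matches
     (1.0, 0.0, 1.0), (0.2, 0.1, 0.0), (0.0, 0.3, 0.5)]    # labeled as non-matches
y = [1, 1, 1, 0, 0, 0]

w, b = train_logistic_regression(X, y)
decision, p = recommend(w, b, (1.0, 0.9, 1.0))  # a new probable pair from the queue
print(decision, round(p, 3))
```

Returning the probability together with the decision lets a reviewer judge how strongly the model leans one way.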

If, after reviewing the model's predicted classification decisions, you are confident that its recommendations are consistently accurate, you can use this classification model to automatically classify all the probable matches currently in the review queue. To do so, select the “Classify all pairs” option from the “More Options” menu on the main “Review Probable Links” page.

[Figure: “Classify all pairs” option in the “More Options” menu (image-20251027-122608.png)]
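Assuming “Classify all pairs” works as its name suggests, it applies the trained model to every pair in the review queue. In this hypothetical sketch, `predict` stands in for the model and returns a match probability, and the record ids are invented. The `min_confidence` parameter is an illustrative extension, not an OpenEMPI setting. At its default of 0.0 every pair is classified; a higher value would leave uncertain pairs in the queue for manual review.

```python
from typing import Callable, Iterable, List, Tuple

Pair = Tuple[str, str]  # (record id, record id); hypothetical representation


def classify_queue(queue: Iterable[Pair],
                   predict: Callable[[Pair], float],
                   min_confidence: float = 0.0):
    """Split queued pairs into matches, non-matches, and pairs left for review."""
    matches: List[Pair] = []
    non_matches: List[Pair] = []
    still_queued: List[Pair] = []
    for pair in queue:
        p = predict(pair)
        confidence = max(p, 1.0 - p)  # distance from a coin flip, in [0.5, 1]
        if confidence < min_confidence:
            still_queued.append(pair)
        elif p >= 0.5:
            matches.append(pair)
        else:
            non_matches.append(pair)
    return matches, non_matches, still_queued


predictions = {("101", "102"): 0.97, ("103", "104"): 0.08, ("105", "106"): 0.55}
matches, non_matches, still_queued = classify_queue(predictions, predictions.__getitem__)
print(len(matches), len(non_matches), len(still_queued))
```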