Using Memorized Classification Decisions

Starting with version 3.5.3 of OpenEMPI, the system can memorize the classification decisions a user makes when resolving probable links in the Review Queue. If a later update operation would cause a record pair to be placed back in the Review Queue, the system applies the memorized manual classification instead. To enable the feature, add the property "remember-manual-classifications" to the admin-configuration section of the mpi-config.xml file and set it to true.

If a later update causes the pair to move from one automatic classification state (Match, Non-Match) to the other, the memorized manual classification is removed. For example, a probable pair that was manually classified as a match loses its memorized decision if, after an update, the algorithm conclusively classifies it as a non-match.

...
    <admin-configuration>
        <session-duration>3600</session-duration>
        <file-repository-directory>fileRepository</file-repository-directory>
        <autostart-pixpdq>true</autostart-pixpdq>
        <data-directory>/sysnet/data</data-directory>
        <session-duration>1800</session-duration>
        <remember-manual-classifications>true</remember-manual-classifications>
...
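To check whether an instance has the feature turned on, the flag can be read back out of mpi-config.xml. The Java sketch below uses only the standard JDK DOM parser. The class name RememberClassificationsFlag and the choice to treat a missing section or element as "disabled" are assumptions made for illustration, not part of OpenEMPI, and the sketch assumes the element names appear without a namespace prefix.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical helper: not an OpenEMPI class.
public class RememberClassificationsFlag {

    // Returns true only if <admin-configuration> contains
    // <remember-manual-classifications>true</remember-manual-classifications>.
    public static boolean isEnabled(String mpiConfigXml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(mpiConfigXml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse mpi-config.xml content", e);
        }
        NodeList admins = doc.getElementsByTagName("admin-configuration");
        if (admins.getLength() == 0) {
            return false; // section missing: assumed to mean the feature is off
        }
        Element admin = (Element) admins.item(0);
        NodeList flags = admin.getElementsByTagName("remember-manual-classifications");
        if (flags.getLength() == 0) {
            return false; // property absent: assumed default is off
        }
        return Boolean.parseBoolean(flags.item(0).getTextContent().trim());
    }

    public static void main(String[] args) {
        String xml = "<mpi-config><admin-configuration>"
                + "<session-duration>3600</session-duration>"
                + "<remember-manual-classifications>true</remember-manual-classifications>"
                + "</admin-configuration></mpi-config>";
        System.out.println(isEnabled(xml)); // prints true
    }
}
```

Parse failures are rethrown as an unchecked IllegalArgumentException so callers are not forced to handle three separate checked exception types.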

The following example demonstrates in detail how this feature affects the matching decisions made by the system.

In this OpenEMPI instance, the feature has been enabled, and the probabilistic algorithm has been configured to classify record pairs that disagree on the postal code value as probable links. The record pair for the person named "Sean Richmond" appears in the Review Queue because the two records disagree in the value of the Postal Code field. Since the two records clearly refer to the same person, we classify the pair as a match by clicking the "Link" button at the bottom of the screen. Note that, as expected, the two records have different global identifier values before they are linked.

After the pair is manually classified as a match, both records have the same global identifier.

If we now look up one of the two records and update a field that does not change the algorithm's classification of the pair, the pair would normally be placed back in the Review Queue as a probable link. Because the "Memorized Manual Classification" feature is enabled, however, the two records remain linked as a match. To demonstrate this, we update the first of the two records and set its middle name to "Thomas". After the update, the two records still share the same global identifier value.
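The rules described in this section can be sketched as a small in-memory model. This is a hypothetical illustration, not OpenEMPI's implementation: the ClassificationMemory class, the Classification enum, and the record ids are invented names. When the algorithm scores a pair as probable, the remembered manual decision is used; when the algorithm conclusively reaches the opposite result, the remembered decision is erased. The `main` method replays the "Sean Richmond" walkthrough, plus a hypothetical later update that flips the pair to a non-match.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of memorized manual classifications (not OpenEMPI code).
public class ClassificationMemory {

    enum Classification { MATCH, NON_MATCH, PROBABLE }

    // Key: order-independent record-pair id; value: the reviewer's decision.
    private final Map<String, Classification> manualDecisions = new HashMap<>();
    private final boolean enabled; // mirrors remember-manual-classifications

    ClassificationMemory(boolean enabled) {
        this.enabled = enabled;
    }

    static String pairKey(long idA, long idB) {
        return Math.min(idA, idB) + ":" + Math.max(idA, idB);
    }

    // Called when a reviewer resolves a probable link in the Review Queue.
    void recordManualDecision(long idA, long idB, Classification decision) {
        if (enabled && decision != Classification.PROBABLE) {
            manualDecisions.put(pairKey(idA, idB), decision);
        }
    }

    // Called after an update re-scores the pair; returns the effective classification.
    Classification resolve(long idA, long idB, Classification algorithmResult) {
        String key = pairKey(idA, idB);
        Classification remembered = manualDecisions.get(key);
        if (remembered == null) {
            return algorithmResult; // nothing memorized: normal behaviour
        }
        if (algorithmResult == Classification.PROBABLE) {
            return remembered; // would return to the Review Queue: reuse the manual decision
        }
        if (algorithmResult != remembered) {
            manualDecisions.remove(key); // algorithm now conclusively disagrees: forget it
        }
        // Assumption: a conclusive result that agrees with the memory keeps the memory.
        return algorithmResult;
    }

    boolean isRemembered(long idA, long idB) {
        return manualDecisions.containsKey(pairKey(idA, idB));
    }

    public static void main(String[] args) {
        ClassificationMemory memory = new ClassificationMemory(true);
        long first = 101, second = 102; // hypothetical ids of the two "Sean Richmond" records
        // Reviewer clicks "Link" on the probable pair (postal codes disagree).
        memory.recordManualDecision(first, second, Classification.MATCH);
        // Middle name set to "Thomas": the algorithm still scores the pair as probable.
        System.out.println(memory.resolve(first, second, Classification.PROBABLE)); // MATCH
        // Hypothetical later update: the algorithm now conclusively says non-match.
        System.out.println(memory.resolve(first, second, Classification.NON_MATCH)); // NON_MATCH
        System.out.println(memory.isRemembered(first, second)); // false
    }
}
```

Keying the map on the ordered pair of ids means the decision is found regardless of which of the two records is updated.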