Using the Synonym Service

Starting with version 3.5.3, OpenEMPI includes a Synonym Service that you can use to improve the match rates of your instance. The Synonym Service lets you specify lists of values for a particular field in your data model that should be considered equivalent when the similarity of that field is evaluated between two records. For example, you can specify that the values "Robert" and "Bob" are synonyms when present in the first name field. Two records, one with the value "Robert" and the other with the value "Bob", will then be considered to be in agreement on their first name field.
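To make the idea of a synonym list concrete, here is a minimal shell sketch. It treats two values as equivalent when they are identical or appear together on the same line of a comma-separated list. This is only a conceptual illustration and is not how OpenEMPI implements the comparison internally; the function name are_synonyms and the temporary file are hypothetical.

```shell
#!/bin/sh
# Conceptual sketch only; OpenEMPI's internal matching logic is not shown here.
# are_synonyms FILE A B
# Exit status 0 when A and B are identical or share a line in FILE, 1 otherwise.
are_synonyms() {
    awk -F',' -v a="$2" -v b="$3" '
        BEGIN { if (a == b) { found = 1; exit } }   # identical values always agree
        {
            has_a = 0; has_b = 0
            for (i = 1; i <= NF; i++) {
                if ($i == a) has_a = 1
                if ($i == b) has_b = 1
            }
            if (has_a && has_b) { found = 1; exit } # both values are in one list
        }
        END { exit(found ? 0 : 1) }
    ' "$1"
}

# Small demonstration with a throwaway list file.
list=$(mktemp)
printf '%s\n' 'Bob,Robert' 'Jon,John' 'Jim,James,Jamie' 'William,Bill,Will' > "$list"
are_synonyms "$list" Robert Bob  && echo "Robert/Bob: treated as agreeing"
are_synonyms "$list" Robert Bill || echo "Robert/Bill: no synonym relationship"
rm -f "$list"
```

The comparison in this sketch is exact and case-sensitive; whether the Synonym Service normalizes case or whitespace is not stated here, so check that against your own instance.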

To use the Synonym Service, load lists of synonyms onto your instance for the specific entity and field to which they should apply. You can manage the lists of synonyms using either a command line utility or the web service for the Synonym Resource. The Synonym Resource is documented here and includes support for retrieving the list of synonyms for a given field, adding a list of synonyms, and deleting a synonym list.

Using the Synonym Loader

The synonym loader is a command line utility that you can use to add or remove lists of synonyms from an instance of OpenEMPI. To use the utility, you must first shut down the OpenEMPI server. You may need to update the JVM memory configuration in synonymLoader.sh to match the settings in your setenv.sh file. If you are using OpenEMPI version 4.1 or later, you will also need to adjust the openempi.data.directory parameter to match the location of the data directory on your instance.

Here is the syntax for the parameters that you can use with the Synonym Loader utility.

$ bin/synonymLoader.sh
Invalid input for synonym loader utility.
usage: synonymLoader
 -a,--attrib <attribute>    Attribute of the entity model to which the
                            synonyms apply.
 -e,--entity <entity>       Entity name for which to modify synonym
                            definitions.
 -f,--filename <filename>   File of synonyms to load into the system.
 -m,--mode <mode>           Specifies the mode used to load synonyms
                            (add/replace).
 -p,--password <password>   Password of account used to connect with
                            OpenEMPI.
 -u,--user <username>       Username of account used to connect with
                            OpenEMPI.

You can add one or more lists of synonyms to the server by using the add mode and the filename parameter to specify the CSV file from which the synonyms will be loaded. Each line of the file is one list of values that are synonyms of one another. The following is a sample synonym list file:

Bob,Robert
Jon,John
Jim,James,Jamie
William,Bill,Will

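Before loading a file, it can help to check that every line really is a list. The sketch below flags lines with fewer than two values and lines that contain an empty value. These rules are inferred from the sample file above and are an assumption, not a documented requirement of the loader; the function name check_synonym_file is hypothetical.

```shell
#!/bin/sh
# Sketch of a pre-load sanity check; the rules are assumptions based on the sample file.
# check_synonym_file FILE
# Prints one message per problem and returns 1 if any problem was found.
check_synonym_file() {
    awk -F',' '
        NF < 2 {
            # A single value (or a blank line) has nothing to be a synonym of.
            printf "line %d: needs at least two values\n", NR
            bad = 1
            next
        }
        {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^[ \t]*$/) {
                    printf "line %d: empty value in column %d\n", NR, i
                    bad = 1
                }
            }
        }
        END { exit(bad ? 1 : 0) }
    ' "$1"
}

# Usage (synonyms.csv is the file you intend to load):
#   check_synonym_file synonyms.csv && echo "synonyms.csv looks well formed"
```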
To add these lists of synonyms to the person entity in association with the givenName field, you can use the synonym loader utility with the following parameters (remember to shut down the server before doing this).

$ bin/synonymLoader.sh -u admin -p admin -e person -a givenName -f synonyms.csv
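The command above does not pass the mode parameter. If you prefer to state the mode explicitly, a small wrapper along the following lines may be convenient. It is a sketch under assumptions: the usage text lists add and replace as modes, and the wrapper assumes those literal words are the values that -m accepts. The function name load_synonyms and the environment variables SYNONYM_LOADER, OPENEMPI_USER, and OPENEMPI_PASSWORD are hypothetical names introduced for this example.

```shell
#!/bin/sh
# Hypothetical wrapper around the synonym loader; not part of OpenEMPI.
# load_synonyms ENTITY ATTRIBUTE MODE FILE
load_synonyms() {
    entity=$1 attribute=$2 mode=$3 file=$4

    # Assumption: the literal values "add" and "replace" are what -m accepts.
    case $mode in
        add|replace) ;;
        *) echo "mode must be add or replace" >&2; return 2 ;;
    esac

    # Fail early if the synonym file cannot be read.
    [ -r "$file" ] || { echo "cannot read $file" >&2; return 2; }

    # SYNONYM_LOADER lets you point at a different install path (assumed variable).
    "${SYNONYM_LOADER:-bin/synonymLoader.sh}" \
        -u "${OPENEMPI_USER:-admin}" -p "${OPENEMPI_PASSWORD:-admin}" \
        -e "$entity" -a "$attribute" -m "$mode" -f "$file"
}

# Usage, with the OpenEMPI server shut down first:
#   load_synonyms person givenName add synonyms.csv
```

The wrapper does not check whether the server is stopped, so that step remains manual.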