Settings Page
Starting with version 4.1.0 of OpenEMPI, the configuration information for your instance is maintained in the embedded graph database instead of in the mpi-config.xml file. The Settings page allows you to modify the configuration parameters that were previously stored in that file. Make sure you thoroughly understand what these settings do before you change them, since some of them can have a considerable impact on your instance.
Global Identifier
The Global Identifier tab controls whether OpenEMPI assigns global identifiers to records on your instance. By default, global identifiers are assigned, since that is the desired behavior in most cases. When you enable global identifier assignment, you must also specify the identifier domain that will be associated with the global identifiers. The default value is “OpenEMPI”, since it makes it clear that the identifier was assigned to a record by OpenEMPI itself, but you can set the identifier domain for the global identifier to any domain.
Miscellaneous Settings
The Miscellaneous Settings tab manages the values of various parameters that control the operation of your instance. Although some of the parameters on this tab should be modified to reflect the preferences for your site, others control internal parameters for the instance and should only be changed at the direction of the OpenEMPI support team or if you fully understand the impact of the change. You will need to restart your OpenEMPI instance for these changes to take effect.
The following is a description of each of the parameters that can be configured on the Miscellaneous Settings tab.
File Repository Directory: the directory where OpenEMPI places files that are uploaded to the server, such as data files or entity schemas. Exported data is also placed in this directory.
Data Directory: the directory in which the embedded databases for each entity are created and maintained.
Session Duration: the number of seconds that a session is kept alive before it expires. Session identifiers generated by the REST API upon successful authentication remain valid while inactive for up to the amount of time specified by this parameter.
Consumer Queue Wait Time: the number of seconds that some background processes, such as the blocking re-indexing process, will wait for records retrieved from the database to become available for processing before assuming that all the work has completed. This parameter may need to be increased if the instance has many records (tens of millions) and a slow file system, which may delay the retrieval of batches of records.
Date Format: date format used internally to parse dates (should not be changed aside from very special situations).
Null Value String: special value used by the REST API to set the value of a parameter to null through an update operation (should not be changed aside from very special situations).
Instance Name: name of the specific instance of OpenEMPI in a clustered deployment.
Enable Identifier Update Notifications: enables the identifier update notification mechanism, which causes the system to queue up an entry for each event where the global identifier assigned to a record has changed. These notifications can be retrieved through the Notification resource of the REST API. When you enable this feature, you will need to select the username of the user that will be able to retrieve these notification events through the REST API, the amount of time that these events will be preserved on the server before they are deleted (only if the process to delete expired events is enabled on the instance), and the identifier domain that is used to track identifier changes. The domain selected here should always match the identifier domain used for assigning global identifiers, as specified in the Global Identifier tab.
Autostart PIX/PDQ Service: indicates whether the listeners for the PIX/PDQ service (HL7v2 bindings) should be started when the server starts or not. The default behavior is to start these listeners.
Enable Transitive Closure Behavior: enabling this feature will cause the system to generate links between records using transitive closure even though the matching rules may not support these links. For example, if there is a link between records A and B and between records A and C, then when this feature is enabled, the system will also link records B and C even though their association may not be supported by the configuration of the matching algorithm.
Preserve Update History: (starting in 4.3.3) when this feature is enabled, the system will preserve a historical record of all the update operations that have been applied to records over time. You may review the history of record update operations through the Search Page by selecting a record from the search results and clicking on the History button.
Remember Manual Classification Decisions: when this feature is enabled, the system will memorize classification decisions made manually by the user and reapply them if the same record pair is evaluated by the matching algorithm in the future. For example, if record pair (A,B) is placed in the review pair queue because the pair was classified as a probable match, when the user reviews the pair and resolves the probable match into a non-match, the system will remember this decision. If an update operation causes a change in one of the two records A or B and the matching algorithm reevaluates the pair as a probable link, the pair will be classified as a non-match automatically since this manual decision had been memorized.
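Two of the behaviors described above, transitive closure and remembered manual decisions, can be illustrated with a small sketch. This is not OpenEMPI's implementation; the function transitive_closure and the class ManualDecisionMemory are hypothetical names used only to show the idea, and records are represented by plain strings.

```python
from itertools import combinations


def transitive_closure(links):
    """Return every record pair implied by the given links.

    Hypothetical helper: records joined by a chain of links end up in the
    same group, and every pair inside a group is reported as linked.
    """
    parent = {}

    def find(record):
        # Locate the representative of the group that contains the record.
        parent.setdefault(record, record)
        while parent[record] != record:
            parent[record] = parent[parent[record]]  # shorten the path
            record = parent[record]
        return record

    for a, b in links:
        parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for record in list(parent):
        groups.setdefault(find(record), []).append(record)

    closed = set()
    for members in groups.values():
        closed.update(combinations(sorted(members), 2))
    return closed


class ManualDecisionMemory:
    """Hypothetical store of manual decisions keyed by an unordered pair."""

    def __init__(self):
        self._decisions = {}

    def remember(self, record_a, record_b, decision):
        self._decisions[frozenset((record_a, record_b))] = decision

    def classify(self, record_a, record_b, algorithm_result):
        # A remembered manual decision overrides the algorithm's result.
        key = frozenset((record_a, record_b))
        return self._decisions.get(key, algorithm_result)


# Links A-B and A-C imply the additional link B-C.
closed_links = transitive_closure([("A", "B"), ("A", "C")])

memory = ManualDecisionMemory()
memory.remember("A", "B", "non-match")
# The pair (A, B) is reevaluated as a probable match after an update.
reevaluated = memory.classify("B", "A", "probable match")
```

In this sketch the pair is stored as an unordered set, so the remembered decision applies regardless of the order in which the two records are presented.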
Scheduled Tasks
The Scheduled Tasks tab controls which periodic processes are enabled on your instance for background processing of work. These tasks can be scheduled to run periodically to perform maintenance work on your instance. When you enable one of these tasks, you need to specify a number of parameters that control its behavior.
Schedule Type: specifies how the periodic task is invoked; the type can be either fixed rate or fixed delay. Using fixed rate means that the task will be scheduled for execution at a fixed rate even if the previous invocation has not completed yet. Using fixed delay means that the task will be scheduled for execution after a fixed delay from the previous invocation of the task.
Time Unit: indicates the units that are used in the interpretation of the initial delay, delay and period parameters.
Initial Delay: amount of time to wait before the first invocation of the scheduled task.
Delay: amount of time to wait before the next invocation of the scheduled task.
Period: amount of time between successive invocations of the task under fixed-rate scheduling.
Entity: some of the scheduled tasks perform work for a specific entity and for those tasks you will need to specify the entity against which the task should perform work.
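The difference between the two schedule types is easiest to see by computing when each invocation would start. The following is a minimal sketch, not OpenEMPI code: the function names are hypothetical, all values are expressed in the same time unit, and the fixed-delay case assumes that the delay is counted from the end of the previous invocation, which the description above does not state explicitly.

```python
def fixed_rate_starts(initial_delay, period, runs):
    """Start times under fixed rate: one start every `period`,
    regardless of how long each invocation takes."""
    return [initial_delay + i * period for i in range(runs)]


def fixed_delay_starts(initial_delay, delay, durations):
    """Start times under fixed delay (assumption: the delay is counted
    from the end of the previous invocation)."""
    starts = []
    current = initial_delay
    for duration in durations:
        starts.append(current)
        current += duration + delay  # wait for completion, then the delay
    return starts


# Initial delay 5, period or delay 10, three invocations.
rate_starts = fixed_rate_starts(5, 10, 3)
# The second invocation is slow (30 time units), which pushes the third back.
delay_starts = fixed_delay_starts(5, 10, [2, 30, 4])
```

Under these assumptions, a slow invocation shifts all later start times in fixed-delay mode, while fixed-rate start times stay on the original grid.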
The following is a list of supported scheduled tasks along with an explanation of the work they perform.
Audit Event Remover: this task runs periodically and removes from the system audit events that have expired (were generated a certain amount of time before the current time). The expiration time is specified in days in the applicationContext-resources.xml file using the parameter audit-event-expiration-days.
Data Profiler: this task runs periodically and generates a data profile of all the data in the repository for a specific entity. The preferred approach is to generate a data profile manually, since data profiles are resource intensive to generate and to preserve in the database.
Identifier Update Notification Remover: this task runs periodically and deletes from the database identifier update notification events that have exceeded the time to live specified in the Miscellaneous Settings tab.
Job Queue Processor: this task runs periodically and initiates background processes that perform long-running operations. For example, re-indexing the blocking indexes and importing records from a file are operations that are performed in the background and are initiated by the Job Queue Processor as a result of a request by the user. The Job Queue Processor should always be enabled.
User Session Remover: this task runs periodically and deletes sessions that have been inactive for longer than the amount of time specified by the session duration parameter.
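The expiration rule applied by the User Session Remover can be sketched as follows. This is an illustration only, assuming each session records the time of its last activity in seconds; the function name expired_sessions and the data layout are hypothetical and do not reflect how OpenEMPI stores sessions.

```python
def expired_sessions(last_activity, now, session_duration):
    """Return the identifiers of sessions that have been inactive for
    longer than session_duration (all values in seconds)."""
    return sorted(
        session_id
        for session_id, last_seen in last_activity.items()
        if now - last_seen > session_duration
    )


# Hypothetical sessions mapped to the time of their last activity.
sessions = {"s1": 100, "s2": 950}
# With a session duration of 600 seconds at time 1000, s1 has been
# inactive for 900 seconds and s2 for 50 seconds.
to_delete = expired_sessions(sessions, now=1000, session_duration=600)
```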