Release Notes

Release 4.3.1

  • Upgraded the version of various libraries to eliminate security vulnerabilities that were recently reported

  • Added a framework for evaluating the performance of the blocking algorithm against a ground-truth dataset where the set of duplicate records is known

  • Added a framework for evaluating the performance of the matching algorithm against a ground-truth dataset where the set of duplicate records is known

  • Reduced the startup time of the server, especially for sites with millions of records in the repository

  • Fixed an issue with the operation to start and stop the listener service for the HL7v2 PIX/PDQ protocol support

  • Fixed an issue where after adding a new entity using an import statement, you are unable to create a custom field against it.

  • Fixed an issue where after switching to the probabilistic matching algorithm, you are unable to use a custom field as one of the matching fields until after restarting the server

  • Fixed an issue with the cache manager resetting the connection to the configuration database

  • Fixed an issue where after deleting an entity, the configuration of the blocking and matching algorithms associated with that entity was not being deleted

  • Added support for quickly loading the configuration of an entity from a JSON backup to help support customers that need assistance with the configuration of their instances
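
As a rough illustration of what evaluating blocking and matching against a ground-truth dataset involves, the sketch below computes precision and recall for a set of predicted duplicate pairs, and pairs completeness and reduction ratio for a set of blocking candidates. These are the conventional record-linkage metrics and not necessarily the ones the OpenEMPI framework reports; the function names and record IDs are hypothetical.

```python
def normalize(pairs):
    """Treat (a, b) and (b, a) as the same unordered pair."""
    return {tuple(sorted(p)) for p in pairs}


def matching_metrics(predicted_pairs, true_pairs):
    """Precision and recall of predicted duplicate pairs (hypothetical helper)."""
    predicted, truth = normalize(predicted_pairs), normalize(true_pairs)
    hits = len(predicted & truth)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(truth) if truth else 0.0
    return precision, recall


def blocking_metrics(candidate_pairs, true_pairs, record_count):
    """Pairs completeness and reduction ratio of a blocking step (hypothetical helper)."""
    candidates, truth = normalize(candidate_pairs), normalize(true_pairs)
    all_pairs = record_count * (record_count - 1) // 2
    # Share of known duplicate pairs that survive blocking.
    completeness = len(candidates & truth) / len(truth) if truth else 0.0
    # Share of the full pair space that blocking avoids comparing.
    reduction = 1.0 - len(candidates) / all_pairs if all_pairs else 0.0
    return completeness, reduction


# Illustrative data only: four records with two known duplicate pairs.
truth = [(1, 2), (3, 4)]
predicted = [(2, 1), (1, 3)]
candidates = [(1, 2), (1, 3), (2, 3)]

print(matching_metrics(predicted, truth))
print(blocking_metrics(candidates, truth, record_count=4))
```

Low pairs completeness points at the blocking rounds, since true duplicates never reach the matching algorithm; low precision or recall with high completeness points at the matching configuration.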

Release 4.3.0

  • Modified the architecture of OpenEMPI to solely rely on the graph database and to no longer use a relational database to store configuration information.

  • Added a new machine learning-based algorithm that allows the user to determine the best similarity metric (comparator) and threshold to use for each matching field based on their own dataset.

  • Added a data standardization feature that allows the user to identify issues in their dataset and resolve them across the entire dataset

  • Added the ability to switch matching algorithms for a particular entity through the UI

  • Added a dashboard of performance-related metrics to the home page of the web manager application to help users identify resource-related issues early and resolve them before they become critical

  • Added a scheduled task that deletes logged links from the database so that they don’t accumulate and cause resource issues after a while.

  • Modified the reports to not use fonts that require additional packages to be installed on some Linux distributions

  • Added more validation of the parameters that are used to configure transformation functions for custom fields

  • Upgraded numerous libraries and packages to recent versions to eliminate any security vulnerabilities that have been identified

  • Added the ability to rebuild individual blocking rounds instead of having to rebuild all blocking rounds at once

  • Added the ability to copy the blocking and matching configuration from one entity to another through the UI

  • Made various changes that improved the performance of the system

Release 4.2.7

  • Added a number of charts on the home page of the administrative application that present performance information about the instance to help administrators identify resource issues early

  • Upgraded a number of libraries to eliminate potential security vulnerabilities

  • Fixed an issue with the REST API where a request to retrieve clusters of a given size fails when a record has an identifier associated with an unknown identifier domain

  • Added a report that presents statistics about probable links grouped by the associated vector number

  • Resolved an issue with some REST API calls where they would fail to serialize the response as an XML document

Release 4.2.6

  • Modified the probabilistic matching algorithm service so that when changing the upper and/or lower bounds through the UI, the classification of each pattern is updated immediately.

  • Added the ability to copy the blocking and matching configuration from one entity to another through the UI or REST API so that it doesn't have to be done manually.

  • Added the ability to rebuild the blocking indexes of a single blocking round, which reduces the impact on production instances compared with rebuilding all rounds at the same time.

  • Upgraded Postgres and other dependencies that have reported security vulnerabilities

  • Modified the fonts used for some of the reports to eliminate dependencies on the Times New Roman font that is not present by default on all Linux distributions.
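
The first item in this release concerns how the upper and lower bounds drive the classification of each vector pattern. A minimal sketch, assuming the usual Fellegi-Sunter convention in which a pattern whose total weight is at or above the upper bound is a match, one at or below the lower bound is a non-match, and anything in between is a probable link. The function names, labels, and weights are hypothetical, and the boundary handling in OpenEMPI may differ.

```python
def classify(weight, lower_bound, upper_bound):
    """Classify one pattern weight against the two bounds (assumed convention)."""
    if lower_bound > upper_bound:
        raise ValueError("lower bound must not exceed upper bound")
    if weight >= upper_bound:
        return "match"
    if weight <= lower_bound:
        return "non-match"
    return "probable"


def reclassify(pattern_weights, lower_bound, upper_bound):
    """Recompute the classification of every pattern after a bound changes."""
    return {
        pattern: classify(weight, lower_bound, upper_bound)
        for pattern, weight in pattern_weights.items()
    }


# Illustrative agreement patterns (1 = field agrees) with made-up weights.
weights = {"1111": 12.4, "1101": 6.1, "0100": -3.8}

before = reclassify(weights, lower_bound=0.0, upper_bound=10.0)
after = reclassify(weights, lower_bound=0.0, upper_bound=5.0)
print(before)
print(after)
```

Lowering the upper bound from 10.0 to 5.0 moves the "1101" pattern from probable to match, which is the kind of change that now takes effect as soon as the bounds are saved.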

Release 4.2.5

  • Added a REST API endpoint that returns information about the candidate records by blocking round identified by the blocking algorithm for evaluation given a record

  • Enhanced the concatenation transformation function to allow for sorting of the values of the fields to be concatenated thereby increasing the effectiveness of the custom field in identifying matching records

  • Performed some optimizations to the blocking service to reduce the potential duplicate evaluation of record pairs by the matching algorithm

  • Fixed a bug where the dates associated with an entity in the UI were not being displayed properly

  • Fixed a bug with the import of an exported entity schema where custom fields were not being imported correctly

  • Fixed an issue in the creation of entity indexes where the naming convention of the indexes was confusing the database parser in its interpretation of the command

  • Upgraded the versions of various libraries including Artemis and GSON among others that are utilized by OpenEMPI to eliminate security vulnerabilities that have been identified

  • Upgraded the version of the Postgres driver to eliminate a security vulnerability present in an earlier version
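
To see why sorting the values before concatenation can make a custom field more effective, consider two records where the given and family names were entered in swapped order. The sketch below assumes a simple separator-joined key; the function, its parameters, and the field names are hypothetical and do not reflect the actual signature of the OpenEMPI transformation function.

```python
def concatenate_fields(record, field_names, sort_values=False, separator="_"):
    """Join the non-empty values of the given fields into one custom field value."""
    values = [str(record[name]) for name in field_names if record.get(name)]
    if sort_values:
        # Sorting makes the result independent of which field held which value.
        values.sort()
    return separator.join(values)


# Illustrative records with the two name fields swapped.
record_a = {"givenName": "John", "familyName": "Smith"}
record_b = {"givenName": "Smith", "familyName": "John"}
fields = ["givenName", "familyName"]

unsorted_a = concatenate_fields(record_a, fields)
unsorted_b = concatenate_fields(record_b, fields)
sorted_a = concatenate_fields(record_a, fields, sort_values=True)
sorted_b = concatenate_fields(record_b, fields, sort_values=True)

print(unsorted_a, unsorted_b)
print(sorted_a, sorted_b)
```

Without sorting, the two records produce different values and would not agree on the custom field; with sorting, both produce the same value.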

Release 4.2.4

  • Fixed a bug with the generation of a Data Profile based on a file where the job was not completing successfully if the data file was not formatted properly

  • Added an option to the user profile page on the UI to allow the user to specify a blank value for the state field

  • Replaced the use of log4j-1 with reload4j to eliminate any concern regarding the vulnerabilities detected in log4j-1

  • Upgraded the version of the Postgres driver used by the application to a more recent version that is free from security vulnerabilities

Release 4.2.3

  • Enforced the access rule that a user without the USERS_VIEW permission cannot view their own profile

  • Fixed a bug where disabling a user account through the UI would not actually disable the account

  • Made fixes to the PIX Feed v3 interface to ensure that the validation of identifiers doesn’t fail when identifiers are specified using the PIX/PDQv3 convention

  • Fixed the configuration of the deployment of the PIX/PDQ v3 interface to work properly when the interface is enabled

  • Upgraded the reference data lookup code to ensure the hibernate layer works properly with the version of Hibernate included in this release

  • Upgraded the version of Apache CXF used to eliminate any potential security vulnerabilities

  • Fixed an issue where the global identifiers assigned to records were not updated correctly when an update changes the association between two clusters of records and each cluster also has manual links from an external source

  • Applied a validation check to prevent the search capability from allowing users to query for all records using the wildcard character

Release 4.2.2

  • Fixed a bug where certain data issues would cause the process that regenerates all links to stop before visiting every record

  • Added support to the UI for replacing one matching field with another without having to update the probabilistic model in the probabilistic matching algorithm

  • Extended the concatenation function to allow for the concatenation of more than two fields

Release 4.2.1

  • Added a new configuration setting that allows you to enforce the policy where manual links stored as part of the remember manual classifications feature override classifications by the matching algorithm to link records together

  • Improved the performance of the file load operation by using caching of identifier domain information

  • Fixed an issue where the modal dialog that displays the evaluation of the association between two records in the system did not correctly color-code vector patterns that correspond to null-scored vector patterns

  • Improved the performance of the blocking re-indexing operation by implementing lower granularity locking of resources

  • Fixed a bug where the information dialog that pops up when a record pair is classified manually through the review record pairs page could not be dismissed

Release 4.2.0

  • Added the ability to classify clusters of more than two records that have been linked together using a graphical interface that allows the users to review the field values and associations of all the records involved in the cluster

  • Added the ability to submit two records for evaluation by the matching algorithm both through the UI and the REST API

  • Added the ability to link any two records together manually through the UI

  • Added support for Ground Truth Analysis via the REST API to allow the user to evaluate the accuracy of the algorithm based on a labeled dataset

  • Added support for the Single Best Record functionality that allows the user to retrieve the most representative record (golden record) from a cluster of records that have been linked together by the matching algorithm

  • Added the ability to customize the color coding used for displaying vector agreement/disagreement patterns and pairs of records

  • Updated many libraries to eliminate security vulnerabilities

  • Fixed an issue with the Reevaluate Probable Links functionality when the Remember Manual Classification decisions feature has been enabled

  • Fixed an issue with paging in the User Files page of the UI

  • Fixed an issue in the displaying of two records side-by-side in the UI when the value for a field on the left-hand side record is null

  • Fixed an issue with Identifier Domain management using the latest version of Hibernate

  • Fixed an issue with the Reporting Module using the latest version of Hibernate

Release 4.1.5

  • Modified the log4j-1.2 jar file distributed with the release to remove classes that, although unused, could potentially be exploited as a security vulnerability

  • Fixed an issue in the saving of vector values associated with logging by vectors in the configuration of the probabilistic matching algorithm that affected both the UI and the server

  • Changed the default sorting of attribute data in the display of a data profile to sort by the name of an attribute in ascending order

Release 4.1.4

  • Improved the Advanced tab of the Probabilistic Matching algorithm to display the calculated weights for each matching field

  • Added reports for viewing the matches from one specific domain or from across two domains to allow users to review matches from specific data sources

  • Added a REST API endpoint for retrieving the link between two specific records, if one exists

  • Added to the regular expression transformation function a parameter that specifies the value to return when the expression does not match in group mode

  • Added a REST API endpoint to allow the users to reassign global identifiers to a cluster of records

  • Developed a utility to delete orphaned identifier update entries that may arise

  • Added explicit delete of identifier update entries when the parent event is deleted so that we don't rely on foreign key constraints that may have been deleted

  • Modified the assignment of global identifiers to updated records to ensure that a record with memorized links that is not similar to records it is linked to does not get a new global identifier upon an update

  • Fixed the Custom Fields screen to disable the add custom field button while saving the custom field setting to prevent the user from entering duplicate entries on sites with many records

  • Added the default security policy file to the embedded graph database, since the database complains about not finding it and this may cause login issues under certain conditions

  • Fixed the UI to allow the user to invoke the operation to initialize the probabilistic model

Release 4.1.3

  • Fixed an issue where on large clusters of records with memorized match links and probable links, subsequent updates can cause the system to assign the wrong global identifier

  • Fixed an issue with restoring the configuration database from a JSON backup from another site where an exception occurred due to conflicting IDs

  • Fixed an issue where a Hibernate exception was presented to the user when trying to add a record with an identifier that is not known to the system and where the identifier is invalid

  • Changed the access control rules on the UI to allow users with only the permission to review links to be able to view and resolve links without requiring the permission to change the matching configuration

Release 4.1.2

  • Fixed an issue where the UI, when browsing through records in the result screen, would display the links from the previous record as a result of a race condition

  • Enforced the requirement that the name of a new entity starts with a letter and not a digit since the underlying database does not permit a digit as the first character

  • Fixed an issue with the UI to prevent the user without the appropriate permissions from accessing the Operations and Settings pages

  • Fixed an issue with the enforcement of the permissions granted to a user via the roles assigned to the account in requests submitted through the REST API

  • Added an introductory section to the REST API with information on the authentication model along with some examples
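
The naming rule now enforced for new entities can be pictured as a simple pattern check. The sketch below assumes that a valid name is a leading letter followed by letters, digits, or underscores. Only the leading-letter rule (this release) and the ban on the dash character (release 4.0.3) come from these notes; the rest of the pattern and the function name are assumptions.

```python
import re

# Assumed pattern: a letter first, then letters, digits, or underscores.
ENTITY_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def is_valid_entity_name(name):
    """Return True when the whole name satisfies the assumed pattern."""
    return ENTITY_NAME.fullmatch(name) is not None


for candidate in ["person", "person2", "2person", "care-team", ""]:
    print(repr(candidate), is_valid_entity_name(candidate))
```

Checking names on the client side before submitting a new entity avoids a round trip that the server would reject anyway.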

Release 4.1.1

  • Fixed an issue with the UI where the vector pattern displayed in association with null-scored vector patterns was not correct

  • Fixed the probabilistic matching algorithm to correctly label links created as a result of a null-scored classification rule instead of labeling them in association with the baseline rule

  • Documented the record cleaning service that was added to the REST API in 4.1.0

  • Fixed the response of the REST API search for clusters to properly serialize an empty list

  • Fixed an issue where updating an entity through the UI would mark the custom fields for deletion

  • Modified the configuration of the graph database server during installation/upgrade to be done automatically instead of manually

Release 4.1.0

  • Added an option to specific endpoints in the Record and RecordLink resources to allow the caller to request that custom fields also be included in the list of fields returned for a record

  • Added a scheduled task that periodically goes through and clears update notification events that are older than a certain date to help with the maintenance of these events

  • Added the record ID to the identifier update notification messages to make it easier to find the record that was affected by the global identifier change

  • Moved the configuration of the instance from the mpi-config.xml file to a database instance

  • Added support to the REST API for managing the administrative configuration parameters for the instance

  • Added support to the UI for changing administrative configuration parameters besides the algorithm settings

  • Added support for the Damerau Levenshtein similarity metric

  • Replaced the implementation of the Jaccard similarity metric with one that tokenizes strings in a more flexible way

  • Added a tab to the search screen for searching for records by their record ID values

  • Added to the UI the ability to search for specific vector patterns in the probabilistic algorithm using the agreement/disagreement pattern

  • Added to the UI the ability to specify how to classify vector patterns with one or more null scored fields

  • Added additional information about links associated to records in different parts of the UI to assist with diagnosing matching issues

  • Added a new report that summarizes probable vectors by frequency of occurrence

  • Changed the default behavior of the file loader to remove fields that are blank from the record that is imported

  • Modified the findOrAdd REST API endpoint to return a status of 201 when a record is created and 200 when existing record(s) are found

  • Added support for license validation

  • Added highlighting to matching fields in the field comparison pages by making matching fields bold

  • Fixed the UI to require that a file mapping field must be provided for importing a file

  • Fixed an issue with adding and updating identifier domains through the UI
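
For readers unfamiliar with the Damerau Levenshtein metric added in this release: it extends Levenshtein edit distance by counting a transposition of two adjacent characters as a single edit, which suits typing errors in names and identifiers. The sketch below implements the restricted variant (optimal string alignment) and a length-normalized similarity. Which variant and normalization OpenEMPI uses is not stated in these notes, so both are assumptions, as are the function names.

```python
def osa_distance(a, b):
    """Optimal string alignment distance (restricted Damerau-Levenshtein)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition, e.g. "ht" versus "th".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


def similarity(a, b):
    """Map the distance to a 0..1 score (assumed normalization by longer length)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - osa_distance(a, b) / longest


print(osa_distance("smith", "smiht"))
print(similarity("smith", "smiht"))
```

Plain Levenshtein distance would score the swapped "th" in this example as two edits, so the transposition-aware metric rates such pairs as more similar.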

Release 4.0.4

  • Fixed an issue where adding new attributes to an entity through the UI would fail under certain conditions

  • Fixed a cosmetic issue in the logging configuration tab of the probabilistic matching algorithm where the fraction of logged pairs would not be displayed properly

  • Fixed an issue with the assignment of global identifiers in large clusters of records where multiple match links must be removed to separate a subset of the records from the rest of the records in the cluster

  • Fixed the assignment of source to a classification request when invoked through the UI for probable links

  • Fixed an issue where an error was generated if the user brings up the detail view of a record but before they get to view a record link, the link gets deleted through the REST API

  • Fixed an issue with handling the remembering of probable link classifications through the UI

Release 4.0.3

  • Added validation to the file import functionality to require that a file mapping file is provided in the form before the request is processed successfully

  • Fixed the matching algorithm to properly select manual overrides of matching rules with multiple null-scored fields

  • Added validation to the creation of an entity to ensure that the dash character is not used in the name since the underlying database cannot handle that character

  • Fixed the deletion of an entity that has reports and other objects associated with it

  • Added an option to the REST endpoint that returns links associated with a record to restrict them to only those that are direct links

  • Modified the UI to only display direct links in association with a record in the search results since indirect links are causing confusion

Release 4.0.2

  • Added the ability to view vector patterns associated with record pairs presented through the UI

  • Added the ability to modify the link state of a record with other records in the system through the search interface

  • Added support for sorting the records returned by the find by attributes endpoint using a single record field value

  • Fixed the length of the identifier field in the search page of the UI to accommodate very long identifier values

  • Removed use of deprecated Hibernate APIs

  • Reduced the log level of the static content filter to reduce the data generated by default

  • Fixed the validation of the field threshold in the deterministic algorithm

  • Fixed the editing of matching fields for the probabilistic algorithm through the UI

  • Fixed the title of the screen that displays record links associated with search results on the UI

  • Fixed the selection of the identifier domain transformation function for a custom field

  • Increased the precision of m- and u- values for the probabilistic matching algorithm on the UI

  • Fixed a concurrency issue with adding a new identifier

  • Fixed the generation of XML formatted response through the REST API which was not working for a few endpoints

  • Fixed the generation of the default algorithm configuration when a new entity is added to base it on the existing entity

Release 4.0.1

  • Upgraded to the latest edition of the graph database along the stable branch

  • Fixed an issue where when a background job throws an unchecked exception the status of the job is not updated

  • Fixed an issue with the static content filter not working properly when the application is running behind a reverse-proxy

  • Fixed an issue with retrieving record links associated with a record using a state of 'A'

  • Fixed an issue with failing to fully delete an entity that has generated reports associated with it

  • Fixed an issue with not being able to save a user profile through the UI when a user already exists with the same email address

  • Fixed the edit of a record through the search by identifiers UI path

  • Fixed the Potential Match Review Detail Report to show correct start date

  • Fixed the report Duplicate Summary statistics

Release 3.6.3

  • Fixed an issue where when a background job throws an unchecked exception the status of the job is not updated due to transaction management

  • Improved the generation of a default blocking and matching algorithm configuration when a new entity is added through the REST API

  • Fixed an issue with the graph database upgrade where during an import of a backup operation the synchronization of sequences fails

  • Fixed an issue with the concatenation transformation function where a custom field parameter that uses a special character of ',' or ':' breaks the encoding mechanism that persists the parameters

  • Fixed an issue where for the Logged Pairs resource the delete operation fails due to the non-transactional nature of the operation under the latest edition of the graph database

  • Fixed the REST API of the Probabilistic Matching algorithm to ensure that the operation to initialize the classification model is working properly

  • Developed and released a new report to present frequency counts of block sizes for a blocking round

  • Added new background jobs to help improve the performance of large instances that have been in production for many years by allowing the users to remove duplicate deleted identifiers, expired identifiers and expired records

  • Modified the update operation to remove duplicate deleted identifiers on an update

  • Added a new REST endpoint to allow users to delete a field from the probabilistic matching configuration without having to update the entire configuration of the algorithm. This also handles the automatic adjustment of model parameters after the field deletion

  • Added a new REST endpoint to allow users to add a field to the probabilistic matching configuration without having to update the entire configuration of the algorithm. This also handles the automatic default initialization of model parameters for the new field

  • Added a new REST endpoint to support the background operation to reevaluate all probable links

  • Implemented performance improvements to the background job that reevaluates probable links

Release 3.6.2

  • Improved the error messages associated with common error conditions generated through the REST API to make it easier for developers to identify the issue causing the request to fail

  • Added a REST API endpoint to allow users to remove all global identifiers that have been generated in the repository

  • Added the ability to retrieve a filtered list of update notifications by filtering by end event date, source event type, and transition type 

  • Added a REST API endpoint for updating records that is more consistent with REST API standards

  • Standardized the formatting of creation dates for identifier domains

  • Enhanced the validation of custom field resources created through the REST API to ensure configuration parameters for certain transformation functions are correct

  • Fixed the update operation of a role to ensure that the permissions list is updated correctly through the REST API

  • Added an option to the HL7v3 PDQ Supplier to allow the user to collapse multiple records that have been linked together into a single subject with multiple identifiers

  • Made changes to the software installer so that it works on Windows machines

  • Fixed the parsing of dates into strings to avoid concurrency related errors under very heavy workloads

  • Fixed the update user account operation to prevent an issue that was causing it to fail under certain conditions

Release 3.5.9

  • When searching for records by identifier, the service should try to incorporate the identifier domain in the query request even if only parts of the domain name are specified in the request

  • Added support to the REST API, when retrieving notifications, for filtering the returned notification events by ending date, event source, and event transition

  • Added support to the REST API, when retrieving probable links (or record links in general), for filtering the returned links by the value of one of the record fields. If this filtering criterion is used, then only record pairs where both records have the specified value in the specified field are returned.

  • Various performance improvements

Release 3.5.8

  • The findByIdentifiers interface was not always returning the specified number of records through the paging parameters in the case where the query results included identifiers marked for deletion. The interface was fixed to always return the correct number of records based on the paging parameters.

  • Added support to the probabilistic algorithm to allow manually specified matching rules for null-scored patterns to take precedence over the rule for the same vector pattern without null scoring applied

  • Added a new REST interface to allow the caller to request that the matching algorithm reevaluates a record in its association with other records without requiring that the caller updates the record

  • Added support to the probabilistic algorithm for specifying manual matching rules for null-scored vector patterns

  • Added some minor improvements to log file management in the embedded instance of the Tomcat server

Release 3.6.1

  • Updated the PIXm and PDQm implementations to reflect the latest versions of the IHE specifications

  • Updated the PIX and PDQ HL7v3 implementation to bring them up to compliance with the latest specification

  • The PIXv3 query response does not include the assigningAuthorityName attribute when present 

  • The PDQv3 interface doesn't support the dateOfBirth attribute when the datatype for the date of birth is a string (new default in version 3.6.x)

  • Invalid audit message format according to the latest IHE/DICOM Specification

  • PIXv3 Query Result is not respecting the latest IHE Spec: The missing "QueryAck SHALL have a statusCode element" was a bug

  • The ATNA audit messages are not consistent with the latest IHE specifications

  • Deleting an entity fails if it already has entity groups and job queue entries associated with the new entity

  • Validate a new entity to ensure that it includes the required fields and at least one attribute

  • The REST authentication filter was not sending an appropriate error message when the session ID is blank

  • In the probabilistic algorithm with a manual vector configuration and debug enabled, there is an exception

  • The update operation on the User Files resource was resetting the dateCreated field

  • The REST call to get user files should expose the name field of the File object

  • Importing a file through the Record resource should validate that either userFileId or filename are present in the request (either reference a previously uploaded file or upload one)

  • Manual matching override rules for null-scored patterns should take precedence over regular rules

  • Add support for creating and dropping indexes through the REST API

  • Replace deprecated Hibernate API with newer version

  • Add the entityVersionId attribute to the blocking configuration object in addition to the legacy entityName attribute.

  • Extend the CustomFields resource to support updates of Custom Fields rather than expect the user to delete the entry and recreate it.

  • Add the ability to initiate a long-running operation to export the records of an entity via the REST API

  • Updating an entity through the REST API should allow you to update attributes as well without having to use the entity attribute resource

  • Add a REST resource to start and stop the PIX/PDQ service

  • Need to add an operation on the Security resource that authenticates and returns the User resource associated with the authenticated user

Release 3.6.0

  • Migrated the persistence layer for the graph database to use the latest version of OrientDB 3.0.x

  • Performed load testing of OpenEMPI with the new OrientDB 3.0 persistence layer; results will be posted on the OpenEMPI web site

  • Upgraded the implementation of the REST API to the latest edition of Jersey. This introduced some incompatibilities with the OpenEMPI REST API in 3.5.x but unfortunately this is unavoidable

  • Exposed the audit logs as a REST resource through the REST API

  • Added to the audit service and persistence layer support for auditing by entity type. The events are now always returned for a specific entity

  • Added the ability to specify that a scheduled task should perform its work against a specific, configurable entity

  • Exposed the Blocking Configuration through the REST API as a resource

  • Added default blocking and matching configurations when a new entity is created

  • Replaced the old cache library with a new one that is lighter

  • Added a REST API to allow a user to reevaluate a record's association to other records without the need for an update operation

  • Exposed the deterministic and probabilistic matching configurations as REST resources through the REST API

  • Exposed the User object as a REST Resource through the REST API

  • Enhanced the installation processes to include support for Apache HTTP installation

  • Exposed the Logged Links service as a REST resource through the REST API

  • Exposed the data profile service as a REST Resource through the REST API

  • Make sure all long-running operations invoked through the REST API create jobs and run asynchronously

  • Exposed the Job Queue service as a REST resource through the REST API

  • Added support for manual classification rules of vector patterns that correspond to null scored patterns. This is now available through the mpi-config.xml file

  • Added to the service layer the ability to retrieve a record along with all its inactive (voided) identifiers

  • Upgraded the web services test suite to Jersey 2

  • Fixed an issue where in certain cases when creating a custom field on a new instance caused an exception

  • Fixed an issue where a concurrent modification exception was generated when creating logged links during a load test

  • Fixed an issue where the blocking algorithm was reporting not being able to load from blocks records that have been deleted. Since the records had been marked as deleted the blocking algorithm was not able to generate index entries for them

  • Fixed an issue with the blocking service not making use of the consumer queue wait time parameter

  • Replaced the implementation of the Hibernate's legacy Criteria API which has been deprecated

  • Upgraded older versions of library dependencies that had been flagged with security concerns

Release 3.5.7

  • Fixed an issue where under certain conditions when a probable link is updated to a match state, an update notification would not be generated

  • Added support to the REST API for being able to specify what type of record links should be returned in association with a specific record (request match, probable, or both)

  • Added support to the REST API for being able to specify what type of record links should be returned from the record link resource (request match, probable, or both)

  • Applied a number of changes to improve the performance of the REST API

Release 3.5.6

  • Fixed an issue where, in certain cases, uploading an entity definition was not creating a user file entry of entity type, so the file entry was not showing up as an entity definition to be imported

  • Fixed an issue with the PDQ (HL7v2 binding) interface not returning the dateOfBirth field after the default data type for the date of birth field was changed from a date to a string

  • Fixed an issue with the PDQ (HL7v2 binding) interface not returning the phone number field in the PID segment when a phone type field has not been populated

  • Fixed an issue with the REST API where deleting a record by id was not working under a more recent version of the graph database

  • Added support to the service layer for retrieving a record along with all its inactive (voided) identifiers

  • Improved the performance of the REST API for retrieving a specific link by its two endpoints by making it possible for the optimizer to use existing indexes

Release 3.5.5

  • Added support for logging to the REST APIs so that it can optionally be turned on at the server to enable debugging of issues with calls to the interface

  • Added support to the String Comparator Resource of the REST API for parameters to allow users to evaluate the similarity between strings for similarity metrics that require parameters to be passed down in the request.

  • Enhanced the Apache Artemis implementation of the notification service to support the propagation of identifier update notifications and use the latest stable version of the messaging service

  • Improved the performance of the process that generates candidate record pairs to be evaluated by the matching algorithm to reduce the number of record pairs that are generated. This modification can cause a considerable performance improvement to the processing of add and update requests

  • Modified the Record Link resource to return the internal record id for each link to make it easier and more efficient to retrieve detailed information about each link.

  • Fixed the request for logged links to filter links returned by the entity specified in the request.

  • Fixed a bug in the processing of the findByMatching request where certain characters in the key-value pairs passed as parameters were causing an exception.

  • Fixed a bug with the asynchronous persistence of logged links to resolve a concurrent modification exception that would arise.

Release 3.5.4

  • Enhanced the performance of the File Import service so that loading millions of records no longer requires a proportional amount of heap memory.

  • Enhanced the processing of background jobs so that only one job is processed at a time and jobs are processed in the order in which they are created. This will prevent cases where multiple data-intensive jobs were being processed concurrently and bringing the server to its knees.

  • Added a new REST Resource to expose the string comparator service. This allows users to test various similarity metrics and thresholds so that they can identify the ideal parameters for their instance.

  • Added a new REST interface to support loading data from a file without requiring that the records are imported but with matching activated

  • Fixed the generation of blocks to skip generating blocks for a blockingKeyValue that is all null.

  • Disabled the "Reevaluate" task on the Matching page since it was causing confusion among some users

  • The import schema utility was not properly handling the synchronous vs asynchronous parameter in the serialized schema that was being imported

  • Improved the handling of background jobs so that if the server stops before a job in the queue is done, any job that is left in the "Processing" state will be rescheduled upon startup

  • String Comparison REST Resource should return No Data found HTTP code when there are no parameters for a given metric instead of an empty list

  • Uploading a file while the REST interface is actively in use by a different user than the user that is currently logged into the UI would on occasion set the owner of the file to be the API user due to a race condition

  • Increased the default maximum connection pool size for the relational database so that for most deployments the users don't have to manually fine tune it

  • Added an attribute to the file loader mapping file that allows the user to specify the number of columns in the file in the Flexible File Loader. This is useful when the data file has lots of optional fields especially in the last columns of the file

  • Upgraded the release to embed Apache Tomcat 8.5.x

  • Saving records with invalid date values into a database field of date or timestamp type was being rejected by OrientDB causing the record to fail to be imported; the file loader will now detect and clear such values to allow the records to be imported into the database
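
The background job behavior described in this release (one job at a time, in creation order, with interrupted jobs rescheduled on startup) can be pictured with the following minimal sketch. It is an illustration only and not the OpenEMPI implementation: the `Job` and `SequentialJobQueue` names, the state strings other than "Processing", and the in-memory storage are all assumptions made for the example.

```python
from collections import deque
from dataclasses import dataclass, field
from itertools import count
from typing import Callable

_ids = count(1)  # monotonically increasing ids stand in for creation order


@dataclass
class Job:
    """Hypothetical job record; field names are assumptions."""
    name: str
    action: Callable[[], None]
    state: str = "Queued"
    job_id: int = field(default_factory=lambda: next(_ids))


class SequentialJobQueue:
    """Runs jobs strictly one at a time, oldest first (illustrative only)."""

    def __init__(self, recovered_jobs=()):
        self._pending = deque()
        # On startup, a job still marked "Processing" was interrupted by a
        # shutdown, so it is put back in the queue in its original order.
        for job in sorted(recovered_jobs, key=lambda j: j.job_id):
            if job.state == "Processing":
                job.state = "Queued"
            if job.state == "Queued":
                self._pending.append(job)

    def submit(self, job):
        self._pending.append(job)

    def run_all(self):
        """Process every pending job sequentially; return names in run order."""
        completed = []
        while self._pending:
            job = self._pending.popleft()
            job.state = "Processing"
            job.action()
            job.state = "Completed"
            completed.append(job.name)
        return completed
```

A single consumer draining a FIFO queue is the simplest way to guarantee that two data-intensive jobs never overlap; a real server would persist the job state so that it survives a restart.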

Release 3.5.3

  • Added support for user authentication for OpenEMPI through LDAP instead of the default mechanism. You can read more about this feature and how to configure it here.

  • Added support for synonyms in matching, which are lists of two or more words that should be considered by the matching algorithm to be identical (for example, Robert and Bob). You can read more about this feature and how to configure it here.

  • Modified the importFile RESTful web service endpoint to expose more file import features and to make it asynchronous

  • Modified the persistence of logged links during classification by the probabilistic matching algorithm so that they commit for each batch instead of for the entire operation, making the operation more efficient and scalable for sites with tens of millions of records

  • Added support for communication with the LDAP server using StartTLS

  • Added support for a new feature in the configuration of the probabilistic matching algorithm that memorizes manually classified probable links so that such record pairs don't return to the review queue in the future. You can read more about this feature here.

  • Added a RESTful web service resource to manage synonyms

  • Fixed a bug where overriding the probabilistic matching algorithm for a specific vector pattern, so that such record pairs are not matched, was causing record pairs with that particular pattern to generate an exception
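
As a rough illustration of the synonym support described in this release, the sketch below treats every word in a synonym list as equal to every other word in the same list. The function names and the choice of a canonical representative are assumptions for the example; the actual configuration format and matching logic in OpenEMPI are not shown here.

```python
def build_synonym_index(groups):
    """Map each word in a synonym group to one canonical representative.

    If a word appears in more than one group, the later group wins.
    """
    index = {}
    for group in groups:
        words = [w.strip().lower() for w in group]
        canonical = min(words)  # any stable choice of representative works
        for word in words:
            index[word] = canonical
    return index


def synonym_equal(a, b, index):
    """True if the two values are identical or belong to the same synonym group."""
    a_norm = a.strip().lower()
    b_norm = b.strip().lower()
    return index.get(a_norm, a_norm) == index.get(b_norm, b_norm)
```

For example, with the groups `[["Robert", "Bob", "Rob"], ["William", "Bill"]]`, the values "Robert" and "bob" compare as equal while "Bob" and "Bill" do not.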

Release 3.5.2

  • Added a new feature to the matching layer that allows the system to match two records in the case where the values for two fields have been transposed. For example, it is fairly common for the values of the first and last name to be transposed for a given record, making it difficult to match the two records together

  • Added a configuration integration framework to OpenEMPI that now allows for data to flow into and out of OpenEMPI using a wide variety of data sources and sinks. For example, you can set up an instance of OpenEMPI to periodically load data from a database by issuing a query to retrieve the records

  • Added a new similarity metric that calculates the similarity between two numeric values such that numeric values that are closer together get a higher value

  • Added a new similarity metric that calculates the similarity between two dates such that dates that are closer together get a higher value

  • Modified the administrative application to immediately incorporate changes to custom fields so that there is no need to restart the service in order to proceed with subsequent configuration steps

  • Added a new web services endpoint that returns records that are similar to the record presented by the caller along with a weight indicating the relative similarity between the two records

  • Added the ability for a site to include a site-specific disclaimer message in the web application before the user is permitted to login

  • Fixed an issue where the flexible file loader would load records without setting a date for the date created field

  • Fixed the export process so that it is able to export records with identifiers that have no date created value

  • Fixed the review links page to allow for sorting by weight or date created to assist the users in locating the specific record pairs to be resolved first

  • Fixed the logging of the probabilistic algorithm for record links during the evaluation process to reduce the log level
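
The numeric and date similarity metrics and the transposed-field matching added in this release can be sketched as follows. This is a minimal illustration under stated assumptions: the linear decay, the `max_distance` and `max_days` parameters, and all function names are hypothetical and are not taken from the OpenEMPI code base.

```python
from datetime import date


def numeric_similarity(a, b, max_distance):
    """Return 1.0 for equal values, falling linearly to 0.0 at max_distance."""
    if max_distance <= 0:
        raise ValueError("max_distance must be positive")
    return max(0.0, 1.0 - abs(a - b) / max_distance)


def date_similarity(a, b, max_days):
    """Same idea for dates, measured in days between the two values."""
    return numeric_similarity(a.toordinal(), b.toordinal(), max_days)


def fields_match_with_transposition(rec_a, rec_b, field_1, field_2):
    """True if the two fields agree directly or with their values swapped."""
    def norm(value):
        return (value or "").strip().lower()

    a1, a2 = norm(rec_a.get(field_1)), norm(rec_a.get(field_2))
    b1, b2 = norm(rec_b.get(field_1)), norm(rec_b.get(field_2))
    if not (a1 and a2):
        return False  # blank values carry no evidence of a match
    return (a1, a2) == (b1, b2) or (a1, a2) == (b2, b1)
```

With these definitions, two dates ten days apart score 0.5 when `max_days` is 20, and a record with first name "John" and last name "Smith" matches one where the two values were entered the other way around.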

Release 3.5.1

  • Added support for stronger password encryption in the professional edition of OpenEMPI that utilizes the latest encoding algorithms

  • The instance can now be configured to hide encoded passwords from log files to allow for easier sharing of those files

  • A new service is now included that can be configured to periodically delete old records from the audit log.

  • Upgraded the connection pool for the relational database to the Hikari pool that provides much better performance under heavy loads

  • Made it easier to configure the sampling rate of record pairs that are used during the training phase of the probabilistic matching algorithm

  • Fixed an issue where exceptions are generated during shutdown when the instance has not been configured properly

  • Added support for the identifier domain transformer, which can now be used in two different transformation modes

  • Added a new report that provides a summary of all the records that have been classified as a match during a period of time

Release 3.5.0

  • Added full support for 2-way TLS encryption to the HL7 v2 service interface

  • Upgraded the embedded graph database OrientDB to the latest stable release

  • Developed better isolation between the notification service and the specific implementation to make it easier to support other messaging brokers in the future

  • Developed better integration between the PIX/PDQ service and the rest of the application to reduce the number of configuration files

  • Utilized a new interface in the embedded OrientDB database for smoothly shutting down the database

  • Made some of the operations that process record links configurable to better support sites with 10s of millions of links

  • Added support for both JSON and XML messages to the person-based REST API that didn't previously support JSON

  • Fixed the blocking service to load the latest configuration without requiring a server restart after configuration changes

  • Fixed a few minor issues with the single best record module

  • Added support for transitive closure of record pairs to resolve the issue with conflicting links being created with complex matching rule configurations 
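
The idea behind a transitive closure of record pairs (if A matches B and B matches C, then A, B, and C belong together) can be sketched with a union-find structure. This is one common way to compute such a closure and is an assumption for illustration; the release note does not say which technique OpenEMPI uses.

```python
def transitive_closure_groups(pairs):
    """Group record ids so that any chain of matched pairs ends up in one group."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        root = x
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every node on the path directly at the root.
        while parent[x] != root:
            next_x = parent[x]
            parent[x] = root
            x = next_x
        return root

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for record in list(parent):
        groups.setdefault(find(record), set()).add(record)
    return sorted(sorted(group) for group in groups.values())
```

For the pairs `(1, 2)`, `(2, 3)`, and `(4, 5)` this yields the groups `[1, 2, 3]` and `[4, 5]`, so records 1 and 3 are treated as linked even though no rule matched them directly.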

Release 3.4.3

  • Added support for the findOrAdd service method to the RESTful web services interface which adds a record only if a matching one is not found in the system

  • Added a configuration parameter to support sites with tens of millions of links which need to be able to configure a larger block size when assigning global identifiers

  • Added a new transformation function that extracts a value using a regular expression when forming a custom field
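
The two behaviors added in this release can be illustrated with a short sketch: a find-or-add operation that stores a record only when no match exists, and a transformation that extracts a value with a regular expression. The function names, the list-backed repository, and the `is_match` callback are assumptions for the example and do not reflect the actual service signatures.

```python
import re


def find_or_add(repository, record, is_match):
    """Return (record, added): the existing match if any, else the new record."""
    for existing in repository:
        if is_match(existing, record):
            return existing, False
    repository.append(record)
    return record, True


def regex_extract(value, pattern, group=0, default=None):
    """Return the requested group of the first regex match, or default if none."""
    if value is None:
        return default
    match = re.search(pattern, value)
    return match.group(group) if match else default
```

As a usage example, `regex_extract("MRN-004512-A", r"\d+")` returns the digits of the identifier, which could then serve as the value of a custom field.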

Release 3.4.2

  • Added support for the findByMatching and findByBlocking service methods through the RESTful web services interface

  • Added support for custom fields in the findByMatching and findByBlocking service methods

  • Fixed the deterministic matching algorithm to allow it to use distance metric parameters where present for certain metrics

Release 3.4.1

  • Fixed the support of JSON as the input and output data format for some of the Person REST interface methods that didn't support it

Release 3.4.0

  • Added support for auditing all events (including events to view a record) for HIPAA compliance

  • Added an interface to the Entity REST API to support paging through all the records in an instance

  • Added new transformation functions that can be used in the generation of custom fields

  • Added support for SQL Server as the relational database supporting OpenEMPI

  • Added support for MySQL as the relational database supporting OpenEMPI

  • Added the report artifacts as part of the distribution of the commercial edition of OpenEMPI

  • Improved the support for remote connections to the graph database when used in place of the default embedded mode

  • Fixed the support for asynchronous matching of records

  • Added a new global identifier generator module to support the requirements of a customer

  • Improved the vector configuration screen of the probabilistic matching algorithm by showing available sample record pairs per vector

  • Improved the performance of bulk import of data by eliminating the generation of notification events

  • Upgraded and improved the integration with the messaging system, switching to ActiveMQ Artemis as the default JMS server

  • Performed technical refresh of underlying software such as the Spring framework, Hibernate, etc.

  • A user account can now be associated with a specified domain, which enables filtering of the review workload to the associated domain

  • Added support for fault-tolerance during operation of an instance of OpenEMPI through replication

  • Added support in the commercial edition of OpenEMPI for collecting metrics about the operation and performance of the system

  • Fixed an issue where generation of blocking key values generates huge blocks for single blocking field blocks with records that have blank values in the blocking field

  • Fixed the substring transformation function so that it does not fail if the bounds specified fall outside the range of the field value affected

  • Fixed a bug where an invalid field parameter used in the Entity REST API could cause a NullPointerException

  • Fixed a bug where extending the entity schema of the default person entity would cause the Person REST API to fail on certain queries

  • Fixed a bug where the export function of records from the system would be affected by the caching configuration of the export module

  • Fixed a bug where the bulk import of records from another instance would cause the sequence generator to get out of sync

  • Fixed a bug where generating a record link of match type wouldn't be persisted if a record link of probable link type already existed
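
Two of the fixes above lend themselves to a small illustration: skipping records whose blocking fields are all blank so that they do not pile up in one huge block, and a substring transformation that tolerates out-of-range bounds. The sketch below shows the general idea only; the function names and the dictionary-based record layout are assumptions, not the OpenEMPI implementation.

```python
from collections import defaultdict


def safe_substring(value, start, end):
    """Substring that clamps out-of-range bounds instead of failing."""
    if value is None:
        return ""
    start = max(0, min(start, len(value)))
    end = max(start, min(end, len(value)))
    return value[start:end]


def build_blocks(records, blocking_fields):
    """Group record ids by blocking key, ignoring keys that are entirely blank."""
    blocks = defaultdict(list)
    for record_id, record in records.items():
        values = [(record.get(f) or "").strip().lower() for f in blocking_fields]
        if not any(values):
            continue  # a blank key would lump unrelated records into one block
        blocks[tuple(values)].append(record_id)
    return dict(blocks)
```

Without the blank-key check, every record with an empty blocking field would land in the same block, and the number of candidate pairs in that block would grow quadratically with its size.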

Release 3.3.7

  • [OPENEMPI-373] - Added the ability for the user to clear a field value through the Entity REST API

  • [OPENEMPI-372] - Bad data in a date field can cause blocking key value generation to fail and the blocking re-indexing process to end before processing every record

  • [OPENEMPI-371] - The substring transformation function sometimes fails if the bounds specified fall outside the range of the field value affected

  • [OPENEMPI-370] - In the entity REST API particular invalid values as field parameters can cause a NullPointerException

Release 3.3.6

  • [OPENEMPI-369] - Export process times out when the filesystem on which the database is stored is very slow

  • [OPENEMPI-365] - Release of OrientDB incorporated in 3.3.6 broke some queries used as part of the report generation process

  • [OPENEMPI-364] - Added auditing of records viewed by a user on the administrative console to the log; temporary solution until auditing of viewed records is implemented in a manner consistent with the auditing of other events

  • [OPENEMPI-362] - Algorithm to find connected components is recursive and may run out of default stack space allocated in implementations where the system is not configured properly; algorithm was modified to operate in a non-recursive manner

  • [OPENEMPI-361] - Before running the "link all records" process the operation to clear existing links is not working properly for all matching algorithms

  • [OPENEMPI-357] - Exposed the method findPersonsById through the web service interface; it searches the repository by an identifier and returns all matching records

Release 3.3.5

  • [OPENEMPI-352] - Editing the definition of a custom field fails with duplicate field in entity message

  • [OPENEMPI-351] - Sequence operations in OrientDB during concurrent updates can get out of synch and don't explicitly refresh their state with the database; caching causes the in-memory sequence to get out of synch with the database

  • [OPENEMPI-350] - Formatting a date for comparison purposes uses a non-thread safe call which may fail under intense workloads

  • [OPENEMPI-349] - When starting the database connection, the API currently used to establish initial connections uses default password instead of credentials provided

  • [OPENEMPI-348] - OrientDB released a new version 2.2.17 with a fix that may affect OpenEMPI in operation currently using 2.2.16

Release 3.3.4

  • [OPENEMPI-347] Viewing linked records through the user interface's search results option sometimes results in records that are not linked through a match link appearing in the list

  • [OPENEMPI-346] Under some circumstances a record may end up having multiple global identifiers. In some scenarios involving more than two records that are linked together using both match and probable links, either through an update or a link operation, a record may end up having multiple global identifiers. It appears that in a recent update to OrientDB, the semantics of the API changed such that when an edge is deleted, the vertices associated with this edge do not end up with a null value for their incoming and outgoing edge but rather end up preserving their link association to a non-existent link.

  • [OPENEMPI-345] Assigning global identifiers on an instance with hundreds of thousands of links is slow. The process was modified to operate more efficiently in such environments.

Release 3.3.3

  • [OPENEMPI-343] The flexible file loader uses a parsing library that is missing a null field value when it appears as the last field value in a record.

Release 3.3.2

  • Upgraded the version of OrientDB to 2.2.16 since an issue was fixed in the 2.2.16 release that may impact the operation of OpenEMPI under certain circumstances.

  • [OPENEMPI-342] Blocking service indexing, when the blocking service is not configured well, creates a block for all null values of blocking keys. This slows down the matching process in cases where the blocking value is null for many records

  • [OPENEMPI-341] Importing blocks of records sends notifications to interested parties and this slows down the process some with no real benefit. Disabling the generation of notifications helps improve the performance.

  • [OPENEMPI-340] The exporter process uses the Avro library in such a way that it is caching data from previous records into new records for some fields causing invalid data to be imported.

Release 3.3.1

  • [OPENEMPI-336] After an import operation of records from a previous release, the record sequence is out of synch with the current record id.

  • [OPENEMPI-337] Update of records that generates a link of different state from existing one causes the link not to be saved. For example if through the user interface a link is updated from a probable link to a match link, the new state of the link is not saved properly.

  • [OPENEMPI-338] Update of blocking indexes in some cases doesn't properly create a new link block causing inefficiencies when records need to be evaluated for match status.

Release 3.3.0

  • Upgraded to a more recent version of the underlying graph database. This upgrade provides a number of performance improvements but made it necessary to modify how records from OpenEMPI are persisted at a low level. This change to the persistence of record data requires that a user migrates their data from the 3.2.0 release to the 3.3.0 release using the export/import tools that were developed.

  • Upgraded to Apache Tomcat version 8.x from 7.x as the standard web application server for deploying OpenEMPI

  • Developed a tool for exporting data from OpenEMPI and another one for importing data. This pair of tools performs the transformation that is needed for upgrading from the 3.2.0 release (or earlier) to the 3.3.0 release. These tools are available with the commercial edition of OpenEMPI.

  • Extended the Web Services API to allow users to perform the workflow of importing data all the way through generating all links solely through web services calls. This feature allows customers that automate the process of linking data on a regular basis to perform the whole process programmatically without any manual intervention.

  • Began the process of migrating the implementation of distance metrics to a different library that is more up-to-date and continues to be maintained. The process will be completed in the next release of OpenEMPI.

  • [OPENEMPI-185] - Unlinking a person that has a voided person link causes an exception

  • [OPENEMPI-279] - Merge operation does not generate update notifications

  • [OPENEMPI-293] - Metrics generated by the data profiler seem to be incorrect in some cases.

  • [OPENEMPI-301] - The find duplicate feature from the record update screen is not working

  • [OPENEMPI-319] - Sessions expiring are causing the PIX/PDQ to be restarted

  • [OPENEMPI-320] - Adding a record with an email address fails due to a validation error that is not caught and breaks the UI

  • [OPENEMPI-321] - Sequence use is not working on an existing database

  • [OPENEMPI-323] - Issue with handling of massive import declaration

  • [OPENEMPI-325] - Unlinking records through the user interface doesn't properly update their global identifiers

  • [OPENEMPI-328] - Update before global identifiers have been assigned causes NPE

Release 3.2.0

  • Added Reporting Capabilities to the 3.2.0 release of the entity edition (Commercial Edition only). The users may now generate reports on the operation of the system. The reporting functionality was developed in an extensible manner so that new report types can be added over time. The current list of reports includes:

    • Event Activity

    • Data Profile Summary

    • Duplicate Summary Statistics

    • Potential Match Review Summary

    • Potential Match Review Detail

  • Added the ability for the user to easily remove the global identifier assigned to all the records in the database. This feature is useful when first setting up an instance of OpenEMPI.

  • Enhanced the probabilistic matching algorithm with a new feature we call null scoring which improves the matching performance of the algorithm in the presence of null values in matching fields.

  • Added the ability for the user to run the data profiling process against all records in the repository on demand instead of having to run it as a scheduled background process

  • Enhanced the performance of long-running operations for sites with millions of records. Operations such as assigning global identifiers, rebuilding the indexes of the blocking algorithms, and running the matching algorithm against all record pairs did not take advantage of the multiple processing nodes available in a clustered deployment of OpenEMPI (Commercial Edition only). Modified the implementation of these operations to take advantage of all the nodes available on the cluster.

  • Added a new transformation function for custom field generation that changes the case of the associated field. A parameter of the transformation function specifies the case of the transformed field.

  • Added sequencing of transformation functions to the custom field generation process. This allows the user to define a custom field that is generated based on other custom fields, which implies that transformation functions can now be composed to form much more complex functions from simpler ones.

  • Added the ability for the user to export all the records and links from the system. It is preferable that the export format is binary so that corruption of the file can be detected during subsequent loading of the file and ideally it should be easy to import the file for further processing in a cloud-based big data environment such as Hadoop.

  • The process of assigning global identifiers to all records of an instance can take a long time on an instance that has millions of records.

  • Modified the data profiler process when running against a file, to be able to specify the field delimiter along with the data types of the columns of the file instead of using the fixed field delimiter of the colon ':' character.

  • The user should be able to invoke the re-indexing of the blocking fields at the specific entity level instead of having to run the process against all entities currently defined on an instance of OpenEMPI.

  • Issue OPENEMPI-297: Under certain conditions, when trying to unlink two records from each other in the search result screen, the two records that are to be unlinked are not showing up side by side.

  • Issue OPENEMPI-298: In the search screen, after searching for a record and selecting to view the list of records linked to a selected record, the selected record itself was showing up in the list.
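
The sequencing of transformation functions introduced in Release 3.2.0 amounts to function composition: the output of one transformation becomes the input of the next. The sketch below illustrates the idea with a case-changing transformation and a substring transformation; the function names and the `mode` parameter values are assumptions for the example and are not OpenEMPI configuration names.

```python
from functools import reduce


def change_case(mode):
    """Return a transformation that converts a value to the requested case."""
    modes = {"upper": str.upper, "lower": str.lower, "title": str.title}
    if mode not in modes:
        raise ValueError(f"unsupported case mode: {mode}")
    return modes[mode]


def substring(start, end):
    """Return a transformation that keeps characters start..end of a value."""
    return lambda value: value[start:end]


def compose(*functions):
    """Chain transformations so each one receives the previous one's output."""
    return lambda value: reduce(lambda acc, fn: fn(acc), functions, value)
```

For example, `compose(str.strip, substring(0, 3), change_case("upper"))` turns the raw value "  smith " into a three-letter uppercase key, which is the kind of derived field that simpler transformations can build when applied in sequence.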