Added the ability to classify clusters of more than two records that have been linked together, using a graphical interface that allows users to review the field values and associations of all the records involved in the cluster
Added the ability to submit two records for evaluation by the matching algorithm both through the UI and the REST API
Added the ability to link any two records together manually through the UI
Added support for Ground Truth Analysis via the REST API to allow the user to evaluate the accuracy of the algorithm based on a labeled dataset
Added support for the Single Best Record functionality that allows the user to retrieve the most representative record (golden record) from a cluster of records that have been linked together by the matching algorithm
Added the ability to customize the color coding used for displaying vector agreement/disagreement patterns and pairs of records
Updated many libraries to eliminate security vulnerabilities
Fixed an issue with the Reevaluate Probable Links functionality when the Remember Manual Classification Decisions option has been enabled
Fixed an issue with paging in the User Files page of the UI
Fixed an issue with the side-by-side display of two records in the UI when the value of a field in the left-hand record is null
Fixed an issue with Identifier Domain management using the latest version of Hibernate
Fixed an issue with the Reporting Module using the latest version of Hibernate
Improved the Advanced tab of the Probabilistic Matching algorithm to display the calculated weights for each matching field
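For background, in Fellegi-Sunter style probabilistic matching the displayed weights are typically derived from each field's m- and u-probabilities. A minimal sketch of that calculation (illustrative only; the function name and exact formula used by OpenEMPI are assumptions):

```python
import math

def field_weights(m: float, u: float) -> tuple[float, float]:
    """Return (agreement, disagreement) weights for a matching field,
    given its m-probability (chance the field agrees on a true match)
    and u-probability (chance it agrees on a non-match)."""
    agreement = math.log2(m / u)
    disagreement = math.log2((1 - m) / (1 - u))
    return agreement, disagreement

# A discriminating field agrees often on matches and rarely otherwise,
# yielding a large positive agreement weight and a large negative
# disagreement weight.
agree_w, disagree_w = field_weights(m=0.95, u=0.05)
```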
Added reports for viewing the matches from one specific domain or from across two domains to allow users to review matches from specific data sources
Added a REST API endpoint for retrieving the link between two specific records, if one exists
Added to the regular expression transformation function a parameter that specifies the value to return when the expression does not match in group mode
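The effect of such a fallback parameter can be sketched as follows (a hypothetical helper, not the actual transformation function):

```python
import re

def regex_transform(value: str, pattern: str, no_match_value: str = "") -> str:
    """Extract the first capture group from value; if the pattern does
    not match, return the configured fallback instead of failing."""
    match = re.search(pattern, value)
    if match is None:
        return no_match_value
    return match.group(1)

# Extract a 3-digit area code, falling back to "UNKNOWN" when absent.
regex_transform("(617) 555-0100", r"\((\d{3})\)", "UNKNOWN")  # "617"
regex_transform("555-0100", r"\((\d{3})\)", "UNKNOWN")        # "UNKNOWN"
```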
Added a REST API endpoint to allow the users to reassign global identifiers to a cluster of records
Developed a utility to delete orphaned identifier update entries that may arise
Added explicit deletion of identifier update entries when the parent event is deleted, so that the operation does not rely on foreign key constraints that may have been dropped
Modified the assignment of global identifiers to updated records to ensure that a record with memorized links that is not similar to records it is linked to does not get a new global identifier upon an update
Fixed the Custom Fields screen to disable the Add Custom Field button while a custom field setting is being saved, preventing users from creating duplicate entries on sites with many records
Added the default security policy file to the embedded graph database, since its absence generated warnings and could cause login issues under certain conditions
Fixed the UI to allow the user to invoke the operation to initialize the probabilistic model
Added an option to specific endpoints in the Record and RecordLink resources to allow the caller to request that custom fields also be included in the list of fields returned for a record
Added a scheduled task that periodically clears update notification events older than a certain date to help with the maintenance of these events
Added the record ID to the identifier update notification messages to make it easier to find the record that was affected by the global identifier change
Moved the configuration of the instance from the mpi-config.xml file to a database instance
Added support to the REST API for managing the administrative configuration parameters for the instance
Added support to the UI for changing administrative configuration parameters besides the algorithm settings
Added support for the Damerau Levenshtein similarity metric
Replaced the implementation of the Jaccard similarity metric with one that tokenizes strings in a more flexible way
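For reference, the two metrics above can be sketched as follows; these are simplified, illustrative implementations (the restricted Damerau-Levenshtein variant and a whitespace-tokenized Jaccard), not the library code OpenEMPI uses:

```python
def damerau_levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, substitutions,
    and transpositions of adjacent characters (restricted variant)."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]

def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity: |intersection| / |union| of word sets."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)
```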
Added a tab to the search screen for searching for records by their record ID values
Added to the UI the ability to search for specific vector patterns in the probabilistic algorithm using the agreement/disagreement pattern
Added to the UI the ability to specify how to classify vector patterns with one or more null scored fields
Added additional information about links associated to records in different parts of the UI to assist with diagnosing matching issues
Added a new report that summarizes probable vectors by frequency of occurrence
Changed the default behavior of the file loader to remove fields that are blank from the record that is imported
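The effect of that default can be sketched as follows (a hypothetical helper, not the actual loader code):

```python
def strip_blank_fields(record: dict) -> dict:
    """Drop fields whose value is None or an empty/whitespace-only
    string, mirroring the loader's default of removing blank fields."""
    return {
        k: v for k, v in record.items()
        if v is not None and (not isinstance(v, str) or v.strip() != "")
    }

strip_blank_fields({"first": "Ann", "middle": "", "last": "Lee", "suffix": None})
# keeps only "first" and "last"
```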
Modified the findOrAdd REST API endpoint to return a status of 201 when a record is created and 200 when existing record(s) are found
Added support for license validation
Added highlighting to matching fields in the field comparison pages by making matching fields bold
Fixed the UI to require that a file mapping field be provided for importing a file
Fixed an issue with adding and updating identifier domains through the UI
Added the ability to view vector patterns associated with record pairs presented through the UI
Added the ability to modify the link state of a record with other records in the system through the search interface
Added support for sorting the records returned by the find by attributes endpoint using a single record field value
Fixed the length of the identifier field in the search page of the UI to accommodate very long identifier values
Removed use of deprecated Hibernate APIs
Reduced the log level of the static content filter to reduce the data generated by default
Fixed the validation of the field threshold in the deterministic algorithm
Fixed the editing of matching fields for the probabilistic algorithm through the UI
Fixed the title of the screen that displays record links associated with search results on the UI
Fixed the selection of the identifier domain transformation function for a custom field
Increased the precision of m- and u-values for the probabilistic matching algorithm on the UI
Fixed a concurrency issue with adding a new identifier
Fixed the generation of XML formatted response through the REST API which was not working for a few endpoints
Fixed the generation of the default algorithm configuration when a new entity is added to base it on the existing entity
Upgraded to the latest edition of the graph database along the stable branch
Fixed an issue where when a background job throws an unchecked exception the status of the job is not updated
Fixed an issue with the static content filter not working properly when the application is running behind a reverse-proxy
Fixed an issue with retrieving record links associated with a record using a state of 'A'
Fixed an issue with failing to fully delete an entity that has generated reports associated with it
Fixed an issue with not being able to save a user profile through the UI when a user already exists with the same email address
Fixed the edit of a record through the search by identifiers UI path
Fixed the Potential Match Review Detail Report to show the correct start date
Fixed the Duplicate Summary Statistics report
Fixed an issue where when a background job throws an unchecked exception the status of the job is not updated due to transaction management
Improved the generation of a default blocking and matching algorithm configuration when a new entity is added through the REST API
Fixed an issue with the graph database upgrade where the synchronization of sequences failed during the import of a backup
Fixed an issue with the concatenation transformation function where a custom field parameter that uses a special character of ',' or ':' breaks the encoding mechanism that persists the parameters
Fixed an issue where the delete operation on the Logged Pairs resource failed due to the non-transactional nature of the operation under the latest edition of the graph database
Fixed the REST API of the Probabilistic Matching algorithm to ensure that the operation to initialize the classification model is working properly
Developed and released a new report to present frequency counts of block sizes for a blocking round
Added new background jobs to help improve the performance of large instances that have been in production for many years by allowing users to remove duplicate deleted identifiers, expired identifiers, and expired records
Modified the update operation to remove duplicate deleted identifiers on an update
Added a new REST endpoint to allow users to delete a field from the probabilistic matching configuration without having to update the entire configuration of the algorithm. This also handles the automatic adjustment of model parameters after the field deletion
Added a new REST endpoint to allow users to add a field to the probabilistic matching configuration without having to update the entire configuration of the algorithm. This also handles the automatic default initialization of model parameters for the new field
Added a new REST endpoint to support the background operation to reevaluate all probable links
Implemented performance improvements to the background job that reevaluates probable links
Improved the error messages associated with common error conditions generated through the REST API to make it easier for developers to identify the issue causing the request to fail
Added a REST API endpoint to allow users to remove all global identifiers that have been generated in the repository
Added the ability to retrieve a filtered list of update notifications by filtering by end event date, source event type, and transition type
Added a REST API endpoint for updating records that is more consistent with REST API standards
Standardized the formatting of creation dates for identifier domains
Enhanced the validation of custom field resources created through the REST API to ensure configuration parameters for certain transformation functions are correct
Fixed the update operation of a role to ensure that the permissions list is updated correctly through the REST API
Added an option to the HL7v3 PDQ Supplier to allow the user to collapse multiple records that have been linked together into a single subject with multiple identifiers
Made changes to the software installer so that it works on Windows machines
Fixed the conversion of dates to strings to avoid concurrency-related errors under very heavy workloads
Fixed the update user account operation to prevent an issue that was causing it to fail under certain conditions
Updated the PIXm and PDQm implementations to reflect the latest versions of the IHE specifications
Updated the PIX and PDQ HL7v3 implementation to bring them up to compliance with the latest specification
The PIXv3 query response does not include the assigningAuthorityName attribute even when one is present
The PDQv3 interface doesn't support the dateOfBirth attribute when the datatype for the date of birth is a string (new default in version 3.6.x)
Invalid audit message format according to the latest IHE/DICOM Specification
PIXv3 Query Result is not respecting the latest IHE Spec: the QueryAck element was missing the required statusCode element ("QueryAck SHALL have a statusCode element")
The ATNA audit messages are not consistent with the latest IHE specifications
Deleting an entity fails if it has entity groups and job queue entries associated with it
Validate a new entity to ensure that it includes the required fields and at least one attribute
The REST authentication filter was not sending an appropriate error message when the session ID is blank
In the probabilistic algorithm with a manual vector configuration and debug enabled, there is an exception
The update operation on the User Files resource was resetting the dateCreated field
The REST call to get user files should expose the name field of the File object
Importing a file through the Record resource should validate that either userFileId or filename is present in the request (either referencing a previously uploaded file or uploading one)
Manual matching override rules for null-scored patterns should take precedence over regular rules
Add support for creating and dropping indexes through the REST API
Replace deprecated Hibernate API with newer version
Add the entityVersionId attribute to the blocking configuration object in addition to the legacy entityName attribute.
Extend the CustomFields resource to support updates of Custom Fields rather than expect the user to delete the entry and recreate it.
Add the ability to initiate a long running export records of entity operation via the REST API
Updating an entity through the REST API should allow you to update attributes as well without having to use the entity attribute resource
Add a REST resource to start and stop the PIX/PDQ service
Need to add an operation on the Security resource that authenticates and returns the User resource associated with the authenticated user
Migrated the persistence layer for the graph database to use the latest version of OrientDB 3.0.x
Performed load testing of OpenEMPI with the new OrientDB 3.0 persistence layer; results will be posted on the OpenEMPI web site
Upgraded the implementation of the REST API to the latest edition of Jersey. This introduced some incompatibilities with the OpenEMPI REST API in 3.5.x but unfortunately this is unavoidable
Exposed the audit logs as a REST resource through the REST API
Added support to the audit service and persistence layer for auditing by entity type. Events are now always returned for a specific entity
Added the ability to specify that a scheduled task should perform its work against a specific, configurable entity
Exposed the Blocking Configuration through the REST API as a resource
Added default blocking and matching configurations when a new entity is created
Replaced the old cache library with a new one that is lighter
Added a REST API to allow a user to reevaluate a record's association to other records without the need for an update operation
Exposed the deterministic and probabilistic matching configurations as REST resources through the REST API
Exposed the User object as a REST Resource through the REST API
Enhanced the installation processes to include support for Apache HTTP installation
Exposed the Logged Links service as a REST resource through the REST API
Exposed the data profile service as a REST Resource through the REST API
Ensured that all long-running operations invoked through the REST API create jobs and run asynchronously
Exposed the Job Queue service as a REST resource through the REST API
Added support for manual classification rules of vector patterns that correspond to null scored patterns. This is now available through the mpi-config.xml file
Added to the service layer the ability to retrieve a record along with all its inactive (voided) identifiers
Upgraded the web services test suite to Jersey 2
Fixed an issue where, in certain cases, creating a custom field on a new instance caused an exception
Fixed an issue where a concurrent modification exception was generated when creating logged links during load testing
Fixed an issue where the blocking algorithm reported being unable to load records from blocks when those records had been deleted. Since the records had been marked as deleted, the blocking algorithm could not generate index entries for them
Fixed an issue with the blocking service not making use of the consumer queue wait time parameter
Replaced usage of Hibernate's legacy Criteria API, which has been deprecated
Upgraded older versions of library dependencies that had been flagged with security concerns
Added logging support to the REST APIs that can optionally be enabled on the server to aid in debugging issues with calls to the interface
Added support to the String Comparator Resource of the REST API for parameters to allow users to evaluate the similarity between strings for similarity metrics that require parameters to be passed down in the request.
Enhanced the Apache Artemis implementation of the notification service to support the propagation of identifier update notifications and use the latest stable version of the messaging service
Improved the performance of the process that generates candidate record pairs to be evaluated by the matching algorithm to reduce the number of record pairs that are generated. This modification can cause a considerable performance improvement to the processing of add and update requests
Modified the Record Link resource to return the internal record id for each link to make it easier and more efficient to retrieve detailed information about each link.
Fixed the request for logged links to filter links returned by the entity specified in the request.
Fixed a bug in the processing of the findByMatching request where certain characters in the key-value pairs passed as parameters were causing an exception.
Fixed a bug with the asynchronous persistence of logged links to resolve a concurrent modification exception that would arise.
Enhanced the performance of the File Import service so that loading millions of records no longer requires a proportional amount of heap memory.
Enhanced the processing of background jobs so that only one job is processed at a time and jobs are processed in the order in which they are created. This will prevent cases where multiple data-intensive jobs were being processed concurrently and bringing the server to its knees.
Added a new REST Resource to expose the string comparator service. This allows users to test various similarity metrics and thresholds so that they can identify the ideal parameters for their instance.
Added a new REST interface to support loading data from a file with matching activated but without requiring that the records be imported
Fixed the generation of blocks to skip generating blocks for a blockingKeyValue that is all null.
Disabled the "Reevaluate" task on the Matching page since it was causing confusion among some users
Fixed the import schema utility, which was not properly handling the synchronous vs. asynchronous parameter in the serialized schema being imported
Improved the handling of background jobs so that if the server stops before a queued job is done, any job left in the "Processing" state is rescheduled upon startup
String Comparison REST Resource should return No Data found HTTP code when there are no parameters for a given metric instead of an empty list
Fixed a race condition where uploading a file while the REST interface was actively in use by a user other than the one logged into the UI would occasionally set the owner of the file to the API user
Increased the default maximum connection pool size for the relational database so that for most deployments the users don't have to manually fine tune it
Added an attribute to the file loader mapping file that allows the user to specify the number of columns in the file in the Flexible File Loader. This is useful when the data file has lots of optional fields especially in the last columns of the file
Upgraded the release to embed Apache Tomcat 8.5.x
Records with invalid date values in a database field of date or timestamp type were being rejected by OrientDB, causing the import to fail; the file loader now detects and clears such values so the records can be imported into the database
Added support for user authentication for OpenEMPI through LDAP instead of the default mechanism. You can read more about this feature and how to configure it here.
Added support for synonyms in matching, which are lists of two or more words that should be considered identical by the matching algorithm (e.g., Robert and Bob). You can read more about this feature and how to configure it here.
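Conceptually, synonym support treats values that appear in the same synonym list as equal. A minimal sketch (illustrative only; the synonym lists and lookup below are assumptions, not OpenEMPI's configuration format):

```python
# Hypothetical synonym lists; in practice these would be configured.
SYNONYMS = [
    {"robert", "bob", "rob"},
    {"william", "bill", "will"},
]

def same_name(a: str, b: str) -> bool:
    """True if the two values are identical or appear in the same synonym list."""
    a, b = a.lower(), b.lower()
    if a == b:
        return True
    return any(a in group and b in group for group in SYNONYMS)

same_name("Robert", "Bob")   # True
same_name("Robert", "Bill")  # False
```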
Modify the importFile RESTful web service endpoint to expose more file import features and to make it asynchronous
Modify the persistence of logged links during classification by the probabilistic matching algorithm so that they commit for each batch instead of for the entire operation, making the operation more efficient and scalable for sites with tens of millions of records
Added support for communication with the LDAP server using StartTLS
Added support for a new feature in the configuration of the probabilistic matching algorithm that memorizes manually classified probable links so that such record pairs don't return to the review queue in the future. You can read more about this feature here.
Add RESTful web service resource to manage synonyms
Fixed a bug where overriding the probabilistic matching algorithm for a specific vector pattern so that such record pairs do not match was causing record pairs with that pattern to generate an exception
Add a new feature to the matching layer that allows the system to match two records when the values of two fields have been transposed. For example, it is fairly common for the first and last name values to be transposed on a given record, making it difficult to match the two records together
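The transposition check can be sketched as comparing the two fields both in the given order and swapped (an illustrative sketch assuming exact equality rather than the configured similarity metrics):

```python
def fields_match_with_transposition(rec_a: dict, rec_b: dict,
                                    f1: str, f2: str) -> bool:
    """True if the two fields agree directly, or agree once the
    values of f1 and f2 are swapped on one of the records."""
    direct = rec_a[f1] == rec_b[f1] and rec_a[f2] == rec_b[f2]
    swapped = rec_a[f1] == rec_b[f2] and rec_a[f2] == rec_b[f1]
    return direct or swapped

a = {"first": "Garcia", "last": "Maria"}  # first/last accidentally swapped
b = {"first": "Maria", "last": "Garcia"}
fields_match_with_transposition(a, b, "first", "last")  # True
```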
Add a configurable integration framework to OpenEMPI that allows data to flow into and out of OpenEMPI using a wide variety of data sources and sinks. For example, you can set up an instance of OpenEMPI to periodically load data from a database by issuing a query to retrieve the records
Added a new similarity metric that calculates the similarity between two numeric values such that numeric values that are closer together get a higher value
Added a new similarity metric that calculates the similarity between two dates such that dates that are closer together get a higher value
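Both metrics follow the same idea: map the absolute difference into [0, 1] so that closer values score higher. A sketch assuming a linear decay with a configurable maximum difference (the actual decay function used is an assumption):

```python
from datetime import date

def numeric_similarity(a: float, b: float, max_diff: float) -> float:
    """1.0 for identical values, decreasing linearly to 0.0 at max_diff."""
    return max(0.0, 1.0 - abs(a - b) / max_diff)

def date_similarity(a: date, b: date, max_days: int) -> float:
    """The same decay applied to the difference between two dates, in days."""
    return numeric_similarity(0, abs((a - b).days), max_days)

numeric_similarity(100, 110, max_diff=50)  # 0.8
date_similarity(date(2020, 1, 1), date(2020, 1, 11), max_days=30)  # ~0.667
```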
Modified the administrative application to immediately incorporate changes to custom fields so that there is no need to restart the service in order to proceed with subsequent configuration steps
Added a new web services endpoint that returns records that are similar to the record presented by the caller along with a weight indicating the relative similarity between the two records
Added the ability for a site to include a site-specific disclaimer message in the web application before the user is permitted to login
Fixed an issue where the flexible file loader would load records without setting a date for the date created field
Fixed the export process so that it is able to export records with identifiers that have no date created value
Fixed the review links page to allow for sorting by weight or date created to assist the users in locating the specific record pairs to be resolved first
Fixed the logging of the probabilistic algorithm for record links during the evaluation process to reduce the log level
Added full support for 2-way TLS encryption to the HL7 v2 service interface
Upgraded the embedded graph database OrientDB to the latest stable release
Developed better isolation between the notification service and the specific implementation to make it easier to support other messaging brokers in the future
Developed better integration between the PIX/PDQ service and the rest of the application to reduce the number of configuration files
Utilized a new interface in the embedded OrientDB database for smoothly shutting down the database
Made some of the operations that process record links configurable to better support sites with tens of millions of links
Added support for both JSON and XML messages to the person-based REST API that didn't previously support JSON
Fixed the blocking service to load the latest configuration without requiring a server restart after configuration changes
Fixed a few minor issues with the single best record module
Added support for transitive closure of record pairs to resolve the issue with conflicting links being created with complex matching rule configurations
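Transitive closure over pairwise links can be sketched with a union-find structure that places every record reachable through a chain of links into one cluster (illustrative, not the production implementation):

```python
def cluster_links(pairs: list[tuple[str, str]]) -> list[set[str]]:
    """Group record ids into clusters under the transitive closure
    of the pairwise links, using union-find."""
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        parent[find(a)] = find(b)

    for a, b in pairs:
        union(a, b)
    clusters: dict[str, set[str]] = {}
    for x in parent:
        clusters.setdefault(find(x), set()).add(x)
    return list(clusters.values())

cluster_links([("r1", "r2"), ("r2", "r3"), ("r4", "r5")])
# yields the clusters {r1, r2, r3} and {r4, r5}
```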
Added support for auditing all events (including events to view a record) for HIPAA compliance
Added an interface to the Entity REST API to support paging through all the records in an instance
Added new transformation functions that can be used in the generation of custom fields
Added support for SQLServer as the relational database supporting OpenEMPI
Added support for MySQL as the relational database supporting OpenEMPI
Added the report artifacts as part of the distribution of the commercial edition of OpenEMPI
Improved the support for remote connections to the graph database when used in place of the default embedded mode
Fixed the support for asynchronous matching of records
Added a new global identifier generator module to support the requirements of a customer
Improved the vector configuration screen of the probabilistic matching algorithm by showing available sample record pairs per vector
Improved the performance of bulk import of data by eliminating the generation of notification events
Upgraded and improved the integration with the messaging system, switching to ActiveMQ Artemis as the default JMS server
Performed technical refresh of underlying software such as the Spring framework, Hibernate, etc.
A user account can now be associated with a specified domain, which enables filtering of the review workload by the associated domain
Added support for fault-tolerance during operation of an instance of OpenEMPI through replication
Added support in the commercial edition of OpenEMPI for collecting metrics about the operation and performance of the system
Fixed an issue where the generation of blocking key values produced huge blocks for single-field blocking configurations when records have blank values in the blocking field
Fixed the substring transformation function so that it does not fail if the bounds specified fall outside the range of the field value affected
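The fix amounts to clamping the requested bounds to the length of the value; sketched below (hypothetical helper, not the actual transformation function):

```python
def safe_substring(value: str, start: int, end: int) -> str:
    """Substring that clamps out-of-range bounds instead of failing."""
    start = max(0, min(start, len(value)))
    end = max(start, min(end, len(value)))
    return value[start:end]

safe_substring("Smith", 0, 3)  # "Smi"
safe_substring("Ng", 0, 3)     # "Ng" -- bounds beyond the value are clamped
```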
Fixed a bug where an invalid field parameter used in the Entity REST API could cause a NullPointerException
Fixed a bug where extending the entity schema of the default person entity would cause the Person REST API to fail on certain queries
Fixed a bug where the export function of records from the system would be affected by the caching configuration of the export module
Fixed a bug where the bulk import of records from another instance would cause the sequence generator to get out of sync
Fixed a bug where generating a record link of match type wouldn't be persisted if a record link of probable link type already existed
Upgraded to a more recent version of the underlying graph database. This upgrade provides a number of performance improvements but made it necessary to modify how records from OpenEMPI are persisted at a low level. This change to the persistence of record data requires that a user migrates their data from the 3.2.0 release to the 3.3.0 release using the export/import tools that were developed.
Upgraded to Apache Tomcat version 8.x from 7.x as the standard web application server for deploying OpenEMPI
Developed a tool for exporting data from OpenEMPI and another one for importing data. This pair of tools performs the transformation needed for upgrading from the 3.2.0 release (or earlier) to the 3.3.0 release. The tools are available with the commercial edition of OpenEMPI.
Extended the Web Services API to allow users to perform the workflow of importing data all the way through generating all links solely through web services calls. This feature allows customers that automate the process of linking data on a regular basis to perform the whole process programmatically without any manual intervention.
Began the process of migrating the implementation of distance metrics to a different library that is more up-to-date and continues to be maintained. The process will be completed in the next release of OpenEMPI.
[OPENEMPI-185] - Unlinking a person which has a voided person link causes an exception
[OPENEMPI-279] - Merge operation does not generate update notifications
[OPENEMPI-293] - Metrics generated by the data profiler seem to be incorrect in some cases.
[OPENEMPI-301] - The find duplicate feature from the record update screen is not working
[OPENEMPI-319] - Sessions expiring are causing the PIX/PDQ to be restarted
[OPENEMPI-320] - Adding a record with an email address fails due to a validation error that is not caught and breaks the UI
[OPENEMPI-321] - Sequence use is not working on an existing database
[OPENEMPI-323] - Issue with handling of massive import declaration
[OPENEMPI-325] - Unlinking records through the user interface doesn't properly update their global identifiers
[OPENEMPI-328] - Update before global identifiers have been assigned causes NPE
Added Reporting Capabilities to the 3.2.0 release of the entity edition (Commercial Edition only). The users may now generate reports on the operation of the system. The reporting functionality was developed in an extensible manner so that new report types can be added over time. The current list of reports includes:
Data Profile Summary
Duplicate Summary Statistics
Potential Match Review Summary
Potential Match Review Detail
Added the ability for the user to easily remove the global identifiers assigned to all the records in the database. This feature is useful when first setting up an instance of OpenEMPI.
Enhanced the probabilistic matching algorithm with a new feature we call null scoring which improves the matching performance of the algorithm in the presence of null values in matching fields.
Added the ability for the user to run the data profiling process against all records in the repository on demand, instead of having to run it as a scheduled background process
Enhanced the performance of long-running operations for sites with millions of records. Operations such as assigning global identifiers, rebuilding the indexes of the blocking algorithms, and running the matching algorithm against all record pairs did not take advantage of the multiple processing nodes available in a clustered deployment of OpenEMPI (Commercial edition only). Modified the implementation of these operations to take advantage of all the nodes available in the cluster.
Added a new transformation function for custom field generation that changes the case of the associated field to have a certain case. A parameter of the transformation function specifies the case of the transformed field.
Added sequencing of transformation functions to the custom field generation process. This allows the user to define a custom field that is generated based on another custom field, which means that transformation functions can now be composed to form much more complex functions from simpler ones.
Added the ability for the user to export all the records and links from the system. The export format is binary so that corruption of the file can be detected during subsequent loading, and so that the file can easily be imported for further processing in a cloud-based big data environment such as Hadoop.
Improved the process of assigning global identifiers to all records of an instance, which could take a long time on an instance with millions of records.
Modified the data profiler process, when running against a file, to allow specifying the field delimiter along with the data types of the columns of the file, instead of using the fixed colon ':' delimiter.
Added the ability for the user to invoke the re-indexing of the blocking fields at the level of a specific entity, instead of having to run the process against all entities currently defined on an instance of OpenEMPI.
Issue OPENEMPI-297: Under certain conditions, when trying to unlink two records from each other in the search result screen, the two records to be unlinked were not showing up side by side.
Issue OPENEMPI-298: In the search screen, after searching for a record and selecting to view the list of records linked to a selected record, the selected record itself was showing up in the list.