Applied a change to enforce the access rule that a user without the USERS_VIEW permission should not be able to view their own profile
Fixed a bug where disabling a user account through the UI would not actually disable the account
Made fixes to the PIX Feed v3 interface to ensure that the validation of identifiers doesn’t fail when identifiers are specified using the PIX/PDQv3 convention
Fixed the deployment configuration of the PIX/PDQ v3 interface so that it works properly when the interface is enabled
Upgraded the reference data lookup code to ensure the Hibernate layer works properly with the version of Hibernate included in this release
Upgraded the version of Apache CXF used to eliminate any potential security vulnerabilities
Fixed an issue with the update of global identifiers assigned to records where, when an update changed the association between two clusters of records that each also had manual links from an external source, the global identifiers were not updated correctly
Applied a validation check to prevent the search capability from allowing users to query for all records using the wildcard character
Fixed a bug where certain data issues would cause the process that regenerates all links to stop before visiting every record
Added support to the UI for replacing one matching field with another without having to update the probabilistic model in the probabilistic matching algorithm
Extended the concatenation function to allow for the concatenation of more than two fields
Added a new configuration setting that allows you to enforce a policy where manual links stored as part of the Remember Manual Classifications feature override the matching algorithm's classifications when linking records together
Improved the performance of the file load operation by caching identifier domain information
Fixed an issue where the modal dialog that displays the evaluation of the association between two records in the system did not correctly color-code vector patterns that correspond to null-scored vector patterns
Improved the performance of the blocking re-indexing operation by implementing finer-grained locking of resources
Fixed a bug where the information dialog that pops up when a record pair is classified manually through the review record pairs page could not be dismissed
Added the ability to classify clusters of more than two linked records using a graphical interface that allows users to review the field values and associations of all the records in the cluster
Added the ability to submit two records for evaluation by the matching algorithm both through the UI and the REST API
Added the ability to link any two records together manually through the UI
Added support for Ground Truth Analysis via the REST API to allow the user to evaluate the accuracy of the algorithm based on a labeled dataset
Added support for the Single Best Record functionality that allows the user to retrieve the most representative record (golden record) from a cluster of records that have been linked together by the matching algorithm
Added the ability to customize the color coding used for displaying vector agreement/disagreement patterns and pairs of records
Updated many libraries to eliminate security vulnerabilities
Fixed an issue with the Reevaluate Probable Links functionality when the Remember Manual Classifications feature has been enabled
Fixed an issue with paging in the User Files page of the UI
Fixed an issue in the side-by-side display of two records in the UI when the value of a field in the left-hand record is null
Fixed an issue with Identifier Domain management when using the latest version of Hibernate
Fixed an issue with the Reporting Module when using the latest version of Hibernate
Modified the log4j-1.2 jar file distributed with the release to remove classes that, although unused, could potentially be exploited as a security vulnerability
Fixed an issue in saving the vector values associated with logging by vector in the configuration of the probabilistic matching algorithm that affected both the UI and the server
Changed the default sorting of attribute data in the display of a data profile to sort by the name of an attribute in ascending order
Improved the Advanced tab of the Probabilistic Matching algorithm to display the calculated weights for each matching field
Added reports for viewing the matches from one specific domain or from across two domains to allow users to review matches from specific data sources
Added a REST API endpoint for retrieving the link between two specific records, if one exists
Added to the regular expression transformation function a parameter that specifies the value to return when the expression does not match in group mode
Added a REST API endpoint to allow users to reassign global identifiers to a cluster of records
Developed a utility to delete orphaned identifier update entries that may arise
Added explicit deletion of identifier update entries when the parent event is deleted so that the system doesn't rely on foreign key constraints that may have been deleted
Modified the assignment of global identifiers to updated records to ensure that a record with memorized links that is not similar to records it is linked to does not get a new global identifier upon an update
Fixed the Custom Fields screen to disable the add custom field button while saving the custom field setting to prevent the user from entering duplicate entries on sites with many records
Added the default security policy file to the embedded graph database, since the database reports an error when it cannot find the file and this may cause login issues under certain conditions
Fixed the UI to allow the user to invoke the operation to initialize the probabilistic model
Fixed an issue where on large clusters of records with memorized match links and probable links, subsequent updates can cause the system to assign the wrong global identifier
Fixed an issue with restoring the configuration database from a JSON backup from another site where an exception occurred due to conflicting IDs
Fixed an issue where a Hibernate exception was presented to the user when trying to add a record with an identifier that is not known to the system and where the identifier is invalid
Changed the access control rules on the UI to allow users with only the permission to review links to be able to view and resolve links without requiring the permission to change the matching configuration
Fixed an issue where, when browsing through records on the results screen, the UI would display the links from the previous record as a result of a race condition
Enforced the requirement that the name of a new entity starts with a letter and not a digit, since the underlying database does not permit a digit as the first character
Fixed an issue with the UI to prevent users without the appropriate permissions from accessing the Operations and Settings pages
Fixed an issue with the enforcement of the permissions granted to a user via the roles assigned to the account in requests submitted through the REST API
Added an introductory section to the REST API with information on the authentication model along with some examples
Fixed an issue with the UI where the vector pattern displayed in association with null-scored vector patterns was not correct
Fixed the probabilistic matching algorithm to correctly label links created as a result of a null-scored classification rule instead of labeling them in association with the baseline rule
Documented the record cleaning service that was added to the REST API in 4.1.0
Fixed the response of the REST API search for clusters to properly serialize an empty list
Fixed an issue where updating an entity through the UI would mark the custom fields for deletion
Modified the configuration of the graph database server during installation/upgrade to be done automatically instead of manually
Added an option to specific endpoints in the Record and RecordLink resources to allow the caller to request that custom fields also be included in the list of fields returned for a record
Added a scheduled task that periodically goes through and clears update notification events that are older than a certain date to help with the maintenance of these events
Added the record ID to the identifier update notification messages to make it easier to find the record that was affected by the global identifier change
Moved the configuration of the instance from the mpi-config.xml file to the database
Added support to the REST API for managing the administrative configuration parameters for the instance
Added support to the UI for changing administrative configuration parameters besides the algorithm settings
Added support for the Damerau-Levenshtein similarity metric
Replaced the implementation of the Jaccard similarity metric with one that tokenizes strings in a more flexible way
Added a tab to the search screen for searching for records by their record ID values
Added to the UI the ability to search for specific vector patterns in the probabilistic algorithm using the agreement/disagreement pattern
Added to the UI the ability to specify how to classify vector patterns with one or more null scored fields
Added additional information about links associated with records in different parts of the UI to assist with diagnosing matching issues
Added a new report that summarizes probable vectors by frequency of occurrence
Changed the default behavior of the file loader to remove blank fields from the records being imported
Modified the findOrAdd REST API endpoint to return a status of 201 when a record is created and 200 when existing record(s) are found
Added support for license validation
Added highlighting to matching fields in the field comparison pages by displaying matching fields in bold
Fixed the UI to require that a file mapping field must be provided for importing a file
Fixed an issue with adding and updating identifier domains through the UI
Fixed an issue where adding new attributes to an entity through the UI would fail under certain conditions
Fixed a cosmetic issue in the logging configuration tab of the probabilistic matching algorithm where the fraction of logged pairs would not be displayed properly
Fixed an issue with the assignment of global identifiers in large clusters of records where multiple match links must be removed to separate a subset of the records from the rest of the records in the cluster
Fixed the assignment of source to a classification request when invoked through the UI for probable links
Fixed an issue where an error was generated if a record link was deleted through the REST API after the user had opened the detail view of a record but before they viewed the link
Fixed an issue with handling the remembering of probable link classifications through the UI
Added validation to the file import functionality to require that a file mapping file is provided in the form before the request is processed
Fixed the matching algorithm to properly select manual overrides of matching rules with multiple null-scored fields
Added validation to the creation of an entity to ensure that the dash character is not used in the name since the underlying database cannot handle that character
Fixed the deletion of an entity that has reports and other objects associated with it
Added an option to the REST endpoint that returns links associated with a record to restrict them to only those that are direct links
Modified the UI to display only direct links in association with a record in the search results, since indirect links were causing confusion
Added the ability to view vector patterns associated with record pairs presented through the UI
Added the ability to modify the link state of a record with other records in the system through the search interface
Added support for sorting the records returned by the find by attributes endpoint using a single record field value
Fixed the length of the identifier field in the search page of the UI to accommodate very long identifier values
Removed use of deprecated Hibernate APIs
Reduced the log level of the static content filter to reduce the data generated by default
Fixed the validation of the field threshold in the deterministic algorithm
Fixed the editing of matching fields for the probabilistic algorithm through the UI
Fixed the title of the screen that displays record links associated with search results on the UI
Fixed the selection of the identifier domain transformation function for a custom field
Increased the precision of m- and u- values for the probabilistic matching algorithm on the UI
Fixed a concurrency issue with adding a new identifier
Fixed the generation of XML-formatted responses through the REST API, which was not working for a few endpoints
Fixed the generation of the default algorithm configuration when a new entity is added so that it is based on the existing entity
Upgraded to the latest edition of the graph database along the stable branch
Fixed an issue where, when a background job threw an unchecked exception, the status of the job was not updated
Fixed an issue with the static content filter not working properly when the application is running behind a reverse-proxy
Fixed an issue with retrieving record links associated with a record using a state of 'A'
Fixed an issue with failing to fully delete an entity that has generated reports associated with it
Fixed an issue with not being able to save a user profile through the UI when another user already exists with the same email address
Fixed editing a record through the search by identifiers UI path
Fixed the Potential Match Review Detail Report to show correct start date
Fixed the statistics in the Duplicate Summary report
Fixed an issue where, when a background job threw an unchecked exception, the status of the job was not updated due to transaction management
Improved the generation of a default blocking and matching algorithm configuration when a new entity is added through the REST API
Fixed an issue with the graph database upgrade where the synchronization of sequences failed during a backup import operation
Fixed an issue with the concatenation transformation function where a custom field parameter containing the special character ',' or ':' broke the encoding mechanism that persists the parameters
Fixed an issue where the delete operation of the Logged Pairs resource failed due to the non-transactional nature of the operation under the latest edition of the graph database
Fixed the REST API of the Probabilistic Matching algorithm to ensure that the operation to initialize the classification model is working properly
Developed and released a new report to present frequency counts of block sizes for a blocking round
Added new background jobs to help improve the performance of large instances that have been in production for many years by allowing users to remove duplicate deleted identifiers, expired identifiers and expired records
Modified the update operation to remove duplicate deleted identifiers on an update
Added a new REST endpoint to allow users to delete a field from the probabilistic matching configuration without having to update the entire configuration of the algorithm. This also handles the automatic adjustment of model parameters after the field deletion
Added a new REST endpoint to allow users to add a field to the probabilistic matching configuration without having to update the entire configuration of the algorithm. This also handles the automatic default initialization of model parameters for the new field
Added a new REST endpoint to support the background operation to reevaluate all probable links
Implemented performance improvements to the background job that reevaluates probable links
Improved the error messages associated with common error conditions generated through the REST API to make it easier for developers to identify the issue causing the request to fail
Added a REST API endpoint to allow users to remove all global identifiers that have been generated in the repository
Added the ability to retrieve a filtered list of update notifications by filtering by end event date, source event type, and transition type
Added a REST API endpoint for updating records that is more consistent with REST API standards
Standardized the formatting of creation dates for identifier domains
Enhanced the validation of custom field resources created through the REST API to ensure configuration parameters for certain transformation functions are correct
Fixed the update operation of a role to ensure that the permissions list is updated correctly through the REST API
Added an option to the HL7v3 PDQ Supplier to allow the user to collapse multiple records that have been linked together into a single subject with multiple identifiers
Made changes to the software installer so that it works on Windows machines
Fixed the conversion between dates and strings to avoid concurrency-related errors under very heavy workloads
Fixed the update user account operation to prevent an issue that was causing it to fail under certain conditions
Modified the search for records by identifier so that the service tries to incorporate the identifier domain in the query request even if only part of the domain name is specified in the request
Added support to the REST API for filtering the notification events returned by ending date, event source, and event transition
Added support to the REST API for filtering the probable links (or record links in general) returned by the value of one of the record fields. If this filtering criterion is used, then only record pairs where both records have the specified value in the specified field are returned.
Various performance improvements
The findByIdentifiers interface was not always returning the specified number of records through the paging parameters in the case where the query results included identifiers marked for deletion. The interface was fixed to always return the correct number of records based on the paging parameters.
Added support to the probabilistic algorithm to allow manually specified matching rules for null-scored patterns to take precedence over the rule for the same vector pattern without null scoring applied
Added a new REST interface to allow the caller to request that the matching algorithm reevaluate a record's association with other records without requiring the caller to update the record
Added support to the probabilistic algorithm for specifying manual matching rules for null-scored vector patterns
Added some minor improvements to log file management in the embedded instance of the Tomcat server
Updated the PIXm and PDQm implementations to reflect the latest versions of the IHE specifications
Updated the PIX and PDQ HL7v3 implementations to bring them into compliance with the latest specification
Fixed an issue where the PIXv3 query response did not include the assigningAuthorityName attribute when present
Fixed an issue where the PDQv3 interface did not support the dateOfBirth attribute when the data type for the date of birth is a string (the new default in version 3.6.x)
Fixed the audit message format, which was invalid according to the latest IHE/DICOM specification
Fixed the PIXv3 query result to comply with the latest IHE specification, which states that the QueryAck SHALL have a statusCode element
Fixed the ATNA audit messages to be consistent with the latest IHE specifications
Fixed an issue where deleting an entity failed if it already had entity groups and job queue entries associated with it
Added validation of a new entity to ensure that it includes the required fields and at least one attribute
Fixed the REST authentication filter so that it sends an appropriate error message when the session ID is blank
Fixed an exception that occurred in the probabilistic algorithm when using a manual vector configuration with debug enabled
Fixed an issue where the update operation on the User Files resource reset the dateCreated field
Modified the REST call that gets user files to expose the name field of the File object
Added validation so that importing a file through the Record resource requires either userFileId or filename in the request (either referencing a previously uploaded file or uploading a new one)
Modified manual matching override rules for null-scored patterns to take precedence over regular rules
Added support for creating and dropping indexes through the REST API
Replaced deprecated Hibernate APIs with their newer equivalents
Added the entityVersionId attribute to the blocking configuration object in addition to the legacy entityName attribute.
Extended the CustomFields resource to support updates of Custom Fields rather than requiring the user to delete the entry and recreate it.
Added the ability to initiate a long-running operation that exports the records of an entity via the REST API
Modified the update of an entity through the REST API to allow attributes to be updated as well without having to use the entity attribute resource
Added a REST resource to start and stop the PIX/PDQ service
Added an operation to the Security resource that authenticates a user and returns the User resource associated with the authenticated user
Migrated the persistence layer for the graph database to use the latest version of OrientDB 3.0.x
Performed load testing of OpenEMPI with the new OrientDB 3.0 persistence layer; results will be posted on the OpenEMPI web site
Upgraded the implementation of the REST API to the latest edition of Jersey. This introduced some incompatibilities with the OpenEMPI REST API in 3.5.x but unfortunately this is unavoidable
Exposed the audit logs as a REST resource through the REST API
Added to the audit service and persistence layer support for auditing by entity type. Events are now always returned for a specific entity
Added the ability to specify that a scheduled task should perform its work against a specific, configurable entity
Exposed the Blocking Configuration through the REST API as a resource
Added default blocking and matching configurations when a new entity is created
Replaced the old cache library with a new one that is lighter
Added a REST API to allow a user to reevaluate a record's association to other records without the need for an update operation
Exposed the deterministic and probabilistic matching configurations as REST resources through the REST API
Exposed the User object as a REST Resource through the REST API
Enhanced the installation processes to include support for Apache HTTP installation
Exposed the Logged Links service as a REST resource through the REST API
Exposed the data profile service as a REST Resource through the REST API
Ensured that all long-running operations invoked through the REST API create jobs and run asynchronously
Exposed the Job Queue service as a REST resource through the REST API
Added support for manual classification rules for vector patterns that correspond to null-scored patterns. This is now available through the mpi-config.xml file
Added to the service layer the ability to retrieve a record along with all its inactive (voided) identifiers
Upgraded the web services test suite to Jersey 2
Fixed an issue where, in certain cases, creating a custom field on a new instance caused an exception
Fixed an issue where a concurrent modification exception was generated when creating logged links during load testing
Fixed an issue where the blocking algorithm was reporting that it could not load records that had been deleted from blocks. Since the records had been marked as deleted, the blocking algorithm was not able to generate index entries for them
Fixed an issue with the blocking service not making use of the consumer queue wait time parameter
Replaced the use of Hibernate's legacy Criteria API, which has been deprecated
Upgraded older versions of library dependencies that had been flagged with security concerns
Fixed an issue where under certain conditions when a probable link is updated to a match state, an update notification would not be generated
Added support to the REST API for specifying which type of record links should be returned in association with a specific record (match, probable, or both)
Added support to the REST API for specifying which type of record links should be returned from the record link resource (match, probable, or both)
Applied a number of changes to improve the performance of the REST API
Fixed an issue where, in certain cases, uploading an entity definition did not create a user file entry of the entity type, so the file entry did not show up as an entity definition available to be imported
Fixed an issue with the PDQ (HL7v2 binding) interface not returning the dateOfBirth field after the default data type for the date of birth field was changed from a date to a string
Fixed an issue with the PDQ (HL7v2 binding) interface not returning the phone number field in the PID segment when a phone type field has not been populated
Fixed an issue with the REST API where deleting a record by id was not working under a more recent version of the graph database
Added support to the service layer for retrieving a record along with all its inactive (voided) identifiers
Improved the performance of the REST API for retrieving a specific link by its two endpoints by making it possible for the optimizer to use existing indexes
Added support for logging to the REST APIs so that it can optionally be turned on at the server to enable debugging of issues with calls to the interface
Added support for parameters to the String Comparator Resource of the REST API, allowing users to evaluate the similarity between strings using similarity metrics that require parameters to be passed in the request.
Enhanced the Apache Artemis implementation of the notification service to support the propagation of identifier update notifications and use the latest stable version of the messaging service
Improved the performance of the process that generates candidate record pairs to be evaluated by the matching algorithm by reducing the number of record pairs that are generated. This modification can considerably improve the performance of processing add and update requests
Modified the Record Link resource to return the internal record id for each link to make it easier and more efficient to retrieve detailed information about each link.
Fixed the request for logged links to filter links returned by the entity specified in the request.
Fixed a bug in the processing of the findByMatching request where certain characters in the key-value pairs passed as parameters were causing an exception.
Fixed a bug with the asynchronous persistence of logged links to resolve a concurrent modification exception that would arise.
Enhanced the performance of the File Import service so that loading millions of records no longer requires a proportional amount of heap memory.
Enhanced the processing of background jobs so that only one job is processed at a time and jobs are processed in the order in which they are created. This will prevent cases where multiple data-intensive jobs were being processed concurrently and bringing the server to its knees.
Added a new REST Resource to expose the string comparator service. This allows users to test various similarity metrics and thresholds so that they can identify the ideal parameters for their instance.
Added a new REST interface to support loading data from a file without requiring that the records are imported but with matching activated
Fixed the generation of blocks to skip generating blocks for a blockingKeyValue that is all null.
Disabled the "Reevaluate" task on the Matching page since it was causing confusion among some users
Fixed the import schema utility to properly handle the synchronous vs. asynchronous parameter in the serialized schema being imported
Improved the handling of background jobs so that if the server stops before a job in the queue is done, any job that is left in the "Processing" state will be rescheduled upon startup
Modified the String Comparison REST resource to return a No Data Found HTTP code instead of an empty list when there are no parameters for a given metric
Fixed a race condition where uploading a file while the REST interface was actively in use by a user other than the one currently logged into the UI would occasionally set the owner of the file to the API user
Increased the default maximum connection pool size for the relational database so that for most deployments the users don't have to manually fine tune it
Added an attribute to the Flexible File Loader mapping file that allows the user to specify the number of columns in the data file. This is useful when the data file has many optional fields, especially in the last columns of the file
Upgraded the release to embed Apache Tomcat 8.5.x
Saving records with invalid date values into a database field of date or timestamp type was being rejected by OrientDB causing the record to fail to be imported; the file loader will now detect and clear such values to allow the records to be imported into the database
Added support for user authentication for OpenEMPI through LDAP instead of the default mechanism.
Added support for synonyms in matching, which are lists of two or more words that the matching algorithm should consider identical (for example, Robert and Bob).
Modified the importFile RESTful web service endpoint to expose more file import features and to make it asynchronous
Modified the persistence of logged links during classification by the probabilistic matching algorithm so that they are committed for each batch instead of for the entire operation, making the operation more efficient and scalable for sites with tens of millions of records
Added support for communication with the LDAP server using StartTLS
Added support for a new feature in the configuration of the probabilistic matching algorithm that memorizes manually classified probable links so that such record pairs don't return to the review queue in the future.
Added a RESTful web service resource to manage synonyms
Fixed a bug where overriding the probabilistic matching algorithm so that record pairs with a specific vector pattern do not match caused record pairs with that pattern to generate an exception
Added a new feature to the matching layer that allows the system to match two records in the case where the values of two fields have been transposed. For example, it is fairly common for the values of the first and last name to be transposed in a given record, making it difficult to match the two records together
Added a configurable integration framework to OpenEMPI that allows data to flow into and out of OpenEMPI using a wide variety of data sources and sinks. For example, you can set up an instance of OpenEMPI to periodically load data from a database by issuing a query to retrieve the records
Added a new similarity metric that calculates the similarity between two numeric values such that numeric values that are closer together get a higher value
Added a new similarity metric that calculates the similarity between two dates such that dates that are closer together get a higher value
Modified the administrative application to immediately incorporate changes to custom fields so that there is no need to restart the service in order to proceed with subsequent configuration steps
Added a new web services endpoint that returns records that are similar to the record presented by the caller along with a weight indicating the relative similarity between the two records
Added the ability for a site to include a site-specific disclaimer message in the web application before the user is permitted to login
Fixed an issue where the flexible file loader would load records without setting a date for the date created field
Fixed the export process so that it is able to export records with identifiers that have no date created value
Fixed the review links page to allow sorting by weight or date created, helping users locate the record pairs that should be resolved first
Reduced the log level used by the probabilistic matching algorithm when logging record links during the evaluation process
Added support for stronger password encryption in the professional edition of OpenEMPI that utilizes the latest encoding algorithms
The instance can now be configured to hide encoded passwords from log files to allow for easier sharing of those files
A new service is now included that can be configured to periodically delete old records from the audit log.
Upgraded the connection pool for the relational database to HikariCP, which provides better performance under heavy load
Made it easier to configure the sampling rate of record pairs that are used during the training phase of the probabilistic matching algorithm
Fixed an issue where exceptions were generated during shutdown when the instance had not been configured properly
Added support for the identifier domain transformer, which can now be used in two different transformation modes
Added a new report that provides a summary of all the records that have been classified as a match during a period of time
Added full support for 2-way TLS encryption to the HL7 v2 service interface
Upgraded the embedded graph database OrientDB to the latest stable release
Improved the isolation between the notification service and its underlying messaging implementation to make it easier to support other message brokers in the future
Developed better integration between the PIX/PDQ service and the rest of the application to reduce the number of configuration files
Used a new interface of the embedded OrientDB database to shut the database down cleanly
Made some of the operations that process record links configurable to better support sites with tens of millions of links
Added support for JSON messages, in addition to XML, to the person-based REST API, which previously did not support JSON
Fixed the blocking service to load the latest configuration without requiring a restart of the server after configuration changes
Fixed a few minor issues with the single best record module
Added support for transitive closure of record pairs to resolve the issue with conflicting links being created with complex matching rule configurations
Added support for the findOrAdd service method to the RESTful web services interface which adds a record only if a matching one is not found in the system
Added a configuration parameter to support sites with tens of millions of links which need to be able to configure a larger block size when assigning global identifiers
Added a new transformation function that extracts a value using a regular expression when generating a custom field
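A minimal sketch of a regular-expression extraction step for a custom field, assuming the function takes a pattern and a capture-group index. These parameter names are illustrative only; OpenEMPI's transformation function configuration is not shown in these notes.

```python
import re


def extract_with_regex(value, pattern, group=1):
    """Derive a custom field value from a regex capture group.

    Returns None when the source value is missing or does not match, so the
    custom field is left empty rather than raising an error.
    """
    if value is None:
        return None
    match = re.search(pattern, value)
    return match.group(group) if match else None


# Hypothetical example: derive a five-digit ZIP code from a free-text address.
print(extract_with_regex("12 Main St, Springfield, IL 62704", r"\b(\d{5})\b"))  # 62704
```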
Added support for the findByMatching and findByBlocking service methods through the RESTful web services interface
Added support for custom fields in the findByMatching and findByBlocking service methods
Fixed the deterministic matching algorithm so that it uses distance metric parameters, where present, for certain metrics
Fixed the support of JSON as the input and output data format for some of the Person REST interface methods that didn't support it
Added support for auditing all events (including events to view a record) for HIPAA compliance
Added an interface to the Entity REST API to support paging through all the records in an instance
Added new transformation functions that can be used in the generation of custom fields
Added support for SQLServer as the relational database supporting OpenEMPI
Added support for MySQL as the relational database supporting OpenEMPI
Added the report artifacts as part of the distribution of the commercial edition of OpenEMPI
Improved the support for remote connections to the graph database when used in place of the default embedded mode
Fixed the support for asynchronous matching of records
Added a new global identifier generator module to support the requirements of a customer
Improved the vector configuration screen of the probabilistic matching algorithm by showing available sample record pairs per vector
Improved the performance of bulk import of data by eliminating the generation of notification events
Upgraded and improved the integration with the messaging system, switching to ActiveMQ Artemis as the default JMS server
Performed a technical refresh of underlying software such as the Spring Framework and Hibernate
A user account can now be associated with a specific domain, which restricts the user's review workload to that domain
Added support for fault-tolerance during operation of an instance of OpenEMPI through replication
Added support in the commercial edition of OpenEMPI for collecting metrics about the operation and performance of the system
Fixed an issue where blocking key value generation produced very large blocks when a single blocking field was used and many records had blank values in that field
Fixed the substring transformation function so that it does not fail if the bounds specified fall outside the range of the field value affected
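In Java, `String.substring` throws an exception when the requested bounds fall outside the string, which is the kind of failure this fix addresses. A defensive substring clamps both bounds to the value first. This Python sketch shows the clamping logic only and is not OpenEMPI's implementation; `safe_substring` is a hypothetical name.

```python
def safe_substring(value, begin, end):
    """Return value[begin:end] with both bounds clamped to the string,
    instead of failing when configured bounds fall outside the value."""
    if value is None:
        return None
    length = len(value)
    begin = max(0, min(begin, length))
    end = max(begin, min(end, length))  # never let end precede begin
    return value[begin:end]


print(safe_substring("Smith", 0, 3))  # Smi
print(safe_substring("Al", 0, 4))     # Al
```

Python slicing already clamps silently; the explicit `max`/`min` calls make the intended behaviour visible for readers porting it to a language that does not.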
Fixed a bug where an invalid field parameter used in the Entity REST API could cause a NullPointerException
Fixed a bug where extending the entity schema of the default person entity would cause the Person REST API to fail on certain queries
Fixed a bug where the export function of records from the system would be affected by the caching configuration of the export module
Fixed a bug where the bulk import of records from another instance would cause the sequence generator to get out of sync
Fixed a bug where a newly generated match link was not persisted if a probable link already existed between the same records
[OPENEMPI-373] - Added the ability for the user to clear a field value through the Entity REST API
[OPENEMPI-372] - Bad data in a date field can cause blocking key value generation to fail and the blocking re-indexing process to end before processing every record
[OPENEMPI-371] - The substring transformation function sometimes fails if the bounds specified fall outside the range of the field value affected
[OPENEMPI-370] - In the Entity REST API, certain invalid values passed as field parameters can cause a NullPointerException
[OPENEMPI-369] - Export process times out when the filesystem on which the database is stored is very slow
[OPENEMPI-365] - Release of OrientDB incorporated in 3.3.6 broke some queries used as part of the report generation process
[OPENEMPI-364] - Added logging of the records viewed by a user in the administrative console; this is a temporary solution until auditing of viewed records is implemented in a manner consistent with the auditing of other events
[OPENEMPI-362] - The algorithm that finds connected components is recursive and may exhaust the default stack space on deployments where the system is not configured properly; the algorithm was modified to operate in a non-recursive manner
[OPENEMPI-361] - Before running the "link all records" process, the operation that clears existing links does not work properly for all matching algorithms
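The non-recursive approach can be sketched as a depth-first search driven by an explicit stack, assuming links are given as pairs of record IDs. This illustrates the general technique only and is not taken from OpenEMPI's code.

```python
from collections import defaultdict


def connected_components(edges):
    """Find connected components of an undirected graph without recursion.

    An explicit stack replaces the call stack, so a very large cluster of
    linked records cannot overflow the stack the way recursive DFS can.
    """
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        component = set()
        stack = [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(component)
    return components


print(connected_components([(1, 2), (2, 3), (4, 5)]))  # [{1, 2, 3}, {4, 5}]
```

Marking a node as seen when it is pushed, rather than when it is popped, keeps each record on the stack at most once.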
[OPENEMPI-357] - Exposed the method findPersonsById through the web service interface; it searches the repository by an identifier and returns all matching records
[OPENEMPI-352] - Editing the definition of a custom field fails with a "duplicate field in entity" error message
[OPENEMPI-351] - Sequence operations in OrientDB can get out of sync during concurrent updates because they don't explicitly refresh their state from the database; caching causes the in-memory sequence to drift out of sync with the database
[OPENEMPI-350] - Formatting a date for comparison purposes uses a call that is not thread-safe and may fail under intense workloads
[OPENEMPI-349] - When starting the database connection, the API currently used to establish initial connections uses the default password instead of the credentials provided
[OPENEMPI-348] - OrientDB released version 2.2.17 with a fix that may affect OpenEMPI instances currently running 2.2.16
[OPENEMPI-347] Viewing linked records through the search results option of the user interface sometimes causes records that are not linked through a match link to appear in the list
[OPENEMPI-346] Under some circumstances a record may end up having multiple global identifiers. This can happen in scenarios involving more than two records that are linked together using both match and probable links, either through an update or a link operation. It appears that in a recent update to OrientDB the semantics of the API changed: when an edge is deleted, the incoming and outgoing edge references of the vertices it connected are no longer set to null, and the vertices instead retain a link association to the non-existent edge.
[OPENEMPI-345] Assigning global identifiers on an instance with hundreds of thousands of links is slow. The process was modified to operate more efficiently in such environments.
[OPENEMPI-343] The flexible file loader uses a parsing library that drops a null field value when it appears as the last field value in a record.
Upgraded OrientDB to version 2.2.16, which fixes an issue that may impact the operation of OpenEMPI under certain circumstances.
[OPENEMPI-342] When the blocking service is not configured well, its indexing creates a block for all records whose blocking key values are null. This slows down the matching process in cases where the blocking value is null for many records.
[OPENEMPI-341] Importing blocks of records sends notifications to interested parties, which slows down the process with no real benefit. Disabling the generation of notifications improves the performance.
[OPENEMPI-340] The exporter process uses the Avro library in such a way that, for some fields, data cached from previous records is carried into new records, causing invalid data to be imported.
[OPENEMPI-336] After an import of records from a previous release, the record sequence is out of sync with the current record ID.
[OPENEMPI-337] An update of records that generates a link with a different state from the existing one causes the link not to be saved. For example, if a link is updated through the user interface from a probable link to a match link, the new state of the link is not saved.
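A minimal sketch of blocking-key indexing that skips records with blank key values, so they do not collapse into one catch-all block. The record layout (an `id` plus field values) and the rule of skipping a record when any key field is blank are assumptions for illustration, not OpenEMPI's configuration.

```python
from collections import defaultdict


def build_blocks(records, key_fields):
    """Index record IDs by blocking key, skipping records with a blank key.

    Without the skip, every record with empty blocking fields would land in
    a single huge block and be compared against every other such record.
    """
    blocks = defaultdict(list)
    for record in records:
        parts = [(record.get(f) or "").strip().lower() for f in key_fields]
        if not all(parts):
            continue  # incomplete key: do not create a catch-all block
        blocks["|".join(parts)].append(record["id"])
    return dict(blocks)


records = [
    {"id": 1, "last_name": "Smith"},
    {"id": 2, "last_name": "smith "},
    {"id": 3, "last_name": ""},
    {"id": 4},
]
print(build_blocks(records, ["last_name"]))  # {'smith': [1, 2]}
```

Records skipped here are never paired by this blocking round, so a real deployment would typically rely on additional blocking rounds on other fields to keep them matchable.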
[OPENEMPI-338] Update of blocking indexes in some cases doesn't properly create a new link block causing inefficiencies when records need to be evaluated for match status.
Upgraded to a more recent version of the underlying graph database. This upgrade provides a number of performance improvements but made it necessary to modify how records from OpenEMPI are persisted at a low level. This change to the persistence of record data requires users to migrate their data from the 3.2.0 release to the 3.3.0 release using the export/import tools that were developed.
Upgraded to Apache Tomcat version 8.x from 7.x as the standard web application server for deploying OpenEMPI
Developed a tool for exporting data from OpenEMPI and another one for importing data. This pair of tools performs the transformation needed for upgrading from the 3.2.0 release (or earlier) to the 3.3.0 release. These tools are available with the commercial edition of OpenEMPI.
Extended the Web Services API so that users can perform the entire workflow, from importing data through generating all links, solely through web services calls. This feature allows customers who link data on a regular basis to perform the whole process programmatically without any manual intervention.
Began the process of migrating the implementation of distance metrics to a different library that is more up-to-date and continues to be maintained. The process will be completed in the next release of OpenEMPI.
[OPENEMPI-185] - Unlinking a person that has a voided person link causes an exception
[OPENEMPI-279] - Merge operation does not generate update notifications
[OPENEMPI-293] - Metrics generated by the data profiler seem to be incorrect in some cases.
[OPENEMPI-301] - The find duplicate feature from the record update screen is not working
[OPENEMPI-319] - Sessions expiring are causing the PIX/PDQ to be restarted
[OPENEMPI-320] - Adding a record with an email address fails due to a validation error that is not caught and breaks the UI
[OPENEMPI-321] - Sequence use is not working on an existing database
[OPENEMPI-323] - Issue with handling of massive import declaration
[OPENEMPI-325] - Unlinking records through the user interface doesn't properly update their global identifiers
[OPENEMPI-328] - An update before global identifiers have been assigned causes a NullPointerException
Added reporting capabilities to the 3.2.0 release of the entity edition (Commercial Edition only). Users can now generate reports on the operation of the system. The reporting functionality was developed in an extensible manner so that new report types can be added over time. The current list of reports includes:
Data Profile Summary
Duplicate Summary Statistics
Potential Match Review Summary
Potential Match Review Detail
Added the ability for users to easily remove the global identifiers assigned to all the records in the database. This feature is useful when first setting up an instance of OpenEMPI.
Enhanced the probabilistic matching algorithm with a new feature we call null scoring which improves the matching performance of the algorithm in the presence of null values in matching fields.
Added the ability for users to run the data profiling process against all records in the repository on demand instead of having to run it as a scheduled background process
Enhanced the performance of long-running operations for sites with millions of records. Operations such as assigning global identifiers, rebuilding the indexes of the blocking algorithms and running the matching algorithm against all record pairs did not take advantage of the multiple processing nodes available in a clustered deployment of OpenEMPI (Commercial edition only). The implementation of these operations was modified to take advantage of all the nodes available in the cluster.
Added a new transformation function for custom field generation that changes the case of the associated field. A parameter of the transformation function specifies the case of the transformed field.
Added sequencing of transformation functions to the custom field generation process. This allows the user to define a custom field that is generated from another custom field, which means that transformation functions can now be composed to form more complex functions from simpler ones.
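The case transformation and the sequencing of transformation functions can be sketched as ordinary function composition. The supported case values (`"upper"`, `"lower"`) and the `compose` helper are assumptions made for illustration; they are not OpenEMPI's configuration syntax.

```python
from functools import reduce


def change_case(value, case="upper"):
    """Case transformation; `case` plays the role of the function parameter."""
    if value is None:
        return None
    if case == "upper":
        return value.upper()
    if case == "lower":
        return value.lower()
    raise ValueError(f"unsupported case: {case}")


def compose(*functions):
    """Chain transformations left to right so each step receives the output
    of the previous one, like a custom field built from another custom field."""
    return lambda value: reduce(lambda acc, fn: fn(acc), functions, value)


# Hypothetical custom field: the first three letters of a name, upper-cased.
first_three_upper = compose(lambda v: v[:3], lambda v: change_case(v, "upper"))
print(first_three_upper("smith"))  # SMI
```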
Added the ability for the user to export all the records and links from the system. It is preferable that the export format is binary so that corruption of the file can be detected during subsequent loading of the file and ideally it should be easy to import the file for further processing in a cloud-based big data environment such as Hadoop.
The process of assigning global identifiers to all records of an instance can take a long time on an instance that has millions of records.
Modified the data profiler so that, when it runs against a file, the user can specify the field delimiter along with the data types of the file's columns instead of relying on the fixed colon (':') field delimiter.
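A minimal sketch of profiling a delimited file with a configurable delimiter and declared column types, using Python's standard `csv` module. The statistics collected and the `column_types` mapping (column index to converter) are assumptions; the real profiler's metrics and settings are not described here.

```python
import csv
import io


def profile_columns(text, delimiter=":", column_types=None):
    """Count non-empty values per column and values that fail their declared
    type. `column_types` maps a column index to a converter such as int."""
    column_types = column_types or {}
    stats = {}
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        for index, value in enumerate(row):
            col = stats.setdefault(index, {"non_empty": 0, "type_errors": 0})
            if value == "":
                continue
            col["non_empty"] += 1
            converter = column_types.get(index)
            if converter is not None:
                try:
                    converter(value)
                except ValueError:
                    col["type_errors"] += 1
    return stats


sample = "Smith,42\nJones,\nBrown,abc\n"
print(profile_columns(sample, delimiter=",", column_types={1: int}))
```

Keeping the colon as the default preserves the old behaviour for files that already use it.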
The user should be able to invoke the re-indexing of the blocking fields at the specific entity level instead of having to run the process against all entities currently defined on an instance of OpenEMPI.
Issue OPENEMPI-297: Under certain conditions, when trying to unlink two records from each other in the search result screen, the two records that are to be unlinked do not show up side by side.
Issue OPENEMPI-298: In the search screen, after searching for a record and selecting to view the list of records linked to a selected record, the selected record itself showed up in the list.