Exporting Data
The Export feature allows the user to export data from their instance of OpenEMPI for further processing. A user can export both the records stored in the system and the links that have been created to associate records with one another.
In selecting a format for the exported file, we looked for one that supports the following features:
- has support for a schema that describes the format of the records in the exported file
- can handle structured data (records can have one or more identifiers)
- is easy to parse and read into another system for further processing
We decided to use the Apache Avro format to serialize the exported data since it satisfies all the requirements listed above.
To export data from the system, select the "Export Records" option on the Edit Entity Model screen.
You will be prompted to confirm that you want to proceed. If you choose to continue, a job is created to process the export request in the background. You can monitor the progress of the export job by viewing its status in the "Job Queue Entries" view. When the job completes, the system will have created three files in the "File Repository" directory of your instance (this directory is specified through the file-repository-directory parameter in the mpi-config.xml configuration file):
- Records schema file: This file stores the schema definition for the format of the records in the records data file. It follows the file naming convention of "<entity-name>-<date-and-time-of-creation>.avsc".
- Records data file: This file stores all the records in your OpenEMPI instance of the entity type selected for export. It follows the file naming convention of "<entity-name>-<date-and-time-of-creation>.avro".
- Links data file: This file stores all the links in your OpenEMPI instance. It follows the file naming convention of "recordLinks-<date-and-time-of-creation>.avro".
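The naming conventions above can be used to locate the output of an export job programmatically. The following Python fragment is a minimal sketch, not part of OpenEMPI: the function names are hypothetical, and the timestamp layout (year-month-day-hour-minute-second, separated by dashes) is an assumption inferred from the sample file name person-2017-08-14-06-15-41.avro used later on this page.

```python
import re
from pathlib import Path

# Assumed timestamp layout, inferred from the sample name
# person-2017-08-14-06-15-41.avro (YYYY-MM-DD-HH-MM-SS).
STAMP = r"\d{4}(?:-\d{2}){5}"
EXPORT_NAME = re.compile(
    r"^(?P<prefix>.+)-(?P<stamp>" + STAMP + r")\.(?P<ext>avsc|avro)$"
)


def classify_export_file(name):
    """Return (kind, entity_name, timestamp) for an export file name.

    Returns None when the name does not follow the export naming convention.
    """
    match = EXPORT_NAME.match(name)
    if match is None:
        return None
    prefix = match.group("prefix")
    stamp = match.group("stamp")
    if match.group("ext") == "avsc":
        return ("records-schema", prefix, stamp)
    if prefix == "recordLinks":
        # The links file is not tied to a single entity name.
        return ("links-data", None, stamp)
    return ("records-data", prefix, stamp)


def find_exports(repository_dir):
    """List (path, kind, entity_name, timestamp) for export files in a directory."""
    found = []
    for path in sorted(Path(repository_dir).iterdir()):
        info = classify_export_file(path.name)
        if info is not None:
            found.append((path,) + info)
    return found
```

Calling find_exports with the path of your File Repository directory returns one tuple per export file. Note that the sketch cannot distinguish the links file from the records file of an entity that happens to be named "recordLinks".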
The data files can now be imported into other systems by utilizing the tools available for parsing and processing Avro files. Big data processing systems such as Apache Hadoop and Apache Spark provide support for Avro-formatted files and can read the data exported from your OpenEMPI system by simply pointing your big-data application to the export files. Avro implementations for C, C++, C#, Java, PHP, Python, and Ruby can be downloaded from the Apache Avro™ Releases page.
Using the Apache Avro tools, you can convert the Avro-formatted data files into JSON format as follows:
java -jar avro-tools-1.8.1.jar tojson --pretty fileRepository/person-2017-08-14-06-15-41.avro

{
  "RID" : "1",
  "maritalStatusCode" : null,
  "nationality" : null,
  "phoneExt" : null,
  "gender" : { "string" : "F" },
  "deathTime" : null,
  "yearAndMonth" : { "string" : "1938-10" },
  "birthOrder" : null,
  "race" : null,
  "birthPlace" : null,
  "postalCode" : { "string" : "86042" },
  "ethnicGroup" : null,
  "motherName" : null,
  "prefix" : null,
  "deathInd" : null,
  "phoneAreaCode" : null,
  "phoneNumber" : { "string" : "2656203600" },
  "country" : null,
  "countryCode" : null,
  "birthDay" : { "string" : "16" },
  "city" : { "string" : "Qtas De Villa Blanca" },
  "suffix" : null,
  "religion" : null,
  "address2" : { "string" : "Killarney" },
  "StreetPhonetic" : { "string" : "FTSR" },
  "birthMonth" : { "string" : "10" },
  "birthYear" : { "string" : "1938" },
  "mothersMaidenName" : null,
  "FamilyPhonetic" : { "string" : "PLKL" },
  "state" : { "string" : "GA" },
  "multipleBirthInd" : null,
  "test" : null,
  "fatherName" : null,
  "phoneCountryCode" : null,
  "email" : null,
  "familyName2" : null,
  "degree" : null,
  "ssn" : { "string" : "301592586" },
  "middleName" : null,
  "language" : null,
  "CityPhonetic" : { "string" : "KTST" },
  "DateUpdated" : null,
  "address1" : { "string" : "79 Fitzroy Street" },
  "givenName" : { "string" : "Jake" },
  "GivenPhonetic" : { "string" : "JK" },
  "dateOfBirth" : { "string" : "1938-10-16T00:00:00.000-0500" },
  "familyName" : { "string" : "Blackwell" },
  "tfGivenName" : { "string" : "JK" },
  "dateChanged" : { "string" : "2017-07-28T11:29:26.872-0400" },
  "dateCreated" : "2017-07-28T10:23:34.041-0400",
  "dateVoided" : null,
  "entityVersionId" : 2,
  "userCreatedBy" : "admin",
  "userChangedBy" : { "string" : "admin" },
  "userVoidedBy" : null,
  "recordId" : 1,
  "identifiers" : {
    "array" : [ {
      "identifierDomainId" : { "int" : 18 },
      "identifierDomain" : "OpenEMPI",
      "identifierDomainName" : null,
      "userVoidedBy" : null,
      "dateCreated" : "2017-07-28T10:23:34.041-0400",
      "userCreatedBy" : "admin",
      "dateVoided" : null,
      "identifier" : "115cdf8b97d81314305a"
    }, {
      "identifierDomainId" : { "int" : 14 },
      "identifierDomain" : "IHENA",
      "identifierDomainName" : null,
      "userVoidedBy" : null,
      "dateCreated" : "2017-07-28T10:23:34.041-0400",
      "userCreatedBy" : "admin",
      "dateVoided" : null,
      "identifier" : "rec-1662-org"
    } ]
  }
}
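In the JSON output, optional fields appear either as null or wrapped in a single-key object that names the Avro type, such as { "string" : "F" } or { "int" : 18 }. The following Python fragment is a minimal sketch, using only the standard library, of how the converted output might be read back and flattened. The function names are hypothetical. The unwrapping is a heuristic: it assumes no record in your entity model has exactly one field whose name is an Avro type name such as "string" or "array".

```python
import json

# Avro type names that can appear as the single key of a union wrapper.
AVRO_BRANCHES = {
    "string", "int", "long", "float", "double",
    "boolean", "bytes", "array", "map",
}


def iter_json_values(text):
    """Yield each top-level JSON value from text holding several of them.

    Handles values that span multiple lines as well as one value per line.
    """
    decoder = json.JSONDecoder()
    pos = 0
    end = len(text)
    while True:
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            return
        value, pos = decoder.raw_decode(text, pos)
        yield value


def unwrap(value):
    """Recursively replace {"<avro type>": x} wrappers with x."""
    if isinstance(value, dict):
        if len(value) == 1:
            key, inner = next(iter(value.items()))
            if key in AVRO_BRANCHES:
                return unwrap(inner)
        return {k: unwrap(v) for k, v in value.items()}
    if isinstance(value, list):
        return [unwrap(v) for v in value]
    return value


def load_records(text):
    """Parse converted export output into a list of plain dictionaries."""
    return [unwrap(record) for record in iter_json_values(text)]
```

For example, if the output of the command above is redirected to a file, passing that file's contents to load_records yields dictionaries in which record["gender"] is the plain string "F" and record["identifiers"] is a list of identifier dictionaries.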