Running OrientDB in Distributed Mode
This section describes how to run OpenEMPI with OrientDB in distributed mode.
To demonstrate the process, we will set up two instances of the database: one running as the primary instance and the other as the replica.
The first step is to install OpenEMPI on two different machines with exactly the same configuration on both. The recommended approach is to first set up one machine with OpenEMPI, load the initial dataset that will populate your instance, configure and fine-tune the algorithms (blocking, matching, field comparison, etc.), and migrate to the distributed configuration before going into operational mode. Once you have copied all the files of the instance from the primary machine to the machine hosting the replica, apply the following configuration changes:
1. Modify the setenv.sh file to enable distributed mode by adding the following parameter to the VMPARAMS environment variable:
-Ddistributed=true
2. Name each of the instances in the orientdb-server-config.xml file, which should reside in the conf directory under the OPENEMPI_HOME directory. Each instance must have a unique nodeName. In the example below we set the node name on the primary instance to openempi1 and the node name on the replica to openempi2. You can use any names you like as long as each one is unique to its instance.
...
<handler class="com.orientechnologies.orient.server.hazelcast.OHazelcastPlugin">
    <parameters>
        <parameter name="nodeName" value="openempi1" />
        ...

...
<handler class="com.orientechnologies.orient.server.hazelcast.OHazelcastPlugin">
    <parameters>
        <parameter name="nodeName" value="openempi2" />
        ...
3. Adjust the port listeners on each instance to match the IP address of the host each instance is running on. This also involves a change in the orientdb-server-config.xml file. In the example below the primary instance is running on host 192.168.77.21 and the replica on host 192.168.77.22.
...
<listeners>
    <listener protocol="binary" ip-address="192.168.77.21" port-range="2424-2430" socket="default"/>
    <listener protocol="http" ip-address="192.168.77.21" port-range="2480-2490" socket="default">
        <parameters>
            <!-- Connection's custom parameters. If not specified the global configuration ...

...
<listeners>
    <listener protocol="binary" ip-address="192.168.77.22" port-range="2424-2430" socket="default"/>
    <listener protocol="http" ip-address="192.168.77.22" port-range="2480-2490" socket="default">
        <parameters>
            <!-- Connection's custom parameters. If not specified the global configuration ...
4. Next, configure the discovery process for the distributed OrientDB instances to match your network. This involves editing the hazelcast.xml file in the conf directory under your OPENEMPI_HOME directory. The only changes we made were to set the trusted-interfaces parameter to match the network on which the two instances are running and to set the interface parameter to specify which interface should be used for instance discovery. The latter is important for hosts that have multiple network interfaces.
<hazelcast xsi:schemaLocation="http://www.hazelcast.com/schema/config hazelcast-config-3.0.xsd"
           xmlns="http://www.hazelcast.com/schema/config"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <group>
        <name>orientdb</name>
        <password>orientdb</password>
    </group>
    <network>
        <port auto-increment="true">2434</port>
        <join>
            <multicast enabled="true">
                <multicast-group>235.1.1.1</multicast-group>
                <multicast-port>2434</multicast-port>
                <trusted-interfaces>
                    <interface>192.168.77.*</interface>
                </trusted-interfaces>
            </multicast>
            <tcp-ip enabled="false">
                <member>192.168.1.245</member>
                <member>192.168.1.57</member>
            </tcp-ip>
        </join>
        <interfaces enabled="true">
            <interface>192.168.77.21</interface>
        </interfaces>
    </network>
    <executor-service>
        <pool-size>16</pool-size>
    </executor-service>
</hazelcast>
5. The last configuration change is to specify the distributed mode for each of the two instances. This involves changing the default-distributed-db-config.json file found in the same directory as the other two files.
{
    "autoDeploy": true,
    "hotAlignment": true,
    "executionMode": "synchronous",
    "readQuorum": 1,
    "writeQuorum": "majority",
    "readYourWrites": true,
    "newNodeStrategy": "dynamic",
    "servers": {
        "openempi1": "MASTER",
        "openempi2": "REPLICA"
    },
    "clusters": {
        "internal": {
        },
        "index": {
        },
        "*": {
            "servers": ["<NEW_NODE>"]
        }
    }
}
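
Because the five steps touch three separate files, a small consistency check can catch mismatches before the instances are started. The following Python script is a minimal sketch and is not part of OpenEMPI or OrientDB. It checks that the nodeName from orientdb-server-config.xml appears under "servers" in default-distributed-db-config.json, and that the listener addresses and the Hazelcast interface fall within the trusted-interfaces pattern. The sample documents embedded in the script are trimmed, hypothetical stand-ins for the real files: the enclosing orient-server, handlers, and network elements are assumptions. The function names are illustrative, and using fnmatch to interpret the `*` wildcard is only an approximation of how Hazelcast matches interface patterns.

```python
import json
import xml.etree.ElementTree as ET
from fnmatch import fnmatch

HAZELCAST_NS = {"hz": "http://www.hazelcast.com/schema/config"}
PLUGIN_CLASS = "com.orientechnologies.orient.server.hazelcast.OHazelcastPlugin"

# Trimmed, hypothetical samples; in practice read the files from OPENEMPI_HOME/conf.
SERVER_XML = """
<orient-server>
  <handlers>
    <handler class="com.orientechnologies.orient.server.hazelcast.OHazelcastPlugin">
      <parameters>
        <parameter name="nodeName" value="openempi1" />
      </parameters>
    </handler>
  </handlers>
  <network>
    <listeners>
      <listener protocol="binary" ip-address="192.168.77.21" port-range="2424-2430" socket="default"/>
      <listener protocol="http" ip-address="192.168.77.21" port-range="2480-2490" socket="default"/>
    </listeners>
  </network>
</orient-server>
"""

HAZELCAST_XML = """
<hazelcast xmlns="http://www.hazelcast.com/schema/config">
  <network>
    <join>
      <multicast enabled="true">
        <trusted-interfaces>
          <interface>192.168.77.*</interface>
        </trusted-interfaces>
      </multicast>
    </join>
    <interfaces enabled="true">
      <interface>192.168.77.21</interface>
    </interfaces>
  </network>
</hazelcast>
"""

DISTRIBUTED_JSON = '{"servers": {"openempi1": "MASTER", "openempi2": "REPLICA"}}'


def node_name(server_config_xml):
    """Return the nodeName parameter of the Hazelcast plugin handler."""
    root = ET.fromstring(server_config_xml)
    for handler in root.iter("handler"):
        if handler.get("class") == PLUGIN_CLASS:
            for param in handler.iter("parameter"):
                if param.get("name") == "nodeName":
                    return param.get("value")
    raise ValueError("nodeName parameter not found")


def listener_ips(server_config_xml):
    """Return the distinct ip-address values of all listener elements."""
    root = ET.fromstring(server_config_xml)
    return sorted({l.get("ip-address") for l in root.iter("listener")})


def hazelcast_interfaces(hazelcast_xml):
    """Return (trusted interface patterns, discovery interfaces)."""
    root = ET.fromstring(hazelcast_xml)
    trusted = root.findall(".//hz:trusted-interfaces/hz:interface", HAZELCAST_NS)
    bound = root.findall(".//hz:interfaces/hz:interface", HAZELCAST_NS)
    return [e.text.strip() for e in trusted], [e.text.strip() for e in bound]


def check_node(server_config_xml, hazelcast_xml, distributed_json):
    """Return a list of human-readable problems; an empty list means consistent."""
    problems = []
    name = node_name(server_config_xml)
    servers = json.loads(distributed_json).get("servers", {})
    if name not in servers:
        problems.append(f"node {name!r} is not listed under 'servers'")
    trusted, bound = hazelcast_interfaces(hazelcast_xml)
    # Every address this node uses should fall inside a trusted pattern.
    for ip in listener_ips(server_config_xml) + bound:
        if not any(fnmatch(ip, pattern) for pattern in trusted):
            problems.append(f"address {ip} matches no trusted interface")
    return problems


if __name__ == "__main__":
    issues = check_node(SERVER_XML, HAZELCAST_XML, DISTRIBUTED_JSON)
    print("\n".join(issues) if issues else "configuration looks consistent")
```

To use the sketch against a real installation, replace the embedded samples with the contents of the three files on each machine and run it once per instance. The checks encode the conventions of the example in this section, not hard requirements of OrientDB or Hazelcast.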