CrateDB multi-node setup

CrateDB can run on a single node. However, in most environments, CrateDB is run as a cluster of three or more nodes.

For development purposes, CrateDB can auto-bootstrap a cluster when you run all nodes on the same host. Auto-bootstrapping requires zero configuration. However, in production environments, you must configure the bootstrapping process manually.

This guide shows you how to bootstrap a multi-node CrateDB cluster using the method of your choice.

Cluster bootstrapping

Single host auto-bootstrapping

If you start up several nodes (with default configuration) on a single host, they will automatically discover one another and form a cluster.

If you want to run CrateDB on your local machine for development or experimentation purposes, this is probably your best bootstrapping option.

Caution

Single host auto-bootstrapping is useful for development environments. However, for improved performance and resiliency, production CrateDB clusters should be run with one node per host machine. If you use multiple hosts, you must configure your cluster with manual bootstrapping.

Warning

If you start multiple nodes on different hosts with auto-bootstrapping enabled, you cannot, at a later point, bring those nodes together to form a single cluster: there is no way to merge CrateDB clusters without the risk of data loss.

If you have multiple nodes running on different hosts, you can check whether they have formed independent clusters by visiting the Admin UI (which runs on every node) and checking the cluster browser.

If you find yourself with multiple independent clusters and instead want to form a single cluster, follow these steps:

  1. Back up your data
  2. Shut down all the nodes
  3. Completely wipe each node by deleting the contents of the data directory under CRATE_HOME (see the sketch after this list)
  4. Follow the instructions in the next section (manual bootstrapping)
  5. Restart all the nodes and verify that they have formed a single cluster
  6. Restore your data
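
For step 3, a minimal sketch, assuming the default layout in which each node stores its data in a data directory directly under CRATE_HOME (this is destructive, so verify your backups first):

sh$ rm -rf "$CRATE_HOME/data"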

Tarball install

If you’re installing CrateDB using the tarball method, you can start a single-host, three-node cluster with auto-bootstrapping by following these instructions.

Unpack the tarball:

sh$ tar -xzf crate-*.tar.gz

Copy the expanded directory three times, once for each node:

sh$ cp -R crate-*/ node-01
sh$ cp -R crate-*/ node-02
sh$ cp -R crate-*/ node-03

Tip

Each directory will function as CRATE_HOME for that node.

Because you want to run a multi-node cluster, you should configure the metadata gateway so that CrateDB knows how to recover its state safely. For a three-node cluster, set gateway.expected_nodes to 3 and gateway.recover_after_nodes to 3.

Note

Configuring the metadata gateway is a safeguarding mechanism that is useful for production clusters. It is not strictly necessary when running in development. However, the Admin UI will issue warnings if you have not configured the metadata gateway.

You can specify both settings in your configuration file, like so:

gateway:
  recover_after_nodes: 3
  expected_nodes: 3

Alternatively, you can configure these settings at startup with command-line options:

sh$ bin/crate \
    -Cgateway.expected_nodes=3 \
    -Cgateway.recover_after_nodes=3

Pick your preferred method of configuration and start up all three nodes by changing into each node directory and running the bin/crate script.

Caution

You must change into the appropriate node directory before running the bin/crate script.

When you run bin/crate, the script sets CRATE_HOME to your current directory. This directory must be the root of a CrateDB installation (e.g., node-01, node-02, or node-03).

Tip

Because bin/crate runs in the foreground as a long-running process, the most straightforward way to run multiple nodes by hand for testing purposes is to start a new virtual console for each node.

For example:

  1. Start a virtual console. In that virtual console, change into the node-01 directory and run bin/crate. Leave this process running.

  2. Start a second virtual console. In that virtual console, change into the node-02 directory and run bin/crate. Leave this process running.

  3. Start a third virtual console. In that virtual console, change into the node-03 directory and run bin/crate. Leave this process running.

You should now have three concurrent bin/crate processes.
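
Alternatively, for a quick test from a single terminal, you can start each node as a background job. This sketch uses subshells so that each node runs with the correct working directory (and therefore the correct CRATE_HOME):

sh$ (cd node-01 && bin/crate) &
sh$ (cd node-02 && bin/crate) &
sh$ (cd node-03 && bin/crate) &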

Visit the Admin UI on one of the nodes. Check the cluster browser to verify that the cluster has auto-bootstrapped with three nodes. You should see something like this:

[Image: The CrateDB Admin UI showing a multi-node cluster]
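
You can also check from the command line by querying the sys.nodes table over CrateDB's HTTP endpoint. This assumes the first node listens on the default HTTP port, 4200 (when several nodes share one host, the others typically bind to the next free ports, e.g., 4201 and 4202):

sh$ curl -sS -H 'Content-Type: application/json' \
    -X POST 'http://localhost:4200/_sql' \
    -d '{"stmt": "SELECT name FROM sys.nodes"}'

A healthy three-node cluster returns three rows.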

Manual bootstrapping

To run a CrateDB cluster across multiple hosts, you must manually configure the bootstrapping process by telling nodes how to:

  - Discover other nodes (see Discovery, below)
  - Elect a master node (see Master node election, below)

You must also configure the metadata gateway (as with auto-bootstrapping).

Discovery

With CrateDB 4.x and above, you can configure a list of nodes to seed the discovery process with the discovery.seed_hosts setting in your configuration file. This setting should contain one identifier per master-eligible node, like so:

discovery.seed_hosts:
  - node-01.example.com:4300
  - 10.0.1.102:4300
  - 10.0.1.103:4300

Alternatively, you can configure this at startup with a command-line option:

sh$ bin/crate \
        -Cdiscovery.seed_hosts=node-01.example.com,10.0.1.102,10.0.1.103

Note

You must configure every node with a list of seed nodes. Each node discovers the rest of the cluster via the seed nodes.

Tip

If you are using CrateDB 3.x or below, you can use the discovery.zen.ping.unicast.hosts setting instead of discovery.seed_hosts.
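
For example, a sketch using the same host list as above:

discovery.zen.ping.unicast.hosts:
  - node-01.example.com:4300
  - 10.0.1.102:4300
  - 10.0.1.103:4300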

Unicast host discovery

Instead of configuring seed hosts manually (as above), you can configure CrateDB to fetch a list of seed hosts from an external source.

The currently supported sources are DNS, Microsoft Azure, and Amazon EC2.

Discovery via DNS

You can manage your seed hosts using DNS.

Configure the discovery.seed_providers setting in your configuration file like so:

discovery.seed_providers: srv

CrateDB will perform a DNS query using SRV records and use the results to generate a list of unicast hosts for node discovery.
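
You must also tell CrateDB which SRV record to query, via the discovery.srv.query setting. As a sketch, assuming a record named _crate._srv.example.com (the record name and hosts are illustrative):

discovery.seed_providers: srv
discovery.srv.query: _crate._srv.example.com

The corresponding DNS zone might then contain one SRV record per node, for example:

_crate._srv.example.com. 3600 IN SRV 1 10 4300 node-01.example.com.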

Discovery on Amazon EC2

You can manage your seed hosts using the Amazon EC2 API.

Configure the discovery.seed_providers setting in your configuration file like so:

discovery.seed_providers: ec2

CrateDB will perform an Amazon EC2 API query and use the results to generate a list of unicast hosts for node discovery.

You can filter hosts based on:

  - Security groups
  - Availability zones
  - EC2 instance tags
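
As a sketch, the corresponding settings might look like this (the group ID, zones, and tag are placeholders, and the discovery.ec2.* setting names are an assumption based on CrateDB's EC2 discovery settings):

discovery.seed_providers: ec2
discovery.ec2.groups: sg-0123456789abcdef0
discovery.ec2.availability_zones: us-east-1a,us-east-1b
discovery.ec2.tag.environment: production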

Discovery on Microsoft Azure

You can manage your seed hosts using the Azure Virtual Machine API.

Configure the discovery.seed_providers setting in your configuration file like so:

discovery.seed_providers: azure

CrateDB will perform an Azure Virtual Machine API query and use the results to generate a list of unicast hosts for node discovery.

You can filter hosts based on:

  - Virtual network
  - Subnet
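
As a sketch, restricting discovery to hosts in the same virtual network might look like this (the discovery.azure.method setting name is an assumption based on CrateDB's Azure discovery settings; use subnet to narrow discovery further):

discovery.seed_providers: azure
discovery.azure.method: vnet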

Master node election

The master node is responsible for making changes to the global cluster state. The cluster elects the master node from the configured list of master-eligible nodes during master node election.

You can define the initial set of master-eligible nodes with the cluster.initial_master_nodes setting in your configuration file. This setting should contain one identifier per master-eligible node, like so:

cluster.initial_master_nodes:
  - node-01.example.com
  - 10.0.1.102
  - 10.0.1.103

Alternatively, you can configure this at startup with a command-line option:

sh$ bin/crate \
        -Ccluster.initial_master_nodes=node-01.example.com,10.0.1.102,10.0.1.103

Warning

You don’t have to configure cluster.initial_master_nodes on every node. However, on any node where you do configure it, the value must be identical; otherwise, CrateDB may form multiple independent clusters (which may result in data loss).

CrateDB requires a quorum of nodes before a master can be elected. A quorum ensures that the cluster does not elect multiple masters in the event of a network partition (also known as a split-brain scenario).

CrateDB (versions 4.x and above) will automatically determine the ideal quorum size. If you are using CrateDB versions 3.x and below, you must manually set the quorum size using the discovery.zen.minimum_master_nodes setting.
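
For example, with three master-eligible nodes on CrateDB 3.x, a majority quorum is ⌊3/2⌋ + 1 = 2:

discovery.zen.minimum_master_nodes: 2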

Note

For a three-node cluster, you should declare all three nodes master-eligible: the quorum is then two nodes (a majority of three), so the cluster can still elect a master if one node fails. Consult the quorum guide for detailed information about quorum size considerations.

If you configure fewer master-eligible nodes than the ideal quorum size, CrateDB will issue a warning (visible in the logs and the Admin UI).

Metadata gateway

Because you want to run a multi-node cluster, you must configure the metadata gateway so that CrateDB knows how to recover its state. For a three-node cluster, set gateway.expected_nodes to 3 and gateway.recover_after_nodes to 3.

You can specify both settings in your configuration file, like so:

gateway:
  recover_after_nodes: 3
  expected_nodes: 3

Alternatively, you can configure these settings at startup with command-line options:

sh$ bin/crate \
    -Cgateway.expected_nodes=3 \
    -Cgateway.recover_after_nodes=3

Other settings

Cluster name

The cluster.name setting allows you to create multiple separate clusters. A node will refuse to join a cluster if the respective cluster names do not match.

By default, CrateDB sets the cluster name to crate for you.

You can override this behavior by configuring a custom cluster name using the cluster.name setting in your configuration file, like so:

cluster.name: my_cluster

Alternatively, you can configure this setting at startup with a command-line option:

sh$ bin/crate \
        -Ccluster.name=my_cluster

Node name

If you are manually bootstrapping a cluster, you must specify a list of master-eligible nodes (see Master node election, above). To do this, you must refer to nodes by node name, hostname, or IP address.

By default, CrateDB sets the node name to a random value from the sys.summits table.

You can override this behavior by configuring a custom node name using the node.name setting in your configuration file, like so:

node.name: node-01

Alternatively, you can configure this setting at startup with a command-line option:

sh$ bin/crate \
        -Cnode.name=node-01

Master-eligibility

If you are manually bootstrapping a cluster, any nodes you list as master-eligible must have a node.master value of true. (This is the default value.)
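
For example, these snippets show the two possibilities for a given node's configuration file:

# Master-eligible node (the default)
node.master: true

# A node that can never be elected master
node.master: false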

Inter-node communication

By default, CrateDB nodes communicate with each other on port 4300. This port is known as the transport port, and it must be accessible from every node.

If you prefer, you can specify a port range instead of a single port number. Edit the transport.tcp.port setting in your configuration file, like so:

transport.tcp.port: 4350-4360

Tip

If you are running a node on Docker, you must configure CrateDB to publish the container’s external hostname and the external port number bound to the transport port. You can do that in your configuration file using the network.publish_host and transport.publish_port settings.

For example:

# External access
network.publish_host: node-01.example.com
transport.publish_port: 4321
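
Putting these settings together, a sketch of crate.yml for the first node of a manually bootstrapped three-node cluster might look like this (the hostnames, node names, and cluster name are illustrative):

# crate.yml for node-01
cluster.name: my_cluster
node.name: node-01

# Discovery and master election
discovery.seed_hosts:
  - node-01.example.com:4300
  - node-02.example.com:4300
  - node-03.example.com:4300
cluster.initial_master_nodes:
  - node-01
  - node-02
  - node-03

# Metadata gateway
gateway:
  recover_after_nodes: 3
  expected_nodes: 3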