Scaling CrateDB on Kubernetes

CrateDB and Docker are a great match thanks to CrateDB’s horizontally scalable, shared-nothing architecture, which lends itself well to containerization.

Kubernetes is an open-source container orchestration system for the management, deployment, and scaling of containerized applications.

Together, Docker and Kubernetes are a fantastic way to deploy and scale CrateDB.

Note

This guide assumes you have already deployed CrateDB on Kubernetes.

Consult the Kubernetes deployment guide for more help.

See also

A guide to deploying CrateDB on Kubernetes.

The official CrateDB Docker image.


Kubernetes reconfiguration

You can scale your CrateDB cluster by changing the number of replica pods configured in your StatefulSet controller to match the desired number of CrateDB nodes.

Using an imperative command

You can issue an imperative command to make the configuration change, like so:

sh$ kubectl scale statefulsets crate --replicas=4

Note

This makes it easy to scale quickly, but your cluster configuration is not reflected in your version control system.
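Assuming a StatefulSet named crate, as above, you can then watch the progress of the scale operation, like so:

sh$ kubectl rollout status statefulsets crate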

Using version control

If you want to version control your cluster configuration, you can edit the StatefulSet controller configuration file directly.

Take this example configuration snippet:

kind: StatefulSet
apiVersion: "apps/v1"
metadata:
  # This is the name used as a prefix for all pods in the set.
  name: crate
spec:
  serviceName: "crate-set"
  # Our cluster has four nodes.
  replicas: 4
[...]

The only thing you need to change here is the replicas value.

You can then save your edits and update Kubernetes, like so:

sh$ kubectl replace -f crate-controller.yaml --namespace crate
statefulset.apps/crate replaced

Here, we’re assuming a configuration file named crate-controller.yaml and a deployment that uses the crate namespace.

If your StatefulSet uses the default rolling update strategy, this command will restart your pods with the new configuration one-by-one.
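The update strategy is part of the StatefulSet spec. A minimal sketch looks like this (RollingUpdate is the default type, so you do not normally need to set it explicitly):

spec:
  updateStrategy:
    type: RollingUpdate
[...]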

Note

If you are making changes this way, you probably want to update the CrateDB configuration at the same time. Consult the next section for details.

Warning

If you use a regular replace command, pods are restarted and any persistent volumes remain intact.

If, however, you pass the --force option to the replace command, resources are deleted and recreated, and the pods will come back up with no data.

CrateDB reconfiguration

CrateDB needs to be configured appropriately for the number of nodes in the CrateDB cluster.

Warning

Failing to update CrateDB configuration after a rescale operation can result in data loss.

You should take particular care if you are reducing the size of the cluster because CrateDB must recover and rebalance shards as the nodes drop out.

Clustering behavior

Note

The following only applies to CrateDB versions 3.x and below.

The discovery.zen.minimum_master_nodes setting is no longer used in CrateDB versions 4.x and above.

The discovery.zen.minimum_master_nodes setting affects metadata master election.

This setting can be changed while CrateDB is running, like so:

SET GLOBAL PERSISTENT discovery.zen.minimum_master_nodes = 5

If you are using a controller configuration like the example given in the Kubernetes deployment guide, you can make this reconfiguration by altering the discovery.zen.minimum_master_nodes command option.
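For example, if your controller configuration passes settings to CrateDB as command options, the relevant container section might look something like this (the entrypoint and the value shown are illustrative and should match your own deployment):

containers:
  - name: crate
    command:
      - /docker-entrypoint.sh
      - crate
      - -Cdiscovery.zen.minimum_master_nodes=3
[...]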

Changes to the Kubernetes controller configuration can then be deployed using kubectl replace as shown in the previous subsection, Using version control.

Caution

If discovery.zen.minimum_master_nodes is set to more than the current number of nodes in the cluster, the cluster will disband. On the other hand, a number that is too small might lead to a split-brain scenario.

Accordingly, it is important to adjust this number carefully when scaling CrateDB.
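As a rule of thumb, discovery.zen.minimum_master_nodes should be set to a quorum of master-eligible nodes, that is, floor(n / 2) + 1. For example, when scaling to a four-node cluster in which every node is master-eligible (an assumption you should verify against your own setup), a quorum is floor(4 / 2) + 1 = 3:

SET GLOBAL PERSISTENT discovery.zen.minimum_master_nodes = 3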

Recovery behavior

CrateDB has two settings that depend on cluster size and determine how cluster metadata is recovered during startup: gateway.expected_data_nodes and gateway.recover_after_data_nodes.

The values of these settings must be changed via Kubernetes. Unlike the clustering behavior settings above, they cannot be changed using CrateDB’s runtime configuration capabilities.

If you are using a controller configuration like the example given in the Kubernetes deployment guide, you can make this reconfiguration by altering the EXPECTED_NODES environment variable and the recover_after_data_nodes command option.
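Continuing the four-node example, the relevant parts of the controller configuration might look something like this (the entrypoint, names, and values are illustrative and should match your own deployment):

containers:
  - name: crate
    command:
      - /docker-entrypoint.sh
      - crate
      - -Cgateway.expected_data_nodes=$(EXPECTED_NODES)
      - -Cgateway.recover_after_data_nodes=3
    env:
      - name: EXPECTED_NODES
        value: "4"
[...]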

Changes to the Kubernetes controller configuration can then be deployed using kubectl replace as shown in the previous subsection, Using version control.

Note

You can scale a CrateDB cluster without updating these values, but the CrateDB Admin UI will display node check failures.

However, on a production cluster, you should only do this if you need to scale quickly to handle a load spike.