Run CrateDB on Kubernetes

CrateDB and Docker are a great match thanks to CrateDB’s horizontally scalable shared-nothing architecture that lends itself well to containerization.

Kubernetes is an open-source container orchestration system for the management, deployment, and scaling of containerized systems.

Together, Docker and Kubernetes are a fantastic way to deploy and scale CrateDB.

Note

While Kubernetes works with a variety of container technologies, this document only covers its use with Docker.

See also

A complementary blog post miniseries that walks you through the process of setting up your first CrateDB cluster on Kubernetes.

A lower-level introduction to running CrateDB on Docker.

A guide to scaling CrateDB on Kubernetes.

The official CrateDB Docker image.

Prerequisites

This document assumes familiarity with Kubernetes.

Before continuing, you should already have a Kubernetes cluster up and running with at least one master node and one worker node.

See also

You can use kubeadm to bootstrap a Kubernetes cluster by hand.

Alternatively, cloud services such as Azure Kubernetes Service or Amazon Elastic Kubernetes Service can do this for you.

Managing Kubernetes

There are lots of different ways to manage a Kubernetes deployment. Which one makes sense for you will depend on your situation.

This section shows you two basic commands you can use to create and update a resource.

You can create a resource like so:

sh$ kubectl create -f crate-controller.yaml --namespace crate
statefulset.apps/crate created

Here, we are creating a StatefulSet controller in the crate namespace using a configuration file named crate-controller.yaml.
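The --namespace crate option assumes that a namespace called crate already exists in your cluster. As a minimal sketch, a manifest like the following would define it. The file name crate-namespace.yaml is an assumption made for illustration; nothing else in this document depends on it.

```yaml
# Hypothetical file: crate-namespace.yaml
kind: Namespace
apiVersion: v1
metadata:
  # Matches the value passed to `--namespace` in the commands in this
  # section.
  name: crate
```

You could create it with kubectl create -f crate-namespace.yaml before creating any namespaced resources.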

You can update the resource after editing the configuration file, like so:

sh$ kubectl replace -f crate-controller.yaml --namespace crate
statefulset.apps/crate replaced

If your StatefulSet uses the default rolling update strategy, this command will restart your pods with the new configuration one-by-one.
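For reference, the update strategy is part of the StatefulSet spec. The fragment below is a sketch that only shows where the setting lives. With the apps/v1 API, RollingUpdate is already the default, so setting it explicitly is optional.

```yaml
kind: StatefulSet
apiVersion: "apps/v1"
metadata:
  name: crate
spec:
  # Replace pods one at a time when the pod template changes. This is
  # the default for apps/v1 and is spelled out here only to make it
  # visible.
  updateStrategy:
    type: RollingUpdate
  # The remaining fields (serviceName, replicas, selector, template,
  # and so on) are omitted from this sketch.
```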

Warning

If you use a regular replace command, pods are restarted, and any persistent volumes will still be intact.

If, however, you pass the --force option to the replace command, resources are deleted and recreated, and the pods will come back up with no data.

Configuration

This section provides four Kubernetes configuration snippets that can be used to create a three-node CrateDB cluster.

Services

A Kubernetes pod is ephemeral and so are its network addresses. Typically, this means that it is inadvisable to connect to pods directly.

A Kubernetes service allows you to define a network access policy for a set of pods. You can then use the network address of the service to communicate with the pods. The network address of the service remains static even though the constituent pods may come and go.

For our purposes, we define two services: an internal service and an external service.

Internal Service

CrateDB uses the internal service for node discovery via DNS and inter-node communication.

Here’s an example configuration snippet:

kind: Service
apiVersion: v1
metadata:
  name: crate-internal-service
  labels:
    app: crate
spec:
  # A static IP address is assigned to this service. This IP address is
  # only reachable from within the Kubernetes cluster.
  type: ClusterIP
  ports:
    # Port 4300 for inter-node communication.
  - port: 4300
    name: crate-internal
  selector:
    # Apply this to all nodes with the `app:crate` label.
    app: crate

External Service

The external service provides a stable network address for external clients.

Here’s an example configuration snippet:

kind: Service
apiVersion: v1
metadata:
  name: crate-external-service
  labels:
    app: crate
spec:
  # Create an externally reachable load balancer.
  type: LoadBalancer
  ports:
    # Port 4200 for HTTP clients.
  - port: 4200
    name: crate-web
    # Port 5432 for PostgreSQL wire protocol clients.
  - port: 5432
    name: postgres
  selector:
    # Apply this to all nodes with the `app:crate` label.
    app: crate

Note

In production, a LoadBalancer service type is typically only available on hosted cloud platforms that provide externally managed load balancers. However, an ingress resource can be used to provide internally managed load balancers.

For local development, Minikube provides a LoadBalancer service.
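As a rough sketch of the ingress approach mentioned in the note above, the following resource routes HTTP traffic to port 4200 of the external service. It assumes that an ingress controller is already installed in the cluster and that the networking.k8s.io/v1 API is available. The resource name crate-ingress and the host crate.example.com are placeholders, not values used elsewhere in this document.

```yaml
kind: Ingress
apiVersion: networking.k8s.io/v1
metadata:
  # Placeholder name chosen for this sketch.
  name: crate-ingress
spec:
  rules:
    # Placeholder hostname. Replace it with a name that resolves to
    # your ingress controller.
  - host: crate.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            # Forward requests to the HTTP port of the external service
            # defined above.
            name: crate-external-service
            port:
              number: 4200
```

Ingress resources only route HTTP and HTTPS traffic, so this sketch does not expose port 5432 for PostgreSQL wire protocol clients.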

Controller

A Kubernetes pod is a group of one or more containers. Pods are designed to provide discrete units of functionality.

CrateDB nodes are self-contained, so we don’t need to use more than one container in a pod. We can configure our pods as a single container running CrateDB.

Pods are designed to be fungible computing units, meaning they can be created or destroyed at will. This, in turn, means that:

  • A cluster can be scaled in or out by destroying or creating pods
  • A cluster can be healed by replacing pods
  • A cluster can be rebalanced by rescheduling pods (i.e., destroying the pod on one Kubernetes node and recreating it on a new node)

However, CrateDB nodes that leave and then want to rejoin a cluster must retain their state. That is, they must continue to use the same name and must continue to use the same data on disk.

For this reason, we use the StatefulSet controller to define our cluster, which ensures that CrateDB nodes retain state across restarts or rescheduling.

The following configuration snippet defines a controller for a three-node CrateDB cluster:

kind: StatefulSet
apiVersion: "apps/v1"
metadata:
  # This is the name used as a prefix for all pods in the set.
  name: crate
spec:
  serviceName: "crate-set"
  # Our cluster has three nodes.
  replicas: 3
  selector:
    matchLabels:
      # The pods in this cluster have the `app:crate` app label.
      app: crate
  template:
    metadata:
      labels:
        app: crate
    spec:
      # InitContainers run before the main containers of a pod are
      # started, and they must terminate before the primary containers
      # are initialized. Here, we use one to set the correct memory
      # map limit.
      initContainers:
      - name: init-sysctl
        image: busybox
        imagePullPolicy: IfNotPresent
        command: ["sysctl", "-w", "vm.max_map_count=262144"]
        securityContext:
          privileged: true
      # This final section is the core of the StatefulSet configuration.
      # It defines the container to run in each pod.
      containers:
      - name: crate
        # Use the CrateDB 3.0.5 Docker image.
        image: crate:3.0.5
        # Pass in configuration to CrateDB via command-line options.
        # Notice that we are configuring CrateDB unicast host discovery
        # using the SRV records provided by Kubernetes.
        command:
          - /docker-entrypoint.sh
          - -Ccluster.name=${CLUSTER_NAME}
          - -Cdiscovery.zen.minimum_master_nodes=2
          - -Cdiscovery.zen.hosts_provider=srv
          - -Cdiscovery.srv.query=_crate-internal._tcp.crate-internal-service.${NAMESPACE}.svc.cluster.local
          - -Cgateway.recover_after_nodes=2
          - -Cgateway.expected_nodes=${EXPECTED_NODES}
          - -Cpath.data=/data
        volumeMounts:
          # Mount the `/data` directory as a volume named `data`.
        - mountPath: /data
          name: data
        resources:
          limits:
            # How much memory each pod gets.
            memory: 512Mi
        ports:
          # Port 4300 for inter-node communication.
        - containerPort: 4300
          name: crate-internal
          # Port 4200 for HTTP clients.
        - containerPort: 4200
          name: crate-web
          # Port 5432 for PostgreSQL wire protocol clients.
        - containerPort: 5432
          name: postgres
        # Environment variables passed through to the container.
        env:
          # This variable is detected by CrateDB.
        - name: CRATE_HEAP_SIZE
          value: "256m"
          # The rest of these variables are used in the command-line
          # options.
        - name: EXPECTED_NODES
          value: "3"
        - name: CLUSTER_NAME
          value: "my-crate"
        - name: NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
  volumeClaimTemplates:
    # Use persistent storage.
    - metadata:
        name: data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 1Gi

See also

CrateDB supports configuration via command-line options and node discovery via DNS.

Configure memory by hand for optimum performance.

You must set memory map limits correctly. Consult the bootstrap checks documentation for more information.

Persistent Volume

As mentioned in the Controller section, CrateDB containers must be able to retain state between restarts and rescheduling. You can achieve this with persistent volumes.

There are many different ways to provide persistent volumes, and so the specific configuration will depend on your setup.

Microsoft Azure

You can create a StorageClass for Azure Managed Disks with a configuration snippet like this:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  # The name chosen here can be used to create volume claims.
  name: azure-premium-managed-disk
  labels:
    storage-tier: premium
    volume-type: ssd
    addonmanager.kubernetes.io/mode: Reconcile
provisioner: kubernetes.io/azure-disk
parameters:
  kind: Managed
  storageaccounttype: Premium_LRS

You can then use this in your controller configuration with something like this:

[...]
  volumeClaimTemplates:
    - metadata:
        # This name must match the name used in the container's
        # volumeMounts (`data` in the controller example above).
        name: data
      spec:
        # This will create one 100 GB read-write Azure Managed Disk volume
        # for every CrateDB pod.
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: azure-premium-managed-disk
        resources:
          requests:
            storage: 100G
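The name of a volume claim template is what ties the claim to the container. The sketch below shows how the two pieces line up inside the StatefulSet, assuming the volume is called data as in the Controller section. It is an abbreviated illustration, not a complete controller definition.

```yaml
kind: StatefulSet
apiVersion: "apps/v1"
metadata:
  name: crate
spec:
  template:
    spec:
      containers:
      - name: crate
        volumeMounts:
          # `name` refers to the volume claim template below.
        - mountPath: /data
          name: data
  volumeClaimTemplates:
  - metadata:
      # Must be identical to the `name` used in `volumeMounts`.
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: azure-premium-managed-disk
      resources:
        requests:
          storage: 100G
  # Other required fields (serviceName, replicas, selector, image, and
  # so on) are omitted from this sketch.
```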