Run CrateDB on Docker

CrateDB and Docker are a great match thanks to CrateDB’s horizontally scalable shared-nothing architecture that lends itself well to containerization.

This document covers the essentials of running CrateDB on Docker.

Note

If you’re just getting started with CrateDB and Docker, check out our introductory guides for spinning up your first CrateDB instance.

See also

Guides for running CrateDB on Docker Cloud and Kubernetes.

The official CrateDB Docker image.


Quick start

Creating a cluster

To get started with CrateDB and Docker, we will create a three-node cluster on your dev machine. The cluster will run on a dedicated network and will require the first two nodes, crate01 and crate02, to vote on which one becomes the master. The third node, crate03, will simply join the cluster with no vote.

To create the user-defined network, run the command:

sh$ docker network create crate

You should then be able to see something like this:

sh$ docker network ls
NETWORK ID          NAME                DRIVER              SCOPE
1bf1b7acd66f        bridge              bridge              local
51cebbdf7d2b        crate               bridge              local
5b8e6fbe9ab6        host                host                local
8baa149b6986        none                null                local

Any CrateDB container put into the crate network will be able to resolve other CrateDB containers by name. Each container will run a single node, itself identified by node name. For our purposes, container crate01 will run cluster node crate01, container crate02 will run cluster node crate02 and container crate03 will run cluster node crate03.

You can then create your first CrateDB container/node, like this:

sh$ docker run --rm -d \
      --name=crate01 \
      --net=crate \
      -p 4201:4200 \
      --env CRATE_HEAP_SIZE=2g \
      crate -Cnetwork.host=_site_ \
            -Cnode.name=crate01 \
            -Cdiscovery.seed_hosts=crate02,crate03 \
            -Ccluster.initial_master_nodes=crate01,crate02 \
            -Cgateway.expected_nodes=3 -Cgateway.recover_after_nodes=2

Breaking the command down:

  • Creates and runs a container called crate01 (--name) in detached mode (-d). The container will automatically be removed on exit (--rm), and all its internal data will be lost. To avoid this, you can mount a dedicated volume (-v) for the container. Each container needs its own dedicated folder on your dev machine (see the Docker Compose section for reference)
  • Puts the container into the crate network and maps port 4201 on your host machine to port 4200 on the container (admin UI)
  • Defines the environment variable CRATE_HEAP_SIZE which is used by CrateDB to allocate 2G for its heap
  • Runs the command crate inside the container with parameters:
    • network.host: The _site_ value results in the binding of the CrateDB process to a site-local IP address, e.g. 192.168.0.1
    • node.name: Defining the node’s name as crate01 (used by master election)
    • discovery.seed_hosts: Lists the other hosts in the cluster. The format is a comma-separated list of host:port entries, where port defaults to the value of the transport.tcp.port setting. Each node must list all the other hosts of the cluster here. Because the nodes can be started in any order and at any time, you may see connection exceptions in the log files until all nodes are running and interconnected
    • cluster.initial_master_nodes: Defines the list of master-eligible node names that will participate in the vote for the first master (the first bootstrap). If this parameter is not defined, the node is expected to join an already formed cluster. Beyond the first election, this parameter has no effect
    • gateway.expected_nodes and gateway.recover_after_nodes: We have three nodes in the cluster and we want the cluster’s state to be recovered once all nodes are started

Note

If this command aborts with an error, consult the Troubleshooting section for help.

Check the node is up with docker ps and you should see something like this:

sh$ docker ps
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                                             NAMES
f79116373877        crate               "/docker-entrypoin..."   16 seconds ago      Up 15 seconds       4300/tcp, 5432-5532/tcp, 0.0.0.0:4201->4200/tcp   crate01

You can have a look at the container’s logs in tail mode like this:

sh$ docker logs -f crate01

Note

To exit the logs view, press Ctrl+C.

You can visit the admin UI in your browser with this URL:

http://localhost:4201/

Select the Cluster icon from the left-hand navigation, and you should see a page that lists a single node.

Let’s add the second node, crate02, to the cluster:

sh$ docker run --rm -d \
      --name=crate02 \
      --net=crate \
      -p 4202:4200 \
      --env CRATE_HEAP_SIZE=2g \
      crate -Cnetwork.host=_site_ \
            -Cnode.name=crate02 \
            -Cdiscovery.seed_hosts=crate01,crate03 \
            -Ccluster.initial_master_nodes=crate01,crate02 \
            -Cgateway.expected_nodes=3 -Cgateway.recover_after_nodes=2

Notice here that:

  • We updated the container and node name to crate02
  • We updated the port mapping, so that port 4202 on your host is mapped to 4200 on the container
  • We have set parameter discovery.seed_hosts to contain the other hosts of the cluster
  • cluster.initial_master_nodes: In our scenario, we only want nodes crate01 and crate02 to participate in the election of first master, thus we leave it unchanged
  • gateway.expected_nodes and gateway.recover_after_nodes: We have three nodes in the cluster and we want the cluster’s state to be recovered once all nodes are started

Now, if you go back to the admin UI you already have open, or visit the admin UI of the node you just created (located at http://localhost:4202/) you should see two nodes.
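If you prefer the command line to the admin UI, the node count can also be checked over HTTP. The following is a minimal sketch, assuming the port mapping used above (4201) and CrateDB's /_sql HTTP endpoint; the CRATE_URL variable and the function names sql_payload and run_sql are hypothetical helpers introduced for illustration. The simple quoting only works for statements that contain no double quotes.

```shell
#!/bin/sh
# Sketch: ask a node how many nodes it sees, via the HTTP endpoint.
# CRATE_URL and the function names are illustrative assumptions.
CRATE_URL="${CRATE_URL:-http://localhost:4201}"

# Wrap the SQL statement $1 in the JSON body the /_sql endpoint expects.
# Note: $1 must not contain double quotes, since no escaping is done.
sql_payload() {
    printf '{"stmt": "%s"}\n' "$1"
}

# Send the statement $1 to the node and print the raw JSON response.
run_sql() {
    curl -sS -H 'Content-Type: application/json' \
        -X POST "$CRATE_URL/_sql" -d "$(sql_payload "$1")"
}

# Example (needs the containers from this section to be running):
#   run_sql "SELECT count(*) FROM sys.nodes"
```

Nothing is sent until run_sql is called, so the block can be sourced safely into an interactive shell first.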

You can now add crate03 like this:

sh$ docker run --rm -d \
      --name=crate03 \
      --net=crate \
      -p 4203:4200 \
      --env CRATE_HEAP_SIZE=2g \
      crate -Cnetwork.host=_site_ \
            -Cnode.name=crate03 \
            -Cdiscovery.seed_hosts=crate01,crate02 \
            -Cgateway.expected_nodes=3 -Cgateway.recover_after_nodes=2

Notice here that:

  • We updated the container and node name to crate03
  • We updated the port mapping, so that port 4203 on your host is mapped to 4200 on the container
  • We have set parameter discovery.seed_hosts to contain the other hosts of the cluster
  • cluster.initial_master_nodes: In our scenario, we only want nodes crate01 and crate02 to participate in the election of first master, thus we don’t need this setting
  • gateway.expected_nodes and gateway.recover_after_nodes: We have three nodes in the cluster and we want the cluster’s state to be recovered once all nodes are started

Success! You just created a three-node CrateDB cluster with Docker.
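The three docker run commands differ only in the node name, the host port, the seed host list, and whether cluster.initial_master_nodes is passed, so they can be generated in a loop. The following is a sketch rather than an official script: the NODES and MASTERS variables and the helpers seed_hosts_for and print_run_command are hypothetical, the same gateway settings are assumed on every node, and the commands are only printed, not executed.

```shell
#!/bin/sh
# Sketch: print (not run) the docker commands for the three-node quick start.
# NODES, MASTERS and the helper names are illustrative assumptions.
NODES="crate01 crate02 crate03"
MASTERS="crate01,crate02"

# Print every node except $1 as a comma-separated list.
seed_hosts_for() {
    seeds=""
    for n in $NODES; do
        [ "$n" = "$1" ] && continue
        seeds="${seeds:+$seeds,}$n"
    done
    printf '%s\n' "$seeds"
}

# Print the docker run command for node $1, published on host port $2.
print_run_command() {
    cmd="docker run --rm -d --name=$1 --net=crate -p $2:4200"
    cmd="$cmd --env CRATE_HEAP_SIZE=2g crate -Cnetwork.host=_site_"
    cmd="$cmd -Cnode.name=$1 -Cdiscovery.seed_hosts=$(seed_hosts_for "$1")"
    # Only the nodes voting in the first election get this setting.
    case ",$MASTERS," in
        *",$1,"*) cmd="$cmd -Ccluster.initial_master_nodes=$MASTERS" ;;
    esac
    cmd="$cmd -Cgateway.expected_nodes=3 -Cgateway.recover_after_nodes=2"
    printf '%s\n' "$cmd"
}

port=4201
for node in $NODES; do
    print_run_command "$node" "$port"
    port=$((port + 1))
done
```

Printing the commands first makes it easy to review them before piping the output to sh.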

Note

This is only a quick start example and you will notice some failing checks in the admin UI. For a cluster that you intend to use seriously, you should, at the very least, configure the Metadata Gateway and Discovery mechanisms.

Troubleshooting

The most common issue when running CrateDB on Docker is a failing bootstrap check because the memory map limit is too low. This can be adjusted on the host system.
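On a Linux host, the limit in question is the vm.max_map_count kernel setting. The sketch below compares the current value with 262144, which is assumed here to be the minimum the bootstrap check expects (confirm the value in the Bootstrap Checks documentation for your CrateDB version), and prints the sysctl command that would raise it. The function name mmap_limit_ok is hypothetical.

```shell
#!/bin/sh
# Sketch: check the host's memory map limit before starting CrateDB.
# REQUIRED is an assumed minimum; confirm it for your CrateDB version.
REQUIRED=262144

# Succeeds when the limit given as $1 is at least REQUIRED.
mmap_limit_ok() {
    [ "$1" -ge "$REQUIRED" ]
}

# Linux exposes the current limit here; fall back to 0 elsewhere.
current=$(cat /proc/sys/vm/max_map_count 2>/dev/null || echo 0)

if mmap_limit_ok "$current"; then
    echo "vm.max_map_count=$current is sufficient"
else
    echo "vm.max_map_count=$current is too low; as root, run:"
    echo "  sysctl -w vm.max_map_count=$REQUIRED"
fi
```

A value set with sysctl -w lasts until the next reboot; making it permanent depends on how your distribution manages sysctl configuration.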

If the limit cannot be adjusted on the host system, the memory map limit check can be bypassed by passing the -Cnode.store.allow_mmapfs=false option to the crate command:

sh$ docker run -d --name=crate01 \
      --net=crate -p 4201:4200 --env CRATE_HEAP_SIZE=2g \
      crate -Cnetwork.host=_site_ \
            -Cnode.store.allow_mmapfs=false

Caution

This will result in degraded performance.

You can also start a single node without any bootstrap checks by passing the -Cdiscovery.type=single-node option:

sh$ docker run -d --name=crate01 \
      --net=crate -p 4201:4200 \
      --env CRATE_HEAP_SIZE=2g \
      crate -Cnetwork.host=_site_ \
            -Cdiscovery.type=single-node

Note

This means that the node cannot form a cluster with any other nodes.

Taking it further

CrateDB settings are set using the -C flag, per the above examples.

Check out the Docker docs for more Docker specific features CrateDB can leverage.

CrateDB Shell

The CrateDB Shell, crash, is bundled with the Docker image.

To run crash inside the user-defined network called crate and connect to the three hosts named crate01, crate02, and crate03 (i.e. the example covered in the Creating a cluster section), you could run:

$ docker run --rm -ti \
    --net=crate crate \
    crash --hosts crate01 crate02 crate03

Docker Compose

Docker’s Compose tool allows developers to configure complex Docker-based applications that can then be started with one docker-compose up command.

Read about Docker Compose specifics here.

crate01:
  image: crate
  container_name: crate01
  hostname: crate01
  ports:
    - 4201:4200
  net: crate
  volumes:
    - /tmp/crate/01:/data
  command: >
    crate -Cnetwork.host=_site_
    -Cdiscovery.seed_hosts=crate02,crate03
    -Ccluster.initial_master_nodes=crate01,crate02,crate03
    -Cgateway.expected_nodes=3 -Cgateway.recover_after_nodes=2
  environment:
    - CRATE_HEAP_SIZE=2g

crate02:
  image: crate
  container_name: crate02
  hostname: crate02
  ports:
    - 4202:4200
  net: crate
  volumes:
    - /tmp/crate/02:/data
  command: >
    crate -Cnetwork.host=_site_
    -Cdiscovery.seed_hosts=crate01,crate03
    -Ccluster.initial_master_nodes=crate01,crate02,crate03
    -Cgateway.expected_nodes=3 -Cgateway.recover_after_nodes=2
  environment:
    - CRATE_HEAP_SIZE=2g

crate03:
  image: crate
  container_name: crate03
  hostname: crate03
  ports:
    - 4203:4200
  net: crate
  volumes:
    - /tmp/crate/03:/data
  command: >
    crate -Cnetwork.host=_site_
    -Cdiscovery.seed_hosts=crate01,crate02
    -Ccluster.initial_master_nodes=crate01,crate02,crate03
    -Cgateway.expected_nodes=3 -Cgateway.recover_after_nodes=2
  environment:
    - CRATE_HEAP_SIZE=2g

In this example, we create three CrateDB instances with the ports manually allocated, a file system volume per instance, an environment variable set for the heap size, a dedicated network and a set of configuration parameters (-C). In this case, the start order of the containers is not deterministic and we want all three containers to be up and running before the election for the master is performed.

Best Practices

One container per host

For performance reasons, we strongly recommend that you only run one container per host machine.

If you are running one container per machine, you can map the container ports to the host ports so that the host acts like a native installation. You can do that like so:

$ docker run -d -p 4200:4200 -p 4300:4300 -p 5432:5432 crate \
    crate -Cnetwork.host=_site_

Persistent data directory

Docker containers are ephemeral, meaning that containers are expected to come and go, and any data inside them is lost when the container goes away. For this reason, it is required that you mount a persistent data directory on your host machine to the /data directory inside the container, like so:

$ docker run -d -v /srv/crate/data:/data crate \
    crate -Cnetwork.host=_site_

Here, /srv/crate/data is an example path, and should be replaced with the path to your host machine data directory.

See the Docker volume documentation for more help.

Custom configuration

If you want to use a custom configuration, it is strongly recommended that you mount configuration files on the host machine to the appropriate path inside the container. That way, your configuration isn’t lost when the container goes away.

Here’s how you would mount crate.yml:

$ docker run -d \
    -v /srv/crate/config/crate.yml:/crate/config/crate.yml crate \
    crate -Cnetwork.host=_site_

Here, /srv/crate/config/crate.yml is an example path, and should be replaced with the path to your host machine crate.yml file.
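As an illustration of what such a file might contain, the sketch below writes a minimal crate.yml whose settings mirror the -C flags from the quick start. The CONFIG_DIR variable and its default location are assumptions made for this example, and the settings shown are only a starting point, not a recommended production configuration.

```shell
#!/bin/sh
# Sketch: write a minimal crate.yml to mount into the container.
# CONFIG_DIR is an example location; the settings mirror the -C flags
# used in the quick start section.
CONFIG_DIR="${CONFIG_DIR:-./crate-config}"
mkdir -p "$CONFIG_DIR"

cat > "$CONFIG_DIR/crate.yml" <<'EOF'
network.host: _site_
node.name: crate01
discovery.seed_hosts: crate02,crate03
EOF

echo "wrote $CONFIG_DIR/crate.yml"
```

When mounting the file with -v, pass its absolute path on the host, as in the docker run example above.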

Troubleshooting

The official CrateDB Docker image ships with a liveness healthcheck configured.

This healthcheck flags a problem when the CrateDB process has crashed or hung inside the container without terminating.

If you use Docker Swarm and you are experiencing trouble starting your Docker containers, try to deactivate the health check.

You can do that by editing your Docker Stack YAML file:

healthcheck:
  disable: true
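Before disabling the health check, it can help to see what Docker currently reports for a container. The sketch below wraps docker inspect with a Go template that reads the container's health status. The DOCKER variable and the health_status function name are assumptions, added so the command can be swapped out on machines without a Docker daemon.

```shell
#!/bin/sh
# Sketch: print the health status Docker reports for a container.
# DOCKER is overridable so the command can be stubbed without a daemon.
DOCKER="${DOCKER:-docker}"

# Print the health status of the container named $1.
health_status() {
    "$DOCKER" inspect --format '{{.State.Health.Status}}' "$1"
}

# Example (requires a running container named crate01):
#   health_status crate01
```

Docker reports one of starting, healthy, or unhealthy for containers that have a health check configured.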

Resource constraints

It is very important that you set resource constraints when you are running CrateDB inside Docker. Older versions of the JVM are unable to detect that they are running inside a container, and as a result, the detected limits are wrong.

Bootstrap checks

When run with Docker, CrateDB binds by default to any site-local IP address on the system, e.g. 192.168.0.1. As a result, CrateDB performs a number of checks during bootstrap. The settings listed in Bootstrap Checks must be addressed on the Docker host system to start CrateDB successfully and before going to production.

Memory

You must calculate and explicitly set the maximum memory the container can use. This is dependent on your host system, and typically, you want to make this as high as possible.

You must then calculate the appropriate heap size (typically half the container memory limit, see CRATE_HEAP_SIZE for details) and pass this to CrateDB, which in turn passes it to the JVM.
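The arithmetic can be sketched as a small helper. This is an illustration only: the function names heap_size_mb and memory_flags are hypothetical, the memory limit is assumed to be given in MiB, and the 50% ratio is the rule of thumb mentioned above rather than a hard requirement.

```shell
#!/bin/sh
# Sketch: derive CRATE_HEAP_SIZE from a container memory limit in MiB.
# The 50% ratio is the rule of thumb from the text, not a hard rule.
heap_size_mb() {
    echo $(( $1 / 2 ))
}

# Print matching docker run flags for a memory limit of $1 MiB.
memory_flags() {
    echo "--memory ${1}m --env CRATE_HEAP_SIZE=$(heap_size_mb "$1")m"
}

# Example: flags for a container limited to 2048 MiB.
memory_flags 2048
```

The printed flags can be pasted into a docker run command in place of hand-calculated values.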

Don’t worry about configuring swap. CrateDB does not use swap.

CPU

You must calculate and explicitly set the maximum number of CPUs the container can use. This is dependent on your host system, and typically, you want to make this as high as possible.

Combined configuration

If the container should use a maximum of 1.5 CPUs and a maximum of 2 GB memory, and if the appropriate heap size should be 1 GB, you could configure everything at once like so:

$ docker run -d \
    --cpus 1.5 \
    --memory 2g \
    --env CRATE_HEAP_SIZE=1g \
    crate \
    crate -Cnetwork.host=_site_