CrateDB scaling introduction

CrateDB is designed to be easy to scale, and adding a new node to a cluster is as simple as starting your first CrateDB instance.

If you have followed any of our installation guides, you likely now have one CrateDB instance running with some test data. For the sake of this tutorial, we will assume it’s the public Twitter data.

Here is my one-node CrateDB cluster. It does not hold a large amount of data, but it’s running in a Docker container on my local machine, which is enough for demonstration purposes.

Open the CrateDB admin UI at SERVER_IP:4200/admin and you should see something like the following. Note the NODES: 1 indicator.

A 1 Node Cluster
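
If you would rather check the node count from a script than from the admin UI, one option is to query the sys.nodes system table over CrateDB's HTTP endpoint. The following is a minimal Python sketch under a few assumptions: the endpoint is reachable at localhost:4200/_sql (swap in your SERVER_IP), and the helper names build_request, node_names and list_nodes are made up for this example.

```python
import json
import urllib.request

# Assumption: the HTTP endpoint shares port 4200 with the admin UI.
CRATE_URL = "http://localhost:4200/_sql"


def build_request(stmt, url=CRATE_URL):
    """Build a POST request that carries one SQL statement as JSON."""
    body = json.dumps({"stmt": stmt}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )


def node_names(response):
    """Extract the node names from a decoded reply to the sys.nodes query."""
    return sorted(row[0] for row in response["rows"])


def list_nodes(url=CRATE_URL):
    """Ask the cluster which nodes it currently knows about (needs a live cluster)."""
    request = build_request("SELECT name FROM sys.nodes", url)
    with urllib.request.urlopen(request, timeout=10) as reply:
        return node_names(json.load(reply))
```

Against the one-node cluster above, len(list_nodes()) should match the number shown by the NODES: indicator.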

Let’s add a second CrateDB node to the cluster. The exact steps depend on how you installed CrateDB: to start a new CrateDB instance, run the same command you used to start your first. For example:


With Docker:

docker run -P -d crate

With a service-based installation:

sudo service crate start

Find out how to start nodes with other installations.

Within a couple of seconds, the second node will have joined the cluster and the NODES: count should have increased to 2. Depending on the complexity and quantity of your data and on your hardware setup, you will also notice the cluster state change to yellow momentarily as it rebalances and resynchronizes data across the cluster.

A 2 Node Cluster
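
When you script these steps, it can help to wait until the new node has actually joined before moving on. The sketch below polls the sys.nodes row count until it reaches the expected value. It is an illustration rather than an official tool: the URL, the function names fetch_node_count and wait_for_nodes, and the timeout values are all assumptions.

```python
import json
import time
import urllib.request

# Assumption: CrateDB's HTTP endpoint on the default port of a local node.
CRATE_URL = "http://localhost:4200/_sql"


def fetch_node_count(url=CRATE_URL):
    """Return the number of rows in sys.nodes (one row per cluster node)."""
    body = json.dumps({"stmt": "SELECT count(*) FROM sys.nodes"}).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=10) as reply:
        return json.load(reply)["rows"][0][0]


def wait_for_nodes(expected, fetch=fetch_node_count, timeout=60.0,
                   interval=1.0, sleep=time.sleep):
    """Poll until at least `expected` nodes have joined, else raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while True:
        count = fetch()
        if count >= expected:
            return count
        if time.monotonic() >= deadline:
            raise TimeoutError(f"only {count} of {expected} nodes joined")
        sleep(interval)
```

For example, wait_for_nodes(2) after starting the second instance returns once both nodes are visible. It only checks membership, so the cluster may still be rebalancing when it returns.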

Add another node to the cluster using the same steps as above. You will notice the same happening as before, but this time quickly switch to the TABLES tab after adding the third node. Here you can see that CrateDB also provides information about how data is rebalancing at a table and shard level as well as at a cluster level.

A 3 Node Cluster
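
The shard-level view in the TABLES tab can also be approximated with a query against the sys.shards system table. The sketch below counts the shards of one table per node and routing state. It assumes the column names table_name, node['name'] and routing_state, as found in recent CrateDB releases (older versions may expose the node differently). The URL and the helper names fetch_shard_rows and shards_by_node are hypothetical.

```python
import json
import urllib.request
from collections import Counter

# Assumption: CrateDB's HTTP endpoint on the default port of a local node.
CRATE_URL = "http://localhost:4200/_sql"

# Assumption: these sys.shards columns exist in your CrateDB version.
SHARD_QUERY = "SELECT table_name, node['name'], routing_state FROM sys.shards"


def fetch_shard_rows(url=CRATE_URL):
    """Fetch one row per shard copy from the cluster (needs a live cluster)."""
    body = json.dumps({"stmt": SHARD_QUERY}).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=10) as reply:
        return json.load(reply)["rows"]


def shards_by_node(rows, table):
    """Count the shards of `table` per (node name, routing state) pair."""
    return Counter(
        (node, state) for name, node, state in rows if name == table
    )
```

Calling shards_by_node(fetch_shard_rows(), "tweets") a few times while the third node joins should show shards moving from states such as RELOCATING or INITIALIZING to STARTED.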

Try removing a node from the cluster by killing a CrateDB process or stopping a Docker container with docker stop container_id. You will notice that the cluster again rebalances and resynchronizes data across the remaining nodes.

Try running a simple query on the cluster, such as:

SELECT count(*) AS quantity, user['verified']
FROM tweets
GROUP BY user['verified']
ORDER BY quantity DESC LIMIT 100;

Add nodes to the cluster and remove them again, issuing the same query each time. You should notice no change in the result and very little difference in speed.
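
To make that comparison less of an eyeball exercise, you could run the query from a script, time it, and compare the rows between runs. This is a rough sketch, assuming the HTTP endpoint at localhost:4200/_sql. The names run_query and same_rows are invented for the example, and the timing is wall-clock time measured on the client, so it includes network overhead.

```python
import json
import time
import urllib.request
from collections import Counter

# Assumption: CrateDB's HTTP endpoint on the default port of a local node.
CRATE_URL = "http://localhost:4200/_sql"

# The query from this tutorial, unchanged apart from whitespace.
QUERY = """
SELECT count(*) AS quantity, user['verified']
FROM tweets
GROUP BY user['verified']
ORDER BY quantity DESC LIMIT 100
"""


def run_query(stmt=QUERY, url=CRATE_URL):
    """Run a statement and return (rows, elapsed seconds); needs a live cluster."""
    body = json.dumps({"stmt": stmt}).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    started = time.perf_counter()
    with urllib.request.urlopen(request, timeout=30) as reply:
        rows = json.load(reply)["rows"]
    return rows, time.perf_counter() - started


def same_rows(first, second):
    """Compare two result sets while ignoring row order (ties may reorder)."""
    return Counter(map(tuple, first)) == Counter(map(tuple, second))
```

A typical session would call run_query() once per cluster size, keep the rows from the first run, and check each later run with same_rows while noting the elapsed time.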

Related links