CrateDB provides an easy way to perform a cluster upgrade with zero downtime.
Zero downtime upgrades are only possible if you are using a stable version of CrateDB and are upgrading to a new patch version (see versions).
If you are upgrading to a testing version or are upgrading to a new major version or a new minor version, you must perform a full cluster restart.
To perform a rolling upgrade of a cluster, one node at a time has to be stopped using the graceful stop procedure. This procedure disables the node, causing it to reject any new requests while making sure that any pending requests are finished.
Due to the distributed execution of requests, some client requests might fail during a zero downtime upgrade.
This process ensures a certain minimum data availability, which can be configured through the cluster.graceful_stop.min_availability setting using SET / RESET:

- full: all shards currently located on the node will be moved to the other nodes before the node stops. Using this setting, the cluster will stay green the whole time.

- primaries: only the primary shards will be moved to other nodes. Using this setting, the cluster will go into the yellow warning state if a node that has been stopped contained replicas that are then unavailable.

- none: there is no data availability guarantee. The node will stop, possibly leaving the cluster in the critical red state if the node contained a primary shard that has no replicas that can take over.
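For example, to keep the cluster green during a rolling upgrade, the setting can be changed at runtime and reset afterwards. A minimal sketch, assuming the TRANSIENT scope is acceptable for the duration of the upgrade:

cr> SET GLOBAL TRANSIENT "cluster.graceful_stop.min_availability" = 'full';

cr> RESET GLOBAL "cluster.graceful_stop.min_availability";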
If you are upgrading from 0.54 to 0.55, please read the accompanying notes for the 0.55 release.
If full minimum data availability is configured, the cluster needs to contain enough nodes to hold the configured number of replicas even if one node is missing.
For example, if there are only two nodes in a cluster and a table has one replica configured, the graceful stop procedure will not succeed and will abort, as it won't be possible to relocate the replicas.
If a table has a range configured as its number of replicas, the upper bound of that range is taken into account. So, with two nodes and 0-1 replicas, the graceful stop procedure will also abort.
In short: for the full graceful stop to work, the following has to be true:

number_of_nodes > max_number_of_replicas + 1
If primaries minimum data availability is used, take care that there are still enough replicas in the cluster after a node has been stopped so that write consistency can be guaranteed. By default, write and delete operations only succeed if a quorum (> replicas / 2 + 1) of active shards is available; with 2 configured replicas, for example, this quorum is 2 active shards. If only 1 replica is configured, one active shard suffices for write and delete operations to succeed.
If a data node is stopped, CrateDB will wait up to one minute before replicas are re-assigned.
This is done to prevent unnecessary redistribution across the cluster in case the node comes back online quickly. Once a node that holds replicas comes back online, they are synced with the primaries to get up to date again.
If for some reason an upgrade requires a longer node downtime, consider increasing the one-minute timeout by changing the unassigned.node_left.delayed_timeout setting using SET / RESET.
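For illustration, a hedged sketch of raising the timeout to five minutes. The setting name is taken from the text above, but whether it is exposed at the cluster or table level may differ between CrateDB versions, so consult the settings reference for your version:

cr> SET GLOBAL TRANSIENT "unassigned.node_left.delayed_timeout" = '5m';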
Before upgrading, you should back up your data.
The crate process supports multiple signals (see Signal Handling) that invoke different shutdown procedures. To stop a CrateDB process gracefully, you need to send the USR2 signal:
sh$ kill -USR2 `cat /path/to/crate.pid`
If your system uses SysVinit (e.g. Debian Wheezy), you can also use:

sh$ service crate graceful-stop
Depending on the size of your cluster this might take a while. You might want to check your server logs to see if the graceful stop process is progressing well. In case of an error or a timeout, the node will stay up, signaling the error in its log files (or wherever you put your log messages).
Using the default settings the node will shut down by moving all primary shards off the node first. This will ensure that no data is lost. However, the cluster health will most likely turn yellow, because replicas that lived on that node will be missing.
If you want to ensure green health, you need to change the cluster.graceful_stop.min_availability setting to full. This will move all shards off the node before shutting down.
Keep in mind that reallocating shards might take some time depending on the number of shards and the amount and size of records (and/or blob data). For that reason you should set the timeout setting to a reasonable value. By default, the shutdown process aborts once this timeout is exceeded and the cluster will start distributing shards evenly again. If you want to force a shutdown after the timeout, even if the reallocation is not finished, you can set the force setting to true.
A forced stop does not ensure the minimum data availability defined in the settings and may result in temporary or even permanent loss of data!
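For example, both settings can be adjusted at runtime; the four-hour value below is purely illustrative:

cr> SET GLOBAL TRANSIENT "cluster.graceful_stop.timeout" = '4h';

cr> SET GLOBAL TRANSIENT "cluster.graceful_stop.force" = true;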
With cluster.graceful_stop.min_availability=full, there have to be enough nodes in the cluster to move the shards to, or else the graceful shutdown procedure will fail!
For example, if there are 4 nodes and 3 configured replicas, there will not be enough nodes left to fulfill the required number of replicas once one node is stopped.
Also, if there is not enough disk space on the other nodes to move the shards to, the graceful stop procedure will fail.
You can also set the reallocate setting to false if you want to ensure that a node cannot be stopped if the cluster would need to move shards off it. This can prevent long waiting times until the shutdown process runs into the timeout during reallocation.
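A minimal sketch, assuming the reallocate setting lives under the same cluster.graceful_stop prefix as the settings above:

cr> SET GLOBAL TRANSIENT "cluster.graceful_stop.reallocate" = false;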
By default, only the graceful stop command considers the cluster settings described at Graceful Stop.
If you want to observe the reallocation process triggered by a primaries graceful stop, you can issue the following SQL queries regularly.
Get the number of shards remaining on your deallocating node:
cr> SELECT count(*) AS remaining_shards FROM sys.shards
... WHERE _node['name'] = 'your_node_name';
+------------------+
| remaining_shards |
+------------------+
|                0 |
+------------------+
SELECT 1 row in set (... sec)
Get some more details about what shards are remaining on your node:
cr> SELECT schema_name AS schema, table_name AS "table", id, "primary", state
... FROM sys.shards
... WHERE _node['name'] = 'your_node_name' AND schema_name IN ('blob', 'doc')
... ORDER BY schema, "table", id, "primary", state;
+--------+-------+----+---------+-------+
| schema | table | id | primary | state |
+--------+-------+----+---------+-------+
...
SELECT ... rows in set (... sec)
In the case of primaries availability, only the primary shards of tables with zero replicas will be reallocated. Use this query to find out which shards to look for:
cr> SELECT table_schema AS schema, table_name AS "table"
... FROM information_schema.tables
... WHERE number_of_replicas = 0 AND table_schema IN ('blob', 'doc')
... ORDER BY schema, "table";
+--------+-------...+
| schema | table ...|
+--------+-------...+
...
+--------+-------...+
SELECT ... rows in set (... sec)
If you observe the graceful stop process using the admin UI, you might see the cluster turn red for a brief moment when a node finally shuts down. This is due to the way the admin UI determines the cluster state.
If a query fails due to a missing node, the admin UI may falsely consider the cluster to be in a critical state.
After the node is stopped, you can safely upgrade your CrateDB installation. Depending on your installation and operating system, you can do this by downloading the latest tarball or by using the package manager.
Example for RHEL/YUM:
sh$ yum update -y crate
If you are in doubt how to upgrade an installed package, please refer to the man pages of your operating system or package manager.
Once the upgrade process is completed, you can start the CrateDB process again, either by invoking the bin/crate executable from the tarball directly:
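sh$ ./bin/crate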
Or using the service manager of your operating system.
Example for RHEL/YUM:
sh$ service crate start
Repeat the graceful stop, upgrade, and start steps for each remaining node.