Crate provides an easy way to perform a cluster upgrade with zero downtime.
To perform a rolling upgrade of a cluster, one node at a time has to be stopped
graceful stop procedure.
This procedure will disable a node, which will cause it to reject any new requests but will make sure that any pending requests are finished.
feature or major versions are often not compatible and upgrading requires a full cluster restart.
The exception are hotfix updates which are always safe. For example updating from 0.48.1 to 0.48.2 or even directly to 0.48.3 is safe.
Most of the time incompatibility between feature or major versions is caused by changes in the binary serialization format which is used for communication between the nodes but might also be caused by other changes like lucene index format changes.
Please check the changelog to see if a feature or major version upgrade is safe. If there is no note indicating that the zero downtime upgrade is safe it should be considered unsafe.
Nonetheless, due to the distributed execution of requests, some client requests might fail during this procedure.
This process will ensure a certain data availability. Which can either be
full and can be configured using SET / RESET.
full, all shards currently located on the node
will be moved to the other nodes in order to stop gracefully.
Using this setting the cluster will stay
primaries, only the primaries will be moved to other nodes.
Using this setting means that the cluster will go into the
warning state if a node that has been stopped contained replicas
that are then unavailable.
none, there is no data-availability guarantee.
The node will stop, possibly leaving the cluster in the critical
if the node contained a primary that has no replicas that can take over.
full minimum data availability is configured the cluster needs to
contain enough nodes to hold the number of replicas that are configured even if
one node is missing.
So for example if there are only two nodes in a cluster and a table has one
replica configured the
graceful stop procedure will not succeed
and abort as it won’t be possible to relocate the replicas.
If a table has a range configured as number of replicas this will take into account the upper number of replicas.
So, with two nodes and a number of replicas of ‘0-1’ will cause the
graceful stop procedure to abort.
In short: In order for the
full graceful stop to work the following has to be true:
number_of_nodes > max_number_of_replicas + 1
primaries minimum data availability is used, take care
that there are still enough replicas in the cluster after a node has been
stopped so that a write-consistency can be guaranteed.
By default write or delete operations only succeed if a quorum ( > replicas / 2 + 1 ) of active shards is available.
If only 1 replica is configured one active shard suffices in order for write and delete operations to succeed.
If a data node is stopped, Crate will wait up to 1 minute before replicas are re-assigned. This is done to prevent unnecessary re-distribution among the cluster in case the node comes back online. Once a node that hold replicas comes back online they’re synced with the primaries to get up to date again.
If for some reason an upgrade requires a longer node downtime, consider increasing the 1 minute timeout by changing the unassigned.node_left.delayed_timeout setting using SET / RESET.
crate process supports multiple signals (see Signal Handling) that
invoke different shutdown procedures. To stop a Crate process gracefully you
need to send the
sh$ kill -USR2 `cat /path/to/crate.pid`
If your system uses SysVinit (e.g. Debian Wheezy), you can also use
service crate graceful-stop.
Depending on the size of your cluster this might take a while. You might want to check your server logs to see if the graceful stop process is progressing well. In case of an error or a timeout, the node will stay up, signaling the error in its log files (or wherever you put your log messages).
Using the default settings the node will shut down by moving all primary shards off the node first. This will ensure that no data is lost. However, the cluster health will most likely turn yellow, because replicas that lived on that node will be missing.
If you want to ensure green health, you need to change the
full. This will move all shards off the node before shutting down.
Keep in mind that reallocating shards might take some time depending
on the number of shards and the amount and size of records (and/or blob data).
For that reason you should set the
timeout setting to a reasonable
time. By default the shutdown process aborts and the cluster will start
distributing shards evenly again. If you want to force a shutdown after
the timeout, even if the reallocating is not finished, you can set the
force setting to
A forced stop does not ensure the minimum data availability defined in the settings and may result in temporary or even permanent loss of data!
cluster.graceful_stop.min_availability=full there have to be
enough nodes in the cluster to move shards or else the graceful shutdown
procedure will fail!
For example, if there are 4 nodes and 3 configured replicas, there will not be
enough nodes to to fulfill the required replicas.
Also, if there is not enough disk space on other nodes to move the shards to the graceful stop procedure will fail.
You can also set the
reallocate setting to
false if you want to ensure
that a node cannot be stopped if the cluster would need to move shards off the
node. This can prevent long waiting times until the shutdown process runs
into timeout during reallocation.
By default, only the
graceful stop command considers the cluster settings
described at Graceful Stop.
If you want to observe the reallocation process triggered by a
primaries graceful-stop, you can issue the following sql queries regularly.
Get the number of shards remaining on your deallocating node:
cr> SELECT count(*) as remaining_shards from sys.shards ... where _node['name'] = 'your_node_name'; +------------------+ | remaining_shards | +------------------+ | 0 | +------------------+ SELECT 1 row in set (... sec)
Get some more detail about what shards are remaining on your node:
cr> SELECT schema_name as schema, table_name as "table", id, "primary", state ... FROM sys.shards ... WHERE _node['name'] = 'your_node_name' AND schema_name IN ('blob', 'doc') ... ORDER BY schema, "table", id, "primary", state; +--------+-------+----+---------+-------+ | schema | table | id | primary | state | +--------+-------+----+---------+-------+ ... SELECT ... rows in set (... sec)
In case of
primaries availability only the primary shards of tables with
0 replicas will be reallocated. Use this query to find out which shards to look for:
cr> SELECT schema_name as schema, table_name as "table" ... FROM information_schema.tables ... WHERE number_of_replicas = 0 and schema_name in ('blob', 'doc') ... ORDER BY schema, "table" ; +--------+-------...+ | schema | table ...| +--------+-------...+ ... +--------+-------...+ SELECT ... rows in set (... sec)
If you observe the graceful-stop process using the admin-ui, you might see the cluster turning red for a small instant when a node finally shuts down. This is due to the way the admin ui determines the cluster state. If a query fails due to a missing node, the admin-ui may falsely consider the cluster to be in a critical state.
After the node is stopped you can safely upgrade your Crate installation. Depending on your installation and operating system you can do it by downloading the latest tarball or just use the package manager.
Example for RHEL/YUM:
$sh yum update -y crate
If you are in doubt how to upgrade an installed package, please refer to the man pages of your operating system / package manager.
Once the upgrade process is completed you can start the Crate process again by either invoking the bin/crate executable from the tarball directly:
or using the service manager of your operating system.
Example for RHEL/YUM:
sh$ service crate start
Repeat step 2, 3 and 4 for all other nodes.
Even the safest upgrade process might go wrong at some point. So make sure you have your data backed up!