Optimization

Table of contents

Introduction

In CrateDB every table (or if partitioned every partition) consists of segments. When inserting/deleting/updating data new segments are created following as an append-only strategy, which gives the advantage of fast writes but on the other hand can result into a big number of segments. As the number of segments increases the read operations become slower since more segments need to be visited. Moreover each segment consumes file handles, memory and CPU. CrateDB solves this problem by merging segments automatically in the background. Small segments are merged into bigger segments, which, in turn, are merged into even bigger segments. Furthermore any deleted rows and documents are not copied to the new bigger segment during this process.

If required one or more tables or table partitions can be optimized explicitly in order to improve performance. A few parameters can also be configured for the optimization process, like the max number of segments you wish to have when optimization is completed, or if you only wish to merge segments with deleted data, etc. See OPTIMIZE for detailed description of parameters.

cr> OPTIMIZE table locations;
OPTIMIZE OK, 1 row affected (... sec)

Note

System tables cannot be optimized.

Multiple table optimization

If needed, multiple tables can be defined comma-separated in a single SQL request. The result message is printed if the request on every given table is completed.

cr> OPTIMIZE TABLE locations, parted_table;
OPTIMIZE OK, 2 rows affected (... sec)

Note

If one or more tables or partitions do not exist, none of the given tables/partitions are optimized and an error is returned. The error returns only the first non-existent table/partition.

Partition optimization

Additionally it is possible to define a specific PARTITION of a partitioned table which should be optimized (see Partitioned tables).

By using the PARTITION clause in the optimize statement a separate request for a given partition can be performed. That means that only specific partitions of a partitioned table are optimized. For further details on how to create an optimize request on partitioned tables see the SQL syntax and its synopsis (see OPTIMIZE).

cr> OPTIMIZE TABLE parted_table PARTITION (day='2014-04-08');
OPTIMIZE OK, 1 row affected (... sec)

In case the PARTITION clause is omitted all partitions will be optimized. If a table has many partitions this should be avoided due to performance reasons.

Segments upgrade

In case that some or all of the segments of a table or a table partition are created with an older version of the storage engine, then with the use of OPTIMIZE, these segments can be upgraded to the current version of the storage engine.

cr> OPTIMIZE TABLE locations WITH (upgrade_segments=true);
OPTIMIZE OK, 1 row affected (... sec)