Version 3.0.0

Released on 2018/05/16.

Note

If you are upgrading a cluster, you must be running CrateDB 2.0.4 or higher before you upgrade to 3.0.0.

We recommend that you upgrade to the latest 2.3 release before moving to 3.0.0.

You cannot perform a rolling upgrade to this version. Any upgrade to this version will require a full restart upgrade.

When restarting, CrateDB will migrate indexes to a newer format. Depending on the amount of data, this may delay node start-up time.

Please consult the Upgrade Notes before upgrading.

Warning

Tables that were created prior to upgrading to CrateDB 2.x will not function with 3.0 and must be recreated before moving to 3.0.x.

While running a 2.x release, you can recreate a table by exporting its data with COPY TO and importing it into a new table with COPY FROM, or by inserting the data into a new table.

Before upgrading, you should back up your data.

Changelog

Breaking Changes

  • Dropped support for tables that were created with CrateDB versions prior to 2.0. If you are running the latest 2.2 or 2.3 release, tables that require upgrading are flagged by the cluster checks and shown in the Admin UI. These tables must be upgraded before updating CrateDB to this version. To do so, export the data with COPY TO and import it into a new table with COPY FROM. Alternatively, you can use INSERT with a query.

  • Data paths as defined in path.data must not contain the cluster name as a folder. If you are running the latest 2.2 or 2.3 release, data paths that are not compatible with this version are flagged by the node checks and shown in the Admin UI.

  • The region setting for CREATE REPOSITORY has been removed. It is automatically inferred but can still be manually specified by using the endpoint setting.

  • Store level throttling settings indices.store.throttle.* have been removed.

  • The gateway recovery table setting recovery.initial_shards has been removed. Nodes will recover their unassigned local primary shards immediately after restart.

  • The discovery setting discovery.type has been removed. To enable EC2 discovery, the discovery.zen.hosts_provider setting must be set to ec2.

  • Dropped support for reading AWS credentials used for S3 and EC2 discovery from environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY as well as Java system properties aws.accessKeyId and aws.secretKey.

  • EC2 cloud.aws.* settings have been renamed to discovery.ec2.*.

  • The setting that controls system call filters, bootstrap.seccomp, has been renamed to bootstrap.system_call_filter.

  • The columns number_of_shards, number_of_replicas, and self_referencing_column_name in information_schema.tables changed to return NULL for non-sharded tables.

  • Adapted queries in the Admin UI to be compatible with CrateDB 3.0 and greater.

  • For HTTP authentication, support was dropped for the X-User header, which was used to provide a username. The header was deprecated in 2.3.0 in favour of the standard HTTP Authorization header.

  • The error_trace GET parameter of the HTTP endpoint only allows true and false in lower case. Other values are not allowed any more and will result in a parsing exception.

  • The _node column on sys.shards and sys.operations has been renamed to node, is now visible by default, and has been trimmed to only include node['id'] and node['name']. To get the full node information, use a join query with sys.nodes.
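
The stricter error_trace parsing described above matters for clients that serialize booleans automatically. The following is a minimal Python sketch, not an official client: the /_sql path and the localhost:4200 address are assumptions that do not appear in these notes, and sql_endpoint_url is a hypothetical helper name.

```python
from urllib.parse import urlencode


def sql_endpoint_url(base_url: str, error_trace: bool) -> str:
    """Build an HTTP endpoint URL with a lower-case error_trace value."""
    # str(True) is "True" in Python, which would no longer be accepted,
    # so the boolean is mapped to the literal strings "true" / "false".
    flag = "true" if error_trace else "false"
    query = urlencode({"error_trace": flag})
    # Assumption: the SQL HTTP endpoint is served under /_sql.
    return base_url.rstrip("/") + "/_sql?" + query


# Hypothetical local node address, used only for illustration.
print(sql_endpoint_url("http://localhost:4200", True))
```

Passing a Python bool directly to a query-string encoder would typically produce error_trace=True, which is the kind of value this release rejects.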

Changes

  • CrateDB is now based on Elasticsearch 6.1.4 and Lucene 7.1.0.

  • Multiple Admin UI improvements.

  • Added a new tab for views in the Admin UI which lists available views and their properties.

  • Updated the bundled CrateDB Shell (crash) to version 0.24.0 which adds support for default schema for connections.

  • Added support in the PostgreSQL Wire Protocol’s SimpleQuery mode to process a query string which contains multiple queries delimited by semicolons.

  • Added support for the DEALLOCATE statement, which is used by certain PostgreSQL Wire Protocol clients (e.g. libpq) to deallocate a prepared statement and release its resources.

  • Added support for ordering on analysed columns and partition columns.

  • Added support for views which can be created using the new CREATE VIEW statement and dropped using the DROP VIEW statement. Views are listed in information_schema.views and they show up in information_schema.tables as well as information_schema.columns.

  • Enterprise: Added the VIEW privilege class which can be used to grant/deny access to views.

  • Added support for INSERT INTO ... ON CONFLICT DO NOTHING. The statement ignores insert values which would cause duplicate keys.

  • Added support for the ON CONFLICT clause in insert statements. INSERT INTO ... ON CONFLICT (pk_col) DO UPDATE SET col = val is identical to INSERT INTO ... ON DUPLICATE KEY UPDATE col = val. The special EXCLUDED table can be used to refer to the insert values: INSERT INTO ... ON CONFLICT (pk_col) DO UPDATE SET col = EXCLUDED.col

  • DEPRECATED: The ON DUPLICATE KEY UPDATE clause has been deprecated in favor of the ON CONFLICT DO UPDATE SET clause.

  • Implemented the Block Hash Join algorithm which is now used for Equi-Joins.

  • Added new sys.health system information table to expose the health of all tables and table partitions.

  • Added the new cluster.routing.allocation.disk.watermark.flood_stage setting, which controls the disk usage level at which indices become read-only to prevent the node from running out of disk space. There is also a new node check that indicates whether the threshold is exceeded.

  • Added a new bengali language analyzer and a bengali_normalization token filter.

  • Added the max_token_length parameter to the whitespace tokenizer.

  • Added the new tokenizers simple_pattern and simple_pattern_split, which allow text to be tokenized for the fulltext index using a regular expression pattern.

  • Added support for CSV file inputs in COPY FROM statements. Input type is inferred using the file’s extension or can be set using the optional WITH clause and specifying the format.

  • Fully qualified column names including a schema name will no longer match on table aliases.

  • The default user if enterprise is disabled changed from null to crate. This causes entries in sys.jobs to show up with crate as username. Functions like user will also return crate if enterprise is enabled but the user module is not available.

  • The sys.jobs table now displays the node information (name and id) of jobs.

  • Changed the primary key constraints of the information schema tables table_constraints, referential_constraints, table_partitions, key_column_usage, columns, and tables to be SQL compliant.

  • Arrays can now contain mixed types if they’re safely convertible. JSON libraries tend to encode values like [0.0, 1.2] as [0, 1.2], which previously caused an error because of the strict type matching that was enforced.

  • Implemented constraint_schema and table_schema in information_schema.key_column_usage correctly and documented the full table schema.

  • Statistics for jobs and operations are enabled by default. If you don’t need any statistics, please set stats.enabled to false.

  • Changed BEGIN and SET SESSION to no longer require DQL permissions on the CLUSTER level.

  • Added epoch argument to the EXTRACT function which returns the number of seconds since Jan 1, 1970. For example: extract(epoch from '1970-01-01T00:00:01') returns 1.0 seconds.

  • Enabled logging of JVM garbage collection times, which helps to debug memory pressure and garbage collection issues. GC log files are stored separately from the standard CrateDB logs and are log-rotated.

  • CrateDB will now by default create a heap dump in case of a crash caused by an out of memory error. This makes it necessary to account for the additional disk space requirements.

  • Implemented a Ready node status JMX metric expressing if the node is ready for processing SQL statements.

  • Implemented a NodeInfo JMX MBean to expose useful information (id, name) about the node.

  • Fixed path of log file name in rotation pattern in log4j2.properties. It now writes into the correct logging directory instead of the parent directory.

  • ALTER TABLE <name> OPEN will now wait for all shards to become active before returning to be consistent with the behaviour of other statements.

  • Added note about the newly available JMX HTTP Exporter to the monitoring documentation section.

  • The first argument (field) of the EXTRACT function has been limited to string literals and identifiers, as it was documented.
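
The two ON CONFLICT variants listed above differ only in what happens when a primary key already exists. The following Python sketch models that behaviour with an in-memory dict. It is an illustration of the semantics as described in these notes, not CrateDB's implementation, and the insert helper and its parameters are hypothetical.

```python
from typing import Callable, Dict, Optional

Row = Dict[str, object]


def insert(table: Dict[object, Row], row: Row, pk: str,
           on_conflict: Optional[Callable[[Row, Row], Row]] = None) -> None:
    """Insert `row`; on a duplicate key, apply `on_conflict` or do nothing."""
    key = row[pk]
    if key not in table:
        table[key] = dict(row)
    elif on_conflict is not None:
        # Models DO UPDATE SET: `row` plays the role of the EXCLUDED table.
        table[key] = on_conflict(table[key], row)
    # Otherwise models DO NOTHING: the conflicting row is skipped.


table: Dict[object, Row] = {}
insert(table, {"id": 1, "name": "a"}, pk="id")
# Duplicate key without a handler: the existing row is kept.
insert(table, {"id": 1, "name": "b"}, pk="id")
# Duplicate key with a handler: name is taken from the excluded row.
insert(table, {"id": 1, "name": "c"}, pk="id",
       on_conflict=lambda existing, excluded: {**existing, "name": excluded["name"]})
print(table)
```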

Upgrade Notes

Configuration Changes

There are a few configuration changes that you should be aware of before restarting the nodes.

Removed Settings

  • All store level throttle settings (under indices.store.throttle.*) have been removed, and should be removed from your node configuration.

  • Similarly, the recovery.initial_shards configuration option has been removed, and should also be removed from your configuration.

Renamed Settings

  • The discovery.type setting, which was previously used to specify whether a cluster should use DNS discovery or the EC2 API, has been removed. To use the EC2 API, set the discovery.zen.hosts_provider setting to ec2 instead.

  • The bootstrap.seccomp setting, which controls system call filters, has been renamed to bootstrap.system_call_filter.
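
As a rough aid for reviewing a node configuration, the removed and renamed settings above can be applied mechanically. The following Python sketch assumes the configuration has already been loaded into a flat dict of dotted keys. The migrate_settings helper is hypothetical, and only the ec2 value of discovery.type is handled because that is the only case these notes describe.

```python
from typing import Dict, List, Tuple

REMOVED_PREFIXES = ("indices.store.throttle.",)
REMOVED_KEYS = {"recovery.initial_shards"}
RENAMED_KEYS = {"bootstrap.seccomp": "bootstrap.system_call_filter"}


def migrate_settings(old: Dict[str, object]) -> Tuple[Dict[str, object], List[str]]:
    """Return a 3.0-style settings dict plus notes on what was changed."""
    new: Dict[str, object] = {}
    notes: List[str] = []
    for key, value in old.items():
        if key in REMOVED_KEYS or key.startswith(REMOVED_PREFIXES):
            notes.append(f"dropped removed setting {key}")
        elif key in RENAMED_KEYS:
            new[RENAMED_KEYS[key]] = value
            notes.append(f"renamed {key} to {RENAMED_KEYS[key]}")
        elif key == "discovery.type":
            if value == "ec2":
                new["discovery.zen.hosts_provider"] = "ec2"
                notes.append("replaced discovery.type with discovery.zen.hosts_provider")
            else:
                # Other values are not covered here; flag them instead of guessing.
                notes.append(f"discovery.type={value!r} needs manual review")
        else:
            new[key] = value
    return new, notes


settings, notes = migrate_settings({
    "cluster.name": "abcdef",
    "bootstrap.seccomp": True,
    "discovery.type": "ec2",
})
print(settings)
print(notes)
```

Unknown keys are passed through unchanged, so the helper only touches settings named in this section.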

Altered Settings

  • The path.data setting specifies the path or paths where the CrateDB node should store its table data and cluster metadata.

    In CrateDB 3.0.0 and later, this path must not contain the cluster name as a directory. For example, if you have set cluster.name: abcdef, the setting path.data: /mnt/abcdef/data would be incompatible. Moving or renaming the directory, such as to /mnt/data, and altering your path.data setting accordingly will allow you to continue using the node’s data.

    Data paths that are incompatible with 3.0.0 will be indicated visually in the Admin UI if you are running the latest 2.2.x or 2.3.x release.
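
The path.data rule above can be checked before restarting a node. The following Python sketch assumes POSIX-style paths and treats a path as incompatible when any directory component equals the cluster name. The data_path_is_compatible helper is hypothetical and is not the check that CrateDB itself runs.

```python
from pathlib import PurePosixPath


def data_path_is_compatible(data_path: str, cluster_name: str) -> bool:
    """Return False if any path component equals the cluster name."""
    return cluster_name not in PurePosixPath(data_path).parts


# The example from the text: cluster.name is abcdef.
print(data_path_is_compatible("/mnt/abcdef/data", "abcdef"))
print(data_path_is_compatible("/mnt/data", "abcdef"))
```

If path.data lists several paths, each one would need to be checked separately.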

Other Changes

  • The CREATE REPOSITORY statement for creating backup repositories has been changed.

    Previously, when using Amazon S3 for backup storage, bucket regions had to be configured explicitly. Bucket regions are now inferred automatically.

    If you want to override this, you can use the endpoint parameter.

  • Previously, the X-User HTTP header could be used to provide a username. This header was deprecated in 2.3.0 and is no longer supported. Use the standard HTTP Authorization header instead.

  • The _node column in the sys.shards and sys.operations tables has been renamed to node.

    Additionally, the node object now only includes the id and name of the node, i.e. node['id'] and node['name'].

    To get the full node information, use node['id'] to join the sys.nodes table.
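
For clients that previously sent the X-User header, the replacement is a standard Authorization header. The following Python sketch builds one using the HTTP Basic scheme. The use of Basic authentication, the basic_auth_header helper, and the crate username with an empty password are assumptions for illustration and are not stated in these notes.

```python
import base64
from typing import Dict


def basic_auth_header(username: str, password: str = "") -> Dict[str, str]:
    """Build an Authorization header using the HTTP Basic scheme."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {token}"}


# Hypothetical credentials, used only for illustration.
headers = basic_auth_header("crate")
print(headers)
```

The returned dict can be merged into the request headers of whichever HTTP client is in use.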