Version 4.0.0

Released on 2019/06/25.

Note

If you are upgrading a cluster, you must be running CrateDB 3.0.4 or higher before you upgrade to 4.0.0.

We recommend that you upgrade to the latest 3.3 release before moving to 4.0.0.

An upgrade to Version 4.0.0 requires a full restart upgrade.

When restarting, CrateDB will migrate indexes to a newer format. Depending on the amount of data, this may delay node start-up time.

Please consult the Upgrade Notes before upgrading.
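The version requirement above can be checked before starting an upgrade. The following is a minimal sketch, not part of CrateDB: the function names are hypothetical, and it assumes plain dotted numeric version strings such as 3.0.4.

```python
# Hypothetical pre-upgrade check; names are illustrative, not a CrateDB API.
MIN_SOURCE_VERSION = (3, 0, 4)  # minimum version stated in the note above
TARGET_VERSION = (4, 0, 0)


def parse_version(version):
    """Turn a dotted version string such as '3.0.4' into a tuple (3, 0, 4)."""
    return tuple(int(part) for part in version.strip().split("."))


def can_upgrade_to_4_0_0(current):
    """Return True if a cluster on `current` meets the stated minimum version."""
    return MIN_SOURCE_VERSION <= parse_version(current) < TARGET_VERSION
```

For example, can_upgrade_to_4_0_0("3.0.3") returns False, while can_upgrade_to_4_0_0("3.3.5") returns True.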

Warning

Tables that were created prior to CrateDB 3.x will not function with 4.x and must be recreated before moving to 4.x.x.

You can recreate tables using COPY TO and COPY FROM or by inserting the data into a new table.

Before upgrading, you should back up your data.

Upgrade Notes

Discovery Changes

This version of CrateDB uses a new cluster coordination (discovery) implementation which improves resiliency and master election times. A new voting mechanism is used when a node is added or removed, which lets the system automatically maintain an optimal level of fault tolerance even during network partitions. This eliminates the need for the easily misconfigured minimum_master_nodes setting. Additionally, a rare resiliency failure, recorded as "Repeated cluster partitions can cause cluster state updates to be lost", can no longer occur.

As a result, some discovery settings have been added, renamed, or removed.

Old Name                              New Name
New, required on upgrade.             cluster.initial_master_nodes
discovery.zen.hosts_provider          discovery.seed_providers
discovery.zen.ping.unicast.hosts      discovery.seed_hosts
discovery.zen.minimum_master_nodes    Removed
discovery.zen.ping_interval           Removed
discovery.zen.ping_timeout            Removed
discovery.zen.ping_retries            Removed
discovery.zen.publish_timeout         Removed
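The table above can be applied mechanically to an existing configuration. The following is a minimal sketch, assuming the settings have already been loaded into a flat dict of setting name to value. The function and constant names are hypothetical and not part of CrateDB.

```python
# Rename and removal rules transcribed from the table above.
RENAMED = {
    "discovery.zen.hosts_provider": "discovery.seed_providers",
    "discovery.zen.ping.unicast.hosts": "discovery.seed_hosts",
}
REMOVED = {
    "discovery.zen.minimum_master_nodes",
    "discovery.zen.ping_interval",
    "discovery.zen.ping_timeout",
    "discovery.zen.ping_retries",
    "discovery.zen.publish_timeout",
}
REQUIRED_NEW = "cluster.initial_master_nodes"


def migrate_discovery_settings(settings):
    """Return (migrated, notes) for a flat dict of setting name -> value.

    Hypothetical helper: renames old discovery settings, drops removed
    ones, and notes when the new required setting is missing.
    """
    migrated, notes = {}, []
    for name, value in settings.items():
        if name in REMOVED:
            notes.append(f"dropped removed setting {name}")
        elif name in RENAMED:
            migrated[RENAMED[name]] = value
            notes.append(f"renamed {name} to {RENAMED[name]}")
        else:
            migrated[name] = value
    if REQUIRED_NEW not in migrated:
        notes.append(f"{REQUIRED_NEW} is not set")
    return migrated, notes
```

The helper only reports a missing cluster.initial_master_nodes. Choosing its value depends on the cluster and is described in the setting documentation.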

Caution

The cluster.initial_master_nodes setting must be set on production (non-loopback-bound) clusters when upgrading. See the setting documentation for details.

Note

Only a single port value is allowed for each discovery.seed_hosts entry. A port range, which previous versions accepted but ignored under the old setting name discovery.zen.ping.unicast.hosts, is now rejected.
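Entries that still carry a port range can be found before the upgrade. The following is a minimal sketch with a hypothetical function name. It assumes entries are written as host, host:port, or host:port-port, and it only flags a trailing port range. It does not validate the host part.

```python
import re

# Matches a trailing ":<port>-<port>", for example ":4300-4400".
_PORT_RANGE = re.compile(r":\d+-\d+$")


def entries_with_port_range(seed_hosts):
    """Return the entries that end in a port range (hypothetical helper)."""
    return [entry for entry in seed_hosts if _PORT_RANGE.search(entry.strip())]
```

Each entry returned needs to be rewritten with a single port before it is used in discovery.seed_hosts.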

Note

CrateDB will refuse to start when it encounters an unknown setting, such as the removed settings listed above. Please make sure to adjust your crate.yml or CMD arguments before upgrading.

Breaking Changes

General

  • Renamed CrateDB data types to the corresponding PostgreSQL data types.
    Current Name    New Name
    short           smallint
    long            bigint
    float           real
    double          double precision
    byte            char
    string          text
    timestamp       timestamp with time zone
    See Data types for more detailed information. The old data type names are registered as aliases for backward compatibility.
  • Changed the ordering of columns to be based on their position in the CREATE TABLE statement. This was done to improve compatibility with PostgreSQL and will affect queries like SELECT * FROM <table> or INSERT INTO <table> VALUES (...).
  • Changed the default column policy on tables from dynamic to strict. Columns of type object still default to dynamic.
  • Removed the implicit soft limit of 10000 that was applied for clients using HTTP.
  • Dropped support for Java versions earlier than 11.
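The data type renames in the list above can be captured as a lookup, for example when updating schema-generation tooling to emit the new names. This is a minimal sketch and the function name is hypothetical. The mapping is transcribed from the rename table, and the old names keep working as aliases.

```python
# Old CrateDB type name -> PostgreSQL-style name, per the rename table above.
OLD_TO_NEW_TYPE = {
    "short": "smallint",
    "long": "bigint",
    "float": "real",
    "double": "double precision",
    "byte": "char",
    "string": "text",
    "timestamp": "timestamp with time zone",
}


def preferred_type_name(name):
    """Return the new name for an old type name (hypothetical helper).

    Names that are not in the rename table are returned lower-cased
    and otherwise unchanged.
    """
    key = name.strip().lower()
    return OLD_TO_NEW_TYPE.get(key, key)
```

For example, preferred_type_name("STRING") returns "text", and preferred_type_name("integer") returns "integer".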

Removed Settings

  • Removed the deprecated setting cluster.graceful_stop.reallocate.
  • Removed the deprecated http.enabled setting. HTTP is now always enabled and can no longer be disabled.
  • Removed the deprecated license.ident setting. Licenses must be set using the SET LICENSE statement.
  • Removed the deprecated license.enterprise setting. To use CrateDB without any enterprise features one should use the CrateDB Community Edition instead.
  • Removed the experimental enable_semijoin session setting. As this defaulted to false, this execution strategy cannot be used anymore.
  • Removed the possibility of configuring the AWS S3 repository client via the crate.yml configuration file and command line arguments. Please use the CREATE REPOSITORY statement parameters for this purpose.
  • Removed the HDFS repository setting concurrent_streams, as it is no longer supported.
  • Removed the zen1-related discovery settings listed in Discovery Changes.

System table changes

  • Changed the layout of the version column in the information_schema.tables and information_schema.table_partitions tables. The version is now displayed directly under created and upgraded. The cratedb and elasticsearch sub-categories have been removed.
  • Removed deprecated metrics from sys.nodes:
Metric name
fs['disks']['reads']
fs['disks']['bytes_read']
fs['disks']['writes']
fs['disks']['bytes_written']
os['cpu']['system']
os['cpu']['user']
os['cpu']['idle']
os['cpu']['stolen']
process['cpu']['user']
process['cpu']['system']
  • Renamed column information_schema.table_partitions.schema_name to table_schema.
  • Renamed information_schema.columns.user_defined_type_* columns to information_schema.columns.udt_* for SQL standard compatibility.
  • Changed type of column information_schema.columns.is_generated to STRING with value NEVER or ALWAYS for SQL standard compatibility.

Removed Functionality

  • The Elasticsearch REST API has been removed.
  • Removed the deprecated ingest framework, including the MQTT endpoint.
  • Removed the HTTP pipelining functionality. We are not aware of any client using this functionality.
  • Removed the deprecated average duration and query frequency JMX metrics. The total counts and sum of durations as documented in QueryStats MBean should be used instead.
  • Removed the deprecated ON DUPLICATE KEY syntax of INSERT statements. Users can migrate to the ON CONFLICT syntax.
  • Removed the index thread-pool and the bulk alias for the write thread-pool. The JMX getBulk property of the ThreadPools bean has been renamed to getWrite.
  • Removed the deprecated nGram and edgeNGram token filters and the htmlStrip char filter. They are superseded by ngram, edge_ngram, and html_strip.
  • Removed the deprecated USR2 signal handling. Use ALTER CLUSTER DECOMMISSION instead. Be aware that the behavior of sending USR2 signals to a CrateDB process is now undefined and up to the JVM. In some cases it may still terminate the instance, but without a clean shutdown.

Deprecations

Changes

SQL Standard and PostgreSQL compatibility improvements

Users and Access Control

  • Masked sensitive user account information in sys.repositories for repository types: azure, s3.
  • Restricted access to log entries in sys.jobs and sys.jobs_log to the current user. This doesn’t apply to superusers.
  • Added a new Administration Language (AL) privilege type which allows users to manage other users and use SET GLOBAL. See Privileges.

Repositories and Snapshots

Performance and resiliency improvements

  • Exposed the _seq_no and _primary_term system columns which can be used for Optimistic Concurrency Control. By introducing _seq_no and _primary_term, the following resiliency issues were fixed:

  • Predicates like abs(x) = 1 which require a scalar function evaluation and cannot operate on table indices directly are now candidates for the query cache. This can result in order-of-magnitude performance improvements on subsequent queries.

  • Routing awareness attributes are now also taken into consideration for primary key lookups. (Queries like SELECT * FROM t WHERE pk = 1)

  • Changed the circuit breaker logic to measure the real heap usage instead of the memory reserved by child circuit breakers. This should reduce the chance of nodes running into an out of memory error.

  • Added a new optimization that allows predicates on top of views or sub-queries to run more efficiently in some cases.

Others

  • Added support for dynamic reloading of SSL certificates. See Configuring the Keystore.
  • Added minimum_index_compatibility_version and minimum_wire_compatibility_version to sys.version to expose the current state of the node’s index and wire protocol version as part of the sys.nodes table.
  • Upgraded to Lucene 8.0.0, and as part of this the BM25 scoring has changed. The order of the scores remains the same, but the values of the scores differ. Fulltext queries including _score filters may behave slightly differently.
  • Added a new _docid system column.
  • Added support for subscript expressions on an object column of a sub-relation. Examples: select a['b'] from (select a from t1) or select a['b'] from my_view where my_view is defined as select a from t1.