Version 4.1.0

Released on 2020/01/15.

Note

If upgrading a cluster, you must be running CrateDB 4.0.2 or higher before you upgrade to 4.1.0.

We recommend that you upgrade to the latest 4.0 release before moving to 4.1.0.

A rolling upgrade to 4.1.0 from 4.0.2+ is supported.

Before upgrading, you should back up your data.

Table of Contents

Breaking Changes

  • Changed arithmetic operations *, + and - of types integer and bigint to throw an exception instead of rolling over from positive to negative or the other way around.
  • Changed how columns of type geo_point are being communicated to PostgreSQL clients: Before clients were told that those columns are double arrays, now they’re correctly mapped to the PostgreSQL point type. This means that applications using clients like JDBC will have to be adapted to use PgPoint. (See Geometric DataTypes in JDBC)
  • Changed the behavior of unnest to fully unnest multi dimensional arrays to their innermost type to be compatible with PostgreSQL.

Deprecations

Changes

Resiliency improvements

  • Allow user to limit the number of threads on a single shard that may be merging at once via the merge.scheduler.max_thread_count table parameter.
  • Some ALTER TABLE operations now internally invoke a single cluster state update instead of multiple cluster state updates, making it more resilient as there is no longer a window where the cluster state could be inconsistent.
  • Changed the default garbage collector that is being used from Concurrent Mark Sweep to G1GC. This should lead to shorter GC pauses.
  • Added a dynamic bulk sizing mechanism that should prevent INSERT INTO ... FROM query operations to run into out of memory errors if the individual records of a table are large.
  • Added the cluster.routing.allocation.total_shards_per_node setting.

Performance improvements

  • Optimized SELECT DISTINCT .. LIMIT n queries. On high cardinality columns this type of queries can now execute up to 200% faster and use less memory.
  • The optimizer now utilizes internal statistics to approximate the number of rows returned by various parts of a query plan. This should result in more efficient execution plans for joins.
  • Reduced recovery time by sending file-chunks concurrently. It applies only for when transport communication is secured or compressed. The number of chunks is controlled by the indices.recovery.max_concurrent_file_chunks setting.
  • Added an optimization that allows to run WHERE clauses on top of derived tables containing table functions more efficiently in some cases.
  • Allow user to control how table data is stored and accessed on a disk via the store.type table parameter and node.store.allow_mmap node setting.
  • Changed the default table data store type from mmapfs to hybridfs.

SQL Standard and PostgreSQL compatibility improvements

Window function extensions

  • Added support for the lag and lead window functions as enterprise features.
  • Added support for ROWS frame definitions in the context of window functions window definitions.
  • Added support for the named window definition. It allows a user to define a list of window definitions in the WINDOW clause that can be referenced in OVER clauses.
  • Added support for offset PRECEDING and offset FOLLOWING window definitions.

Functions and operators

  • Added support for the ALL clause for array and subquery comparisons.
  • Added a PG_GET_KEYWORDS table function.
  • Extended CONCAT to do implicit casts, so that calls like SELECT 't' || 5 are supported.
  • Added support for casting values of type object to text. This will cause the object to be converted to a JSON string.
  • Added support for casting to geo_point, geo_shape and object array data types. For example: cast(['POINT(2 3)','POINT(1 3)'] AS array(geo_point))
  • Added the PG_TYPEOF system function.
  • Added the INTERVAL datatype and extended generate_series(start, stop, [step]) to work with timestamps and the new INTERVAL type
  • Added LPAD and RPAD scalar functions.
  • Added the LTRIM and RTRIM scalar functions.
  • Added LEFT and RIGHT scalar functions.
  • Added TIMEZONE scalar function.
  • Added AT TIME ZONE syntax.
  • Added support for the operator ILIKE, the case insensitive complement to LIKE.
  • Added support for CIDR notation comparisons through special purpose operator << associated with type ip. Statements like 192.168.0.0 << 192.168.0.1/24 are true, select ip from ips_table where ip << 192.168.0.1/24 returns matching ip addresses.

New statements and clauses

Observability improvements

  • Added a failures column to the sys.snapshots table.
  • Improved the error messages that were returned if a relation or schema is not found. They now may include suggestions for similarly named tables. This should make typos more apparent and can help users figure out that they were missing double quotes in case the table names contain upper case letters.
  • Added a seq_no_stats and a translog_stats column to the sys.shards table.
  • Added new system table sys.segments which contains information about the Lucene segments of a shard.
  • Added a node column to sys.jobs_log.
  • Statements containing limits, filters, window functions or table functions will now be labelled accordingly in Jobs metrics.

Others

  • Changed the default for write.wait_for_active_shards from ALL to 1. This will improve the out of box experience as it allows a subset of nodes to become unavailable without blocking write operations. See the documentation for more details about the implications.
  • Added phonetic token filter with following encoders: metaphone, double_metaphone, soundex, refined_soundex, caverphone1, caverphone2, cologne, nysiis, koelnerphonetik, haasephonetik, beider_morse, daitch_mokotoff.
  • Removed a restriction for predicates in the WHERE clause involving partitioned by columns which could result in a failure response with the message logical conjunction of the conditions in the WHERE clause which involve partitioned columns led to a query that can't be executed.
  • Support implicit object creation in update statements. E.g. UPDATE t SET obj['x'] = 10 will now implicitly set obj to {obj: {x: 10}} on rows where obj was null.
  • Added the codec parameter to CREATE TABLE to control the compression algorithm used to store data.
  • The node argument of the REROUTE commands of ALTER TABLE can now either be the id or the name of a node.
  • Added support for the PostgreSQL array string literal notation.