CrateDB 4.4 is now stable and ready to use

2021-01-25, by Mathias Fußenegger

After more than seven years of development, CrateDB has matured into a database that powers many IoT use cases. As adoption continues to increase, we see customers using CrateDB for innovative new projects that push it to its limits, which motivates us to keep improving our database.

At the beginning of every release development cycle, we consider our customers' feedback and select two or three high-level items to focus on. From there, we get granular, thinking deeply about how to improve these areas and trying different solutions.

For the 4.4 release, we worked on:

  • Performance and stability
  • Diagnostics
  • SQL standard and PostgreSQL compatibility

Let’s take a closer look at each one.

Performance and stability

Query improvements

The need for excellent performance and stability is the main reason our users choose CrateDB. Because of that, we continue to invest time in analyzing access patterns and innovating ways to improve performance even more.

Working with our users, we noticed that their table definitions often had a lot more columns than the tables we used in our tests. After extending our set of tests, we optimized an access pattern where users query a subset of columns from a table with many columns. This led to a performance improvement of up to 17% in some tests.

Another fix was related to the query optimizer.

CrateDB includes a query optimizer that attempts to choose an optimal execution plan for each query. In CrateDB, one of the cheapest ways to access a record is a primary key lookup, which can be used if the WHERE clause of a query filters on the primary key columns. However, we saw that some users implemented a soft-delete pattern in their application: a column in the table indicates whether a record is deleted, and queries filter on both the primary key columns and this deleted flag. This access pattern couldn't use the primary key lookup execution plan, so CrateDB had to resort to a more expensive one.

In CrateDB 4.4, we remedied that, adding the notion of a filtered primary key lookup.
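
To make the idea concrete, here is a minimal Python sketch of the two plans. It is a toy model, not CrateDB's implementation: the `table` dictionary, the `deleted` column, and the function names `full_scan` and `filtered_pk_lookup` are all hypothetical and exist only for illustration.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]

# Toy table keyed by primary key (hypothetical data, not CrateDB's storage layout).
table: Dict[int, Row] = {
    1: {"id": 1, "name": "sensor-a", "deleted": False},
    2: {"id": 2, "name": "sensor-b", "deleted": True},
    3: {"id": 3, "name": "sensor-c", "deleted": False},
}


def full_scan(rows: Dict[int, Row], predicate: Callable[[Row], bool]) -> List[Row]:
    """Expensive plan: evaluate the whole WHERE clause against every row."""
    return [row for row in rows.values() if predicate(row)]


def filtered_pk_lookup(
    rows: Dict[int, Row], pk: int, residual: Callable[[Row], bool]
) -> List[Row]:
    """Cheap plan: fetch one row by primary key, then apply the remaining filter."""
    row = rows.get(pk)
    if row is not None and residual(row):
        return [row]
    return []


def not_deleted(row: Row) -> bool:
    """The soft-delete condition, i.e. the part of the WHERE clause besides the key."""
    return not row["deleted"]


# Roughly: SELECT * FROM t WHERE id = 1 AND deleted = false
live = filtered_pk_lookup(table, 1, not_deleted)

# Roughly: SELECT * FROM t WHERE id = 2 AND deleted = false
soft_deleted = filtered_pk_lookup(table, 2, not_deleted)
```

Both plans return the same rows. The difference is that the filtered lookup touches a single record and then checks the flag, while the scan has to inspect every record in the table.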

Faster replication

Sharding and built-in replication are key features of our database. Sharding allows CrateDB to scale seamlessly, while replication provides high availability.

If you're operating clusters with a very large number of nodes, you know the chances of some of them failing at some point (or the chances of you needing to take them temporarily offline for maintenance) are not small. You want to make sure you don't have under-replicated records on your cluster.

With CrateDB 4.4, we were able to include numerous improvements in this regard, many of which originate from Elasticsearch. As a result, the replication performance is now improved: after any maintenance work or node failure, your cluster will now get fully replicated faster than ever.

Diagnostic improvements

Monitoring a system is essential to ensure it is running smoothly.

In CrateDB 4.4, we exposed a whole new set of shard-related information via JMX. This allows users to monitor their system with tools like Prometheus.

SQL standard and PostgreSQL compatibility

Compatibility is at the heart of CrateDB. Performance, stability, observability -- these properties do not matter if your database doesn’t support your use case or if you don't have a client for your language of choice.

Because of that, we keep adding new features and improving our compatibility with PostgreSQL and standard SQL.

In CrateDB 4.4, we added the numeric type, which can be used at runtime to enable aggregations like sum on fields where the result wouldn't fit into the bigint range. Version 4.4 also includes new scalar and window functions.
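
The following Python sketch illustrates why a sum can outgrow the bigint range. It does not call CrateDB: Python's `decimal.Decimal` stands in for an arbitrary-precision numeric type, and the helper `fits_in_bigint` is a hypothetical name used only for this example. For the exact SQL syntax, refer to the release notes.

```python
from decimal import Decimal

# Bounds of a signed 64-bit integer, the range of a bigint.
BIGINT_MIN = -(2**63)
BIGINT_MAX = 2**63 - 1


def fits_in_bigint(value: int) -> bool:
    """Return True if the value can be stored in a signed 64-bit integer."""
    return BIGINT_MIN <= value <= BIGINT_MAX


# Each individual value fits into a bigint ...
values = [BIGINT_MAX, 1]
each_value_fits = all(fits_in_bigint(v) for v in values)

# ... but their sum does not.
total_fits = fits_in_bigint(sum(values))

# Stand-in for summing as numeric: arbitrary precision keeps the exact result.
exact_total = sum(Decimal(v) for v in values)
```

Here every input is a valid bigint, yet the aggregate is one past the upper bound, which is the situation the numeric type is meant to handle.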

For a complete overview of the changes, check out the release notes.

Want to try CrateDB 4.4?
