High Level Architecture


Crate’s architecture builds on NoSQL design principles but offers standard SQL. It is a shared-nothing, distributed database that supports documents and relations with a dynamic schema. It is simple to install and use, with auto-sharding, auto-partitioning and auto-replication. This enables real-time search and aggregations while allowing any Crate deployment to scale horizontally. It offers read-after-write consistency and balanced memory and disk usage, and it is well suited to microservice environments built on containers such as Docker. Crate is an open-source database licensed under Apache 2.0.
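
To make the idea behind auto-sharding concrete, here is a minimal sketch that assumes simple hash-based routing. It is an illustration only: the text above does not describe how Crate actually routes rows, and `shard_for` and `NUM_SHARDS` are hypothetical names.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4  # hypothetical shard count, chosen for illustration


def shard_for(row_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a row id to a shard using a stable hash (illustrative only)."""
    digest = hashlib.sha256(row_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Route 10,000 synthetic row ids and count how many land on each shard.
distribution = Counter(shard_for(f"row-{i}") for i in range(10_000))
```

Because the hash is stable, the same row id always maps to the same shard, and a large set of ids spreads roughly evenly across shards without any manual placement.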

Indexing: by default, Crate indexes all fields and stores data in columns, a layout that is highly optimized for filtering and aggregations. No table locks are needed to add new columns or even nested objects.
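
Why a column layout helps filtering and aggregations can be shown with a toy example. This is a sketch in plain Python with made-up data, not Crate's storage engine; `sum_where` is a hypothetical helper.

```python
# Row-oriented layout: one dict per record (made-up sample data).
rows = [
    {"user": "alice", "country": "AT", "bytes": 120},
    {"user": "bob", "country": "DE", "bytes": 300},
    {"user": "carol", "country": "AT", "bytes": 80},
]

# Column-oriented layout: one list per field, aligned by position.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def sum_where(columns, filter_col, filter_value, agg_col):
    """Sum agg_col over positions where filter_col equals filter_value."""
    return sum(
        value
        for key, value in zip(columns[filter_col], columns[agg_col])
        if key == filter_value
    )


# Only the "country" and "bytes" columns are read; "user" is never touched.
total_at = sum_where(columns, "country", "AT", "bytes")
```

The aggregation reads just the two columns it needs, which is the property that makes columnar layouts attractive for analytical queries.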

Masterless: a Crate cluster is masterless and consists of equal nodes. It can be deployed anywhere: on notebooks, on-premises, in private clouds and in most public clouds. You can maximize cost efficiency by deploying Crate nodes on cheap consumer-grade hardware or instances and still have a highly available system.

Made for Micro-Services: official Docker containers let you deploy Crate nodes quickly and easily, alongside one-click deployment on GCE and official Amazon images on AWS. Crate is often used side by side with Hadoop clusters, which store raw data forever, while hot data is streamed into Crate for ad-hoc queries in milliseconds. Partitioning is used to move out old data while new data is added and the cluster grows dynamically.
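
The pattern of using partitioning to move out old data can be sketched as follows. This is a hypothetical in-memory stand-in for a table partitioned by day, not Crate's API; `insert` and `drop_older_than` are illustrative names.

```python
from collections import defaultdict
from datetime import date

# Hypothetical in-memory stand-in for a table partitioned by day.
partitions = defaultdict(list)


def insert(day: date, record: dict) -> None:
    """Append a record to the partition for its day."""
    partitions[day].append(record)


def drop_older_than(cutoff: date) -> int:
    """Remove whole partitions older than cutoff; return how many were dropped."""
    old_days = [day for day in partitions if day < cutoff]
    for day in old_days:
        del partitions[day]
    return len(old_days)


insert(date(2015, 1, 1), {"msg": "old"})
insert(date(2015, 1, 2), {"msg": "recent"})
insert(date(2015, 1, 3), {"msg": "new"})
dropped = drop_older_than(date(2015, 1, 2))
```

Old data leaves as whole partitions rather than row by row, so retiring it does not interfere with inserts into the newest partition.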

For more architecture and technical details, read this Technical Overview.


Crate has been deployed in production at the following scale:

  • Billions of inserts/updates per day (while serving user-facing realtime queries)
  • 100+ nodes (AWS and on-premise)
  • Hundreds of terabytes of data
  • 1.5 million inserts per second (and 4.5 million inserts per second in a test environment)
  • Many queries per second for applications used by tens of thousands of users in parallel (practical example: 330,000 inserts (tweets, 1 KB each) per second while serving user-facing full-text search queries in under 400 ms, running on 8 cheap commodity servers ($2,000 each) with 64 GB and consumer SSDs)
  • Multiple availability zones of cloud service providers
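
A back-of-the-envelope calculation puts the practical example above into perspective. It assumes that 1 KB means 1,000 bytes, that the load is spread evenly over the 8 servers, and that the insert rate were sustained for a full day, which the example itself does not claim. Replication and index overhead are ignored.

```python
INSERTS_PER_SECOND = 330_000   # from the practical example
RECORD_BYTES = 1_000           # assumption: one tweet of 1 KB = 1,000 bytes
SERVERS = 8                    # from the practical example
SERVER_PRICE_USD = 2_000       # from the practical example
SECONDS_PER_DAY = 86_400

bytes_per_second = INSERTS_PER_SECOND * RECORD_BYTES
mb_per_second_total = bytes_per_second / 1e6               # 330.0 MB/s
mb_per_second_per_server = mb_per_second_total / SERVERS   # 41.25 MB/s

# Hypothetical: what the rate would amount to if held for 24 hours.
inserts_per_day = INSERTS_PER_SECOND * SECONDS_PER_DAY     # 28,512,000,000
tb_per_day = bytes_per_second * SECONDS_PER_DAY / 1e12     # about 28.5 TB

hardware_cost_usd = SERVERS * SERVER_PRICE_USD             # 16,000
```

Under these assumptions each server ingests roughly 41 MB of raw data per second, and a sustained rate would land in the range of billions of inserts per day, in line with the first item in the list.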

A little disclaimer: comparisons depend on generalizations by their very nature. If you think we didn’t get something right, please get in contact and let us know.

