CrateDB is built on a NoSQL architecture but offers standard SQL. It is a shared-nothing, distributed database that supports documents and relationships with a dynamic schema. It is simple to install and use, with automatic sharding, partitioning and replication. This enables real-time search and aggregations, and any CrateDB deployment can be scaled horizontally. It offers read-after-write consistency and balanced memory and disk usage, and is well suited to containerized microservice environments such as Docker. CrateDB is an open-source database licensed under Apache 2.0.
Indexing: by default, CrateDB indexes all fields and stores data in columns, which are highly optimized for filtering and aggregations. No table locks are needed to add new columns or even nested objects.
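As a rough illustration of the dynamic schema described above, the following sketch defines a few SQL statements and a helper that runs them on any DB-API style cursor. The table name `sensor_readings`, its columns and the helper `run_statements` are hypothetical and chosen only for this example; the statements are meant as a sketch of CrateDB's SQL dialect, not a recommended schema.

```python
# Illustrative statements only; table and column names are hypothetical.
STATEMENTS = [
    # A table with a dynamic object column: new nested keys can be
    # added to `payload` at insert time without a schema change.
    "CREATE TABLE IF NOT EXISTS sensor_readings ("
    " id TEXT PRIMARY KEY,"
    " ts TIMESTAMP WITH TIME ZONE,"
    " payload OBJECT(DYNAMIC) AS (temperature DOUBLE PRECISION)"
    ")",
    # Adding a top-level column later is an ordinary DDL statement.
    "ALTER TABLE sensor_readings ADD COLUMN location TEXT",
    # Filtering and aggregating on a nested field of the object column.
    "SELECT location, avg(payload['temperature'])"
    " FROM sensor_readings"
    " WHERE payload['temperature'] > 20"
    " GROUP BY location",
]


def run_statements(cursor, statements=STATEMENTS):
    """Execute each statement in order on a DB-API style cursor.

    Returns the number of statements executed.
    """
    for sql in statements:
        cursor.execute(sql)
    return len(statements)
```

The helper only needs an object with an `execute` method, so it should work with a cursor obtained from a CrateDB client library, assuming a node is reachable.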
Masterless: a CrateDB cluster is masterless, consisting of a set of equal nodes. It can be deployed anywhere: on notebooks, on-premises, in private clouds and in public clouds. You can optimize cost and efficiency by deploying CrateDB nodes on inexpensive commodity servers and still have a fast, highly available system.
Made for Microservices: official Docker containers let you deploy CrateDB nodes quickly and easily. CrateDB is also available via one-click deployment on GCE and as official Amazon images on AWS.
For more architecture and technical details, read this Technical Overview.
CrateDB has been deployed in production with the following scale:
- Billions of inserts/updates per day (while serving user-facing realtime queries)
- 100+ nodes (AWS and on-premises)
- 100s of terabytes (TB) of data
- 1.5 million inserts per second (40,000 per second, per CrateDB node)
- High query throughput for applications used by tens of thousands of users in parallel (practical example: 330,000 inserts of 1 KB each per second, while serving user-facing full-text search queries in under 400 ms, running on 8 commodity servers at $2,000 each, with 64 GB and consumer SSDs)
- Multiple availability zones of cloud service providers
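
The throughput figures in the list above can be sanity-checked with a little arithmetic. The sketch below assumes that throughput scales linearly with the number of nodes, which the list implies but does not state; the variable names are illustrative.

```python
import math

# Figures quoted in the list above.
CLUSTER_INSERTS_PER_SEC = 1_500_000
PER_NODE_INSERTS_PER_SEC = 40_000

# Assumption: throughput scales linearly with the number of nodes.
nodes_needed = math.ceil(CLUSTER_INSERTS_PER_SEC / PER_NODE_INSERTS_PER_SEC)

# Practical example: 330,000 inserts/s of 1 KB each on 8 servers at $2,000 each.
EXAMPLE_INSERTS_PER_SEC = 330_000
RECORD_SIZE_KB = 1
SERVERS = 8
SERVER_PRICE_USD = 2_000

inserts_per_server = EXAMPLE_INSERTS_PER_SEC / SERVERS
ingest_kb_per_sec = EXAMPLE_INSERTS_PER_SEC * RECORD_SIZE_KB
hardware_cost_usd = SERVERS * SERVER_PRICE_USD

print(nodes_needed)        # 38 (37.5 rounded up)
print(inserts_per_server)  # 41250.0
print(ingest_kb_per_sec)   # 330000
print(hardware_cost_usd)   # 16000
```

Under that assumption, the practical example works out to about 41,250 inserts per second per server, which is in line with the 40,000 per node quoted for the larger deployment.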