CrateDB is an open source SQL database with a ground-breaking distributed design:
Customers often use CrateDB to store and query machine data. This is because CrateDB makes it easy and economical to handle the velocity, volume, and diversity of machine and log data. In fact, customers have reported CrateDB ingesting millions of data points per second, while also querying terabytes of data in real time…20x faster than their previous database and on 75% less database hardware.
Growing a database should be easy, and it is with CrateDB. Automatic data rebalancing and a shared-nothing architecture enable you to scale simply. Just add new machines to create and grow a CrateDB cluster. There’s no need to know how to redistribute data on the cluster because CrateDB does it for you.
CrateDB’s distributed SQL query engine features columnar field caches, and a more modern query planner. These give CrateDB the unique ability to perform aggregations, JOINs, sub-selects, and ad-hoc queries at in-memory speed. CrateDB also integrates native, full-text search features, which enable you to store and query structured or unstructured data together. Therefore, you no longer have to use separate SQL and Search databases to manage tabular and non-tabular data.
Even if things go wrong in your data center, CrateDB keeps running. Automatic replication of data across your cluster and rolling software updates help ensure hardware failures and scheduled maintenance do not interrupt access to data. In addition, CrateDB clusters are self healing, so when nodes are added to the cluster, CrateDB automatically loads them with data.
Analytic data is often loaded in batches, with transactional locks and other overhead. By contrast, CrateDB eliminates locking overhead to enable massive write performance (e.g. 40.000+ inserts per second per node on commodity hardware). Furthermore, CrateDB can deliver millisecond-speed query performance, even when writes are in action.
CrateDB supports both relational data, as well as nested JSON-documents All nested JSON attributes can be included in any SQL command. In addition, CrateDB provides BLOB storage so you can store and retrieve BLOBs like pictures, videos, or large unstructured files – providing a fully distributed cluster solution for BLOB storage.
Time series data is important for identifying trends and anomalies. CrateDB makes time series analysis fast and easy with automatic table partitions, which are like virtual tables that can be queried, moved or deleted. Partitioning data by time intervals delivers very fast time series query performance.
Location is important for many machine data analyses. For this reason, CrateDB can store and query geographical information using the geo_point and geo_shape types. You can control geographic index precision and resolution for faster query results, and also run exact queries with scalar functions like intersects, within, and distance.
Unlike many other SQL databases, CrateDB schemas are totally flexible. You can add columns anytime without slowing performance or downtime. This is great for agile development and fast deployment.
CrateDB is eventually consistent, but offers transactional semantics. CrateDB is consistent at the row level, so each row is either fully written or not. By offering read-after-write consistency we allow synchronous real-time access to single records, immediately after they are written.
Even though CrateDB does not support ACID transactions with rollbacks etc, it offers Optimistic Concurrency Control by providing an internal versioning, that allows detection and resolution of write conflicts.
CrateDB can save incremental snapshots of your database to storage. Snapshots contain the state of the tables in a CrateDB cluster at the time the snapshot was created, and can be restored into the cluster anytime.
On the other hand, CrateDB may not be a good choice if you require: