The distributed open source SQL database for machine data
What is CrateDB?
CrateDB is a distributed SQL database built on top of a NoSQL foundation. It combines the familiarity of SQL with the scalability and data flexibility of NoSQL, enabling developers to:
- Use SQL to process any type of data, structured or unstructured
- Perform SQL queries in real time, including JOINs and aggregates
- Scale simply
Customers often use CrateDB to store and query machine data, because CrateDB makes it easy and economical to handle the velocity, volume, and diversity of machine and log data. In fact, customers have reported CrateDB ingesting millions of data points per second while also querying terabytes of data in real time, 20x faster than their previous database and on 75% less database hardware.
Growing a database should be easy, and it is with CrateDB. Automatic data rebalancing and a shared-nothing architecture enable you to scale simply. Just add new machines to create and grow a CrateDB cluster. There’s no need to know how to redistribute data on the cluster because CrateDB does it for you.
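To make the idea of automatic rebalancing concrete, here is a minimal Python sketch. It is not CrateDB's actual allocation algorithm; the `rebalance` function, the node names, and the shard ids are hypothetical and only illustrate how shards could be evened out when a new, empty machine joins a cluster.

```python
def rebalance(layout: dict[str, list[int]]) -> dict[str, list[int]]:
    """Return a copy of `layout` in which shard counts differ by at most one.

    Illustrative only: shards are moved one at a time from the fullest
    node to the emptiest node until the cluster is balanced.
    """
    result = {node: list(shards) for node, shards in layout.items()}
    if not result:
        return result
    while True:
        fullest = max(result, key=lambda node: len(result[node]))
        emptiest = min(result, key=lambda node: len(result[node]))
        if len(result[fullest]) - len(result[emptiest]) <= 1:
            return result
        result[emptiest].append(result[fullest].pop())


# A two-node cluster holding six shards (hypothetical names and ids).
cluster = {"node-1": [0, 1, 2], "node-2": [3, 4, 5]}

# Adding a machine: it joins empty, then the shards are evened out.
cluster["node-3"] = []
balanced = rebalance(cluster)
```

In this sketch only two shards move to the new node, and every shard still lives on exactly one node, which is the property a shared-nothing layout relies on.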
Distributed SQL queries, aggregations, and search
CrateDB’s distributed SQL query engine features columnar field caches and a modern query planner. These enable CrateDB to perform aggregations, JOINs, sub-selects, and ad-hoc queries at in-memory speed. CrateDB also integrates native full-text search features, which enable you to store and query structured and unstructured data together. As a result, you no longer have to use separate SQL and search databases to manage tabular and non-tabular data.
Even if things go wrong in your data center, CrateDB keeps running. Automatic replication of data across your cluster and rolling software updates help ensure that hardware failures and scheduled maintenance do not interrupt access to data. In addition, CrateDB clusters are self-healing, so when nodes are added to the cluster, CrateDB automatically loads them with data.
Real-time data ingestion
Analytic data is often loaded in batches, with transactional locks and other overhead. By contrast, CrateDB eliminates locking overhead to enable massive write performance (e.g. 40,000+ inserts per second per node on commodity hardware). Furthermore, CrateDB can deliver millisecond-speed query performance, even while writes are in progress.
Any data and BLOBs
CrateDB supports both relational data and nested JSON documents. All nested JSON attributes can be used in any SQL command. In addition, CrateDB provides BLOB storage, so you can store and retrieve BLOBs like pictures, videos, or large unstructured files in a fully distributed cluster.
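As a rough illustration of what it means to filter on a nested attribute, the Python sketch below walks a dotted path through a nested document. The `get_path` helper, the dotted-path syntax, and the sample readings are assumptions made for this example; they are not CrateDB's query syntax or API.

```python
from typing import Any


def get_path(document: dict[str, Any], path: str, default: Any = None) -> Any:
    """Follow a dotted path such as 'payload.sensor.type' through nested dicts."""
    current: Any = document
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default  # attribute is missing somewhere along the path
        current = current[key]
    return current


# Hypothetical machine-data records with a nested JSON payload.
readings = [
    {"id": 1, "payload": {"sensor": {"type": "temp", "value": 21.5}}},
    {"id": 2, "payload": {"sensor": {"type": "humidity", "value": 40.0}}},
    {"id": 3, "payload": {}},
]

# Roughly what a WHERE clause on a nested attribute would select.
temp_ids = [r["id"] for r in readings if get_path(r, "payload.sensor.type") == "temp"]
```

Records that lack the attribute (like the third one) simply do not match, rather than causing an error.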
Time series analysis
Time series data is important for identifying trends and anomalies. CrateDB makes time series analysis fast and easy with automatic table partitions, which are like virtual tables that can be queried, moved or deleted. Partitioning data by time intervals delivers very fast time series query performance.
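The following Python sketch shows why partitioning by time interval helps. It is a simplified in-memory model, not CrateDB's implementation; the `PartitionedTable` class, its method names, and the monthly interval are assumptions chosen for illustration.

```python
from collections import defaultdict
from datetime import datetime


class PartitionedTable:
    """Toy table that stores rows in one bucket ("partition") per month."""

    def __init__(self) -> None:
        self.partitions: dict[str, list[dict]] = defaultdict(list)

    @staticmethod
    def partition_key(ts: datetime) -> str:
        return ts.strftime("%Y-%m")

    def insert(self, row: dict) -> None:
        # The partition is chosen automatically from the row's timestamp.
        self.partitions[self.partition_key(row["ts"])].append(row)

    def query_month(self, year: int, month: int) -> list[dict]:
        # Only the matching partition is read; all others are skipped.
        return list(self.partitions.get(f"{year:04d}-{month:02d}", []))

    def drop_partition(self, year: int, month: int) -> int:
        # Deleting old data removes a whole partition at once.
        return len(self.partitions.pop(f"{year:04d}-{month:02d}", []))


table = PartitionedTable()
table.insert({"ts": datetime(2024, 3, 1, 12, 0), "value": 1.0})
table.insert({"ts": datetime(2024, 3, 15, 8, 30), "value": 2.0})
table.insert({"ts": datetime(2024, 4, 2, 9, 0), "value": 3.0})

march_rows = table.query_month(2024, 3)
dropped = table.drop_partition(2024, 3)
```

A query restricted to one month touches only that month's partition, and expiring old data is a single partition drop rather than a row-by-row delete.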
Location is important for many machine data analyses. For this reason, CrateDB can store and query geographical information using the geo_point and geo_shape types. You can control geographic index precision and resolution for faster query results, and also run exact queries with scalar functions like intersects, within, and distance.
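To give a feel for what a distance function over two points computes, here is a small Python sketch using the haversine formula on a spherical Earth. This is an assumption for illustration only: the source does not say which formula CrateDB uses, and the function name, argument order, and sample coordinates are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres (spherical model)


def haversine_distance(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in metres between two (longitude, latitude) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, a)))


# Approximate coordinates, used only as sample input.
berlin = (13.405, 52.52)
vienna = (16.3738, 48.2082)
berlin_to_vienna_m = haversine_distance(*berlin, *vienna)
```

An exact query such as "all sensors within 10 km of this point" then reduces to comparing this distance against a threshold for each candidate row.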
Unlike many other SQL databases, CrateDB schemas are totally flexible. You can add columns anytime without slowing performance or causing downtime. This is great for agile development and fast deployment.
CrateDB is eventually consistent, but offers transactional semantics. CrateDB is consistent at the row level, so each row is either fully written or not written at all. By offering read-after-write consistency, CrateDB allows synchronous real-time access to single records immediately after they are written.
Even though CrateDB does not support ACID transactions with rollbacks, it offers optimistic concurrency control through internal row versioning, which allows write conflicts to be detected and resolved.
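The general technique of version-based optimistic concurrency control can be sketched in a few lines of Python. This is a generic in-memory model, not CrateDB's API: the `VersionedStore` class, the `VersionConflict` exception, and the `expected_version` parameter are hypothetical names used only to show how a stale write is detected and retried.

```python
class VersionConflict(Exception):
    """Raised when a write is based on an outdated version of a row."""


class VersionedStore:
    """Toy key-value store that keeps a version number next to every row."""

    def __init__(self) -> None:
        self._rows: dict[str, tuple[int, dict]] = {}

    def insert(self, key: str, value: dict) -> int:
        if key in self._rows:
            raise KeyError(f"row {key!r} already exists")
        self._rows[key] = (1, dict(value))
        return 1

    def read(self, key: str) -> tuple[int, dict]:
        version, value = self._rows[key]
        return version, dict(value)

    def update(self, key: str, value: dict, expected_version: int) -> int:
        current_version, _ = self._rows[key]
        if current_version != expected_version:
            # Someone else wrote the row after the caller read it.
            raise VersionConflict(f"expected v{expected_version}, found v{current_version}")
        self._rows[key] = (current_version + 1, dict(value))
        return current_version + 1


store = VersionedStore()
store.insert("sensor-1", {"status": "ok"})

# Two writers read the same version of the row.
version_a, _ = store.read("sensor-1")
version_b, _ = store.read("sensor-1")

store.update("sensor-1", {"status": "warning"}, expected_version=version_a)
try:
    store.update("sensor-1", {"status": "error"}, expected_version=version_b)
    conflict_detected = False
except VersionConflict:
    conflict_detected = True
    # Resolve the conflict: re-read the row and retry against the fresh version.
    fresh_version, _ = store.read("sensor-1")
    final_version = store.update("sensor-1", {"status": "error"}, expected_version=fresh_version)
```

No locks are held between the read and the write; a conflict costs only a re-read and a retry, which suits workloads where concurrent writes to the same row are rare.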
CrateDB can save incremental snapshots of your database to storage. Snapshots contain the state of the tables in a CrateDB cluster at the time the snapshot was created, and can be restored into the cluster anytime.
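The idea behind incremental snapshots can be illustrated with a small Python sketch in which each snapshot stores only data that the repository has not seen before. This is an assumption-laden toy model (whole tables hashed with SHA-256, a hypothetical `SnapshotRepository` class), not a description of how CrateDB stores snapshots internally.

```python
import hashlib
import json


class SnapshotRepository:
    """Toy repository: table contents are stored once, keyed by content hash."""

    def __init__(self) -> None:
        self.blobs: dict[str, str] = {}                 # content hash -> serialized table
        self.snapshots: dict[str, dict[str, str]] = {}  # snapshot name -> {table: hash}

    def create(self, name: str, tables: dict[str, list]) -> int:
        """Record the current state of `tables`; return how many blobs were newly written."""
        manifest: dict[str, str] = {}
        written = 0
        for table, rows in tables.items():
            data = json.dumps(rows, sort_keys=True)
            digest = hashlib.sha256(data.encode("utf-8")).hexdigest()
            if digest not in self.blobs:  # unchanged tables are not stored again
                self.blobs[digest] = data
                written += 1
            manifest[table] = digest
        self.snapshots[name] = manifest
        return written

    def restore(self, name: str) -> dict[str, list]:
        """Rebuild the tables exactly as they were when the snapshot was created."""
        return {t: json.loads(self.blobs[d]) for t, d in self.snapshots[name].items()}


repo = SnapshotRepository()
database = {"machines": [{"id": 1}], "readings": [{"id": 2}]}

first_written = repo.create("snapshot-1", database)   # both tables are new
database["readings"].append({"id": 3})
second_written = repo.create("snapshot-2", database)  # only "readings" changed
```

Each snapshot remains independently restorable, while the second snapshot adds only the data that changed since the first.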
Openness and flexibility
• Run CrateDB anywhere, in your data center or in the cloud
• Connect to CrateDB from almost any language, SQL application, or SQL BI tool
• Extend CrateDB functionality by writing your own plugins
• Deploy CrateDB as a container on Docker, Kubernetes, or others
• Use CrateDB for free, under the Apache 2.0 open source license
Where can you run CrateDB?
CrateDB is available in the cloud, at the edge, and on-premises to fit everyone's needs.
When is CrateDB a good choice for you?
Enterprises and startups often use CrateDB to power real-time machine data monitoring and analytics dashboards. However, CrateDB is a good choice for any application if you require:
A horizontally scalable, relational SQL database with integrated search
The economics and ease of an open source SQL database
Fast search, aggregations, or ad-hoc queries
Real-time queries on data while it is being written
Capacity for huge amounts of data (hundreds of terabytes)
A highly available database with zero downtime
When is CrateDB not a good choice for you?
On the other hand, CrateDB may not be a good choice if you require:
Strong (ACID) transactional consistency
Highly normalized schemas with many tables and many joins
All the power of CrateDB in the cloud
Deploy CrateDB on Azure or AWS and experience the advantages of running a distributed database in the cloud.
CrateDB vs others
Find out how CrateDB compares to databases like MongoDB, TimescaleDB, or InfluxDB.