Overview

What is Crate?

Crate is an open source, highly scalable, shared-nothing distributed SQL database. Crate offers the scalability and performance of a modern NoSQL database with the power of standard SQL. Crate’s distributed SQL query engine lets you use the same SQL syntax your applications and integrations already use, and executes queries seamlessly across the Crate cluster, including any aggregations that are needed.

Crate is masterless and simple to install, operate and use. It handles both transactional and analytical workloads in a single database. Crate was designed from the ground up to support the huge scale of web, mobile and IoT applications. Love containers? Crate runs perfectly as a stateful container.

Core Features

Scalable

Crate is horizontally scalable: you grow a cluster by adding machines, without manually re-sharding, re-indexing or moving data around, because Crate rebalances shards automatically. Its shared-nothing, masterless architecture means that all nodes are identical, so scaling from a single node to many nodes is simple.

Distributed SQL & Real-time search

Crate’s distributed SQL query engine takes a standard SQL query, breaks it down, and executes the parts in a distributed fashion across the Crate cluster (think MapReduce in real time). It then collects the results and performs any final aggregations, so you get fast query performance while writing only the original SQL query. Crate also offers powerful native full-text search, and its distributed query engine can run searches and aggregations (GROUP BY, etc.) in milliseconds.
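The scatter/gather idea behind this can be sketched in plain Python. This is a conceptual model only, not Crate’s internal code: the shard contents are made-up sample data, and `partial_aggregate` and `merge_partials` are hypothetical names. Each shard computes per-group counts and sums next to its data, and the node handling the query merges those partial results.

```python
from collections import defaultdict

# Made-up sample data: each inner list stands for the rows stored on one shard.
# Each row is (country, amount).
SHARDS = [
    [("AT", 10), ("DE", 5), ("AT", 7)],
    [("DE", 3), ("CH", 4)],
    [("AT", 1), ("CH", 6), ("DE", 2)],
]


def partial_aggregate(rows):
    """Map phase: runs next to the data and returns per-group [count, sum]."""
    partial = defaultdict(lambda: [0, 0])
    for key, amount in rows:
        partial[key][0] += 1
        partial[key][1] += amount
    return dict(partial)


def merge_partials(partials):
    """Reduce phase: the node handling the query merges all partial results."""
    merged = defaultdict(lambda: [0, 0])
    for partial in partials:
        for key, (count, total) in partial.items():
            merged[key][0] += count
            merged[key][1] += total
    # AVG is finalized only after merging; averaging per-shard averages would be wrong.
    return {key: {"count": c, "sum": s, "avg": s / c} for key, (c, s) in merged.items()}


# Roughly: SELECT country, count(*), sum(amount), avg(amount) FROM t GROUP BY country
result = merge_partials(partial_aggregate(rows) for rows in SHARDS)
```

Shipping only small per-group partials between nodes, instead of raw rows, is what keeps this kind of aggregation fast as the data grows.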

Highly available

Crate repairs the cluster automatically when disks, a node, or even multiple nodes are lost, provided enough replicas exist; no operational effort is required. Crate’s masterless, shared-nothing architecture (all nodes are equal: no master, no roles) also makes scaling up and down easy. Configurable automatic sharding and replication allow worry-free backend operations. Self-healing after node failures and rolling updates keep your queries running. New nodes join the cluster automatically, and data is rebalanced among the available nodes. If shards are lost, Crate automatically re-creates them from their replicas.
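A toy model of why replicas make node loss survivable (not Crate’s actual shard allocator): each shard’s primary and replica copies are placed on different nodes, so with one replica configured, losing any single node still leaves at least one copy of every shard. All names here are hypothetical; in Crate the replica count is a per-table setting (`number_of_replicas`). A real cluster would also create fresh replicas on the remaining nodes, which this sketch leaves out.

```python
def allocate(num_shards, replicas, nodes):
    """Place each shard's primary and replica copies on distinct nodes, round-robin.

    Returns {shard_id: [node, ...]}; the first node in each list holds the primary.
    """
    if replicas + 1 > len(nodes):
        raise ValueError("each copy of a shard needs its own node")
    return {
        shard: [nodes[(shard + i) % len(nodes)] for i in range(replicas + 1)]
        for shard in range(num_shards)
    }


def survives(placement, failed):
    """True if every shard keeps at least one copy on a node that is still up."""
    return all(any(node not in failed for node in copies) for copies in placement.values())


def fail_over(placement, failed):
    """Drop copies on failed nodes; the first surviving copy becomes the primary."""
    return {shard: [n for n in copies if n not in failed] for shard, copies in placement.items()}


nodes = ["node1", "node2", "node3"]
placement = allocate(num_shards=6, replicas=1, nodes=nodes)
after = fail_over(placement, {"node1"})  # shard 0's replica on node2 is promoted
```

With one replica, any single node can fail, but losing two nodes at once can take out both copies of a shard; raising the replica count trades storage for more tolerance.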

Real-time ingestion

Most analytics systems ingest data in batch loads, often with transactional locks and other overhead. Crate allows lock-free ingestion with massive write throughput (for example, peaks of up to 4.5 million IoT inserts per second, or 40,000+ inserts per second per node on commodity hardware). It still serves millisecond-speed full-text search queries while writes are in progress.
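A common way to benefit from high write throughput is to send rows in bulk rather than one request per row. The sketch below builds (but does not send) a single bulk request for Crate’s HTTP `_sql` endpoint, whose JSON body carries a `stmt` and a list of `bulk_args`. It assumes a node listening on the default HTTP port 4200 on localhost; the table `sensor_readings` and its columns are hypothetical.

```python
import json
import urllib.request

# Assumption: a local Crate node on the default HTTP port.
CRATE_URL = "http://localhost:4200/_sql"


def bulk_insert_request(table, columns, rows, url=CRATE_URL):
    """Build (but do not send) one HTTP request that inserts many rows at once.

    Table and column names are interpolated as-is, so they must be trusted
    identifiers; the row values travel separately as parameters.
    """
    placeholders = ", ".join("?" for _ in columns)
    stmt = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    body = json.dumps({"stmt": stmt, "bulk_args": [list(r) for r in rows]}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


readings = [("sensor-1", 21.5), ("sensor-2", 19.0), ("sensor-1", 21.7)]
req = bulk_insert_request("sensor_readings", ["sensor_id", "temperature"], readings)
# urllib.request.urlopen(req) would send it to a running node.
```

One statement with many argument tuples lets the cluster process the batch in a single round trip, instead of paying network latency per row.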

Any data and BLOBs

Crate’s columnar store supports both relational data and nested JSON documents, and every nested attribute can be used in any SQL command. In addition, Crate provides BLOB storage, so you can persistently store and retrieve BLOBs (typically pictures, videos or large unstructured files) in a fully distributed cluster.
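Nested attributes of an object column are addressed in SQL with subscript notation, e.g. `address['city']`. The hypothetical helper below lists the column reference for every leaf attribute of a JSON-like document, showing how any nested value can appear in a query. The document and the `users` table in the comment are made up for illustration, and keys containing quotes are not escaped.

```python
def column_paths(doc, prefix=""):
    """Yield a SQL column reference for every leaf attribute of a nested document.

    Top-level keys are plain column names; nested keys use subscript notation.
    """
    for key, value in doc.items():
        path = f"{prefix}['{key}']" if prefix else key
        if isinstance(value, dict):
            yield from column_paths(value, path)
        else:
            yield path


doc = {"name": "Alice", "address": {"city": "Berlin", "geo": {"zip": "10115"}}}
paths = list(column_paths(doc))
# ["name", "address['city']", "address['geo']['zip']"]

# Any of these paths can be used in a query, for example:
# SELECT name FROM users WHERE address['geo']['zip'] = '10115'
```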

Timeseries

Crate can (automatically) split any table into partitions. Each partition is effectively a separate table under the hood, which keeps performance high, and each one can be queried, moved or deleted like a single table. This makes partitioning ideal for timeseries data (partition by hour, day, week, etc.) and for other ways of optimizing how sharded data is organized.
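A minimal sketch of daily partitioning, assuming a hypothetical `readings` table. The DDL (kept as a string, since running it needs a cluster) uses a generated column computed with `date_trunc('day', ts)` as the partition key; column type names may need adjusting for your Crate version. The Python part only mimics how rows would be grouped into one partition per day.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical table, partitioned by a generated "day" column.
DDL = """
CREATE TABLE readings (
    ts TIMESTAMP,
    sensor_id TEXT,
    value DOUBLE,
    day TIMESTAMP GENERATED ALWAYS AS date_trunc('day', ts)
) PARTITIONED BY (day)
"""


def partition_key(ts):
    """Mimic date_trunc('day', ts): truncate a timestamp to midnight."""
    return ts.replace(hour=0, minute=0, second=0, microsecond=0)


def partition_rows(rows):
    """Group (ts, sensor_id, value) rows by the partition they would land in."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[partition_key(row[0])].append(row)
    return dict(partitions)


rows = [
    (datetime(2017, 3, 1, 8, 15, tzinfo=timezone.utc), "s1", 20.5),
    (datetime(2017, 3, 1, 23, 59, tzinfo=timezone.utc), "s2", 18.0),
    (datetime(2017, 3, 2, 0, 1, tzinfo=timezone.utc), "s1", 19.2),
]
partitions = partition_rows(rows)
# Two partitions: 2017-03-01 (two rows) and 2017-03-02 (one row).
```

Because the partition key is derived from the timestamp, queries that filter on a time range only need to touch the matching daily partitions, and old days can be removed partition by partition.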

Geospatial

Store and query many kinds of geographical information using the geo_point and geo_shape types. For fast results, use geographic indices whose precision sets the resolution; for exact results, use scalar functions such as intersects, within and distance.
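As a rough illustration of what an exact distance calculation involves, here is the haversine great-circle formula in Python. It treats the Earth as a sphere (the 6,371 km radius is an assumption of this sketch), and Crate’s own `distance` function, which returns meters, may use a different method and give slightly different results. Points follow the `[longitude, latitude]` order used for Crate’s geo_point arrays.

```python
import math

EARTH_RADIUS_M = 6_371_000  # assumed mean Earth radius in meters


def haversine_m(p1, p2):
    """Great-circle distance in meters between two [longitude, latitude] points."""
    lon1, lat1 = map(math.radians, p1)
    lon2, lat2 = map(math.radians, p2)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# One degree of latitude along a meridian is roughly 111.2 km on this sphere.
d = haversine_m([0.0, 0.0], [0.0, 1.0])
```

Exact per-pair calculations like this are precise but costly over many rows, which is why an index-based approximate filter is the faster first step.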

Dynamic schemas

Unlike many other scale-out solutions, Crate’s schemas are fully flexible. You can add columns at any time without a performance penalty or re-indexing. This is great for agile development and fast deployments.
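With flexible schemas, adding a column is a single `ALTER TABLE ... ADD COLUMN` statement. The hypothetical helper below compares an incoming row against the columns a table already has and produces statements for the new ones. The Python-to-SQL type mapping is a simplified assumption (older Crate versions use type names such as `string` and `long`), and depending on the table’s column policy, Crate may add such columns by itself during inserts.

```python
def crate_type(value):
    """Simplified, assumed mapping from Python values to CrateDB column types."""
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return "BOOLEAN"
    if isinstance(value, int):
        return "BIGINT"
    if isinstance(value, float):
        return "DOUBLE"
    if isinstance(value, dict):
        return "OBJECT"
    return "TEXT"


def add_column_statements(table, known_columns, row):
    """Return ALTER TABLE statements for keys in `row` the table does not have yet."""
    return [
        f"ALTER TABLE {table} ADD COLUMN {name} {crate_type(value)}"
        for name, value in row.items()
        if name not in known_columns
    ]


statements = add_column_statements(
    "users", {"id", "name"}, {"id": 7, "name": "Alice", "active": True, "score": 4.5}
)
# ["ALTER TABLE users ADD COLUMN active BOOLEAN",
#  "ALTER TABLE users ADD COLUMN score DOUBLE"]
```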

Transactional

Crate is eventually consistent but offers transactional semantics. Crate is consistent at the row level, so each row is either fully written or not written at all. Read-after-write consistency provides synchronous, real-time access to individual records immediately after they are written.

Example: after a form is posted and its record written, that record is consistent and immediately available across the whole cluster when selected by its primary key. However, an aggregation query (such as a sum) may not include the record until a few milliseconds later.

Even though Crate does not support ACID transactions (with rollbacks, etc.), it offers Optimistic Concurrency Control through internal row versioning, which allows write conflicts to be detected and resolved.

A note on terminology: “eventually consistent” means “consistent after a while”, not “perhaps consistent, perhaps not”. Crate data always becomes consistent; it may just take a few milliseconds.
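Optimistic concurrency control can be modeled with a per-row version number: a writer reads a row together with its version, and its update succeeds only if the version is unchanged; otherwise it re-reads and retries. The toy in-memory store below is an illustration, not Crate’s implementation, and all names are hypothetical. In Crate itself the same pattern is expressed by adding a version condition to the `WHERE` clause of an `UPDATE`; older releases used the `_version` system column and newer ones `_seq_no` and `_primary_term`, so check the documentation for your version.

```python
class VersionConflict(Exception):
    """Raised when a row changed between read and write."""


class VersionedStore:
    """Toy key-value store with per-row versions (illustrative only)."""

    def __init__(self):
        self._rows = {}  # primary key -> (version, value)

    def insert(self, key, value):
        if key in self._rows:
            raise KeyError(f"duplicate key {key!r}")
        self._rows[key] = (1, value)

    def get(self, key):
        """Read by primary key, returning (version, value)."""
        return self._rows[key]

    def update(self, key, value, expected_version):
        """Write only if nobody else changed the row since it was read."""
        version, _ = self._rows[key]
        if version != expected_version:
            raise VersionConflict(f"{key!r}: expected v{expected_version}, found v{version}")
        self._rows[key] = (version + 1, value)
        return version + 1


def increment_with_retry(store, key, retries=3):
    """Read-modify-write loop that retries on version conflicts."""
    for _ in range(retries):
        version, value = store.get(key)
        try:
            return store.update(key, value + 1, expected_version=version)
        except VersionConflict:
            continue
    raise VersionConflict(f"gave up on {key!r} after {retries} attempts")


store = VersionedStore()
store.insert("order-1", 10)
version, qty = store.get("order-1")              # (1, 10)
store.update("order-1", 11, expected_version=1)  # another writer wins: now version 2
conflict = False
try:
    store.update("order-1", qty + 5, expected_version=version)  # stale version 1
except VersionConflict:
    conflict = True
new_version = increment_with_retry(store, "order-1")  # re-reads, writes 12 at version 3
```

No locks are held between the read and the write; the cost of a conflict is a retry, which suits workloads where concurrent updates to the same row are rare.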

Backups

Create repositories (of type fs, hdfs, s3 or url) to store, manage and restore snapshots. Incremental snapshots can be created at any time; each one captures the state of the tables in a Crate cluster at the moment it was taken and can be restored into the cluster later.
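A minimal sketch of a backup routine that generates the SQL for an `fs` repository, a timestamped snapshot and a restore. The repository name, location and snapshot naming scheme are hypothetical; an `fs` location generally has to be a shared directory reachable from every node and permitted by the node configuration. The statements would be executed through any Crate client; here they are only built as strings.

```python
from datetime import datetime, timezone


def create_repository(repo, location):
    """Register a file-system repository (done once per cluster)."""
    return f"CREATE REPOSITORY {repo} TYPE fs WITH (location = '{location}')"


def snapshot_name(repo, now=None):
    """Hypothetical naming scheme: one snapshot per run, named by UTC time."""
    now = now or datetime.now(timezone.utc)
    return f"{repo}.snapshot_{now:%Y%m%d_%H%M%S}"


def create_snapshot(name):
    # Incremental: data already stored in the repository is not copied again.
    return f"CREATE SNAPSHOT {name} ALL WITH (wait_for_completion = true)"


def restore_snapshot(name):
    return f"RESTORE SNAPSHOT {name} ALL"


name = snapshot_name("backups", datetime(2017, 5, 1, 12, 0, tzinfo=timezone.utc))
statements = [
    create_repository("backups", "/mnt/crate_backups"),
    create_snapshot(name),
    restore_snapshot(name),
]
```

Naming snapshots by time makes it easy to keep a rolling history and pick the restore point you need.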

Open and flexible

Plugin architecture - Want to move faster than Crate’s roadmap, or need application-specific functionality? Extend Crate by writing your own plugins.

Microservices - Crate’s masterless architecture allows it to run well in ephemeral environments such as Docker, Kubernetes, CoreOS and Mesosphere. Crate is like ether: an omnipresent, persistent layer for your data, serving all your app containers. Scale your database along with your app servers.

Use any language - With drivers provided by Crate (JDBC, Ruby, Python, PHP, ODBC etc.) and drivers from the community (Ado, Erlang etc.) you can use almost any language to work with Crate.

Open Source - Crate is written entirely in Java and licensed under the Apache 2.0 License. Enterprise licensing (SLA, indemnification, bug-fix escalation, etc.) is also available.

What Can I Use Crate For?

Enterprises and startups have deployed Crate clusters to power real-time analytics (making decisions as data comes in), real-time dashboards (network traffic, security events), IoT backends (sensor and telemetry data), ad tech (web traffic), telecom applications (call logs, CDRs) and user-facing web and mobile apps (large tables with fast-growing, dynamic data).

Our main use cases are:

IoT backends, real-time analytics and interactive real-time dashboards, and SQL on Elasticsearch

Generally speaking Crate fits well if:

  • You require a relational SQL database with document support that is highly available and horizontally scalable.
  • Your applications and dashboards require fast search and aggregations in a dynamic environment with fast-changing queries.
  • You need to query data in real time while writing data simultaneously.
  • You have huge amounts of data (trillions of records in hundreds of TBs).
  • Your database must be highly available and never go down.
  • You want to start small and scale out horizontally as you grow.
  • You want to be faster and more agile, and save money on licenses and hardware.

Crate isn’t a good choice if you have strong consistency (ACID) requirements and very complex relational schemas (e.g. highly normalized, with many tables and many joins).
