People ask us whether Crate is “just” a SQL layer on top of Elasticsearch. Although we use Elasticsearch as a framework (e.g. for cluster management, node discovery and communication, …), Crate has completely replaced the Elasticsearch query engine with its own distributed SQL query engine. Crate is a database and, as a result, has some notable differences. It is a distributed, highly scalable SQL database that runs on one machine or, better, on a cluster of machines in the cloud or on premises. Crate comes as one complete install package: it includes solid, established open source components (Elasticsearch, Lucene, Netty) and extends them with core functionality such as read/write support, the SQL language, a dashboard, and a query console. Here are some of the differences between Crate.IO as a database and using Elasticsearch directly:
Crate fully supports array types. Elasticsearch does not strictly distinguish between arrays and core types (a string field can hold a string or a string array, depending on how you insert it), whereas Crate strictly distinguishes between the two and handles them differently. Crate can therefore recursively guess the inner type of arrays and store it correctly in the internal mapping, even when they are inserted into a table with a dynamic schema.
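To make the idea of recursively guessing an array's inner type more concrete, here is a minimal Python sketch. It is not Crate's implementation: the function name `guess_type` and the type labels it returns are assumptions chosen for illustration only.

```python
from typing import Any


def guess_type(value: Any) -> str:
    """Guess a type label for a value, recursing into lists.

    The labels ("string", "long", ...) are illustrative, not Crate's mapping.
    """
    # bool must be checked before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        # Guess the type of every non-null element; they must all agree.
        inner = {guess_type(v) for v in value if v is not None}
        if not inner:
            return "array(undefined)"
        if len(inner) > 1:
            raise ValueError(f"mixed inner types: {sorted(inner)}")
        return f"array({inner.pop()})"
    raise TypeError(f"unsupported value: {value!r}")


scalar_type = guess_type("crate")
array_type = guess_type(["a", "b"])
nested_type = guess_type([[1, 2], [3]])
```

The point of the sketch is that a string and an array of strings end up with different labels, and that nesting is resolved level by level.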
One of the design goals behind Crate was to be more than a database: a complete solution that also covers blob storage and offers the opportunity to replace expensive network storage with cheap commodity hardware. Crate’s blob storage functions include sharding, replication, and rebalancing. You can read about them here. Additionally, this makes it possible to store data on a self-hosted cluster rather than in the public cloud.
Elasticsearch, on the other hand, doesn’t support blob storage. Typically, it is used together with GridFS or HDFS for blob storage.
Elasticsearch currently supports HyperLogLog-based (approximate) aggregations, whereas Crate supports accurate aggregations. Elasticsearch also scatters a query to all nodes, gathers the responses, and performs the aggregation afterwards, which results in high memory consumption on the node that handles the client request (and therefore does the aggregation).
Crate distributes the collected results across the whole cluster using simple modulo-based hashing and, as a result, uses the complete memory of the cluster for merging. Think of it as a kind of distributed map/reduce.
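The following Python sketch illustrates the general idea of modulo-based distribution for a grouped count. It is a simplified model under stated assumptions, not Crate's actual code: "nodes" are plain in-memory counters, and the helper names (`node_for`, `distributed_count`, `gather`) and the choice of CRC32 as the hash are hypothetical.

```python
import zlib
from collections import Counter


def node_for(key: str, num_nodes: int) -> int:
    """Pick the node responsible for a key: hash(key) modulo node count."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def distributed_count(shards, num_nodes):
    """Pre-aggregate per shard, then route each key to exactly one node."""
    # "Map" step: every shard counts its own rows locally.
    partials = [Counter(rows) for rows in shards]
    # Distribution step: each partial count goes to the node chosen by
    # the modulo hash, so merging work is spread over all nodes.
    merged_per_node = [Counter() for _ in range(num_nodes)]
    for partial in partials:
        for key, count in partial.items():
            merged_per_node[node_for(key, num_nodes)][key] += count
    return merged_per_node


def gather(merged_per_node):
    """Final step: concatenate the disjoint per-node results."""
    final = {}
    for node_result in merged_per_node:
        final.update(node_result)
    return final


shards = [["a", "b", "a"], ["b", "c"], ["a"]]
per_node = distributed_count(shards, num_nodes=3)
totals = gather(per_node)
```

Because every key is merged on exactly one node, the node that answers the client only concatenates finished results instead of holding all intermediate state itself.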
Crate implements post-aggregation filtering, a feature Elasticsearch offers only with certain limitations (https://www.elastic.co/guide/en/elasticsearch/reference/master/search-aggregations-pipeline-bucket-selector-aggregation.html).
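In SQL, post-aggregation filtering is usually written with a `HAVING` clause. The sketch below shows the semantics using Python's built-in `sqlite3` module as a stand-in engine; it does not talk to Crate, and the `visits` table and its columns are made up for the example.

```python
import sqlite3

# In-memory SQLite database used only to illustrate standard SQL semantics.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (country TEXT, hits INTEGER)")
conn.executemany(
    "INSERT INTO visits (country, hits) VALUES (?, ?)",
    [("AT", 5), ("AT", 7), ("DE", 1), ("US", 20), ("US", 3)],
)

# WHERE would filter rows before grouping; HAVING filters the groups
# after the aggregate has been computed.
rows = conn.execute(
    """
    SELECT country, SUM(hits) AS total
    FROM visits
    GROUP BY country
    HAVING SUM(hits) > 10
    ORDER BY country
    """
).fetchall()
```

Here only the groups whose summed `hits` exceed 10 survive, which is something a row-level filter cannot express.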
In Crate, different types of JOINs are supported: CROSS JOIN of two tables, which generates every combination of their rows (the Cartesian product), and INNER JOIN, which applies conditions to the joined rows, including geographical filtering! For more information see the documentation.
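As a quick illustration of the difference between the two join types, the sketch below runs both against an in-memory SQLite database via Python's `sqlite3` module. SQLite is only a stand-in for demonstrating standard SQL behaviour; the tables and their contents are invented for the example.

```python
import sqlite3

# In-memory SQLite database standing in for a SQL engine.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE colors (color TEXT);
    CREATE TABLE sizes (size TEXT);
    INSERT INTO colors VALUES ('red'), ('blue');
    INSERT INTO sizes VALUES ('S'), ('M'), ('L');

    CREATE TABLE employees (name TEXT, dept_id INTEGER);
    CREATE TABLE departments (id INTEGER, dept TEXT);
    INSERT INTO employees VALUES ('Ann', 1), ('Bob', 2), ('Cid', 3);
    INSERT INTO departments VALUES (1, 'Sales'), (2, 'Ops');
    """
)

# CROSS JOIN: every color paired with every size (2 x 3 combinations).
cross = conn.execute(
    "SELECT color, size FROM colors CROSS JOIN sizes"
).fetchall()

# INNER JOIN: only the pairs that satisfy the join condition are kept.
inner = conn.execute(
    """
    SELECT e.name, d.dept
    FROM employees e
    INNER JOIN departments d ON e.dept_id = d.id
    ORDER BY e.name
    """
).fetchall()
```

The cross join yields all six color and size pairs, while the inner join drops the employee whose `dept_id` has no matching department.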
Crate supports the creation of partitioned tables, which transparently partition your data based on a defined column, as Hive does, for example.
Elasticsearch supports creating aliases of a table which can be used to achieve the same, but it’s up to the application developer to implement this.
Crate provides COPY FROM / COPY TO SQL statements for importing or exporting data in JSON form, making it easy to process with common tools if needed.
With Crate it’s possible to update one or multiple documents using a query. Elasticsearch only supports updates of a single document using its _id field value.
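An update by query is simply an `UPDATE` statement with a `WHERE` clause that may match any number of rows. The sketch below demonstrates this with Python's `sqlite3` module as a stand-in engine; it is not run against Crate, and the `users` table is a made-up example.

```python
import sqlite3

# In-memory SQLite database used only to illustrate the SQL semantics.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, country TEXT, active INTEGER)"
)
conn.executemany(
    "INSERT INTO users (id, country, active) VALUES (?, ?, ?)",
    [(1, "AT", 1), (2, "AT", 1), (3, "DE", 1)],
)

# One statement updates every row matching the condition,
# without the caller having to know the individual ids.
cursor = conn.execute("UPDATE users SET active = 0 WHERE country = ?", ("AT",))
updated = cursor.rowcount

remaining_active = conn.execute(
    "SELECT id FROM users WHERE active = 1 ORDER BY id"
).fetchall()
```

With id-based updates only, the application would first have to search for the matching documents and then update them one by one.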
Crate supports data insertion using a query instead of values. Elasticsearch does not support this out of the box. This feature can also be used to restructure a table’s data: renaming a field, changing a field’s data type, or converting a normal table into a partitioned one.
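The restructuring use case can be sketched with a standard `INSERT INTO ... SELECT` statement. The example below uses Python's `sqlite3` module as a stand-in engine rather than Crate, and the table names `events_old` and `events_new` are hypothetical. It copies data into a new table while renaming one column and changing the type of another.

```python
import sqlite3

# In-memory SQLite database used only to illustrate the SQL semantics.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events_old (ts TEXT, user_name TEXT, amount TEXT)")
conn.executemany(
    "INSERT INTO events_old (ts, user_name, amount) VALUES (?, ?, ?)",
    [("2015-12-01", "alice", "10"), ("2015-12-02", "bob", "25")],
)

# New layout: user_name becomes username, amount becomes an integer.
conn.execute("CREATE TABLE events_new (ts TEXT, username TEXT, amount INTEGER)")

# The rows to insert come from a query instead of a VALUES list.
conn.execute(
    """
    INSERT INTO events_new (ts, username, amount)
    SELECT ts, user_name, CAST(amount AS INTEGER)
    FROM events_old
    """
)

rows = conn.execute(
    "SELECT username, amount FROM events_new ORDER BY ts"
).fetchall()
```

Once the copy has been verified, the old table can be dropped and the new one used in its place.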
Crate ships with an integrated open source Admin UI that shows the cluster, node, and table state, a simple “Getting Started” Twitter import example, notifications about news and newly available Crate.IO releases, and a SQL console.
Update 21.12.2015: A lot has happened since July 2014, so we have updated our blog post to show off some new features!