Impala, Drill, Spark, Presto)
Crate works well side by side with Hadoop. In fact, many Crate use cases involve collecting raw data in Hadoop (and storing it there long term), then ingesting a compressed data set into Crate for real-time processing.
Although Crate may seem similar to SQL-on-Hadoop, it belongs to a different category altogether. Its core design differs in the following ways:
- It is an always-on database, not a “SQL translation” layer.
- It supports real-time reads and writes at the same time on all data.
- It allows real-time ad hoc queries on the full data set (all data in Crate is “hot”), in contrast to solutions that read only a subset of the data for real-time processing or that run batch-style queries.
A general comparison is difficult because it depends heavily on the requirements of the use case. However, here are some general observations:
- SQL-on-Hadoop engines typically pull a subset of the data from the underlying storage layer into the computing layer. Spark SQL, for example, also offers powerful real-time queries, but pulling the data introduces a delay, and only the pulled data can be processed.
- Crate provides a REST API for querying with standard SQL, including distributed JOINs. This offers convenience similar to that of SQL-on-Hadoop solutions, but on the complete data set and in real time. Hadoop-based solutions, by contrast, are batch-oriented and do not operate in real time on the whole data set. Crate does not yet fully support the ANSI SQL-92 feature set, but you can contact us if you need a specific feature.
- Ingestion power. HDFS, the storage layer underlying Hadoop, is a serious bottleneck when writing data and making it available for queries. Crate, by contrast, is designed for massively parallel real-time ingestion and offers read-after-write consistency: after a successful write, the record is immediately and consistently available via GET. SQL-on-Hadoop solutions are often read-oriented, whereas Crate offers both reads and writes.
- Despite being a distributed, scalable, always-on database, Crate “feels and operates” like a simple SQL database. Compared with SQL-on-Hadoop, it offers:
  - writes through the same SQL interface as reads
  - no need for an underlying Hadoop cluster
  - much simpler setup and operation
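
As a rough illustration of the REST API and read-after-write points above, the following Python sketch posts SQL statements to a Crate node over HTTP. It assumes a node is reachable at `localhost:4200` and exposes the `_sql` endpoint. The `sensor_readings` table, its columns, and the helper names `build_sql_request`, `run_sql`, and `write_then_read` are hypothetical and chosen only for this example.

```python
import json
import urllib.request

# Assumption: a Crate node is reachable at this address (HTTP port 4200).
CRATE_SQL_URL = "http://localhost:4200/_sql"


def build_sql_request(stmt, args=None, url=CRATE_SQL_URL):
    """Build (but do not send) an HTTP POST carrying one SQL statement."""
    payload = {"stmt": stmt}
    if args is not None:
        # Values for the "?" placeholders in the statement.
        payload["args"] = list(args)
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_sql(stmt, args=None, url=CRATE_SQL_URL):
    """Send the statement and return the decoded JSON response."""
    with urllib.request.urlopen(build_sql_request(stmt, args, url)) as resp:
        return json.loads(resp.read().decode("utf-8"))


def write_then_read(run=run_sql):
    """Insert one row, then fetch it back by primary key.

    Assumption: a hypothetical table sensor_readings with primary key
    column "id" and a numeric column "value" already exists.
    """
    run("INSERT INTO sensor_readings (id, value) VALUES (?, ?)", [1, 21.5])
    # Same SQL interface for the read; the lookup uses the primary key.
    return run("SELECT value FROM sensor_readings WHERE id = ?", [1])
```

Calling `write_then_read()` against a running node would issue the write and the read through the same interface. The `run` parameter exists only so that the flow can be exercised without a live cluster.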