Crate vs SQL on Hadoop


(e.g. Impala, Drill, Spark, Presto)

Crate works well side by side with Hadoop. In fact, many Crate use cases involve collecting raw data in Hadoop for long-term storage and ingesting a compressed data set into Crate for real-time processing.

Although Crate may seem similar to SQL-on-Hadoop engines, it belongs to an altogether different category. Its core design differs in the following ways:

  • It is an always-on database, not a “SQL translation” layer.
  • It supports reads and writes in real time, concurrently, on all data.
  • It allows real-time ad hoc queries on the full data set (all data in Crate is “hot”), unlike solutions that read only a subset of the data for real-time processing or run batch-style queries.

A general comparison is difficult because it depends heavily on the requirements of the use case. However, here are some general comments:

  • SQL-on-Hadoop engines typically pull a subset of the data from the underlying storage layer into the computing layer. SparkSQL, for example, also offers powerful real-time queries, but pulling the data introduces a delay, and only the data that has been pulled can be processed.
  • Crate provides a REST API for querying with standard SQL, including distributed JOINs. This offers convenience similar to SQL-on-Hadoop solutions, but on the complete data set and in real time, whereas Hadoop-based solutions are batch-oriented and not real time on the whole data set. Crate does not yet fully support the ANSI SQL-92 feature set, but you can ask us if a special need arises.
  • Ingestion power. The HDFS layer underlying Hadoop is a serious bottleneck when writing data and making it available for queries. Crate, by contrast, is designed for massively parallel real-time ingestion and offers read-after-write consistency: after a successful write, the record is immediately and consistently available via GET. SQL-on-Hadoop solutions are often read-oriented, whereas Crate offers both read and write.
  • Despite being a distributed, scalable, always-on database, Crate “feels and operates” like a simple SQL database. Compared with SQL-on-Hadoop, it offers:
      • writes through the same SQL interface as reads
      • no need for an underlying Hadoop cluster
      • much simpler setup and operation
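
To make the points about the REST API and the shared read/write SQL interface more concrete, here is a minimal sketch in Python. It assumes a Crate node listening on `localhost:4200` and an HTTP endpoint at `/_sql` that accepts a JSON body with `stmt` and optional `args` fields; the table `sensor_readings`, its columns, and the helper names are hypothetical and exist only for illustration. The primary-key lookup at the end is an assumed SQL counterpart of the GET mentioned above, not a statement about how Crate resolves it internally.

```python
import json
import urllib.request

# Assumption: a Crate node is reachable here; host and port are illustrative.
CRATE_URL = "http://localhost:4200"


def build_sql_request(stmt, args=None, base_url=CRATE_URL):
    """Build (but do not send) an HTTP POST carrying one SQL statement."""
    payload = {"stmt": stmt}
    if args is not None:
        payload["args"] = list(args)
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/_sql",  # assumed SQL endpoint path
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_sql(stmt, args=None, base_url=CRATE_URL):
    """Send the statement to the node and return the decoded JSON response."""
    request = build_sql_request(stmt, args, base_url)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def demo_write_then_read():
    """Write a row and read it back by primary key over the same interface.

    Hypothetical table and values; requires a running node to execute.
    """
    run_sql(
        "CREATE TABLE IF NOT EXISTS sensor_readings "
        "(id STRING PRIMARY KEY, value DOUBLE)"
    )
    run_sql(
        "INSERT INTO sensor_readings (id, value) VALUES (?, ?)",
        ["r-1", 21.5],
    )
    # Lookup by primary key right after the write.
    return run_sql("SELECT value FROM sensor_readings WHERE id = ?", ["r-1"])
```

Calling `demo_write_then_read()` against a running node sends the write and the read through the same function and the same endpoint, which is the point of the comparison: no separate ingestion path and no Hadoop cluster underneath.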


A little disclaimer: comparisons, by their very nature, depend on generalizations. If you think we didn’t get something right, get in contact and let us know.

