We love Elasticsearch for its simplicity and beauty. We have worked with this fantastic search engine since its very beginning, and we know it inside and out. In 2012 our CTO Jodok gave a talk in Berlin about using Elasticsearch, “Querying 24 billion records in 900ms”. We liked its concept of simple setup, sharding, full-text search and high availability, and we wanted an operational database that feels like that. That’s why we started CrateDB in the first place, and why Lucene/Elasticsearch is included as a library in CrateDB.
The biggest difference? Elasticsearch is a search engine and CrateDB is a database.
CrateDB features its own Distributed SQL Query Engine, which extends the equivalent functions in Elasticsearch. Everything is handled through the SQL interface in a distributed fashion. Generally, CrateDB should be at least as fast as Elasticsearch, and on distributed aggregations it can be significantly faster. CrateDB not only searches data in a distributed fashion, it can also compute the search result in a distributed fashion before returning it to the requesting node, which delivers faster performance.
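To make the idea of computing a result in a distributed fashion more concrete, here is a minimal Python sketch of a distributed average. It is an illustration of the general technique only, not CrateDB's actual implementation: the function names `partial_aggregate` and `merge_partials`, the in-memory `shards` list and the sample values are all assumptions made up for this example. Each node reduces its local rows to a small partial result, and the requesting node only has to merge those partials instead of pulling in every raw row.

```python
from collections import defaultdict


def partial_aggregate(rows):
    """Hypothetical per-node step: reduce local (key, value) rows to (sum, count) per key."""
    acc = defaultdict(lambda: [0.0, 0])
    for key, value in rows:
        acc[key][0] += value
        acc[key][1] += 1
    return {key: (total, count) for key, (total, count) in acc.items()}


def merge_partials(partials):
    """Hypothetical requesting-node step: merge the partial results into final averages."""
    merged = defaultdict(lambda: [0.0, 0])
    for partial in partials:
        for key, (total, count) in partial.items():
            merged[key][0] += total
            merged[key][1] += count
    return {key: total / count for key, (total, count) in merged.items()}


# Made-up sample data: three shards, each holding (city, value) rows.
shards = [
    [("berlin", 10.0), ("vienna", 20.0)],
    [("berlin", 30.0)],
    [("vienna", 40.0), ("vienna", 60.0)],
]

# Each shard is aggregated independently; only the small partials are merged.
averages = merge_partials(partial_aggregate(shard) for shard in shards)
print(averages)
```

In SQL terms this corresponds roughly to a `GROUP BY` query with `AVG`: the amount of data sent to the requesting node depends on the number of groups rather than the number of rows.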
CrateDB isn’t just translating SQL commands into the Elasticsearch query language. It is not a layer on top of Elasticsearch, but an implementation of a SQL database product. The Elasticsearch query engine has been replaced with CrateDB’s Distributed SQL Query Engine, specifically a SQL parser, analyzer, query planner and execution engine.
More details on the differences:
Use cases we see often: companies run an RDBMS (MySQL, etc.) and have added Elasticsearch to be able to do high-speed full-text search. Or they combine their operational relational database with Elasticsearch to be able to run fast analytics. In such scenarios, migrating all the data into one single CrateDB cluster gives you the best of both worlds if you prefer SQL as your query language. Store relational data, documents and blobs in CrateDB and use it operationally with your apps. In parallel, use the same CrateDB cluster to run real-time analytics and dashboards using SQL. No hassle with syncing; all data is hot and highly available.
CrateDB mainly uses Elasticsearch for cluster state, node discovery and management, sharding, replication, and storing data in Lucene.