How we chose the Facebook Presto SQL Parser for Crate

Author
admin
Filed under
Date
June 4, 2014

Click here to download CrateDB

Download Now »

1981 DeLorean DMC-12

Crate is an elastic SQL Data Store. We created it to provide a quick and powerful massively scalable backend for data intensive (or analytics) apps.

One of our early design choices was to use SQL as the interface language. As we started implementing it, we decided to not re-invent the wheel by writing our own SQL parser from scratch and began looking for a viable open source project with a permissive license, which could provide a parser package we can use. It wasn’t simple to find.

We thought of factoring out the Derby parser from the Apache Derby project, and then stumbled upon the Akiban SQL Parser. FoundationDB just took over the project and made it a sub-project of their product. This gave us the confidence that the project is still actively developed and probably will be in the future.

Yet after a while we discovered that the Derby code did not use “modern” Java features (such as generics) which made the code difficult to reuse. Another issue was the internal mapping of statements with a lot of integer constants and maps thereof; this was very difficult to extend and maintain. However, Derby isn’t to blame for its codebase – it has been a database solution for Java since the early days, and is therefore under constraints such as backwards compatibility etc. Crate only supports Java 7 and later, which allowed us to use all recent Java additions in our code.

We started hunting for other parser solutions. The search was over when Facebook open sourced their Presto project, in late 2013. Facebook Presto is a distributed query engine for big data. It was made to enable running fast interactive queries against large data sets.

Presto’s codebase, especially after working with the Derby parser, was like experiencing time travel. Once we stepped out of our DeLorean, we immediately saw what a modern SQL-Parser looks like. For us, the main points in favor of the Presto switch were:

Currently we’ve only integrated the Presto parser by copying it into our project. But we hope to find the time to contribute our changes to the upstream Presto project.

Back to topAll Blog