As ClearVoice was preparing for its launch, it was looking for a backend database for its application. It had used MySQL as its development database, but since the amount of data being stored was about to grow immensely, ClearVoice planned to denormalize its data schema and store it in a distributed fashion, using MongoDB together with Elasticsearch, Solr, or another Lucene-based solution.
“We didn’t want to use a sharded and clustered MySQL database or Hadoop since maintaining it would have been labor-intensive and used up our engineering resources,” says Jeff Nappi, ClearVoice’s Director of Engineering.
Jeff then discovered CrateDB on Hacker News and decided to check it out, especially since ClearVoice was using the SQLAlchemy ORM for Python, which the official CrateDB Python driver supports.
“I started looking at CrateDB and was impressed by the quality of the code. We submitted the pull request and got a response over the weekend. Switching from MySQL to CrateDB took a couple of days. Denormalizing the data was simple as we had already planned to flatten out the data beforehand and weren’t using many JOINs. It was very convenient to integrate CrateDB into our existing code base even though it was written for a different database.”
Jeff also mentions that having SQL as CrateDB’s interface language was an added benefit. SQL flattens the learning curve for developers who need to work with distributed systems, and it is much simpler than writing complex queries in JSON. “The fact that CrateDB uses SQL lowers the barriers to entry when using distributed search. And on top of that, with CrateDB you can replace MongoDB and Elasticsearch with one scalable package.”
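To make the SQL-versus-JSON comparison concrete, here is a minimal sketch that builds the same kind of full-text search twice: once as a SQL statement using CrateDB’s `MATCH` predicate and `_score` column, and once as a roughly comparable Elasticsearch-style JSON request body. The table name `articles` and the columns `id`, `title`, and `body` are hypothetical and chosen only for illustration; they are not ClearVoice’s actual schema, and the sketch only constructs the queries without connecting to any database.

```python
import json

# Hypothetical table and column names, used only for illustration.
TABLE = "articles"
COLUMN = "body"


def sql_search(limit: int = 10) -> str:
    """Full-text search expressed in SQL with CrateDB's MATCH predicate.

    The search term is not interpolated; it is left as a '?' placeholder
    to be supplied as a bound parameter when the statement is executed.
    """
    return (
        f"SELECT id, title, _score FROM {TABLE} "
        f"WHERE MATCH({COLUMN}, ?) "
        f"ORDER BY _score DESC LIMIT {int(limit)}"
    )


def json_search(term: str, limit: int = 10) -> str:
    """A roughly comparable request body in Elasticsearch-style query DSL."""
    body = {
        "query": {"match": {COLUMN: term}},
        "_source": ["id", "title"],
        "size": limit,
    }
    return json.dumps(body, indent=2)


if __name__ == "__main__":
    print(sql_search())
    print(json_search("content marketing"))
```

For a simple query the two forms are similar in size, but filters, sorting, and aggregations each add another level of nesting to the JSON body, whereas the SQL form grows by adding familiar clauses.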
Today, CrateDB holds 20 million records, a number that is expected to grow into the billions in the near future. Its primary function is to serve as the search index powering ClearVoice.