MemSQL, like any in-memory database, works well when your data fits into memory. It has some mechanisms for storing overflowing data on disk, but in practice relying on them hurts performance and usability. Raw query performance may be better with MemSQL as long as the data is in memory, but Crate.io comes very close. Crate is the better fit for use cases where datasets grow over time and where storage and querying have to stay cost-effective, because you can add commodity servers to match data growth.
Some core differences:
- Crate includes powerful native full-text search (powered by Lucene).
- Crate has a masterless, shared-nothing architecture, whereas MemSQL nodes have roles.
- MemSQL replication is configured per database; Crate allows replication to be configured per table and per partition.
- MemSQL replication is non-transparent; Crate's replication is fully transparent.
- Durability is set globally in MemSQL. Crate’s durability can be set per table and per partition.
- Restarting MemSQL can be a challenge, since all data is kept in memory, with the translog on disk. When memory is full, data gets dumped. After a restart, MemSQL needs to read the translog, import all the logs, and re-index the full data set. Crate’s architecture is resilient to node loss and restarts, even when the full cluster has to be restarted.
- MemSQL is transactional, like Crate (read committed; documents are atomic).
- Joins: MemSQL offers joins similar to Crate's. When data sets get really big (beyond physical memory), Crate's architecture is beneficial because it keeps its speed independent of the cluster size. Crate also supports geospatial joins directly through its GeoJSON support and geo search.
- Crate has integrated blob support for images, videos, and similar files. MemSQL does not.
- MemSQL is an in-memory database that offers two storage engines: a columnar, disk-based engine and a row-based, in-memory engine. The columnar engine operates on disk, but its column-based layout is great for some queries and not for others. The row-based engine runs in memory, which risks making a table read-only when no more memory is available. As a result, the storage engine has to be chosen before any data is inserted, which can be inflexible and can impact performance. Crate holds its field caches in memory and its indexes (as rows) on disk. This gives Crate almost the same performance as an in-memory database, but with full persistence on disk for a more general workload. The field cache is always in sync and does not grow with the size of the data, only with its cardinality.
- Crate is open source under the Apache 2.0 license and is not proprietary.
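
To make the per-table and per-partition replication point more concrete, here is a minimal Python sketch that only builds the SQL statements as strings. It assumes Crate's `number_of_replicas` table setting and the `PARTITIONED BY`, `CLUSTERED INTO ... SHARDS`, and `ALTER TABLE ... PARTITION (...) SET (...)` clauses as described in the Crate SQL reference. The helper names, the table `metrics`, and its columns are hypothetical, and the exact syntax may differ between Crate versions, so check it against the documentation for the version you run.

```python
# Hypothetical helpers that build Crate DDL strings; nothing is sent to a server.


def replicas_literal(replicas):
    """Render a replica setting: integers stay bare, ranges such as '0-1' are quoted."""
    return str(replicas) if isinstance(replicas, int) else f"'{replicas}'"


def create_table_ddl(table, columns, shards=4, replicas="0-1", partition_by=None):
    """Build a CREATE TABLE statement with its own shard and replica settings."""
    cols = ", ".join(f"{name} {sql_type}" for name, sql_type in columns)
    parts = [f"CREATE TABLE {table} ({cols})"]
    if partition_by:
        parts.append(f"PARTITIONED BY ({partition_by})")
    parts.append(f"CLUSTERED INTO {shards} SHARDS")
    parts.append(f"WITH (number_of_replicas = {replicas_literal(replicas)})")
    return " ".join(parts)


def partition_replicas_ddl(table, column, value, replicas):
    """Build an ALTER TABLE statement that changes replicas for one partition only."""
    return (
        f"ALTER TABLE {table} PARTITION ({column} = '{value}') "
        f"SET (number_of_replicas = {replicas_literal(replicas)})"
    )


if __name__ == "__main__":
    # Table-level setting: every partition of "metrics" starts with 1 replica.
    print(create_table_ddl(
        "metrics",
        [("ts", "TIMESTAMP"), ("day", "TIMESTAMP"), ("value", "DOUBLE")],
        shards=6,
        replicas=1,
        partition_by="day",
    ))
    # Partition-level override: one day's partition gets 2 replicas.
    print(partition_replicas_ddl("metrics", "day", "2015-06-01", 2))
```

The generated statements could then be run through any Crate client. Keeping the replica count as a parameter per table and per partition mirrors the difference described in the list above, where MemSQL configures replication for a whole database.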