Multi-model Database
CrateDB is an open source, multi-model and distributed database that offers high performance, scalability and flexibility. You can easily query data with SQL. CrateDB supports structured, semi-structured, and unstructured data types: tables, JSON, time series, vectors, geospatial data, and binary objects, simplifying data management and reducing the need for multiple databases and complex synchronizations.
Open-source multi-model database with SQL
Hyper-fast. Queries in milliseconds.
/* Table definition to store truck reports on delivery routes */
CREATE TABLE truck_report (
truck_id TEXT,
ts TIMESTAMP,
status SMALLINT,
details TEXT INDEX using fulltext with (analyzer = 'english'),
location GEO_POINT,
extra OBJECT AS (
product_ids ARRAY(TEXT)
),
PRIMARY KEY (truck_id, ts)
);
/* Table definition to store product details */
CREATE TABLE product (
product_id TEXT PRIMARY KEY,
name TEXT,
description TEXT INDEX using fulltext with (analyzer = 'english'),
dimensions OBJECT AS (
length DOUBLE PRECISION,
width DOUBLE PRECISION
),
images_urls ARRAY(TEXT),
tags ARRAY(TEXT),
image_features FLOAT_VECTOR(1024),
additional_info OBJECT(IGNORED)
);
+--------------+---------------+--------+---------------------------------------+-----------------------------------------+-------------------------------------------+
| truck_id | ts | status | details | location | extra |
+--------------+---------------+--------+---------------------------------------+-----------------------------------------+-------------------------------------------+
| truck11A28 | 1701608750000 | 12 | unknown issues detected - retry later | [13.467379929497838, 52.50462996773422] | {"product_ids": ["product1"] } |
| truck11A28 | 1701608700000 | 1 | | [13.520951000102300, 52.52981911123839] | {} |
| truck102GG | 1701608700000 | 43 | several unknown issues identified | [12.700487001203748, 52.80146022102753] | {} |
| truck3C20Z | 1701605100000 | 2 | | [12.467379929497838, 51.50462996773422] | {"product_ids": ["product7", "product1"]} |
| truck001ZB | 1701608700000 | 2 | | [13.568370029497838, 52.50462996773422] | {} |
+--------------+---------------+--------+---------------------------------------+-----------------------------------------+-------------------------------------------+
+------------+------------+-------------------------+--------------------------------+-----------------------------------------------------------------------------------+-----------------------------+------------------------------------------------------------+-----------------+
| product_id | name | description | dimensions | images_urls | tags | image_features | additional_info |
+------------+------------+-------------------------+--------------------------------+-----------------------------------------------------------------------------------+-----------------------------+------------------------------------------------------------+-----------------+
| product1 | red desk | study desk for children | {"height": 72, "width": 67.2} | ["cdn.images.com/photo/desk-up-1538412", "cdn.freeimages.com/photo/desk-1538412"] | ["red", "soft", "children"] | [-0.3, 0.555555, 0.1, 0.14, 1.0, ... , 0.0, -0.33333, 0.0] | NULL |
| product7 | teddy bear | green fluffy teddy bear | {"height": 32, "width": 14.55} | ["cdn.images.com/photo/tb-up22-1538412", "cdn.freeimages.com/photo/tb-1538412"] | ["green", "fluffy", "toy"] | [0.2, 0.21, -0.3333, -0.7, 0.0, ... , 0.25, 0.33333, 0.92] | NULL |
+------------+------------+-------------------------+--------------------------------+-----------------------------------------------------------------------------------+-----------------------------+------------------------------------------------------------+-----------------+
/* Based on image features vector, retrieve the most similar products to a given item */
SELECT name, _score
FROM product
WHERE KNN_MATCH(image_features, [0.14, 1, 0, ..., 0.555, 1], 2)
/* embedding for a pink dinosaur */
ORDER BY _score DESC;
+-----------------+-------------+
| name | _score |
+-----------------+-------------+
| purple dinosaur | 0.813950193 |
| green dinosaur | 0.785511027 |
+-----------------+-------------+
/* Get the name of the products transported by trucks that reported their location
whithin a region */
SELECT distinct truck_id, product_id, name
FROM truck_report
JOIN product ON product_id = ANY(extra['product_ids'])
WHERE WITHIN(
location,
'POLYGON(
(13.376072 52.541463,
13.443165 52.540162,
13.445887 52.504546,
13.380545 52.505848,
13.376072 52.541463)
)'
)
AND ts > (CURRENT_TIMESTAMP - INTERVAL '15 minutes');
+-------------+------------+------------+
| truck_id | product_id | name |
+-------------+------------+------------+
| truck11A28 | product1 | red desk |
| truck3C20Z | product1 | red desk |
| truck3C20Z | product7 | teddy bear |
+-------------+------------+------------+
/* Based on the truck reports, retrieve the most recent reported issues
for each truck_id */
SELECT truck_id, MAX(ts), MAX_BY(details, ts) AS issue_details
FROM truck_report
WHERE MATCH(details, 'issue')
GROUP BY truck_id
LIMIT 3;
+--------------+---------------+---------------------------------------+
| truck_id | max(ts) | issue_details |
+--------------+---------------+---------------------------------------+
| truck11A28 | 1701608750000 | unknown issues detected - retry later |
| truck102GG | 1701608700000 | several unknown issues identified |
| truck2220Z | 1701608520000 | delayed delivery - route issue |
+--------------+---------------+---------------------------------------+
/* Based on delivery truck reports, this query returns the number of reports per day
that were sent within Berlin's city center region */
SELECT date_bin('24 hours'::interval, ts, 0) AS day,
COUNT(*) AS num_reports
FROM truck_report
WHERE WITHIN(
location,
'POLYGON(
(13.362382 52.540331,
13.365300 52.498027,
13.453191 52.498549,
13.448727 52.539913,
13.362382 52.540331)
)'
)
AND ts > (CURRENT_TIMESTAMP - INTERVAL '15 days')
GROUP BY 1
ORDER BY 1 DESC;
+---------------+-------------+
| day | num_reports |
+---------------+-------------+
| 1701648000000 | 25 |
| 1701561600000 | 280 |
| 1701475200000 | 1060 |
| 1701388800000 | 2970 |
| 1701302400000 | 2844 |
| 1701216000000 | 2462 |
| 1701129600000 | 2331 |
| 1701043200000 | 2573 |
| 1700956800000 | 582 |
| 1700870400000 | 1404 |
| 1700784000000 | 2791 |
| 1700697600000 | 1953 |
| 1700611200000 | 2046 |
| 1700524800000 | 3177 |
| 1700438400000 | 2446 |
+---------------+-------------+
Multiple data models
Handle multiple data models seamlessly within the same database, including structured (table), unstructured (vector, text), semi-structured (JSON) and binary (BLOB). It can also handle multiple formats, such as geospatial, time series, and log data. All of this is natively supported by a storage engine built on top of Apache Lucene.
CrateDB's flexible data modeling allows for the collection and storage of a wide range of data types. It can accept any type of data format without needing specific development and collect data from various sources, such as enterprise application data (CRM, ERP), analytics data, streams data, sensor data, API data, and data from data lakes or other databases.
Dynamic schema
Modify and adapt data structures seamlessly with dynamic schema. Its adaptability allows data models to evolve without constraints, ensuring a seamless transition as data needs change over time.
CrateDB's versatile data modeling also enables the storage of complex objects and nested objects without human intervention. New data types and formats can be added on the fly, adapting to changing needs and requirements and eliminating the need for multiple databases to be configured for synchronization. See Dynamic Database Schemas >
SQL query language
Scalability
Scale effortlessly to meet growing demands with CrateDB's horizontal scaling capabilities. Whether handling increased data volumes or user traffic, CrateDB ensures scalability by adding more nodes or enhancing individual node resources.
Multiversion concurrency control
Experience eventually consistent and efficient data access with multi-version concurrency control. Atomic operations and durability guarantees on record level ensure that writes to documents either succeed entirely or have no effect, queries by primary key always return the latest results.
Indexing and query optimization
Optimize performance with efficient indexing in CrateDB. Its robust indexing and query optimization mechanisms, combined with automatic indexing, columnar storage, and highly efficient tree structures, enhance performance across various data models and use cases.
Data replication and distribution
Easy and seamless integrations
Integrate with various programming languages, frameworks, and tools with native SQL and the PostgreSQL Wire Protocol. Drivers and libraries are available for many programming languages, as well as a REST API.
View a sample list of integrations >
Community and support
Unlocking the Power of Multi-Model Data Management with CrateDB
In this white paper, we explore the challenges that arise when using multiple technologies to handle different data types and how CrateDB can consolidate and streamline these processes.
Modeling data in CrateDB
Learn from Karyn Azevedo, Solution Engineer at CrateDB, how to choose the right data type for your use case.
Timestamp: 5:31 - 18:37