Blob Support

Crate includes support for storing binary large objects (blobs). By using Crate’s cluster features, these files can be replicated and sharded just like regular data.

Creating a table for blobs

Before adding blobs, a blob table must be created. Let’s use the Crate shell, crash, to issue the SQL statement:

sh$ crash -c "create blob table myblobs clustered into 3 shards with (number_of_replicas=1)"
CREATE OK, 1 row affected (... sec)

Crate is now configured to allow blobs to be managed under the /_blobs/myblobs endpoint.

Note

CrateDB allows multiple paths to be set in the data.path setting, but blob tables use only the first entry.

Custom location for storing blob data

It is possible to define a custom directory path for storing blob data, which can be completely different from the normal data path. A typical use case is storing normal data on a fast SSD and blob data on a large, cheap spinning disk.

The custom blob data path can be set either globally in the configuration or when creating a blob table. The path can be absolute or relative, and the user Crate runs as must be able to create and write to it. A relative path is resolved against CRATE_HOME.

Blob data stored under this path will have the following layout:

/custom/blob/path/indices/.blob_<INDEX_NAME>/<SHARD_ID>/blobs

Note

This custom layout differs from the default layout: it doesn’t include the cluster name or node number. It is therefore important that multiple CrateDB processes do not share access to the same blobs.path folder. This applies both to multiple CrateDB processes on the same machine and to a blobs.path that points to a shared file system such as NFS.
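To make the layout concrete, here is a minimal Python sketch that assembles the blob directory for a single shard. The helper name blob_shard_dir and the example values are hypothetical, and the sketch assumes that <INDEX_NAME> is the name of the blob table. It only illustrates the path pattern shown above and does not read any CrateDB configuration.

```python
from pathlib import PurePosixPath


def blob_shard_dir(blobs_path: str, index_name: str, shard_id: int) -> PurePosixPath:
    """Build <blobs_path>/indices/.blob_<INDEX_NAME>/<SHARD_ID>/blobs."""
    return (
        PurePosixPath(blobs_path)
        / "indices"
        / f".blob_{index_name}"
        / str(shard_id)
        / "blobs"
    )


# Hypothetical values: a custom root and shard 0 of an index called "myblobs".
print(blob_shard_dir("/custom/blob/path", "myblobs", 0))
```

Because the cluster name and node number are not part of this path, two processes pointed at the same root would resolve to the same directories, which is why the folder must not be shared.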

Global by config

Uncomment or add the following entry in the Crate configuration file to define a custom path globally for all blob tables:

blobs.path: /path/to/blob/data

Also see Configuration.

Per blob table setting

It is also possible to define a custom blob data path per table instead of globally in the configuration. A per-table setting takes precedence over the configuration setting. See CREATE BLOB TABLE for details.

Creating a blob table with a custom blob data path:

sh$ crash -c "create blob table myblobs clustered into 3 shards with (blobs_path='/tmp/crate_blob_data')"
CREATE OK, 1 row affected (... sec)

Altering a blob table

The number of replicas of a blob table can be changed using the ALTER BLOB TABLE statement:

sh$ crash -c "alter blob table myblobs set (number_of_replicas=0)"
ALTER OK, 1 row affected (... sec)

Uploading

To upload a blob, the SHA-1 hash of its contents must be known up front, because it is used as the id of the new blob. This example uses a Python one-liner to compute the hash:

sh$ python -c 'import hashlib;print(hashlib.sha1("contents".encode("utf-8")).hexdigest())'
4a756ca07e9487f482465a99e8286abc86ba4dc7

The blob can now be uploaded by issuing a PUT request:

sh$ curl -isSX PUT '127.0.0.1:4200/_blobs/myblobs/4a756ca07e9487f482465a99e8286abc86ba4dc7' -d 'contents'
HTTP/1.1 201 Created
Content-Length: 0

If a blob with the given hash already exists, a 409 Conflict is returned:

sh$ curl -isSX PUT '127.0.0.1:4200/_blobs/myblobs/4a756ca07e9487f482465a99e8286abc86ba4dc7' -d 'contents'
HTTP/1.1 409 Conflict
Content-Length: 0

List

To list all blobs inside a blob table, a SELECT statement can be used:

sh$ crash -c "select digest, last_modified from blob.myblobs"
+------------------------------------------+---------------+
| digest                                   | last_modified |
+------------------------------------------+---------------+
| 4a756ca07e9487f482465a99e8286abc86ba4dc7 | ...           |
+------------------------------------------+---------------+
SELECT 1 row in set (... sec)

Note

To query blob tables it is necessary to always specify the schema name blob.
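The same query can also be issued from a program. The sketch below is an assumption-heavy illustration: it relies on CrateDB's HTTP /_sql endpoint, which this page does not describe, and on that endpoint returning a JSON object with "cols" and "rows" keys. The function names are hypothetical.

```python
import json
import urllib.request


def build_sql_request(base_url: str, stmt: str) -> urllib.request.Request:
    """Build a POST request for the /_sql endpoint (assumed, not shown above)."""
    body = json.dumps({"stmt": stmt}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/_sql",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def rows_as_dicts(payload: dict) -> list:
    """Pair each row with the column names of a {"cols": ..., "rows": ...} payload."""
    return [dict(zip(payload["cols"], row)) for row in payload["rows"]]


def list_blobs(base_url: str) -> list:
    """Return digest and last_modified for every blob in blob.myblobs."""
    request = build_sql_request(base_url, "select digest, last_modified from blob.myblobs")
    with urllib.request.urlopen(request) as response:
        return rows_as_dicts(json.load(response))
```

Note that the statement still names the table as blob.myblobs, with the schema spelled out.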

Download

To download a blob, use a GET request:

sh$ curl -sS '127.0.0.1:4200/_blobs/myblobs/4a756ca07e9487f482465a99e8286abc86ba4dc7'
contents

Note

Since blobs are sharded throughout the cluster, not every node holds every blob. If a GET request reaches a node that doesn’t contain the requested file, the node responds with a 307 Temporary Redirect that points to a node that does.

If the blob doesn’t exist, a 404 Not Found error is returned:

sh$ curl -isS '127.0.0.1:4200/_blobs/myblobs/e5fa44f2b31c1fb553b6021e7360d07d5d91ff5e'
HTTP/1.1 404 Not Found
Content-Length: 0

To determine if a blob exists without downloading it, a HEAD request can be used:

sh$ curl -sS -I '127.0.0.1:4200/_blobs/myblobs/4a756ca07e9487f482465a99e8286abc86ba4dc7'
HTTP/1.1 200 OK
Content-Length: 8
Accept-Ranges: bytes
Expires: Thu, 31 Dec 2037 23:59:59 GMT
Cache-Control: max-age=315360000

Note

The cache headers for blobs are static and allow clients to cache the response indefinitely, since a blob is immutable.
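The HEAD check can be sketched in Python as follows. The names build_head_request and blob_size are hypothetical, and the sketch assumes that a HEAD request for a missing blob is answered with 404, as the GET request is.

```python
import urllib.error
import urllib.request
from typing import Optional


def build_head_request(base_url: str, table: str, digest: str) -> urllib.request.Request:
    """Build a HEAD request for the blob identified by `digest`."""
    return urllib.request.Request(f"{base_url}/_blobs/{table}/{digest}", method="HEAD")


def blob_size(base_url: str, table: str, digest: str) -> Optional[int]:
    """Return the blob's size in bytes, or None if it does not exist."""
    try:
        with urllib.request.urlopen(build_head_request(base_url, table, digest)) as response:
            # The size comes from the Content-Length header; no body is sent.
            return int(response.headers["Content-Length"])
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise
```

Returning the size instead of a plain boolean answers "does it exist?" and "how large is it?" with the same request.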

Delete

To delete a blob, use a DELETE request:

sh$ curl -isS -XDELETE '127.0.0.1:4200/_blobs/myblobs/4a756ca07e9487f482465a99e8286abc86ba4dc7'
HTTP/1.1 204 No Content
Content-Length: 0

If the blob doesn’t exist, a 404 Not Found error is returned:

sh$ curl -isS -XDELETE '127.0.0.1:4200/_blobs/myblobs/4a756ca07e9487f482465a99e8286abc86ba4dc7'
HTTP/1.1 404 Not Found
Content-Length: 0
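In client code it is often convenient to treat "already gone" as a normal result rather than a failure. A minimal Python sketch of that idea follows, with hypothetical function names build_delete_request and delete_blob.

```python
import urllib.error
import urllib.request


def build_delete_request(base_url: str, table: str, digest: str) -> urllib.request.Request:
    """Build a DELETE request for the blob identified by `digest`."""
    return urllib.request.Request(f"{base_url}/_blobs/{table}/{digest}", method="DELETE")


def delete_blob(base_url: str, table: str, digest: str) -> bool:
    """Delete a blob; True if it was removed (204), False if it was not there (404)."""
    try:
        with urllib.request.urlopen(build_delete_request(base_url, table, digest)) as response:
            return response.status == 204
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise
```

Any other status code is re-raised so that unexpected failures are not silently reported as a missing blob.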

Deleting a blob table

Blob tables can be dropped just like normal tables (again using the Crate shell):

sh$ crash -c "drop blob table myblobs"
DROP OK, 1 row affected (... sec)