Introducing Partial Exports for Faster Backups

Michael Beer
December 23, 2015

Just in time for Christmas, Crate's built-in COPY TO statement now supports "Partial Exports". With this feature it's possible to export parts of table contents to one or more files. This saves time during exporting and reduces backup size. This feature is available from Crate 0.54.0.

Use Case

In this demo I will show how to use partial exports in an effective way, we will assume the following case:

We have a large table in Crate (200 million records with 33 GB size) that stores the information of every user who visits our website. The table schema looks like this:

CREATE TABLE crate.uservisits (
   "UserAgent" STRING,
   "cCode" STRING,
   "duration" INTEGER,
   "sourceIP" STRING,
   "visitDate" TIMESTAMP

To get a feeling for the data, let's print a small set:

cr> select * from crate.uservisits limit 5;
| UserAgent                                                                                              | cCode | duration | sourceIP        |    visitDate |
| Fudybefib/5.7                                                                                          | CHE   |       10 | | 399358800000 |
| Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1)                                                     | EST   |        5 |   | 240728400000 |
| Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2)                                                     | SAU   |       10 |   | 635493600000 |
| Mozilla/5.0 (Windows; U; Windows NT 5.2) AppleWebKit/525.13 (KHTML like Gecko) Chrome/xxx              | DZA   |        2 |   | 913788000000 |
| Mozilla/5.0 (Windows; U; Windows NT 5.2)AppleWebKit/525.13 (KHTML like Gecko) Version/3.1Safari/525.13 | KOR   |        8 | | 646722000000 |
SELECT 5 rows in set (0.006 sec)

If we want to backup the full table and export a huge amount of data we can use Crate's COPY TO command. This will consume lots of your valuable time and backup disk space, with running a full backup of the table in Crash taking nearly 5 minutes.

cr> copy crate.uservisits to directory '/tmp' with (compression='gzip');
COPY OK, 133970799 rows affected (252.432 sec)

Using Partial Export

Why should we save the complete table if we can just save newly inserted data from the last year? Let's use the partial export feature to backup only the newly registered users that have visited our site.

We extend the COPY TO statement with a WHERE clause that uses the same syntax as in SELECT statements. So the COPY TO statement for this use-case looks like this:

cr> copy uservisits_cj where "visitDate" <= CURRENT_TIMESTAMP and "visitDate" >= '2014-01-01' to directory '/tmp/' with (compression='gzip');
COPY OK, 3178494 rows affected (6.145 sec)

Great! The results shows that only 2% of the full table data got exported and this resulted in a much quicker backup time.

This is a simple example of using partial exports in a common use-case. We are looking forward to hearing your experiences with this new super-cool-feature.

Read more in the official documentation.

Back to topAll posts