COPY TO

Export table contents to files on crate node machines.

Synopsis

COPY table_ident [ PARTITION ( partition_column = value [ , ... ] ) ]
                 [ ( column [ , ...] ) ]
                 [ WHERE condition ]
                 TO [DIRECTORY] output_uri
                 [ WITH ( copy_parameter [= value] [, ... ] ) ]

Description

The COPY TO command exports the contents of a table to one or more files. Each node of the cluster which contains a shard of the table will export the contents of its shards.

The created files are JSON formatted and contain one table row per line.

If the DIRECTORY keyword is given, the URI is treated as a directory path. This will generate one or more files in the given directory, named to prevent filename conflicts.

Implications of shard level export

Exporting the data on a shard level is the fastest possible way to export data in Crate.

There are quirks users should be aware of when using COPY TO:

Since data is exported and written per shard there might be concurrent writes to the same file if a single node contains more than one shard. This could cause corruption in the file being written. The same holds true when data is written to a shared file system such as Amazon S3 or NFS.

To prevent this from happening writing a separate file per shard is required. To do this the DIRECTORY keyword can be used. An example is given in Exporting Data.

Note

Currently only user tables can be exported. System tables like sys.nodes and blob tables don’t work with the COPY TO statement.

Parameters

table_ident:The name (optionally schema-qualified) of the table to be exported.
column:(optional) A list of column expressions that should be exported.

Note

Declaring columns changes the output to JSON list format, which is currently not supported by the COPY FROM statement.

WHERE Clause

WHERE clauses follow the same syntax as used in SELECT statements allowing for partial exports that meet certain criteria. (see WHERE Clause for more information).

Output URI

The output_uri can be any expression evaluating to a string. The resulting string should be a valid URI of one of the supporting schemes:

  • file://
  • s3://[<accesskey>:<secretkey>@]<bucketname>/<path>

If no scheme is given (e.g.: ‘/path/to/file’) the default uri-scheme file:// will be used.

Note

If the s3 scheme is used without specifying any credentials an attempt is made to read these information from the AWS_ACCESS_KEY_ID and AWS_SECRET_KEY environment variables. In addition to that the Java System properties aws.accessKeyId and aws.secretKey are also used as a fallback.

Note

A secretkey provided by Amazon Web Service can contain characters such as ‘/’, ‘+’ or ‘=’. Such characters must be URI encoded. The same encoding as in s3 applies.

Note

Versions prior to 0.51.x use HTTP for connections to S3. Since 0.51.x these connections are using the HTTPS protocol. Please make sure you update your firewall rules to allow outgoing connections on port 443.

PARTITION Clause

If the table is partitioned this clause can be used to only export data from a specific partition.

The exported data doesn’t contain the partition columns or values as they are not part of the partitioned tables.

[ PARTITION ( partition_column = value [ , ... ] ) ]
partition_column:The name of the column by which the table is partitioned. All partition columns that were part of the PARTITIONED BY Clause of the CREATE TABLE statement must be specified.
value:The columns value.

Note

If COPY TO is used on a partitioned table without the PARTITION clause, the partition columns and values will be included in the rows of the exported files. If a partition column is a generated column, it will not be included even if the PARTITION clause is missing.

WITH Clause

The optional WITH clause can specify parameters for the copy statement.

[ WITH ( copy_parameter [= value] [, ... ] ) ]

Possible copy_parameters are:

compression

Define if and how the exported data should be compressed. Per default no compression is applied.

Possible values for the compression setting are:

gzip:The exported data is compressed with gzip.

format

Optional parameter to override default output behavior.

Possible values for the format settings are:

json_object:Output file contains one independently JSON serialized table row per line. This is the default behavior if no columns are defined. Use this format to import with COPY FROM.
json_array:Output file contains on each line a JSON array storing column values of one row. This is the default behavior if columns are defined.