Migrating from MySQL

There are various ways to migrate existing data from MySQL to Crate, but they differ in performance. A fast and reliable approach is to use MySQL’s export facilities together with Crate’s import tools.

This best practice example shows how to migrate your data without any hassle.

We assume an existing table foo in MySQL with the following schema:

CREATE TABLE foo (
  id integer primary key,
  name varchar(255),
  date datetime,
  fruits set('apple', 'pear', 'banana')
) CHARACTER SET utf8 COLLATE utf8_unicode_ci;

Sample data:

INSERT INTO foo (id, name, date, fruits)
  VALUES (1, 'foo', '2014-10-31 09:22:56', 'apple,banana'),
    (2, 'bar', null, 'pear');

Exporting Data from MySQL

MySQL does not support JSON as an output format, but it can write query results to a file in CSV format.

Example:

SELECT id, name
  INTO OUTFILE '/tmp/dump.csv'
    CHARACTER SET utf8
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
    LINES TERMINATED BY '\n'
  FROM foo;

Make sure that fields are separated by a comma (,), that values are enclosed in double quotes ("), and that lines are terminated by the newline character \n.

Note that we specify utf8 as the encoding for writing to the outfile. This is important because Crate requires UTF-8 encoded input.

You may also need to set the character encoding of the client and of mysqld in my.cnf:

[client]
default-character-set=utf8
[mysqld]
character-set-server=utf8
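
After a server restart, you can verify the effective settings with:

SHOW VARIABLES LIKE 'character_set%';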

Regardless of quoting, every value read back from the CSV file is a plain string, which is why the conversion script described below needs explicit type information for each column.

Date/Time Types

The standard output for date/time types in MySQL is a string. However, Crate stores timestamps as long values with millisecond precision. To avoid problems with time zone information, use MySQL’s built-in UNIX_TIMESTAMP function to convert date, time, datetime, timestamp, and year values into UNIX timestamps directly in the SELECT statement.

The output of this function must be multiplied by 1000 (converting seconds to milliseconds) to obtain the long value that Crate expects for import.

SELECT UNIX_TIMESTAMP(datetime_column)*1000 from table_name;
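
For the sample row above, assuming the session time zone is UTC, this yields:

SELECT UNIX_TIMESTAMP('2014-10-31 09:22:56') * 1000;
-- 1414747376000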

Final export query:

SELECT id, name, UNIX_TIMESTAMP(date)*1000, fruits
  INTO OUTFILE '/tmp/dump.csv'
    CHARACTER SET utf8
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
    LINES TERMINATED BY '\n'
  FROM foo;
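
With the sample data above and a UTC server time zone, the resulting /tmp/dump.csv looks roughly like this (MySQL writes NULL as \N):

1,"foo",1414747376000,"apple,banana"
2,"bar",\N,"pear"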

Converting CSV to JSON

Use the Python script (mysql2crate.py) to convert the exported CSV file into a Crate-compatible JSON file. The source file needs to be UTF-8 encoded.

The script expects CSV input on stdin and prints the converted lines to stdout, one JSON object per line.

The --columns argument specifies the column names and column types for the conversion.

The format for this argument is:

column_name:column_type

where column_name is the column name and column_type is the type the value should be converted to. Supported column_type values are:

string
short
int
integer
long
float
double
array
timestamp (same as long)

For our example the usage looks like:

$ cat /tmp/dump.csv | python mysql2crate.py - \
      --columns id:int name:string date:timestamp fruits:array > /tmp/import.json
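
For the sample data, the generated /tmp/import.json should look roughly like this (the timestamp assumes a UTC server time zone; key order may vary):

{"id": 1, "name": "foo", "date": 1414747376000, "fruits": ["apple", "banana"]}
{"id": 2, "name": "bar", "date": null, "fruits": ["pear"]}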

For help use the -h option:

$ python mysql2crate.py -h
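
If the script is not at hand, the following minimal Python sketch shows the same idea. It is illustrative only, not the actual mysql2crate.py; the hard-coded column list and the NULL/SET handling are assumptions based on the export described above.

import csv
import json
import sys

# Illustrative sketch only -- not the actual mysql2crate.py script.
# Column names and target types are hard-coded for the example table.
COLUMNS = [("id", "int"), ("name", "string"),
           ("date", "timestamp"), ("fruits", "array")]

def convert(value, column_type):
    if value == "\\N":                       # MySQL writes NULL as \N
        return None
    if column_type == "array":               # MySQL SET values are comma separated
        return value.split(",")
    if column_type in ("int", "timestamp"):  # timestamps are already in milliseconds
        return int(value)
    return value                             # everything else stays a string

for row in csv.reader(sys.stdin):
    record = {name: convert(value, column_type)
              for (name, column_type), value in zip(COLUMNS, row)}
    print(json.dumps(record))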

Importing Data into Crate

First create the target table in Crate. The column names need to match the names specified by the --columns argument:

CREATE TABLE foo_imported (
  id int,
  name string,
  date timestamp,
  fruits array(string)
);

Finally, use the COPY FROM statement to import the newly generated JSON file into Crate.

COPY foo_imported FROM '/tmp/import.json' WITH (bulk_size=1000);
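
To verify the import, refresh the table and query it; the two rows correspond to the sample data above:

REFRESH TABLE foo_imported;
SELECT * FROM foo_imported ORDER BY id;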

Note

Read Importing Huge Datasets into Crate for more information on how to import huge datasets into Crate.