There are various ways to migrate your existing data from MySQL to Crate. However, these methods may differ in performance. A fast and reliable way to migrate is to use Crate’s existing export and import tools.
This best practice example shows how to migrate your data without any hassle.
We are assuming we have an existing table
foo in MySQL with the following schema:
cr> CREATE TABLE foo ( ... id integer primary key, ... name varchar(255), ... date datetime, ... fruits set('apple', 'pear', 'banana') ... ) CHARACTER SET utf8 COLLATE utf8_unicode_ci;
cr> INSERT INTO foo (id, name, date, fruits) ... VALUES (1, 'foo', '2014-10-31 09:22:56', 'apple,banana'), ... (2, 'bar', null, 'pear');
MySQL does not support
JSON as an output format, but it is possible to
write to a file using the
cr> SELECT id, name ... INTO OUTFILE '/tmp/dump.csv' ... CHARACTER SET utf8 ... FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' ... LINES TERMINATED BY '\n' ... FROM foo;
Make sure fields are seperated by a comma (
,), that values are quoted with a pair of
" and that lines are terminated by the newline character
Note that we specify
utf8 as encoding for writing into the outfile. This is important because Crate requires
You may need to set the character encoding of the client and
mysqld in the
[client] default-character-set=utf8 [mysqld] character-set-server=utf8
All values written to
csv are quoted and therefore interpreted as strings
when read from the convert script later.
The standard output for date/time types in MySQL
is a string, However, Crate uses a
long type to store timestamps (millisecond precision).
To prevent problems with dates that have timezone information, use MySQL’s builtin
UNIX_TIMESTAMP function to convert
year types into UNIX timestamps directly in the
The output of this function must be multiplied by
ms) to obtain the correct
long value that can be used for importing into Crate.
cr> SELECT UNIX_TIMESTAMP(datetime_column)*1000 from table_name;
Final export query:
cr> SELECT id, name, UNIX_TIMESTAMP(date)*1000, fruits ... INTO OUTFILE '/tmp/dump.csv' ... CHARACTER SET utf8 ... FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' ... LINES TERMINATED BY '\n' ... FROM foo;
Use the Python script (csv2json.py) to convert the exported CSV file
into a Crate compatible JSON file. The source file needs to be
The script requires
csv input from
stdin and prints the converted lines
stdout. Each line is a separate JSON string.
--columns argument specifies the column names and column types for the conversion.
The format for this argument is:
column_name must match the column name and
column_type must match the JSON type that the value should be converted to. Supported
column_type values are:
string short int integer long float double array timestamp (same as long)
For our example the usage looks like:
sh$ cat /tmp/dump.csv | python mysql2crate.py - \ ... --columns id:int name:str date:timestamp fruits:array > /tmp/import.json
For help use the
sh$ python mysql2crate.py -h
The column names need to match the names specified by the
cr> CREATE TABLE foo_imported ( ... id int, ... name string, ... date timestamp, ... fruits array(string) ... );
Finally use the
COPY FROM statement to import your newly generated json file into Crate.
cr> COPY foo_imported FROM '/tmp/import.json' WITH (bulk_size=1000);
Read Importing Huge Datasets into Crate for more information how to import huge datasets into Crate.
The tools provided by csvkit allow you to directly insert CSV data into Crate via SQLAlchemy, using Crate’s native driver to create the table, guess the corresponding data types, and insert any data found in the CSV file.
The simplest way to do so is like this:
sh$ csvsql --db crate://localhost:4200 --insert /tmp/dump.csv