Lexical Structure

An SQL input consists of a sequence of commands each of which is a sequence of tokens, terminated by a semicolon (;).

The syntax of a command defines its set of valid tokens. A token can be a key word, an identifier, a quoted identifier, a literal (or constant), or a special character symbol.

Table of contents

String literal

String literals are defined as an arbitrary sequence of characters that are delimited with single quotes ' as defined in ANSI SQL, for example 'This is a string'.

Escape strings

The escape character in CrateDB is the single-quote '. A character gets escaped when adding a single-quote before it. For example a single quote character within a string literal can be included by writing two adjacent single quotes, e.g. 'Jack''s car'.

Note

Two adjacent single quotes are not equivalent to the double-quote character ".

String literals with C-Style escapes

In addition to the escaped character ', CrateDB supports C-Style escaped string sequences. Such a sequence is constructed by prefixing the string literal with the letter E or e, for example, e'hello\nWorld'. The following escaped sequences are supported:

Escape Sequence Interpretation
\b backspace
\f form feed
\n newline
\r carriage return
\t tab
\o, \oo, \ooo (o = [0-7]) octal byte value
\xh, xhh (h = [0-9,A-F,a-f]) hexadecimal byte value
\uxxxx, \Uxxxxxxxx (x = [0-9,A-F,a-f]) 16 or 32-bit hexadecimal Unicode character value

For instance, the escape string literal e'\u0061\x61\141' is equivalent to the 'aaa' string literal.

cr> select e'\u0061\x61\141' as col1;
+------+
| col1 |
+------+
| aaa  |
+------+
SELECT 1 row in set (... sec)

Any other character following a backslash is taken literally. Thus, to include a backslash character \, two adjacent backslashes need to be used (i.e. \\).

cr> select e'aa\\nbb' as col1;
+--------+
| col1   |
+--------+
| aa\nbb |
+--------+
SELECT 1 row in set (... sec)

Finally, a single quote can be included in an escape string literal by also using the escape backslash character: \', in addition to the single-quote described in String Literals.

cr> select e'aa\'bb' as col1;
+-------+
| col1  |
+-------+
| aa'bb |
+-------+
SELECT 1 row in set (... sec)

Key words and identifiers

The table bellow lists all reserved key words in CrateDB. These need to be quoted if used as identifiers.

ADD ALL ALTER AND
ANY ARRAY AS ASC
BETWEEN BOOLEAN BOTH BY
BYTE CALLED CASE CAST
COLUMN CONSTRAINT CREATE CROSS
CURRENT_DATE CURRENT_SCHEMA CURRENT_TIME CURRENT_TIMESTAMP
CURRENT_USER DEFAULT DELETE DENY
DESC DESCRIBE DIRECTORY DISTINCT
DOUBLE DROP ELSE END
ESCAPE EXCEPT EXISTS EXTRACT
FALSE FIRST FLOAT FOR
FROM FULL FUNCTION GRANT
GROUP HAVING IF IN
INDEX INNER INPUT INSERT
INT INTEGER INTERSECT INTO
IP IS JOIN LAST
LEADING LEFT LICENSE LIKE
LIMIT LONG MATCH NATURAL
NOT NULL NULLS OBJECT
OFFSET ON OR ORDER
OUTER PERSISTENT PRIMARY RECURSIVE
REPLACE RESET RETURNS REVOKE
RIGHT SELECT SESSION_USER SET
SHORT SOME STRATIFY STRING
SUBSTRING TABLE THEN TRAILING
TRANSIENT TRIM TRUE TRY_CAST
UNBOUNDED UNION UPDATE USER
USING WHEN WHERE WITH

Tokens such as my_table, id, name, or data in the example below are identifiers, which identify names of tables, columns, and other database objects.

Example:

CREATE TABLE my_table (
  id INTEGER,
  name STRING,
  data OBJECT
) WITH (number_of_replicas = 0);

Note

Key words and unquoted identifiers are case insensitive.

This means that:

select foo from t;

is equivalent to:

select Foo from t;

or:

select FOO from t;

A widely used convention is to write key words in uppercase and identifiers in lowercase, such as

ALTER TABLE foo ADD COLUMN new_column INTEGER;
INSERT INTO foo (id, name) VALUES (1, 'bar');

Quoted identifiers can contain an arbitrary sequence of charactes enclosed by double quotes ("). Quoted identifiers are never keywords, so you can use "update" as a table or column name.

Special characters

Some non-alphanumeric characters do have a special meaning. For their usage please refer to the sections where the respective syntax elements are described.

Semicolon:The semicolon (;) terminates an SQL statement. It cannot appear anywhere else within the command, except within a string or quoted identifier.
Comma:The comma (,) is used in various syntactical elements to separate elements of a list.
Brackets:Square brackets ([]) are used to select elements of arrays and objects, e.g. arr[1] or obj['key'].
Asterisk:The asterisk (*) is used in some contexts to denote all columns of a table. As an argument in global aggregate functions it has the meaning of any field, e.g. COUNT(*).
Period:The period (.) is used for numeric values and to separate schema and table names, e.g. blob.my_blob_table.

Comments

An SQL statement can contain comments. Single line comments start with a double dash (--) and end at the end of that line. Multi line comments start with /* and end with */.

Example:

/*
 * Retrieve information about all tables in the 'doc' schema.
 */
SELECT *
  FROM information_schema.tables
  WHERE table_schema = 'doc'; -- query information schema for doc tables