Selecting data¶
Selecting (i.e., retrieving) data from CrateDB can be done by using an SQL
SELECT statement. The response to a SELECT
query includes the column names of the result, the result rows as a
two-dimensional array of values, the row count, and the execution time.
See also
Table of contents
Introduction¶
A simple select:
cr> select id, name from locations order by id limit 2;
+----+-------------------+
| id | name |
+----+-------------------+
| 1 | North West Ripple |
| 2 | Outer Eastern Rim |
+----+-------------------+
SELECT 2 rows in set (... sec)
If the ‘*’ operator is used, all columns defined in the schema are returned for each row:
cr> select * from locations order by id limit 2;
+----+-------------------+--------------+--------+----------+-------------...-+-----------+
| id | name | date | kind | position | description ... | landmarks |
+----+-------------------+--------------+--------+----------+-------------...-+-----------+
| 1 | North West Ripple | 308534400000 | Galaxy | 1 | Relative to ... | NULL |
| 2 | Outer Eastern Rim | 308534400000 | Galaxy | 2 | The Outer Ea... | NULL |
+----+-------------------+--------------+--------+----------+-------------...-+-----------+
SELECT 2 rows in set (... sec)
Aliases can be used to change the output name of the columns:
cr> select name as n
... from locations
... where name = 'North West Ripple';
+-------------------+
| n |
+-------------------+
| North West Ripple |
+-------------------+
SELECT 1 row in set (... sec)
FROM
clause¶
The FROM
clause is used to reference the relation this select query is
based upon. Can be a single table, many tables, a view, a JOIN or another SELECT statement.
Tables and views are referenced by schema and table name and can optionally be
aliased. If the relation t
is only referenced by name, CrateDB assumes the
relation doc.t
was meant. Schemas that were newly created using
CREATE TABLE must be referenced explicitly.
The two following queries are equivalent:
cr> select name, position from locations
... order by name desc nulls last limit 2;
+-------------------+----------+
| name | position |
+-------------------+----------+
| Outer Eastern Rim | 2 |
| North West Ripple | 1 |
+-------------------+----------+
SELECT 2 rows in set (... sec)
cr> select doc.locations.name as n, position from doc.locations
... order by name desc nulls last limit 2;
+-------------------+----------+
| n | position |
+-------------------+----------+
| Outer Eastern Rim | 2 |
| North West Ripple | 1 |
+-------------------+----------+
SELECT 2 rows in set (... sec)
A table can be aliased for the sake of brevity too:
cr> select name from doc.locations as l
... where l.name = 'Outer Eastern Rim';
+-------------------+
| name |
+-------------------+
| Outer Eastern Rim |
+-------------------+
SELECT 1 row in set (... sec)
DISTINCT
clause¶
If DISTINCT is specified, one unique row is kept. All other duplicate rows are removed from the result set:
cr> select distinct date from locations order by date;
+---------------+
| date |
+---------------+
| 308534400000 |
| 1367366400000 |
| 1373932800000 |
+---------------+
SELECT 3 rows in set (... sec)
Note
Using DISTINCT is only supported on Primitive types.
WHERE
clause¶
A simple where clause example using an equality operator:
cr> select description from locations where id = '1';
+---------------------------------------...--------------------------------------+
| description |
+---------------------------------------...--------------------------------------+
| Relative to life on NowWhat, living on... a factor of about seventeen million. |
+---------------------------------------...--------------------------------------+
SELECT 1 row in set (... sec)
Comparison operators¶
These Comparison operators are supported and can be used for all simple data types.
For strings a lexicographical comparison is performed based on the Lucene TermRangeQuery:
cr> select name from locations where name > 'Argabuthon' order by name;
+------------------------------------+
| name |
+------------------------------------+
| Arkintoofle Minor |
| Bartledan |
| Galactic Sector QQ7 Active J Gamma |
| North West Ripple |
| Outer Eastern Rim |
+------------------------------------+
SELECT 5 rows in set (... sec)
For details please refer to the Apache Lucene site.
Number and date field comparison behave as expected from standard SQL.
The following example uses one of the supported ISO date formats:
cr> select date, position from locations where date <= '1979-10-12' and
... position < 3 order by position;
+--------------+----------+
| date | position |
+--------------+----------+
| 308534400000 | 1 |
| 308534400000 | 2 |
+--------------+----------+
SELECT 2 rows in set (... sec)
For a detailed explanation of the supported ISO date formats please refer to the joda date_optional_time site.
For custom date types, or defined date formats in the object mapping the corresponding format should be used for a comparison. Otherwise the operation may fail.
Regular expressions¶
Operators for matching using regular expressions.
Operator | Description | Example |
---|---|---|
~ |
Matches regular expression, case sensitive | 'foo' ~ '.*foo.*'
|
~* |
Matches regular expression, case insensitive | 'Foo' ~* '.*foo.*'
|
!~ |
Does not match regular expression, case sensitive | 'Foo' !~ '.*foo.*'
|
!~* |
Does not match regular expression, case insensitive | 'foo' !~* '.*bar.*'
|
The ~
operator can be used to match a string against a regular expression.
It returns true
if the string matches the pattern, false
if not, and
NULL
if string is NULL
.
To negate the matching, use the optional !
prefix. The operator returns
true
if the string does not match the pattern, false
otherwise.
The regular expression pattern is implicitly anchored, that means that the whole string must match, not a single subsequence. All unicode characters are allowed.
If using PCRE features in the regular expression pattern, the operator uses
the regular expression engine of the Java standard library java.util.regex
.
If not using PCRE features in the regular expression pattern, the operator uses Lucene Regular Expressions, which are optimized for fast regular expression matching on Lucene terms.
Lucene Regular Expressions are basically POSIX Extended Regular
Expressions without the character classes and with some extensions, like a
metacharacter #
for the empty string or ~
for negation and others. By
default all Lucene extensions are enabled. See the Lucene documentation for
more details.
Note
Since case-insensitive matching using ~*
or !~*
implicitly uses the
regular expression engine of the Java standard library, features of Lucene
Regular Expressions do not work there.
Examples:
cr> select name from locations where name ~ '([A-Z][a-z0-9]+)+'
... order by name;
+------------+
| name |
+------------+
| Aldebaran |
| Algol |
| Altair |
| Argabuthon |
| Bartledan |
+------------+
SELECT 5 rows in set (... sec)
cr> select 'matches' from sys.cluster where
... 'gcc --std=c99 -Wall source.c' ~ '[A-Za-z0-9]+( (-|--)[A-Za-z0-9]+)*( [^ ]+)*';
+-----------+
| 'matches' |
+-----------+
| matches |
+-----------+
SELECT 1 row in set (... sec)
cr> select 'no_match' from sys.cluster where 'foobaz' !~ '(foo)?(bar)$';
+------------+
| 'no_match' |
+------------+
| no_match |
+------------+
SELECT 1 row in set (... sec)
Comparison operators¶
LIKE (ILIKE)
¶
CrateDB supports the LIKE
and ILIKE
operators. These operators can
be used to query for rows where only part of a columns value should match
something. The only difference is that, in the case of ILIKE
, the
matching is case insensitive.
For example to get all locations where the name starts with ‘Ar’ the following queries can be used:
cr> select name from locations where name like 'Ar%' order by name asc;
+-------------------+
| name |
+-------------------+
| Argabuthon |
| Arkintoofle Minor |
+-------------------+
SELECT 2 rows in set (... sec)
cr> select name from locations where name ilike 'ar%' order by name asc;
+-------------------+
| name |
+-------------------+
| Argabuthon |
| Arkintoofle Minor |
+-------------------+
SELECT 2 rows in set (... sec)
The following wildcard operators are available:
% | A substitute for zero or more characters |
_ | A substitute for a single character |
The wildcard operators may be used at any point in the string literal. For example a more complicated like clause could look like this:
cr> select name from locations where name like '_r%a%' order by name asc;
+------------+
| name |
+------------+
| Argabuthon |
+------------+
SELECT 1 row in set (... sec)
In order so search for the wildcard characters themselves it is possible to escape them using a backslash:
cr> select description from locations
... where description like '%\%' order by description asc;
+-------------------------+
| description |
+-------------------------+
| The end of the Galaxy.% |
+-------------------------+
SELECT 1 row in set (... sec)
Caution
Queries with a like/ilike clause can be quite slow. Especially if the clause starts with a wildcard character. Because in that case CrateDB has to iterate over all rows and can’t utilize the index. For better performance consider using a fulltext index.
NOT
¶
NOT
negates a boolean expression:
[ NOT ] boolean_expression
The result type is boolean.
expression | result |
---|---|
true | false |
false | true |
null | null |
Warning
CrateDB handles the case of NOT (NULL)
inconsistently. The above is only
true when the NOT
appears in a SELECT
clause or a WHERE
clause
that operates on system tables. The result of NOT (NULL)
in a
WHERE
clause that operates on user tables will produce
inconsistent but deterministic results (NULL
or TRUE
)
depending on the specifics of the clause. This does not adhere to
standard SQL three-valued-logic and will be fixed in a future release.
IN
¶
CrateDB also supports the binary operator IN
, which allows you to verify
the membership of left-hand operand in a right-hand set of expressions. Returns
true
if any evaluated expression value from a right-hand set equals
left-hand operand. Returns false
otherwise:
cr> select name, kind from locations
... where (kind in ('Star System', 'Planet')) order by name asc;
+---------------------+-------------+
| name | kind |
+---------------------+-------------+
| | Planet |
| Aldebaran | Star System |
| Algol | Star System |
| Allosimanius Syneca | Planet |
| Alpha Centauri | Star System |
| Altair | Star System |
| Argabuthon | Planet |
| Arkintoofle Minor | Planet |
| Bartledan | Planet |
+---------------------+-------------+
SELECT 9 rows in set (... sec)
The IN
construct can be used in Subquery expressions or
Array comparisons.
IS NULL
¶
Returns TRUE
if expr
evaluates to NULL
. Given a column reference it
returns TRUE
if the field contains NULL
or is missing.
Use this predicate to check for NULL
values as SQL’s three-valued logic
does always return NULL
when comparing NULL
.
expr: | Expression of one of the supported Data types supported by CrateDB. |
---|
cr> select name from locations where inhabitants is null order by name;
+------------------------------------+
| name |
+------------------------------------+
| |
| Aldebaran |
| Algol |
| Allosimanius Syneca |
| Alpha Centauri |
| Altair |
| Galactic Sector QQ7 Active J Gamma |
| North West Ripple |
| Outer Eastern Rim |
| NULL |
+------------------------------------+
SELECT 10 rows in set (... sec)
cr> select count(*) from locations where name is null;
+----------+
| count(*) |
+----------+
| 1 |
+----------+
SELECT 1 row in set (... sec)
IS NOT NULL
¶
Returns TRUE
if expr
does not evaluate to NULL
. Additionally, for
column references it returns FALSE
if the column does not exist.
Use this predicate to check for non-NULL
values as SQL’s three-valued logic
does always return NULL
when comparing NULL
.
expr: | Expression of one of the supported Data types supported by CrateDB. |
---|
cr> select name from locations where inhabitants['interests'] is not null;
+-------------------+
| name |
+-------------------+
| Arkintoofle Minor |
| Bartledan |
| Argabuthon |
+-------------------+
SELECT 3 rows in set (... sec)
cr> select count(*) from locations where name is not null;
+----------+
| count(*) |
+----------+
| 12 |
+----------+
SELECT 1 row in set (... sec)
ANY (array)
¶
The ANY (or SOME) function allows you to query elements within arrays.
For example, this query returns any row where the array
inhabitants['interests']
contains a netball
element:
cr> select inhabitants['name'], inhabitants['interests'] from locations
... where 'netball' = ANY(inhabitants['interests']);
+---------------------+------------------------------+
| inhabitants['name'] | inhabitants['interests'] |
+---------------------+------------------------------+
| Minories | ["netball", "short stories"] |
| Bartledannians | ["netball"] |
+---------------------+------------------------------+
SELECT 2 rows in set (... sec)
This query combines the ANY
function with the LIKE
operator:
cr> select inhabitants['name'], inhabitants['interests'] from locations
... where '%stories%' LIKE ANY(inhabitants['interests']);
+---------------------+------------------------------+
| inhabitants['name'] | inhabitants['interests'] |
+---------------------+------------------------------+
| Minories | ["netball", "short stories"] |
+---------------------+------------------------------+
SELECT 1 row in set (... sec)
This query passes a literal array value to the ANY
function:
cr> select name, inhabitants['interests'] from locations
... where name = ANY(ARRAY['Bartledan', 'Algol'])
... order by name asc;
+-----------+--------------------------+
| name | inhabitants['interests'] |
+-----------+--------------------------+
| Algol | NULL |
| Bartledan | ["netball"] |
+-----------+--------------------------+
SELECT 2 rows in set (... sec)
This query selects any locations with at least one (i.e., ANY) population figure above 100:
cr> select name, information['population'] from locations
... where 100 < ANY (information['population'])
... order by name;
+-------------------+---------------------------+
| name | information['population'] |
+-------------------+---------------------------+
| North West Ripple | [12, 163] |
| Outer Eastern Rim | [5673745846] |
+-------------------+---------------------------+
SELECT 2 rows in set (... sec)
Note
It is possible to use ANY
to compare values directly against the
properties of object arrays, as above. However, this usage is discouraged
as it cannot utilize the table index and requires the equivalent of a table
scan.
The ANY
construct can be used in subquery expressions and array comparisons.
Negating ANY
¶
Negating the ANY
operator does not behave like other comparison operators.
The following query negates ANY
using !=
to return all rows where
inhabitants['interests']
has at least one array
element that is not netball
:
cr> select inhabitants['name'], inhabitants['interests'] from locations
... where 'netball' != ANY(inhabitants['interests']);
+---------------------+------------------------------+
| inhabitants['name'] | inhabitants['interests'] |
+---------------------+------------------------------+
| Minories | ["netball", "short stories"] |
| Argabuthonians | ["science", "reason"] |
+---------------------+------------------------------+
SELECT 2 rows in set (... sec)
Note
When using the != ANY(<array_col>))
syntax, the default maximum size
of the array can be 8192. To be use larger arrays, you must configure the
indices.query.bool.max_clause_count setting as appropriate on each node.
Negating the same query with a preceding not
returns all rows where
inhabitants['interests']
has no netball
element:
cr> select inhabitants['name'], inhabitants['interests'] from locations
... where not 'netball' = ANY(inhabitants['interests']);
+---------------------+--------------------------+
| inhabitants['name'] | inhabitants['interests'] |
+---------------------+--------------------------+
| Argabuthonians | ["science", "reason"] |
+---------------------+--------------------------+
SELECT 1 row in set (... sec)
This behaviour applies to:
LIKE
andNOT LIKE
- All other comparison operators (excluding
IS NULL
andIS NOT NULL
)
Note
When using the NOT
with ANY
, the performance of the query may be
poor because special handling is required to implement the 3-valued
logic. For better performance, consider using the ignore3vl function.
Additionally, When using NOT
with LIKE ANY
or NOT LIKE ANY
,
the default maximum size of the array can be 8192. To be use larger arrays,
you must configure the indices.query.bool.max_clause_count setting as appropriate on each node.
Nonscalar data types¶
Arrays¶
CrateDB supports nonscalar arrays. It is possible to select and query array elements.
For example, you might insert an array like so:
cr> insert into locations (id, name, position, kind, landmarks)
... values (14, 'Frogstar', 4, 'Star System',
... ['Total Perspective Vortex', 'Milliways']
... );
INSERT OK, 1 row affected (... sec)
The result:
cr> select name, landmarks from locations
... where name = 'Frogstar';
+----------+-------------------------------------------+
| name | landmarks |
+----------+-------------------------------------------+
| Frogstar | ["Total Perspective Vortex", "Milliways"] |
+----------+-------------------------------------------+
SELECT 1 row in set (... sec)
The individual array elements can be selected from the landmarks
column
with landmarks[n]
, where n
is the integer array index, like so:
cr> select name, landmarks[1] from locations
... where name = 'Frogstar';
+----------+--------------------------+
| name | landmarks[1] |
+----------+--------------------------+
| Frogstar | Total Perspective Vortex |
+----------+--------------------------+
SELECT 1 row in set (... sec)
Note
The first index value is 1
. The maximum array index is 2147483648
.
Using an index greater than the array size results in a NULL value.
Individual array elements can also be addressed in the where clause, like so:
cr> select name, landmarks from locations
... where landmarks[2] = 'Milliways';
+----------+-------------------------------------------+
| name | landmarks |
+----------+-------------------------------------------+
| Frogstar | ["Total Perspective Vortex", "Milliways"] |
+----------+-------------------------------------------+
SELECT 1 row in set (... sec)
When using the =
operator, as above, the value of the array element at
index n
is compared. To compare against any array element, see
ANY (array).
Objects¶
CrateDB supports nonscalar objects. It is possible to select and query object properties.
For example, you might insert an object like so:
cr> insert into locations (id, name, position, kind, inhabitants)
... values (15, 'Betelgeuse', 2, 'Star System',
... {name = 'Betelgeuseans',
... description = 'Humanoids with two heads'}
... );
INSERT OK, 1 row affected (... sec)
The result:
cr> select name, inhabitants from locations
... where name = 'Betelgeuse';
+------------+----------------------------------------------------------------------+
| name | inhabitants |
+------------+----------------------------------------------------------------------+
| Betelgeuse | {"description": "Humanoids with two heads", "name": "Betelgeuseans"} |
+------------+----------------------------------------------------------------------+
SELECT 1 row in set (... sec)
The object properties can be selected from the inhabitants
column with
inhabitants['property']
, where property
is the property name, like so:
cr> select name, inhabitants['name'] from locations
... where name = 'Betelgeuse';
+------------+---------------------+
| name | inhabitants['name'] |
+------------+---------------------+
| Betelgeuse | Betelgeuseans |
+------------+---------------------+
SELECT 1 row in set (... sec)
Object property can also be addressed in the where clause, like so:
cr> select name, inhabitants from locations
... where inhabitants['name'] = 'Betelgeuseans';
+------------+----------------------------------------------------------------------+
| name | inhabitants |
+------------+----------------------------------------------------------------------+
| Betelgeuse | {"description": "Humanoids with two heads", "name": "Betelgeuseans"} |
+------------+----------------------------------------------------------------------+
SELECT 1 row in set (... sec)
Nested structures¶
Objects may contain arrays and arrays may contain objects. These nested structures can be selected and queried.
For example, you might insert something like this:
cr> insert into locations (id, name, position, kind, inhabitants, information)
... values (16, 'Folfanga', 4, 'Star System',
... {name = 'A-Rth-Urp-Hil-Ipdenu',
... description = 'A species of small slug',
... interests = ['lettuce', 'slime']},
... [{evolution_level=42, population=1},
... {evolution_level=6, population=3600001}]
... );
INSERT OK, 1 row affected (... sec)
The query above includes:
An array nested within an object. Specifically, the
inhabitants
column contains an parent object with aninterests
property set to an child array of strings (e.g.,lettuce
).Objects nested within an array. Specifically, the
information
column contains an parent array with two child objects (e.g.,{evolution_level=42, population=1}
).
Arrays within objects¶
The child array (above) can be selected as a property of the parent object:
cr> select name, inhabitants['interests'] from locations
... where name = 'Folfanga';
+----------+--------------------------+
| name | inhabitants['interests'] |
+----------+--------------------------+
| Folfanga | ["lettuce", "slime"] |
+----------+--------------------------+
SELECT 1 row in set (... sec)
Individual elements of the child array can be selected by combining the array index syntax with the object property name syntax, like so:
cr> select name, inhabitants[1]['interests'] from locations
... where name = 'Folfanga';
+----------+-----------------------------+
| name | inhabitants[1]['interests'] |
+----------+-----------------------------+
| Folfanga | lettuce |
+----------+-----------------------------+
SELECT 1 row in set (... sec)
Tip
The example above might surprise you because the child array index comes before the parent object property name, which doesn’t follow the usual left-to-right convention for addressing the contents of a nested structure.
To address the contents of a nested structure in the usual left-to-right sense, CrateDB would have to read the string literal from disk, parse it into memory, and then manipulate the result. Compared to a Lucene search (which CrateDB uses for most other), this would be a very slow and resource intensive process.
To avoid this issue, CrateDB indexes complex structures by flattening them
into multiple Lucene records. Accordingly, an expression like
inhabitants[1]['interests']
is better thought of as a search
instruction for the Lucene index.
Limitations¶
As mentioned above, CrateDB indexes complex structures by flattening them into multiple Lucene records. While this approach makes it possible to efficiently query nested documents to an arbitrary depth with the full speed of the table index, it does have some limitations:
You cannot directly nest an array within an array (i.e.,
array(array(...)
is not a valid column definition). You can, however, nest multiple arrays as long as an object comes between them (e.g.,array(object as (array(...)))
is a valid).Using the standard syntax, you can only address the elements of one array in a single expression. Note: If you do address the elements of an array, the array index must appear before any object property names (see the previous tip for more information).
Tip
If you want to address the elements of more than one array in a single expression, you can use the following non-standard syntax:
select foo[n1]['bar']::text[][n2] from my_table;
Here, n1
is the index of the first array (column foo
) and n2
is
the index of the second array (object property bar
).
This works by:
- Type casting the second array (i.e.,
foo[n1]['bar']
) to a string using the<expression>::text
syntax, which is equivalent tocast(<expression> as text)
- Creating an temporary array (in-memory and
addressable) from that string using the
<expression>[]
syntax, which is equivalent toarray(expression
)
Note: Because this syntax effectively circumvents the index, it may considerably degrade query performance.
Objects within arrays¶
An individual child object (above) can be selected from a parent array as an array element using the array index syntax:
cr> select name, information[1] from locations
... where name = 'Outer Eastern Rim';
+-------------------+--------------------------------------------------+
| name | information[1] |
+-------------------+--------------------------------------------------+
| Outer Eastern Rim | {"evolution_level": 2, "population": 5673745846} |
+-------------------+--------------------------------------------------+
SELECT 1 row in set (... sec)
Properties of individual child objects can be selected by combining the array index syntax with the object property name syntax, like so:
cr> select name, information[1]['population'] from locations
... where name = 'Outer Eastern Rim';
+-------------------+------------------------------+
| name | information[1]['population'] |
+-------------------+------------------------------+
| Outer Eastern Rim | 5673745846 |
+-------------------+------------------------------+
SELECT 1 row in set (... sec)
Additionally, consider this data:
cr> select name, information from locations
... where information['population'] is not null;
+-------------------+-------------------------------------------------------------------------------------------+
| name | information |
+-------------------+-------------------------------------------------------------------------------------------+
| North West Ripple | [{"evolution_level": 4, "population": 12}, {"evolution_level": 42, "population": 163}] |
| Outer Eastern Rim | [{"evolution_level": 2, "population": 5673745846}] |
| Folfanga | [{"evolution_level": 42, "population": 1}, {"evolution_level": 6, "population": 3600001}] |
+-------------------+-------------------------------------------------------------------------------------------+
SELECT 3 rows in set (... sec)
If you’re only interested in one property of each object (e.g., population), you can select a virtual array containing all of the values for that property, like so:
cr> select name, information['population'] from locations
... where information['population'] is not null;
+-------------------+---------------------------+
| name | information['population'] |
+-------------------+---------------------------+
| North West Ripple | [12, 163] |
| Outer Eastern Rim | [5673745846] |
| Folfanga | [1, 3600001] |
+-------------------+---------------------------+
SELECT 3 rows in set (... sec)
Data aggregation¶
CrateDB supports Aggregation via the following aggregation functions.
Aggregation works across all the rows that match a query or on all matching
rows in every distinct group of a GROUP BY
statement. Aggregating
SELECT
statements without GROUP BY
will always return one row.
Name | Arguments | Description | Return Type |
---|---|---|---|
ARBITRARY | column name of a primitive typed column (all but object) | Returns an undefined value of all the values in the argument column. Can be NULL. | the input column type or NULL if some value of the matching rows in that column is NULL |
AVG / MEAN | column name of a numeric or timestamp column | Returns the arithmetic mean of the values in the argument column. NULL-values are ignored. | double or NULL if all values of all matching rows in that column are NULL |
COUNT(*) | star as parameter or as constant | Counts the number of rows that match the query. | long |
COUNT | column name | Counts the number of rows that contain a non NULL value for the given column. | long |
COUNT(DISTINCT col) | column name | Counts the number of distinct values for the given column that are not NULL. | long |
GEOMETRIC_MEAN | column name of a numeric or timestamp column | Computes the geometric mean for positive numbers. | double or NULL if all values of all matching rows in that are NULL or if a value is negative. |
MIN | column name of a numeric, timestamp or string column | Returns the smallest of the values in the argument column in case of strings this means the lexicographically smallest. NULL-values are ignored | the input column type or NULL if all values in that matching rows in that column are NULL |
MAX | column name of a numeric, timestamp or string column | Returns the biggest of the values in the argument column in case of strings this means the lexicographically biggest. NULL-values are ignored | the input column type or NULL if all values of all matching rows in that column are NULL |
STDDEV | column name of a numeric or timestamp column | Returns the standard deviation of the values in the argument column. NULL-values are ignored. | double or NULL if all values are NULL or we got no value at all |
STRING_AGG | an expression and delimiter of a text type | Concatenated input values into a string, separated by a delimiter. NULL-values are ignored. | text |
PERCENTILE | column of a numeric type and a double percentile value | Returns the provided percentile of the values in the argument column. NULL-values are ignored. | a double precision value |
SUM | column name of a numeric or timestamp column | Returns the sum of the values in the argument column. NULL-values are ignored. | double or NULL if all values of all matching rows in that column are NULL |
VARIANCE | column name of a numeric or timestamp column | Returns the variance of the values in the argument column. NULL-values are ignored. | double or NULL if all values are NULL or we got no value at all |
Some Examples:
cr> select count(*) from locations;
+----------+
| count(*) |
+----------+
| 16 |
+----------+
SELECT 1 row in set (... sec)
cr> select count(*) from locations where kind = 'Planet';
+----------+
| count(*) |
+----------+
| 5 |
+----------+
SELECT 1 row in set (... sec)
cr> select count(name), count(*) from locations;
+-------------+----------+
| count(name) | count(*) |
+-------------+----------+
| 15 | 16 |
+-------------+----------+
SELECT 1 row in set (... sec)
cr> select max(name) from locations;
+-------------------+
| max(name) |
+-------------------+
| Outer Eastern Rim |
+-------------------+
SELECT 1 row in set (... sec)
cr> select min(date) from locations;
+--------------+
| min(date) |
+--------------+
| 308534400000 |
+--------------+
SELECT 1 row in set (... sec)
cr> select count(*), kind from locations
... group by kind order by kind asc;
+----------+-------------+
| count(*) | kind |
+----------+-------------+
| 4 | Galaxy |
| 5 | Planet |
| 7 | Star System |
+----------+-------------+
SELECT 3 rows in set (... sec)
cr> select max(position), kind from locations
... group by kind order by max(position) desc;
+---------------+-------------+
| max(position) | kind |
+---------------+-------------+
| 6 | Galaxy |
| 5 | Planet |
| 4 | Star System |
+---------------+-------------+
SELECT 3 rows in set (... sec)
cr> select min(name), kind from locations
... group by kind order by min(name) asc;
+------------------------------------+-------------+
| min(name) | kind |
+------------------------------------+-------------+
| | Planet |
| Aldebaran | Star System |
| Galactic Sector QQ7 Active J Gamma | Galaxy |
+------------------------------------+-------------+
SELECT 3 rows in set (... sec)
cr> select count(*), min(name), kind from locations
... group by kind order by kind;
+----------+------------------------------------+-------------+
| count(*) | min(name) | kind |
+----------+------------------------------------+-------------+
| 4 | Galactic Sector QQ7 Active J Gamma | Galaxy |
| 5 | | Planet |
| 7 | Aldebaran | Star System |
+----------+------------------------------------+-------------+
SELECT 3 rows in set (... sec)
cr> select sum(position) as sum_positions, kind from locations
... group by kind order by sum_positions;
+---------------+-------------+
| sum_positions | kind |
+---------------+-------------+
| 13 | Galaxy |
| 15 | Planet |
| 20 | Star System |
+---------------+-------------+
SELECT 3 rows in set (... sec)
Window functions¶
CrateDB supports the OVER clause to enable the execution of window functions:
cr> select sum(position) OVER() AS pos_sum, name from locations order by name;
+---------+------------------------------------+
| pos_sum | name |
+---------+------------------------------------+
| 48 | |
| 48 | Aldebaran |
| 48 | Algol |
| 48 | Allosimanius Syneca |
| 48 | Alpha Centauri |
| 48 | Altair |
| 48 | Argabuthon |
| 48 | Arkintoofle Minor |
| 48 | Bartledan |
| 48 | Betelgeuse |
| 48 | Folfanga |
| 48 | Frogstar |
| 48 | Galactic Sector QQ7 Active J Gamma |
| 48 | North West Ripple |
| 48 | Outer Eastern Rim |
| 48 | NULL |
+---------+------------------------------------+
SELECT 16 rows in set (... sec)
GROUP BY
¶
CrateDB supports the group by
clause. This clause can be used to group the
resulting rows by the value(s) of one or more columns. That means that rows
that contain duplicate values will be merged.
This is useful if used in conjunction with aggregation functions:
cr> select count(*), kind from locations
... group by kind order by count(*) desc, kind asc;
+----------+-------------+
| count(*) | kind |
+----------+-------------+
| 7 | Star System |
| 5 | Planet |
| 4 | Galaxy |
+----------+-------------+
SELECT 3 rows in set (... sec)
Note
All columns that are used either as result column or in the order by clause have to be used within the group by clause. Otherwise the statement won’t execute.
Grouping will be executed against the real table column when aliases that shadow the table columns are used.
Grouping on array columns doesn’t work, but arrays can be unnested in a subquery using unnest( array [ array , ] ), it is then possible to use GROUP BY on the subquery.
HAVING
¶
The having clause is the equivalent to the where clause for the resulting rows of a group by clause.
A simple having clause example using an equality operator:
cr> select count(*), kind from locations
... group by kind having count(*) = 4 order by kind;
+----------+--------+
| count(*) | kind |
+----------+--------+
| 4 | Galaxy |
+----------+--------+
SELECT 1 row in set (... sec)
The condition of the having clause can refer to the resulting columns of the group by clause.
It is also possible to use aggregates in the having clause just like in the result columns:
cr> select count(*), kind from locations
... group by kind having min(name) = 'Aldebaran';
+----------+-------------+
| count(*) | kind |
+----------+-------------+
| 7 | Star System |
+----------+-------------+
SELECT 1 row in set (... sec)
cr> select count(*), kind from locations
... group by kind having count(*) = 4 and kind like 'Gal%';
+----------+--------+
| count(*) | kind |
+----------+--------+
| 4 | Galaxy |
+----------+--------+
SELECT 1 row in set (... sec)
Note
Aliases are not supported in the having clause.