Window functions¶
Table of contents
Introduction¶
Window functions are functions which perform a computation across a set of rows which are related to the current row. This is comparable to aggregation functions, but window functions do not cause multiple rows to be grouped into a single row.
Window Function Call¶
Synopsis¶
The synopsis of a window function call is one of the following
function_name ([expression [, expression ... ]])
[ FILTER ( WHERE condition ) ]
over_clause
function_name ( * ) [ FILTER ( WHERE condition ) ] over_clause
where function_name
is a name of
a general-purpose window or
aggregate function
and expression
is a column reference, scalar function or literal.
If FILTER
is specified, then only the rows that met the
WHERE clause condition are supplied to the window
function. Only window functions that are aggregates
accept the FILTER
clause.
The OVER clause is what declares a function to be a window function.
The window function call that uses a wildcard
instead of an expression
as
a function argument is supported only by the count(*)
aggregate function.
Window Definition¶
OVER¶
Synopsis¶
OVER { window_name | ( [ window_definition ] ) }
where window_definition
has the syntax
window_definition:
[ window_name ]
[ PARTITION BY expression [, ...] ]
[ ORDER BY expression [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] ]
[ { RANGE | ROWS } BETWEEN frame_start AND frame_end ]
The window_name
refers to window_definition
defined in the
WINDOW clause.
The frame_start
and frame_end
can be one of
UNBOUNDED PRECEDING
offset PRECEDING
CURRENT ROW
offset FOLLOWING
UNBOUNDED FOLLOWING
The default frame definition is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT
ROW
. If frame_end
is omitted it defaults to CURRENT ROW
.
frame_start
cannot be FOLLOWING
or UNBOUNDED FOLLOWING
and
frame_end
cannot be PRECEDING
or UNBOUNDED PRECEDING
.
In RANGE
mode if the frame_start
is CURRENT ROW
the frame starts
with the current row’s first peer (a row that the window’s ORDER BY
expression sorts as equal to the current row), while a frame_end
of
CURRENT ROW
means the frame will end with the current’s row last peer row.
In ROWS
mode CURRENT_ROW
means the current row.
The offset PRECEDING
and offset FOLLOWING
options vary in meaning
depending on the frame mode. In ROWS
mode, the offset
is an integer
indicating that the frame start or end is offsetted by that many rows before or
after the current row. In RANGE
mode, the use of a custom offset
option requires that there is exactly one ORDER BY
column in the window
definition. The frame contains those rows whose ordering column value is no
more than offset
minus (for PRECEDING) or plus (for FOLLOWING) the current
row’s ordering column value. Because the value of offset
is substracted/added
to the values of the ordering column, only type combinations that support
addition/substraction operations are allowed. For instance, when the ordering
column is of type timestamp, the offset
expression can be an interval.
The OVER
clause defines the window
containing the appropriate rows
which will take part in the window function
computation.
An empty OVER
clause defines a window
containing all the rows in the
result set.
Example:
cr> SELECT dept_id, COUNT(*) OVER() AS cnt FROM employees ORDER BY 1, 2;
+---------+-----+
| dept_id | cnt |
+---------+-----+
| 4001 | 18 |
| 4001 | 18 |
| 4001 | 18 |
| 4002 | 18 |
| 4002 | 18 |
| 4002 | 18 |
| 4002 | 18 |
| 4003 | 18 |
| 4003 | 18 |
| 4003 | 18 |
| 4003 | 18 |
| 4003 | 18 |
| 4004 | 18 |
| 4004 | 18 |
| 4004 | 18 |
| 4006 | 18 |
| 4006 | 18 |
| 4006 | 18 |
+---------+-----+
SELECT 18 rows in set (... sec)
The PARTITION BY
clause groups the rows within a window into
partitions which are processed separately by the window function, each
partition in turn becoming a window. If PARTITION BY
is not specified, all
the rows are considered a single partition.
Example:
cr> SELECT dept_id, ROW_NUMBER() OVER(PARTITION BY dept_id) AS row_num
... FROM employees ORDER BY 1, 2;
+---------+---------+
| dept_id | row_num |
+---------+---------+
| 4001 | 1 |
| 4001 | 2 |
| 4001 | 3 |
| 4002 | 1 |
| 4002 | 2 |
| 4002 | 3 |
| 4002 | 4 |
| 4003 | 1 |
| 4003 | 2 |
| 4003 | 3 |
| 4003 | 4 |
| 4003 | 5 |
| 4004 | 1 |
| 4004 | 2 |
| 4004 | 3 |
| 4006 | 1 |
| 4006 | 2 |
| 4006 | 3 |
+---------+---------+
SELECT 18 rows in set (... sec)
If ORDER BY
is supplied the window
definition consists of a range of
rows starting with the first row in the partition
and ending with the
current row, plus any subsequent rows that are equal to the current row, which
are the current row’s peers
.
Example:
cr> SELECT
... dept_id,
... sex,
... COUNT(*) OVER(PARTITION BY dept_id ORDER BY sex) AS cnt
... FROM employees
... ORDER BY 1, 2, 3
+---------+-----+-----+
| dept_id | sex | cnt |
+---------+-----+-----+
| 4001 | M | 3 |
| 4001 | M | 3 |
| 4001 | M | 3 |
| 4002 | F | 1 |
| 4002 | M | 4 |
| 4002 | M | 4 |
| 4002 | M | 4 |
| 4003 | M | 5 |
| 4003 | M | 5 |
| 4003 | M | 5 |
| 4003 | M | 5 |
| 4003 | M | 5 |
| 4004 | F | 1 |
| 4004 | M | 3 |
| 4004 | M | 3 |
| 4006 | F | 1 |
| 4006 | M | 3 |
| 4006 | M | 3 |
+---------+-----+-----+
SELECT 18 rows in set (... sec)
Note
Taking into account the peers
concept mentioned above, for an empty
OVER
clause all the rows in the result set are peers
.
Note
Aggregation functions will be treated as
window functions
when used in conjunction with the OVER
clause.
Note
Window definitions order or partitioned by an array column type are currently not supported.
In the UNBOUNDED FOLLOWING
case the window
for each row starts with
each row and ends with the last row in the current partition
. If the
current row
has peers
the window
will include (or start with) all
the current row
peers and end at the upper bound of the partition
.
Example:
cr> SELECT
... dept_id,
... sex,
... COUNT(*) OVER(
... PARTITION BY dept_id
... ORDER BY
... sex RANGE BETWEEN CURRENT ROW
... AND UNBOUNDED FOLLOWING
... ) partitionByDeptOrderBySex
... FROM employees
... ORDER BY 1, 2, 3
+---------+-----+---------------------------+
| dept_id | sex | partitionbydeptorderbysex |
+---------+-----+---------------------------+
| 4001 | M | 3 |
| 4001 | M | 3 |
| 4001 | M | 3 |
| 4002 | F | 4 |
| 4002 | M | 3 |
| 4002 | M | 3 |
| 4002 | M | 3 |
| 4003 | M | 5 |
| 4003 | M | 5 |
| 4003 | M | 5 |
| 4003 | M | 5 |
| 4003 | M | 5 |
| 4004 | F | 3 |
| 4004 | M | 2 |
| 4004 | M | 2 |
| 4006 | F | 3 |
| 4006 | M | 2 |
| 4006 | M | 2 |
+---------+-----+---------------------------+
SELECT 18 rows in set (... sec)
Named windows¶
It is possible to define a list of named window definitions that can be referenced in OVER clauses. To do this, use the WINDOW clause in the SELECT clause.
Named windows are particularly useful when the same window definition could be used in multiple OVER clauses. For instance
cr> SELECT
... x,
... FIRST_VALUE(x) OVER (w) AS "first",
... LAST_VALUE(x) OVER (w) AS "last"
... FROM (VALUES (1), (2), (3), (4)) AS t(x)
... WINDOW w AS (ORDER BY x)
+---+-------+------+
| x | first | last |
+---+-------+------+
| 1 | 1 | 1 |
| 2 | 1 | 2 |
| 3 | 1 | 3 |
| 4 | 1 | 4 |
+---+-------+------+
SELECT 4 rows in set (... sec)
If a window_name
is specified in the window definition of the OVER
clause, then there must be a named window entry that matches the window_name
in the window definition list of the WINDOW clause.
If the OVER clause has its own non-empty window definition and references a window definition from the WINDOW clause, then it can only add clauses from the referenced window, but not overwrite them.
cr> SELECT
... x,
... LAST_VALUE(x) OVER (w ORDER BY x) AS y
... FROM (VALUES
... (1, 1),
... (2, 1),
... (3, 2),
... (4, 2) ) AS t(x, y)
... WINDOW w AS (PARTITION BY y)
+---+---+
| x | y |
+---+---+
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
| 4 | 4 |
+---+---+
SELECT 4 rows in set (... sec)
Otherwise, an attempt to override the clauses of the referenced window by the window definition of the OVER clause will result in failure.
cr> SELECT
... FIRST_VALUE(x) OVER (w ORDER BY x)
... FROM (VALUES(1), (2), (3), (4)) as t(x)
... WINDOW w AS (ORDER BY x)
SQLParseException[Cannot override ORDER BY clause of window w]
It is not possible to define the PARTITION BY
clause in the window
definition of the OVER clause if it references a window definition
from the WINDOW clause.
The window definitions in the WINDOW clause cannot define its own window frames, if they are referenced by non-empty window definitions of the OVER clauses.
The definition of the named window can itself begin with a window_name
.
In this case all the elements of inter-connected named windows will be copied
to the window definition of the OVER clause if it references the named
window definition that has subsequent window references. The window definitions
in the WINDOW
clause permits only backward references.
cr> SELECT
... x,
... ROW_NUMBER() OVER (w) AS y
... FROM (VALUES
... (1, 1),
... (3, 2),
... (2, 1)) AS t (x, y)
... WINDOW p AS (PARTITION BY y),
... w AS (p ORDER BY x)
+---+---+
| x | y |
+---+---+
| 1 | 1 |
| 2 | 2 |
| 3 | 1 |
+---+---+
SELECT 3 rows in set (... sec)
General-Purpose Window Functions¶
row_number()
¶
Returns the number of the current row within its window.
Example:
cr> SELECT
... col1,
... ROW_NUMBER() OVER(ORDER BY col1) as row_num
... FROM (VALUES('x'), ('y'), ('z')) AS t;
+------+---------+
| col1 | row_num |
+------+---------+
| x | 1 |
| y | 2 |
| z | 3 |
+------+---------+
SELECT 3 rows in set (... sec)
first_value(arg)
¶
Note
The first_value
window function is an enterprise
feature.
Returns the argument value evaluated at the first row within the window.
Its return type is the type of its argument.
Example:
cr> SELECT
... col1,
... FIRST_VALUE(col1) OVER (ORDER BY col1) AS value
... FROM (VALUES('x'), ('y'), ('y'), ('z')) AS t;
+------+-------+
| col1 | value |
+------+-------+
| x | x |
| y | x |
| y | x |
| z | x |
+------+-------+
SELECT 4 rows in set (... sec)
last_value(arg)
¶
Note
The last_value
window function is an enterprise
feature.
Returns the argument value evaluated at the last row within the window.
Its return type is the type of its argument.
Example:
cr> SELECT
... col1,
... LAST_VALUE(col1) OVER(ORDER BY col1) AS value
... FROM (VALUES('x'), ('y'), ('y'), ('z')) AS t;
+------+-------+
| col1 | value |
+------+-------+
| x | x |
| y | y |
| y | y |
| z | z |
+------+-------+
SELECT 4 rows in set (... sec)
nth_value(arg, number)
¶
Note
The nth_value
window function is an enterprise
feature.
Returns the argument value evaluated at row that is the nth row within the window. Null is returned if the nth row doesn’t exist in the window.
Its return type is the type of its first argument.
Example:
cr> SELECT
... col1,
... NTH_VALUE(col1, 3) OVER(ORDER BY col1) AS val
... FROM (VALUES ('x'), ('y'), ('y'), ('z')) AS t;
+------+------+
| col1 | val |
+------+------+
| x | NULL |
| y | y |
| y | y |
| z | y |
+------+------+
SELECT 4 rows in set (... sec)
lag(arg [, offset [, default] ])
¶
Note
The lag
window function is an enterprise feature.
Synopsis¶
lag(argument any [, offset integer [, default any]])
Returns the argument value evaluated at the row that precedes the current row
by the offset within the partition. If there is no such row, the return value
is default
. If offset
or default
arguments are missing, they
default to 1
and null
, respectively.
Both offset
and default
are evaluated with respect to the current row.
If offset
is 0
, then argument value is evaluated for the current row.
The default
and argument
data types must match.
Example:
cr> SELECT
... dept_id,
... year,
... budget,
... LAG(budget) OVER(
... PARTITION BY dept_id) prev_budget
... FROM (VALUES
... (1, 2017, 45000),
... (1, 2018, 35000),
... (2, 2017, 15000),
... (2, 2018, 65000),
... (2, 2019, 12000))
... as t (dept_id, year, budget);
+---------+------+--------+-------------+
| dept_id | year | budget | prev_budget |
+---------+------+--------+-------------+
| 1 | 2017 | 45000 | NULL |
| 1 | 2018 | 35000 | 45000 |
| 2 | 2017 | 15000 | NULL |
| 2 | 2018 | 65000 | 15000 |
| 2 | 2019 | 12000 | 65000 |
+---------+------+--------+-------------+
SELECT 5 rows in set (... sec)
lead(arg [, offset [, default] ])
¶
Note
The lead
window function is an enterprise feature.
Synopsis¶
lead(argument any [, offset integer [, default any]])
The lead
function is the counterpart of the
lag window function as it allows the evaluation of
the argument at rows that follow the current row. lead
returns the argument
value evaluated at the row that follows the current row by the offset within
the partition. If there is no such row, the return value is default
.
If offset
or default
arguments are missing, they default to 1
and
null
, respectively.
Both offset
and default
are evaluated with respect to the current row.
If offset
is 0
, then argument value is evaluated for the current row.
The default
and argument
data types must match.
Example:
cr> SELECT
... dept_id,
... year,
... budget,
... LEAD(budget) OVER(
... PARTITION BY dept_id) next_budget
... FROM (VALUES
... (1, 2017, 45000),
... (1, 2018, 35000),
... (2, 2017, 15000),
... (2, 2018, 65000),
... (2, 2019, 12000))
... as t (dept_id, year, budget);
+---------+------+--------+-------------+
| dept_id | year | budget | next_budget |
+---------+------+--------+-------------+
| 1 | 2017 | 45000 | 35000 |
| 1 | 2018 | 35000 | NULL |
| 2 | 2017 | 15000 | 65000 |
| 2 | 2018 | 65000 | 12000 |
| 2 | 2019 | 12000 | NULL |
+---------+------+--------+-------------+
SELECT 5 rows in set (... sec)
rank()
¶
Note
The rank
window function is an enterprise feature.
Synopsis¶
rank()
Returns the rank of every row within a partition of a result set.
Within each partition, the rank of the first row is 1
. Subsequent tied
rows are given the same rank, and the potential rank of the next row
is incremented. Because of this, ranks may not be sequential.
Example:
cr> SELECT
... name,
... department_id,
... salary,
... RANK() OVER (ORDER BY salary desc) as salary_rank
... FROM (VALUES
... ('Bobson Dugnutt', 1, 2000),
... ('Todd Bonzalez', 2, 2500),
... ('Jess Brewer', 1, 2500),
... ('Safwan Buchanan', 1, 1900),
... ('Hal Dodd', 1, 2500),
... ('Gillian Hawes', 2, 2000))
... as t (name, department_id, salary);
+-----------------+---------------+--------+-------------+
| name | department_id | salary | salary_rank |
+-----------------+---------------+--------+-------------+
| Todd Bonzalez | 2 | 2500 | 1 |
| Jess Brewer | 1 | 2500 | 1 |
| Hal Dodd | 1 | 2500 | 1 |
| Bobson Dugnutt | 1 | 2000 | 4 |
| Gillian Hawes | 2 | 2000 | 4 |
| Safwan Buchanan | 1 | 1900 | 6 |
+-----------------+---------------+--------+-------------+
SELECT 6 rows in set (... sec)
dense_rank()
¶
Note
The dense_rank
window function is an enterprise feature.
Synopsis¶
dense_rank()
Returns the rank of every row within a partition of a result set, similar to
rank
. However, unlike rank
, dense_rank
always returns sequential
rank values.
Within each partition, the rank of the first row is 1
. Subsequent tied
rows are given the same rank.
Example:
cr> SELECT
... name,
... department_id,
... salary,
... DENSE_RANK() OVER (ORDER BY salary desc) as salary_rank
... FROM (VALUES
... ('Bobson Dugnutt', 1, 2000),
... ('Todd Bonzalez', 2, 2500),
... ('Jess Brewer', 1, 2500),
... ('Safwan Buchanan', 1, 1900),
... ('Hal Dodd', 1, 2500),
... ('Gillian Hawes', 2, 2000))
... as t (name, department_id, salary);
+-----------------+---------------+--------+-------------+
| name | department_id | salary | salary_rank |
+-----------------+---------------+--------+-------------+
| Todd Bonzalez | 2 | 2500 | 1 |
| Jess Brewer | 1 | 2500 | 1 |
| Hal Dodd | 1 | 2500 | 1 |
| Bobson Dugnutt | 1 | 2000 | 2 |
| Gillian Hawes | 2 | 2000 | 2 |
| Safwan Buchanan | 1 | 1900 | 3 |
+-----------------+---------------+--------+-------------+
SELECT 6 rows in set (... sec)