Chapter 10 – Collection types

Note

You are reading a digital copy of SQL-99 Complete, Really, a book that documents the SQL-99 standard.

The book and the standard does not reflect the features of CrateDB, we are just publishing it as a service for the community and for reference purposes.

On the one hand, CrateDB does not implement the SQL-99 standard thoroughly, on the other hand, it extends the standard for implementing cluster features and others.

For more information specific to CrateDB, check out the CrateDB Reference documentation.

[Obscure Rule] applies for this entire chapter.

In SQL, a <collection type> is an array: a composite constructed SQL <data type>.

Table of Contents

Collection <data type>s

A <collection type> is defined by a descriptor that contains three pieces of information:

  1. The <data type>’s name: ARRAY.

  2. The maximum number of elements in the array.

  3. The array’s element <data type> specification.

ARRAY

The required syntax for a <collection type> specification is as follows.

<collection type> ::=
<data type> <array specification>

   <array specification> ::=
   ARRAY[unsigned integer] | ARRAY??(unsigned integer??)

A <collection type> specification defines an array – it consists of an element <data type> specification followed by an indication of the array’s maximum element size. For example, both of these <collection type> specifications define an array of up to ten numbers, all of which must fall into the range supported by SQL’s SMALLINT <data type>:

SMALLINT ARRAY[10]
SMALLINT ARRAY??(10??)

A <collection type> may not specify, either directly or indirectly (for example, via a Domain), a <data type> of REF or ARRAY. All other SQL <data type>s are supported.

It is important to note that the square brackets in the <collection type>’s syntax diagrams should not be read as if they are BNF brackets – that is, they don’t mean that ARRAY can optionally be followed by an integer. Instead, you use either square brackets or trigraphs to delimit the integer that specifies the size of the array. We’ll use square brackets in the rest of our examples.

An array is an ordered collection of elements. There are some similarities between SQL ARRAYs and C or Pascal arrays, but there are also some significant differences: (a) SQL ARRAYs are only “vectors”, so an array of an array is not supported and (b) SQL ARRAYs are variable size: the specification defines an array’s maximum size, rather than its exact size. In earlier SQL versions there was no support for ARRAYs. In part, this lack of support was due to a well-known principle of database design, the rule of first normal form: “A field (a column/row intersection) may have only one atomic value.” If rows contain arrays, they violate this principle, so a well-designed database doesn’t define arrays all over the place.

[Obscure Rule] A <collection type> is a subtype of a <data type> if (a) both are arrays and (b) the element <data type> of the first array is a subtype of the element <data type> of the second array.

ARRAY <element reference>

An <element reference> returns an element of an array. The required syntax for an <element reference> is as follows.

<element reference> ::=
array_argument[numeric_argument] |
array_argument??(numeric_argument??)

An <element reference> allows you to access a specific element of an array. It operates on two arguments: the first must evaluate to an array and the second must evaluate to an integer. (The integer refers to the ordinal position of the element in the array: the first element in an array is element number 1, the second element is element number 2 and so on.) If either portion of an element reference is NULL, that element is also NULL. Here are two equivalent examples of an <element reference>:

array_column[4]
array_column??(4??)

In each case, the reference would return the fourth element of the array.

If numeric_argument is a negative number, or if it is greater than the number of elements in the array, the <element reference> will fail: your DBMS will return the SQLSTATE error 2202E "data exception-array element error".

<array value constructor>

An <array value constructor> is used to construct an array. The required syntax for an <array value constructor> is as follows.

<array value constructor> ::=
ARRAY[element_expression [ {,element_expression}... ]] |
ARRAY??(element_expression [ {,element_expression}... ]??)

An <array value constructor> allows you to assign values to the elements of an array. An element_expression may be any expression that evaluates to a scalar value with a <data type> that is assignable to the array’s element <data type>. The result is an array whose n-th element value is the value of the n-th element_expression you specify. Here are two equivalent examples of an <array value constructor>:

ARRAY['hello','bob','and','sally']
ARRAY??('hello','bob','and','sally'??)

These examples both construct an array with four elements. The first element has a value of 'hello', the second has a value of 'bob', the third has a value of 'and' and the fourth has a value of 'sally'. Both <array value constructor>s would be valid for this <collection type> specification:

VARCHAR(5) ARRAY[4]

An <array value constructor> serves the same purpose for an array as a <literal> does for a predefined <data type>. It has the same format as the <collection type>’s ARRAY specification – that is, ARRAY[ ] or ARRAY??( ??) - - but instead of a number inside the size delimiters, it contains comma- delimited values of the correct <data type> (these are called array elements in SQL, though in other languages, an instance of an array is called an “occurrence”). For example, if your <collection type> specification is:

INT ARRAY[3]

a valid <array value constructor> would be:

ARRAY[20,10,52]

And if your <collection type> specification is:

BIT(4) ARRAY[3]

a valid <array value constructor> would be:

ARRAY[B'1011',B'0011',B'0110']

The number of elements in an <array value constructor> may vary from zero elements (for example ARRAY[], an empty specification) to any number of elements up to the array’s maximum size.

If you want to restrict your code to Core SQL, don’t define any ARRAY <data type>s and don’t use any of SQL’s array syntax.

Collection Operations

An array is compatible with, and comparable to, any array with a compatible element <data type> – that is, arrays are mutually comparable and mutually assignable only if their element <data type>s are mutually comparable and mutually assignable. Arrays may not be directly compared with, or directly assigned to, any other <data type> class, though implicit type conversions can occur in expressions, SELECTs, INSERTs, DELETEs and UPDATEs. Explicit array type conversions can be forced with the CAST operator.

CAST

In SQL, CAST is a scalar operator that converts a given scalar value to a given scalar <data type>. The required syntax for the CAST operator is:

CAST (<cast operand> AS <cast target>)
   <cast operand> ::= scalar_expression
   <cast target> ::= <Domain name> | <data type>

The CAST operator converts values of a source <data type> into values of a target <data type>, where each <data type> is an SQL pre-defined <data type> (data conversions between UDTs are done with a user-defined cast). The source <data type>, or <cast operand>, can be any expression that evaluates to a single value. The target <data type>, or <cast target>, is either an SQL predefined <data type> specification or the name of a Domain whose defined <data type> is the SQL predefined <data type> that you want to convert the value of “scalar_expression” into. (If you use CAST (... AS <Domain name>), your current <AuthorizationID> must have the USAGE Privilege on that Domain.)

It isn’t, of course, possible to convert the values of every <data type> into the values of every other <data type>. For <collection type>s, the rules are:

  • CAST (NULL AS <data type>) and CAST (array_source_is_a_null_value AS <data type>) both result in NULL.

  • You can CAST a <collection type> source only to a <collection type> target whose element <data type> is an appropriate CAST target for the source’s element <data type>.

When you CAST an array to an array target, you’re actually asking your DBMS to CAST the value of the source element <data type> to the target element <data type>, so the CAST implied therein must be a legal CAST – see our descriptions of CAST for each predefined SQL <data type>. Your source may be an empty array – that is, <cast operand> may be ARRAY[] or ARRAY??(??). In this case, the result of the CAST is an empty array whose element <data type> is the target element <data type>. Otherwise, the result of the CAST is an array with the same number of elements as the source and the target element <data type>.

Assignment

In SQL, when an array is assigned to an array target, the assignment is done one element at a time – that is, the source value’s first element is assigned to the target’s first element, the source’s next element is assigned to the target’s next element, and so on. Two arrays are assignable if their element <data type>s are mutually assignable.

When an array is taken from SQL-data to be assigned to an array target, if the number of elements in the source array equals the maximum number of elements in the target array, assignment is straightforward: the value of each element of the source is assigned to the corresponding element of the target. If the maximum number of elements in the target array is less than the number of elements in the source array, then assignment of as many of the source element values to the target elements as is possible occurs and your DBMS will return the SQLSTATE warning 0102F "warning-array data, right truncation". If the maximum number of elements in the target array is greater than the number of elements in the source array, then assignment of each of the source element values to the target elements occurs and the size of the target for that row becomes the number of elements assigned. (Note that this doesn’t mean that the maximum size of the target for other assignments lessens.)

[Obscure Rule] Since only SQL accepts null values, if your source is NULL, then your target’s value is not changed. Instead, your DBMS will set its indicator parameter to -1, to indicate that an assignment of the null value was attempted. If your target doesn’t have an indicator parameter, the assignment will fail: your DBMS will return the SQLSTATE error 22002 "data exception-null value, no indicator parameter". We’ll talk more about indicator parameters in our chapters on SQL binding styles.

When an array is assigned to a SQL-data array target, if the number of elements in the source array equals the maximum number of elements in the target array, assignment is straightforward: the value of each element of the source is assigned to the corresponding element of the target. If the maximum number of elements in the target array is less than the number of elements in the source array, but the extra source elements are all NULL, then the value of each non-null element of the source is assigned to the corresponding element of the target. If the maximum number of elements in the target array is less than the number of elements in the source array and the extra source elements are not all NULL, the assignment will fail: your DBMS will return the SQLSTATE error 2202F "data exception-array data, right truncation". If the maximum number of elements in the target array is greater than the number of elements in the source array, then assignment of each of the source element values to the target elements occurs and the size of the target for that row becomes the number of elements assigned. (Note, once again, that this doesn’t mean that the maximum size of the target for other assignments lessens.)

[Obscure Rule] There are two ways to assign a null value to an SQL-data target. Within SQL, you can use the <keyword> NULL in an INSERT or an UPDATE statement to indicate that the target should be set to NULL; that is, if your source is NULL, your DBMS will set your target to NULL. Outside of SQL, if your source has an indicator parameter that is set to -1, your DBMS will set your target to NULL (regardless of the value of the source). An indicator parameter with a value less than -1 will cause an error: your DBMS will return the SQLSTATE error 22010 "data exception-invalid indicator parameter value".

Comparison

SQL provides only two scalar comparison operators – = and <> – to perform operations on arrays. Both will be familiar; there are equivalent operators in other computer languages. Two arrays are comparable if their element <data type>s are mutually comparable. During comparison, the elements of the comparands are compared pairwise in element order – that is, the first element of the first array is compared to the first element of the second array, the second element of the first array is compared to the second element of the second array, and so on. The two arrays are equal if (a) they both have the same number of elements and (b) each pair of elements is equal. The arrays are not equal if (a) they do not have the same number of elements or (b) at least one pair of elements is not equal. If either comparand is NULL, or if at least one element is NULL and all non-null element pairs are equal, the result of the operation is UNKNOWN. For example:

ARRAY[1] <> ARRAY[2]

returns TRUE.

array_column = {array result is NULL}

returns UNKNOWN.

But:

ARRAY [1,NULL] = ARRAY[1]

returns FALSE because the number of elements is not equal.

Other Operations

With SQL, you have several other operations that you can perform on <collection type>s.

Scalar functions

SQL provides two scalar functions that return a <collection type>: the <array concatenation function> and the <cardinality expression>.

<array concatenation function>

The required syntax for an <array concatenation function> is as follows.

<array concatenation function> ::=
CONCATENATE(array_argument_1 { , | WITH } array_argument_2)

CONCATENATE operates on two operands, both of which must evaluate to an array. It joins the arrays together in the order given and returns an array value that consists of every element of array_argument_1 followed by every element of array_argument_2. If either operand is NULL, CONCATENATE returns NULL. For this function, you can use either a comma or the <keyword> WITH to separate your operands. Here are two equivalent examples of array concatenations:

CONCATENATE(array_column,ARRAY['element_1','element_2'])
CONCATENATE(array_column WITH ARRAY['element_1','element_2'])

[Obscure Rule] There are various details about the precise characteristics of the elements in the result array; if you’re interested, see “Rules of Aggregation”, in our description of the CASE expression in our chapter on simple search conditions.

If you want to restrict your code to Core SQL, don’t use CONCATENATE.

<cardinality expression>

The required syntax for a <cardinality expression> is as follows.

<cardinality expression> ::=
CARDINALITY (array_argument)

CARDINALITY operates on an argument that evaluates to an array. It counts the number of elements in the array and returns this as an exact numeric integer. If the argument is NULL, CARDINALITY returns NULL. Here is an example:

CARDINALITY(ARRAY[10,20,30,40])
--returns 4

[NON-PORTABLE] The precision of CARDINALITY’s result is non-standard because the SQL Standard requires implementors to define the result’s precision. [OCELOT Implementation] The OCELOT DBMS that comes with this book gives the result of CARDINALITY an INTEGER <data type>.

If you want to restrict your code to Core SQL, don’t use CARDINALITY.

Set Functions

SQL provides three set functions that operate on a <collection type>: COUNT and GROUPING. Since none of these operate exclusively with array arguments, we won’t discuss them here; look for them in our chapter on set functions.

Predicates

In addition to the comparison operators, SQL provides four other predicates that operate on arrays: the <null predicate>, the <exists predicate>, the <quantified predicate> and the <distinct predicate>. Each will return a boolean value: either TRUE, FALSE or UNKNOWN. None of these operate strictly on arrays, so we won’t discuss them here. Look for them in our chapters on search conditions.

Comprehensive Example

Now that we’ve described SQL’s <collection type>, let’s look at some example SQL statements that put it to use.

For our example, we’ve chosen a street address: a plausible choice for an array because its subdivisions (“lines”) are not meaningful – e.g.: we don’t require that the fourth line of a street address must contain a city name. Here is an SQL statement that creates a Table with three Columns, the third of which is an array:

CREATE TABLE Mailouts (
   given_name CHAR(5),
   surname CHAR(5),
   street_address CHAR(20) ARRAY[5]);

In this Table definition, ARRAY[5] indicates that the street_address Column is an array which occurs up to 5 times.

This SQL statement adds a row to the Mailouts Table:

INSERT INTO TABLE Mailouts (
  given_name,
  surname,
  street_address)
VALUES (
  'Jean',                               -- given_name
  'Boyer',                              -- surname
  ARRAY['line#1','line#2','line#3']);   -- street_address

In this INSERT statement, ARRAY['line#1','line#2','line#3'] is an <array value constructor> that specifies the values for the first three elements of the street_address Column array. The two possible remaining elements are not provided with values, so they aren’t constructed for this row.

An element of an ARRAY can be updated using an <element reference>. For example, this SQL statement would change the rows in the Mailouts Table by updating the second element of every street_address value:

UPDATE Mailouts SET
  street_address[2] = 'line#2 after update';

This example uses a <character string literal> to change the value of street_address’s second element. (Remember that the <data type> of the array – and therefore of each of the array’s elements – is CHAR(20), so any assignment of a compatible character string value would be accepted.) The result is that the second array element contains the string 'line#2 after update' and the other elements are unchanged. We could have achieved the same result by assigning an <array value constructor> to the street_address Column as a whole, as in this example:

UPDATE Mailouts SET
   street_address = ARRAY['line#1','line #2 after update','line#3'];

Note

If you assign a 2-element <array value constructor> to a 3-element array, the result is a 2-element array – not a 3-element array with the final element unchanged. That is, if the above example was:

UPDATE Mailouts SET
   street_address = ARRAY['line#1','line #2 after update'];

the result in the STREET_ADDRESS Column would be 'line#1','line #2 after update' rather than 'line#1','line #2 after update','line#3'.

Tip

To avoid “array element error”, ensure all occurrences of an array are fixed size – pad with NULLs if necessary.

Both of the above examples updated the value of an existing array element. It’s also possible to place a value into an element that hasn’t been constructed for the row yet. For example, this SQL statement:

UPDATE Mailouts SET
   street_address[5] = 'line#5';

has a two-fold effect: as the UPDATE explicitly requires, it creates element number five for the row, placing the string 'line#5' into it, and it creates element number four for the row, placing the null value into it (since the UPDATE statement doesn’t provide an explicit value for the fourth element and since the fifth element can’t exist unless the first four also exist).

Here is an example of a query on the Mailouts Table:

SELECT given_name, surname, street_address
FROM   Mailouts
WHERE  street_address <> ARRAY[];

In this SELECT statement, the street_address Column is being referred to as a whole in both the <select list> and the WHERE clause. We could also have used an <element reference> (e.g.: street_address[3]) in either clause, to refer to a specific element of the Column (this is true in most situations). Our sample query searches for all rows of the Mailouts Table where street_address is not an empty array; TRUE for the row we inserted earlier.

This SQL statement uses CONCATENATE to query the Mailouts Table:

SELECT CONCATENATE(street_address WITH ARRAY['line#4','line#5'])
FROM   Mailouts;

In this example, we’ve used CONCATENATE to join the elements of two arrays: one array-Column reference and one <array value constructor>. The resulting is an array containing five elements: three from the street_address Column and two from the <array value constructor> (in that order). Each element of the result is a CHAR string.

Illegal Operations

SQL places a number of restrictions on where you may validly use arrays. These are worth pointing out, since most of the language is orthogonal in this respect.

  • You may not use arrays in a JOIN Column list, as a grouping Column or in a Constraint definition (for UNIQUE or PRIMARY KEY or FOREIGN KEY Constraints – Constraints are illegal in most predicates and can’t contain REFs or ARRAYs.

  • Be very careful with <element reference>s: the fact that an array is defined as ARRAY[5] does not guarantee that array element [4] exists. Above all, it is a mistake to assume that an <element reference> is allowed wherever a <Column reference> is allowed – SQL’s rules aren’t there to help you break the rules of database design.