Chapter 32 – Searching with Set Operators

Note

You are reading a digital copy of SQL-99 Complete, Really, a book that documents the SQL-99 standard.

The book and the standard does not reflect the features of CrateDB, we are just publishing it as a service for the community and for reference purposes.

On the one hand, CrateDB does not implement the SQL-99 standard thoroughly, on the other hand, it extends the standard for implementing cluster features and others.

For more information specific to CrateDB, check out the CrateDB Reference documentation.

SQL supports three set operators: UNION, EXCEPT and INTERSECT. We described their general effects in our discussion of set theory (see our chapter on “General Concepts”). Here’s a quick recap, using two Tables: TABLE_1 contains these three, one-Column rows: {1,2,3} and TABLE_2 contains these three, one- Column rows: {1,3,5}.

The expression:

(Table_1) UNION (Table_2)

yields all non-duplicate rows from both TABLE_1 and TABLE_2, so the result is these four rows: {1,2,3,5}. The expression:

(Table_1) EXCEPT (Table_2)

yields all rows that are in TABLE_1 and are not also in TABLE_2, so the result is this row: {2}. The expression:

(Table_1) INTERSECT (Table_2)

yields all rows that are in TABLE_1 and are also in TABLE_2, so the result is these two rows: {1,3}

The result in all three cases is, of course, another Table: UNION and EXCEPT and INTERSECT take two Tables as input, and produce one result Table as output (that’s why some sources – particularly FIPS – refer to UNION and EXCEPT and INTERSECT as “Table operators” rather than “set operators”) – and so these constructs can also be used as a <Table reference> in an SQL statement.

Table of Contents

<query expression>

In this chapter, we’ll discuss both the implementation problems and the details for the three set operators. You’ll notice that the list of implementation-specific restrictions is quite long, which reflects the difficulty that DBMS vendors have supporting set operators. Before we begin our discussion though, we want to emphasize that each of the set operators can be used to form what the SQL Standard calls a <query expression> – that is, an expression that results in a Table. Here is the full syntax required for a <query expression>:

<query expression> ::=
[ WITH [ RECURSIVE ] <with list> ] <query expression body>

   <with list> ::=
   <with list element> [ {,<with list element> } ... ]

      <with list element> ::=
      <query name> [ (<Column name list>) ] AS (<query expression> )
         [ <search or cycle clause> ]

   <query expression body> ::=
   <non-join query expression> | <joined table>

      <non-join query expression> ::=
      <non-join query term> |
      <query expression> UNION [ ALL | DISTINCT ]
           [ <corresponding spec> ] <query term> |
        <query expression> EXCEPT [ ALL | DISTINCT ]
           [ <corresponding spec> ] <query term>

           <query term> ::=
           <non-join query term> |
           <joined table>

           <non-join query term> ::=
           <non-join query primary> |
           <query term> INTERSECT [ ALL | DISTINCT ]
              [ <corresponding spec> ] <query primary>

              <query primary> ::=
              <non-join query primary> |
              <joined table>

              <non-join query primary> ::=
              <simple table> |
              (<non-join query expression>)

                 <simple table> ::=
                 <query specification> |
                 VALUES (<row value constructor>s) |
                 TABLE <Table name>

           <corresponding spec> ::=
           CORRESPONDING [ BY (<Column name list>) ]

              <Column name list> ::=
              <Column name> [ {,<Column name>} ...]

           <search or cycle clause> ::=
           <search clause> |
           <cycle clause> |
           <search clause> <cycle clause>

           <search clause> ::=
           SEARCH <recursive search order> SET <Column name>

              <recursive search order> ::=
              DEPTH FIRST BY <sort specification list> |
              BREADTH FIRST BY <sort specification list>

           <cycle clause> ::=
           CYCLE <cycle column list>

              SET <Column name> TO <cycle mark value>
              DEFAULT <non-cycle mark value> USING <Column name>

              <cycle column list> ::=
              <Column name> [ {,<Column name>}... ]

              <cycle mark value> ::= <value expression>

              <non-cycle mark value> ::= <value expression>

Although this syntax diagram looks extremely complicated, don;t be too discouraged! Most SQL queries are fairly simple SELECT statements, and those are mostly the types of examples we’ll show you. And, in this chapter, we’re just going to concentrate on the set operators anyway. However, a brief description of some of the terms in the diagram is warranted.

  • UNION and EXCEPT may appear in a <non-join query expression> and INTERSECT may appear in a <non-join query term>. In none of the cases are the input Columns involved allowed to have a BLOB, CLOB or NCLOB <data type>.

  • A <query term> is either a <joined table> (see our chapter on “Searching with Joins”) or a <non-join query term>.

  • A <non-join query term> is either a <non-join query primary> or a <query term> that uses INTERSECT.

  • You use a <query primary> to specify the second Table operand of an INTERSECT operation. A <query primary> is either a <joined table>, a parenthesized <non-join query expression>, a SELECT statement (i.e., SELECTFROMWHERE …), a Table constructor, or TABLE <Table name> (which evaluates to SELECT * FROM <Table name>). If you want to restrict your code to Core SQL, don’t use TABLE <Table name>.

  • A Table constructor is the <keyword> VALUES followed by a comma- delimited, parenthesized list of rows or an expression that evaluates to a Table. Here’s an example of the first case:

    ... VALUES (10,'hello',B'1011'),
               (20,'goodbye',B'1111'),
               (30,'bobby',B'0000') ...
    

    is a Table constructor that constructs a three-Column Table with three rows. (Each value inside the VALUES clause may be any expression that evaluates to a single value.) Here’s another example, this time of the second case:

    ... VALUES (SELECT column_1, column_5 FROM Table_1 WHERE column_1=10) ...
    

    is a Table constructor that constructs a two-Column Table.

If you want to restrict your code to Core SQL, don’t use a Table constructor that involves a Table expression, don’t use a Table constructor anywhere but in an INSERT statement and make all your Table constructors contain exactly one row.

Set Operation Syntax

The required syntax for the simplest form of a set operation is:

<query expression> {UNION | EXCEPT | INTERSECT} <query expression>

Here’s an example of each:

SELECT boss_name, salary FROM Bosses
UNION
SELECT employee_name, salary FROM Employees;

SELECT boss_name, salary FROM Bosses
EXCEPT
SELECT employee_name, salary FROM Employees;

SELECT boss_name, salary FROM Bosses
INTERSECT
SELECT employee_name, salary FROM Employees;

Despite the complicated <query expression> syntax we showed you earlier, assume that <query expression> means “SELECT …” or “VALUES …” or “TABLE <Table name>” — we want deliberately to avoid some complications of the official syntax description. For now, it is enough to say that there are slight restrictions on the type of <query expression> that can be an argument for UNION or EXCEPT and slightly different restrictions for INTERSECT. These restrictions are obscure and have little or no practical effect.

In the previous examples, the first SELECT statement retrieves two Columns (BOSS_NAME, SALARY) and so does the second SELECT statement (EMPLOYEE_NAME, SALARY). This reflects a mandatory rule — each input Table must have the same number of Columns and each corresponding pair of Columns must have comparable <data type>s. This is because each pair of Columns will merge into a single Column of the result Table — that is, BOSS_NAME and EMPLOYEE_NAME will merge into one result Column and so will SALARY and SALARY. Usually, your DBMS decides which Column to merge with which Column by looking at ordinal position; it merges the first Columns from each input Table, then the second Columns, and so on. Merging by ordinal position is the simplest method, but there are alternative ways, which we’ll discuss later. First, though, we’ll add a bit more to our basic set operation syntax.

ALL | DISTINCT

The required syntax for a slightly more complicated form of a set operation is:

<query expression>
{UNION | EXCEPT | INTERSECT}
[ ALL | DISTINCT ]
<query expression>

Although the default for all three set operations is duplicate elimination, you can instruct your DBMS otherwise. The optional <keyword> ALL after UNION, EXCEPT, or INTERSECT means “keep all duplicates,” while the optional <keyword> DISTINCT means “eliminate duplicates.” In SQL-92, the <keyword> DISTINCT was not valid syntax, but it has always been the default — that is, if you do not explicitly say ALL, then the DBMS assumes that you want to eliminate duplicates. Assume that TABLE_1 contains these five, one-Column rows: {0,1,2,2,3} and TABLE_2 contains these five, one-Column rows: {1,2,3,5,5}.

The expressions:

(Table_1) UNION (Table_2)
(Table_1) UNION DISTINCT (Table_2)

both yield all non-duplicate rows from TABLE_1 and TABLE_2: {0,1,2,3,5}. The expression:

(Table_1) UNION ALL (Table_2)

yields all rows from TABLE_1 and TABLE_2: {0,1,2,2,2,3,3,5,5}. The expressions:

(Table_1) EXCEPT (Table_2)
(Table_1) EXCEPT DISTINCT (Table_2)

both yield all non-duplicate rows that are in TABLE_1 and are not also in TABLE_2: {0}. The expression:

(Table_1) EXCEPT ALL (Table_2)

yields all rows that are in TABLE_1 and are not also in TABLE_2: {0,2}. (For EXCEPT ALL, if a row appears x times in the first Table and y times in the second Table, it will appear z times in the result Table (where z is x - y or zero, whichever is greater). The expressions:

(Table_1) INTERSECT (Table_2)
(Table_1) INTERSECT DISTINCT (Table_2)

both yield all non-duplicate rows that are in TABLE_1 and are also in TABLE_2: {1,2,3} The expression:

(Table_1) INTERSECT ALL (Table_2)

yields all rows that are in TABLE_1 and are also in TABLE_2: {1,2,3}. (For INTERSECT ALL, if a row appears x times in the first Table and y times in the second Table, it will appear z times in the result Table (where z is the lesser of x and y).

That is, when you use an expression like “SELECT … <set operator> SELECT …,” the effect is the same as if you’d used “SELECT DISTINCT … <set operator> SELECT DISTINCT …” When you use an expression like “SELECT … <set operator> ALL SELECT …,” then any duplicates are paired off against each other, one by one.

Caution

The [ALL | DISTINCT] option also appears at the beginning of a SELECT expression (i.e., SELECT [ALL|DISTINCT] <select list> FROM <Table-reference> …). But in that case, the default is ALL rather than DISTINCT. By contrast, after a set operator, the default is DISTINCT. There’s an inconsistency here — maybe the original assumption was that when you SELECT you don’t need DISTINCT, but when you UNION you do need DISTINCT.

Probably UNION ALL is a faster operation than UNION DISTINCT, but for theoretical reasons you shouldn’t go out of your way to preserve duplicate rows. Use the ALL option only for those cases where duplicates don’t matter, where duplicates won’t happen anyway or where the result of UNION ALL is only going to be used for summary-information purposes.

For the purposes of set operation, two NULLs are “duplicates” of each other. Now let’s expand our set operation syntax a little more.

CORRESPONDING

The required syntax for the final form of a set operation is:

<query expression>
{UNION | EXCEPT | INTERSECT}
[ ALL | DISTINCT ]
[ CORRESPONDING [ BY (<Column name list>) ] ]
<query expression>

In a well-designed database, when Tables have similar Columns, the Columns have similar names. For example, suppose two Tables, VILLAS and MANSIONS, are defined with these CREATE TABLE statements:

CREATE TABLE Villas (
   county char(20),
   acreage DECIMAL(4,2),
   price DECIMAL(8));

CREATE TABLE Mansions (
   owner char(20),
   acreage DECIMAL(4,2),
   house_rating INT,
   price DECIMAL(8));

Notice that two <Column name>s – ACREAGE and PRICE – are in both Tables, although the Columns they represent don’t occupy the same ordinal positions. Now, if we want a list of properties which are both villas and mansions, we can use this simple intersection:

SELECT acreage, price FROM Villas
INTERSECT
SELECT acreage, price FROM Mansions;

But such a query is lots of work. To formulate it, we must look at the definitions for each Table, figure out which Columns they have in common and then list those Columns, in the right order, in our query. It would be far simpler to be able to say: “select Columns with the same names”. The optional CORRESPONDING clause for a set operation does just that. Here’s an example that is equivalent to the last one:

SELECT * FROM Villas
INTERSECT CORRESPONDING
SELECT * FROM Mansions;

The <keyword> CORRESPONDING, used alone after a set operator, means “instead of going by Column position as in the ‘simple’ format, merge all those Columns which have the same name, regardless of their position”. In this case, the Columns which have the same name in both Tables are ACREAGE and PRICE, so those are the Columns that will appear in the result Table.

Now suppose that we only wanted the ACREAGE Column in our result – we’re not interested in the PRICE at all. Since CORRESPONDING, by itself, automatically merges every pair of Columns with the same name, we can’t use it to form a quick query for the answer we want. Frankly, the easy solution is to return to the simple format and say:

SELECT acreage FROM Mansions
INTERSECT
SELECT acreage FROM Villas;

But let’s pretend that we have some reason to avoid using a SELECT in this situation. There are alternatives, one of which is the expression “TABLE <Table-name>”, so let’s consider this expression:

TABLE Mansions
INTERSECT CORRESPONDING
TABLE Villas;

which gives us two Columns – ACREAGE and PRICE – and now let’s consider this expression:

TABLE Mansions
INTERSECT CORRESPONDING BY (acreage)
TABLE Villas;

which gives us one Column – ACREAGE. It’s a contrived example, but if we assume that the query is not SELECT, then “INTERSECT CORRESPONDING BY ...” is the simplest way to use a set operator and merge specific same-name Columns.

The rules for this operation are:

  1. If CORRESPONDING BY is used with a set operator, at least one <Column name> must be in the comma-delimited and parenthesized BY list and each of the <Column name>s in the list must identify a Column that appears in both input Tables. The number of Columns in the result Table will equal the number of Columns in the BY list.

  2. If CORRESPONDING is used with a set operator, you may not specify a Column list. The number of Columns in the result Table will equal the number of Columns that both Tables have in common.

  3. If you omit the CORRESPONDING clause entirely, the number of Columns in the result Table will equal the number of Columns that you explicitly specify in your query.

To sum it up, there are three ways to pick which Columns to merge with a set operation: (a) merge “all” Columns by ordinal position (the simple format), (b) merge “all Columns with the same name” (the CORRESPONDING format) and (c) merge “specified Columns with the same name (the CORRESPONDING BY format).

Result Names and ORDER BY

When you’re doing a set operation, if the <Column name>s of two merged Columns are the same, then the merged result Column has that name too. If the <Column name>s of two merged Columns are not the same, then, by default, the merged Column is given some unique temporary name by your DBMS. Why would anyone care what the name of a merged Column is? Well, if you want the rows in the result Table to come out in some specific order, you’ll have to use an ORDER BY clause — which can only operate on the result Table and that means the names of the result Table’s Columns must be known. An ORDER BY clause may optionally appear after a <query expression>; here is the required syntax:

<query expression> [ ORDER BY <sort specification list> ]

   <sort specification list> ::=
   <sort specification> [ {,<sort specification> }. . . ]

      <sort specification> ::=
      <sort key> [ COLLATE <Collation name> ] [ {ASC | DESC} ]

         <sort key> ::= scalar_expression

The ORDER BY clause provides your DBMS with a list of items to sort and the order in which to sort them; either ascending order (ASC, the default) or descending order (DESC). It must follow a <query expression> — this means it operates on the final Table that results from the evaluation of the query; it cannot operate on any interim result Tables at all. The <sort key> you specify can be any expression that evaluates to a single value that identifies a Column of the final result Table — this is almost always a <Column name> and thus our concern to know what name will result from a set operation.

Suppose, then, you want to specify a sort order for the result of our earlier examples:

SELECT boss_name, salary FROM Bosses
UNION
SELECT employee_name, salary FROM Employees;

In this case, it isn’t possible to add an ORDER BY clause that specifies “ORDER BY boss_name” or “ORDER BY employee_name” at the end of the query – these names identify the original Columns, not the merged result Column. The name of the merged result Column is defined by the Standard to be implementation-dependent: that is, it will be a name provided by your DBMS, but what name will be provided doesn’t even have to be documented; it’s considered to be an internal thing.

In SQL-92, you can solve this dilemma with an ORDER BY clause that specifies “ORDER BY 1” – this instructs the DBMS to sort by the first result Column, no name required. This, however, is not legal in SQL3, so you’ll have to force your own names for the result Columns by providing an AS clause in each select list for corresponding Columns and then using that name in the ORDER BY clause. Here’s an example:

SELECT boss_name AS sort_name, salary FROM Bosses
UNION
SELECT employee_name AS sort_name, salary FROM Employees
ORDER BY sort_name;

With this syntax, you’re forcing the corresponding Columns in each input Table to have the same name. This name is thus the name given to the merged result Column and can therefore be used in an ORDER by clause to identify that result Column.

[Obscure Rule] The Collation of a merged Column with a character string <data type> will depend on the coercibility characteristics of the two input Columns, and whether you add the optional COLLATE specification or not. If you want to restrict your code to Core SQL, don’t add the COLLATE clause.

Result <data type>s and Compatibility

Consider this SQL statement:

``SELECT 'A', 5.0 FROM Table_1
UNION
SELECT 'BB', 12 FROM Table_2;``

The two queries that make up the UNION operator’s Table operands are union compatible: each corresponding pair of Columns is comparable (that is, we can compare 'A' with 'BB' and we can compare 5.0 with 12). The rules about the resultant merged Columns are:

  1. The first result Column has a CHAR(2) <data type> (see Rules of Aggregation for the CASE expression, in our chapter on “Simple Search Conditions”), belongs to the same Character set that 'A' does (the result always belongs to the Character set of the first input Column), and has COERCIBLE coercibility.

  2. The second result Column has a DECIMAL or NUMERIC <data type>, with implementor-defined precision and a scale of 2 (again, see “Rules of Aggregation” for the CASE expression).

Now consider this SQL statement:

SELECT 'A', 5.0 FROM Table_1
UNION
SELECT 12, 'BB' FROM Table_;

This statement will fail: it’s an error to try to merge Columns which are not union compatible.

Set Operation Examples

Earlier in this chapter, we defined two example Tables: VILLAS and MANSIONS. Let’s suppose that they have these contents:

VILLAS

COUNTY

ACREAGE

PRICE

Lacombe

39.00

100000

Stettler

15.78

900000

Roseland

15.77

200000

Victoria

17.90

NULL

Victoria

17.90

NULL

MANSIONS

OWNER

ACREAGE

HOUSE_RATING

PRICE

Bodnar

15.77

NULL

200000

Melnyk

39.00

15

900000

Skoreyko

39.00

0

900000

Mudriy

NULL

NULL

NULL

Based on this data, here are some example of SQL statements that use set operators, and their results.

Example Statement:

   -- "self-union"
SELECT * FROM Villas
UNION DISTINCT
SELECT * FROM Villas
ORDER BY acreage DESC;

COUNTY

ACREAGE

PRICE

Lacombe

39.00

100000

Victoria

17.90

NULL

Stettler

15.78

900000

Roseland

15.77

200000

Example Statement:

   -- "NULL is the same as NULL"
SELECT price FROM Villas
EXCEPT ALL
SELECT price FROM Mansions;

Result Table:

PRICE

100000

NULL

Example Statement:

   -- "NULL is the same as NULL"
SELECT DISTINCT price FROM Villas
EXCEPT ALL
SELECT DISTINCT price FROM Mansions;

Result table:

PRICE

100000

NULL

Example Statement:

SELECT * FROM Villas
INTERSECT CORRESPONDING
SELECT * FROM Mansions;

Result table:

ACREAGE

PRICE

15.77

200000

Example Statement:

SELECT acreage, 'Villa' AS type
FROM Villas
UNION
SELECT acreage, 'Mansion' AS type
FROM Mansions;

Result table:

ACREAGE

TYPE

39.00

'Villa'

15.78

'Villa'

15.77

'Villa'

17.90

'Villa'

15.77

'Mansion'

39.00

'Mansion'

NULL

'Mansion'

Updatability

Like joined Tables, Tables merged with a set operator aren’t updatable in SQL-92. Usually, this means that a View which is based on a query containing a set operator can’t be the object of a DELETE, INSERT or UPDATE statement.

But such Views might be updatable in SQL3. For example, if the set operator is UNION ALL, and the original Tables are both updatable, then so is the result Table. This feature causes some complications, since the DBMS has to track the original Tables. Most DBMSs support set operations by creating new temporary Tables, therefore a data-change operation would only affect the result Table and not the original Table.

Consider a situation where you want to change the rows that result from a UNION of two Tables. There are three possible situations:

  1. You want to delete a row. In this case, if the row you want to DELETE was derived from the first input Table, then the DELETE operation will remove that row from the first input Table. If the row you want to DELETE was derived from the second input Table, then the DELETE operation will remove that row from the second input Table.

  2. You want to update a row. Once again, if the row you want to UPDATE was derived from the first input Table, then the UPDATE operation will change that row in the first input Table. If the row you want to UPDATE was derived from the second input Table, then the UPDATE operation will change that row in the second input Table.

  3. You want to insert a row. This isn’t allowed, because your DBMS wouldn’t know which underlying Table the new row should be put into.

Now consider a situation where you want to change the rows that result from an EXCEPT operation of two Tables. There are two possible situations:

  1. You want to update a row. Since every row of the result is derived from a row of the first input Table, the UPDATE operation will change that row in the first input Table.

  2. You want to delete a row or insert a row. Neither of these are allowed. In the first case, if you delete a row and that row was derived from a row that is duplicated in the first input Table, the row you “deleted” would still be in the result, causing confusion. In the second case, if you insert a row that also happens to be duplicated in the second input Table, the row you “inserted” wouldn’t be part of the result, causing confusion.

Finally, consider a situation where you want to change the rows that result from an INTERSECT operation of two Tables. There are three possible situations:

  1. You want to insert a row. If the first input Table doesn’t contain a row with the same values, that row is inserted into the first input Table and if the second input Table doesn’t contain a row with the same values, that row is inserted into the second input Table. However, if the first input Table already contains a row with the same values as the row you want to insert – but the result Table doesn’t have a row derived from that row of the first input Table, then the new row is inserted into the first input Table. The same thing holds for the second input Table too: if the second input Table already contains a row with the same values as the row you want to insert – but the result Table doesn’t have a row derived from that row of the second input Table, then the new row is inserted into the second input Table.

  2. You want to update a row. If the row you want to UPDATE was derived from the first input Table, then the UPDATE operation will change that row in the first input Table. If the row you want to UPDATE was derived from the second input Table, then the UPDATE operation will change that row in the second input Table.

  3. You want to delete a row. This isn’t allowed, because your DBMS wouldn’t know which underlying Table the new row should be put into.

Recursive Unions

[Obscure Rule] applies for this entire section.

Suppose you make widgets, which are made of doohickeys and framistats, and framistats in turn are made of a certain number of screws, nails and so on. You get a request: list all parts – that is, list all widgets, all doohickeys, all framistats, all screws and all nails. This is a “bill of materials” problem. It’s not a burning issue, but it’s hard to solve with SQL-92, where the only way you can generate a list of parts is to use several queries to scan the PARTS Table – whenever your DBMS finds a part that’s made up of other parts, it has to start another query to determine its components, and so on. SQL3 supports a set operation – the RECURSIVE UNION – to solve this problem: it follows a trail of rows for you. Let’s go back to the full syntax required for a <query expression> that we showed you at the beginning of this chapter. The parts we haven’t talked about yet are used to form a RECURSIVE UNION. Here’s the relevant syntax:

<query expression> ::=
[ WITH [RECURSIVE] <with list> ] <query expression body>

   <with list> ::=
   <with list element> [ {,<with list element> } . . . ]

      <with list element> ::=
      <query name> [ (<Column name list>) ] AS (<query expression>)
         [ <search or cycle clause> ]

      <query expression body> ::=
      <non-join query expression> | <joined table>

      <search or cycle clause> ::=
      <search clause> | <cycle clause> | <search clause> <cycle clause>

         <search clause> ::=
         SEARCH <recursive search order> SET <Column name>

            <recursive search order> ::=
            DEPTH FIRST BY <sort specification list> |
            BREADTH FIRST BY <sort specification list>

         <cycle clause> ::=
         CYCLE <cycle column list>
           SET <Column name> TO <cycle mark value>
           DEFAULT <non-cycle mark value> USING <Column name>

            <cycle column list> ::=
            <Column name> [ {,<Column name>} . . . ]

            <cycle mark value> ::= <value expression>

            <non-cycle mark value> ::= <value expression>

The optional WITH clause for a <query expression> is made up of one or more elements, each of which is a named query that optionally includes names for the Columns that result from that query. (If you include the optional <Column name> list, you must provide a name for every result Column.) The definition of a WITH element may also include the optional <search or cycle clause>.

A WITH clause preceding a <query expression> provides a list of related element queries for a search. WITH by itself identifies a non-recursive set of element queries – this means that an element in the list may not contain the <query name> of an element that follows it in the list. WITH RECURSIVE, on the other hand, identifies a set of element queries that are potentially recursive – so an element can contain the <query name> of a following element.

If a WITH clause’s elements cycle at least once – that is, if element one uses element two and element two uses element three and element three cycles back to use element one – and if at least one element is a <non-join query expression> formed with UNION, then the WITH clause is RECURSIVE: a potentially recursive WITH clause’s element queries each depend on the query which follows them in the list – that is, the first element in the list is a query that contains the <query name> of the second element in the list (and so depends on that query), the second element is a query that contains the <query name> of the third element, and so on and a WITH RECURSIVE clause’s element queries each depend on another element in a way that causes a least one cycle of each element query to be executed. A linearly recursive WITH clause is a clause where each element query has at most one reference to another query in the list.

The optional <Column name> subclause within the WITH clause allows you to specify names for the Columns returned by the query. (Note that the element query may not use an expression which would result in a result Column with a NO COLLATION coercibility attribute.) If any two Columns returned by a WITH element query have the same name, or if the WITH query is potentially recursive, you must use the <Column name> subclause to uniquely name each Column of the result Table.

For each element query in a WITH clause, the element is a recursive query if it depends on another element query. (Such queries may not include set operators unless the second operand includes the name of the element query this element depends on, nor may they invoke routines that affect the element query this element depends on, nor may they include table subqueries that affect the element query this element depends on, nor may they include OUTER JOIN specifications that affect a <Table reference> that names the element query this element depends on, nor may they include a <query specification> that names the element query this element depends on.) Each element query accumulates rows for the final result. An element’s <query name> is the name used in an iteration of the entire <query expression> to identify the parents of any row being accumulated into the result. If each accumulated row has only one parent (that is, it results from a query that uses only one other query in the WITH list), the query is a linear RECURSIVE UNION. If accumulated rows have more than one parent (that is, they result from a query that uses more than one other query in the WITH list), the query is a nonlinear RECURSIVE UNION.

An expandable element query is a linearly recursive element query whose query contains UNION or UNION ALL. If a WITH clause element query is not expandable, the element definition may not include a <search or cycle clause>.

A WITH element query defined with a <search clause> can specify a search order of either “SEARCH DEPTH FIRST BY” or “SEARCH BREADTH FIRST BY” followed, in both cases, by a <sort key> and either ASC or DESC (the syntax for the <sort specification list> is the same as the syntax we showed you earlier for the ORDER BY clause’s <sort specification list>). SEARCH DEPTH FIRST means “travel down the tree structure first” and SEARCH BREADTH FIRST means “travel across the tree structure first”. The Column named in the SET subclause that follows must be a Column with an INTEGER <data type>; your DBMS will set that Column to a value that indicates when a row accumulated into the result was found.

A WITH element query defined with a <cycle clause> helps your DBMS determine whether a row that has already been accumulated is found a second time. This rarely happens in real life and so it probably means an error when it happens for a query. By adding a <cycle clause> to a WITH element definition, you can avoid getting into infinite loops by setting a limit on how long you want the search to go on. The <cycle column list> names the Columns affected by the <cycle clause>. The SET <column name> and the USING <column name> may not identify any of the Columns returned by the query, nor may they be the same Column. The <cycle mark value> and the <non-cycle mark value> must both be <character string literal>s that are one character long. They may not be the same value.

Dialects

Here are just some of the many restrictions that some DBMSs had placed on set operations in the past:

  • IBM DB2: the <data type>s of corresponding Columns must be exactly the same and be defined with exactly the same size.

  • IBM SQL/DS: neither of the input <query expression>s may contain DISTINCT.

  • Various: neither the input <query expression>s nor the result Table may be used with set functions, with GROUP BY or with HAVING.

  • dBASE IV, SQL Server: all set operators are unsupported, period.

Those restrictions are history now. The ones which you are much more likely to encounter are the ones that are allowed for Entry-Level SQL-92:

  • No derived Column list is allowed, which effectively means that you can’t use <literal>s or expressions in queries that use set operators.

  • Set operators are illegal inside subqueries, in INSERT ... SELECT and in View definitions.

  • EXCEPT, INTERSECT and CORRESPONDING are illegal.

Even if your DBMS supports SQL3, it might only be Core SQL. If you want to restrict your code to Core SQL, don’t use WITH [RECURSIVE], the INTERSECT set operator, a CORRESPONDING clause with any set operator or any set operator with an explicit DISTINCT.