A subquery is a parenthesized query enclosed within some outer SQL
statement. Most queries are `SELECT`

s, so this means that a subquery usually
takes the form (`SELECT ...`

), nested somewhere inside an expression.
Queries return result sets, or Tables, and the values in such Tables can be
used when the syntax of the outer expression calls for a value of the
appropriate <data type>.

Subqueries make an SQL statement look structured, and indeed it was the
presence of subqueries that gave the original SQL its distinctive look (the
letters SQL used to stand for “Structured Query Language”). Nowadays subqueries
are less essential because there other ways (particularly joins and
`UNION/EXCEPT/INTERSECT`

operators) to reach the same ends. Nevertheless,
they are important because they provide these benefits:

- SQL statements with subqueries are readable. Most people, especially if they are familiar with the role of subordinate clauses in English, can figure that a subquery-containing SQL statement can be read “from the inside out” – that is, they can focus on the subquery’s workings first, and then on the separate analysis of the outer statement. Statements with joins, by contrast, must be read all at once.
- Certain types of problems can be stated more concisely, and more efficiently, with subqueries.

Table of Contents

A subquery is used to specify either a value (the scalar subquery, which returns one value), a row (the row subquery, which returns one row), or a Table (the Table subquery, which returns a result Table). The required syntax for a subquery is:

```
<subquery> ::=
( <query experession> )
```

That is, a subquery is a parenthesized <query experession> – and the <query
expression> is usually a `SELECT`

statement. Here’s an example:

```
SELECT /* outer statement begins */
Table_1.column_1,
Table_1.column_2
FROM Table_1 WHERE Table_1.column_1 =
(SELECT * /* subquery begins */
FROM Table_2
WHERE Table_2.column_1 = 5);
```

There are some restrictions regarding where subqueries can be used and what
they may contain. In early SQL days, the restrictions were quite strict. For
instance, the subquery had to be on the right side of a comparison operator
within a `WHERE`

clause, as in the above example. Nowadays, the restrictions
for an SQL subquery, in a fully-conformant DBMS environment, are mild:

There can be no

`ORDER BY`

clause inside the subquery.The subquery’s select list may not include any reference to a value that evaluates to a

`BLOB`

,`CLOB`

,`NCLOB`

or`ARRAY`

.A subquery may not be immediately enclosed within a set function, for example:

... AVG((SELECT column_1 FROM Table_1)) ...

is not legal syntax.

It is legal to nest subqueries within subqueries. The maximum level of nesting depends on your implementation, since the Standard doesn’t specify the number of levels of nesting that must be supported.

We said earlier that there are three types of subquery. They are distinguished from each other, not by their form – the general form of a subquery is always just “(<query expression>)” – but by the shape of their result: how many Columns and rows do they return?

If a subquery can return exactly one Column and one row, it is a

*scalar subquery*. The subquery:... (SELECT MAX(Table_1.column_1) FROM Table_1) ...

would fit the bill, but usually it’s not so easy to simply glance at a query and realize that it will return only one Column and one row.

If a subquery can return more than one Column, but still exactly one row, it is a row subquery. A

*row subquery*is just a generalized derivation of a scalar subquery, used in some comparisons. Row subqueries are the least-frequently-seen type of subquery.If a subquery can return more than one Column and more than one row, it is a Table subquery. A

*Table subquery*is another type of <Table reference> and can be used in an SQL statement wherever a <Table name> can be used – which isn’t very often! However, there are some special search operators which are specifically designed to work with Table subqueries. Discussion of those search operators is an important part of this chapter.

Thus, the distinction between subquery types depends on the size of the select list, and the number of rows returned. The three types of subquery are not utterly separate: a scalar subquery’s definition is subsumed by a row subquery’s definition, which is subsumed by a Table subquery’s definition. (In fact, a one-Column Table subquery is such a common thing that some people regard it as a separate type, which they call a “Column subquery”.) Nevertheless, the potential distinction should be kept in mind. It’s easy to get confused by assuming that everything which applies for one type, applies for the others as well. We will discuss scalar subqueries, row subqueries and Table subqueries separately.

In general terms, a scalar subquery resembles a scalar function. Remember that a scalar function returns a single value, given an argument which is some sort of expression – well, a scalar subquery returns a single value, given an argument which is a <query expression>. Since that value must be a scalar value, a few aspects of scalar subqueries are simply logical consequences of their definitions:

The <data type>, Collation, length and all other significant attributes of the value are inherited from the attributes of the selected Column. For example, the result of the subquery:

... (SELECT 'abc' FROM Table_1) ...

has a <data type> of

`CHAR(3)`

.It is an error if, at runtime, the DBMS discovers that a scalar subquery returns more than one row. If this is the case, the entire SQL statement will fail: your DBMS will return the

`SQLSTATE error 21000 "cardinality violation"`

.

One thing that does *not* follow from the definition, and in fact surprises
some analysts, is the value returned when a scalar subquery returns zero rows.
In this case, the result is `NULL`

. For example, consider this SQL statement:

```
UPDATE Table_1 SET
column_1 = (SELECT column_1 FROM Table_2 WHERE 1 = 2);
```

Since 1 is never equal to 2, the search condition in this subquery example will
always be `FALSE`

– so the subquery result is an empty set. In this case,
the `UPDATE`

operation assigns a null value to `COLUMN_1`

in `TABLE_1`

.
This is a case where `NULL`

obviously does not mean “unknown” or “not
applicable” – it means “not there” – but no great harm results.

Here are some examples of scalar subqueries. You can tell that these must be scalar subqueries – not row subqueries or Table subqueries – because they are being used in places where only scalar values are legal.

```
-- scalar subquery in a select list
SELECT 'value is: ',
(SELECT column_1 FROM Table_1)
FROM Table_2;
-- scalar subquery in an UPDATE ... SET clause
UPDATE Table_1 SET
column_1 = (SELECT AVG(column_1) FROM Table_2);
-- scalar subquery in a comparison
SELECT column_3,column_1
FROM Table_1
WHERE (SELECT MAX(column_1) FROM Table_1) =
(SELECT MIN(column_1) FROM Table_2);
-- scalar subquery with arithmetic
INSERT INTO Table_1
VALUES (1 + (SELECT column_1 FROM Table_2));
```

Our third example of a scalar subquery shows its use in a `WHERE`

clause.
This is a traditional and common usage, as it used to be the only place a
subquery could appear. These general observations apply if you’re including a
scalar subquery in a comparison:

- The “comparison” operator is not limited to the traditional operators: it can
be pretty well any predicate, including
`LIKE`

or`BETWEEN`

. - The subquery can be on either side of the comparison operator. There can even be subqueries on both sides of the comparison operator.
- The subquery can be in an expression, along with arithmetic or scalar function operators.
- The subquery result will depend on the number of rows returned: one row is
normal and gives a known result, two or more rows results in a “cardinality
violation” error and zero rows results in the comparison predicate returning
`UNKNOWN`

, because the subquery result value is`NULL`

. We’ve repeated these general observations here because we want to emphasize that such observations apply only to scalar subqueries. None of them will apply when we discuss comparisons that involve Table subqueries.

Row subqueries are similar to scalar subqueries in the one fundamental respect:
they may not return more than one row. If a row subquery returns two or more
rows, the entire SQL statement will fail: your DBMS will return the ```
SQLSTATE
error 21000 "cardinality violation"
```

.

Row subqueries offer a slight convenience for comparison operations. For example, we can say:

```
SELECT *
FROM Table_1
WHERE (column_1,'X') = (SELECT column_1,column_2 FROM Table_2);
```

That is, we can compare two Column values using a single equals operator. This is more concise, and probably more efficient, than comparing twice with scalar subqueries, as in this equivalent example:

```
SELECT *
FROM Table_1
WHERE Table_1.column_1 = (SELECT Table_2.column_1 FROM Table_2) AND
'X' = (SELECT Table_2.column_2 FROM Table_2);
```

A row subquery, then, is a form of row value expression – an expression that evaluates to one row. This construct can be used with every SQL predicate, not just basic comparison.

We now switch from talking about “one-row” subqueries, to talking about “multi-row” subqueries, or Table subqueries. This means, almost exclusively, subqueries that are used in comparison operations.

Quite often, a Table subquery looks like a scalar subquery, because – almost
always – a single Column makes up the subquery’s select list. You can
distinguish a Table subquery from context, though: it will always be preceded
by a special for-Table-subqueries-only operator, either
<comparison_operator>`ALL`

, <comparison_operator>`ANY`

(or its synonym,
<comparison_operator>`SOME`

), `IN`

, `EXISTS`

, or `UNIQUE`

.

In our chapters on the various <data type>s, we showed you which of them could
be used in an expression that compared a value of that <data type> with the
collection of values returned by a Table subquery. Here’s a more detailed
explanation of quantified comparisons using the quantifiers `ALL`

, `SOME`

,
`ANY`

.

The required syntax for an `ALL`

quantified comparison is:

```
scalar_expression comparison_operator ALL
```

Here, `scalar_expression`

may be any expression that evaluates to a single
value and `comparison_operator`

may be any one of: = or > or < or >= or <= or
<>. `ALL`

returns `TRUE`

if the Table subquery returns zero rows or if the
comparison operator returns `TRUE`

for every row returned by the Table
subquery. `ALL`

returns `FALSE`

if the comparison operator returns
`FALSE`

for at least one row returned by the Table subquery. Suppose that
`TABLE_1`

has one Column, defined as `DECIMAL(6,2)`

. Here’s what will
happen, for different values in the rows of `TABLE_1`

, for this expression:

```
... WHERE 1000.00 > ALL
(SELECT column_1 FROM Table_1) ...
```

- If
`TABLE_1`

contains the values {`100.00, 200.00, 300.00`

}, the expression is`TRUE`

: all of`TABLE_1`

’s values are less than`1000.00`

. - If
`TABLE_1`

contains the values {`100.00, 2000.00, 300.00`

}, the expression is`FALSE`

: one of`TABLE_1`

’s values is greater than`1000.00`

. - If
`TABLE_1`

contains the values {`1000.00, 200.00, 300.00`

}, the expression is`FALSE`

too: one of`TABLE_1`

’s values is equal to`1000.00`

. - If
`TABLE_1`

contains no values, the expression is`TRUE`

: when the set is empty,`ALL`

is`TRUE`

. - If
`TABLE_1`

contains the values {`100.00, NULL, 300.00`

}, the expression is`UNKNOWN`

: when`NULL`

s are involved,`ALL`

is`UNKNOWN`

.

The required syntax for an `ANY`

or `SOME`

quantified
comparison is:

```
scalar_expression comparison_operator ANY <Table subquery> |
scalar_expression comparison_operator SOME <Table subquery>
```

Once again, `scalar_expression`

may be any expression that evaluates to a
single value and `comparison_operator`

may be any one of: = or > or < or >=
or <= or <>. `SOME`

and `ANY`

are synonyms. They return `TRUE`

if the
comparison operator returns `TRUE`

for at least one row returned by the Table
subquery. They return `FALSE`

if the Table subquery returns zero rows or if
the comparison operator returns `FALSE`

for every row returned by the Table
subquery. Suppose, once again, that `TABLE_1`

has one Column, defined as
`DECIMAL(6,2)`

. Here’s what will happen, for different values in the rows of
`TABLE_1`

, for this expression:

```
... WHERE 1000.00 > ANY
(SELECT column_1 FROM Table_1) ...
```

- If
`TABLE_1`

contains the values {`100.00, 200.00, 300.00`

}, the expression is`TRUE`

: all of`TABLE_1`

’s values are less than`1000.00`

. - If
`TABLE_1`

contains the values {`100.00, 2000.00, 300.00`

}, the expression is`TRUE`

too: at least some of`TABLE_1`

’s values are less than`1000.00`

. - If
`TABLE_1`

contains no values, the expression is`FALSE`

: when the set is empty,`ANY`

is`FALSE`

. - If
`TABLE_1`

contains the values {`1000.00, 2000.00, 3000.00`

}, the expression is`FALSE`

too: all of`TABLE_1`

’s values are greater than or equal to`1000.00`

. - If
`TABLE_1`

contains the values {`100.00, NULL, 300.00`

}, the expression is`UNKNOWN`

: when`NULL`

s are involved,`ANY`

is`UNKNOWN`

.

There is a small trap when one considers the expression “`... 1000.00 <> ANY`

(<subquery>) `...`

”. Such an expression could be carelessly read as “… 1000
is not equal to any subquery result …” – that is, 1000 is not equal to every
row returned by the subquery. That is not the correct way to read such
expressions because it leads to the English-language ambiguity that “any” can
mean “every”. There are three ways to avoid such ambiguities:

Don’t use the

`ANY`

<keyword>. Use its synonym,`SOME`

, instead.When negatives are involved, replace

`ANY`

with`ALL`

. Logic tells us that these transformations are possible:expression <> ANY subquery --> expression = ALL subquery NOT (expression = ANY subquery) --> expression <> ALL subquery

as long as the subquery does not return an empty set.

Use a completely different syntax, for example, the

`EXISTS`

predicate, which we’ll discuss later in this chapter.

Here’s some examples of comparisons with subqueries, using the sample database we defined in our chapter on “Simple Search Conditions”. To find the records for the employees reporting to a manager named D. Black:

```
SELECT *
FROM Employee
WHERE dept = (SELECT dept FROM Department WHERE manager='BLACK D');
```

The result is:

`EMPNUM` |
`DEPT` |
`SURNAME` |
`GNAME` |
`ADDRESS` |

4 | D | MORGAN | CHUCK | 963 SOUTH |

Because there is only one possible row in the `DEPARTMENT`

Table with a
manager named `BLACK D`

, the equals operator can be used in this subquery
without a quantifier. If more than one value is possible, either `ALL`

,
`ANY`

or `SOME`

is needed to quantify the comparison operator, as in this
example:

```
SELECT manager
FROM Department
WHERE dept = ANY
(SELECT dept FROM Employee WHERE empnum<3);
```

The result is:

`MANAGER` |

SMITH A |

JONES B |

In addition to quantified comparisons, SQL provides four other predicates that
operate on the results of a Table subquery: the <in predicate>, the <exists
predicate>, the <unique predicate>, and the <match predicate>. Each will return
a boolean value: either `TRUE`

, `FALSE`

, or `UNKNOWN`

.

The required syntax for an <in predicate> is as follows.

```
<in predicate> ::=
row_expression [ NOT ] IN <in predicate value>
<in predicate value> ::=
<Table subquery> |
(row_expression [ {,row_expression}. . . ])
```

An <in predicate> compares a value to a collection of values and returns either
`TRUE`

, `FALSE`

, or `UNKNOWN`

. (If any argument is `NULL`

, the <in
predicate> returns `UNKNOWN`

.) `IN`

searches for data that matches any one
of a specified collection of values. `NOT IN`

searches for data that doesn’t
match any of the values in the collection. Note the words we’ve used in this
explanation; `IN`

can be used to replace the quantified comparison “=`ANY`

” — that is, these two expressions are equivalent:

```
... WHERE 'A' IN (SELECT column_1 FROM Table_1) ...
... WHERE 'A' = ANY (SELECT column_1 FROM Table_1) ...
```

This example shows one of the two variants of `IN`

. The other variant is:

```
IN (<list of values>)
```

This syntax has nothing to do with subqueries, but while we’re on the subject, these two expressions are also equivalent:

```
... WHERE column_1 IN ( 'A', 'B', 'C') ...
... WHERE column_1 = 'A' OR column_1 = 'B' OR column_1 = 'C' ...
```

With `IN`

, all the expressions must be comparable. You can use `[NOT] IN`

to compare one or more values to a collection; `row_expression`

may be any
expression which evaluates either to a single value or to a row of values.

The <in predicate> is `TRUE`

if the value of `row_expression`

is found
within the <in predicate value>. For example, these <in predicate>s are both
`TRUE`

:

```
1 IN (1,2,3)
1 IN (SELECT column_1 FROM Table_1 where column_1 = 1)
```

`NOT IN`

is simply the negation of `IN`

so these predicates are also
`TRUE`

:

```
1 NOT IN (5,6,7)
1 NOT IN (SELECT column_1 FROM Table_1 where column_1 = 15)
```

In other words, the <in predicate> has two variants, both of which are merely
shorthands for some other comparison syntax, so the same rules that apply for
comparisons of specific <data type>s apply to `IN`

as well. In both cases,
`IN`

appears to be more popular than the wordings it replaces.

If you want to restrict your code to Core SQL, make sure that all expressions
in an <in value list> are either <literal>s, references to host variables or
SQL parameters, <Field reference>s, or `CURRENT_PATH`

, `CURRENT_ROLE`

,
`CURRENT_USER`

, `SESSION_USER`

, `SYSTEM_USER`

, `USER`

.

Here’s some examples of the <in predicate> using the sample database we defined in Chapter 29 “Simple Search Conditions.” To find the addresses of the employees working in either department C or department D (retrieve values which match any of a list of specified values):

```
SELECT dept,empnum,address
FROM Employee
WHERE dept IN ('C','D');
```

The result is:

`DEPT` |
`EMPNUM` |
`ADDRESS` |

C | 3 | 567 NORTH |

D | 4 | 963 SOUTH |

To find the employee numbers of the employees whose pay rate is not 5.00, 9.00, or 16.00 (retrieve values which do not match any of a specified list):

```
SELECT empnum
FROM Payroll
WHERE rate NOT IN (5,9,16);
```

The result is:

`EMPNUM` |

1 |

4 |

To find the names of employees earning a rate of 8.00 (retrieve values which match any returned by a subquery):

```
SELECT surname
FROM Employee
WHERE empnum IN
(SELECT empnum
FROM Payroll
WHERE rate=8);
```

The result is:

`SURNAME` |

MORGAN |

In this example, the subquery is first evaluated to find the payroll records
with a rate of 8.00. The employee numbers of the result are then compared to
the employee numbers of the `EMPLOYEE`

Table to retrieve the surnames for the
final result.

To find the employees located in the warehouse:

```
SELECT surname FROM Employee WHERE Empnum IN
(SELECT empnum FROM Payroll WHERE location='WAREHOUSE');
```

The result is:

`SURNAME` |

JONES |

FRANCIS |

To find the names of the employees who report to A Smith:

```
SELECT gname,surname FROM Employee WHERE dept IN
(SELECT dept FROM Department WHERE manager='SMITH A');
```

The result is:

`GNAME` |
`SURNAME` |

SARA | KOO |

ALICE | SMITH |

`IN`

expressions can be nested. To find the manager of employees working
in the warehouse:

```
SELECT manager FROM Department WHERE dept IN
(SELECT dept FROM Employee WHERE empnum IN
(SELECT empnum FROM Payroll WHERE location='WAREHOUSE'));
```

The result is:

`MANAGER` |

BROWN C |

GREEN E |

To find the names of employees who don’t work on the 10th floor (retrieve values which don’t match any returned by a subquery):

```
SELECT gname,surname FROM Employees WHERE empnum IN
(SELECT empnum FROM Payroll WHERE location<>'10TH FLOOR');
```

The result is:

`GNAME` |
`SURNAME` |

JOHN | MARSH |

MABEL | JONES |

CHUCK | MORGAN |

ALICE | SMITH |

BOB | JONES |

CHRIS | FRANCIS |

LINDA | TURNER |

The required syntax for an <exists predicate> is as follows.

```
<exists predicate> ::=
[ NOT ] EXISTS <Table subquery>
```

An <exists predicate> is a test for a non-empty set and returns either `TRUE`

or `FALSE`

. `EXISTS`

is `TRUE`

if the Table subquery returns at least one
row; otherwise it is `FALSE`

. `NOT EXISTS`

is `TRUE`

if the Table
subquery returns zero rows; otherwise it is `FALSE`

.

By tradition, the <Table subquery> following an <exists predicate> begins with
`SELECT *`

. In this case, the asterisk is not a shorthand for a list of
Columns, it merely stands for “some Column.” Unless you’re using Core SQL, it
doesn’t actually matter what you put here — but in Core SQL, the Table
subquery’s select list must either be just an asterisk or it must evaluate to a
single derived Column. Whatever you use, the result is the same; if the
subquery returns any rows, `EXISTS`

is `TRUE`

— regardless of the number
of Columns that the subquery result contains. Here’s an example:

```
SELECT column_1
FROM Table_1
WHERE EXISTS (SELECT * FROM Table_2);
```

If there are any rows at all in `TABLE_2`

, then the search condition is
`TRUE`

and all values of `TABLE_1.COLUMN_1`

will be selected. This is a
rare example of a rather static subquery. Much more often, the Table subquery
will be a correlated subquery.

If you use `EXISTS`

, followed by a correlated subquery with a single outer
reference, you are accomplishing the same thing as you would accomplish with a
quantified `ANY`

comparison. For example, here are two equivalent SQL
statements:

```
-- query using "> ANY" and a simple subquery
SELECT *
FROM Table_1
WHERE column_1 > ANY
(SELECT column_1)
FROM Table _2
WHERE Table_2.column_2 = 5);
-- the same query, using EXISTS and a correlated subquery
SELECT *
FROM Table_1
WHERE EXISTS
(SELECT *
FROM Table_2
WHERE Table_2.column_2 = 5 AND
Table_1.column_1 > Table_2.column_1);
```

The `EXISTS`

version of the request is more cumbersome, but it’s a better
choice if there are several items to compare between the outer statement and
the subquery.

`NOT EXISTS`

is merely the negation of `EXISTS`

, but it deserves attention
as a separate operator. The interesting case is a double-nested `NOT EXISTS`

predicate, which can solve questions of the general form — find all the A’s
which are related to all of the B’s. Logicians call these FORALL questions. An
example of a FORALL question is, “List the students who are in a class for
every course that the school offers.” Expressed with a double-nested ```
NOT
EXISTS
```

predicate, this question is answered with:

```
SELECT student_name
FROM Students
WHERE NOT EXISTS
(SELECT *
FROM Courses
WHERE NOT EXISTS
(SELECT *
FROM Classes
WHERE Classes.course_id = Courses.course_id AND
Classes.student_id = Students.student_id));
```

In other words, “Look for the case where there exists no student where there exists no course where there exists a class for the course containing the student.” We can’t say that quickly three times, but it doesn’t matter; you can use this example as a template for FORALL questions.

Here’s some examples of the <exists predicate> using the sample database we defined in our Chapter on “Simple Search Conditions.” To find the names of employees earning a rate of 8.00 (retrieve a value only if some condition exists):

```
SELECT surname FROM Employee WHERE EXISTS
(SELECT * FROM Payroll WHERE rate=8 AND empnum=Employee.Empnum);
```

The result is:

`SURNAME` |

MORGAN |

To find the employees located in the warehouse:

```
SELECT surname FROM Employee WHERE EXISTS
(SELECT * FROM Payroll
WHERE location='WAREHOUSE' AND empnum=Employee.empnum);
```

The result is:

`SURNAME` |

JONES |

FRANCIS |

`EXISTS`

expressions can be nested. To find the managers of employees
working in the warehouse:

```
SELECT MANAGER FROM Department WHERE EXISTS
(SELECT * FROM Employee WHERE dept=Department.dept AND EXISTS
(SELECT * FROM Payroll
WHERE location='WAREHOUSE' AND empnum=Employee.empnum));
```

The result is:

`MANAGER` |

BROWN C |

GREEN E |

The required syntax for a <unique predicate> is as follows.

```
<unique predicate> ::=
[ NOT ] UNIQUE <Table subquery>
```

A <unique predicate> is a test for duplicate rows and returns either `TRUE`

or `FALSE`

. `UNIQUE`

returns `TRUE`

if the Table subquery returns zero
rows or one row or if every row returned by the Table subquery is unique (that
is, every row contains a different set of non-null values than every other
row); otherwise it returns `FALSE`

. `NOT UNIQUE`

returns `TRUE`

if any
row returned by the Table subquery is an exact duplicate of another row;
otherwise it returns `FALSE`

. The <unique predicate> shouldn’t be used if the
Table subquery returns Columns with a <data type> of `BLOB`

, `CLOB`

,
`NCLOB`

, or `ARRAY`

.

For an example of how `UNIQUE`

works, suppose that `TABLE_1`

has one
Column, defined as `DECIMAL(6,2)`

. Here’s what will happen, for different
values in the rows of `TABLE_1`

, for this expression:

```
... WHERE UNIQUE (SELECT column_1 FROM Table_1) ...
```

- If
`TABLE_1`

contains the values {`100.00, 200.00, 300.00`

}, the expression is`TRUE`

; all of`TABLE_1`

’s rows are different. - If
`TABLE_1`

contains only {`100.00`

}, the expression is`TRUE`

too; all of`TABLE_1`

’s rows are different because there is only one. - If
`TABLE_1`

contains no values, the expression is also`TRUE`

; when the set is empty,`UNIQUE`

is`TRUE`

. - If
`TABLE_1`

contains the values {`100.00, 2000.00, 100.00`

}, the expression is`FALSE`

; at least two of`TABLE_1`

’s rows are the same. - If
`TABLE_1`

contains the values {`100.00, NULL, 300.00`

}, the expression is`TRUE`

; when`NULL`

s are involved,`UNIQUE`

ignores them and looks at the remaining values. In this case, all of`TABLE_1`

’s remaining rows are different. - If
`TABLE_1`

contains the values {`100.00, NULL, 100.00`

}, the expression is`FALSE`

; after eliminating`NULL`

s, at least two of`TABLE_1`

’s remaining rows are the same.

The <unique predicate> was introduced to SQL with SQL-92 but is still not much
used in application programs. Its main purpose is to expose the operation which
the DBMS uses to handle `UNIQUE`

Constraints. For example, in the following
pair of SQL statements, the first is a `UNIQUE`

Constraint definition and the
second is a statement that the DBMS implicitly uses to enforce the Constraint:

```
-- UNIQUE Constraint definition
ALTER TABLE Table_1 ADD CONSTRAINT unique_constraint
UNIQUE (column_1);
-- implicitly-used enforcement statement
SELECT * FROM Table_1
WHERE UNIQUE (SELECT column_1 FROM Table_1);
```

If you want to restrict your code to Core SQL, don’t use the <unique predicate>.

The required syntax for a <match predicate> is as follows.

```
<match predicate> ::=
row_expression MATCH [ UNIQUE ] [ SIMPLE | PARTIAL | FULL ]
<Table subquery>
```

A <match predicate> is a test for matching rows and returns either `TRUE`

or
`FALSE`

. The `row_expression`

can be any expression which evaluates to one
row of values; this row must be comparable to the rows returned by the Table
subquery — that is, `row_expression`

and Table subquery must return rows
with the same number of values and each pair of corresponding values must have
comparable <data type>s. The <match predicate> can’t be used if either
`row_expression`

or the Table subquery returns values with a <data type> of
`BLOB`

, `CLOB`

, `NCLOB`

, or `ARRAY`

.

The <match predicate> has eight possible forms:

`row_expression MATCH`

Table subquery`row_expression MATCH SIMPLE`

Table subquery`row_expression MATCH UNIQUE`

Table subquery`row_expression MATCH UNIQUE SIMPLE`

Table subquery`row_expression MATCH PARTIAL`

Table subquery`row_expression MATCH UNIQUE PARTIAL`

Table subquery`row_expression MATCH FULL`

Table subquery`row_expression MATCH UNIQUE FULL`

Table subquery

The first and second forms are equivalent, as are the third and fourth forms;
if none of {`SIMPLE | PARTIAL | UNIQUE`

} is specified, the default is
`SIMPLE`

.

The expression “`row_expression MATCH SIMPLE`

Table subquery” is `TRUE`

only in two cases:

- It is
`TRUE`

if the result of`row_expression`

contains at least one`NULL`

— e.g., if`row_expression`

evaluates to {`100, NULL, 300`

}. - It is
`TRUE`

if the result of`row_expression`

contains no`NULL`

s and the Table subquery returns at least one row that is equal to`row_expression`

.

The expression “`row_expression MATCH UNIQUE SIMPLE`

Table subquery” is
`TRUE`

only in two cases:

- It is
`TRUE`

if the result of`row_expression`

contains at least one`NULL`

. - It is
`TRUE`

if the result of`row_expression`

contains no`NULL`

s and the Table subquery returns exactly one row that is equal to`row_expression`

.

The expression “`row_expression MATCH PARTIAL`

Table subquery” is `TRUE`

only in two cases:

- It is
`TRUE`

if the result of`row_expression`

contains only`NULL`

s — e.g., if`row_expression`

evaluates to, {`NULL, NULL, NULL`

}. - It is
`TRUE`

if the Table subquery returns at least one row whose values equal their corresponding non-null values in`row_expression`

— e.g., if`row_expression`

evaluates to {`100, NULL, 300`

}, at least one row returned by the subquery must be {`100`

, <any value at all>,`300`

}.

The expression “`row_expression MATCH UNIQUE PARTIAL`

Table subquery” is
`TRUE`

only in two cases:

- It is
`TRUE`

if the result of`row_expression`

contains only`NULL`

s. - It is
`TRUE`

if the Table subquery returns exactly one row whose values equal their corresponding non-null values in`row_expression`

.

The expression “`row_expression MATCH FULL`

Table subquery” is `TRUE`

only in two cases:

- It is
`TRUE`

if the result of`row_expression`

contains only`NULL`

s. - It is
`TRUE`

if the result of`row_expression`

contains no`NULL`

s and the Table subquery returns at least one row that is equal to`row_expression`

.

The expression “`row_expression MATCH UNIQUE FULL`

Table subquery” is
`TRUE`

only in two cases:

- It is
`TRUE`

if the result of`row_expression`

contains only`NULL`

s. - It is
`TRUE`

if the result of`row_expression`

contains no`NULL`

s and the Table subquery returns exactly one row that is equal to`row_expression`

.

Assume a Table called `TABLE_1`

, defined with three Columns and containing
this data:

TABLE_1 | ||
---|---|---|

`COLUMN_1` |
`COLUMN_2` |
`COLUMN_3` |

10 | 10 | 10 |

10 | 10 | 10 |

10 | NULL | 10 |

10 | 10 | 20 |

Using `TABLE_1`

, here are some examples of <match predicate> expressions,
all of which are `TRUE`

:

```
... WHERE ROW(10,NULL,10) MATCH SIMPLE
(SELECT * FROM Table_1) ...
... WHERE ROW(10,10,10) MATCH SIMPLE
(SELECT * FROM Table_1) ...
... WHERE ROW(10,NULL,10) MATCH UNIQUE SIMPLE
(SELECT * FROM Table_1) ...
... WHERE ROW(NULL,NULL,NULL) MATCH PARTIAL
(SELECT * FROM Table_1) ...
... WHERE ROW(10,NULL,10) MATCH PARTIAL
(SELECT * FROM Table_1) ...
... WHERE ROW(NULL,NULL,NULL) MATCH UNIQUE PARTIAL
(SELECT * FROM Table_1) ...
... WHERE ROW(NULL,NULL,NULL) MATCH FULL
(SELECT * FROM Table_1) ...
... WHERE ROW(10,10,10) MATCH FULL
(SELECT * FROM Table_1) ...
```

Still using `TABLE_1`

, here are some examples of <match predicate>
expressions, all of which are `FALSE`

:

```
... WHERE ROW(10,10,10) MATCH UNIQUE SIMPLE
(SELECT * FROM Table_1) ...
... WHERE ROW(10,NULL,10) MATCH UNIQUE PARTIAL
(SELECT * FROM Table_1) ...
... WHERE ROW(10,NULL,10) MATCH FULL
(SELECT * FROM Table_1) ...
... WHERE ROW(10,10,10) MATCH UNIQUE FULL
(SELECT * FROM Table_1) ...
```

If you want to restrict your code to Core SQL, don’t use the <match predicate>.

The required syntax for a <quantified predicate> is as follows.

```
<quantified predicate> ::=
FOR ALL <Table reference> list (<search condition>) |
FOR ANY <Table reference> list (<search condition>) |
FOR SOME <Table reference> list (<search condition>)
<Table reference> list ::=
<Table reference> [ {,<Table reference>} ... ]
```

A <quantified predicate> is a three-valued comparison test; it takes the
Cartesian product of “<Table reference> list,” compares the resulting rows with
the search condition specified and returns either `TRUE`

, `FALSE`

, or
`UNKNOWN`

. The <Table reference> list may contain one or more <Table name>s
or expressions that evaluate to Tables. The <quantified predicate>s work the
same way that the regular quantifiers do — that is:

`FOR ALL`

returns `TRUE`

if the search condition is `TRUE`

for every
row of the result of “<Table reference> list,” returns TRUE if the result of
“<Table reference> list” is zero rows, returns `FALSE`

if the search
condition is `FALSE`

for at least one row of the result of “<Table
reference> list,” and otherwise returns `UNKNOWN`

.

`FOR ANY`

returns `TRUE`

if the search condition is `TRUE`

for at least
one row of the result of “<Table reference> list,” returns `FALSE`

if the
search condition is `FALSE`

for every row of the result of “<Table
reference> list,” returns `FALSE`

if the result of “<Table reference> list”
is zero rows and otherwise returns `UNKNOWN`

. (As usual, `FOR SOME`

is a
synonym for `FOR ANY`

.)

Subqueries are good for answering many complex analytical questions, but they
can be hard to understand. As a good example, consider the “double ```
NOT
EXISTS
```

” method, which handles FORALL questions in an ugly manner. The SQL3
Standard tries to resolve this by introducing `FOR ALL`

and `FOR ANY`

—
two new predicates, which work only with Tables. Syntactically, the quantified
predicates don’t have to involve subqueries. But there are times when they
could be replacements for subqueries (particularly the confusing `NOT EXISTS`

syntax). Here are some simple examples:

```
... FOR ALL
(SELECT * FROM Employee AS Emps) (empnum>50)
```

evaluates to `FALSE`

if there are no employees with employee numbers greater
than 50.

```
... FOR ANY
(SELECT * FROM Payroll AS Pay) (location='BASEMENT')
```

evaluates to `TRUE`

if at least one employee works in the basement.

If you want to restrict your code to Core SQL, don’t use the <quantified predicate>.

Often, SQL statements containing subqueries can be re-formulated as statements containing joins or vice versa. The choice of which to use normally depends on taste or optimization considerations. There are some scenarios, though, which call for subqueries rather than joins:

When you want duplicates, but not false duplicates. Suppose

`Table_1`

has three rows — {`1,1,2`

} — and`Table_2`

has two rows — {`1,2,2`

}. If you need to list the rows in`Table_1`

which are also in`Table_2`

, only this subquery-based`SELECT`

statement will give the right answer (`1,1,2`

):SELECT Table_1.column_1 FROM Table_1 WHERE Table_1.column_1 IN (SELECT Table_2.column_1 FROM Table_2);

This SQL statement won’t work:

SELECT Table_1.column_1 FROM Table_1.Table_2 WHERE Table_1.column_1 = Table_2.column_1;

because the result will be {

`1,1,2,2`

} — and the duplication of 2 is an error. This SQL statement won’t work either:SELECT DISTINCT Table_1.column_1 FROM Table_1.Table_2 WHERE Table_1.column_1 = Table_2.column_1;

because the result will be {

`1,2`

} — and the removal of the duplicated 1 is an error too.When the outermost statement is not a query. The SQL statement:

UPDATE Table_1 SET column_1 = (SELECT column_1 FROM Table_2);

can’t be expressed using a join unless some rare SQL3 features are used.

When the join is over an expression. The SQL statement:

SELECT * FROM Table_1 WHERE column_1 + 5 = (SELECT MAX(column_1) FROM Table_2);

is hard to express with a join. In fact, the only way we can think of is this SQL statement:

SELECT Table_1.* FROM Table_1, (SELECT MAX(column_1) AS max_column_1 FROM Table_2) AS Table_2 WHERE Table_1.column_1 + 5 = Table_2.max_column_1;

which still involves a parenthesized query, so nothing is gained from the transformation.

When you want to see the exception. For example, suppose the question is, “What books are longer than

*Das Kapital*?” These two queries are effectively*almost*the same:SELECT DISTINCT Bookcolumn_1.* FROM Books AS Bookcolumn_1 JOIN Books AS Bookcolumn_2 USING(page_count) WHERE title = 'Das Kapital'; SELECT DISTINCT Bookcolumn_1.* FROM Books AS Bookcolumn_1 WHERE Bookcolumn_1.page_count > (SELECT DISTINCT page_count FROM Books AS Bookcolumn_2 WHERE title = 'Das Kapital');

The difference between these two SQL statements is, if there are two editions of

*Das Kapital*(with different page counts), then the self-join example will return the books which are longer than the shortest edition of*Das Kapital*. That might be the wrong answer because the original question didn’t ask for “… longer than`ANY`

book named*Das Kapital*” (it seems to contain a false assumption that there’s only one edition).

The following examples are derived, with much editing, from an SQL application suite for a public library. They illustrate the most common situations where subqueries have proven to be useful tools.

When a query is on one Table, but requires a quick look at a Column in another Table — for example, “Show the number of books checked out for a particular patron.” Here’s a subquery that answers this:

```
SELECT COUNT(*) FROM Circulations WHERE patron_id = ANY
(SELECT patron_id FROM Patrons WHERE Surname = 'Jones');
```

When detail and summary data are retrieved or compared in the same SQL statement — for example, “Set a report item to show the average cost of books.” Here’s a subquery that does this:

```
UPDATE Reports SET
average_book_cost = (SELECT AVG(book_cost) FROM Books);
```

When a selection involves some calculation or complexity that has nothing to do with the contents of what is ultimately selected — for example, “Who has taken out books from either branch 9 or branch 10.” Here’s a subquery that answers this:

```
SELECT DISTINCT patron_id FROM Circulations WHERE book_copy_id = ANY
(SELECT book_copy_id FROM Book_Copies
WHERE branch = 9 OR branch = 10);
```

Make sure your scalar subqueries return a maximum of one row. You can help by
*(a)* using `DISTINCT`

to remove duplicates, *(b)* selecting a value returned
by a set-function, and *(c)* including a primary key among the selected
Columns.

If your DBMS won’t support row subqueries, you can still avoid repeating the same subquery twice. For example, replace a construct like this:

```
SELECT * FROM Table_1 WHERE smallint_column_1 =
(SELECT smallint_column_1 FROM Table_2) AND smallint_column_2 = ALL
(SELECT smallint_column_2 FROM Table_2);
```

with a construct like this:

```
SELECT * FROM Table_1
WHERE smallint_column_1 * 100000 + smallint_column_2 = ALL
(SELECT smallint_column_1 * 100000 + smallint_column_2
FROM Table_2);
```

Put the fastest subquery at the deepest level. Because most DBMSs will process non-correlated subqueries by travelling from innermost statements to outermost statements, you’re (to some extent) controlling the order of evaluation by choosing which expression will be inner and which will be outer.

Use scalar subqueries rather than Table subqueries. Because scalar subqueries are more restrictive, you are giving your DBMS more information when you use a construct that requires a scalar subquery. The DBMS might pass that extra information to its optimizer.

Consider alternatives to correlated subqueries. There are some SQL statements that work better (or look more natural) if you use joins, set functions, or Table operators.

SQL-89 was very restrictive about subqueries. SQL-92 removed most of the restrictions and SQL3 changed very little. So most of the information in this chapter will apply for most of the DBMSs in use today. But it doesn’t hurt to know what some of the historical restrictions are. Cautious programmers will avoid these situations:

- Row subqueries.
- Subqueries nested to a depth greater than 16 levels.
- Statements with more than 15 <Table reference>s (this is a maximum in the FIPS specification).
- Subqueries that contain
`UNION`

,`EXCEPT`

, or`INTERSECT`

. - Scalar subqueries that contain
`DISTINCT`

,`GROUP BY`

, or`HAVING`

. - Subqueries which do not appear on the right side of a comparison operator
within a
`WHERE`

clause. This last restriction is the most significant because it used to be the case that “`WHERE`

expression = (`SELECT`

…)” was legal syntax, but almost nothing else was.