Chapter 35 – Changing SQL-data

Note

You are reading a digital copy of SQL-99 Complete, Really, a book that documents the SQL-99 standard.

The book and the standard does not reflect the features of CrateDB, we are just publishing it as a service for the community and for reference purposes.

On the one hand, CrateDB does not implement the SQL-99 standard thoroughly, on the other hand, it extends the standard for implementing cluster features and others.

For more information specific to CrateDB, check out the CrateDB Reference documentation.

The SQL-data change statements are INSERT, UPDATE and DELETE. Many people call these the SQL “update” statements – which is okay, provided you don’t mix up “update” (lower case, meaning INSERT or UPDATE or DELETE) with UPDATE (upper case, meaning only UPDATE). We prefer the term “data change statements” because it’s unambiguous.

In this chapter, we’ll discuss changes to SQL-data in all aspects, including hidden changes, changes of Views, the syntax of INSERT, UPDATE and DELETE, access (Privilege) rules and the new SQL3 feature: changes to joined Tables. But we won’t talk about four important and relevant matters because they rate chapters of their own:

Multiple data changes and transactions, discussed on our chapter on “SQL Transactions”.

Data changes and Constraints, discussed on our chapter on “SQL Constraints and Assertions”.

Data changes and Triggers, discussed on our chapter on “SQL Triggers”.

Data changes through Cursors, discussed on our chapter on “Embedded SQL Binding Style” – in particular, data changes with “positioned” UPDATE/DELETE.

After reading this chapter, you’ll know everything that there is to know about the SQL data-change statements taken in isolation. But we give you fair warning: we’ll revisit some of these points when we put them in context later on.

Table of Contents

The SQL-data Change Statements

In SQL, there are fundamentally only three conceivable data-change operations that can be performed at the row level: the addition of a new row, the editing of an existing row and the removal of an existing row. So the only SQL statements that you’ll ever need to change SQL-data are (respectively): INSERT, UPDATE and DELETE. Before getting into the details of each individual statement, we’ll note those features or restrictions which are common to two or more of them.

The “set at a Time” Rule

If multiple rows are involved in a data change operation, you must abandon “row at a time” thinking. For example, a WHERE clause is completely evaluated before a DELETE operation beings to remove rows, and an UPDATE operation is not affected by a change to the last row for “uniqueness” purposes.

The “Updatable Table” Rule

Although the object of a data-change statement can be a View (or a derived Table), ultimately the changes actually happen to one or more Base tables. Sometimes, though, an operation won’t make sense if applied to the appropriate Base table and so the SQL-data change statement won’t work.

The “Read Only” Rule

SQL provides a statement that is used to prevent any data changes from happening: if you execute SET TRANSACTION READ ONLY, then all INSERT, UPDATE and DELETE statements will fail: your DBMS will return the SQLSTATE error 25006 "invalid transaction state-read-only transaction."

The “Default Values” Rule

Remember that any <Column definition> or Domain definition can include the clause “DEFAULT value”. Your DBMS uses the value specified to provide a value for a Column that is implicitly the object of a data change operation. You can also use this value explicitly by specifying the <keyword> DEFAULT in an INSERT statement or an UPDATE statement – this has the effect of setting a Column to its default value. If you didn’t use a DEFAULT clause when you defined a Column or Domain, that’s okay – it still has a default value: NULL (provided, of course, that there isn’t also a NOT NULL Constraint on the Column or Domain).

The “Column Integrity” Rule

At the same time that you defined a Column, you declared what its <data type> is. The <data type> limits the values that the Column can contain. For example, if it’s a DATE, then you can’t insert the character string 'Hello, world' into it. This limitation – that a Column’s data must conform to its <data type> – is enforced on any assignment: it is an implicit Constraint on the Column. This implicit Constraint isn’t like the explicit Constraints we talked about in out chapter on constraints and assertions, though. The big difference between them is that Column integrity is checked when the SQL-data change statement starts, while explicit Constraints are checked when the SQL- data change statement ends. In other words, your DBMS vets your INSERTs and UPDATEs; it won’t let utter garbage through no matter how you insist.

The “No Data” Condition

Any SQL-data change statement can be a perfect success – and still affect zero rows. INSERT, UPDATE and DELETE all depend to some extent on “search conditions”, so it should be clear that sometimes there will be no rows to insert or update or delete. In that case, the SQL-data change statement hasn’t failed – it just hasn’t found anything that fulfills your requirements for a change. In such cases, your DBMS will return the SQLSTATE warning 02000 "no data." (If you’re a programmer who’ll want to know the exact number of rows that any SQL-data change statement affects, see our discussion of the SQLRowCount function in our chapter on SQL/CLI diagnostics.)

The “Data Change Object” Condition

For each SQL-data change statement, there is a syntax diagram that has “<Table reference>” in a prominent place. This somewhat anticipates the conclusion of the chapter, since a <Table reference> can include <Correlation name>s, parentheses and various Table operators. As usual, we suggest that, until you’ve read through the entire chapter, you should interpret <Table reference> as <Table name>.

The “Tentative” Condition

Finally, you’ll note that in the description of each SQL-data change statement, we use the word “tentatively” when telling you what it does. This is not a standard term – we use it merely to indicate that any new row is “on probation”: it’s not too late to take your move back until COMMIT time.

INSERT Statement

The INSERT statement’s job is to tentatively add new rows to a Table. If INSERT succeeds, there will be a number (zero or more) of new rows in your target Table. The required syntax for the INSERT statement is:

INSERT INTO <Table reference>
[ (<Column name> [ {,<Column name>} ... ]) ] <query expression> |
DEFAULT VALUES

The INSERT statement has two forms. The first evaluates a query to construct one or more rows to be inserted into a Table. The second constructs one row – containing default values for every Column – and inserts it into a Table. The <Table reference> identifies your target Table: the Table that you want INSERT to add rows to. The target Table must be updatable – that is, it must either be a Base table, or a View that is not a read-only Table. If <Table reference> does not include a <Schema name> qualifier, your target Table must belong to the SQL-session default Schema. If your target Table is a typed Table, your INSERT statement may include the optional <keyword> ONLY.

To execute INSERT, your current <AuthorizationID> needs the INSERT Privilege on every Column directly affected by the insert operation – and, if an object Column is a derived Column (as it is in the case of a View), your current <AuthorizationID> also needs the INSERT Privilege on every underlying Table that makes up your target Table.

INSERT Column List

In the first form of INSERT, you can optionally specify a parenthesized object Column list, wherein you provide your DBMS with the unqualified names of the Columns that are the targets for each value returned by the <query expression>. Each Column named must belong to the Table named in the INTO clause; <Column name>s may not be repeated in the list but you may list the Columns in any order; the number of Columns in the list must match the number of result Columns returned by the query. INSERT places data into the Columns of the target Table in the order that you list them in the INSERT Column clause: the first value specified is assigned to the first Column named, the second value is assigned to the second Column named, and so on. Any Columns of the target Table that are omitted from the INSERT Column clause are assigned their default value. For example, these SQL statements create a new Table and insert one row of data into it (the default value for a Column that is specifically defined with one is NULL):

CREATE TABLE Table_1 (
   column_1 INTEGER, column_2 CHARACTER(7), column_3 DATE);

INSERT INTO Table_1 (column_3, column_1)
VALUES (DATE '1996-12-31', 20);

If a Column in the Column list is a self-referencing Column, your INSERT statement must include the OVERRIDE SYSTEM GENERATED VALUES clause.

If you omit the Column list clause from an INSERT statement, your DBMS will assume you mean “all Columns in the Table, in their ordinal position order”. In this case, INSERT places data into the Columns of the target Table in the implied order listed: the first value specified is assigned to the first Column of the target Table, the second value is assigned to the second Column of the target Table, and so on. For example, these SQL statements create a new Table and insert one row of data into it:

CREATE TABLE Table_1 (
   column_1 INTEGER, column_2 CHARACTER(7), column_3 DATE);

INSERT INTO Table_1
VALUES (20, 'GOODBYE', DATE '1996-12-31');

<query expression>

The <query expression> that makes up the first form of INSERT is usually a SELECT statement, but it’s also common to see the Table constructor “VALUES (value commalist)” here (usually with only a single row). In this latter case, you can use the specifications DEFAULT (for insert default value), NULL (for insert null value), ARRAY??(??) or ARRAY[] (for insert an empty array) within the “value commalist” to specify a value for a Column. The order and number of the values must correspond to the order and number of the Columns in the INSERT Column list. If you’re using a <query expression>, the INSERT's <Table reference> may not identify a Table that also appears in any FROM clause in that <query expression>. The <query expression> provides zero or more rows of data values when it is evaluated: its search conditions are effectively evaluated for each row before any rows are added to the target Table. Here’s an example:

INSERT INTO Table_1 (column_1, column_3)
   SELECT column_1, column_3 FROM Table_2;

You can insert a null value into a Column either by placing the <keyword> NULL in the INSERT ... VALUES clause, or by omitting a Column that has no defined default value from an explicit <Column name> clause. For example, given this Table definition:

CREATE TABLE Table_1 (
   column_1 INTEGER, column_2 CHARACTER(7), column_3 DATE);

these INSERT statements, which insert one row of data into TABLE_1 using <literal>s and/or the <keyword> NULL, are equivalent:

INSERT INTO Table_1 (column_2, column_3)
VALUES ('GOODBYE', DATE '1996-12-31');

INSERT INTO Table_1 (column_1, column_2, column_3)
VALUES (NULL, 'GOODBYE', DATE '1996-12-31');

INSERT INTO Table_1
VALUES (NULL, 'GOODBYE', DATE '1996-12-31');

Tip

All three of these examples result in a new record in TABLE_1, with the specified values 'GOODBYE' in COLUMN_2 and '1996-12-31' in COLUMN_3, and the null value in COLUMN_1.

  • In the first example, the null value is implied since no explicit value is specified for COLUMN_1 (this, of course, assumes that there is no other default value for COLUMN_1). It is necessary to provide a <Column name> clause list of the Columns you want to assign the values to if your INSERT statement contains fewer source values than the target Table has Columns.

  • In the second and third examples, the null value is explicitly stated as the value to be assigned to TABLE_1.COLUMN_1 and therefore the <Column name> clause can be (but does not have to be) omitted. The explicit use of the <keyword> NULL over-rides the assignment of a default value to a Column that is omitted from a <Column name> clause. This, of course, assumes that the Column has not been defined with a NOT NULL Constraint.

You can insert a default value into a Column either by placing the <keyword> DEFAULT in the INSERT ... VALUES clause, or by omitting a Column that has a defined default value from an explicit <Column name> clause. For example, given this Table definition:

CREATE TABLE Table_1 (
   column_1 INTEGER DEFAULT 35,
   column_2 CHARACTER(7),
   column_3 DATE);

these INSERT statements, which insert one row of data into TABLE_1 using <literal>s and/or the <keyword> DEFAULT, are equivalent:

INSERT INTO Table_1 (column_2, column_3)
VALUES ('GOODBYE', DATE '1996-12-31');

INSERT INTO Table_1 (column_1, column_2, column_3)
VALUES (DEFAULT, 'GOODBYE', DATE '1996-12-31');

INSERT INTO Table_1
VALUES (DEFAULT, 'GOODBYE', DATE '1996-12-31');

Tip

All three of these examples result in a new record in TABLE_1, with the specified values 'GOODBYE' in COLUMN_2 and '1996-12-31' in COLUMN_3, and the default value 35 in COLUMN_1.

In the first example, the default value is implied since no explicit value is specified for COLUMN_1.

In the second and third examples, the default value is explicitly stated as the value to be assigned to TABLE_1.COLUMN_1 and therefore the <Column name> clause can be (but does not have to be) omitted.

DEFAULT VALUES

You can write your INSERT statement with the <keyword>s DEFAULT VALUES instead of with a <query expression> if every Column belonging to your target Table is to get its default value for a single row. These two INSERT statements are thus equivalent:

INSERT INTO Table_1 (column_1,column_2,column_3)
VALUES (DEFAULT,DEFAULT,DEFAULT);

INSERT INTO TABLE_1 DEFAULT VALUES;

INSERT will fail if attempts are made to insert:

  • a value which does not match a Column’s <data type>;

  • the null value into a Column defined with a NOT NULL Constraint;

  • a duplicate value into a Column defined with a UNIQUE Constraint;

  • a value not found in the corresponding PRIMARY KEY Column, into a Column defined as part of a FOREIGN KEY Constraint; or

  • a value which does not fall into a Column’s CHECK Constraint guidelines.

Tip

If the access mode of the current SQL transaction is read-only and the target Table is not a temporary Table, an INSERT statement will fail: your DBMS will return the SQLSTATE error 25000 "invalid transaction state."

INSERT Examples

Here are some examples of INSERT statements. First, let’s assume that two Tables, named AUTHORS_1 and AUTHORS_2, have these definitions:

CREATE TABLE Authors_1 (
   id INTEGER CONSTRAINT constraint_1 PRIMARY KEY,
   name VARCHAR(6) DEFAULT 'none');

CREATE TABLE Authors_2 (
   id INTEGER DEFAULT 12,
   name VARCHAR(6));

To add a single row to AUTHORS_1, we have three choices:

-- INSERT with no INSERT Column list
INSERT INTO Authors_1
VALUES (1,'Jonson');

-- INSERT with INSERT Column list
INSERT INTO Authors_1 (id,name)
VALUES (1,'Jonson');

-- INSERT with INSERT Column list, reversed
INSERT INTO Authors_1 (name,id)
VALUES ('Jonson',1);

To add several rows at a time to AUTHORS_1, we can do this:

INSERT INTO Authors_1 (id,name)
VALUES (2,'Smith'),
       (3,'Jones'),
       (4,'Martin'),
       (5,'Samuels'),
       (6,DEFAULT),
       (7,NULL);

The DEFAULT specification inserts the Column’s default value (or NULL, if no default was defined). The NULL specification inserts a null value. At this point, AUTHORS_1 looks like this:

AUTHORS_1

ID

NAME

1

Jonson

2

Smith

3

Jones

4

Martin

5

Samuels

6

none

7

NULL

To add all the rows in AUTHORS_1 to AUTHORS_2, we can do either of these:

INSERT INTO Authors_2
   SELECT id,name FROM Authors_1;

INSERT INTO Authors_2 (id,name)
   SELECT id,name FROM Authors_1;

INSERT INTO Authors_2 (name,id)
   SELECT name,id FROM Authors_1;

INSERT INTO Authors_2 (id,name)
   TABLE Authors_1;

To add an “all default values” row to AUTHORS_2, try:

INSERT INTO Authors_2
VALUES (DEFAULT, DEFAULT);

INSERT INTO Authors_2 (id,name)
VALUES (DEFAULT, DEFAULT);

INSERT INTO Authors_2 (name,id)
VALUES (DEFAULT, DEFAULT);

INSERT INTO Authors_2 DEFAULT VALUES;

At this point, AUTHORS_2 looks like this (the first 7 rows come from AUTHORS_1, the last row is the default row we just inserted):

AUTHORS_2

ID

NAME

1

Jonson

2

Smith

3

Jones

4

Martin

5

Samuels

6

none

7

NULL

12

NULL

INSERT Physics

Most DBMSs simply append newly inserted rows to the end of the target Table. But some (more sophisticated) DBMSs will put new rows wherever they can find space (there might be gaps left by earlier DELETE statements). And a few DBMSs will put new rows in order, according to the target Table’s primary key (this is called “clustering”; variants of clustering are practiced by DB2 and Oracle).

A general recommendation – which we often ignore – is that the optional INSERT Column list should always be explicitly stated. That way, if someday the target Table is altered and a Column is dropped, the INSERT will fail and you’ll be reminded that you have to change that particular operation.

If you want to restrict your code to Core SQL, don’t use a <query expression> with EXCEPT, INTERSECT or CORRESPONDING in an INSERT statement, don’t use a <query expression> that names an underlying Table of your target Table in an INSERT statement, don’t use the <query expression> TABLE <Table name> in an INSERT statement, if you use the <query expression VALUES (value commalist) in an INSERT statement, make sure it constructs only one new row, don’t use the DEFAULT VALUES form of the INSERT statement and make sure all your INSERT <Table reference>s are actually <Table name>s, with no <Correlation name>s or <derived Column list>s.

UPDATE Statement

The UPDATE statement’s job is to tentatively edit existing rows in a Table. If UPDATE succeeds, there will be a number (zero or more) of changed rows in your target Table. The required syntax for the UPDATE statement is:

UPDATE <Table reference> SET
<Column name>=scalar_expression [ {,<Column name>=scalar_expression} ... ] |
ROW=row_expression
[ WHERE <search condition> ]

UPDATE works by changing one or more values in zero or more rows of the target Table, Column by Column. The <Table reference> identifies your target Table: the Table that you want UPDATE to change rows of. The target Table must be updatable – that is, it must either be a Base table, or a View that is not a read-only Table. If <Table reference> does not include a <Schema name> qualifier, your target Table must belong to the SQL-session default Schema.

There are two forms of UPDATE: the first lets you specify multiple Column changes that don’t necessarily change every value in a row, the second lets you specify changes to every value in a row with a single SET clause. To execute UPDATE, your current <AuthorizationID> needs the UPDATE Privilege on every Column directly affected by the update operation – and, if an object Column is a derived Column (as it is in the case of a View), your current <AuthorizationID> also needs the UPDATE Privilege on every underlying Table that makes up your target Table – or, if your target Table is a <joined Table> made with LEFT OUTER JOIN, RIGHT OUTER JOIN, FULL OUTER JOIN or UNION JOIN, your current <AuthorizationID> also needs the INSERT Privilege on every underlying Table that makes up your target Table.

SET Column

In the first form of UPDATEUPDATE <Table> SET <Column>=<value> – you must specify at least one Column change expression. The Column named must, of course, belong to your target Table; you can change multiple Columns of the same Table (in any order), but you can’t change the same Column more than once in a single UPDATE statement. The scalar_expression you use to assign a new value to a Column may be any expression (except one that contains a set function) that evaluates to a single value that is assignable to that Column’s <data type>, including the specifications DEFAULT (for change to default value), NULL (for change to null value), ARRAY??(??) or ARRAY[] (for change to an empty array). Most often, “scalar_expression” will be a <literal>. For example, using the AUTHORS_1 and AUTHORS_2 Tables from our discussion of the INSERT statement, we could do this:

UPDATE Authors_1 SET
   name='Finch'
   WHERE id=7;

This UPDATE statement changes the last row of AUTHORS_1 from {7,NULL} to {7,'Finch'} – if you don’t specify that a Column should be changed, the DBMS leaves it alone; it keeps its original value.

SET ROW

In the second form of UPDATEUPDATE <Table> SET ROW=row_expression – you may specify only the one change expression: it will change the entire applicable row. The “row_expression” may be any expression that evaluates to a row of values, equal in number to, and <data type> compatible with, every Column in your target Table (usually, this is a <row value constructor, but it could also be a <row subquery>). For example, we could change a whole row with either of these two statements:

UPDATE Authors_1 SET
   ROW = ROW(9,'Allan')
WHERE id=7;

UPDATE Authors_1 SET
  id=9,
  name='Allan'
WHERE id=7;

As we’ve shown, you can add the optional WHERE clause in both forms of UPDATE, to define the rows you want to change. If you omit the WHERE clause, your DBMS will assume that the condition is TRUE for all rows (and so UPDATE every row of the target Table). The WHERE clause’s <search condition> can be any condition at all, as long as it doesn’t invoke an SQL-invoked routine that possibly modifies SQL-data. The WHERE clause’s search condition is effectively evaluated for each row of the target Table before any of the Table’s rows are changed; each <subquery> in a WHERE clause’s search condition is effectively executed for each row of the target Table and the results are then used to apply the search condition to the given row. Here’s some examples:

-- to change all names to upper case
UPDATE Authors_1 SET
   name = UPPER(name);

-- to change all names to lower case
UPDATE Authors_1 SET
  name = LOWER(name);

-- to change all names to 'Johnson'
UPDATE Authors_1 SET
  name = 'Johnson';

UPDATE Examples

To change the last row in AUTHORS_1 so that the ID Column is the last ID belonging to AUTHORS_2, and the NAME Column is NULL:

UPDATE Authors_1 SET
  id   = (SELECT MAX(id) FROM Authors_2),
  name = NULL
WHERE id = 9;

To add 1 to the ID Column and cast the ID value to a character string, putting the result in the NAME Column:

UPDATE Authors_1 SET
  id   = id + 1,
  name = CAST(id AS VARCHAR(6))
WHERE id =1;

This example is even sillier than it looks, since real people very rarely change the value in a primary key Column – but we put it here to bring up an important fact. Consider the result of this statement, which is the row: {2,'1'}. Why isn’t the result: {2,'2'}? Because the assignments in the SET clause happen all at once, rather than in a sequence – so the ID value in the CAST(id AS VARCHAR(6)) expression is the original value (1), rather than the value we’re changing it to (2). This “all at once” effect is typical of SQL. Because of it, Column value swaps are easy.

Let’s try to do some updates which will cause errors. The first UPDATE statement in this set of examples will result in a “constraint violation” error, the next two will result in a “syntax” error:

UPDATE Authors_1 SET
  id = NULL;
-- fails because ID is a primary key and must thus be non-null

UPDATE Authors_1 SET
  name = DATE '1994-01-03';
-- fails because a date can't be assigned to a character string Column

UPDATE Authors_1 SET
  id = 1,
  id = 1;
-- fails because the same Column can't be changed more than once in a
single UPDATE statement

Here’s an UPDATE statement that sets the AUTHORS_1.NAME Column to a corresponding AUTHORS_2.NAME value:

UPDATE Authors_1 SET
  name=(SELECT name FROM Authors_2 WHERE id<=7 and id=Authors_1.id);

This example will only work if AUTHORS_2 has a row, with a matching ID, for every applicable row in AUTHORS_1.

At this point, AUTHORS_1 and AUTHORS_2 look like this:

AUTHORS_1

AUTHORS_2

ID

NAME

ID

NAME

1

Jonson

1

Jonson

2

Smith

2

Smith

3

Jones

3

Jones

4

Martin

4

Martin

5

Samuels

5

Samuels

6

none

6

none

12

NULL

7

NULL

12

NULL

UPDATE Physics

Most DBMSs fit a changed row in the same place as the old one – a good idea, because if UPDATEs always caused changes in row location, there would be much more disk activity (especially if there are indexes which point to the original row location). But your DBMS might shift rows if you change the primary key or if you change a variable size Column and it becomes longer. (In the latter case, there might not be enough room to fit the changed row in the original location.) These, of course, are “implementation-dependent” considerations, but we suggest that you hesitate slightly before you change primary key or variable size Columns.

Some general recommendations – try to group all updates for the same Table into the same UPDATE statement, do not substitute a DELETE plus an INSERT for an UPDATE and keep in mind that UPDATE can be a relatively slow operation.

If you want to restrict your code to Core SQL, don’t use a <search condition> that names an underlying Table of your target Table in an UPDATE statement and make sure all your UPDATE <Table reference>s are actually <Table name>s, with no <Correlation name>s or <derived Column list>s.

DELETE Statement

The DELETE statement’s job is to tentatively remove existing rows from a Table. If DELETE succeeds, there will be a fewer number (zero or more) of rows in your target Table. The required syntax for the DELETE statement is:

DELETE FROM <Table reference>
[ WHERE <search condition> ]

The rules for DELETE are analogous to the rules for UPDATE:

  • The <Table reference must identify an updatable Table that belongs to the SQL-session default Schema if you don’t provide an explicit <Schema name> qualifier.

  • If the WHERE clause is omitted, the DBMS will assume that the condition is TRUE for all rows (and so DELETE every row).

  • The WHERE clause’s <search condition> can be any condition at all, as long as it doesn’t invoke an SQL-invoked routine that possibly modifies SQL-data.

  • The WHERE clause’s search condition is effectively evaluated for each row of the target Table before any of the Table’s rows are marked for deletion.

  • Each <subquery> in a WHERE clause’s search condition is effectively executed for each row of the target Table and the results are then used to apply the search condition to the given row.

To execute DELETE, your current <AuthorizationID> needs the DELETE Privilege on the target Table – and, if the target Table is a derived Table (as it is in the case of a View), your current <AuthorizationID> also needs the DELETE Privilege on every underlying Table that makes up your target Table.

DELETE Examples

Here’s some examples of DELETE statements:

DELETE FROM Authors_1 WHERE name = name;
   -- deletes if name is not null

DELETE FROM Authors_1 WHERE name IS NOT NULL;
   -- deletes if name is not null

DELETE FROM Authors_1 WHERE name IS NULL;
   -- deletes if name is NULL

DELETE FROM Authors_2 WHERE id < 2;
  -- deletes some rows

DELETE FROM Authors_2;
  -- deletes all rows

DELETE Physics

With most DBMSs, a deleted row disappears from view in the Table, but actually remains in the underlying file for a while. Consider: if row #1 is at file offset 0, row #2 is at file offset 100 and row#3 is file offset 200 – and you delete row #1 – should your DBMS now move row #2 to file offset 0, move row #3 to file offset 100, change all indexes and truncate the file? That would take a long time, so the typical response is to put a “record deleted” flag at file offset 0 instead.

There are some DBMSs which will eventually reclaim this space. Of course that’s usually good – otherwise they wouldn’t do it – but it could cause trouble if you use a non-standard “row id” address which is derived from a row’s location. If row ids can change without warning, you’ve got trouble. With such DBMSs, it might be useful to put a row in limbo rather than delete it: set all Columns to NULL. Once that’s done, the row effectively disappears because most searches are never TRUE for NULL Columns. But the row stays in position, reserved, and won’t be reclaimed.

If you want to restrict your code to Core SQL, don’t use a <search condition> that names an underlying Table of your target Table in a DELETE statement and make sure all your DELETE <Table reference>s are actually <Table name>s, with no <Correlation name>s or <derived Column list>s.

Data Change Operations

Generalizing shamelessly, we compare here the steps required to do an UPDATE with the step required to do a SELECT:

UPDATE

SELECT

Read row from file

Read row from file

Ensure new values are valid for <data type>

Add to log

Change indexes

Check for Constraint violation

Write row back to file

The bottom line is: changes cost more than queries. Be consoled by the thought that updates are less frequent than reads (in a bank that we used to work for, we measured the ratio as 1 to 1,000). In fact, both programmers and users are willing to sacrifice a lot of extra trouble during data changes – constructing an index, say – if the result is a little less trouble during queries. The investment should not merely be to save speed, but to save trouble. That is why there are many mechanisms for guaranteeing that the data is valid. The makers of SQL were well aware that updates need care and protection.

Bulk Changes

A bulk change is an operation that affects a significant percentage of the rows in a Table. Since all updates are slow, a fortiori, bulk changes are extremely slow. There are some DBMS-specific ways to reduce the pain:

  • Use a vendor utility for multiple insertions (such as Oracle’s SQLLOAD).

  • Pass and retrieve arrays of rows with one call (ODBC’s SQLBulkOperations).

  • Access the underlying DBMS files directly (possible if the DBMS uses a non-proprietary file format such as comma-delimited, or .dbf).

  • Drop indexes and re-create them after the update.

But none of those tricks are portable and the third one isn’t recommended. We’re more interested in ways to reduce data change time in standard SQL.

Here are two ways that work for everybody, sometimes.

1 – Reduce the Row Count

This suggestion is more than a bromide. A way to reduce the row count will suggest itself if you look hard at the SET clause. Here’s an example:

UPDATE Table_1 SET
  column_1 = 'Jonson';

And here’s an equivalent UPDATE statement that might be a bit faster:

UPDATE Table_1 SET
  column_1 = 'Jonson'
 WHERE column_1 <> 'Jonson' OR column_1 IS NULL;

The reasoning here is that if COLUMN_1 already has 'Jonson' in it, there’s no point setting it to 'Jonson'. It’s a mechanical process: just look at what’s in the SET clause, and put the reverse in the WHERE clause. Here’s another example:

UPDATE Table_1 SET
  column_1 = column_1 * 1.1;

And here’s an equivalent UPDATE statement that might be a bit faster:

UPDATE Table_1 SET
  column_1 = column_1 * 1.1
WHERE column_1 <> 0;

The reasoning here is that if COLUMN_1 is zero or NULL, then multiplication can’t affect it. There are analogous observations for addition, division and subtraction.

2 – Do Multiple Updates in One Pass

Think of year-end or month-end, when you must: (a) pay 1% interest to clients with positive balances and (b) charge 2% interest to clients with negative balances. Here’s how to do it in two passes:

UPDATE Clients SET
   balance = balance * 1.01
WHERE balance > 0;

UPDATE Clients SET
  balance = balance * 1.02
WHERE balance < 0;

And here’s how to do it faster, in one pass:

UPDATE Clients SET
  balance = CASE balance
              WHEN balance > 0 THEN balance * 1.01
              ELSE balance * 1.02
              WHERE balance <> 0
            END;

Often it’s just a matter of anticipating what other jobs are likely to come up in the near future, for the same data.

Dialects

Data change is a stable section of SQL. All vendors support it, few vendors offer any variations or extensions. The ones that you might run into are:

  • Dropped INTO clause, for example INSERT Employees VALUES (...);, supported by Sybase.

  • Multi-column SET clauses, for example UPDATE Employees SET (column1,column2) = (column3,column4), supported by Oracle.

  • TRUNCATE – to delete all rows but leave definition intact, supported by Oracle 7.

At the time of writing, we know of no vendor with complete support for the SQL3 “update a join” feature. Many have partial support, though. This is the sort of feature that attracts awed attention (it’s hard to implement), but the important things are before that: it’s best to look for a clean, simple implementation.