Chapter 27 – User-defined Types

Note

You are reading a digital copy of SQL-99 Complete, Really, a book that documents the SQL-99 standard.

The book and the standard does not reflect the features of CrateDB, we are just publishing it as a service for the community and for reference purposes.

On the one hand, CrateDB does not implement the SQL-99 standard thoroughly, on the other hand, it extends the standard for implementing cluster features and others.

For more information specific to CrateDB, check out the CrateDB Reference documentation.

[Obscure Rule] applies for this entire chapter.

SQL began life as a procedurally oriented language, or – properly speaking – as a language with procedures that worked on sets. Nowadays the rage is for object oriented languages – like C++ and Java. The popularity of object orientation (“OO” for short) has been so great, that some have attempted to supersede SQL with pure “OO DBMSs” (POET is an example). On the other hand, a great deal of time and trouble has been invested getting SQL to where it is today, so most users and experts have looked to a more moderate solution. Namely: extending SQL so that it can handle databases the OO way, but without abandoning the current language. We’ll call this hybrid object/relational, or O/R for short.

The inspirations have come from languages like C++ and Object Pascal, which are extensions of C and Pascal. Those extended languages have been successful. It appears that there will be an equivalent success for the Object Oriented SQL extensions, once DBMS vendors implement them and users get acquainted with them.

We’re not at that stage yet. What you’re going to read about in this chapter is the very latest stuff, just accepted as part of the SQL Standard, and still quite wet around the edges. Since that’s so, we will depart from the usual chapter organization. The first parts of this chapter are merely a tutorial which (we hope) you’ll find is fairly light reading, peppered with liberating slogans like “a type is a class” or “a row is an object”. The middle parts of this chapter are more of a syntactical slog as we get into the details of CREATE TYPE and the various SQL statements associated with it; however, we have omitted some details which we think will not be part of real DBMS implementations for some years.

If you’ve ever used an object oriented language, you’ll find the concepts are familiar (which they’re supposed to be – that’s the point). However, you must start with a good recollection of the earlier chapters on <data type>s, procedures and SQL syntax.

Table of Contents

UDTs

Until now, all the SQL <data type>s we’ve described have predefined <data type>s, sometimes called “built-in” or “primitive” <data type>s – although some portions of them are definable by users (for example, the length of a CHAR <data type>), most of their inherent structure is already set. The O/R part of SQL, though, uses user-defined data types, or UDTs. UDTs are just as easy to use as predefined data types; however, they are a lot harder to set up.

Just consider some of the things which are already handled for you (programmed in the DBMS) if you use a predefined data type:

  • There’s a way of storing the data physically in a Table (this is the concept of instantiation, which we’ll have some trouble with later).

  • There are comparison operations for the <data type>s, so you can tell whether two values are the same, or one is greater than the other (this is the concept of ordering).

  • There are built-in operators for the <data type>s, for example + is used to add to numbers together and produce another number (this is the concept of methods that change values of certain data types).

  • There are cast operations so that values in one <data type> can be converted to another, or interchanged with host language variables (this is the concept of cast methods and transforms).

With UDTs, you’re on your own in all these cases. When you make a UDT, you’re going to have to remember that all those things won’t be there, unless you put them there yourself.

However, when you do it, you’ll end up with a <data type> that’s just as good as the predefined types. You’ll be able to use your new UDT wherever you could use a predefined type before.

A Schema may contain zero or more UDTs. An SQL UDT is a named, user-defined set of valid values. UDTs are dependent on some Schema – the <UDT name> must be unique within the Schema the UDT belongs to (it may not be the same as any <Domain name> in its Schema either) – and are created, altered and dropped using standard SQL statements. The Objects that may belong to a UDT are known as Attributes; they depend on some UDT.

A UDT can be either a distinct type – a UDT that is based on a single predefined <data type> – or a structured type – a UDT that is based on a list of Attribute definitions.

A UDT is defined by a descriptor that contains twelve pieces of information:

  1. The <UDT name>, qualified by the <Schema name> of the Schema it belongs to.

  2. Whether the UDT is ordered.

  3. The UDT’s ordering form: either EQUALS, FULL or NONE.

  4. The UDT’s ordering category: either RELATIVE, HASH or STATE.

  5. The <specific routine designator> that identifies the UDT’s ordering function.

  6. If the UDT is a direct subtype of one or more other UDTs, then the names of those UDTs.

  7. If the UDT is a distinct type, then the descriptor of the <data type> it’s based on; otherwise an Attribute descriptor for each of the UDT’s Attributes.

  8. The UDT’s degree: the number of its Attributes.

  9. Whether the UDT is instantiable or not instantiable.

  10. Whether the UDT is final or not final.

  11. The UDT’s Transform descriptor.

  12. If the UDT’s definition includes a method signature list, a descriptor for each method signature named.

To create a UDT, use the CREATE TYPE statement (either as a stand-alone SQL statement or within a CREATE SCHEMA statement). CREATE TYPE specifies the enclosing Schema, names the UDT and identifies the UDT’s set of valid values. To destroy a UDT, use the DROP TYPE statement. None of SQL3’s UDT syntax is Core SQL, so if you want to restrict your code to Core SQL, don’t use UDTs.

UDT Names

A <UDT name> identifies a UDT. The required syntax for a <UDT name> is:

<UDT name> ::=
[ <Schema name>. ] unqualified name

A <UDT name> is a <regular identifier> or a <delimited identifier> that is unique (for all Domains and UDTs) within the Schema it belongs to. The <Schema name> which qualifies a <UDT name> names the Schema that the UDT belongs to and can either be explicitly stated, or a default will be supplied by your DBMS as follows:

  • If a <UDT name> in a CREATE SCHEMA statement isn’t qualified, the default qualifier is the name of the Schema you’re creating.

  • If the unqualified <UDT name> is found in any other SQL statement in a Module, the default qualifier is the name of the Schema identified in the SCHEMA clause or AUTHORIZATION clause of the MODULE statement that defines that Module.

UDT Example

Here’s an example of a UDT definition:

CREATE TYPE book_udt AS                -- the UDT name will be book_udt
title CHAR(40),                        -- title is the first attribute
buying_price DECIMAL(9,2),             -- buying_price is the second attribute
selling_price DECIMAL(9,2)             -- selling_price is the third attribute
NOT FINAL                              -- this is a mandatory Finality Clause
METHOD profit( ) RETURNS DECIMAL(9,2); -- profit is a method, defined later

This CREATE TYPE statement results in a UDT named BOOK_UDT. The components of the UDT are three attributes (named TITLE, BUYING_PRICE and SELLING_PRICE) and one method (named PROFIT).

  1. The three name-and-data-type pairs title CHAR(40) and buying_price DECIMAL(9,2) and selling_price DECIMAL(9,2) are the UDT’s Attribute definitions.

  2. The words NOT FINAL matter only for subtyping, which we’ll get to later. Briefly, though, if a UDT definition doesn’t include an UNDER clause, the finality clause must specify NOT FINAL.

  3. The clause METHOD profit () RETURNS DECIMAL (9,2) is a teaser. Like an Attribute, a “method” is a component of a UDT. However, this method – PROFIT – is actually a declaration that a function named PROFIT exists. This function isn’t defined further in the UDT definition – there is a separate SQL statement for defining functions: CREATE METHOD. All we can see at this stage is that PROFIT has a name and a (predefined) data type>, just as regular Attributes do. Some people would call PROFIT a “derived Attribute”.

Columns Based on UDTs

Let us begin by making a Table, one of whose Columns is a UDT:

CREATE TABLE T (
   book_column book_udt,
   serial_number INTEGER);

You can use a UDT wherever the syntax requires <data type>. So far so good. Now let’s INSERT a new row into the Table. This won’t be so simple.

BEGIN                                  /* compound statement: start */
  DECLARE u book_udt;                  /* temporary variable declaration */
  SET u = book_udt();                  /* constructor function */
  SET u = u.title('The Compleat SQL'); /* mutator function */
  SET u = u.buying_price(10.00);       /* mutator function */
  SET u = u.selling_price(20.00);      /* mutator function */
  INSERT INTO T VALUES (u,1);          /* ordinary-looking SQL statement */
END;                                   /* compound statement: end */

To understand the above compound statement, you’ll need to look closely at the declaration and the four function calls that precede the INSERT statement. They are simple things, but the terminology is fancy.

First: DECLARE u book_udt; is a declaration of a temporary variable named u. Nothing new about that (if you remember your PSM chapter), but it shows that variables too can be based on UDTs instead of predefined <data type>s.

Second: u = book_udt(); is a constructor function. The function we’re calling is named BOOK_UDT, the same as the UDT. That’s not a coincidence. The DBMS’s job is to create a constructor function automatically. When the CREATE TYPE book_udt1 was executed, this SQL statement happened implicitly:

CREATE FUNCTION book_udt1 ()
RETURNS book_udt1
...
RETURN V

In this CREATE FUNCTION statement, the RETURN V, is a value of type BOOK_UDT1, with all Attribute instance values equal to their default! What does that mean? It means that when we call the BOOK_UDT1 function, the return is a “default value” of the BOOK_UDT1 type. That’s useful, so it’s good to know that the DBMS makes a constructor function for every UDT.

Third: u.title('The Compleat SQL'); is a mutator function. The function we’re calling is named TITLE, the same as the first Attribute in the UDT. Once again, this is not a coincidence. The DBMS’s job is to create a mutator function automatically. When the CREATE TYPE book_udt1 was executed, this SQL statement also happened implicitly:

CREATE METHOD title
   (attribute CHAR(40))
      RETURNS book_udt
      SELF AS RESULT
      LANGUAGE SQL
      DETERMINISTIC
      CONTAINS SQL
      RETURN NULL ON NULL INPUT

This statement defines a “method” (or “method function”) because it is a component of a UDT. To invoke the method, we just have to say “<udt instance>.<method name> ( <new value> )” and the new value is a CHAR(40) string, which is compatible with the Attribute’s definition. The reason that this method is called a mutator function is that it changes the value in the object instance. So now the title in u is 'The Compleat SQL'.

You might wonder: why all the fuss? If the whole point is to set U.TITLE to a value, why not just:

SET u.title = 'The Compleat SQL';      /* not a good idea! */

Using a SET statement like this would violate an object-oriented principle called “encapsulation”, according to which: the only access to Attributes is via their functions (methods) – so SET is not legal. If you’re used to Windows programming, you’ll see the benefits of encapsulation: nothing can access the storage except pre-approved routines which change if the storage method changes.

Fourth and fifth: u.buying_price(10.00; and u.selling_price(20.00); are two more mutator functions. When the UDT was created, the DBMS made methods for BUYING_PRICE and FOR SELLING_PRICE too, just as it must do for every Attribute in a UDT.

The bottom line is: constructor functions are the ordinary way to make new instances of UDTs; mutator functions are the ordinary way to change Attributes in UDT instances. Using the constructor and the mutators, we have been able to set up a fully-initialized UDT instance – u – with the contents we want.

Sixth: INSERT INTO T VALUES (u,1); is dénouement. Clearly, it puts a value into Table T. What the value looks like, we have no idea: it’s encapsulated. We do know, though, that what we put in was {'The Compleat SQL', 10.00, 20.00}. Insertion phase complete.

Let us now get something back out of the UDT. You’ve probably guessed already that we aren’t going to just say:

SELECT book_column, serial_number
FROM   T;

To retrieve from a UDT, we need yet another function – the opposite of a mutator – to get the Attribute values out of BOOK_COLUMN into a representation that we can read. Such a function is called an observer function and, once again, the DBMS makes observer functions implicitly at the time that a UDT is created – one for each Attribute, like this:

CREATE METHOD title ()
   RETURNS CHAR(40)
   LANGUAGE SQL
   DETERMINISTIC
   CONTAINS SQL
   RETURN NULL ON NULL INPUT;

Since the observer method exists to help us get values out, we can use them in SELECT statements. To get all the values, we can use this SQL statement:

SELECT book_column.title(),
       book_column.buying_price(),
       book_column.selling_price()
FROM   T
WHERE  serial_number > 0;

In summary, these are the functions associated with our example UDT BOOK_UDT:

  • One constructor function, named BOOK_UDT.

  • Three mutator functions, named TITLE, BUYING_PRICE and SELLING_PRICE.

  • Three observer functions, also named TITLE, BUYING_PRICE and SELLING_PRICE. (Actually the above is only for the “default case”, we’ll worry about options later.) As the examples indicated, constructors and mutators and observers are all we need for simple storage and retrieval operations.

Routine Names and Signatures

One of the strengths of the Object/Relational system is that all the functions have the same name as the Attribute they’re associated with, which makes them easy to remember. But at first there’s a little confusion too! How do we tell them apart? If you look closely at these three references, you’ll see the differences:

  • title – the name of an Attribute.

  • title () – the name of an observer method.

  • title ('The Compleat SQL') – the name of a mutator method.

The DBMS distinguishes between names the same way that you do. The first distinction is easy: if it’s got parentheses after it, then it must be a routine name. The next distinction is that title() has no arguments, while title('The Compleat SQL') has one argument. That’s the difference between an observer and a mutator. Let’s express this latter distinction as a general set of rules.

Rule 1

It is legal for two routines to be “in the same name class” – that is, two routines with the same name may be in the same Schema.

Rule 2

Routines are distinguishable by their category. The four categories are “procedures”, “functions” and two types of “method”.

Rule 3

Routines in the same category are distinguishable by the count of parameters, and the declared type of those parameters, in their parameter list.

Since the name alone isn’t enough, we have to come up with a different term to describe “what it is that distinguishes routines”. The term that is in common use is “signature”. The signature of a routine is the routine name plus the routine category plus the parameter-data-type list, and it’s what the DBMS uses when it needs to figure out what routine you’re really trying to call.

Defining a Typed Table Based on a UDT

It’s one thing to base Columns on UDTs. It’s quite another thing to base Tables on UDTs. Tables based on UDTs are called “typed Tables” (or “referenceable Tables”). They are also called “the two sided coin” – because from one angle, typed Tables look like relations but from another angle they look like “instantiated classes” (to borrow a term from OO).

To make a typed Table, use the REF syntax: "CREATE TABLE <Table name> OF <UDT name> ... REF IS <column name>". For example:

CREATE TABLE Book
   OF book_udt REF IS self_referencing_column;

To execute this SQL statement, you must have the Privilege to create a Table – that is, you must be the Schema owner, and you must have a USAGE privilege on the UDT. There are several options and there are some side effects, but let us concentrate now on the OO significance: this instantiates the class.

As we said, “instantiated classes” are OO terminology. The OO equivalent for “user-defined type” is “class”. In pure OO languages, creating a class (with its Attributes and methods) takes care of everything, since classes are by default “instantiable” (“instantiable” is a short way to say “there can be instances of the class, namely objects based on the class definition”).

This form of CREATE TABLE takes the Attribute definitions from a UDT and transfers them to <Column definition>s in a Table. Since BOOK_UDT has three Attributes – TITLE, BUYING_PRICE, SELLING_PRICE – the BOOK Table will have four Columns – <self-referencing column>, TITLE, BUYING_PRICE, SELLING_PRICE. Of these, only the first Column, <self-referencing column>, needs a bit of explanation.

A self-referencing Column is the equivalent of what in object-oriented terminology would be an “object identifier”. All typed Tables have a self-referencing Column; that’s why typed Tables are sometimes called “referenceable Tables”. The point about this self-referencing Column is that it uniquely identifies a single row. Now, the row in the BOOK Table is going to be an instance of the “class” BOOK_UDT – which means a row is an object. So we’ll be able to reference this object later, by referring to the value in its self-referencing Column. The name of the self-referencing Column is whatever we specify in the REF IS clause – in this case, SELF_REFERENCING_COLUMN.

There are several options for generation of self-referencing Column values. Theoretically the best option is to derive the values from the Table’s primary key. But in this phase of the discussion we’ll just use the default option: VALUES ARE SYSTEM GENERATED. In effect, with the default option, a self-referencing column is an arbitrary row identifier.

Treating a Typed Table as a Table

This is the heads side of our two-sided coin. Table BOOK is just a Table. Therefore, all these operations are legal:

INSERT INTO Book (title, buying_price, selling_price)
VALUES ('The compleat SQL',10.00,20.00);

UPDATE Book SET
  title = 'The Incompleat SQL';

SELECT selling_price - buying_price
FROM   Book
WHERE  title LIKE '%SQL%';

Notice that in the INSERT statement example, we did not trouble to specify a new value for SELF-REFERENCING COLUMN. That is because the value is system-generated.

Now here is another operation: let us make a new Table which “references” the BOOK Table:

CREATE TABLE Book_chapter (
  book_ref_column REF(book),        /* a referencing Column */
  other_column CHAR(1024));         /* an ordinary Column */

INSERT INTO Book_chapter
  SELECT self_referencing_column, 'text for other_column'
  FROM   Book
  WHERE  title = 'The Incompleat SQL';

We described the REF <data type> in our chapter on <reference type>s, and you can look up the details there for how REF is defined. Here, at last, is an actual use for the REF <data type>. The new row in BOOK_CHAPTER will contain a “reference” to a single row in BOOK. This is awfully similar to referencing a primary key row from a foreign key, and in fact the BOOK_CHAPTER's BOOK_REF_COLUMN can be made to have the paraphernalia of a foreign key, including ON DELETE or CASCADE options.

Treating a Typed Table as an Instantiable class

This is the flip side of our two-sided coin. It is possible to refer to the UDT’s Attributes as they are stored (instanced) in the typed Table, using references. For example:

book_udt_reference_value -> profit()

will invoke the PROFIT method on the underlying value in BOOK_UDT.

Further: if we have a “REF (book)” value to steer by, we can perform “navigational” tricks. For example, this expression is legal:

book_ref_column -> title

The result is the value in the TITLE Column of BOOK, for the row whose self-referencing Column is the value in the BOOK_REF_COLUMN of BOOK_CHAPTER.

Now, getting deeply into the object-oriented business, we arrive at the concepts of “subclasses” and “superclasses” – or, since we’re using UDTs, “subtypes” and “supertypes”. The concept is fairly easy to grasp if we remind ourselves that we are merely modelling the real world, and in the real world these statements are manifestly true:

  • The BOOK_UDT type can be subdivided by topic – say, SCIENCE or HISTORY.

  • The HISTORY type can be further subdivided into ANCIENT and MODERN, while the SCIENCE type can be further subdivided into PHYSICS and (following Lord Kelvin’s famous dictum that “all science is either physics or stamp collecting”) STAMP COLLECTING.

In other words, we can think of a “family of types”, of which the proud single patriarch is BOOK_UDT (but we avoid the word patriarch and say “maximal supertype” instead). HISTORY and SCIENCE are subtypes of BOOK_UDT; therefore, by definition, BOOK_UDT is the supertype of HISTORY and SCIENCE. HISTORY, in turn, is the supertype of ANCIENT and MODERN, … and so on.

The point of all this is: BOOK_UDT is a data type, so – like any data type – its specification includes a description of its representation and of the operations that can be performed on it. That description is applicable as well for HISTORY and for STAMP COLLECTING, unless we explicitly say otherwise: the subtype inherits the Attributes and methods of the supertype.

We could, of course, reflect some subtyping concept using purely relational operations (by joining a BOOK Table to a PHYSICS Table, for example). But the advantage of object orientation is that such operations are implicit in the definition. By declaring that a is a subtype of b, you are saying that a is automatically linked to b, so the linking process is practically invisible.

To make the above discussion concrete, here are some SQL statements that make subtypes. In these examples, the <keyword> UNDER means “as a subtype of”.

CREATE TYPE history UNDER book_udt           NOT FINAL;
CREATE TYPE science UNDER book_udt           NOT FINAL;
CREATE TYPE modern UNDER history             NOT FINAL;
CREATE TYPE ancient UNDER history            NOT FINAL;
CREATE TYPE physics UNDER science            NOT FINAL;
CREATE TYPE stamp_collecting UNDER science   NOT FINAL;

Admittedly, these new types lack personality. We’ve kept it that way so that you can imagine for yourself what additional Attributes or methods you would want to add to the subtypes, on top of the “inheritance” that they receive from the supertypes.

Finally, we can flip the coin over again and say: “if a type can have subtypes, surely the instantiation of a type can have instantiations of subtypes.” In other words, Tables can have subtables:

CREATE TABLE History UNDER Book;
CREATE TABLE Science UNDER Book;
CREATE TABLE Ancient UNDER History;
CREATE TABLE Moderns UNDER History;
CREATE TABLE Physics UNDER Science;
CREATE TABLE Stamp_collecting UNDER Science;

Notice that the “subtables” and “supertables” defined are a family whose relationships match the relationships of the “subtypes” and “supertypes” they are based on. In fact, that is a requirement.

Here endeth the tutorial.

CREATE TYPE Statement

The CREATE TYPE statement defines a UDT. The required syntax for the CREATE TYPE statement is:

CREATE TYPE <UDT name>
[ UNDER <supertype UDT name> ]
[ AS {<predefined type> | <Attribute definition list>} ]
[ {INSTANTIABLE | NOT INSTANTIABLE} ]
{FINAL | NOT FINAL}
[ <reference type specification> ]
[ <cast option> ]
[ <method signature> [ {,<method signature> }... ] ]

   <Attribute definition list> ::=
   (<Attribute definition> [ {,<Attribute definition>}... ])

      <Attribute definition> ::=
      <Attribute name> { <data type> | <Domain name> }
      [ REFERENCES ARE [ NOT ] CHECKED [ ON DELETE
          {NO ACTION | CASCADE | RESTRICT | SET NULL | SET DEFAULT} ] ]
      [ DEFAULT default value ]
      [ COLLATE <Collation name> ]

   <reference type specification> ::=
   REF USING <predefined type> [ <ref cast option> ] |
   REF <Attribute name> [ {,<Attribute name>} ... ] |
   REF IS SYSTEM GENERATED

      <ref cast option> ::=
      [ CAST (SOURCE AS REF) WITH <cast-to-ref identifier> ]
      [ CAST (REF AS SOURCE) WITH <cast-to-type identifier> ]

   <cast option> ::=
   [ CAST (SOURCE AS DISTINCT)WITH <cast to distinct identifier> ]
   [ CAST (DISTINCT AS SOURCE) WITH <cast to source identifier> ]

   <method signature> ::=
   <original method signature> |
   OVERRIDING [ INSTANCE | STATIC ] <partial method signature>

      <original method signature> ::=
      [ INSTANCE | STATIC ] <partial method signature>
      [ SELF AS RESULT ] [ SELF AS LOCATOR ]
      [ LANGUAGE {ADA | C | COBOL | FORTRAN | MUMPS | PASCAL | PLI | SQL} ]
      [ PARAMETER STYLE {SQL | GENERAL} ]
      [ [ NOT ] DETERMINISTIC ]
      [ {NO SQL | CONTAINS SQL | READS SQL DATA | MODIFIES SQL DATA} ]
      [ {RETURN NULL ON NULL INPUT | CALL ON NULL INPUT} ]

      <partial method signature> ::=
      METHOD <routine name>
      (SQL parameter declaration list)
      RETURNS <data type>

CREATE TYPE defines a new UDT: a named set of valid data values. A UDT is owned by the Schema it belongs to.

  • The <UDT name> identifies the UDT and the Schema that it belongs to. A <UDT name> that includes an explicit <Schema name> qualifier belongs to the Schema named. A <UDT name> that does not include an explicit <Schema name> qualifier belongs to the SQL-session default Schema. The <UDT name> must be unique (for all Domains and UDTs) within the Schema that owns it. If CREATE TYPE is part of a CREATE SCHEMA statement, the <UDT name>, if explicitly qualified, must include the <Schema name> of the Schema being created; that is, it isn’t possible to create a UDT belonging to a different Schema from within CREATE SCHEMA.

There are actually three variants of CREATE TYPE, used in distinct ways:

Making structured types (garden variety UDTs)

Making structured types (subtype UDTs)

Making distinct types (not really truly UDTs)

CREATE TYPE <name>

CREATE TYPE <name>

CREATE TYPE <name>

AS <Attribute definitions>

UNDER <name>

AS <predefined type>

NOT FINAL

[NOT] FINAL

FINAL

If the representation clause is “AS <predefined data type>” – for example:

CREATE TYPE UDT_1 AS CHAR(15) FINAL

then this UDT is a distinct type. Usually when we talk about UDTs we mean the other kind of UDT – the structured type – because there’s not much that one can do with a distinct type.

If you use a subtype clause – for example:

CREATE TYPE a UNDER b ...

you are making a new subtype under an existing supertype. The supertype must exist, it must be a structured type UDT that was defined as NOT FINAL and you must have the UNDER Privilege on it. Notice that there can be only one supertype; this means that SQL, like Java, has a “single inheritance” rule. Notice too that you can have both a subtype clause and an Attribute definition list in the same CREATE TYPE statement; this means that the subtype can have both inherited Attributes (Attributes taken from the supertype) and original Attributes (Attributes taken from the new definition, which follow the inherited Attributes).

If the representation is “… AS <Attribute list> …”, then each Attribute definition must look very much like a <Column definition> looks in a CREATE TABLE statement – for example:

attribute_1 CHAR(1000) CHARACTER SET ISO8BIT,
attribute_2 TIME WITH TIME ZONE DEFAULT '12:00:00'

Constraints (such as NOT NULL) are illegal, though. An <Attribute name> is an <identifier>; all <Attribute name>s must be unique within their UDT. The Attribute’s <data type> may be a UDT, but cyclic references are illegal.

The Standard is contradictory about the instantiable clause. It’s safe to assume that “instantiable”, as in all OO languages, means “can be instantiated” (a “non-instantiable” or “abstract” UDT would have no instances but could have instantiable subtypes). Use of typed Tables is only possible if the type is intantiable.

The Standard is also contradictory about the finality clause. It probably means (as in Java) that the new UDT may have no proper subtypes. That is, no further UDT may be created UNDER it. (There is a belief that the clause is for distinguishing distinct types from structured types. We don’t believe that. We repeat that a distinct type s a UDT defined with “AS <predefined type>”, all other UDTs are structured types.) If the CREATE TYPE statement contains any Attribute definitions, then NOT FINAL is mandatory. Otherwise either FINAL or NOT FINAL is mandatory.

The reference specification is either “user-generated” (REF USING), “derived” (REF <Attribute list>) or “system-generated” (REF IS SYSTEM GENERATED). With the default – REF IS SYSTEM GENERATED – there is no further specification because values are implementation-dependent. With the main option – REF USING – there is a further specification: the <Attribute name> list. Because (as in pure relational systems) a row’s uniqueness should depend on Column values, the <Attribute name>s here would be an indirect list of a “key value”.

The cast option is legal only for distinct types. The cast’s source and target <data type>s are the <predefined type> specified earlier in the CREATE TYPE statement and the name of the UDT, respectively.

A UDT’s methods are defined in pretty well the same way as functions; indeed a method is a function, albeit a function which cannot live without its UDT and which is called in a slightly different way than a regular function. The default method characteristics are LANGUAGE SQL, PARAMETER STYLE SQL, NOT DETERMINISTIC, CONTAINS SQL, NOT NULL CALL. (Note: Although the Standard says that NOT DETERMINISTIC is the default, it’s hard to believe that this is the true intent.) The difference between an “overriding” and an “original” method is that an overriding method “overrides” an already-existing method with the same name, in some supertype. Method signatures must be distinct. Remember that, as noted in the tutorial section of this chapter, structured UDTs have implicitly-created methods: one constructor method for the UDT as a whole, n mutator methods (one for each Attribute), and n observer methods (one for each Attribute). As for distinct types: apparently no implicitly-created methods exist for them, but there may be a “transform” function for casting to/from host languages.

Distinct Types

The simplest way to make a UDT is with:

CREATE TYPE <UDT name> AS <predefined data type> FINAL;

which – since it has no Attribute list – is not a specification of a “structured type” with all the OO trimmings. Instead, UDTs made this way are called “distinct types”. The main idea behind distinct types is that they constitute enforceable domains. For example, suppose we define two currency data types:

CREATE TYPE euro AS DECIMAL(8,2) FINAL;

CREATE TYPE mark AS DECIMAL(8,2) FINAL;

If we now attempt to pass a euro value to a mark target, we will fail – the distinct type provides us with a simple form of type checking that we cannot achieve using an SQL Domain.

Distinct types have methods, just like structured types. However, they are limited in various ways. Usually, when we make generalized comments about object-orientation analogies – such as “UDTs are classes” – we have structured types in mind, not distinct types.

CREATE TABLE Statement

We’ve already shown you the CREATE TABLE statement in our chapter on Tables. However, there are two options to add to that description now, for Object/Relational Tables.

The first option has to do with the creation of typed Tables. Here’s the syntax:

CREATE TABLE <Table name>
OF <UDT name>
...
[ REF IS <self-referencing Column name>
   {SYSTEM GENERATED | USER GENERATED | DERIVED} ]

In addition to the usual rules for Table creation, here are other rules to follow when you use this syntax:

  • You must have the USAGE Privilege on <UDT name>.

  • <UDT name> must specify a structured UDT.

  • The optional REF clause may appear within the Table element list (that is, inside the parentheses along with the Column and Constraint definitions). The <self-referencing Column name> must be a valid and distinct <identifier>. The three options associated with “REF IS ...” are (like CREATE TYPE's REF clause) either “user-generated”, “derived” or “system-generated”. With the default – SYSTEM GENERATED – the Column values are implementation-dependent. With the main option – DERIVED – Column values come from the Table’s primary key. A typed Table always has one self-referencing Column, and its position within the Table is fixed: it is always the first Column in the Table. The “REF IS ...” clause only specifies a few details about the self-referencing Column. Note: typed Tables are known (sometimes) as “referenceable Tables” and (rarely) as “object UDTs”. The second option has to do with the creation of subtables. Here’s the syntax:

    CREATE TABLE <subtable name>
    ...
    UNDER <supertable name>
    ...
    

In addition to the usual rules for Table creation, here are other rules to follow when you use this syntax:

  • You must have UNDER privileges on <supertable name>.

  • Both the subtable and the supertable must be typed Tables.

  • The subtable/supertable relationship must mirror the subtype/supertype relationship. For example: if ty1 is a subtype of ty2, and ta1 is a typed Table based on ty1, and ta2 is a typed Table based on ty2, then it is legal to create ta1 UNDER ta2 – but it is not legal to create ta2 UNDER ta1.

CREATE CAST Statement

With UDTs, you’ll need some way of assigning UDT values to predefined <data type> targets, or vice versa. Further, if you have two distinct UDTs – say UDT1 and UDT2 – then you’ll also need some way to assign values based on UDT1 to UDT2 targets, or vice versa. This might all be complicated by the fact that subtypes contain their supertypes’ Attributes, which should imply a degree of mutual assignability. In sum, you need the ability to “cast” a UDT to or from another data type. The solution is to create a user-defined cast for the chosen <data type>s, with the CREATE CAST statement. The required syntax for the CREATE CAST statement is:

CREATE CAST (<source type> AS <target type>)
WITH <specific routine designator>
[ AS ASSIGNMENT ]

A UDT value is assignable to a UDT target only if the source value is a subtype of the target UDT. There can be only one user-defined cast for any given combination of source and target types. Either <source type> or <target type> must be either UDT or REF, but the other operand can be any <data type>. To execute CREATE CAST, you must be the owner of both the cast function identified by <specific routine designator>) and the <target type> (if it is a UDT).

The <specific routine designator> is usually a signature – for example, “FUNCTION f (SMALLINT)” can be a <specific routine designator>. It is also possible to identify a routine using a specific name, which is a unique, possibly mangled, name that is usually assigned by the DBMS. The cast function identified by the <specific routine designator> must have the general form:

FUNCTION <name> (<source type>) RETURNS <target type>
... DETERMINISTIC
... { NO SQL | CONTAINS SQL }
...

AS ASSIGNMENT (which is not the default) means that the cast is “implicitly invocable” during assignment operations. That is, if x is UDT_1, and there is an implicitly-invocable cast, then this is a legal assignment:

x = 'F'

Otherwise, this is a legal assignment:

x = CAST ('F' AS UDT_1)

[Obscure Rule] An assignment might involve a choice between several possible “implicitly invocable” cast functions. The DBMS picks the one that fits these criteria:

  • It’s an implicitly-invocable cast function – i.e.: it was mentioned with AS ASSIGNABLE.

  • The cast function’s result <data type> is the target’s declared type.

  • The cast function has one parameter, and that parameter has a “declared type”, and that declared type is in the “type precedence list” of the declared type of the source value. If there are two cast functions which meet all these requirements, the DBMS picks the one whose declared type is highest in the “type precedence list”.

Tip

If you decide to make a user-defined cast for “a to b”, be reciprocal: make a user-defined cast for “b to a” as well. Users expect all casts to be two-way.

Tip

For hierarchy’s sake: if A1 is a supertype of A2, and B1 is a supertype of B2, then make casts from A1 to B1 and from A2 to B2 – not A1 to B2 nor A2 to B1. That is, cast from super to super and from sub to sub.

Here is a user-defined cast for the BOOK_UDT type which we used in earlier examples. The UDT has three Attributes (title CHAR(40), buying_price DECIMAL(9,2), selling_price DECIMAL(9,2)). Since each Attribute’s type is either CHAR or castable to CHAR, we’ll make a function which simply concatenates each Attribute into one big character string. Here’s how:

CREATE FUNCTION f (book_udt) RETURNS CHAR(80)
   DETERMINISTIC CONTAINS SQL
   BEGIN
      DECLARE c CHAR(60);
      SET c = book_udt.title()
              ||
              CAST(book_udt.buying_price() AS char(20))
              ||
              CAST(book_udt.selling_price() AS char(20));
      RETURN (u);
   END;

Now that we have a function, we can make a cast for the function:

CREATE CAST (book_udt AS CHAR(60))
WITH FUNCTION f (book_udt);

Now that we have a cast, we can use BOOK_UDT in a CAST expression:

SELECT CAST(book_udt AS CHAR(60))
FROM   Books;

Thus, we can design our own external representation of a UDT, without worrying about its internal representation.

For distinct type UDTs only, the DBMS automatically creates two casts. For example, if you make this UDT:

CREATE TYPE longitude AS REAL FINAL;

the DBMS will make these two casts for it:

CREATE CAST (longitude AS REAL) ... AS ASSIGNMENT;
CREATE CAST (REAL AS longitude) ... AS ASSIGNMENT;

Now, suppose you have a variable lo of type LONGITUDE. Because of the automatic cast, it’s legal to cast a <literal> to LONGITUDE, for example:

SET lo = CAST (33.59 AS longitude);

Not only that, though! The casts are AS ASSIGNMENT casts, so this SQL statement is also legal:

SET lo = 33.59;

The AS ASSIGNMENT feature, which got into the SQL Standard at a very late stage, is bound to confuse some people: they’ll think that “distinct types” are just a way to rename predefined <data type>s. That would be a delusion. The reason that SET lo = 33.59; is legal is that there is an implicit cast (the DBMS makes it silently) which happens to be an implicitly invoked cast (it contains an AS ASSIGNMENT clause).

Tip

Implicitly invocable casts are convenient but error-prone. If you don’t want to allow loosely-typed phrases like money_column = 5.00, you should DROP the cast that the DBMS created and then explicitly make it again – but without specifying AS ASSIGNMENT in the definition.

CREATE ORDERING Statement

For UDT values, you’ll need some way of knowing that “udt-value-1 is greater than udt-value-2” – or less, or equal. Otherwise you’ll be unable to use UDTs in search conditions, or in ORDER BY clauses, or in GROUP BY clauses, or after the word DISTINCT. The solution is to create an ordering for the UDT with the CREATE ORDERING statement. The required syntax for the CREATE ORDERING statement is:

CREATE ORDERING FOR <UDT name>
{EQUALS ONLY BY | ORDER FULL BY}
<ordering category>

   <ordering category ::=
   RELATIVE WITH <specific routine designator> |
   HASH WITH <specific routine designator> |
   STATE [ <specific name> ]

A UDT value is comparable to another UDT value only if both UDTs are in the same subtype family. There can be only one ordering for a UDT. To execute CREATE ORDERING, you must be the owner of both the UDT and routine named in the <ordering category>. Since UDTs in the same type family are related, all orderings for UDTs within the same type family must all be defined either with EQUALS ONLY BY or with ORDER FULL BY.

A <specific routine designator> is usually a signature. For example, FUNCTION f (SMALLINT) can be a <specific routine designator>. It is also possible to identify a routine using a <specific name>: an additional routine name, used to uniquely identify a routine. A routine’s <specific name> is usually assigned by the DBMS. The ordering routine must be DETERMINISTIC and must not possibly modify SQL-data.

The <ordering category> “RELATIVE WITH <specific routine designator>” is legal only for maximal supertypes (i.e.: types that have no proper supertypes of their own). RELATIVE WITH functions have two parameters (both are UDT types) and return an INTEGER.

The <ordering category> “HASH WITH <specific routine designator>” is legal for subtypes, but only if their supertypes are also defined with HASH WITH orderings. Typically, the <specific routine designator> named will identify an observer method for one of the UDT’s Attributes. HASH WITH functions have one parameter (its type is UDT) and return a predefined <data type>.

The <ordering category> STATE is legal only for maximal supertypes. If you specify STATE, the DBMS implicitly creates a function named EQUALS. It is the duty of the EQUALS function to return TRUE if all Attribute values are equal.

Because the ordering routine is user-defined, it’s impossible to say exactly what the various ordering categories imply. The following is a generalization:

  • An EQUALS function returns TRUE if all values are equal; otherwise it returns FALSE. It never returns UNKNOWN.

  • A RELATIVE function returns an integer value less than zero for less than values, returns zero for equal values and returns an integer greater than zero for greater than values.

  • A HASH function returns an absolute ordering within the predefined <data type>.

Here’s an example of CREATE ORDERING:

CREATE ORDERING FOR book_udt
ORDER FULL BY HASH WITH
   FUNCTION title (book_udt);   /* observer function for title Attribute */

There is no need to create an ordering for a distinct type. As with casts, the DBMS implicitly does the following when CREATE TYPE is executed:

CREATE ORDERING FOR <UDT name>
ORDER FULL BY HASH WITH
   FUNCTION <Schema name>.<cast-to-source identifier> (<UDT name>);

Comparisons of distinct types work exactly the same way as comparisons of the predefined <data type>s they are based on.

Tip

When you create a maximal supertype, make sure to execute a CREATE ORDERING statement for that type. When you create a subtype, the matter is less urgent because subtypes inherit from their supertypes.

Other Processes for Object/Relational Users

There are several other SQL statements and expression which are useful, but not vital, when working with Object/Relational SQL. All of these statements are simply analogs of statements we have already seen; only the type of Schema Object is different. We therefore limit ourselves to noting their existence here, in the hope that you’ll find the details to be intuitive.

ALTER TYPE Statement

The ALTER TYPE statement lets you change the definition of a UDT. The required syntax for the ALTER TYPE statement is:

ALTER TYPE <UDT name> <alter type action>

   <alter type action> ::=
   ADD ATTRIBUTE <Attribute definition> |
   DROP ATTRIBUTE <Attribute name>

CREATE METHOD Statement

The CREATE METHOD statement lets you make a new method – it’s actually a special form of the CREATE FUNCTION statement. The required syntax for the CREATE METHOD statement is:

CREATE [ INSTANCE | STATIC ] METHOD <routine name>
(SQL parameter declaration list)
RETURNS <data type>
FOR <UDT name>
[ SPECIFIC <specific name> ]
<routine body>

(See our chapter on procedures for a definition of <routine body>.)

A method is a function which is associated with a UDT. Methods and functions can look quite different, even when they’re the same thing. Consider these two examples:

CREATE FUNCTION f                    CREATE METHOD f
(book_udt)                           ()
RETURNS FLOAT                        RETURNS FLOAT
                                     FOR book_udt

These examples – shorn of unnecessary detail – illustrate a confusing fact: the function and the method are the same thing! When you want to list a method’s parameters, you should “augment” the parameter list by adding one parameter, called the “self” parameter, in ordinal position 1.

CREATE TRANSFORM Statement

The CREATE TRANSFORM statement lets you make a method that will be used in casting for host languages. The required syntax for the CREATE TRANSFORM statement is:

CREATE {TRANSFORM | TRANSFORMS} FOR <UDT name>
<group name>
(<transform element> [ {,<transform element>} ... ])

   <transform element> ::=
   TO SQL WITH <specific routine designator> |
   FROM SQL WITH <specific routine designator>

A TRANSFORM is an SQL-invoked function that is automatically invoked when you transfer UDT values to and from a host language program. It identifies one or two functions – each identified by a <group name> (the name is either an <identifier> of the <keyword> DEFAULT).

The TRANSFORM's TO SQL function casts a predefined <data type> value to a UDT value and gets invoked whenever a UDT value is passed to the DBMS by a host language program or external routine. The TRANSFORM's FROM SQL function casts a UDT value to a predefined <data type> value and gets invoked whenever a UDT value is passed from the DBMS to a host language program or external routine.

DROP CAST Statement

The DROP CAST statement lets you destroy a user-defined cast. The required syntax for the DROP CAST statement is:

DROP CAST (<source type> AS <target type> {CASCADE | RESTRICT}

DROP ORDERING Statement

The DROP ORDERING statement lets you destroy an ordering for a UDT. The required syntax for the DROP ORDERING statement is:

DROP ORDERING FOR <UDT name> {CASCADE | RESTRICT}

DROP TRANSFORM Statement

The DROP TRANSFORM statement lets you destroy a transform. The required syntax for the DROP TRANSFORM statement is:

DROP TRANSFORM {ALL | <group name>} FOR <UDT name> {CASCADE | RESTRICT}

DROP TYPE Statement

The DROP TYPE statement lets you destroy a UDT. The required syntax for the DROP TYPE statement is:

DROP TYPE <UDT name> {CASCADE | RESTRICT}

NEW Statement

The NEW statement lets you invoke a method on a newly-constructed value of a structured type. The required syntax for the NEW statement is:

NEW <method invocation>

TREAT Statement

The TREAT statement lets you modify the declared type of a structured type expression to a type of one of its supertypes. The required syntax for the TREAT statement is:

TREAT (<subtype expression> AS <target UDT>)

DEREF Function

The DEREF function lets you obtain the data value referenced by a <reference type>. The required syntax is:

<deref function> ::=
DEREF (<reference type> expression>)

SPECIFICTYPE Function

The SPECIFICTYPE function returns an SQL_TEXT character string containing the fully-qualified name of the UDT to which a given value belongs. The required syntax is as follows.

<specifictype function> ::=
SPECIFICTYPE (UDT value expression)

Dereference Operation

The <dereference operation> allows you to access a Column of the row identified by a REF value. The required syntax is as follows (for details, see Chapter 12 “Reference Types”).

<dereference operation> ::=
reference_argument -> <attribute name>

TYPE Predicate

The TYPE Predicate returns TRUE if a given value is within the set of values defined by a UDT (or by a list of UDTs which are all in the same subtype family). The required syntax is as follows

<type predicate> ::=
<UDT value expression> IS [ NOT ] OF
([ ONLY ] <UDT-name-1> [ {,<UDT-name-2>} . . . ])

Is object/relational really object-oriented?

Is O/R really OO? Lets go through the features that are considered to be standard parts of an object-oriented specification and see whether SQL3 delivers them. (In this list, “UDT” means “UDTs which are structured types”.)

Classes

UDTs are classes. SQL3 vocabulary may include words like “type family” where most languages’ vocabulary would have “class family”, but the essential functionality is the same.

Encapsulation

SQL3 keeps data representation separate from data access, but does not allow for PRIVATE and PUBLIC Attribute definitions – those are matters for GRANT and REVOKE to handle.

Extensibility

It is possible to put together packages consisting of new type families, methods and representations. Such packages exist today, although to a large extent – for efficiency reasons – the methods are external functions written in some other language.

Inheritance

A UDT may be defined under another UDT. Subtypes inherit the methods and Attributes of their supertypes. Inheritance is single, as in most pure object-oriented languages.

Instantiation

SQL3’s UDTs may be used in place of predefined <data type>s in SQL-data statements. Rows in typed Tables may be treated as objects, complete with object identifiers.

Introspection

SQL already has a clean way to find out what a database’s structural components are: INFORMATION_SCHEMA. It should be no big deal to put wrappers around SQL statements that SELECT from INFORMATION_SCHEMA. Unfortunately, it is more likely that wrappers will instead be put around the unwieldy CLI Catalog functions (see our chapter on CLI Catalog functions).

Polymorphism

Multiple methods in a type family may have the same name. The DBMS will choose the specific method based on the signature. This is Java-like.

It should be admitted that, like any hybrid, SQL3 will look bad to purists on both sides. The relational theorists will note that SQL3’s new features – especially self-referencing columns – allow for “pointer-chasing” demons which we thought were buried along with pre-SQL DBMSs. The object oriented aficionados may decry the ease with which encapsulation can be broken, at least in appearance, by recourse to SQL-92 manipulation methods. From both directions, SQL3 will get flak as a chimera – a monster with a lion’s head on a goat’s body with a serpent’s tail – which certainly betrays its origins as an animal designed by a committee.

All such criticisms are beside the point, though. SQL3 was put together to meet certain needs which are urgent in the current market environment:

  • To meld with object-oriented host languages, especially Java.

  • To allow for multi-media extensions.

  • To pre-empt the chaos which would result from non-standard vendor definitions, or competition from pure OO DBMSs.

Given these goals, Object/Relational SQL3 is a reasonable development. We now wait to see whether users will find that it truly meets these urgent needs.

Dialects

Some SQL vendors have already introduced Object/Relational features. We give especial note to:

UniSQL

A smaller group, which gained note in 1992 for being the first to introduce an Object/Relational SQL (though some O/R features can be traced all the way back to 1980s versions of DB2 and Ingres).

IBM

Offers “relational extenders”, which are packages containing UDTs, user-defined functions, triggers and constraints. At the moment, IBM’s “universal database” is probably the implementation which is closest to the SQL3 Object/Relational spec.

Informix

Bought Illustra (one of the Object/Relational pioneers) and since then has won some praise for its “DataBlades”™ plug-ins, which are described as data-type extensions – that is, Informix emphasizes the utility of defining your own data types, rather than the purported benefits of OO.

Oracle

Oracle8’s “cartridges” are another name for what we might call “class libraries” in pure OO contexts. So far, most cartridges have been multimedia packages, for example “Video cartridges” and “Spatial cartridges”. For performance reasons, Oracle’s Object/Relational features have not been popular among its users, till recently.

And finally, watch Sun. The world is waiting to see the impact of “JavaBlend”™ which purportedly will merge Java with SQL3’s Object/Relational extensions.