Using the SQLAlchemy Dialect

SQLAlchemy is a popular Object-Relational Mapping (ORM) tool for Python.

The CrateDB Python client library provides support for SQLAlchemy. A CrateDB dialect is registered at installation time and can be used without further configuration.

The CrateDB Python client library works with SQLAlchemy versions 1.0, 1.1 and 1.2.

Note

This page documents the CrateDB SQLAlchemy dialect.

For help using the CrateDB Database API client, consult the client documentation.

See also

Supplementary information about the CrateDB SQLAlchemy dialect can be found in the data types appendix and the compatibility notes.

For general help using SQLAlchemy, consult the SQLAlchemy tutorial or the SQLAlchemy library .

Table of Contents

Connecting

Database URLs

An SQLAlchemy database is represented by special type of Uniform Resource Locator (URL) called a database URL.

The simplest database URL for CrateDB looks like this:

crate://<HOST>

Here, <HOST> is the node host string.

A host string looks like this:

<HOST_ADDR>:<PORT>

Here, <HOST_ADDR> is the hostname or IP address of the CrateDB node and <PORT> is a valid psql.port number.

Example host strings:

  • localhost:4200
  • crate-1.vm.example.com:4200
  • 198.51.100.1:4200

Tip

If <HOST> is blank (i.e. just crate://) then localhost:4200 will be assumed.

Getting a Connection

Create an Engine

You can connect to CrateDB using the create_engine method. This method takes a database URL.

Import the sa module, like so:

>>> import sqlalchemy as sa

To connect to localhost:4200, you can do this:

>>> engine = sa.create_engine('crate://')

To connect to crate-1.vm.example.com:4200, you would do this:

>>> engine = sa.create_engine('crate://crate-1.vm.example.com:4200')

If your CrateDB cluster has multiple nodes, however, we recommend that you configure all of them. You can do that by specifying the crate:// database URL and passing in a list of host strings passed using the connect_args argument, like so:

>>> engine = sa.create_engine('crate://', connect_args={
...     'servers': ['198.51.100.1:4200', '198.51.100.2:4200']
... })

When you do this, the Database API layer will use its round-robin implementation.

The client does not validate SSL server certificates by default. To configure this behaviour, SSL verification options can be passed in via connect_args too:

>>> engine = sa.create_engine(
...     'crate://',
...     connect_args={
...         'servers': ['198.51.100.1:4200', '198.51.100.2:4200'],
...         'verify_ssl_cert': True,
...         'ca_cert': '<PATH_TO_CA_CERT>',
...     }
... )

Here, <PATH_TO_CA_CERT> should be replaced with the path to the correct CA certificate.

Get a Session

Once you have an CrateDB engine set up, you can create and use an SQLAlchemy Session object to execute queries:

>>> from sqlalchemy.orm import sessionmaker

>>> Session = sessionmaker(bind=engine)
>>> session = Session()

See also

The SQLAlchemy has more documentation on sessions.

Tables

Table Definition

Here is an example SQLAlchemy table definition using the declarative system:

>>> from sqlalchemy.ext import declarative
>>> from crate.client.sqlalchemy import types
>>> from uuid import uuid4

>>> def gen_key():
...     return str(uuid4())

>>> Base = declarative.declarative_base(bind=engine)

>>> class Character(Base):
...
...     __tablename__ = 'characters'
...
...     id = sa.Column(sa.String, primary_key=True, default=gen_key)
...     name = sa.Column(sa.String)
...     quote = sa.Column(sa.String)
...     details = sa.Column(types.Object)
...     more_details = sa.Column(ObjectArray)
...     name_ft = sa.Column(sa.String)
...     quote_ft = sa.Column(sa.String)
...
...     __mapper_args__ = {
...         'exclude_properties': ['name_ft', 'quote_ft']
...     }

In this example, we:

  • Define a gen_key function that produces UUIDs
  • Set up a Base class for the table
  • Create the Characters class for the characters table
  • Use the gen_key function to provide a default value for the id column (which is also the primary key)
  • Use standard SQLAlchemy types for the id, name, and quote columns
  • Use the Object extension type for the details column
  • Use the ObjectArray extension type for the more_details column
  • Set up the name_ft and quote_ft fulltext indexes, but exclude them from the mapping (so SQLAlchemy doesn’t try to update them as if they were columns)

Tip

This example table is used throughout the rest of this document.

See also

The SQLAlchemy documentation has more information about working with tables.

Extension Types

In the example SQLAlchemy table definition above, we are making use of the two extension data types that the CrateDB SQLAlchemy dialect provides.

See also

The appendix has a full data types reference.

Object

Objects are a common, and useful, data type when using CrateDB, so the CrateDB SQLAlchemy dialect provides a custom Object type extension for working with these values.

Here’s how you might use the SQLAlchemy Session object to insert two characters:

>>> # use the crate engine from earlier examples
>>> Session = sessionmaker(bind=crate)
>>> session = Session()

>>> arthur = Character(name='Arthur Dent')
>>> arthur.details = {}
>>> arthur.details['gender'] = 'male'
>>> arthur.details['species'] = 'human'
>>> session.add(arthur)

>>> trillian = Character(name='Tricia McMillan')
>>> trillian.details = {}
>>> trillian.quote = "We're on a space ship Arthur. In space."
>>> trillian.details['gender'] = 'female'
>>> trillian.details['species'] = 'human'
>>> trillian.details['female_only_attribute'] = 1
>>> session.add(trillian)
>>> session.commit()

Note

The information we supply via the details column isn’t defined in the original SQLAlchemy table definition. These details can be specified when you create the column in CrateDB, or you can configure the column to support dynamic values.

Note

Behind the scenes, if you update an Object property and commit that change, the UPDATE statement sent to CrateDB will only include the data necessary to update the changed subcolumns.

ObjectArray

In addition to the Object type, the CrateDB SQLAlchemy dialect also provides a ObjectArray type, which is structured as a list of dictionaries.

Here’s how you might set the value of an ObjectArray column:

>>> arthur.more_details = [{'foo': 1, 'bar': 10}, {'foo': 2}]
>>> session.commit()

If you append an object, like this:

>>> arthur.more_details.append({'foo': 3})
>>> session.commit()

The resulting object will look like this:

>>> arthur.more_details
[{'foo': 1, 'bar': 10}, {'foo': 2}, {'foo': 3}]

Caution

Behind the scenes, if you update an ObjectArray and commit that change, the UPDATE statement sent to CrateDB will include all of the ObjectArray data.

Querying

When the commit method is called, two INSERT statements are sent to CrateDB. However, the newly inserted rows aren’t immediately available for querying because the table index is only updated periodically (one second, by default, which is a short time for me and you, but a long time for your code).

You can request a table refresh to update the index manually:

>>> refresh("characters")

Note

Newly inserted rows can still be queried immediately if a lookup by primary key is done.

Here’s what a regular select might look like:

>>> query = session.query(Character).order_by(Character.name)
>>> [(c.name, c.details['gender']) for c in query]
[('Arthur Dent', 'male'), ('Tricia McMillan', 'female')]

You can also select a portion of each record, and this even works inside Object columns:

>>> sorted(session.query(Character.details['gender']).all())
[('female',), ('male',)]

You can also filter on attributes inside the Object column:

>>> query = session.query(Character.name)
>>> query.filter(Character.details['gender'] == 'male').all()
[('Arthur Dent',)]

To filter on an ObjectArray, you have to do something like this:

>>> from sqlalchemy.sql import operators

>>> query = session.query(Character.name)
>>> query.filter(Character.more_details['foo'].any(1, operator=operators.eq)).all()
[(u'Arthur Dent',)]

Here, we’re using the any method along with the eq Python operator to match the value 1 against the foo key of any dictionary in the more_details list.

Only one of the keys has to match for the row to be returned.

This works, because ObjectArray keys return a list of all values for that key, like so:

>>> arthur.more_details['foo']
[1, 2, 3]

Querying a key of an ObjectArray column will return all values for that key for all matching rows:

>>> query = session.query(Character.more_details['foo']).order_by(Character.name)
>>> query.all()
[([1, 2, 3],), (None,)]

Aggregate Functions

SQLAlchemy supports different ways to count result rows. However, because CrateDB doesn’t support subqueries, counts must be written in one of the following two ways.

This counts the number of character records by counting the number of id values in the table:

>>> session.query(sa.func.count(Character.id)).scalar()
2

Note

If you’re doing it like this, the column you select must be the primary key.

And this counts the number of character records by selecting all columns, and then counting the number of rows:

>>> session.query(sa.func.count('*')).select_from(Character).scalar()
2

You can layer in calls to group_by and order_by when you use one of these methods, like so:

>>> session.query(sa.func.count(Character.id), Character.name) \
...     .group_by(Character.name) \
...     .order_by(sa.desc(sa.func.count(Character.id))) \
...     .order_by(Character.name).all()
[(1, u'Arthur Dent'), (1, u'Tricia McMillan')]