The Guide for Time Series Data Projects is out.

Download now
Skip to content
Blog

Crate for Pythonistas with SQLAlchemy

In this tutorial, I would like to show you how to interact with Crate using SQLAlchemy. SQLAlchemy is a Python SQL toolkit and Object Relational Mapper.

CrateDB is a scaling document-based database with an SQL interface that you can use out of the box. It's perfect for people who don't want to worry about sharding, replication, and scalability.

Installing Crate

The simplest way to launch Crate in a Docker container (this command will download the latest Docker image):

docker run --publish 4200:4200 --publish 5432:5432 --pull=always crate:latest -Cdiscovery.type=single-node

And check if it's running by opening the admin interface at:

http://localhost:4200/admin 

Alternatively, you can install Crate through your favorite package manager. (instructions).

Install SQLAlchemy and Crate Client

Create a virtualenv and install the Crate client and SqlAlchemy:

pip install -U crate sqlalchemy 


Creating a Simple App

For this tutorial we are going to play with a simple book database. The books will have a title, an author, a publication date and some categories.

Crate and SQLAlchemy Setup

Let's create a file  books.py with the following content:

import uuid
import sqlalchemy as sa 
from datetime import datetime 
from sqlalchemy.orm import declarative_base
from sqlalchemy.orm import scoped_session, sessionmaker

DBSession = scoped_session(sessionmaker())
Base = declarative_base()
engine = sa.create_engine( 'crate://', connect_args={ 'servers': ["http://localhost:4200"], }, )
DBSession.configure(bind=engine)
Base.metadata.bind = engine


Crate is a clustered database that usually consists of multiple nodes. It's recommended to add all nodes into servers lists. This enables the DB-API layer to use round-robin to distribute the load and skip a server if it becomes unavailable.

Now that we have a running database connection, we can create the Book model. The model has a primary key called id. We use random UUID's as keys. It also has two string fields, title and author, and a publication_date timestamp. Unlike other databases that are usually used with SQLAlchemy, CrateDB isn't an RDBMS instead, it is document-oriented which means that the database should be modeled without relationships.

CrateDB supports complex objects and arrays. We'll use an array of strings for the categories.

Add the following class to books.py:

class Book(Base):
    __tablename__ = 'books'

    id = sa.Column(sa.String, default=lambda: str(uuid.uuid4()), primary_key=True)

    title = sa.Column(sa.String)

    author = sa.Column(sa.String)

    publication_date = sa.Column(sa.DateTime)

    categories = sa.Column(sa.String, default=[])


CrateDB is a great solution for full-text search. To enable the fulltext search it's necessary to define a fulltext index with an analyzer. We use the built-in english analyzer but Crate also provides the possibility to create custom analyzers or to extend built-in analyzers. For further information on fulltext search take a look at the documentation.

Now, let's create a table:

CREATE TABLE books (
    id string primary key,
    publication_date timestamp,
    title string INDEX using fulltext with(analyzer='english'),
    author string,
    categories array(string)
); 


Now you can create some books:

hitchhiker = Book()
hitchhiker.title = "The Hitchhiker's Guide to the Galaxy"
hitchhiker.author = "Douglas Adams"
hitchhiker.publication_date = datetime(1979, 10, 12)
hitchhiker.categories = ["novel", "documentation", "comedy"]

restaurant = Book()
restaurant.title = "The Restaurant at the End of the Universe"
restaurant.author = "Douglas Adams"
restaurant.publication_date = datetime(1980, 1, 1)
restaurant.categories = ["novel", "documentation", "comedy"]

big_red = Book()
big_red.title = "Monty Python's Big Red Book"
big_red.author = "Monty Python"
big_red.publication_date = datetime(1971, 9, 30)
big_red.categories = ["comedy"]

DBSession.add(hitchhiker)
DBSession.add(restaurant)
DBSession.add(big_red)

DBSession.commit()

client = Base.metadata.bind.raw_connection().connection.client
client.sql("refresh table books")


If you are already familiar with SQLAlchemy you know that we add the created books to the current session with DBSession.add() and you may think that we call DBSession.commit() to commit the current instructions. But that's not fully true. Unlike other databases that are usually used with SQLAlchemy, CrateDB is connected stateless via HTTP. So there are no sessions and no transaction support.

In SQLAlchemy an Session object is used to query the database. This Session object contains a rollback method that in Crate won't do anything at all. Similarly, the commit method will only flush but not actually commit, as there is no commit in Crate.

After INSERT statements are sent to the database the newly inserted rows aren't immediately available for search because the index is only updated periodically (every 1 second per default) so we force a refresh of the table.

Querying Books

To fetch all books of a specific author sorted by publication date you can do the following query:

books = DBSession.query(Book).filter_by(author='Douglas Adams') \
    .order_by(Book.publication_date).all()

for book in books:
    print("{0} - {1}".format(book.title, book.publication_date))


Now you can execute the script:

python books.py
The Hitchhiker's Guide to the Galaxy
The Restaurant at the End of the Universe 


We also can use SQLAlchemy to query how many books each author has written by using a group_by clause:

authors = DBSession.query(sa.func.count(Book.id), Book.author) \
    .group_by(Book.author) \
    .order_by(sa.desc(sa.func.count(Book.id))).all()

print(authors)

[(2, u'Douglas Adams'), (1, u'Monty Python')]


To perform a full-text search we use the match function:

books = DBSession.query(Book.title) \
    .filter(sa.func.match(Book.title, 'monty')).all()

for book in books:
    print(book.title)
Monty Python's Big Red Book 


Further Information

If you want to get a deeper insight into Crate and SQLAlchemy, please visit our documentation.