Crate for Pythonistas with SQLAlchemy

Author
Philipp Bogensberger
Filed under
Date
July 18, 2014

In this tutorial I want you to show how to interact with Crate using SQLAlchemy. SQLAlchemy is a Python SQL toolkit and Object Relational Mapper.

Crate is a scaling document based database with an sql interface which you can just use out of the box. It's perfect for people who don't want to worry about sharding, replication and scalability.

Installing Crate

The simplest way to launch Crate in a sandboxed environment is to just execute (this will download the latest stable tarball, extract and start crate data):

bash -c "$(curl -L try.crate.io)"

And check if it’s running by opening the admin interface at:

http://localhost:4200/admin

Alternatively you can install Crate through your favorite package manager. (instructions).

Install SQLAlchemy and Crate Client

Create a virtualenv and install the Crate client and SqlAlchemy:

pip install -U crate sqlalchemy

Creating a Simple App

For this tutorial we are going to play with a simple book database. The books will have a title, an author, a publication date and some categories.

Crate and SQLAlchemy Setup

Let's create a file called books.py with the following content:

import uuid
import sqlalchemy as sa
from datetime import datetime
from sqlalchemy import create_engine, Column, String, DateTime
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import scoped_session, sessionmaker
DBSession = scoped_session(sessionmaker())
Base = declarative_base()
engine = create_engine(
    'crate://',
    connect_args={
        'servers': ["http://localhost:4200"],
    },
)
DBSession.configure(bind=engine)
Base.metadata.bind = engine

Crate is a clustered database that usually consists of multiple nodes. It's recommended to add all nodes into servers lists. This enables the DB-API layer to use round-robin to distribute the load and skip a server if it becomes unavailable.

Now that we have a running database connection, we can create the Book model. The model has a primary key called id. We use random UUID's as keys. It also has two string fields, title and author and a publication_date timestamp. Unlike other databases that are usually used with SQLAlchemy, Crate isn't a RDBMS but instead it is document oriented which means that the database should be modeled without relationships.

Crate supports complex objects and arrays. We'll use an array of strings for the categories.

Add the following class to books.py:

class Book(Base):
    __tablename__ = 'books'
    id = Column(String, default=lambda: str(uuid.uuid4()), primary_key=True)
    title = Column(String)
    author = Column(String)
    publication_date = Column(DateTime)
    categories = Column(String, default=[])

Currently the Crate Client (v0.10.5) doesn't support table creation with SQLAlchemy’s ORM and table reflection. So we have to use the admin interface (http://localhost:4200/admin), crash the Crate shell which is delivered with Crate or raw sql via SQLAlchemy.

Crate is great at fulltext searching. To enable the fulltext search it's necessary to define a fulltext index with an analyzer. We use the built-in english analyzer but Crate also provides the possibility to create custom analyzers or to extend built-in analyzers. For further information on fulltext search take a look at the documentation.

This is the create table statement:

CREATE TABLE books (
    id string primary key,
    publication_date timestamp,
    title string INDEX using fulltext with(analyzer='english'),
    author string,
    categories array(string)
);

Now you can create some books:

hitchhiker = Book()
hitchhiker.title = "The Hitchhiker's Guide to the Galaxy"
hitchhiker.author = "Douglas Adams"
hitchhiker.publication_date = datetime(1979, 10, 12)
hitchhiker.categories = ["novel", "documentation", "comedy"]

restaurant = Book()
restaurant.title = "The Restaurant at the End of the Universe"
restaurant.author = "Douglas Adams"
restaurant.publica tion_date = datetime(1980, 1, 1)
restaurant.categories = ["novel", "documentation", "comedy"]

big_red = Book()
big_red.title = "Monty Python's Big Red Book"
big_red.author = "Monty Python"
big_red.publication_date = datetime(1971, 9, 30)
big_red.categories = ["comedy"]

DBSession.add(hitchhiker)
DBSession.add(restaurant)
DBSession.add(big_red)
DBSession.commit()

client = Base.metadata.bind.raw_connection().connection.client
client.sql("refresh table books")

If you are already familiar with SQLAlchemy you know that we add the created books to the current session with DBSession.add() and you may think that we call DBSession.commit() to commit the current instructions. But that's not fully true. Unlike other databases that are usually used with SQLAlchemy, Crate is connected stateless via HTTP. So there are no sessions and no transaction support.

In SQLAlchemy a Session object is used to query the database. This Session object contains a rollback method that in Crate won't do anything at all. Similarly, the commit method will only flush but not actually commit, as there is no commit in Crate.

After INSERT statements are sent to the database the newly inserted rows aren't immediately available for search because the index is only updated periodically (every 1 seconds per default) so we force a refresh of the table.

Querying Books

To fetch all books of a specific author sorted by publication date you can do the following query:

books = DBSession.query(Book).filter_by(author='Douglas Adams') \
        .order_by(Book.publication_date).all()
for book in books:
    print("{0} - {1}".format(book.title, book.publication_date))

Now you can execute the script:

python books.py
The Hitchhiker's Guide to the Galaxy
The Restaurant at the End of the Universe

We also can use SQLAlchemy to query how many books each author has written by using a group_by clause:

authors = DBSession.query(sa.func.count(Book.id), Book.author) \
          .group_by(Book.author) \
          .order_by(sa.desc(sa.func.count(Book.id))).all()
print(authors)
[(2, u'Douglas Adams'), (1, u'Monty Python')]

To perform a fulltext search we use the match function:

books = DBSession.query(Book.title) \
        .filter(func.match(Book.title, 'monty')).all()
for book in books:
    print(book.title)
Monty Python's Big Red Book

Further Information

You can find the full example code here.

If you want to get a deeper insight into Crate and SQLAlchemy you should visit the lovely microblog tutorial. It shows how to integrate Crate and SQLAlchemy into a Pyramid web app. The microblogging tutorial also explains how to do user authentication and how to set up a production environment.

Back to topAll posts