Logical replication setup between CrateDB clusters

Logical replication is a method of data replication between multiple clusters. As a publish/subscribe model, it allows a publishing cluster to make certain tables available for subscription. Subscribing clusters pull changes from a publication and replay them on their side.

Table of contents

Requirements

Note

Logical replication is available in CrateDB 4.8 and later.

To replicate data from one cluster to another, we need two CrateDB clusters. For this example, we spin them up locally as two single-node Docker containers.

A cluster isn’t limited to be either a publisher or subscriber, but can take both roles for different tables. However, for the sake of simplicity, we will refer to the two clusters as cluster-publishing and cluster-subscribing here.

sh$ docker run \
      --name cluster-publishing \
      --detach \
      --publish 4201:4200 \
      --publish 5433:5432 \
      --env CRATE_HEAP_SIZE=1g \
      crate:latest \
      -Cdiscovery.type=single-node
sh$ docker run \
      --name cluster-subscribing \
      --detach \
      --publish 4202:4200 \
      --publish 5434:5432 \
      --env CRATE_HEAP_SIZE=1g \
      crate:latest \
      -Cdiscovery.type=single-node

We apply an offset to the ports of each cluster, so they don’t overlap. The Admin UIs of cluster-publishing and cluster-subscribing are accessible at http://localhost:4201/ and http://localhost:4202/ respectively. All SQL statements discussed below can be executed via the corresponding Admin UI.

As the subscriber pulls changes from the publisher, the publisher needs to accept incoming network connections from the subscriber. In our example setup, this is automatically given because both containers are running on the same host. For reaching each other, they will use the special host.docker.internal address.

Setting up a publication

Note

All SQL statements in this section are executed on cluster-publishing.

Before setting up the replication, we create a simple table that is going to be the subject of replication. A table with the same name must not exist yet on cluster-subscribing.

CREATE TABLE doc.temperature_data (
  ts TIMESTAMP NOT NULL,
  temperature FLOAT NOT NULL
);

Next, a publication is created with CREATE PUBLICATION. The publication marks our table as being available for replication, but otherwise does not imply any activity yet.

CREATE PUBLICATION temperature_publication FOR TABLE doc.temperature_data;

To verify the publication was created successfully, we query the pg_publication system table. It should contain one row with the publication just added.

SELECT *
FROM pg_publication;

With this, we are already all set on the publication side.

Setting up a subscription

Note

All SQL statements in this section are executed on cluster-subscribing.

A subscription needs connection information to the publishing cluster as well as the name of the previously created publication to subscribe to.

Specifying the mode parameter with CREATE SUBSCRIPTION, the connection can be established via either the transport protocol or the PostgreSQL protocol. By setting the parameter to pg_tunnel, we use the PostgreSQL protocol.

CREATE SUBSCRIPTION temperature_subscription
CONNECTION 'crate://host.docker.internal:5433?user=crate&mode=pg_tunnel'
PUBLICATION temperature_publication;

Note

The password parameter is omitted, as local connections using the crate user don’t require one. If you are connecting remotely, provide user and password of a user with DQL privileges on published tables.

After a few seconds, the table doc.temperature_data should appear on cluster-subscribing. At this point, it is still empty as we didn’t insert any data yet.

To verify the operational status of the subscription, the system tables pg_subscription and pg_subscription_rel can be queried. The below query returns the name of the subscription (subname), its state (srsubstate), as well as any potential error message (srsubstate_reason).

SELECT subname, r.srrelid::TEXT, srsubstate, srsubstate_reason
FROM pg_subscription s
LEFT JOIN pg_subscription_rel r ON s.oid = r.srsubid;

srsubstate returns the value r, indicating the initial replication of the empty table has finished and is awaiting new changes.

Any subsequent INSERT, UPDATE or DELETE operations on cluster-publishing will now replicate to cluster-subscribing. On cluster-subscribing the table is read-only, meaning only the publisher may add or modify rows.