Running Crate with Docker on AWS

Christian Haudum
February 21, 2015

Amazon's cloud (AWS) is a great way to run a Crate cluster. You can leverage all of the power of AWS's EC2, EBS, and more to build a truly elastic data store. Here, we'll go over just the basics of getting a three-node cluster running using Docker, although you can easily deploy Crate without Docker. We're just big fans of containers!

First, we'll assume that you have the AWS command line interface (awscli) set up and running properly, and that you're using the US West (N. California) region. If you're using a different region, you'll need to look up different image IDs to match it. We'll mention it when it comes up.

We'll be setting up our instances using EC2-Classic, not VPC (Virtual Private Cloud). You can do it either way, but for simplicity we're just going to use classic EC2. For more information on this, see the AWS docs on EC2 & VPC.

Set up the user data script

Before we can set up the instances we need to supply a configuration script. This script is run on the instance after it is provisioned. You don't have to supply the script if, for example, you use some other configuration tool like Puppet, Chef, or Salt. But for this example, we need to put the following code in a file, whose name we later pass to the --user-data argument:

#!/bin/bash
# Update the instance:
yum -y update
# Create the filesystems for Crate's data storage
mkfs.ext4 /dev/sdb
mkfs.ext4 /dev/sdc
# Make the directories to mount the drives
mkdir /data
mkdir /data/data{1,2}
# Mount the drives to the data directories
mount /dev/sdb /data/data1
mount /dev/sdc /data/data2
# Add a crate user and give it ownership of the data directories
adduser crate -s /sbin/nologin
chown -R crate:crate /data
# Install and start Docker
yum install -y docker
service docker start
# Get the latest Crate container for Docker
docker pull crate:latest

Since we aren't able to use multicast discovery, that's all we can do right now to set up Crate in advance. Once we have the instances running, we can then tell Crate where to find the rest of the cluster's nodes.

Set up the instances

Let's set up three instances. Don't forget to put your key name into the --key-name flag. Notice that we pass the name of the user data script (above) to the --user-data argument:

$ aws ec2 run-instances --count 3 --image-id ami-4b6f650e \
  --instance-type m3.xlarge \
  --key-name YOUR_KEY_NAME \
  --user-data file:// \
  --block-device-mappings "[{\"DeviceName\": \"/dev/xvda\",\"Ebs\":{\"SnapshotId\": \"snap-14cff4d5\", \"VolumeSize\": 8, \"DeleteOnTermination\": true, \"VolumeType\": \"standard\"}}, {\"DeviceName\": \"/dev/sdb\",\"Ebs\":{\"VolumeSize\": 40, \"DeleteOnTermination\": true, \"VolumeType\": \"gp2\"}}, {\"DeviceName\": \"/dev/sdc\",\"Ebs\":{\"VolumeSize\": 40, \"DeleteOnTermination\": true, \"VolumeType\": \"gp2\"}}]"

If you're not in the US West (N. California) region, you'll need to look up the proper AMI image ID and EBS snapshot ID for your region. The AMI to search for is "amzn-ami-hvm-2014.09.1.x86_64-ebs" and the snapshot is "amzn-ami-hvm-2014.09.1.x86_64".
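If you'd rather do that lookup from the command line, something like the following should work. This is only a sketch: the function name find_ami and the example region are ours, and the image may no longer be published in every region, so double-check the result in the EC2 Management Console.

```shell
# find_ami is a hypothetical helper name; pass the region you deploy to.
# It asks EC2 for the Amazon-owned image with the name used in this post and
# prints its AMI ID and the snapshot ID of its first block device.
find_ami() {
  aws ec2 describe-images \
    --region "$1" \
    --owners amazon \
    --filters "Name=name,Values=amzn-ami-hvm-2014.09.1.x86_64-ebs" \
    --query 'Images[0].[ImageId, BlockDeviceMappings[0].Ebs.SnapshotId]' \
    --output text
}

# Example call (the region is an assumption, use your own):
# find_ami eu-west-1
```

The two values it prints are what you would substitute for the --image-id value and the SnapshotId in the run-instances command above.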

When you're finished you'll need to find the Public DNS names to SSH into the instances. To get a list of instances you can look them up in the EC2 Management Console, or use the command line:

$ aws ec2 describe-instances
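The full output of describe-instances is fairly long. A minimal sketch for pulling out just the public DNS names is shown here; the helper name list_public_dns is ours, and it assumes the only running instances in your region are the ones you just launched.

```shell
# list_public_dns is a hypothetical helper name.
# Prints the public DNS name of every running instance, one per line.
list_public_dns() {
  aws ec2 describe-instances \
    --filters "Name=instance-state-name,Values=running" \
    --query 'Reservations[].Instances[].PublicDnsName' \
    --output text | tr '\t' '\n'
}

# Example call:
# list_public_dns
```

If you have other instances running, add another filter (for example on the key name) to narrow the list down.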

Start Crate

Now let's log into an instance and get Crate configured and running:

$ ssh -i ~/.ssh/YOUR_KEY_FILE

First we'll set up some arguments to pass to Docker. We'll need to pass it the IP address of the instance, as well as the DNS names of the other nodes:
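One possible way to set these two variables is sketched below. The helper names and the example.com hostnames are placeholders we made up for illustration, and the metadata URL assumes the instance metadata service is reachable without a session token.

```shell
# Private IP of this instance, read from the EC2 instance metadata service.
# Wrapped in a function so it only runs when called on an EC2 instance.
get_private_ip() {
  curl -s http://169.254.169.254/latest/meta-data/local-ipv4
}

# Join hostnames into a comma-separated list, appending the transport port.
join_hosts() {
  hosts=""
  for node in "$@"; do
    hosts="${hosts:+$hosts,}$node:4300"
  done
  echo "$hosts"
}

# Placeholder DNS names (hypothetical); substitute the names of your instances.
HOSTS=$(join_hosts node1.example.com node2.example.com node3.example.com)
echo "$HOSTS"

# On the instance itself:
# PRIVATE_IP=$(get_private_ip)
```

Building the list with a small function keeps the :4300 suffix consistent across all entries.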


You'll need to replace the DNS entries with your own. Don't forget to add :4300 to the end of each entry.

Now we're ready to start Crate's Docker container:

$ docker run -d -p 4200:4200 -p 4300:4300 \
  --volume /data:/data \
  --env CRATE_HEAP_SIZE=8g \
  crate:latest \
  crate -Des.path.data="/data/data1,/data/data2" \
    -Des.multicast.enabled=false \
    -Des.network.publish_host="$PRIVATE_IP" \
    -Des.discovery.zen.ping.unicast.hosts="$HOSTS"

Run this on each instance, and your cluster should be up and running.

Verify the cluster

To verify the cluster is running properly and that all nodes have found each other, run the following curl command:

$ curl -XPOST "$PRIVATE_IP:4200/_sql?pretty" -d '{ "stmt": "select name from sys.nodes" }'

You should get output that lists all three nodes, like so:

{
  "cols" : [ "name" ],
  "duration" : 23,
  "rows" : [ [ "ElectroCute" ], [ "FooBar" ], [ "ToxicRainbow" ] ],
  "rowcount" : 3
}
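If you want to check the node count from a script rather than by eye, a rough sketch is to pull the rowcount field out of the response. The helper name node_count is ours, and the sed pattern is a quick text match rather than a real JSON parser, so treat it as illustrative only.

```shell
# node_count is a hypothetical helper: it reads a /_sql JSON response on stdin
# and prints the value of its "rowcount" field.
node_count() {
  tr -d ' \n' | sed -n 's/.*"rowcount":\([0-9][0-9]*\).*/\1/p'
}

# Usage on an instance (assumes PRIVATE_IP is set):
# curl -s -XPOST "$PRIVATE_IP:4200/_sql" \
#   -d '{ "stmt": "select name from sys.nodes" }' | node_count
```

For the three-node cluster in this post, the number it prints should match the number of instances you started.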

Your Crate cluster should be up and running!

As you can see, we had to do a bit of manual configuration because multicast discovery was not available. As a result, we need to know the hostnames of the instances before starting Crate, so that the cluster nodes can discover each other.

Multicast is generally not supported by cloud providers, which is a bummer. Please check out our blog post on how to use Weave with Crate to circumvent this problem when using containers. Weave is a software-defined networking (SDN) technology that integrates with Docker. By using Weave, you can easily set up a truly elastic data store leveraging all of the features of EC2.
