CrateDB Amazon Machine Images (AMIs)

An Amazon Machine Image (AMI) is a pre-configured virtual appliance that works with Amazon Elastic Compute Cloud (Amazon EC2).

CrateDB provides a number of AMIs. These AMIs come with Java 8 and CrateDB pre-installed, as well as a configuration that optimizes CrateDB for Amazon EC2.

The CrateDB AMIs are the recommended way to run CrateDB on Amazon EC2.

Table of Contents

AMI Naming Convention

The CrateDB AMI naming convention is:

crate-<VERSION>-<REVISION>-<AMI_REVISION>-<BASE_AMI>

In more detail:

  • <VERSION>-<REVISION> is the CrateDB version and its revision in the format w.x.y-z
  • <AMI_REVISION> is the AMI build revision
  • <BASE_AMI> is the full name of the Amazon Linux base image

For example:

crate-0.51.1-1-1-amzn-ami-hvm-2015.03.0.x86_64

Launching an AMI

AWS Web Interface

To use the AWS website to launch an instance, click the blue ‘Launch Instance’ button and find the available CrateDB AMIs under the Community AMIs section.

AWS GUI

Click the ‘select’ button on the AMI you wish to use and set the instance configuration. The most important options here are the number of instances you require and selecting a security group that opens ports 4200 and 4300.

AWS CLI

You can find the CrateDB AMI via the command line interface, in the format crate-<VERSION>-<REV>-<BASE_AMI>.

For example, to find all CrateDB AMIs available:

aws ec2 describe-images --filters "Name=name,Values=crate-*"
Terminal Output

If you are looking for particular CrateDB version, you can be more precise:

aws ec2 describe-images --filters "Name=name,Values=crate-0.51.1-1-amzn-*"

To run instances based on your AMI of choice, run the following command with the image-id of the CrateDB version you wish to run, the name of a security group that allows the ports CrateDB requires (4200, 4300) and if you want to use the EC2 API for inter-node discovery, a link to a user-data script.

aws ec2 run-instances --image-id ami-96702de1 --count x --instance-type m3.medium --user-data $(base64 user-data.sh) --key-name keyname --security-groups groupname

Configuration

Cloud-Init

The AMI creates a basic CrateDB configuration at first launch (using Cloud-Init). During startup the Cloud-Init process checks if ephemeral devices are attached to the instance. If the device contains no filesystem it will be formatted with the ext4 file system. Otherwise the filesystem already installed will be used. The mount point of each attached device is defined as /mnt/<DEV-NAME> where <DEV-NAME> is the name of the device as listed in /dev. The configuration settings (defined in /etc/crate/crate.yml are listed below and include:

  • Mount ephemeral devices and set the CrateDB data path on their mount points (Attached Devices)
  • Enable EC2 discovery
  • Set node name to instance hostname

In addition to the preconfigured Cloud-Init setup, it is possible to inject commands as a User-Data shell-script.

Note

Note that User-Data scripts are called before the pre-installed Cloud-Init scripts and therefore the injected User-Data settings might be overridden by Cloud-Init (see Cloud-Init).

The User Data File

Amongst other configuration options, the User Data file is primarily used for setting your AWS credentials to make use of the EC2 API for inter-node discovery.

For example:

#!/bin/bash
echo "
export AWS_ACCESS_KEY_ID=''
export AWS_SECRET_ACCESS_KEY=''
" >> /etc/sysconfig/crate

EC2 Discovery

The AMI uses EC2 discovery as a default unicast host discovery mechanism. (see Running CrateDB on Amazon EC2). For security reasons it is strongly recommended to use IAM roles instead of providing your AWS credentials manually on your instances (see Authentication).

Note

EC2 discovery is only available on CrateDB version 0.51.0 or higher.

Adapt CrateDB Configuration

The CrateDB configuration file /etc/crate/crate.yml can be adapted on startup by using the User-Data script (see Cloud-Init). The following example shows how to set the minimum_master_nodes and gateway configuration setting which are essential for a multi node setup. This configuration is used when a cluster with 3 or more nodes is set up.

#!/bin/bash
echo "
discovery.zen.minimum_master_nodes: 2
" >> /etc/crate/crate.yml

echo "
gateway:
  recover_after_nodes: 3
  recover_after_time: 5m
  expected_nodes: 3
" >> /etc/crate/crate.yml

Instance Types

The instance type specifies the combination of CPU, memory, storage and networking capacity. To receive better performance for running queries select an instance type which gives the possibility to attach ephemeral storage. On newer AWS instance types this storage is covered by Solid-State-Drives (short SSD). By choosing one of those instance types CrateDB will automatically mount and store its data on those devices if they are attached to the instance as a block device mapping (see also Attached Devices). Instance Types with additional instance store volumes (SSD or HDD) are currently all instances of type m3, g2, r3, d2 and i2.

Attached Devices

To add a block device mapping before launching an instance, it is possible to use the block-device-mappings parameter with the run-instances command. In this case ephemeral1 will be added as an instance store volume with the device name /dev/sdc. For additional info see device naming on linux instances.

sh$ aws ec2 run-instances --image-id ami-544c1923 --count 1 --instance-type m3.medium --block-device-mappings "[{\"DeviceName\": \"/dev/sdc\",\"VirtualName\":\"ephemeral1\"}]"

Note

Note that the data stored on ephemeral disks is not permanent and only persists during the lifetime of an instance.

If no block device mapping is configured on the EC2 instance, the default data directory of CrateDB is set to /var/lib/crate. The data paths are set in /etc/crate/crate.yml.