An Amazon Machine Image (AMI) is a pre-configured virtual appliance that works with Amazon Elastic Compute Cloud (Amazon EC2).
CrateDB provides a number of AMIs. These AMIs come with Java 8 and CrateDB pre-installed, as well as a configuration that optimizes CrateDB for Amazon EC2.
The CrateDB AMIs are the recommended way to run CrateDB on Amazon EC2.
The CrateDB AMI naming convention is:
crate-<VERSION>-<REVISION>-<AMI_REVISION>-<BASE_AMI>
In more detail:
<VERSION>-<REVISION>
is the CrateDB version and its revision, in the format w.x.y-z
<AMI_REVISION>
is the AMI build revision
<BASE_AMI>
is the full name of the Amazon Linux base image
For example:
crate-0.51.1-1-1-amzn-ami-hvm-2015.03.0.x86_64
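As a rough illustration of the convention, the following shell sketch splits an AMI name into its parts. The function name parse_crate_ami_name is made up for this example and is not part of any CrateDB or AWS tooling. It also assumes that the version, revision, and AMI revision never contain a hyphen.

```shell
#!/bin/bash
# Hypothetical helper: split a CrateDB AMI name into the parts of the
# naming convention crate-<VERSION>-<REVISION>-<AMI_REVISION>-<BASE_AMI>.
# Assumption: the first three parts never contain a hyphen.
parse_crate_ami_name() {
    name="$1"
    case "$name" in
        crate-*-*-*-*) ;;
        *) echo "not a CrateDB AMI name: $name" >&2; return 1 ;;
    esac
    rest="${name#crate-}"        # drop the "crate-" prefix
    version="${rest%%-*}"        # text up to the first hyphen
    rest="${rest#*-}"
    revision="${rest%%-*}"
    rest="${rest#*-}"
    ami_revision="${rest%%-*}"
    base_ami="${rest#*-}"        # everything that remains
    echo "version=$version"
    echo "revision=$revision"
    echo "ami_revision=$ami_revision"
    echo "base_ami=$base_ami"
}

parse_crate_ami_name "crate-0.51.1-1-1-amzn-ami-hvm-2015.03.0.x86_64"
```

For the example name above, this reports version 0.51.1, revision 1, AMI revision 1, and the base image amzn-ami-hvm-2015.03.0.x86_64.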
To launch an instance from the AWS website, click the blue ‘Launch Instance’ button and find the available CrateDB AMIs under the Community AMIs section.
Click the ‘select’ button on the AMI you wish to use and set the instance configuration. The most important options here are the number of instances you require and a security group that opens ports 4200 and 4300.
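If no suitable security group exists yet, one could be created with the AWS CLI along these lines. This is a sketch, not an official procedure: the function name, the group name, and the CIDR range are placeholders chosen for illustration, and opening both ports to a single CIDR range is an assumption that you should tighten for your own network.

```shell
#!/bin/bash
# Hypothetical helper: create a security group that opens the two ports
# CrateDB requires (4200 and 4300).
# $1: security group name (placeholder)
# $2: CIDR range allowed to connect (placeholder, restrict as needed)
create_crate_security_group() {
    group="$1"
    cidr="$2"
    aws ec2 create-security-group \
        --group-name "$group" \
        --description "CrateDB ports 4200 and 4300"
    # Add one ingress rule per port.
    for port in 4200 4300; do
        aws ec2 authorize-security-group-ingress \
            --group-name "$group" \
            --protocol tcp \
            --port "$port" \
            --cidr "$cidr"
    done
}

# Example call (not executed here):
# create_crate_security_group crate-cluster 10.0.0.0/16
```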
You can find the CrateDB AMIs with the AWS command line interface by filtering on the AMI name, which follows the naming convention crate-<VERSION>-<REVISION>-<AMI_REVISION>-<BASE_AMI>.
For example, to find all CrateDB AMIs available:
aws ec2 describe-images --filters "Name=name,Values=crate-*"
If you are looking for a particular CrateDB version, you can be more precise:
aws ec2 describe-images --filters "Name=name,Values=crate-0.51.1-*"
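The full describe-images output is verbose. As a convenience, the commands above can be wrapped so that only the image ID and name of each match are printed. The function name find_crate_amis is hypothetical. The --query and --output options are general AWS CLI options, and the query expression assumes the usual Images list in the response.

```shell
#!/bin/bash
# Hypothetical helper: list the image ID and name of matching CrateDB AMIs.
# $1: optional name pattern, defaults to all CrateDB AMIs
find_crate_amis() {
    pattern="${1:-crate-*}"
    aws ec2 describe-images \
        --filters "Name=name,Values=${pattern}" \
        --query 'Images[*].[ImageId,Name]' \
        --output text
}

# Example calls (not executed here):
# find_crate_amis
# find_crate_amis 'crate-0.51.1-*'
```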
To launch instances from your AMI of choice, run the following command with the image-id of the AMI you wish to run, the name of a security group that allows the ports CrateDB requires (4200 and 4300), and, if you want to use the EC2 API for inter-node discovery, a user-data script.
aws ec2 run-instances --image-id ami-96702de1 --count x --instance-type m3.medium --user-data file://user-data.sh --key-name keyname --security-groups groupname
The AMI creates a basic CrateDB configuration at first launch (using Cloud-Init). During startup, the Cloud-Init process checks whether ephemeral devices are attached to the instance. If a device contains no filesystem, it is formatted with the ext4 filesystem. Otherwise, the existing filesystem is used. The mount point of each attached device is /mnt/<DEV-NAME>, where <DEV-NAME> is the name of the device as listed in /dev. The configuration settings (defined in /etc/crate/crate.yml) include:
- Mount ephemeral devices and set the CrateDB data path on their mount points (Attached Devices)
- Enable EC2 discovery
- Set node name to instance hostname
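To make the mount point rule concrete, the sketch below derives /mnt/<DEV-NAME> for a list of devices and joins the results into a data path setting. It does not reproduce the actual Cloud-Init scripts shipped with the AMI. The function names and the device names are illustrative, and writing the mount points directly into a comma-separated path.data value is an assumption about the resulting configuration.

```shell
#!/bin/bash
# Hypothetical helpers that mirror the documented mount point rule.
# They only print values and do not format or mount anything.

# /dev/xvdb -> /mnt/xvdb
mount_point_for() {
    echo "/mnt/$(basename "$1")"
}

# Join the mount points of all given devices into one setting line.
# Assumption: the data path is a comma-separated path.data value.
data_paths_for() {
    paths=""
    for dev in "$@"; do
        paths="${paths:+$paths,}$(mount_point_for "$dev")"
    done
    echo "path.data: $paths"
}

data_paths_for /dev/xvdb /dev/xvdc
```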
In addition to the preconfigured Cloud-Init setup, it is possible to inject commands as a User-Data shell-script.
Note
User-Data scripts are called before the pre-installed Cloud-Init scripts, so injected User-Data settings might be overridden by Cloud-Init (see Cloud-Init).
Amongst other configuration options, the User Data file is primarily used for setting your AWS credentials to make use of the EC2 API for inter-node discovery.
For example:
#!/bin/bash
echo "
export AWS_ACCESS_KEY_ID=''
export AWS_SECRET_ACCESS_KEY=''
" >> /etc/sysconfig/crate
The AMI uses EC2 discovery as the default unicast host discovery mechanism (see Running CrateDB on Amazon EC2). For security reasons, it is strongly recommended to use IAM roles instead of providing your AWS credentials manually on your instances (see Authentication).
Note
EC2 discovery is only available on CrateDB version 0.51.0 or higher.
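When following the IAM role recommendation, the role is attached at launch through an instance profile rather than through credentials in the User-Data script. The sketch below shows one way this might look. The function name and the example arguments are placeholders, and it assumes that an instance profile with the permissions needed for EC2 discovery already exists.

```shell
#!/bin/bash
# Hypothetical helper: launch one instance with an IAM instance profile
# attached, so no AWS credentials need to be written to the instance.
# $1: AMI ID, $2: instance profile name, $3: security group name
launch_with_instance_profile() {
    image_id="$1"
    profile="$2"
    group="$3"
    aws ec2 run-instances \
        --image-id "$image_id" \
        --count 1 \
        --instance-type m3.medium \
        --iam-instance-profile "Name=${profile}" \
        --security-groups "$group"
}

# Example call (not executed here, all three arguments are placeholders):
# launch_with_instance_profile ami-96702de1 crate-discovery groupname
```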
The CrateDB configuration file /etc/crate/crate.yml can be adapted on startup by using the User-Data script (see Cloud-Init). The following example shows how to set the minimum_master_nodes and gateway configuration settings, which are essential for a multi-node setup. The values shown are for a cluster of 3 nodes; larger clusters need correspondingly larger values.
#!/bin/bash
echo "
discovery.zen.minimum_master_nodes: 2
" >> /etc/crate/crate.yml
echo "
gateway:
  recover_after_nodes: 3
  recover_after_time: 5m
  expected_nodes: 3
" >> /etc/crate/crate.yml
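For other cluster sizes, the values can be derived from the node count. The sketch below uses the common quorum rule (half the nodes, rounded down, plus one) for minimum_master_nodes, which gives 2 for 3 nodes as in the example above. It assumes that every node is master-eligible and that recover_after_nodes and expected_nodes should equal the node count. The function names are hypothetical.

```shell
#!/bin/bash
# Hypothetical helpers: derive multi-node settings from the node count.
# Assumption: every node in the cluster is master-eligible.

# Quorum rule: floor(nodes / 2) + 1
minimum_master_nodes_for() {
    echo $(( $1 / 2 + 1 ))
}

# Print the crate.yml lines for a cluster of the given size.
# Assumption: recovery waits for all nodes, as in the 3-node example.
cluster_settings_for() {
    nodes="$1"
    cat <<EOF
discovery.zen.minimum_master_nodes: $(minimum_master_nodes_for "$nodes")
gateway:
  recover_after_nodes: ${nodes}
  recover_after_time: 5m
  expected_nodes: ${nodes}
EOF
}

cluster_settings_for 3
```

In a User-Data script, the output could be appended to /etc/crate/crate.yml in the same way as the echo commands above.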
The instance type specifies the combination of CPU, memory, storage, and networking capacity. For better query performance, select an instance type that supports ephemeral (instance store) storage. On newer AWS instance types, this storage is backed by solid-state drives (SSDs). If such devices are attached to the instance through a block device mapping, the CrateDB AMI automatically mounts them and CrateDB stores its data on them (see also Attached Devices).
Instance types with additional instance store volumes (SSD or HDD) are currently all instances of type m3, g2, r3, d2 and i2.
To add a block device mapping before launching an instance, use the block-device-mappings parameter of the run-instances command. In the following example, ephemeral1 is added as an instance store volume with the device name /dev/sdc. For additional information, see device naming on Linux instances.
aws ec2 run-instances --image-id ami-544c1923 --count 1 --instance-type m3.medium --block-device-mappings "[{\"DeviceName\": \"/dev/sdc\",\"VirtualName\":\"ephemeral1\"}]"
Note
Note that the data stored on ephemeral disks is not permanent and only persists during the lifetime of an instance.
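Writing the escaped JSON for block-device-mappings by hand becomes error-prone with more than one volume. The following sketch generates it instead. The function name is hypothetical, and the pairing of ephemeral0 with /dev/sdb, ephemeral1 with /dev/sdc, and so on is an assumption that extends the single mapping shown above. How many instance store volumes are actually available depends on the instance type.

```shell
#!/bin/bash
# Hypothetical helper: build the JSON value for --block-device-mappings.
# $1: number of instance store volumes to map (at most 4 in this sketch)
# Assumption: ephemeral0 -> /dev/sdb, ephemeral1 -> /dev/sdc, and so on.
block_device_mappings() {
    count="$1"
    json=""
    i=0
    for letter in b c d e; do
        if [ "$i" -ge "$count" ]; then
            break
        fi
        entry="{\"DeviceName\":\"/dev/sd${letter}\",\"VirtualName\":\"ephemeral${i}\"}"
        json="${json:+$json,}${entry}"
        i=$((i + 1))
    done
    echo "[${json}]"
}

block_device_mappings 2
```

The output can then be passed to the command as --block-device-mappings "$(block_device_mappings 2)".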
If no block device mapping is configured on the EC2 instance, the default data directory of CrateDB is set to /var/lib/crate. The data paths are set in /etc/crate/crate.yml.