Making Sense of IIoT, Big Data, and Time Series Databases with CrateDB Cloud

2020-02-25, by Matthijs Krul

This is an example of time series data.

Lengthy table of rows from the NYC taxi time series data, 2018
Not the easiest thing to read, at least in this format. You may have such data on your hands or have encountered it before. But it's not always easy to understand what it means and how to make use of it. In this article, we will explain just that: what time series are, how to use them, and what you need out of a time series database.

So how do you turn that raw data into, for example, this?

NYC Taxi Data Visualization

The answer is through CrateDB Cloud, the database-as-a-service hosted in the cloud that is based on CrateDB - a time series database optimized for industrial internet of things (IIoT) use cases involving large amounts of machine data.

Well, that is a mouthful. So let's start with the basics. What if you find yourself with an IIoT use case, a lot - and we mean a lot - of sensor data on your hands, but you are unsure how best to store it for analysis? Or you've heard of time series and the internet of things and other such terms, but are unsure what all the above means or how it functions?

Let us explain what it's all about. We will explain where time series data come from, the fundamentals of time series databases, and how CrateDB Cloud addresses the need for an IIoT-optimized time series database. Of course, if you're already familiar with IIoT and just want to understand how CrateDB Cloud addresses your needs, feel free to skip to the section about CrateDB Cloud.

Understanding the internet of things and time series data

We live in a world of Big Data. Ever-growing quantities of data are extracted from the world around us, and ever more services, institutions and companies are becoming dependent on the flow of large volumes of data for their operation. Often, the sources of these data flows already existed. What is new is that their value for generating data over time is becoming recognized.

In other cases, the Internet of Things (IoT) and its associated data generation technology is making it possible to extract and transmit data points where none were available before. New sensors being available means new sensor readings, and therefore new data sources.

To illustrate the point, let's look at a small-scale and a large-scale example. A small source of data could be an IoT-enabled fridge in your home. Equipped with sensors, the fridge continually measures certain relevant values - temperature, humidity, energy consumption, perhaps even capacity.

These data points are then continually transmitted to a central database on a server, which gathers and tracks them. (This could be the fridge manufacturer's database, or the IoT company's database, or even the consumer's own.)

The front screen of an LG IoT smart refrigerator
An LG 'smart fridge' display. Credit: David Berkowitz CC BY 2.0

In the past, fridges would still have had temperatures and energy consumption, but nothing would have tracked such data or stored it. Now, such data is recognized as valuable: among other things, IoT data can help increase productivity, support better equipment monitoring and failure prediction, allow for (further) automation, and even improve customer marketing. Moreover, new IoT technologies make it easy to record and transfer such information.

The large-scale example works along the same lines. Take for instance the New York City Taxi and Limousine Commission, from whom we got the time series data shown at the start of this article. All the taxicabs licensed with the Commission digitally track every journey a customer makes with that cab.

NYC taxicabs lined up
Taxicabs in New York City, NY. (License-free image)

This means a lot of data: the fare, the tip, the distance travelled, the start and end times, and much more. And this for every journey with every cab in a city as large - and as popular with tourists - as New York. You can imagine this adds up quickly.

What is a time series database?

At the heart of all this Big Data collection are time series databases. As the name suggests, a time series database gathers and tracks data in the form of a time series. But what are such time series data?

Essentially, time series data are a way of presenting data points as points on a timeline. Represented in the form of a graph, this means the data points (the values) lie along one axis, and time along another. Of course, the exact dimensions depend on the nature of the data and the relevant timescale.

A simple example of time series data represented in this way is the stock market. The data points are the prices of a particular stock or index. The other axis is time - a day of trading, say. The result is a graph that looks something like this:

A time series graph displaying currency prices
Time series graphs visualizing price fluctuations in currency and commodities markets. (License-free image)

The data gathered through IoT are essentially of this nature. Let's return again to the NYC taxicab data, as they are a good example of this: values for different dimensions (cab fare, travel duration, etc.) gathered by sensors (in the cab) collected and organized sequentially by time (a timestamp for each journey start).
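To make that structure concrete, here is a minimal Python sketch of a time series: a handful of records, each carrying a timestamp plus a few measured values, ordered by time. The field names and the sample values are assumptions made up for illustration; the real taxi data set has many more columns.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class TripRecord:
    # Hypothetical field names, chosen for illustration only.
    pickup_time: datetime   # the timestamp that orders the series
    fare: float             # one measured dimension
    tip: float              # another measured dimension
    distance_miles: float   # and another


def as_time_series(records):
    """Return the records ordered sequentially by their timestamp."""
    return sorted(records, key=lambda r: r.pickup_time)


# Made-up sample values, arriving out of order.
trips = [
    TripRecord(datetime(2018, 1, 1, 8, 15), fare=12.5, tip=2.0, distance_miles=2.1),
    TripRecord(datetime(2018, 1, 1, 3, 40), fare=41.0, tip=8.0, distance_miles=11.3),
    TripRecord(datetime(2018, 1, 1, 8, 50), fare=9.0, tip=1.5, distance_miles=1.4),
]

series = as_time_series(trips)
for record in series:
    print(record.pickup_time.isoformat(), record.fare, record.tip)
```

The only thing that makes this a time series rather than a plain list is that every record has a timestamp and the records are read in that order.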

For example, from the graphs at the top of this article you can clearly see that while relatively few New Yorkers take taxi trips in the very early morning, the ones that do tend to spend a lot. Perhaps they have a long way to go home after going clubbing in Manhattan.
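An insight like this typically comes from grouping the records by a time bucket and summarizing each bucket. The sketch below shows the idea in plain Python with made-up numbers; the function name and the sample amounts are assumptions for illustration, not the actual analysis behind the graphs.

```python
from collections import defaultdict
from datetime import datetime


def hourly_summary(trips):
    """Group (pickup_time, total_amount) pairs by hour of day.

    Returns a dict mapping hour -> (trip_count, average_amount).
    """
    buckets = defaultdict(list)
    for pickup_time, total_amount in trips:
        buckets[pickup_time.hour].append(total_amount)
    return {
        hour: (len(amounts), sum(amounts) / len(amounts))
        for hour, amounts in sorted(buckets.items())
    }


# Made-up sample data: few early-morning trips, but expensive ones.
sample_trips = [
    (datetime(2018, 1, 1, 3, 40), 50.0),
    (datetime(2018, 1, 1, 8, 15), 14.0),
    (datetime(2018, 1, 1, 8, 50), 10.0),
    (datetime(2018, 1, 2, 8, 5), 12.0),
]

summary = hourly_summary(sample_trips)
for hour, (count, average) in summary.items():
    print(f"{hour:02d}:00  trips={count}  average={average:.2f}")
```

With millions of rows this kind of grouping is exactly the work you would want the database to do for you, rather than application code.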

Since this kind of data very quickly becomes very large in volume, it calls for databases tailored to such volumes and to the way the inputs are continually transmitted over time.

There are many kinds of databases. A traditional type is the relational database, typically tailored to smaller volume use cases such as IT monitoring. But some databases are specialized in time series data: these are time series databases. Their advantages lie in their scale, speed, and ease of use when dealing with such large volumes.

Time series databases and the IIoT use case

You can imagine, then, that such time series databases need to be able to do several things very well in order to cope with modern IoT requirements. Manufacturing, with its associated IIoT, is the largest source of time series data and generates the largest volumes of it. This quantity is growing rapidly every year.

It is no surprise then that those responsible for managing IIoT data find themselves looking for time series databases that can speedily and efficiently support storage, analysis, and querying for (very) high volume use cases.

So if you want a time series database that fits your IIoT use case, this means it should:

  • Quickly process very large amounts of data. Typical IoT use cases add up fast. This goes even more for IIoT. Take airlines for example: a single flight of a Boeing 787 can generate over half a terabyte of data from its different sensors, and more recent plane designs produce even more.
  • Show flexibility and scalability. As IoT applications expand, businesses and institutions want to increase their data generation capacity and their data sources. This means it must be easy to smoothly and quickly increase the amount of data processed.
  • Provide good tracking, visualization, and prediction capabilities. Most of the time, the added value of all this Big Data intelligence lies in prediction based on pattern analysis, in statistical analysis of variance and range, or simply in providing a clearer overview of relevant values.
  • Be easy to set up and maintain. The purpose of IoT intelligence generation is to make businesses and institutions more effective and efficient, not to add extra overhead and management problems.
  • Be secure. Wherever the data is being stored, it needs to combine efficiency with security against data loss in case of hardware failure or other problems.

Meeting all these requirements is not straightforward. It requires a time series database optimized for high-end IIoT needs, one that is flexible, scalable, and lightning fast. Such use cases go well beyond the capabilities of most databases, which are optimized for IT monitoring.

Moreover, if you want to be able to focus on your business intelligence - making the most of the data available to you - rather than on server maintenance and setup, you will want a database-as-a-service. This means you need a time series database fully based in the cloud. But even in the cloud, you still need a service that can live up to the IIoT use case standards mentioned above.

This is not a small ask. Fortunately, there is such a service: CrateDB Cloud.

CrateDB Cloud: a time series database-as-a-service optimized for IIoT

CrateDB Cloud is the database-as-a-service offering of CrateDB, fully hosted and maintained in the cloud. Based on the CrateDB time series database, it is optimized for high volume IIoT use cases on each of the criteria mentioned above:

  • It is fast, providing real-time query performance on data at any scale and with sub-second response times.
  • It is simply scalable. CrateDB Cloud scales limitlessly, elastically and linearly, meaning any changes in your use case can be addressed any time at the click of a button. But it is also flexible: it combines the speed and scalability of NoSQL with the ease of use and interoperability of SQL databases.
  • It is designed for straightforward integration with IoT applications. It supports easy visualization via Grafana or other tools, provides built-in interfaces to the Azure IoT Hub, and plays well with business intelligence applications such as Power BI.
  • Thanks to Azure Marketplace integration, setting up CrateDB Cloud takes no time or thought at all. The installation process is clearly described and the CrateDB Cloud Console is maximally user-friendly. To cap things off, you only ever pay for actual usage, and because our plans are calibrated to a broad range of typical IIoT use case needs, you are spared the hassle of figuring out exact hardware configurations.
  • The cloud service is maintained around the clock and secured against failure by CrateDB Cloud engineers on all plans except our trial plan. Thanks to the sophisticated CrateDB architecture, data security is achieved without compromising real-time query performance. And in our upcoming release the Cloud Console will also support a full range of monitoring and alerting services.

That NYC cab data we started our explanation with? We ran it on CrateDB Cloud, no problem. We did the visualization by connecting to CrateDB Cloud, too - using Jupyter Notebook.

How CrateDB Cloud answers your IIoT data needs

Of course, chances are you don't work for the New York City Taxi and Limousine Commission (although if you do, here's a ready-made solution). But if you have an IIoT use case, you likely have similar volumes of sensor data on your hands, if not more.

Since there are many kinds of databases, it may not be obvious from the get-go what kind of database you need for such a case. But now you have a better idea of how data can be presented as a time series and inserted in a time series database. More importantly, you now have a clear picture of the advantages of using one for high volume use cases.

However, even then there is still the question of knowing which database to use. Here CrateDB Cloud comes to the rescue. As a highly scalable, fully managed cloud service, it's reliable and flexible enough for just about any IIoT use case. It's ANSI SQL compatible - meaning it is easy to work with and integrate into your existing IT competencies - but provides real-time performance with its architecture built for complex time series data analysis.
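To give a feel for what working with time series in plain SQL looks like, here is a small sketch of an hourly aggregation query. It is only a stand-in: it runs against SQLite from the Python standard library, not against CrateDB Cloud, and the table name, column names and values are invented for illustration. The `strftime` call is SQLite-specific; on another database its own date functions would take that place.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for a real SQL endpoint.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (pickup_time TEXT, total_amount REAL)")
conn.executemany(
    "INSERT INTO trips VALUES (?, ?)",
    [
        # Made-up sample rows.
        ("2018-01-01 03:40:00", 50.0),
        ("2018-01-01 08:15:00", 14.0),
        ("2018-01-01 08:50:00", 10.0),
    ],
)

# Bucket trips by hour of day, then count them and average the amount.
rows = conn.execute(
    """
    SELECT CAST(strftime('%H', pickup_time) AS INTEGER) AS pickup_hour,
           COUNT(*) AS trip_count,
           AVG(total_amount) AS avg_amount
    FROM trips
    GROUP BY pickup_hour
    ORDER BY pickup_hour
    """
).fetchall()
conn.close()

for pickup_hour, trip_count, avg_amount in rows:
    print(pickup_hour, trip_count, avg_amount)
```

The point of the sketch is the shape of the query: a time bucket in the `GROUP BY`, with aggregates computed per bucket, is a pattern anyone with existing SQL skills can read and write.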

The proof is in the pudding: CrateDB Cloud has 10x better price-performance than other dedicated time series databases. And remember: the best part is that thanks to the integration with Microsoft Azure Marketplace, you can have it up and running in no time.
