Relational databases are a go-to solution for many applications. However, traditional relational database architecture can often cause problems when you're working with the business intelligence demands of the Industrial Internet of Things (IIoT) and time series data.
But what if you could have your cake and eat it?
In this post, I will take a look at a few myths that contribute to the false idea that you must give up SQL and choose between a time series database and an operational historian.
While it's true that storage and compute power is cheap, IIoT and time series data applications are capable of continuously generating previously unfathomable amounts of data that needs to be ingested and analyzed in real-time.
For these use cases, traditional Relational Database Management Systems (RDBMSes) can start to be a pain.
Many traditional RDBMSes are designed to scale vertically, which means buying new hardware and upgrading your machines every time you want to increase capacity. That process can be slow, error-prone, and expensive.
Operational historians are expensive, difficult to operate, and typically behind the times. On the other hand, NoSQL time series databases require you to develop and maintain your own business intelligence software.
But there's no need to throw the SQL baby out with the bathwater.
With CrateDB, you get the power of an RDBMS (including ANSI SQL, replication, sharding, and snapshots) combined with all the benefits typically associated with NoSQL: distributed computing, limitless horizontal scalability, high-availability, containerization, and much more.
Traditional RDBMSes often struggle with query execution performance, write performance, read performance, or some combination of the three.
Why? Well, consider:
Most RDBMSes provide ACID semantics: atomicity, consistency, isolation, and durability. Basically, once you've successfully written something to the database, you can be sure that any subsequent reads provide an up-to-date and consistent view of the data.
ACID is a neat feature, but it is computationally expensive. Additionally, strong consistency can only be provided if a single write master handles all writes.
Because strong consistency limits your RDBMS to a single write master, the capacity and the performance of your database is constrained by what you can achieve with a single machine.
One solution for this is to switch to BASE semantics: basically available, soft state, eventual consistency. These sorts of systems can be distributed and run across a cluster of any size, neatly solving most traditional performance and scaling issues.
However, in doing so, most eventually consistent databases do away with the relational model, and SQL to boot. (Hence why this new breed of database picked up the moniker NoSQL.)
Baby, meet bathwater.
Fortunately, it doesn't have to be like this. CrateDB is one of a new breed of relational databases that can be clustered to provide distributed SQL query execution with eventual consistency.
People have been trying to use traditional RDBMes as operational historians for decades. And yes, they run into issues, because traditional RDBMSes are challenging to scale for the needs of industrial time series workloads.
But that doesn't mean that relational databases are the problem. Single write-master restrictions and vertical scaling are the problems.
To solve those problems, it is common to switch to a distributed, horizontally-scalable system.
And sure, most databases that can do this also ditch the relational model. But it doesn't have to be that way. With a distributed SQL database like CrateDB, you can have the best of both worlds.
We debunk this myth in detail in a previous post.
On the other hand, traditional historians come with batteries included but are costly and challenging to work with.
With CrateDB and the Crate IoT Data Platform, you get a relational database with the best of both worlds: it's specialized for time series data, and it can do everything a traditional operational historian can do.
Indeed, working with time series data often requires a specialized approach. And that is easier with a specialized database. However, there's no particular reason why a relational database cannot be specialized for time series data.
For instance, here are some of the ways in which CrateDB is specialized for time series data:
Distributed processing, data partitioning, and a multi-threaded architecture enable fast, linearly scalable time series data ingestion and limitless data storage without a performance penalty.
Parallel processing, in-memory columnar indexes, and window functions enable real-time, sophisticated analytics, and machine learning on time series data.
Adapt to changing data with dynamic schemas and complex objects queryable to an arbitrary depth with standard SQL.
Elastic scaling, sharding, and replication for fast, always-on performance that stays consistent as data volume and concurrent clients grow.
First-class interfaces to Azure IoT Hub, Azure Event Hubs, Prometheus, Telegraf, and other SQL tools (e.g., Grafana) via the Postgres wire protocol or HTTP REST API.
In this post, I detailed five myths about relational databases, time series data, and operational historians.
In particular, I argued against the false idea that you have to choose between using a relational database and working with series data or the business intelligence demands of discrete manufacturing.
Got questions? No problem! Get in touch, and we'll be happy to help.