Machine Learning

Integrate CrateDB with machine learning frameworks and tools for MLOps and vector database operations.

Machine Learning Operations

Training a machine learning model, running it in production, and maintaining it require a significant amount of data processing and bookkeeping.

CrateDB, as a universal SQL database, supports this process through adapters to best-of-breed software components for MLOps procedures.

MLOps is a paradigm for deploying and maintaining machine learning models in production reliably and efficiently. It includes experiment tracking and follows the spirit of continuous development and DevOps.

Vector Store

CrateDB’s FLOAT_VECTOR data type provides a vector store for feature vectors, and k-nearest neighbour (kNN) search finds the stored vectors that are most similar to a query vector.

These feature vectors may be computed from raw data using machine learning methods such as feature extraction algorithms, word embeddings, or deep learning networks.

Vector databases can be used for similarity search, multi-modal search, recommendation engines, large language models (LLMs), retrieval-augmented generation (RAG), and other applications.
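
To make the idea of a kNN search concrete, here is a minimal sketch in plain Python. It is a brute-force illustration only and does not describe how CrateDB executes the search internally; the `store` contents, the `knn` helper, and the choice of Euclidean distance are assumptions made for this example.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn(query, vectors, k):
    """Return the keys of the k stored vectors closest to the query.

    Brute force: every stored vector is compared with the query.
    """
    ranked = sorted(vectors.items(), key=lambda item: euclidean(query, item[1]))
    return [key for key, _ in ranked[:k]]


# Hypothetical three-dimensional feature vectors.
store = {
    "apple": [0.9, 0.1, 0.0],
    "pear": [0.8, 0.2, 0.1],
    "truck": [0.0, 0.9, 0.8],
}

nearest = knn([1.0, 0.0, 0.0], store, k=2)
print(nearest)  # ['apple', 'pear']
```

A brute-force scan like this grows linearly with the number of stored vectors, which is why vector stores usually rely on an index instead.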

Anomaly Detection and Forecasting

MLflow

Tutorials and notebooks about using MLflow together with CrateDB.

Blog: Running Time Series Models in Production using CrateDB

Part 1: Introduction to Time Series Modeling using Machine Learning

The article introduces the concept of time series modeling and discusses the main obstacles to running it in production. It also introduces CrateDB, highlighting its key features and benefits, why it stands out in managing time series data, and why it is an especially good fit for supporting machine learning models in production.

Fundamentals
Time Series Modeling

Notebook: Create a Time Series Anomaly Detection Model

Guidelines and runnable code to get started with MLflow and CrateDB, exercising time series anomaly detection and time series forecasting / prediction using NumPy, Salesforce Merlion, and Matplotlib.

README Notebook on GitHub Notebook on Colab

Fundamentals
Time Series
Anomaly Detection
Prediction / Forecasting
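
As a rough illustration of what time series anomaly detection means, the following sketch flags values that lie far from the mean of a series. It uses a simple z-score rule from the Python standard library, not Merlion or the method used in the notebook; the `readings` values and the threshold are made up for this example.

```python
import statistics


def zscore_anomalies(values, threshold=3.0):
    """Return the indexes of values whose z-score exceeds the threshold."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        # A constant series has no outliers under this rule.
        return []
    return [
        index
        for index, value in enumerate(values)
        if abs(value - mean) / stdev > threshold
    ]


# Hypothetical sensor readings with one obvious spike at index 5.
readings = [20.1, 20.3, 19.9, 20.0, 20.2, 35.0, 20.1, 19.8]

anomalies = zscore_anomalies(readings, threshold=2.0)
print(anomalies)  # [5]
```

The threshold is deliberately low here because a single spike in such a short series also inflates the standard deviation; dedicated libraries handle trends, seasonality, and windowing that this sketch ignores.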

PyCaret

Tutorials and notebooks about using PyCaret together with CrateDB.

Notebook: AutoML classification with PyCaret

Explores the PyCaret framework and shows how to use it to train different classification models.

README Notebook on GitHub Notebook on Colab

Fundamentals
Time Series
Anomaly Detection
Prediction / Forecasting

Notebook: Train time series forecasting models

How to train time series forecasting models using PyCaret and CrateDB.

README Notebook on GitHub Notebook on Colab

Fundamentals
Time Series
Training
Classification
Forecasting

scikit-learn

Use scikit-learn with CrateDB.

Fundamentals
Regression Analysis

TensorFlow

Use TensorFlow with CrateDB.

Predictive Maintenance

Build a machine learning model that will predict whether a machine will fail within a specified time window in the future.

Fundamentals
Prediction
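
One common way to frame such a problem is to label each sensor reading with whether a failure follows within the chosen time window, and then train a classifier on those labels. The sketch below shows only that labeling step, under assumed inputs; `label_failures`, the timestamps, and the 24-hour window are hypothetical and not taken from the tutorial.

```python
from datetime import datetime, timedelta


def label_failures(reading_times, failure_times, window):
    """Label each reading 1 if a failure occurs within `window` after it, else 0."""
    labels = []
    for reading_time in reading_times:
        deadline = reading_time + window
        fails_soon = any(
            reading_time < failure_time <= deadline
            for failure_time in failure_times
        )
        labels.append(int(fails_soon))
    return labels


start = datetime(2024, 1, 1)
# Hypothetical readings at hours 0, 12, 24 and 36, with one failure at hour 30.
reading_times = [start + timedelta(hours=h) for h in range(0, 48, 12)]
failure_times = [start + timedelta(hours=30)]

labels = label_failures(reading_times, failure_times, timedelta(hours=24))
print(labels)  # [0, 1, 1, 0]
```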

LLMs / RAG

One of the most powerful applications enabled by LLMs is sophisticated question-answering (Q&A) chatbots. These applications can answer questions about specific sources of information by using retrieval-augmented generation (RAG), a technique for augmenting LLM knowledge with additional data.
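
The retrieval and augmentation steps of RAG can be sketched in a few lines of plain Python. This is an illustration under strong simplifications: `embed` is a toy word-count embedding standing in for a real embedding model, the documents are invented, and the final call to an LLM is left out.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lower-cased word counts (a stand-in for a real model)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, documents, k=1):
    """Retrieval step: return the k documents most similar to the question."""
    query = embed(question)
    ranked = sorted(documents, key=lambda doc: cosine(query, embed(doc)), reverse=True)
    return ranked[:k]


def build_prompt(question, context):
    """Augmentation step: put the retrieved text in front of the question."""
    joined = "\n".join(f"- {passage}" for passage in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"


documents = [
    "CrateDB stores vectors in FLOAT_VECTOR columns.",
    "MLflow tracks machine learning experiments.",
]
question = "Which columns store vectors?"

context = retrieve(question, documents, k=1)
prompt = build_prompt(question, context)
print(prompt)
```

In a real application, the prompt would then be sent to an LLM, and the retrieval step would query a vector store instead of scanning a Python list.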

LangChain

Tutorials and notebooks about using LangChain together with CrateDB. LangChain has a number of components designed to help build Q&A applications, and RAG applications more generally. These integrations use CrateDB’s vector store implementation.

Tutorial: Set up LangChain with CrateDB

LangChain is a framework for developing applications powered by language models. For this tutorial, we are going to use it to interact with CrateDB using only natural language without writing any SQL.

To follow along, you need a running CrateDB instance, an OpenAI API key, and some Python knowledge.

Navigate to Tutorial

Fundamentals
Vector Store
LLM
RAG

Notebook: Vector Similarity Search

CrateDB’s FLOAT_VECTOR type and its KNN_MATCH function can be used for storing and retrieving embeddings, and for conducting similarity searches.

README Notebook on GitHub Notebook on Colab Notebook on Binder

Fundamentals
Vector Store
LLM
RAG
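
For orientation, a similarity search with FLOAT_VECTOR and KNN_MATCH might look roughly like the sketch below. The table name `word_embeddings`, its columns, the vector length of 4, and the `find_similar` helper are assumptions for this example, as is the use of a Python DB-API cursor with `?` placeholders; check the CrateDB reference for the exact syntax supported by your version.

```python
# Hypothetical table holding a word and its 4-dimensional embedding.
CREATE_TABLE = """
CREATE TABLE IF NOT EXISTS word_embeddings (
    word STRING,
    embedding FLOAT_VECTOR(4)
)
"""

# KNN_MATCH(column, query_vector, k) selects nearest neighbours;
# _score is used here to order the best matches first.
KNN_QUERY = """
SELECT word, _score
FROM word_embeddings
WHERE KNN_MATCH(embedding, ?, ?)
ORDER BY _score DESC
"""


def find_similar(cursor, query_vector, k=2):
    """Run the kNN query on a DB-API cursor and return all result rows."""
    cursor.execute(KNN_QUERY, (query_vector, k))
    return cursor.fetchall()
```

Given a cursor from a CrateDB connection, a call such as `find_similar(cursor, [0.3, 0.6, 0.0, 0.9], k=2)` would return the matching words together with their scores.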

Notebook: SQLAlchemy Document Loader

Database tables in CrateDB can be used as a source provider for LangChain documents.

README Notebook on GitHub Notebook on Colab Notebook on Binder

Fundamentals
Vector Store
Data I/O

Notebook: Conversational History

CrateDB supports managing LangChain’s conversation history.

README Notebook on GitHub Notebook on Colab Notebook on Binder

Fundamentals
Vector Store
History