Objects: Analyzing Marketing Data

Marketers often need to handle multi-structured data from different platforms. CrateDB’s dynamic OBJECT data type allows us to store and analyze this complex, nested data efficiently. In this tutorial, we’ll explore how to leverage this feature in marketing data analysis, along with the use of generated columns to parse and manage URLs.

Consider marketing data that captures details of various campaigns.

{
    "campaign_id": "c123",
    "source": "Google Ads",
    "metrics": {
        "clicks": 500,
        "impressions": 10000,
        "conversion_rate": 0.05
    },
    "landing_page_url": "https://example.com/products?utm_source=google"
}

To begin, let’s create the schema for this dataset.

Creating the Table

CrateDB uses SQL, the most popular query language for database management. To store the marketing data, create a table with columns tailored to the dataset using the CREATE TABLE command:

CREATE TABLE marketing_data (
    campaign_id TEXT PRIMARY KEY,
    source TEXT,
    metrics OBJECT(DYNAMIC) AS (
        clicks INTEGER,
        impressions INTEGER,
        conversion_rate DOUBLE PRECISION
    ),
    landing_page_url TEXT,
    url_parts GENERATED ALWAYS AS parse_url(landing_page_url)
);

Let’s highlight two features in this table definition:

metrics

An OBJECT column featuring a dynamic structure for performing flexible queries on its nested attributes like clicks, impressions, and conversion rate.

url_parts

A generated column to decode an URL from the landing_page_url column. This is convenient to query for specific components of the URL later on.

The table is designed to accommodate both fixed and dynamic attributes, providing a robust and flexible structure for storing your marketing data.

Inserting Data

Now, insert the data using the COPY FROM SQL statement.

COPY marketing_data
FROM 'https://github.com/crate/cratedb-datasets/raw/main/cloud-tutorials/data_marketing.json.gz'
WITH (format = 'json', compression='gzip');

Analyzing Data

Start with a basic SELECT statement on the metrics column, and limit the output to display only 10 records, in order to quickly explore a few samples worth of data.

SELECT metrics
FROM marketing_data
LIMIT 10;

You can see that the metrics column returns an object in the form of a JSON. If you just want to return a single property of this object, you can adjust the query slightly by adding the property to the selection using bracket notation.

SELECT metrics['clicks']
FROM marketing_data
LIMIT 10;

It’s helpful to select individual properties from a nested object, but what if you also want to filter results based on these properties? For instance, to find campaign_id and source where conversion_rate exceeds 0.09, employ the same bracket notation for filtering as well.

SELECT campaign_id, source
FROM marketing_data
WHERE metrics['conversion_rate'] > 0.09
LIMIT 50;

This allows you to narrow down the query results while still leveraging CrateDB’s ability to query nested objects effectively.

Finally, let’s explore data aggregation based on UTM source parameters. The url_parts generated column, which is populated using the parse_url() function, automatically splits the URL into its constituent parts upon data insertion.

To analyze the UTM source, you can directly query these parsed parameters. The goal is to count the occurrences of each UTM source and sort them in descending order. This lets you easily gauge marketing effectiveness for different sources, all while taking advantage of CrateDB’s powerful generated columns feature.

SELECT
    url_parts['parameters']['utm_source'] AS utm_source,
    COUNT(*)
FROM marketing_data
GROUP BY 1
ORDER BY 2 DESC;

In this tutorial, we explored the versatility and power of CrateDB’s dynamic OBJECT data type for handling complex, nested marketing data.