Note: we have published an updated version of this blogpost
Apache Superset is an easy-to-use open-source business intelligence (BI) application that offers a rich set of customizable data visualizations. With Superset, you can explore, filter, and organize your data. You can also create and share custom monitoring and analytics dashboards.
CrateDB is a distributed SQL database that makes it simple to capture, store, and analyze massive amounts of machine data in real-time.
In this post, I'll show you how to get started with CrateDB and Apache Superset. And by the end of this post, you should have your very first Superset visualizations running on top of CrateDB data.
Note: I'm using macOS, but my instructions should be trivially adaptable for Linux or Microsoft Windows.
Set up a working directory
Create a new directory
$ mkdir crate-superset-demo
Change into it:
$ cd crate-superset-demo
We want to use this as a working directory for the rest of this tutorial, so make sure to run all subsequent commands from here.
Set up CrateDB
Get CrateDB set up with a single command:
$ bash -c "$(curl -L https://try.crate.io/)"
This command downloads CrateDB and runs it from the tarball. If you'd like to install CrateDB a bit more permanently, or you are using Windows, check out our collection of one-step install guides.
If you're using the command above, it should pop open the CrateDB admin UI for you automatically once it has finished. Otherwise, head over to http://localhost:4200/ in your browser.
If you ran the command above, it should pop open the CrateDB Admin UI for you automatically once it has finished. Otherwise, head over to http://localhost:4200/ in your browser.
You should see something like this:
Get some data
If you’re playing around with a fresh CrateDB install, likely, you don't have any data. So head on over to the Help screen by selecting the question mark icon on the left-hand navigation menu.
The help screen looks like this:
Select IMPORT TWEETS FOR TESTING and follow the instructions to authenticate your Twitter account.
Don’t worry. This isn’t going to post anything on your behalf. It doesn’t even look at your tweets. All this does is import a bunch of recent public tweets on Twitter.
Once you're done, select the Tables icon from the left-hand navigation, and then select the
tweets table. You should end up here:
Which should look like this:
Great! CrateDB is set up and running, and you have some test data.
For this demo, I want to show you how to graph a pie chart of the different Twitter clients used to post the tweets you just imported.
To do that, we need to group on the information in the
source column (i.e., the client identification). However, Twitter returns this data as HTML, which we don't want.
For expediency, let's create a view that strips out the HTML with a simple regular expression. (Note: You should not process HTML with regular expressions for any real-world application!)
Navigate to the Console tab, using the left-hand navigation, and then execute the following query in the query console:
CREATE VIEW tweets_source AS SELECT id, regexp_matches(source, '>(.*)<') AS source_txt FROM tweets;
You should see this:
You can test the view with this query:
SELECT * FROM tweets_source limit 100;
Nice. Let's move on.
Install Apache Superset
For this tutorial, we'll stick to the official documentation and install Superset inside a Python virtual environment.
Check which version of Python you're running:
$ python3 --version Python 3.7.2
If you're using Python 3.3 or newer, run this command to set up a Python virtual environment:
$ python3 -m venv env
If you're running an older version of Python, consult the Superset docs for help installing the
With that done, you can activate the virtual environment, like so:
$ source env/bin/activate
(For more help working with Python virtual environments, consult the Python documentation.)
Before we continue, and as a precautionary measure, make sure that your Python packaging tools are up-to-date:
$ pip install --upgrade setuptools pip
Unfortunately, at the time of writing, Superset doesn't specify its package dependencies correctly. So, we need a workaround.
First of all, fetch the most up-to-date
$ wget https://raw.githubusercontent.com/apache/incubator-superset/master/requirements.txt
Then, install the required Python packages:
$ pip install -r requirements.txt
Then, add this:
$ pip install sqlalchemy==1.2.18
Now we're ready to install Superset:
$ pip install superset
Finally, install the CrateDB Python Client library so that Superset can speak to CrateDB:
$ pip install crate
Great! Now, let's get Superset running.
Start by initializing the Superset database:
$ superset db upgrade INFO [alembic.runtime.migration] Context impl SQLiteImpl. INFO [alembic.runtime.migration] Will assume transactional DDL. INFO [alembic.runtime.migration] Running upgrade -> 4e6a06bad7a8, Init INFO [alembic.runtime.migration] Running upgrade 4e6a06bad7a8 -> 5a7bad26f2a7, empty message INFO [alembic.runtime.migration] Running upgrade 5a7bad26f2a7 -> 1e2841a4128, empty message INFO [alembic.runtime.migration] Running upgrade 1e2841a4128 -> 2929af7925ed, TZ offsets in data sources [...] INFO [alembic.runtime.migration] Running upgrade 7fcdcde0761c -> 0c5070e96b57, add user attributes table INFO [alembic.runtime.migration] Running upgrade 0c5070e96b57 -> 1a1d627ebd8e, position_json INFO [alembic.runtime.migration] Running upgrade 1a1d627ebd8e -> 55e910a74826, add_metadata_column_to_annotation_model.py
Then, set up the environment:
$ export FLASK_APP=superset
Create an admin user:
$ flask fab create-admin Username [admin]: User first name [admin]: User last name [user]: Email [firstname.lastname@example.org]: Password: Repeat for confirmation: Recognized Database Authentications. Admin User admin created.
Set up the default roles and permissions:
$ superset init 2019-07-12 14:43:01,037:INFO:root:Creating database reference 2019-07-12 14:43:01,055:INFO:root:Syncing role definition 2019-07-12 14:43:01,121:INFO:root:Syncing Admin perms [...] 2019-07-12 14:43:02,510:INFO:root:Creating missing database permissions. 2019-07-12 14:43:02,530:INFO:root:Creating missing metrics permissions 2019-07-12 14:43:02,536:INFO:root:Cleaning faulty perms
Finally, run Superset:
$ superset run -p 8080 --with-threads --reload --debugger 2019-07-12 14:43:45,211:INFO:werkzeug: * Running on http://127.0.0.1:8080/ (Press CTRL+C to quit) 2019-07-12 14:43:45,213:INFO:werkzeug: * Restarting with stat 2019-07-12 14:43:48,036:WARNING:werkzeug: * Debugger is active! 2019-07-12 14:43:48,058:INFO:werkzeug: * Debugger PIN: 280-982-752
Now, visit http://127.0.0.1:8080/ in your browser:
Log in using the admin account you just created.
Afterward, you should see this:
Connect to CrateDB
Select Sources → Databases from the top navigation menu:
Then, select the Add new record button (the icon the top right-hand corner of the screen).
On the Add Database screen, fill out the following information:
- SQLAlchemy URI:
Then, select the Test Connection button.
You should be presented with a modal popup window saying "Seems OK!"
For now, we can ignore all of the other configuration options. Scroll to the bottom of the page and select Save.
Here's your created database source:
Now, select Sources → Tables from the top navigation menu:
We're going to set up a source for the
tweets_source view we created earlier.
Select the Add new record button (the icon the top right-hand corner of the screen).
On the Import a table definition screen, fill out the following information:
- Database: CrateDB
doc(the default CrateDB schema)
- Table Name:
This works because views function as tables!
Because our view is so simple, there's no need to configure anything. You can ignore the blue notification at the top of the screen.
Create a pie chart
Select Charts from the top navigation menu.
Then, select the Add new record button (the icon the top right-hand corner of the screen).
On the Create a new chart screen, fill out the following information:
- Visualization type: Pie Chart
Select Create new chart.
You should be presented with the following screen:
- In the Time section of the left-hand configuration panel, select Last week, change it to No filter, and then select Ok.
- In the Query section, select Group by and choose
Now, select the Run Query button in the main workspace area.
Et voila! A pie chart.
You can move your cursor over the pie chart to get specific values.
In my case, Twitter for iPhone was responsible for 524 tweets, and Twitter for Android was responsible for 494.
From here, you can title the chart and save it.
Apache Superset is a modern, enterprise-ready business intelligence web application. And CrateDB is a distributed SQL database. Together, they provide a powerful open source platform for data warehousing and business intelligence.
In this post, I showed you how to:
- Get CrateDB up-and-running
- Import some test data
- Create a view
- Install Apache Superset
- Set up a connection to CrateDB
- Visualize your data
From here, you might want to go on to create your first Superset dashboard.
Questions? Looking for help? Get in touch!