Another one of the open source projects I closely follow is Apache Superset. Superset is part of the innovative ecosystem surrounding PrestoDB.
So to start with, let’s cover off what Superset is and why it’s so darn awesome…
Superset (previously known as Panoramix and more recently Caravel) was built from the ground up, and open sourced, by the folks over at Airbnb. It supports visualization, data exploration and collaboration. It has an extensible range of visualizations, so D3.js, maps etc, no worries. Security comes via Flask AppBuilder and there is support for LDAP, OpenID, OAuth and many more – it’s set for enterprise. There is support for a whole host of SQL speaking databases, along with Druid and of course Presto, which you can find more on in my previous blog here. Presto is a “SQL on Anything” query engine, so that really opens avenues up for Superset. Superset also has an inbuilt SQL querying tool, so it’s multi purpose. In short, it’s feature rich and highly configurable, so most requirements can be met.
Currently only *Nix and OSX are supported. I understand it is possible to run on Windows, but I have read that some undocumented configuration is required around Gunicorn, so mileage might vary, and I would expect other issues to present themselves. Today I am using Kali Linux rolling 2019-03-12 (a Debian-based pentest distro) to install Superset. I would not recommend using this distribution; I just happen to have a dual-booted machine with this flavor installed.
Now let’s install Superset:
Grab all the security modules. If this step fails with something like package not available / not found, then you likely need to change your local sources repo:
sudo apt-get install build-essential libssl-dev libffi-dev python-dev python-pip libsasl2-dev libldap2-dev
In my case libsasl2-dev was not available so I had to add some Debian sources, and then issue the apt-get above again:
echo "deb http://ftp.debian.org/debian unstable main contrib non-free" > /etc/apt/sources.list.d/debian.list
echo "deb http://deb.debian.org/debian experimental main" >> /etc/apt/sources.list.d/debian.list
apt update
If you have to add extra source entries as above, be sure not to run a full upgrade (the statement below) unless you have quite a bit of time on your hands. I recommend removing the added entries from your sources once the packages are installed, so you don’t run into this issue in future:
apt update && apt full-upgrade
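Cleaning up afterwards can be as simple as deleting the extra sources file. Here is a minimal sketch of that, assuming the entries were written to /etc/apt/sources.list.d/debian.list as above; the function name remove_temp_sources is my own, not part of apt:

```shell
# remove_temp_sources is a hypothetical helper, not an apt command.
# It deletes the temporary sources file; the default path is the one used above.
remove_temp_sources() {
    sources_file="${1:-/etc/apt/sources.list.d/debian.list}"
    if [ -f "$sources_file" ]; then
        rm -f "$sources_file" && echo "removed $sources_file"
    else
        echo "nothing to remove at $sources_file"
    fi
}
```

Run it as root with no arguments, then issue apt update (not a full upgrade) so the package index forgets the extra repositories.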
Once the security modules install correctly you can move forward. If there are any issues, don’t carry on until they are resolved, as they will cause problems with Flask later down the track. The next package to go and grab is virtual environment (virtualenv), which enables isolated Python libraries. There are options online to start from a Docker image for those with Docker expertise.
pip install virtualenv
Then we want to go and create a virtual environment. I’m calling my environment superset; you can have as many virtual environments as you like:
python3 -m venv superset
To delete a virtual environment (if you make a mistake):
rm -rf superset
Then activate the environment. From this point forward we are working in the virtual environment:
source superset/bin/activate
Next, we can go and update setuptools, just to make sure we are on the latest version:
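A quick way to confirm that activation worked is to look at the VIRTUAL_ENV variable, which the activate script exports. This is only a sketch; the helper name in_venv is something I made up for illustration:

```shell
# in_venv is a hypothetical helper. It relies on VIRTUAL_ENV,
# which the venv activate script sets for the current shell.
in_venv() {
    if [ -n "${VIRTUAL_ENV:-}" ]; then
        echo "active: $VIRTUAL_ENV"
    else
        echo "no virtual environment active"
        return 1
    fi
}
```

If it reports that no environment is active, activate it again before running any of the pip commands that follow, otherwise the packages land in the system Python.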
pip install --upgrade setuptools pip
Now we are ready to attempt the installation into the virtual Python environment. I have installed Caravel and Superset a number of times on a variety of Linux flavors, and generally speaking, if you have made it this far it **should** be smooth sailing from here on out. Certainly if the install of Superset works without dependency issues then we are away laughing:
pip install superset
After successful installation of Superset we need to set up an Admin user. After issuing the command below it will ask for some things like first / last name, along with password credentials:
fabmanager create-admin --app superset
In this process I ran into an error:
Was unable to import superset Error: cannot import name ‘_maybe_box_datetimelike’ from ‘pandas.core.common’ (/root/superset/lib/python3.7/site-packages/pandas/core/common.py)
This is due to the latest Pandas not being compatible with this Superset release, so we need to downgrade it:
pip uninstall pandas
pip install pandas==0.23.4
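If you expect to rebuild the environment, one option is to record the pin in a pip constraints file so the working Pandas version is picked up on the next install. This is a sketch rather than something I ran as part of this install; the helper name write_constraints and the file name constraints.txt are my own choices:

```shell
# write_constraints is a hypothetical helper; constraints.txt is an arbitrary file name.
write_constraints() {
    constraints_file="${1:-constraints.txt}"
    # Pin the Pandas version used in this post.
    printf 'pandas==0.23.4\n' > "$constraints_file"
    # Print the path so it can be passed straight to pip.
    echo "$constraints_file"
}

# Usage (inside the activated virtual environment):
#   pip install -c "$(write_constraints)" superset
```

The -c option tells pip to respect the versions in the file without installing anything extra, so Pandas is only pinned when Superset actually pulls it in.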
Then run user creation again:
fabmanager create-admin --app superset
Next we need to initialize the database:
superset db upgrade
And optionally load some visualization examples (demo of these below):
superset load_examples
Now we can start Superset, which will run on localhost:8088 by default. You can, however, use the -p switch to start on a different port.
superset runserver -d
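The server can take a few seconds to come up. Below is a small sketch for polling it from another terminal. It assumes curl is installed and that your Superset version serves a /health endpoint; wait_for_superset is a name I made up, and the URL and retry count are just defaults to adjust:

```shell
# wait_for_superset is a hypothetical helper.
# Assumption: the running Superset exposes /health on the default port.
wait_for_superset() {
    url="${1:-http://localhost:8088/health}"
    attempts="${2:-10}"
    i=1
    while [ "$i" -le "$attempts" ]; do
        # -f: fail on HTTP errors, -s: silent, -o /dev/null: discard the body
        if curl -fs --max-time 5 -o /dev/null "$url"; then
            echo "up: $url"
            return 0
        fi
        # Pause between attempts, but not after the last one.
        [ "$i" -lt "$attempts" ] && sleep 2
        i=$((i + 1))
    done
    echo "not reachable: $url"
    return 1
}
```

If you started Superset on a different port, pass the full URL as the first argument, for example wait_for_superset http://localhost:8089/health.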
Once Superset is running you can go to http://0.0.0.0:8088 and the login screen should be presented (accessible using the previously created credentials):
After logging in the main portal is visible. Keep in mind this is an administrator view:
Hovering over the tabs we are greeted with a range of options:
Security – Where we can add users (remember we can sync from AD, OAuth etc), assign roles and manage the whole user experience.
Manage – Where we can import dashboards from other environments, and create templates for dashboards and the interface, if custom is your thing!
Sources – Where we create connections to sources; these can be flat files, traditional relational databases or even more modern big data platforms.
SQL – Where we can query our sources via a SQL interface, and save some queries.
The SQL interface is relatively basic, but sufficient for most use cases when building dashboards:
Dashboards – Provided the examples were loaded during the install process there should be some dashboards to interact with:
Taking a look at these dashboards you can get a feel for the rich visualizations available from the tool. There are many hover-over and filtering options akin to other visualization providers:
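Coming back to the Sources tab for a moment: database connections there are defined with a SQLAlchemy URI. For Presto, my understanding is that the PyHive dialect expects the form presto://host:port/catalog/schema (and that pyhive needs to be installed in the virtual environment). A rough sketch for building one; presto_uri is a hypothetical helper and the host, port, catalog and schema values are placeholders:

```shell
# presto_uri is a hypothetical helper that assembles a SQLAlchemy URI for Presto.
# Arguments: host port catalog [schema]
presto_uri() {
    if [ "$#" -lt 3 ]; then
        echo "usage: presto_uri host port catalog [schema]" >&2
        return 1
    fi
    uri="presto://$1:$2/$3"
    # The schema part is optional.
    if [ -n "${4:-}" ]; then
        uri="$uri/$4"
    fi
    echo "$uri"
}
```

For example, presto_uri localhost 8080 hive default gives a URI you can paste into the SQLAlchemy URI field when adding a database.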
Hopefully this evaluation is helpful getting Superset up and running in your environment!
Examples of the types of out-of-the-box visualizations packaged with Superset can be found here.
Link for the Windows installation can be found here.
Get the open source product from Github.
Thomas – MacGyver of code.