Recently a customer asked me to help him run Apache Superset on Azure. Apache Superset is an open source project that allows you to create and run Power BI-like dashboards on top of different data sources. It is built in Python, and according to the installation documentation you can run it in a docker container to get started quickly. So, my plan was to get this docker container up and running on Azure App Service, since it needs a web endpoint.
In this post I will guide you through my efforts and setup. I will explain to you:
how to run superset locally
how to connect superset to an Azure Database for PostgreSQL
how to connect superset to an Azure Redis Cache
In the next post I will go through:
how to deploy superset to Azure App Service for Linux
how to attach cloud storage for the config file(s)
The superset example I started from uses a docker-compose file to spin up three containers: one for superset, one for a redis cache and one for a PostgreSQL database. The docker-compose file also maps the superset_config.py file to a local path in the superset container. This superset_config.py file contains the configuration that lets superset connect to the redis cache and the PostgreSQL database.
In addition, there are superset-init and superset-demo scripts that are copied into the original docker container. These scripts initialize superset with a login user and load it with demo data.
So, once I had this example running locally, I wanted to get it up and running in Azure. The first thing I did was create an Azure Database for PostgreSQL:
Once you have created the PostgreSQL server, you can create a database for superset:
Once you have created this database, you can remove the postgres service from the docker-compose file of the superset examples. The file now looks like this:
We also need to give superset the connection info for this Azure PostgreSQL database. This is done in the superset_config.py file, where you need to change the SQLALCHEMY_DATABASE_URI. The format for the URI is:
So for my example this will be:
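The exact values from my setup are not reproduced here, so what follows is only a minimal sketch of how such a URI could be assembled in superset_config.py. It assumes the single-server flavour of Azure Database for PostgreSQL, where the login name takes the form user@servername, and it assumes SSL is enforced on the server. The server name, user, password, database name and the helper function build_azure_postgres_uri are all hypothetical. The part that is easy to get wrong is the extra "@" in the login name: it has to be percent-encoded, otherwise the URI is split in the wrong place.

```python
from urllib.parse import quote_plus


def build_azure_postgres_uri(server, user, password, database, port=5432):
    """Build a SQLAlchemy URI for an Azure PostgreSQL server (hypothetical helper)."""
    host = f"{server}.postgres.database.azure.com"
    # Assumption: single-server logins look like user@servername.
    # The "@" inside the login must be percent-encoded as %40.
    login = quote_plus(f"{user}@{server}")
    # Encode the password too, in case it contains "@", "/" or ":".
    secret = quote_plus(password)
    # Assumption: the server enforces SSL, so ask the driver to require it.
    return f"postgresql://{login}:{secret}@{host}:{port}/{database}?sslmode=require"


# All values below are placeholders, not the ones from my environment.
SQLALCHEMY_DATABASE_URI = build_azure_postgres_uri(
    server="my-superset-pg",
    user="supersetadmin",
    password="p@ss/word!",
    database="superset",
)
```

Encoding both the login and the password means special characters in either one cannot break the URI.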
Figuring out the correct syntax and values for this connection string took some trial and error, but once you have this you can run everything with:
The next step is to replace the redis container with an Azure Redis Cache.
Once you have your cache set up, you can alter the docker-compose file again and get rid of the redis service.
And again, alter the connection information in the superset_config.py file. This is quite tricky, since Azure Redis Cache by default only allows connections over SSL. For this you need special syntax.
You need to use redis.StrictRedis to create a connection object and then pass this object to the HOST parameter of the cache configuration, instead of a plain host name. This does the trick, and you can again spin everything up and try it out.
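As a rough sketch of what this could look like in superset_config.py: the cache name and access key below are placeholders, and I am assuming here that the HOST parameter is the CACHE_REDIS_HOST key of CACHE_CONFIG. The accepted CACHE_TYPE value and whether a client object is accepted as the host depend on the Superset and Flask caching versions you run, so treat this as a starting point rather than a definitive configuration.

```python
import redis

# Placeholders: replace with your own cache name and access key.
REDIS_HOST = "my-superset-cache.redis.cache.windows.net"
REDIS_ACCESS_KEY = "<primary-access-key>"

# Azure Redis Cache serves SSL connections on port 6380.
# Creating the client does not open a connection yet; that happens on first use.
redis_client = redis.StrictRedis(
    host=REDIS_HOST,
    port=6380,
    password=REDIS_ACCESS_KEY,
    ssl=True,
    db=0,
)

CACHE_CONFIG = {
    "CACHE_TYPE": "redis",  # assumption: may differ per caching library version
    "CACHE_DEFAULT_TIMEOUT": 300,
    "CACHE_KEY_PREFIX": "superset_",
    # Pass the SSL-enabled client object instead of a host name.
    "CACHE_REDIS_HOST": redis_client,
}
```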
OK, so now we have our local superset container running with an Azure Redis Cache and an Azure Database for PostgreSQL. The next step will be to get the superset container running in Azure App Service.