A couple of years ago the OptimalBI team developed Metrics Vault using Optimal Data Engine (ODE) functionality. The purpose of Metrics Vault is to collect various statistics about data vaults configured in ODE database. This includes not only data vault tables metadata change but also actual data profiles.
Recently I have been building a reporting solution to utilize Metrics Vault functionality and at the same time aimed at improving data load monitoring and testing by shortening the error identification cycle.
The plan included following the steps below:
- Download and install Metrics Vault from GitHub.
- Schedule the load job to run once a day in SQL Server Agent.
- Prepare presentation layer views to report off Metrics Vault.
- Build visualization.
Qlik Sense has been set up as a reporting platform for the project I am currently on. I decided to give it a go and create my very first application with a little knowledge of the product.
The first stage involved pencil, several A4 sheets, a bit of time and drawing.
I was hoping to add as many relevant items to the dashboard as possible. In reality, the first prototype looked messy and I had to split it into several parts to make navigation easy, and still save the ability to track hub-source and hub-satellite dependencies.
After completing the work on application layout I had to think about structuring the data in a way Qlik is able to digest. The best practice for Qlik data model is a star or snowflake schema with the central fact table (or tables) and dimensions capturing descriptive information and having no circular references. I found the quality of data preparation to be very important, which helped me to build the app within a couple of hours. Countering that, the poor data layout will most likely lead to adding unobvious extra steps into the logic making the application less clean and harder to support.
Overall working with the product seemed to be less straightforward and intuitive as I had expected. It took time googling and seeking an expert’s assistance to figure out how to build the desired visualization.
In the end, I have a three-sheet application. The areas I focused capturing data on are:
(1) The growth and change rate of hubs (also it is valid for satellites, see below (2)) tables counting the number of rows over the period of time. Here a tester/or developer can choose the desired hub and look at volumes of data coming every run, track business keys counts by source.
(2) Satellite data profiles to deal with data quality issues.
(3) Schedule execution statistics split by tasks load times and errors for quicker identification of the changes in the task loading behaviour and/or to optimize loading procedures.
Metrics Vault views can be downloaded and installed from GitHub Release_Scripts folder.
Monitoring is powerful and can change the way you think about processes. Even simple table counts can show interesting patterns, revealing valuable business information. Constantly consider what more you could learn from your information.