Photo by Desertrose7
On my first BI job, where I was an ETL developer, my team was using Microsoft SQL Server Reporting Services (SSRS) as a reporting tool. My job was to model the Data Marts and create the data flows into those tables. Our report analysts were good at SQL; I even learnt a thing or two from them. In SSRS, each visualisation could get its own dataset.
If we talk about a reporting tool today, it will be one of the big three, Qlik Sense, Power BI or Tableau, or a similar product. They are not exactly reporting tools, but dashboarding and analytical tools. They focus on an interactive experience with the data rather than the tables and charts you used to print out for meetings. They share similar underlying concepts:
- Because you may want to see very different KPIs on one dashboard, the best option is to connect to all the required database tables and model the relationships in the data preparation part of the tool.
- Applying a filter to one visualisation automatically applies it to everything else, so it pays to have all the data interconnected.
- These tools are made for everyone with no specific skills required, so custom SQL is either well-hidden or not supported at all.
To achieve these goals, no matter what your database model looks like and how you have modelled the data at the data preparation level, all the loaded data is treated by the tool as if it were flattened into one dataset.
For a few decades we were taught that dimensional modelling is the best fit for the reporting layer. That used to be true: report analysts with good SQL skills could retrieve any answer from it. But how well does it serve these tools now?
Data Model
Let’s say my business provides services and sells some products. It is logical for me to have sales and services as two separate fact tables. But I want to see both on one dashboard. If I add both facts and a couple of shared dimensions, clients and stores, I run into the circular relationship error. Reporting tools either don’t allow me to add a circular relationship at all (e.g. Power BI), create some sophisticated fix for it (e.g. Qlik Sense), or suggest degenerating a dimension into one of the facts (e.g. Tableau).
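For illustration, here is a minimal sketch of such a schema. The names (dim_client, dim_store, fact_sales, fact_services) and columns are my own assumptions, not taken from any real system:

```sql
-- Hypothetical star schema: two facts sharing the same two dimensions.
CREATE TABLE dim_client (
    client_id   INT PRIMARY KEY,
    client_name VARCHAR(100)
);

CREATE TABLE dim_store (
    store_id   INT PRIMARY KEY,
    store_name VARCHAR(100)
);

CREATE TABLE fact_sales (
    sale_id   INT PRIMARY KEY,
    client_id INT REFERENCES dim_client (client_id),
    store_id  INT REFERENCES dim_store (store_id),
    sale_date DATE,
    amount    DECIMAL(12, 2)
);

CREATE TABLE fact_services (
    service_id   INT PRIMARY KEY,
    client_id    INT REFERENCES dim_client (client_id),
    store_id     INT REFERENCES dim_store (store_id),
    service_date DATE,
    fee          DECIMAL(12, 2)
);
```

Both facts reference the same two dimensions, so client and store can be reached from either fact and the relationship graph contains a loop.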
So, if I want to analyse more than one sort of event that happened between the same parties (dimensions), I run into an incompatibility between the dimensional model and modern reporting tools with their default data loaders. I want to design the reporting layer of my Data Warehouse in a way that suits my reporting tool best.
I created multiple views on my star schema; once I find the best option, I can skip the dimensional step and materialise it by loading data directly into the new structure. I used Qlik Sense to explore the options, because its data preparation layer is the most customisable.
Options
Option 1
I accepted the synthetic key proposed by the default data loader. The sales and totals data are correct, but when I want to analyse services, those details are lost.
Option 2
I created a view that combines all the data into one flat table by joining everything through the client. As a result, some services ended up in the same row as sales that did not actually happen together, and it created duplicates. Now I have to use extra functions to suppress the duplicates on every visualisation; in Qlik I used the AGGR() function. I also had to load the store information twice, so I lost the ability to filter by store across the whole dashboard. And something happened to my total sum.
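A rough sketch of what such a view could look like on the hypothetical schema above (the view name and aliases are mine):

```sql
-- Hypothetical flat view: both facts joined through the shared client.
-- Every sale of a client is paired with every service of the same client,
-- which is where the duplication comes from.
CREATE VIEW v_flat_by_client AS
SELECT
    c.client_id,
    c.client_name,
    sa.sale_id,
    sa.sale_date,
    sa.amount,
    sa.store_id  AS sale_store_id,     -- store id from the sale
    sv.service_id,
    sv.service_date,
    sv.fee,
    sv.store_id  AS service_store_id   -- store id loaded a second time, under another name
FROM dim_client c
LEFT JOIN fact_sales    sa ON sa.client_id = c.client_id
LEFT JOIN fact_services sv ON sv.client_id = c.client_id;
```

A client with three sales and two services produces six rows here, so a plain SUM(amount) counts each sale twice, which is exactly the kind of inflated total that then has to be corrected with AGGR() in every chart.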
Option 3
Another view, which unions all the events into one table with a label of “sale” or “service”. Now my charts are identical, they use the same fields, but have different filters set for the event type. This gave me most of the functionality I was expecting.
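A minimal sketch of such a typed event supertable over the same hypothetical schema; a real version would carry more attributes per event type:

```sql
-- Hypothetical event supertable: one row per event, tagged with its type.
CREATE VIEW v_events AS
SELECT
    'sale'       AS event_type,
    sa.client_id,
    sa.store_id,
    sa.sale_date AS event_date,
    sa.amount    AS event_value
FROM fact_sales sa
UNION ALL
SELECT
    'service'    AS event_type,
    sv.client_id,
    sv.store_id,
    sv.service_date,
    sv.fee
FROM fact_services sv;
```

Every chart reads the same columns, a filter on event_type decides whether it shows sales or services, and totals stay correct because each event appears exactly once.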
Option 4
Flattened star views, where all the dimensions are joined into the fact tables. When I loaded them into Qlik, it created a synthetic key on the dimension fields for me. I achieved the functionality I was after. I could also finally filter out those customers who hadn’t used any services or hadn’t purchased anything.
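Sketched on the same hypothetical schema, each fact gets its own denormalised view that repeats the dimension attributes; the shared client and store columns are what the tool then links the two tables on:

```sql
-- Hypothetical flattened stars: one denormalised view per fact table.
CREATE VIEW v_sales_flat AS
SELECT
    sa.sale_id,
    sa.sale_date,
    sa.amount,
    c.client_id,
    c.client_name,
    st.store_id,
    st.store_name
FROM fact_sales sa
JOIN dim_client c  ON c.client_id = sa.client_id
JOIN dim_store  st ON st.store_id = sa.store_id;

CREATE VIEW v_services_flat AS
SELECT
    sv.service_id,
    sv.service_date,
    sv.fee,
    c.client_id,
    c.client_name,
    st.store_id,
    st.store_name
FROM fact_services sv
JOIN dim_client c  ON c.client_id = sv.client_id
JOIN dim_store  st ON st.store_id = sv.store_id;
```

Loaded as two separate tables, the repeated dimension columns (client_id, client_name, store_id, store_name) are the only link between them, which is what the synthetic key gets built on.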
Options Summary
The best consumable data model for the modern reporting tool would be either a typed event supertable or flattened stars.
The first is, of course, a nightmare from the database administration perspective, so forget about it. Qlik Sense would allow you to build it in the data load editor instead, but that is also a very bad idea from a maintenance point of view.
The flattened-stars data mart design means a huge overhead of data repetition. Also, if a dimension acquires another field, all the flattened tables have to be rebuilt to pick it up.
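On the hypothetical views from Option 4, for example, a new attribute on the client dimension stays invisible to the dashboards until every flattened object repeating client data is redefined and reloaded (this sketch assumes a PostgreSQL-style dialect):

```sql
-- Hypothetical change: the client dimension gains a new attribute.
ALTER TABLE dim_client ADD COLUMN client_segment VARCHAR(50);

-- Every flattened view (or materialised table) that repeats client
-- attributes has to be recreated to expose the new column.
CREATE OR REPLACE VIEW v_sales_flat AS
SELECT
    sa.sale_id,
    sa.sale_date,
    sa.amount,
    c.client_id,
    c.client_name,
    st.store_id,
    st.store_name,
    c.client_segment     -- newly propagated attribute
FROM fact_sales sa
JOIN dim_client c  ON c.client_id = sa.client_id
JOIN dim_store  st ON st.store_id = sa.store_id;
-- ...and the same edit has to be repeated for v_services_flat
-- and any other flattened table built on dim_client.
```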
Conclusion
The dimensional model is not the only option, and not necessarily the one best suited for consumption by modern reporting tools. The exact implementation depends on the team’s preferences and skills. On one of my projects we had a report analyst who was very skilled at data transformation in Qlik, so all the tricks happened at the load stage. On another project we created a stored procedure per dashboard, so all the complex calculations were done on the database engine and the data model hardly mattered, but there was an overhead of duplicated logic across the procedures.
By making reporting tools more UI-oriented and allowing less custom code, vendors may sell more licences, but your team still needs skilled professionals to create anything beyond a basic visualisation and to get your numbers straight.
Kate
Data masseuse