When it comes to the data storage in Business Intelligence, you have options of having an Operational Data Store (ODS), a Persistent Stage (PSA) and a Data Warehouse (DW); all of them are just databases though. I have noticed that sometimes these terms are used interchangeably. Well, the name as such doesn’t matter: if you call your Data Warehouse Lucy, it will be doing the same job. Although, someone from outside of your team would be confused when you say that you’ve got these numbers from Lucy instead of getting it from more reliable source, like a database. Anyway, let’s get the terms right.
Data Warehouse is a central repository of integrated data. It has enterprise scope. It means that people would expect to get the answer to any question they could possibly have on the organisation’s data. Data warehouses support strategical decisions. But it takes time to collect, process and integrate organisation’s data, so as a downside, data warehouse is updated once in a while, for example daily. Sometimes it’s not good enough – people want some reports or dashboards to show real time data to make tactical decisions, so an Operational Data Store serves this purpose. It doesn’t require all the data from everywhere, it only needs to integrate a small subset of data required for that dashboard.
Then, what is Persistent Stage? Maybe you haven’t heard this term before, I believe that it has appeared quite recently. The history of changes used to be captured in the source database audit tables if there were any, or in the Data warehouse once the one was developed. These days when we follow the Agile development methodology, some parts of DW are available earlier than the others. Persistent Stage is a measure to keep all the data including the historical values until the corresponding part of the data warehouse is developed and can carry on from there.
There are no strict rules here; I have combined what I believe to be the general thinking of these three object’s features in Business Intelligence world.
|Scope / Data set||Subject area / Business function||All tables from the source database if there’s no history tracing in it||All tables which make sense to business|
|Transformations / Business rules|
|Update Rate||Near real-time / frequent||Frequent, ideally triggered by change on data source||On appropriate schedule (hourly / daily / weekly)|
|Data Model||No restrictions||Copies the data source + effectiveness dates||Data Vault / Dimensional|
|Data “age”||Defined by requirements (e.g. only today’s transactions, not all sales ever)||All data||All data|
|Change data capture|
|User access||Department-level||No access / superusers only||Enterprise-level|
|Time slices / Point in time|
ODS and PSA are parts of Data Warehouse solution. But there are many cases when ODS is developed by one of the departments before the idea of the Enterprise-wide data warehouse comes to the mind of top management. The next thing that happens is a realisation of the need for historical data and data set extension to other departments’ requirements. This is a crucial point when the organisation should realise a need for the true data warehouse rather than keep adding more functionality to ODS. While managers could disagree with me and do anything to save the cost, I found it’s more effective (and easier!) to follow the best practice which is developed after decades of implementations. It’s definitely not the end of the world if your ODS does some other bits which it is not required to do. It’s not about breaking the rules, it’s about getting the right solution for your needs.