Why your Data Warehouse needs Data Vault
At OptimalBI we are huge fans of modelling the data warehouse using the Data Vault methodology. That’s because Data Vault allows us to provide you with a data warehouse that you can continue to use long after we are gone and continue to add to yourselves. We are often asked whether the “extra” Data Vault layers are necessary or just a pesky overhead in an end-to-end data warehouse architecture. To that we say: not only are the Data Vault layers necessary, they are arguably the most important layers in your data warehouse, and argue that I shall!
Yes, your current single-source data warehouse, with a staging area and a few star schemas, has been performing quite nicely over recent months; you are getting the metrics you need for your reports, and management seems content with what it is receiving. But what happens when the CEO decides she is sick of getting multiple reports from different areas of the business, none of which can be reconciled with the others? Suddenly there is a business need to conform information from across the organisation and turn your humble creation into an Enterprise Data Warehouse (EDW). Something tells me the CEO won’t be happy to hear an estimate of months or years to deliver value from such a project…
This is where Data Vault comes into it. It is designed to optimise your Enterprise Data Warehouse using a flood of adjectives: agile, adaptable, auditable, flexible, historical, and integratable. Peel back the jargon though, and you find the real value. Whereas previous data warehouse methodologies tended to require a lot of “big design up front”, Data Vault allows for fast, repeatable builds in chunk sizes that you get to choose. What makes this safe to do is Data Vault’s primary concern: getting all of the data, all of the time.
You still have your source systems in their various forms, and you still need your data marts, typically in the form of a dimensional model, for your business reporting needs. What you might be missing is the middle layer that allows you to continually add new sources to your data warehouse without breaking it, while capturing all of your previous data’s history. That’s right, Data Vault evolves as your organisation evolves! No more silos!
Data Vault is designed to adapt quickly to change within the business and to allow new areas of the business to be easily integrated into the EDW. Fitting other areas of the business, with their entirely different terminology and data, into your data warehouse can be a very painful process; Data Vault turns it into a relatively simple one. The beauty of Data Vault is that it allows you to bring together your numerous and unrelated sources and conform them into logical groups (hubs, links and satellites, to get technical) that suit the organisation as a whole, removing the tangled spaghetti mess that often arises when combining sources. The advantage of taking this smaller step to logical groups first using Data Vault, before driving all the way to fully conformed dimensions, is adaptability to change.
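To make the idea of "logical groups" concrete, here is a minimal, hypothetical sketch in Python of how two unrelated sources can integrate on one hub. All names (the customer key, the CRM and billing sources) are illustrative, not from any real implementation: the hub holds only the shared business key, while each source keeps its descriptive history in its own satellite, so a new source can be bolted on without touching what is already loaded.

```python
# Minimal illustration of the hub-and-satellite pattern (assumed names).
# Two source systems describe the same customer differently; both feed
# one shared hub (the business key) and each gets its own satellite
# (its descriptive history). Loads are insert-only, so history is kept.
import hashlib
from datetime import datetime, timezone

def hub_key(business_key: str) -> str:
    """Hash a normalised business key so every source lands on the same hub row."""
    return hashlib.md5(business_key.strip().upper().encode()).hexdigest()

hub_customer: dict[str, str] = {}   # hash key -> business key (insert-only)
sat_crm: list[dict] = []            # history from a hypothetical CRM source
sat_billing: list[dict] = []        # history from a hypothetical billing source

def load(source_row: dict, business_key: str, satellite: list) -> None:
    hk = hub_key(business_key)
    hub_customer.setdefault(hk, business_key)   # hub row created once, never updated
    satellite.append({"hub_key": hk,
                      "load_ts": datetime.now(timezone.utc),
                      **source_row})            # satellite rows only ever appended

# Two sources with entirely different shapes integrate on one key
# (note the differing key formats, unified by normalisation):
load({"name": "Ada Lovelace", "segment": "Gold"}, "CUST-001", sat_crm)
load({"invoice_total": 120.50}, "cust-001", sat_billing)
```

After both loads there is exactly one hub row, with one satellite row per source hanging off it, which is the "logical group" the paragraph above describes. In a real Data Vault these would of course be database tables rather than Python structures, and the hash and normalisation rules would be agreed standards rather than ad hoc choices.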
Data Vault is the real success story behind a truly integrated and agile Enterprise Data Warehouse.
Be sure to check out our other blogs about Data Vault!
Thanks for reading, Nic