Data Vault: Agile Data Warehousing for the Future

by | Aug 15, 2014

What happens when you take the best of 3rd Normal Form and the best of the Dimensional Star Schema, and mix them together?  You get Data Vault.

Data Vault is a data warehousing modelling technique that makes designing a robust, agile, historical and future-proof data warehouse a breeze.

Data Vault is modeled using Hubs, Links and Satellites.  These are the names given to a type of table, much in the same way a Star Schema is made up of Dimensions and Facts.

Hub Tables

The first step in modelling the Data Vault is breaking the business into core subject areas, such as Customer, Sale, Employee etc.  Each core subject area is then turned into a HubHub table.  The Hub  table contains just the business key from the source system(s), such as Customer_Id.  Accompanying the business key, is a generated surrogate key, which is something that is unique and has a one to one relationship with the business key; a load date time, for when the record entered the data warehouse; and a record source identifier, which identifies which source the record originated from. In short, the Hub is effectively a table that contains only a business key.  Nothing else.

Link Tables

Link tables form a relationship between Hubs.  This is how one business subject area connects to another within the Data Vault.  The Customer_Hub might Linkjoin to the Sale_Hub via the  Customer_Sale_Link table.  The link table contains the surrogate keys from each joining Hub that form that relationship.  Just like the Hub, it also has a generated surrogate key, load date time and record source id.  Each and every relationship that is used within the data warehouse is represented by a link table.  Keep in mind that a link table joins as many hubs as needed to form a specific relationship.

Satellite Tables

Satellite tables are where all the descriptive information about a Hub, or Link, are stored.  Each Hub can have one to many Satellite tables, that Satelliteall contain information about that business key.  In the context of the Customer_Hub, there could be a Customer_Name_Satellite and a Customer_Address_Satellite.  The key of a Satellite is made up of the surrogate key from the connecting Hub and the date time the record was loaded.  This enables the Data Vault to keep track of all history from the source systems.

An Example Snippet of a Data Vault Model

Data Vault

The concept of Hub, Link and Satellite based design enables the data warehouse to easily add sources in the future.  New Hubs and Satellites can be created from the new source systems and join already existing Hubs via Link tables.  In the same manner, new Satellites can also be created for already existing Hubs.  This prevents having to modify already existing tables.  As seen in the diagram below, one source system can be removed (red), and another one added (green).  The already existing Hubs, Links and Satellites in the Data Vault can be sliced and diced to accommodate the new Hubs, Satellites and Links.

Agile Data Vault

Instead of having to build your data warehouse 100% correct at the beginning, with Data Vault, you can continually edit and add pieces to suit the current needs of the business.  This agile approach allows for segments of a data warehouse to be built quickly, which will no doubt please the business owners who get to see results quickly.  Traditionally, with a data warehouse, developers can spend months or even years building and testing the logic, before the business sees any benefit.

If you are looking for a data warehouse solution to stand the evolving test of time, then Data Vault could be your solution.

Cheers, Nic

4 Comments
  1. Jaiwanth Rajashekar

    Superbly explained ,my many thanks as I am new to this concept with little or no help to say the least. This article made me aware of lot of things as it is so simple and succinctly written. Many thanks for this information

    Reply
    • Jack Prichard

      Hi Jaiwanth,
      Nic says thanks for reading his blog and commenting, it’s always good to get positive feedback.
      Have you had a look at our other Data Vault blogs? You can find them out here.

      Reply
  2. jaiwanth

    Sorry in case I didn’t reply ,I read most of the posts you mentioned in your comment ,many times re read it again. Thanks for such useful posts

    Reply
  3. Brent Leslie

    Very interesting, nice post. Would I be right in saying that data vault provides another layer of data abstraction pre reporting system? Having worked (and working on) various reporting systems I can definitely see the benefit…

    Reply

Trackbacks/Pingbacks

  1. ODE – The start of a journey | OptimalBI - […] We came across an approach called Data Vault.  Its an approach to structuring data based on ensemble modelling. […]
  2. Brian Bradley – helping you Master the Vault | OptimalBI - […] developed a lot of data warehouses over the years and is a huge fan of the Data Vault methodology which…
  3. Why your Data Warehouse needs Data Vault | OptimalBI - […] OptimalBI we are huge fans of modelling the data warehouse using the Data Vault methodology. That’s because Data Vault…
Leave a Reply to Brent Leslie Cancel reply

Your email address will not be published. Required fields are marked *