What happens when you take the best of 3rd Normal Form and the best of the Dimensional Star Schema, and mix them together? You get Data Vault.
Data Vault is a data warehousing modelling technique that makes designing a robust, agile, historical and future-proof data warehouse a breeze.
Data Vault is modeled using Hubs, Links and Satellites. These are the names given to a type of table, much in the same way a Star Schema is made up of Dimensions and Facts.
Hub Tables
The first step in modelling the Data Vault is breaking the business into core subject areas, such as Customer, Sale, Employee etc. Each core subject area is then turned into a
Hub table. The Hub table contains just the business key from the source system(s), such as Customer_Id. Accompanying the business key, is a generated surrogate key, which is something that is unique and has a one to one relationship with the business key; a load date time, for when the record entered the data warehouse; and a record source identifier, which identifies which source the record originated from. In short, the Hub is effectively a table that contains only a business key. Nothing else.
Link Tables
Link tables form a relationship between Hubs. This is how one business subject area connects to another within the Data Vault. The Customer_Hub might
join to the Sale_Hub via the Customer_Sale_Link table. The link table contains the surrogate keys from each joining Hub that form that relationship. Just like the Hub, it also has a generated surrogate key, load date time and record source id. Each and every relationship that is used within the data warehouse is represented by a link table. Keep in mind that a link table joins as many hubs as needed to form a specific relationship.
Satellite Tables
Satellite tables are where all the descriptive information about a Hub, or Link, are stored. Each Hub can have one to many Satellite tables, that
all contain information about that business key. In the context of the Customer_Hub, there could be a Customer_Name_Satellite and a Customer_Address_Satellite. The key of a Satellite is made up of the surrogate key from the connecting Hub and the date time the record was loaded. This enables the Data Vault to keep track of all history from the source systems.
An Example Snippet of a Data Vault Model
The concept of Hub, Link and Satellite based design enables the data warehouse to easily add sources in the future. New Hubs and Satellites can be created from the new source systems and join already existing Hubs via Link tables. In the same manner, new Satellites can also be created for already existing Hubs. This prevents having to modify already existing tables. As seen in the diagram below, one source system can be removed (red), and another one added (green). The already existing Hubs, Links and Satellites in the Data Vault can be sliced and diced to accommodate the new Hubs, Satellites and Links.
Instead of having to build your data warehouse 100% correct at the beginning, with Data Vault, you can continually edit and add pieces to suit the current needs of the business. This agile approach allows for segments of a data warehouse to be built quickly, which will no doubt please the business owners who get to see results quickly. Traditionally, with a data warehouse, developers can spend months or even years building and testing the logic, before the business sees any benefit.
If you are looking for a data warehouse solution to stand the evolving test of time, then Data Vault could be your solution.
Cheers, Nic