How to maintain Data Vault standards

by | Sep 6, 2016

During our Data Vault journey, we found that having naming and coding standards is important when working in a team. Having a consistent environment improves collaboration. With consistent table and column names, it’s easy to pick up objects to create a select statement. With a clear coding standard, any team member is able to fix and update the code. Also name patterns always help when you’re planning to automate something later (as an example, check this script for automated hub and satellite creation).
Our Optimal Data Engine helps in building Data Vaults as it allows any naming convention to be applied to it. But Data Vaults quickly become a mess if each developer uses their own naming convention. On the ODE.ninja website, SQL code samples follow the database standard we agreed to use in our team.
It’s good to have standards defined before the development has started, however, sometimes unexpected cases appear after. So, actually, it is good to have a second round of standards definition. We found it’s good to book a short meeting with all the developers in the room. You quickly find out that two developers have already interpreted the same standard differently. But also this meeting gives an opportunity to agree on any missing pieces, hard to find answers and share the ideas.
It is nice to have the standards document shared and available to everyone in the team. It also helps new people to pick up standards quickly when onboarding a new person. However, most of the time developers will just look at the existing database objects and apply the same standards. If you have one table that doesn’t match the standard, you can almost be certain they’ll pick that one; I’ve seen it happen so many times! So, it is very important to fix any objects that don’t follow the standard immediately, before too many other things rely on it.

DataVaultArch

Source: OptimalBI

We recommend you follow the Data Vault architecture standard when you develop your Data Vault with ODE. Put the “ODE” prefix to all the databases for Data Vault, so that they are visually bundled on the server object list.
Within the Raw Vault is a persistent storage for all the raw data; we found it is nice to keep the full source description in the Raw table name separated by double underscores, e.g. AdventureWorks__Sales__Customer. Some source systems have underscores in table names; with the double underscore separator there’s no ambiguity when reading the source table pedigree, e.g. Adventure_Works__Purchasing__Purchase_Order_Detail.
Business Vault object naming should be aligned with the business concepts. It is applicable for both table and column names. Choose your objects naming style, agree on NameCapitalisation, Words_Separator and descriptive prfx_Prefixes or Suffixes_sfx. Using meaningful names save time for everyone as developers and report writers don’t need to open the data dictionary to pick up the proper field or table.
Another part of the Data Vault standards are the coding standards. Usually, we implement business logic in stored procedures. It is good practice to compose and share a code template that can be copied for each new procedure. The template we use has the following content:

  • A Procedure Header where the developer puts the creation date, their name and a reasonable business description.
  • A piece of code that flushes the old data from the staging table.
  • A block of common table expressions to list all the source tables to be used in code. We found that common table expressions make the code more readable.
  • A placeholder for the code itself.

I’ll also mention inline comments. They don’t need to be too descriptive, but it’s always good to leave hints here and there about what’s going on in code.There are plenty of free tools for code formatting, they only take one click to make your code pretty.
Hope these tips will help you in your Data Vault journey.
Masseuse of all the Data – Kate
Kate blogs about the details that make the Data Warehouses work
You can read ODE Version 2: What’s New or all of Kate’s blogs here.
We run regular Data Vault course for business analysts, data architects, and business intelligence developers in Wellington and Auckland. Find out more here

2 Comments
  1. Tobias Maasland

    Hey Kate,
    first of all thanks for the list of tips. We use a similar approach but more oriented on the process of the Data Vault. So a staging table would begin with the prefix “Stg_”.
    What I was wondering about: You wrote that you flush old data from the staging area when using stored procedures for business rules. I understood the business rules should be implemented as far downstream as possible. So could it be that you meant “hard business rules” in this case?

    Reply
  2. Kate Loguteva

    Hi Tobias
    Thanks for your comment. I’m glad you found my tips useful.
    ODE doesn’t have built-in ETL features, it require processed or unprocessed data to be landed in the stage table. ODE merges stage table to the configured hub, link or satellite. But also ODE could run a user-defined stored procedure before the merge. We use these SPs just for the data acquisition from source into the Stage for the Raw Vault. For loading data from the Raw to Business Vault, ODE uses the same idea; SP processes Raw data, applies “hard” business rules and loads transformed data to the Stage table, then ODE merges data to the DV objects.

    Reply

Trackbacks/Pingbacks

  1. Data Replication: SSIS or Attunity? | OptimalBI - […] to read more? Try How to Maintain Data Vault Standards or more from […]
  2. Data Vault blogs you need to read | OptimalBI - […] How to maintain Data Vault standards […]
Submit a Comment

Your email address will not be published. Required fields are marked *