During our Data Vault journey, we found that having naming and coding standards is important when working in a team. Having a consistent environment improves collaboration. With consistent table and column names, it’s easy to pick up objects to create a select statement. With a clear coding standard, any team member is able to fix and update the code. Also name patterns always help when you’re planning to automate something later (as an example, check this script for automated hub and satellite creation).
Our Optimal Data Engine helps in building Data Vaults as it allows any naming convention to be applied to it. But Data Vaults quickly become a mess if each developer uses their own naming convention. On the ODE.ninja website, SQL code samples follow the database standard we agreed to use in our team.
It’s good to have standards defined before the development has started, however, sometimes unexpected cases appear after. So, actually, it is good to have a second round of standards definition. We found it’s good to book a short meeting with all the developers in the room. You quickly find out that two developers have already interpreted the same standard differently. But also this meeting gives an opportunity to agree on any missing pieces, hard to find answers and share the ideas.
It is nice to have the standards document shared and available to everyone in the team. It also helps new people to pick up standards quickly when onboarding a new person. However, most of the time developers will just look at the existing database objects and apply the same standards. If you have one table that doesn’t match the standard, you can almost be certain they’ll pick that one; I’ve seen it happen so many times! So, it is very important to fix any objects that don’t follow the standard immediately, before too many other things rely on it.
Within the Raw Vault is a persistent storage for all the raw data; we found it is nice to keep the full source description in the Raw table name separated by double underscores, e.g. AdventureWorks__Sales__Customer. Some source systems have underscores in table names; with the double underscore separator there’s no ambiguity when reading the source table pedigree, e.g. Adventure_Works__Purchasing__Purchase_Order_Detail.
Business Vault object naming should be aligned with the business concepts. It is applicable for both table and column names. Choose your objects naming style, agree on NameCapitalisation, Words_Separator and descriptive prfx_Prefixes or Suffixes_sfx. Using meaningful names save time for everyone as developers and report writers don’t need to open the data dictionary to pick up the proper field or table.
Another part of the Data Vault standards are the coding standards. Usually, we implement business logic in stored procedures. It is good practice to compose and share a code template that can be copied for each new procedure. The template we use has the following content:
- A Procedure Header where the developer puts the creation date, their name and a reasonable business description.
- A piece of code that flushes the old data from the staging table.
- A block of common table expressions to list all the source tables to be used in code. We found that common table expressions make the code more readable.
- A placeholder for the code itself.
I’ll also mention inline comments. They don’t need to be too descriptive, but it’s always good to leave hints here and there about what’s going on in code.There are plenty of free tools for code formatting, they only take one click to make your code pretty.
Hope these tips will help you in your Data Vault journey.
Masseuse of all the Data – Kate
Kate blogs about the details that make the Data Warehouses work
You can read ODE Version 2: What’s New or all of Kate’s blogs here.
We run regular Data Vault course for business analysts, data architects, and business intelligence developers in Wellington and Auckland. Find out more here