Advanced Data Vault Course
These were an intense two days, with not only slides to cover but also discussions and modelling exercises.
Hans updated us on some recent changes to the DV standards, which are also listed on the DVStandards.com website. These standards are accepted by the Data Vault & Ensemble Enthusiasts consortium as the current guidelines for Data Vault modelling. Although the core of Ensemble modelling and Data Vault stays the same, some patterns can become outdated and better solutions are found over time. These changes come from experience, and each change to the standards is voted on to become the new recommended best practice.
One of the most important and somewhat complex updates is the keyed-instance hub. It is probably not easy to grasp from reading alone, but it becomes obvious through the exercises. A Keyed-Instance Hub is a Hub that represents a logical 1:1 relationship with a Link, so that the grain of key represented by that Link can be described (by Satellites) or associated (via Links) with other concepts in the model. In other words, you have probably already had keyed-instance hubs in your DV model; they have just acquired a name now. For example, the Sale_Line_Item hub has always been among the examples of a sales business-area model, but now we know it is a keyed-instance of the Sale and Product natural business relationship. The important update is that every relationship should have a keyed-instance hub, even if one is not easily picked up from the core business concepts the business talks about.
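To make the idea concrete, here is a minimal Python sketch of the structures involved. The table and column names (Sale, Product, Sale_Line_Item, the `_hk` hash keys) are my own illustrative choices, not prescribed by the standards; real implementations would of course be database tables, not dictionaries.

```python
import hashlib

def hash_key(*business_keys: str) -> str:
    """Deterministic surrogate key derived from one or more business keys."""
    return hashlib.md5("||".join(business_keys).encode()).hexdigest()

# Hubs: one row per core business concept instance
hub_sale = {"sale_hk": hash_key("S-1001"), "sale_id": "S-1001"}
hub_product = {"product_hk": hash_key("P-42"), "product_id": "P-42"}

# Link: the natural business relationship between Sale and Product
link_sale_product = {
    "sale_product_hk": hash_key("S-1001", "P-42"),
    "sale_hk": hub_sale["sale_hk"],
    "product_hk": hub_product["product_hk"],
}

# Keyed-instance hub: logically 1:1 with the link, giving the
# Sale+Product grain its own hub key so it can be described by
# satellites or referenced by further links.
hub_sale_line_item = {
    "sale_line_item_hk": link_sale_product["sale_product_hk"],
}

# A satellite describing that grain (quantity, price, ...)
sat_sale_line_item = {
    "sale_line_item_hk": hub_sale_line_item["sale_line_item_hk"],
    "quantity": 3,
    "unit_price": 9.95,
}
```

The point of the pattern is visible in the last two structures: because the link's grain has its own hub key, descriptive attributes hang off a hub like any other concept, rather than off the link itself.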
It is always emphasised that Ensemble modelling follows the Agile paradigm. The original Agile manifesto was written for software development, so the Data Vault & Ensemble Enthusiasts consortium has slightly adjusted it to be fully applicable to data warehousing. I think this is cool: I have used references to the Agile manifesto on my customers' projects from time to time to defend the way we develop a Data Vault, so personally I found it very helpful.
The discussion topics included data privacy, Data Vault architecture and multi-timeline problems: there is never a simple answer to these, but rather a solution has to be found for each specific implementation.

In the privacy discussion we tried to guess whether the "right to be forgotten" has a loophole of forgetting debts and misconduct alongside the rest of the information about a person. We all agreed that privacy should always be considered when modelling data. On the architecture topic we had a heated discussion on "unfolding" Same-As links into one "golden" hub record: which layer is the most appropriate for that, and whether it should be physical or virtual. The answer depends on many factors, including how complex the logic is that must be applied to co-located data before it can be considered integrated. The multi-timeline problem relates to the current record from the business perspective vs. the current record from the machine's point of view: these can differ depending on the data capture mechanisms. We agreed that this can be a can of worms, so the rational answer is not to open it until you face the problem, i.e. keep it simple if possible.

Another topic was the JSON data type in a Data Vault implementation. I firmly believe this breaks the pattern and is not acceptable in the Business Vault layer, but the rest of the group had different thoughts, because there is more and more non-structured data to handle.
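To illustrate the "unfolding" question, here is a hedged Python sketch (the source-system keys are invented examples): it collapses Same-As link pairs into one golden key per group using a small union-find. Whether this resolution is materialised as a physical table or exposed as a virtual view is exactly the layer decision we debated.

```python
# Collapse Same-As link pairs into a single "golden" hub key per group.
# A minimal union-find sketch; keys and pairs are invented examples.

def find(parent: dict, key: str) -> str:
    """Follow parent pointers to the group's representative key."""
    while parent[key] != key:
        parent[key] = parent[parent[key]]  # path compression
        key = parent[key]
    return key

def golden_keys(same_as_pairs):
    """Map every key seen in Same-As pairs to one golden key per group."""
    parent = {}
    for a, b in same_as_pairs:
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        ra, rb = find(parent, a), find(parent, b)
        if ra != rb:
            # Deterministic survivor: the lexicographically smaller key
            parent[max(ra, rb)] = min(ra, rb)
    return {k: find(parent, k) for k in parent}

# Two source systems loaded the same customer under different keys
same_as = [("CRM-7", "ERP-301"), ("ERP-301", "WEB-55")]
print(golden_keys(same_as))
```

The survivor rule here is a placeholder; in practice the "golden" record is usually chosen by business-driven survivorship logic, which is where the complexity (and the physical-vs-virtual trade-off) comes from.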
The most valuable part of the course for everyone was sharing experience: no two Data Warehouses are the same, and all of us have faced different obstacles when putting the theory into practice. Sharing ideas on modelling challenges like these is also valuable. It's good to stay in the loop.
Hans mentioned the Ensemble Logical Modelling (ELM) workshops, which are a way to gather requirements. A Data Vault model is totally dictated by the business: it is no more than a representation of the core business concepts users talk about. But until last year, requirement gathering for Data Vault had not been formalised, so every team did the best they could collecting information from users. One option is the BEAM methodology, a business requirement gathering methodology for data warehouses; but the artifact of BEAM is a star schema model, so some tweaking was required to arrive at an ensemble model in the end. An ELM workshop is a way to collect information from the business that naturally produces an Ensemble model instead. A course that prepares ELM workshop facilitators is scheduled to run in Wellington in July, if you are interested.
In general this was a great trip: apart from all the new things I've learned, there were dishes and beverages to enjoy as well. I'm eager to apply the new knowledge to the next Data Vault project. Apparently we are doing well in Wellington: our Australian colleagues complained that companies there are not so quick to adopt changes (so many new concepts were truly new to them, while we kiwis were aware of most of them already), and also that there are not enough certified DV specialists to fulfil market demand. Keep vaulting!