ODE – Migrating Configuration Data
We have recently been discussing various ways to promote our configuration data for Optimal Data Engine (ODE) from one environment to the next. Our config data is the heart and soul of ODE; it is the foundation for the entire engine.
The config in ODE is the relational data model we have built to hold all of the configuration required to make ODE run. It includes definitions for all sources and targets, as well as any mappings. Being such an integral part of ODE, it is essential that we are able to seamlessly promote new or updated configuration code from one environment to the next. Ultimately, it comes down to making this process as clean and easy for our customers as possible. We don’t want them to have to deal with the nitty-gritty of a code release. We want to package our config up in a nice little bundle, deliver it, and have them install it without breaking a sweat.
Over a number of discussions, we collected a list of potential ways to capture changes and migrate the config code:
- Compare scripts between environments to identify differences
- A flag column to identify new config entries to be promoted
- A back-end logging mechanism
- A trigger-based mechanism
- A SQL merge statement
- SQL insert, update and delete scripts
- Version or release numbers to indicate a group of changes to be promoted
- Version or release dates to indicate a range of changes to be promoted
- Changes pushed as code into Git and then pulled into the new environment
It obviously wasn’t as simple as picking one item off the list. Many options had different merits, and the end result was a bit of a mash-up of several options, one that continues to evolve as discussions continue. Two key challenges became evident during these discussions. Firstly, we need to get all our ducks in a row so we know what we are releasing, and that we are releasing it in the right order. Secondly, we need to devise a simple solution for packaging it all up and releasing it into the next environment. None of this should impact the customer, or cause them any undue stress.
Challenge One – Versioning
Multiple development streams can occur on the config at any one time, which means each section of code can be at a different development phase from another. We need a way to distinguish between what is ready to be released and what is still being developed and tested. We also have to take into account the various dependencies that may exist between different pieces of config, and ensure that piece one gets released before piece two, and so on. This makes it unrealistic to release the entire config code, as it currently stands, into the next environment. We need a process to choose what we want to release and when we want to release it.
In order to do this successfully, we need to develop a tightly controlled versioning process: firstly to keep track of what has already been released, and secondly to bundle specific changes together into releases while taking into account dependencies between pieces of config. The simplest approach is to include a version column on all of our config tables; this could also be called a release number. We toyed with the idea of using a version date, which would enable us to select all config after a specified date; however, a version number gives us slightly more flexibility, as we can distinguish between two new pieces of config that happen to share the same version date. The version number will be controlled by the developer rather than by, for instance, a trigger, as we need control over which changes go in which release. While a trigger may give us automation, it limits our ability to distinguish between concurrent changes that need to be implemented separately.
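As a rough sketch, a version column on a config table might look like this. All table and column names here are illustrative, not ODE’s actual schema:

```sql
-- Hypothetical config table carrying a developer-controlled release number.
CREATE TABLE dv_source_table (
    source_table_key   INT          NOT NULL PRIMARY KEY,
    source_table_name  VARCHAR(128) NOT NULL,
    source_schema      VARCHAR(128) NOT NULL,
    release_number     INT          NOT NULL  -- set by the developer; groups this row into a release
);

-- Everything that belongs to release 5, ready to be bundled:
SELECT *
FROM   dv_source_table
WHERE  release_number = 5;
```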
In some cases, our config data may be created or updated on a regular basis. Decision tables are a prime example of this. Keeping control of versioning, and of when to release new rules from such a table, may become rather chaotic. For this challenge, we have discussed the idea of an “in progress” table: new rules and mappings in development are first created in the “in progress” table, only moving to the final decision table once they are ready for release.
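A minimal sketch of the “in progress” idea, again with hypothetical names:

```sql
-- Hypothetical "in progress" table: rules under development live here first.
CREATE TABLE dv_decision_rule_wip (
    rule_key       INT           NOT NULL PRIMARY KEY,
    rule_text      VARCHAR(4000) NOT NULL,
    release_number INT           NULL  -- assigned only once the rule is ready
);

-- Promote the finished rules for release 5 into the live decision table...
INSERT INTO dv_decision_rule (rule_key, rule_text, release_number)
SELECT rule_key, rule_text, release_number
FROM   dv_decision_rule_wip
WHERE  release_number = 5;

-- ...and clear them out of the work-in-progress table.
DELETE FROM dv_decision_rule_wip
WHERE  release_number = 5;
```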
It is essential we keep a history of all changes that have happened in the config tables. For this, we can use a trigger that automatically populates a history table. This provides a number of benefits, including being able to determine whether a row has been inserted, updated or deleted, and it also gives us a base for rolling back a release. An alternative approach would be to create a view over the top of the history tables showing only the current rows; this means we would only have to maintain one table.
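A sketch of what such a history trigger could look like on a hypothetical config table. This uses a common T-SQL pattern with the `inserted` and `deleted` pseudo-tables, and is not ODE’s actual implementation:

```sql
-- Hypothetical history trigger: copies every insert, update and delete on a
-- config table into a matching history table, tagged with the operation type.
CREATE TRIGGER trg_dv_source_table_history
ON dv_source_table
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;

    -- Inserts, and the "after" image of updates.
    INSERT INTO dv_source_table_history
        (source_table_key, source_table_name, release_number, change_type, changed_at)
    SELECT i.source_table_key, i.source_table_name, i.release_number,
           CASE WHEN d.source_table_key IS NULL THEN 'I' ELSE 'U' END,
           SYSUTCDATETIME()
    FROM   inserted i
    LEFT JOIN deleted d ON d.source_table_key = i.source_table_key;

    -- Deletes: rows present in "deleted" but not in "inserted".
    INSERT INTO dv_source_table_history
        (source_table_key, source_table_name, release_number, change_type, changed_at)
    SELECT d.source_table_key, d.source_table_name, d.release_number,
           'D', SYSUTCDATETIME()
    FROM   deleted d
    WHERE  NOT EXISTS (SELECT 1 FROM inserted i
                       WHERE  i.source_table_key = d.source_table_key);
END;
```

The alternative view would then simply filter this history table down to each key’s latest non-deleted image.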
We also need to know which release is current in each subsequent environment. We can do this by implementing an audit table that keeps track of which release has been implemented in an environment. This table can track valuable information such as who commissioned the release and when it happened.
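The audit table could be as simple as something like this (hypothetical names again):

```sql
-- Hypothetical audit table: one row per release applied to an environment.
CREATE TABLE dv_release_audit (
    release_number INT          NOT NULL,
    environment    VARCHAR(50)  NOT NULL,
    released_by    VARCHAR(128) NOT NULL,  -- who commissioned the release
    released_at    DATETIME2    NOT NULL DEFAULT SYSUTCDATETIME(),
    CONSTRAINT pk_dv_release_audit PRIMARY KEY (release_number, environment)
);

-- The current release in a given environment:
SELECT MAX(release_number) AS current_release
FROM   dv_release_audit
WHERE  environment = 'UAT';
```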
Challenge Two – Packaging
The next step is determining how we are going to package the code so we can effortlessly promote and execute it without unnecessary manual steps. We decided the best approach is to create a script that automatically generates a SQL merge statement containing all the data pertaining to a given release number. Using simple dynamic SQL, we can extract all the data we need from the config tables by filtering on the release number. Each version number encompasses one release, so every row with, say, a version of 5 will be released together as a whole.
This code can then be pushed to Git, ready for another developer to pull down and execute in the subsequent environment. A SQL merge statement gives us the flexibility to handle inserts, updates and deletes while packaging it all up nicely into a single piece of code.
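To illustrate, this is roughly the shape of merge statement the generator might emit for release 5, with the extracted rows inlined as literal values so the script is self-contained in the target environment. The table, columns and data are hypothetical; the real script would be built by dynamic SQL looping over the config tables:

```sql
-- Sketch of a generated release script for release 5.
MERGE dv_source_table AS tgt
USING (VALUES
    (101, 'customer', 'sales', 5),
    (102, 'orders',   'sales', 5)
) AS src (source_table_key, source_table_name, source_schema, release_number)
ON tgt.source_table_key = src.source_table_key
WHEN MATCHED THEN
    UPDATE SET tgt.source_table_name = src.source_table_name,
               tgt.source_schema     = src.source_schema,
               tgt.release_number    = src.release_number
WHEN NOT MATCHED BY TARGET THEN
    INSERT (source_table_key, source_table_name, source_schema, release_number)
    VALUES (src.source_table_key, src.source_table_name, src.source_schema, src.release_number);
```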
While this approach makes it easy for a customer to accept and install a new release of config data, it also enables us to roll back any release using a combination of the version number and the config history tables. Should a release fail, or should we decide to “clean out” a development version, we can delete everything with that particular version number and then re-implement any updates using the config history tables. A delete script can easily be written for any release using the version number as a filter, and the script that restores rows from the history table into the current config can be standardised.
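As a sketch of the rollback, once more with hypothetical names:

```sql
-- Hypothetical rollback of release 5: strip out everything tagged with that
-- version, then restore the latest pre-release image from the history table.
DELETE FROM dv_source_table
WHERE  release_number = 5;

-- Re-implement rows that release 5 had updated or deleted, using their most
-- recent image from an earlier release (skipping rows whose last recorded
-- change before release 5 was itself a delete).
INSERT INTO dv_source_table (source_table_key, source_table_name, release_number)
SELECT h.source_table_key, h.source_table_name, h.release_number
FROM   dv_source_table_history h
WHERE  h.release_number < 5
  AND  h.change_type <> 'D'
  AND  h.changed_at = (SELECT MAX(h2.changed_at)
                       FROM   dv_source_table_history h2
                       WHERE  h2.source_table_key = h.source_table_key
                         AND  h2.release_number < 5)
  AND  NOT EXISTS (SELECT 1
                   FROM   dv_source_table t
                   WHERE  t.source_table_key = h.source_table_key);
```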
That’s the original plan formulated; now comes the fun task of putting it all into action. No doubt we will come across a few hurdles and will have to adapt accordingly, with some ideas replaced by others.
Until next time, Nic!