ODE – The start of a journey
There must be a better way!
About two years ago we started a journey into the world of AgileBI. It all started out of frustration (as most journeys do), frustration about the magic sauce.
The magic sauce was the reason why one data warehouse project would be a raving success and the next a ‘meh’. It was often the reason that, when I returned 12 months after delivering a raving success, I would find the data warehouse had become a ‘meh’. By that I mean not updated, not well managed and not delivering to the stakeholders' expectations.
My view was that often the delivery team had somebody who had built many data warehouses and just intrinsically knew what to do to be successful. They could also take the team with them on that journey to make sure the entire project was successful.
Ask them how they did it and you would either get a smirk or a shrug.
So we started a journey to see how we could make the process repeatable and able to be delivered by our entire team, and more importantly, by our customers once we had finished the initial delivery.
Welcome to the world of Agile
That led us to the world of Agile. Agile is great and a mature approach, but Agile for Data Warehousing not so much, so we invested time in defining an AgileBI approach.
The first thing we did was understand what we needed to deliver this approach, and we ended up with our 5 AgileBI circles.
Nothing revolutionary there, but it gave us a roadmap of what we were looking for.
We were lucky enough to find the BEAM* approach from Lawrence Corr, which gave us the ability to gather data-driven business requirements with agility. It also gave us a portion of the data modelling approach we needed.
But what about the data?
BEAM* is great, but we were still stuck with the problem of how to model and integrate the data we required without a big upfront design phase followed by months and months of unique ETL development.
We understand that a lot of the business context (e.g. Customer, Policy, Cover) is unique to an industry, and a lot of business rules are unique to a customer. But we also find there are some things we build the same way for every project (need a date dimension anyone?).
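The date dimension is a good example of something that gets rebuilt on every project. Here is a minimal sketch of generating one, assuming nothing beyond the Python standard library. The function name and column names are our own illustrative choices here, not a standard and not anything taken from a specific product.

```python
from datetime import date, timedelta


def build_date_dimension(start: date, end: date) -> list[dict]:
    """Return one row per calendar day from start to end, inclusive.

    Column names are hypothetical; rename them to fit your own standards.
    """
    rows = []
    current = start
    while current <= end:
        # isocalendar() gives (ISO year, ISO week number, ISO weekday 1=Mon..7=Sun)
        _, iso_week, iso_weekday = current.isocalendar()
        rows.append({
            "date_key": int(current.strftime("%Y%m%d")),  # e.g. 20240229
            "full_date": current,
            "year": current.year,
            "quarter": (current.month - 1) // 3 + 1,
            "month": current.month,
            "day_of_month": current.day,
            "iso_week": iso_week,
            "is_weekend": iso_weekday >= 6,  # Saturday or Sunday
        })
        current += timedelta(days=1)
    return rows


if __name__ == "__main__":
    for row in build_date_dimension(date(2024, 2, 28), date(2024, 3, 2)):
        print(row)
```

The point is not the code itself but that it never needs to change from one customer to the next, which makes it an obvious candidate for building once and reusing.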
You can have the best data integration team around, but you will find that they all have their own slightly different way of coding around the same problem. And for real fun try getting them to agree what naming standards should be used!
Enter the Data Vault
It has a bunch of benefits that I will outline in a future post, but one of the major benefits for us was the ability to have relatively simple code that could be used to automate a lot of the dross work we always did as part of our data warehouse builds.
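To give a feel for what "relatively simple code" can look like, here is a rough sketch of generating hub tables from a small piece of configuration. It is purely illustrative: the `HubConfig` class, the `hub_ddl` function, the table naming and the column types are all assumptions made up for this example, not how any particular tool does it. The hub shape itself (a key, the business key, a load date and a record source) follows the usual Data Vault pattern.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HubConfig:
    """Hypothetical config entry describing one hub."""
    name: str                       # business concept, e.g. "customer"
    business_key: str               # column holding the business key
    key_type: str = "VARCHAR(100)"  # assumed type for the business key


def hub_ddl(hub: HubConfig) -> str:
    """Render generic CREATE TABLE DDL for a hub (naming is illustrative)."""
    table = f"hub_{hub.name}"
    return (
        f"CREATE TABLE {table} (\n"
        f"    {table}_key CHAR(32) NOT NULL PRIMARY KEY,\n"
        f"    {hub.business_key} {hub.key_type} NOT NULL UNIQUE,\n"
        f"    load_date TIMESTAMP NOT NULL,\n"
        f"    record_source VARCHAR(100) NOT NULL\n"
        f");"
    )


if __name__ == "__main__":
    hubs = [
        HubConfig("customer", "customer_number"),
        HubConfig("policy", "policy_number"),
    ]
    for hub in hubs:
        print(hub_ddl(hub))
        print()
```

Because every hub has the same shape, adding a new one becomes a one-line config change rather than another hand-written script, and the naming standard is enforced by the generator rather than by argument.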
The other benefit was there were quite a few smart people around the world who were doing some heavy thinking on how to improve the approach.
So we decided to do what we always do when we find something new, exciting and promising. We would give it a go.
We hired Brian who had spent time building a Data Vault for Xero and was a proven guru, and we sent a couple of our team off to training and certification in Australia.
Just use an ETL tool, right?
Now we had a team who knew what it was, and how to build one. Let the coding begin!
We tell our customers it's better to buy than it is to build, so we spent some time looking for software we could use to automate the building of the vaults.
There are not many options out there. The ones we found were either standard ETL (or ELT) tools used in a certain way to deliver the vault structures and data needed, or data vault specific tools that focussed on automating the data loading and not on applying the business rules that were needed.
We were not enamoured with either approach.
So we did what all New Zealand companies do in this situation: we brought out the number 8 fencing wire and rolled our own.
Research it, Build it, Prove it, Rinse and Repeat
We have learnt that embarking on a massive project to build these types of products is asking for a hiding and is far from Agile. We have also learnt that a customer priority will always arise that means we have to halt development for a while and then pick it up later.
So we have become very good at chunking work down into bits that we can build and use to prove each capability or component. This also helps us invest in research upfront each time we approach an area that is new to us. We have found that this research-it, then build-it approach has resulted in a much higher success rate, as well as the ability to stop when we hit a gnarly problem that will just suck effort with little chance of success.
Hell, that's the art of Agile, right?
We have also found that implementing each bit in anger on real projects also helps us harden the product, and focus on the next piece of development that would provide the highest value.
So we are now at the stage where we have a base of pretty cool code that automates parts of the data vault process. We have also proven it works within projects.
We have designed a cool architecture for the product which means we can deploy it on multiple technology platforms (Microsoft, Oracle, SAS, R etc) while still retaining a core design and code base.
Don’t get me wrong, we still have a long road to go before it does everything we need, let alone everything we want.
Let's make the world a better place
We are at the stage where we have to decide how to move the product to a production-ready state, and that means deciding on our go-to-market approach.
Our choices are, as always:
- Commercial Licensed Product
- Software as a Service offering
- Open Source
- Some weird arse alternative
I love WordPress for so many reasons. One is their ability to produce a full open source product and then have a commercial backbone that makes sure it is constantly enhanced. They do this without having to resort to the n-1 or hold-out-enterprise-features approach that all the other commercial open source vendors spin.
Another reason is that the WordPress community adds so many cool features and addons to the product that it really does grow at a rate of knots, far faster than the core WordPress team could deliver alone.
Data Vault and DW Automation have been around for a long time, but for some reason they are still not widely adopted. I believe one of the reasons is that there is no readily available software to help you easily adopt this approach.
So we have decided to open source our product and see if we can help make the world a better place (or data warehouse delivery easier, faster and more successful at least).
Say welcome to Optimal Data Engine. We pronounce it ODE as in the lyrical stanza.
(Those who have known me for too long know I love Steve Jobs' power of 3, and I also love post-rationalisation of a decision, not to mention characterisation of products. ODE covers so many of those it isn’t funny!)
And so the journey begins
The journey so far has been far from smooth and we know it's only going to get bumpier.
So I have decided to blog each week to record the things we find, good or bad.
Buckle up baby and let's get started!