ODE & 3CPO – Talking multiple languages
Everybody talks a different language
When we decided to start building ODE we knew a few things already. One of those things was that most of our customers already had data warehousing technology.
They had already invested in Microsoft, Oracle, IBM, SAS, Teradata, Informatica or any of the other raft of data warehouse repositories and ELT technologies that are abound in the market place.
We also knew that it would be a big decision on their part to throw out this technology and implement the technology we decided to pick to be able to use ODE and gain the AgileBI benefits that it provides.
Don’t force people to do what they don’t want to
So we decided to develop ODE in a way that meant they could continue to leverage the technology they already had if they wanted to.
This meant we had to be able to deploy ODE on multiple of platforms, leverage multiple data transformation languages and tools, plus read and write to multiple data repositories.
So we had a multi language problem
We though about writing ODE in java. In theory this would allow us to deploy on any platform and any technology.
We knew from experience we would probably end up being forced down a java server path for transformations, and we also knew most of the hard core data warehouse customers we work with would want it to run natively in the data repository they already have invested in.
We knew that there were some language translation tools we might be able use. Tools where we could write code in say tSQL and it would automagically convert it to PL/SQL. We tried a couple and our overall experience was “yeah, nah”
We also knew that like our team, our customers would want us to tune any data transformation code to run fast, and this meant being able to tune the code for each technology. And in fact for specific scenarios within a technology. For example using columnar store when it was available in the target data repository.
And last but not least we knew maintaining multiple versions of the same product would be a complete nightmare for us.
Our solution was to look at our plan for ODE and work out what we could architect as a standard shared component and what had to be specific to each technology. We also discovered we would need to manage a couple fo different deployment options as well.
So we ended up with a design we call 3CPO, which stands for:
- Code Generation
- Code Execution
Config is the relational data model we have built that holds all the configuration required to make ODE run.
This will include definitions of all source and targets, as well as any mappings. For example
- Raw Vault Hub, Sat and Links
- Business rules to be applied in the business vault
- Measure calculations
- Star schemas to be deployed
It will also include unique options to deploy for a specific environment or to a specific design pattern. For example:
- Whether to virtualise the dimensional star schemas or persist them.
- End date each satellite record, create tombstone records or do both
- Utilise columnar storage for a satellite
- Create an index
Config is the heart of ODE, without it there is nothing.
Code Gen is the code that builds Code Exec.
It reads the config and generates the code that is needed to create the structures and move the data.
(of course if you select the the virtualised options it will create views that see the data)
Code Gen will come in multiple flavours, so you can deploy it on the technology you already have.
At the moment we are planning to include (overtime):
- Microsoft tSQL
- Oracle PL/SQL
- Oracle ODI
There is nothing stopping the wider community building Code Gen and Code Exec for another platform (say Informatica) reusing the code patterns we have already defined.
Code Exec is the code that runs to create the structures or move the data.
(of course if you select the the virtualised options it will be views that see the data)
Code Exec will also come in the same multiple flavours as Code Gen.
Orchestration is the engine that runs the code exec.
It can be run in two design modes.
Engine mode is when the Code Gen and the Code Exec are executed at the same time.
So effectively ODE will look at the config, create the code exec, execute it, and then rinse and repeat.
Code Deployment Mode
Code Deployment Mode is when Code Gen creates the Code Exec and then you manually promote the Code Exec across your different environments (i.e. Dev > Test > Prod)
The execution engine that actually tells Code Gen when to build the Code Exec, and also tells Code Exec when to execute will be over to you for a while.
We will deal with it when we have finished building all the features required to support Config, Code Gen and Code exec that will manage the entire process of moving the right data from source to star.
It can be as simple as Cron, or the use of an ETL engine such as SSIS.
Config is the heart of ODE
The configuration component is the core of ODE and the thing that will be maintained as a common component across all developed and deployed versions.
We are maintaining the config component via version controlled processes, with the code being stored in GIT (as we are with all out code of course).
Our team will fight long and hard as they decide to add a new feature into the config, to ensure ODE doesn’t become bloated but also to ensure we keep adding core features.
We are semi-lingual already
Our core Code Gen and Code Exec is currently written in tSQL. This is due to both the skills of the people we had available to kick off initial development and the customers we were working with for the initial deployments.
We have also done initial builds in PL/SQL and SAS, but need to move these up to the latest config release.
Hop on the bus
We are not quite ready to open the flood gates and let the world help start adding features to ODE.
We are working on our Test Driven Development (TDD) and Continuos Integration (CI) frameworks at the moment to ensure we can safely test any config and code changes as we add features. This is core before we can safely start committing contributions. (not to mention doing the documentation you will need to get started)
But we are keen to talk to anybody who might want to start the journey early with us.
Grab a ticket (they are free) and hop on board. Its going to be an exciting ride!
Keep upto date
We were recently approached by a client with an interesting job - they wanted us to create a solution for Persistent Staging Area for them, and the requirements were quite broad. They had an MS SQL Server as their RDBMS and the loading tool was SSIS, just to stay in...read more
A couple of years ago the OptimalBI team developed Metrics Vault using Optimal Data Engine (ODE) functionality. The purpose of Metrics Vault is to collect various statistics about data vaults configured in ODE database. This includes not only data vault tables...read more
Almost everything a data warehouse is asked to do involves dates: days, weeks, months, years, periods of interest, special days. These date details "are attached to virtually every fact table to allow [easy] navigation." Data Vault has a different structure from the...read more
Previously we outlined how to create the SSIS packages for ODE to load data into staging tables as a part of the table load. The steps were defined explicitly for use with BimlStudio which is a paid product from Varigence, but if you planning on doing things in a more...read more
This week we released version 5 of our Optimal Data Engine. Follow the link to download it from the GitHub. Optimal Data Engine (ODE) is an open source Data Warehouse automation tool. It is a config-driven engine that builds and manages your Data Vault. ODE works on...read more
One of the things I admire the Data Vault modelling approach for is the modular structure being represented by sets of hubs, links and satellites. From a developer’s perspective it provides enough flexibility and power to automate various tasks easily, such as...read more
A new version (version 4) of ODE is now available for download! ODE, Optimal Data Engine, is our open source product for building a Data Vault. Anyone can download it, install on their instance of SQL Server and develop a Data Vault of their own. New features include:...read more
During our Data Vault journey, we found that having naming and coding standards is important when working in a team. Having a consistent environment improves collaboration. With consistent table and column names, it's easy to pick up objects to create a select...read more
Optimal Data Engine Version 2 is now available for download from GitHub! Here's what it means for you. The biggest change with the most impact is a new flag for Data Vault retired objects. The general rule is, once something gets into the Data Vault, it never goes...read more
I started working at OptimalBI a few months ago. My area of expertise is Data Warehouse development, using Microsoft SQL Server. I was, therefore, a good candidate to test how difficult it is to start using ODE (Optimal Data Engine). ODE is an open source application...read more
Following on Nic's excellent blog on Migrating Configuration Data, I would like to elaborate a bit on the style, which he selected. From the options Nic outlined, he chose the route of grouping objects into a release, packaging up the Config and releasing it. This is...read more
We think Data Vault is pretty cool and continually recommend it as the best way to model your data warehouse. Here are three reasons why! 1. Model Quickly Data Vault allows you to very quickly transform your source data into structures that store history and better...read more
I recently attended a course run by Hans Hultgren on Data Vault Modelling. I have a small confession to make at this point; Sorry Hans, I've never read your book. The good news for me was that the course doesn't require you to have read the book first. It does assume...read more
Last month we had Hans Hultgren, Data Vault extraordinaire, teaching his Data Vault Modelling and Certification course right here at our very own OptimalBI office in Wellington. I had the privilege of doing this course in Sydney last year and walked away Data Vault...read more
Just Ship It Already! Steve Jobs said "Real Artists Ship" . Ma.tt has a great blog about shipping One point oh. And there is a loverly comment at the end of the blog that goes: "A great entrepreneur once told me that “an idea without execution is worthless.” We can...read more
There are a number of techniques, which can be employed for building Star Schemas off a Business Data Vault.
Each has its merits, depending on your requirements.
The “Gold Standard” for building Star Schemas is to be able to make them Virtual.read more
At OptimalBI we are huge fans of modelling the data warehouse using the Data Vault methodology. That’s because Data Vault allows us to provide you with a data warehouse that you can continue to use long after we are gone and continue to add to yourselves.
We are often asked whether the “extra” Data Vault layers are necessary or just a pesky overhead in an end-to-end data warehouse architecture.
To that we say: not only are the Data Vault layers necessary, they are arguably the most important layers in your data warehouse, and argue that I shall!read more
We have recently been discussing various ways we can promote our configuration data for Optimal Data Engine (ODE) from one environment to the next. Our config data is the heart and soul of ODE, it is the foundation for the entire engine.
The config in ODE is the relational data model we have built that holds all of the configuration required to make ODE run. It includes definitions for all source and targets, as well as any mappings.read more
About two years ago we started a journey into the world of AgileBI. It all started out of frustration (as most journeys do), frustration about the magic sauce.
The magic sauce was the reason why one data warehouse project would be a raving success and the next a ‘meh’. It was often the reason that after delivering a raving success and then returning 12 months later I would find the data warehouse had become a ‘meh’. By that I mean not updated, not well managed and not delivering to the stakeholders expectations.read more