
ODE & 3CPO – Talking multiple languages

by | Jul 9, 2015


Everybody talks a different language

When we decided to start building ODE we knew a few things already.  One of those things was that most of our customers already had data warehousing technology.
They had already invested in Microsoft, Oracle, IBM, SAS, Teradata, Informatica or one of the raft of other data warehouse repositories and ELT technologies that abound in the marketplace.
We also knew it would be a big decision on their part to throw out this technology and implement whatever technology we picked, just to be able to use ODE and gain the AgileBI benefits it provides.

Don’t force people to do what they don’t want to

So we decided to develop ODE in a way that meant they could continue to leverage the technology they already had, if they wanted to.
This meant we had to be able to deploy ODE on multiple platforms, leverage multiple data transformation languages and tools, plus read and write to multiple data repositories.

So we had a multi-language problem

We thought about writing ODE in Java. In theory this would allow us to deploy on any platform and any technology.
We knew from experience we would probably end up being forced down a Java server path for transformations, and we also knew most of the hard-core data warehouse customers we work with would want it to run natively in the data repository they had already invested in.
We knew that there were some language translation tools we might be able to use.  Tools where we could write code in, say, tSQL and it would automagically convert it to PL/SQL.  We tried a couple and our overall experience was “yeah, nah”.
We also knew that, like our team, our customers would want us to tune any data transformation code to run fast, and this meant being able to tune the code for each technology.  And in fact for specific scenarios within a technology.  For example, using columnar storage when it was available in the target data repository.
And last but not least we knew maintaining multiple versions of the same product would be a complete nightmare for us.

Enter 3CPO

Our solution was to look at our plan for ODE and work out what we could architect as a standard shared component and what had to be specific to each technology.  We also discovered we would need to manage a couple of different deployment options as well.
So we ended up with a design we call 3CPO, which stands for:

  • Configuration
  • Code Generation
  • Code Execution
  • Orchestration

Config

Config is the relational data model we have built that holds all the configuration required to make ODE run.
This includes definitions of all sources and targets, as well as any mappings.  For example:

  • Raw Vault Hub, Sat and Links
  • Business rules to be applied in the business vault
  • Measure calculations
  • Star schemas to be deployed

It also includes options unique to a specific environment or design pattern.  For example:

  • Whether to virtualise the dimensional star schemas or persist them.
  • Whether to end-date each satellite record, create tombstone records, or do both
  • Utilise columnar storage for a satellite
  • Create an index

Config is the heart of ODE, without it there is nothing.
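To make Config concrete, here is a minimal sketch of what one slice of such a config model could look like. All table and column names here are hypothetical illustrations (ODE's actual config schema differs), and SQLite stands in for whatever repository holds the config:

```python
import sqlite3

# A tiny, hypothetical slice of a config model: one table describing
# hub definitions, one holding per-object deployment options.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hub_config (
    hub_name      TEXT PRIMARY KEY,
    business_key  TEXT NOT NULL,   -- source column holding the business key
    source_table  TEXT NOT NULL
);
CREATE TABLE object_option (
    object_name   TEXT,
    option_name   TEXT,            -- e.g. 'virtualise', 'columnar_storage'
    option_value  TEXT
);
""")
conn.execute("INSERT INTO hub_config VALUES ('hub_customer', 'customer_id', 'crm.customers')")
conn.execute("INSERT INTO object_option VALUES ('hub_customer', 'virtualise', 'N')")

row = conn.execute(
    "SELECT business_key FROM hub_config WHERE hub_name = 'hub_customer'").fetchone()
print(row[0])  # customer_id
```

The point is simply that everything Code Gen needs (structures, keys, sources and options) lives as rows in ordinary relational tables.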

Code Gen

Code Gen is the code that builds Code Exec.
It reads the config and generates the code needed to create the structures and move the data.
(Of course, if you select the virtualised option it will create views over the data instead.)
Code Gen will come in multiple flavours, so you can deploy it on the technology you already have.
At the moment we are planning to include (over time):

  • Microsoft tSQL
  • Oracle PL/SQL
  • Oracle ODI
  • SAS
  • R

There is nothing stopping the wider community from building Code Gen and Code Exec for another platform (say, Informatica) by reusing the code patterns we have already defined.
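To illustrate the Code Gen idea (this is not ODE's actual generator, and the config table and column names are made up), here is a sketch that reads hub definitions from config and emits the DDL to build them:

```python
import sqlite3

# Hypothetical config: one row per hub to be generated.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hub_config (hub_name TEXT, business_key TEXT)")
conn.execute("INSERT INTO hub_config VALUES ('hub_customer', 'customer_id')")

def generate_hub_ddl(config_conn):
    """Code Gen in miniature: one CREATE TABLE statement per configured hub."""
    statements = []
    for hub_name, business_key in config_conn.execute(
            "SELECT hub_name, business_key FROM hub_config"):
        statements.append(
            f"CREATE TABLE {hub_name} ("
            f"{hub_name}_key INTEGER PRIMARY KEY, "
            f"{business_key} TEXT NOT NULL, "
            f"load_datetime TEXT NOT NULL, "
            f"record_source TEXT NOT NULL)")
    return statements

generated = generate_hub_ddl(conn)
print(generated[0].startswith("CREATE TABLE hub_customer"))  # True
```

A real flavour of Code Gen would emit tSQL, PL/SQL or SAS from the same config, which is exactly why the config model is the shared component.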

Code Exec

Code Exec is the code that runs to create the structures or move the data.
(Of course, if you select the virtualised option it will be views over the data instead.)
Code Exec will also come in the same multiple flavours as Code Gen.

Orchestration

Orchestration is the engine that runs the Code Exec.
It can be run in two design modes.

Engine Mode

Engine mode is when Code Gen and Code Exec are executed at the same time.
So effectively ODE will look at the config, create the Code Exec, execute it, and then rinse and repeat.
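A minimal sketch of engine mode, again with made-up config names: each cycle reads the config, generates the Code Exec, and executes it straight away.

```python
import sqlite3

# Hypothetical config, as in the earlier sketches.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hub_config (hub_name TEXT, business_key TEXT)")
conn.execute("INSERT INTO hub_config VALUES ('hub_customer', 'customer_id')")

def engine_cycle(conn):
    """One engine-mode cycle: generate the Code Exec from config, then run it."""
    for hub_name, business_key in list(conn.execute(
            "SELECT hub_name, business_key FROM hub_config")):
        code_exec = (f"CREATE TABLE IF NOT EXISTS {hub_name} "
                     f"({business_key} TEXT PRIMARY KEY)")
        conn.execute(code_exec)  # executed immediately, no deployment step

engine_cycle(conn)
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'")]
print("hub_customer" in tables)  # True
```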

Code Deployment Mode

Code Deployment Mode is when Code Gen creates the Code Exec and you then manually promote the Code Exec across your different environments (i.e. Dev > Test > Prod).
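The same sketch flipped to deployment mode: the generated Code Exec is written to a script file instead of being executed, so it can be promoted Dev > Test > Prod by whatever release process you already use. (File, table and column names are illustrative only.)

```python
import pathlib
import sqlite3
import tempfile

# Hypothetical config, as before.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hub_config (hub_name TEXT, business_key TEXT)")
conn.execute("INSERT INTO hub_config VALUES ('hub_customer', 'customer_id')")

# Deployment mode: write the generated Code Exec to a script rather than
# running it, ready to be promoted through your environments.
out_dir = pathlib.Path(tempfile.mkdtemp())
script = out_dir / "deploy_hubs.sql"
with script.open("w") as f:
    for hub_name, business_key in conn.execute(
            "SELECT hub_name, business_key FROM hub_config"):
        f.write(f"CREATE TABLE {hub_name} ({business_key} TEXT PRIMARY KEY);\n")

print(script.read_text().startswith("CREATE TABLE hub_customer"))  # True
```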

Execution

For now, the execution engine that actually tells Code Gen when to build the Code Exec, and tells Code Exec when to run, will be over to you.
We will deal with it once we have finished building all the features required to support Config, Code Gen and Code Exec, which together will manage the entire process of moving the right data from source to star.
It can be as simple as Cron, or the use of an ETL engine such as SSIS.
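If you did go the cron route, a single crontab entry could drive the whole cycle (the script path here is purely hypothetical):

```
# Run an ODE generate-and-execute cycle at the top of every hour
0 * * * * /opt/ode/run_engine_cycle.sh >> /var/log/ode_engine.log 2>&1
```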

Config is the heart of ODE

The configuration component is the core of ODE and the thing that will be maintained as a common component across all developed and deployed versions.
We are maintaining the config component via version-controlled processes, with the code being stored in Git (as we are with all our code, of course).
Our team will fight long and hard over whether to add each new feature into the config, to ensure ODE doesn’t become bloated, but also to ensure we keep adding core features.

We are semi-lingual already

Our core Code Gen and Code Exec is currently written in tSQL.  This is due to both the skills of the people we had available to kick off initial development and the customers we were working with for the initial deployments.
We have also done initial builds in PL/SQL and SAS, but need to move these up to the latest config release.

Hop on the bus

We are not quite ready to open the floodgates and let the world start adding features to ODE.
We are working on our Test Driven Development (TDD) and Continuous Integration (CI) frameworks at the moment, to ensure we can safely test any config and code changes as we add features.  This is core before we can safely start accepting contributions (not to mention writing the documentation you will need to get started).
But we are keen to talk to anybody who might want to start the journey early with us.
Grab a ticket (they are free) and hop on board.  It’s going to be an exciting ride!
 
 
 
