ODE – The start of a journey
There must be a better way!
About two years ago we started a journey into the world of AgileBI. It all started out of frustration (as most journeys do): frustration about the magic sauce.
The magic sauce was the reason why one data warehouse project would be a raving success and the next a ‘meh’. It was often the reason that, after delivering a raving success and returning 12 months later, I would find the data warehouse had become a ‘meh’. By that I mean not updated, not well managed and not delivering to the stakeholders’ expectations.
My view was that often the delivery team had somebody who had built many data warehouses and just intrinsically knew what to do to be successful. They could also take the team with them on that journey to make sure the entire project was successful.
Ask them how they did it and you would either get a smirk or a shrug.
So we started a journey to see how we could make the process repeatable and able to be delivered by our entire team, and more importantly, by our customers once we had finished the initial delivery.
Welcome to the world of Agile
That led us to the world of Agile. Agile is great and a mature approach, but Agile for Data Warehousing not so much, which led us to invest time in defining an AgileBI approach.
The first thing we did was understand what we needed to deliver this approach, and we ended up with our 5 AgileBI circles.
Nothing revolutionary there, but it gave us a roadmap of what we were looking for.
We were lucky enough to find the BEAM* approach from Lawrence Corr, which gave us the ability to gather data driven business requirements with agility. It also gave us a portion of the data modelling approach we needed.
But what about the data?
BEAM* is great, but we were still stuck with the problem of how we modelled and integrated the data we required without a big upfront design phase followed by months and months of unique ETL development.
We understand that while a lot of the business context (e.g. Customer, Policy, Cover) is unique to an industry, and a lot of business rules are unique to a customer, there are always some things we build the same way for every project (need a date dimension, anyone?).
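A date dimension is a good example of work worth automating once rather than hand-coding per project. A minimal sketch of generating one in Python (the column names here are illustrative, not from any particular product):

```python
from datetime import date, timedelta

def build_date_dimension(start, end):
    """Generate one row per calendar day between start and end (inclusive)."""
    rows = []
    current = start
    while current <= end:
        rows.append({
            "date_key": int(current.strftime("%Y%m%d")),  # surrogate key, e.g. 20150101
            "date": current.isoformat(),
            "year": current.year,
            "month": current.month,
            "day": current.day,
            "day_of_week": current.strftime("%A"),
            "is_weekend": current.weekday() >= 5,  # Saturday=5, Sunday=6
        })
        current += timedelta(days=1)
    return rows

# One year of dates, ready to bulk-load into the dimension table
dim_date = build_date_dimension(date(2015, 1, 1), date(2015, 12, 31))
```

In a real build you would extend this with fiscal periods, holidays and the like, but the point is that it is written once and reused on every project.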
You can have the best data integration team around, but you will find they all have their own slightly different way of coding around the same problem. And for real fun, try getting them to agree on what naming standards should be used!
Enter the Data Vault
We came across an approach called Data Vault. It’s an approach to structuring data based on ensemble modelling.
It has a bunch of benefits that I will outline in a future post, but one of the major benefits for us was the ability to have relatively simple code that could be used to automate a lot of the dross work we always did as part of our data warehouse builds.
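Part of what keeps that code simple is that every hub follows the same pattern: derive a key from the business key, compare against what is already loaded, and insert only what is new. A minimal sketch of that idea, assuming a hash-based hub key (the table and column names here are hypothetical, not ODE’s actual implementation):

```python
import hashlib

def hub_hash_key(*business_keys):
    """Derive a deterministic hub key from one or more business key parts.
    Parts are trimmed, upper-cased and joined so the same business key
    always produces the same hash, regardless of source formatting."""
    normalised = "||".join(str(k).strip().upper() for k in business_keys)
    return hashlib.md5(normalised.encode("utf-8")).hexdigest()

def load_hub(existing_keys, staged_business_keys):
    """Insert-only hub load: add any business key not already in the hub."""
    new_rows = []
    for bk in staged_business_keys:
        hk = hub_hash_key(bk)
        if hk not in existing_keys:
            existing_keys.add(hk)
            new_rows.append({"hub_customer_hk": hk, "customer_bk": bk})
    return new_rows
```

Because the same template applies to every hub (and, with small variations, to links and satellites), the “dross work” can be generated rather than hand-written for each source.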
The other benefit was there were quite a few smart people around the world who were doing some heavy thinking on how to improve the approach.
So we decided to do what we always do when we find something new, exciting and promising. We would give it a go.
We hired Brian who had spent time building a Data Vault for Xero and was a proven guru, and we sent a couple of our team off to training and certification in Australia.
Just use an ETL tool, right?
Now we had a team who knew what it was, and how to build one. Let the coding begin!
We tell our customers it’s better to buy than it is to build, so we spent some time looking for software we could use to automate the building of the vaults.
There are not many options out there. The ones we found were either standard ETL (or ELT) tools used in a certain way to deliver the vault structures and data, or Data Vault-specific tools focussed on automating the data loading rather than applying the business rules that were needed.
We were not enamoured with either approach.
So we did what all New Zealand companies do in this situation: we brought out the number 8 fencing wire and rolled our own.
Research it, Build it, Prove it, Rinse and Repeat
We have learnt that embarking on a massive project to build these types of products is asking for a hiding and is far from Agile. We have also learnt that a customer priority will always arise that means we have to halt development for a while and then pick it up later.
So we have become very good at managing the process of chunking work down into bits that we can build and use to prove each capability or component. This also helps us invest in research work upfront each time we are approaching a new area we have not done before. We have found that this research-it, then build-it approach has resulted in a much higher success rate, as well as the ability to stop when we hit a gnarly problem that will just suck effort with little chance of success.
Hell, that’s the art of Agile, right?
We have also found that implementing each bit in anger on real projects helps us harden the product and focus on the next piece of development that would provide the highest value.
So we are now at the stage we have a base of pretty cool code that automates parts of the data vault process. We have also proven it works within projects.
We have designed a cool architecture for the product which means we can deploy it on multiple technology platforms (Microsoft, Oracle, SAS, R etc) while still retaining a core design and code base.
Don’t get me wrong, we still have a long road ahead before it does everything we need, let alone everything we want.
Let’s make the world a better place
We then reached the stage where we had to decide how to move the product to production readiness, which meant deciding on our go-to-market approach.
Our choices are as always:
- Commercial Licensed Product
- Software as a Service offering
- Open Source
- Some weird arse alternative
I love WordPress for so many reasons. One is their ability to produce a full open source product and then have a commercial backbone that makes sure it is constantly enhanced. They do this without having to resort to the n-1 or hold-out-the-enterprise-features approach that all the other Commercial Open Source vendors spin.
Another reason is that the WordPress community adds so many cool features and add-ons that the product grows at a rate of knots, far faster than the core WordPress team could manage alone.
Data Vault and DW Automation have been around for a long time, but for some reason Data Vault is still not a widely adopted approach. I believe one of the reasons is that there isn’t any readily available software to help you easily adopt it.
So we have decided to open source our product and see if we can help make the world a better place (or data warehouse delivery easier, faster and more successful at least).
Say welcome to Optimal Data Engine. We pronounce it ODE, as in the lyrical stanza.
(Those that have known me for too long know I love Steve Jobs’ power of 3, post rationalisation of a decision, and characterisation of products; ODE covers so many of those it isn’t funny!)
And so the journey begins
The journey so far has been far from smooth and we know it’s only going to get bumpier.
So I have decided to blog each week to record the things we find, good or bad.
Buckle up baby and let’s get started!