This week we released version 5 of our Optimal Data Engine. Follow the link to download it from the GitHub.
Optimal Data Engine (ODE) is an open source Data Warehouse automation tool. It is a config-driven engine that builds and manages your Data Vault. ODE works on MS SQL Server. The best part is it’s open source! Anyone can download it and start building a Data Vault today.
ODE version 5 has two new features; SQL Server Integration Services (SSIS) support and SQL Server Change Data Capture (CDC), as well as a few bug fixes.
Previously, ODE has relied on a data acquisition tool to bring the data from the server for ODE to load into its Data Vault structure. In this version, we did a first pass at integrating ODE with SSIS. ODE now contains a set of BIML scripts that can generate SSIS packages based on the Data Vault configuration automatically. ODE won’t be able to run handcrafted SSIS packages, but we have provided some functionality for the customised source data filtering and derived columns. As before, ODE uses parallel data loads, so multiple SSIS packages can be executed simultaneously.
We’ve also found a good use for the SQL Server CDC feature. If your data source is an SQL Server database and you want to load source data into the Data Vault in portions, or capture any single change to a record, you can enable CDC on the source database. SSIS packages generated with our BIML scripts have the option to load data in CDC mode. An ODE SSIS package will read the data from the Change table and insert all the changes into the satellite. If the same record has been updated multiple times between the ODE loads, ODE will save all the history of updates into the satellite.
The principle of getting a dataset via the query function, as in SQL CDC, seemed so appealing that we have implemented it for ODE Data Vaults too. You can generate two table functions for a satellite; one provides satellite data as at a point in time, the second function provides a history of changes between two dates, very similar to the original SQL Server CDC query functions. This point-in-time function can be used for reporting, for example, if you need to build a trend visualisation or get the old state of the data. The ODE CDC function can also be used to implement business satellite rules; if a raw satellite is loaded in CDC mode, there is no need to re-calculate the business rules for the entire table, now it can be re-calculated just for the changed records.
We would also like to say ‘Thank-you’ to our community. We appreciate your help in making ODE better by contributing code and raising issues in ODE for us to fix. In this version we have fixed the configuration release mechanism, the calculated fields logic and extended the audit field size to accommodate long user names. Thanks for all your contributions.
Our code is open and our ears are open too! Let us know what you think of the new version of ODE!
Kate
Data masseuse