One thing I’ve been thinking about lately is Single Source of Truth (SSOT), its relationship with interpretation, and how that plays out with data.
We often consider the data warehouse, at least in its traditional star-schema sense, to be a record of fact. Obviously, that’s what a data warehouse is geared towards: facts over time.
Numerous developers slave away building data models, star schemas and the like, that satisfy the SSOT ambition. The premise is that business users will always arrive at the right number of customers, for example.
Does that consistently resolve to the “correct number”? Frankly, no! Why is that, we might ask? In my opinion, it’s largely down to the interpretation of the individual analyst. In the current working climate, people don’t spend their working lives in the same IT environment; people come, people go. One could argue that interpretations and questions shift as a result. Analysis is largely based on interrogation, interrogation is shaped by the question, and everyone questions and processes information in their own way. These are not negative traits; it’s simply human nature.
With the above in mind, one has to question the SSOT concept and its effectiveness. If one analyst interprets a question differently from another and structures the query differently (typically because it draws on multiple tables, though that isn’t a prerequisite), the result sets could well differ.
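To make this concrete, here is a minimal sketch using Python’s built-in `sqlite3` module against an entirely hypothetical toy star-schema fragment. The `dim_customer` and `fact_order` tables, their rows and the helper names are invented purely for illustration. Three analysts answer “how many customers do we have?” against the same model: two differ through interpretation, and the third through an easy join fan-out slip.

```python
import sqlite3


def build_toy_warehouse() -> sqlite3.Connection:
    """Create a tiny, hypothetical star-schema fragment in memory."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE fact_order (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
        INSERT INTO dim_customer VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cat'), (4, 'Dan');
        -- Ann and Bob have ordered twice each, Cat once, Dan never.
        INSERT INTO fact_order VALUES (1, 1), (2, 1), (3, 2), (4, 2), (5, 3);
        """
    )
    return conn


# Three plausible readings of "how many customers do we have?"
QUERIES = {
    # Interpretation 1: everyone in the customer dimension.
    "every registered customer": "SELECT COUNT(*) FROM dim_customer",
    # Interpretation 2: only customers who have placed at least one order.
    "customers who have ordered": """
        SELECT COUNT(DISTINCT c.customer_id)
        FROM dim_customer AS c
        JOIN fact_order AS o ON o.customer_id = c.customer_id
    """,
    # Same join without DISTINCT: counts order rows, not customers.
    "naive join, no DISTINCT": """
        SELECT COUNT(*)
        FROM dim_customer AS c
        JOIN fact_order AS o ON o.customer_id = c.customer_id
    """,
}


def answers(conn: sqlite3.Connection) -> dict[str, int]:
    """Run each query and return its single scalar result."""
    return {label: conn.execute(sql).fetchone()[0] for label, sql in QUERIES.items()}


if __name__ == "__main__":
    conn = build_toy_warehouse()
    for label, count in answers(conn).items():
        print(f"{label}: {count}")
```

Run as a script, it prints 4, 3 and 5 respectively. The model is identical in every case; only the question asked of it, and how that question is turned into a query, changes.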
Then we need to ask what the purpose of an ineffective SSOT is, right? Could we cut back the hours invested in design and build? Would it have been more effective to direct the analysts to the raw landed source data to come up with the “magic” number?
Sure, this line of thinking results in more intensive queries and demands more skill to write them, but isn’t the net result equally (potentially) erroneous? Well, we exist in a world of scale now, where Massively Parallel Processing (MPP), distribution and clustering are commonplace, highly available and, in some cases, dynamic. Why bother structuring data for a mythical SSOT dream when we can all crunch the data ourselves and form our own perspectives?
Just channeling some thoughts. Think about it: the future is here!
Thomas – MacGyver of code.
Thomas blogs about big data, reporting platforms, data warehousing and the systems behind them. Read his blog Learn and Earn with Kaggle, or connect with Thomas Evans on LinkedIn.