If you are in Business Intelligence and work for a big organisation, you’ve might, once or twice, seen a “magic” Excel spreadsheet that holds some valuable data for a few key reports. This data is called a reference master data. Typically it’s a relatively static dataset and it’s most likely an enterprise-wide reference data. It could be a lookup values description, extra information that is not available from source, or a hierarchy of some sort. Here are some examples of reference data from my previous workplaces:
- Special grouping of geographical areas, that makes sense within an organisation, but may not be valuable to anyone else
- Information from external sources, e.g. ICD-10 diagnoses classification
- Business contexts (definitions) of core concepts, that could appear on pop-up panels when users seek more information in the report, e.g. departments description and capacity
- The mapping between similar codes across an organisation’s applications.
As you see, this could be quite valuable data and it’s good to have the same reference data available for all the reports. If all reports use the same reference data, users can trust them to provide information at the same grain. If the same code mapping is used when a report references multiple sources, it improves the data quality and accuracy of the reports. You get more value when users are able not only to see the results of merging the source data with reference data but also to have visibility over it and be able to change it quickly when required; perform data management!
A Data Warehouse is a point of data integration from all the different sources an organisation has, and reference data is just one of the sources. Usually, a challenge with reference data is:
- It’s coming from external sources, not from internal applications
- The data could exist in different formats, e.g. Excel files, CSV files or, in the worst case, handwritten notes.
- It is relatively static, however, it can change over the time, and users may request different behaviour when data changes such as replacing the existing value or providing a value based on time.
There are many ways to handle this type of extra data. (I’m not talking about the values hard-coded into reports here, but I’ve seen many of these!) Some organisations choose proprietary solutions to upload the reference data from files, for example, create a SSIS package that loads the file from a special folder. The problem with this approach always starts when the file format changes and the ETL developer has to come and fix it (again!). Otherwise, developers have to take a control of file uploads, so the whole process loses visibility to the users. In cases when hierarchies and historical tracing come to the picture, spreadsheets do not provide a good solution at all.
Master data management, and reference master data management, in particular, is not a new thing. Many enterprise-oriented IT companies have implemented their versions of Reference Data Management (RDM) tools long ago and keep improving them. They enable multiple features to treat reference data in a highly professional way. As examples of Reference Data Management systems, I’ll mention IBM RDM Hub, Oracle Hyperion Data Relationship Management, SAP Master Data Governance, Orchestra Networks EBX, Collibra Data Governance Centre. All of these tools provide more or less the same functionality, as to does SQL Server Mater Data Services. It’s obvious that the first choice for the RDM tool will be the tool from the primary organisation’s vendor; if SAP Business Objects is your BI tool, then SAP MDG could be your first choice for RDM. As my prime Data Warehouse tool is SQL Sever, MDS is my choice for reference data. I found this tool very appealing from the first time I’ve touched it.
Microsoft has introduced Master Data Services for SQL Server 2008 R2 in 2011, and it had been improved many times since then. Although core features stay same:
- A web interface for easy MDS administration
- Brilliant integration with good old Excel, that enables enterprise users to input values in the same way they are used to
- User input validation via business rules (with the nice interface that lets users know what’s wrong with the data input)
- Automatic history tracing
- Hierarchy management
- Easy output via views that could be captured via ETL or a reporting tool
- Windows-integrated security groups
The only obvious downside of MDS I could find is it’s not widely used, due to the specific primary function it was built for. Choosing MDS is therefore difficult; if you ask experts for an opinion on MDS, they don’t have one, and if you have a question, there might be no answer yet on the Stack Overflow.
That said, consider using MDS if your organisation is using SQL Server for databases and there’s a need for any type of user input. The Excel interface makes this tool unique in the range of similar tools as it is familiar for any type of users without special training on tool usage.
Masseuse of all the Data – Kate
Kate blogs about the details that make the Data Warehouses work.