The peeps at OptimalBI have been proving our productisation process and one of our outputs was to create an NZ GeoPack for our Yellowfin partner and thought we’d share the approach we took to deciding on it’s content. If you’re not sure what a GeoPack is then check out my previous blog.
Before we started the design and build we did some research to make sure we’d thought out what the build should be. Our pre-reading started with the comprehensive documentation on the Yellowfin wiki:
These gave us a good understanding of the effort involved and the data that we’d need to source. After this, there were a few things we considered:
1. What data’s available in NZ and how’s it licensed?
When we set out we wanted to include some census data so our set needed to align with Statistics NZ. As a result we settled on Meshblock at the lowest level. Meshblock is an area that NZ Statistics use to segment up their census results – more details are available here:
The other part was to consider how it’s licensed. If you’re planning to re-distribute the data then the license needs to allow this. For some datasets they may charge so make sure you’re license allows you to re-distribute. The license available from Statistics NZ was Creative Commons, more information on this can be found here:
2. What data should you include?
A Yellowfin GeoPack should be made up of centroid co-ordinates (central point for the area), shape co-ordinates (perimeter points of the area) – at least one or both of these are needed, a description and title along with any census information about these areas you want to include. E.g. count of people in the area, deprivation index etc.
This is only limited by the data you can source and join to your geo-dataset. The GeoPack file should contain an aggregated count at the level for the area you’ve specified and those figures are added to each level in your hierarchy, which we’ll explain next.
3. What should your address hierarchy be?
The hierarchy we produced was based on the Statistics NZ standard, these rollup into one another. In our case we had 3 levels in our hierarchy; Territorial Authority (TA) > Area Units (AU) > Meshblocks (MB). For our example a TA has multiple AUs and an AU has multiple MBs.
Along with the files for each one of these levels, there’s another CSV which explains how they relate, allowing Yellowfin to construct the GeoPack from our dataset.
The hierarchy is based on your address set. The simplest way is to base it on the official address standard for the country. It’s important that the records are a similar construct to the above with each level only having one parent, otherwise you’ll need to do some data cleansing for the GeoPack to work properly.
Once you’ve decided on the approach and sourced your data, the next phase is to put it together in the right format and submit to Yellowfin to create the package. I’ll go in to detail about our build process in my next blog but for now if you want to know more the wiki links above should give you a good idea on how the construction works.
Keep exploring! Daniel.