Three difficult steps to improving open data in New Zealand

by Shaun McGirr | Oct 12, 2015

These steps look difficult too. Are we going up or down? Source: Wikipedia (CC BY 2.0)
You read that right, the steps are difficult. I’m not going to lie to you! I’ve written before about the promise and disappointment of open data. To those of us who work with it, the value is so obvious that it can be hard to figure out why others seem to oppose it. But figure we must, if we want more of the right people pulling in our direction. If you’re already with us, check out Victoria’s post on becoming an open data hero.

The problem(s) to be solved

I won’t belabour this point beyond listing the key problems open data advocates face when convincing NZ government agencies to open up more data:

  1. Agencies don’t know what data they hold, so requests to open it must be unreasonably targeted
  2. Even for data that agencies know they hold, there is rarely a repeatable procedure for publishing it
  3. Those agencies publishing the data they do know about often take a “dump it on the website” approach
  4. Problems 1-3 above make any comprehensive all-of-government data directory impossible, leading users in need to create “lists of lists” and “lists of lists of lists”. I’ve created several of these, always thinking “this will be the last one!” For some cogent words on why this is a problem, see this blog post by Peter Ellis.

Enough whining, what to do about it? Here are the three difficult steps from the title…

Step 1: recognise that not all agencies are the same

Governments love to treat agencies equally, and there are important reasons to do so. With respect to open data, however, blanket rules/recommendations/guidelines are unlikely to work, for two reasons. First, agencies differ in the extent to which data defines their existence. Statistics NZ produces most of the Tier 1 statistics and so will naturally approach open data questions differently than the Ministry of Education, which holds and publishes plenty of data but whose core mission is a policy outcome. And both of these are different again from Callaghan Innovation, which consumes other agencies’ data to make funding decisions but does not publish the reasons.
Second, agencies differ in their capacity to publish open data. We might expect this capacity to be well-correlated with the importance of open data to an agency’s mission, but it is not. Statistics NZ, for example, have at least as many ways of finding their published data as there are days in the week. Maybe this serves the needs of multiple types of users, but it is incredibly confusing even for people like me who used to work there. Other agencies have the opposite problem: no process or tool for opening data, even if they wanted to.
I’m not confident that blanket recommendations will do anything given the power of these factors. What matters is individuals taking action to deliver open data, usually against the odds. So if you work in (or with) a government agency, ask yourself: “what data is definitional to our mission?” As a taxpayer funding your mission, I hope you have a good excuse for not making it open data (“dumped on the website” doesn’t count). Then ask yourself: “do we know how to do this?” and reach for help if the answer is “no”.

Step 2: work with each type of agency (differently) to make it easy for them to decide to do the right thing

I used to think open data problems could be solved with enough jumping up and down and a single ideal technology. While jumping up and down gets a single dataset released, it doesn’t change how an agency thinks about its open data obligations. When this happens through the adversarial Official Information Act process, requests may have the opposite effect, hardening opposition. And while “one platform to rule them all” is a nice dream (especially for those in charge of the purse-strings) it isn’t feasible. Agencies have different needs and conflicting interests, and it is too easy for an agency to go its own way.
So if lobbying for single datasets and a big technology spend won’t help, what will? Slow, difficult, careful work by open data advocates to a) understand the barriers and capabilities of each agency and b) help agencies make better choices. That’s the idea behind my new initiative, www.thedata.nz: it’s on the shoulders of open data users to make a clearer case and turn our knowledge (rather than our complaints) into resources that agencies can use. More on this later…

Step 3: automate, automate, automate

To be sustainable, publication of open data must become part of the business-as-usual procedure of government agencies. If they measure something potentially interesting and Step 2 has removed the barriers to opening the data, publication and all the steps afterwards should be run by machines, not people. This is the key to making the open data ecosystem self-sustaining and is where technology can make a difference. But investment in that technology only gives a good return when we solve the tough human and organisational problems first.
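To make “run by machines, not people” a little more concrete, here is a minimal sketch in Python of one automated publication step: turning a set of CSV files into a machine-readable catalogue instead of a hand-maintained list. Everything in it is an assumption for illustration only: the function names, the catalogue fields, the agency name and the licence string are hypothetical and do not follow any official NZ government standard.

```python
import csv
import hashlib
import io
import json
from pathlib import Path


def describe_csv(name: str, text: str) -> dict:
    """Build one catalogue entry from the contents of a CSV file."""
    rows = list(csv.reader(io.StringIO(text)))
    header = rows[0] if rows else []
    return {
        "name": name,
        "columns": header,
        "row_count": max(len(rows) - 1, 0),  # exclude the header row
        # A checksum lets users detect when a published file has changed.
        "sha256": hashlib.sha256(text.encode("utf-8")).hexdigest(),
    }


def build_catalogue(agency: str, licence: str, files: dict) -> dict:
    """Combine entries for many files into one catalogue (hypothetical layout)."""
    return {
        "agency": agency,
        "licence": licence,
        "datasets": [describe_csv(n, t) for n, t in sorted(files.items())],
    }


def catalogue_from_directory(agency: str, licence: str, data_dir: str) -> dict:
    """Read every CSV in a directory and describe it, with no manual steps."""
    files = {
        p.name: p.read_text(encoding="utf-8")
        for p in Path(data_dir).glob("*.csv")
    }
    return build_catalogue(agency, licence, files)


if __name__ == "__main__":
    # Made-up sample data, purely to show the shape of the output.
    sample = {"school_rolls.csv": "school,year,roll\nExample School,2015,120\n"}
    catalogue = build_catalogue("Example Agency", "CC BY 4.0", sample)
    print(json.dumps(catalogue, indent=2))
```

A script like this could run on a schedule whenever new data lands, so the catalogue never drifts out of date. It only helps, of course, once the human questions in Steps 1 and 2 (what to publish, and under which licence) have been answered.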
If I’m right or wrong about any of this, please let me know in the comments! And if you’d like to learn more, get in touch, and attend this series.
Until next time, keep asking better questions
Shaun – @shaunmcgirr

2 Comments
  1. aimee whitcroft

    Great article, Shaun, thanks!
    I think some key related points here are:
    – work to improve current standards for open data
    – work to improve how data is licensed, including things like indications of the extent to which data is “clean” or “dirty”, and what people can/should do with that data in terms of decision-making, pattern- and insight-generation, etc. Something I’ve seen a lot is that agencies are very uncomfortable releasing raw data*, and the resources needed to “clean” it are often not available (because Money and/or Skills), so the data ends up not being released. It would be great to find ways both to get “dirty” data cleaned where necessary, but also to increase comfort levels around its release and use, where appropriate.
    And yep, I have some ideas around that 🙂
    * And I’m not talking here about raw data involving personal details – the whole big data/aggregation/privacy thing around personal information is still so stunningly complex and unresolved.

  2. Shaun McGirr

    Thanks Aimee, that’s a great point. Look forward to seeing what you’ve come up with in the fullness of time! And hopefully Govhack has given you some ideas for next steps.
