Data is a Privilege

Dec 3, 2013

To an analyst, a new dataset is like Christmas morning to a five-year-old – the anticipation of what gems might lurk among the billion rows, how it might be used, the eureka moment that enables a jump in the organisation’s actionable insight – it’s all too tempting to rush in and start ‘unwrapping’ with unrestrained glee.
However, before you dive headlong into the data, you need to stop and think about the implications of the task you are about to embark on. Where did the data come from? How was the data collected? What purpose was the data collected for? What is likely to be done with the results? Who will be impacted? How will they be impacted? How will they react?
What you do with many of the answers depends on the ethics of the analyst and the organisation. Ethics is the murky ground of balancing what can be done with what should be done. Bolder Technology Inc has come up with a matrix illustrating where behaviour sits along two axes: what should/shouldn’t be done, and what is legal/illegal.


What should be done is a quagmire of culture, information type, institution type and who it’s about. Ethics are reflective of society – as society changes, so too do our ethics. There isn’t a static, ‘one size fits all’ approach. However, as analysts, incorporating ethics into your work should be as much a part of an analytical project as preparing, exploring, hypothesising, modelling and evaluating are.
As far as feasibly possible, consider the impact of the actions resulting from your analytics projects. Remember there are real people at the end of all this whose lives you are affecting – from recommendations of possible products to loan approvals.
Before you start, make sure you scan for possible ethical dilemmas. Which variables should you take through to a model? For models such as credit scoring, for example, you need to ensure that your models do not lead to discrimination based on age, ethnicity, gender and so on. While it may seem that you can simply drop those variables from your model, care is needed that you don’t end up with others that serve as proxies for them. By using an area code (for example) you may inadvertently be discriminating against people by ethnicity, as people can live in areas associated with particular ethnic identities. For medical purposes, however, it is generally considered acceptable to include age, ethnicity and gender, since medical conditions can be specific to these. So whether something ‘should be done’ is influenced by what the analytics piece of work is for.
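One quick way to scan for such proxies is to check how well each remaining feature predicts a dropped protected attribute. A minimal sketch in plain Python, using hypothetical field names and toy records:

```python
from collections import Counter, defaultdict

def proxy_strength(records, feature, protected):
    """Fraction of records whose protected value could be guessed from the
    feature alone (1.0 = perfect proxy; compare against the baseline share
    of the single most common protected value before drawing conclusions)."""
    by_feature = defaultdict(Counter)
    for r in records:
        by_feature[r[feature]][r[protected]] += 1
    # For each feature value, assume we guess its most common protected value.
    guessed = sum(c.most_common(1)[0][1] for c in by_feature.values())
    return guessed / len(records)

# Hypothetical applicant records: ethnicity was dropped from the model,
# but area_code still encodes it rather well.
applicants = [
    {"area_code": "A1", "ethnicity": "X"},
    {"area_code": "A1", "ethnicity": "X"},
    {"area_code": "A2", "ethnicity": "Y"},
    {"area_code": "A2", "ethnicity": "Y"},
    {"area_code": "A2", "ethnicity": "X"},
]

score = proxy_strength(applicants, "area_code", "ethnicity")
print(round(score, 2))  # 0.8 -> area_code recovers ethnicity for 4 of 5 applicants
```

A high score relative to the baseline is a flag to investigate the feature before taking it through to the model, not proof of discrimination in itself.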
Be aware of possible unintended consequences of the use of your work. There is the infamous case of a father finding out his daughter was pregnant from a set of baby-product coupons she was sent. Had the analysis included a scan of age before mailing out the coupons, it might have been decided not to mail out to young mothers. Whether using data in this manner is ethical depends on how informed consumers are about the possible uses of their data and how transparent the company is with its policies.
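The age screen suggested above amounts to nothing more than a filter on the campaign list before anything sensitive is mailed. A minimal sketch, with a hypothetical policy threshold and field names:

```python
MIN_AGE_FOR_SENSITIVE_OFFERS = 18  # hypothetical policy threshold

def eligible_for_mailing(customers, min_age=MIN_AGE_FOR_SENSITIVE_OFFERS):
    """Drop anyone under the age threshold from a sensitive campaign list."""
    return [c for c in customers if c["age"] >= min_age]

campaign = [
    {"name": "A", "age": 16},  # screened out
    {"name": "B", "age": 34},  # mailed
]
print([c["name"] for c in eligible_for_mailing(campaign)])  # ['B']
```

The point is less the one-line filter than the decision to put such a check into the pipeline at all – that is the ethical step.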
You need to take into account the original purpose for which the data was collected. While not always unethical, the further a use strays from that original purpose, the more questionable and grey things get. TomTom sold its data to the Dutch police, who then used it to determine where to place speed cameras. TomTom was upfront about the fact that it sold its data, in an aggregated, anonymous form, to other organisations. People had no problem with it going to organisations that used the data to improve roading and traffic flow, but there was an outcry when it was used for the placement of speed cameras.
How data is collected is extremely important. People should be given the ability to consent to their data being collected and used. Earlier this year, people in London were having data collected from their cell phones via state-of-the-art rubbish bins. That’s a fairly Orwellian way of gathering information, and if organisations are not careful there will be a backlash!


Preferably before even starting down the analytics path, organisations should set up a framework of principles around the collection and use of data, including how these will be communicated. Analysts need to be a part of these discussions and visibly champion the application of ethics within their work.
These guidelines should ensure that organisations are transparent about their intentions when they collect data. Those intentions need to be communicated in a clear and simple way, not hidden in terms and conditions. The guidelines also need to ensure that there are safeguards around data associated with children and around sensitive data such as credit card information.
Organisations also need to be prepared to engage with, pilot and test uses of data on a representative sample of their customers, as this is a murky area, constantly evolving and often unpredictable. While this will add a cost to the organisation, it will not be as much as the cost of irreparable damage to its reputation.
People don’t have to give their data, and may choose not to if they suspect it will be misused. It’s well worth investing in developing good data stewardship frameworks. BCG’s “The Trust Advantage: How to Win with Big Data” report highlights that consumers value data stewardship within an organisation alongside corporate social responsibility, environmental awareness and responsible labour practices. There is also the real risk of heavy regulation if the analytics community is not seen to be acting ethically.
Remember those immortal words of Ben Parker: “with great power comes great responsibility”. Having access to data is a privilege and should be treated accordingly.

  1. Mike O'Neil

    The use of “can” as the word to represent “legally allowed” is a language usage error that limits the ability to make some simpler distinctions in this sort of discussion.
    “can” = “physically able”
    “may” = “allowed or permitted”
    So the distinction to be made is that “can” typically represents being technically able to, and “may” typically represents being legally permitted.
    In an information-sharing context, “can” usually means having the network infrastructure implemented so as to practically transmit data at the frequency and reliability required. It also means having the data and software elements required to link the data identities across the sources. “May” usually means that the legislation in force does not put you in a position where you might end up in court defending an action.
    I think that beyond this, the ethics and practice dimensions come into play.
    “should” represents ethics: is it morally defensible? In some situations it might be argued that it would be morally indefensible not to act (this is often not appreciated in discussions like this).
    “would” represents practice. If I do information sharing across different practice domains, we might change the behaviour of people in terms of what they are prepared to reveal when they know that information will not be shared.
    The legal × ethical matrix is too simplistic in my view. I prefer looking at it across the four dimensions “can”, “may”, “should” and “would”, adding the technically possible and the practice implications to the mix.

  2. Michelle

    Thanks Mike. I like the idea of including the other dimensions you suggested into the matrix!
