A brief and biased history of humanity, science, statistics and machine learning

by | May 1, 2015

Source: Pixabay

I’ve never met a word cloud that didn’t make me wonder “why am I even looking at this?” Image is from the Machine Learning Group at the University of Hamburg.
There’s a lot of buzz about Machine Learning (ML), not least because the two dominant public clouds (Azure and AWS) both have offerings in this space.
I’ll discuss how Azure and AWS offer ML soon, but first, here is a brief, incomplete and biased history of humanity, science, statistics, machine learning and data science:

  1. Humans come into existence. One of our core skills is the ability to observe the world around us and organise what we perceive into abstract concepts, which we then argue about (are tomatoes fruit or veg?).
  2. Some time later, the importance of observation and measurement was formalised into a philosophical perspective called “Empiricism”, characterised by Wikipedia as stating that “knowledge comes only or primarily from sensory experience” (as opposed to “knowledge comes from Wikipedia”).
  3. Empiricism is handy because it enables the scientific method: everything we think we know about the world is only a tentative conclusion based on imperfect observation of incomplete data. The robustness of these conclusions depends on the evidence gathered while trying to invalidate them, and science is the methodology for engaging in this iterative process by which doubt leads to improved-but-still-tentative knowledge (science is not “people in lab coats”).
  4. Turns out that science is really tough, because nothing is caused by only one thing (think of all the factors leading to the average car crash). Furthermore, very few things always cause a given outcome (think of all the times people survive absolutely horrific car crashes that were similar to fatal crashes). This makes the world probabilistic rather than deterministic (the best we can ever say is that X is very likely to cause Y, but we can’t guarantee it).
  5. Statistics is invented as a way to deal with the inherent uncertainty of drawing tentative inferences based on imperfect observation of incomplete data. Statistical methods allow us to make statements about just how likely something is to happen (or have happened), given a tentative statement about what we expected (a hypothesis) and some observations (data). In the early days it’s really tough because there aren’t any computers and you have to look up big tables of numbers to estimate these likelihoods. Still, progress is made and Guinness develops a more consistent taste.
  6. Computers! Now the same tables can be looked up really quickly by computers using statistical software. In parallel, a person called a “computer scientist” comes into existence, who uses lab coats to make computers faster but designs completely deterministic systems, meaning they’re not really doing science and thus don’t learn a lot about the probabilistic real world, because you can’t code it all in FORTRAN on the mainframe.
  7. Faster computers! Now statistics can dispense with those big tables and instead throw massive computational power at problems of inference, using a clever theorem developed hundreds of years earlier by a statistician/philosopher/Presbyterian minister called Bayes. Statistics experiences a civil war between these new-school “Bayesians” and old-school “Frequentists” who mostly just talk past each other for the sake of pre-scientific argument (see “Humans come into existence so we can argue about concepts” above).
  8. Computer scientists notice that statisticians are now using their deterministic equipment (computers) to make predictions about the (probabilistic) world. As they’ve always wanted to create god from the machine, aka “a self-aware robot that will clean my flat before the landlord comes around and fetch me a beer right as I realise I want one”, computer scientists borrow these statistical methods but can’t call them “statistics” because it sounds boring. As the end-goal of this borrowing is to get a machine to learn how to fetch you a beer right before you want one, the term “machine learning” is coined.
  9. The machine doing the learning is fed some tentative ideas about what is expected to happen in some fraction of the world (i.e. “Y usually happens right after X does”), and some data about similar things that have happened in the past (observations of X and Y), and reaches a tentative conclusion about the relationship between the two. Now the machine is considered “trained” and ready to make predictions about whether or not Y will happen after X happens, for new cases of X. This sounds a lot like what I said statistical methods do above, which is not accidental, because they are the same thing!
  10. The only consequential difference between ML and statistics is the human in charge: statisticians generally care more about (tentatively) understanding the relationship between X and Y, while computer scientists generally care more about making (reliable) predictions about whether Y will happen after X happens. The former are terrified of the uncertainty of their understanding of the relationship between X and Y, which makes them risk-averse about prediction, while the latter couldn’t give a crap about uncertainty and will take whatever predictions they can get. Nevertheless, the two sides fight with each other a lot (see 1 above). As you might have guessed, I’m closer to the statistical worldview, because I trained as a social scientist and so making a firm prediction about the world is terrifying! (As you might have also picked up, I’m coming round to the other way of doing things. After all, why do we try to understand social processes if not to predict what will happen next?)
  11. Unsatisfied with the number of Venn diagrams already plaguing the world, Drew Conway draws another one and writes “data scientist” at the intersection of three sets of skills. I’ve yet to meet such a person. I’ve seen a lot of people on the internet claim the title because they know how to make an animated 3D pie chart.
  12. “Big data” and “cloud” became marketing terms and everything went to hell.
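To make step 7 a little more concrete: Bayes’ theorem just says posterior belief is proportional to prior belief times likelihood of the data. Here is a minimal sketch in Python, using an entirely made-up coin-flipping example (all the numbers are illustrative, not from any real dataset):

```python
# A minimal sketch of Bayesian updating (step 7): which of three
# candidate coin biases best explains observing 8 heads in 10 flips?
# All numbers here are made up for illustration.

# Discrete prior: three hypotheses about the coin's bias, equally likely.
priors = {0.3: 1 / 3, 0.5: 1 / 3, 0.7: 1 / 3}

heads, flips = 8, 10  # illustrative observations

# Likelihood of the data under each hypothesis (binomial, dropping the
# constant n-choose-k term, which cancels when we normalise below).
likelihoods = {p: p**heads * (1 - p) ** (flips - heads) for p in priors}

# Bayes' theorem: posterior is proportional to prior times likelihood.
unnormalised = {p: priors[p] * likelihoods[p] for p in priors}
total = sum(unnormalised.values())
posteriors = {p: w / total for p, w in unnormalised.items()}

best = max(posteriors, key=posteriors.get)
print(best)  # the bias the data most strongly supports
```

Note the conclusion is still tentative in exactly the sense of step 3: the posterior puts most, but not all, of its weight on the best hypothesis.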

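And step 9’s “train on past X and Y, then predict Y for new X” loop is, concretely, just model fitting. A minimal sketch using ordinary least squares on made-up data (the observations below are illustrative, not from any real study):

```python
# A minimal "machine learning" loop (step 9), which is also just
# statistics: fit a straight line Y = intercept + slope * X to past
# observations, then predict Y for a new case of X.
# The data below is made up for illustration.

xs = [1.0, 2.0, 3.0, 4.0, 5.0]   # past observations of X
ys = [2.1, 4.0, 6.2, 7.9, 10.1]  # past observations of Y

# "Training": ordinary least squares estimates of slope and intercept.
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
slope = (
    sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    / sum((x - mean_x) ** 2 for x in xs)
)
intercept = mean_y - slope * mean_x

# "Prediction": apply the fitted relationship to a new case of X.
def predict(x):
    return intercept + slope * x

print(round(predict(6.0), 2))
```

Whether you call the fitted slope “an estimated coefficient with uncertainty I should worry about” or “a trained model, ship it” is precisely the statistician/computer-scientist divide from step 10.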
If you’ve got this far, thank you and congratulations! For the truly devoted, tune in next time when I actually compare how Azure and AWS implement machine learning, which is what I intended to do before this rant!
Keep asking better questions,


