Once you have data, the next tricky bit is knowing what to do with it. While it's tempting to jump into the hardcore analytics and play with all the flashy new software that's around, you first need to explore & summarise the data you have.
Exploring your data involves the use of good old-fashioned descriptive & summary statistics. These are, in the main, the stuff you learnt in high school – averages, bar graphs, histograms etc.
Unfortunately, descriptive & summary statistics have taken on the role of the poor cousin, the middle child, the b-grade celebrity, when in fact they offer a good way to explore:
- Data quality by looking at outliers, missing values, duplicates
- Trends by graphing against time
- Relationships between data by graphing pairs of data against each other, tabulating variables against each other
- The shape of the data by graphing frequency histograms, box & whisker plots
- The spread and variability of the data by calculating the standard deviation & range, graphing frequency histograms, box & whisker plots
- What is typical through averages such as mean, median and mode
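As a rough illustration of the checks listed above, here is a minimal sketch using only Python's standard library. The sample values and the `summarise` helper are made up for this example, and the 1.5 × IQR rule for flagging outliers is just one common convention, not the only option.

```python
from collections import Counter
from statistics import mean, median, mode, quantiles, stdev

# Hypothetical sample: daily order counts, with None marking a missing value.
raw = [12, 15, 14, None, 15, 13, 95, 14, 15, None, 16, 13]


def summarise(values):
    """Return basic data-quality and descriptive statistics for a list of numbers."""
    present = [v for v in values if v is not None]

    # Quartiles feed the (assumed) 1.5 x IQR rule for flagging outliers.
    q1, _, q3 = quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr

    return {
        # Data quality
        "count": len(values),
        "missing": len(values) - len(present),
        # Values seen more than once; with real records you would usually
        # look for duplicated rows or IDs rather than repeated measurements.
        "repeated_values": sorted(v for v, n in Counter(present).items() if n > 1),
        "outliers": [v for v in present if v < low or v > high],
        # What is typical
        "mean": mean(present),
        "median": median(present),
        "mode": mode(present),
        # Spread and variability
        "std_dev": stdev(present),
        "range": max(present) - min(present),
    }


summary = summarise(raw)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Comparing the mean with the median is a quick sanity check in its own right: a large gap between the two usually means an outlier or a skewed distribution is worth a closer look.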
Through this exploration you learn which variables are useful, which ones may need to be transformed, which variables are interrelated, what natural groupings occur within the data, and so on.
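One simple way to spot related variables is to compute a correlation for every pair. The sketch below hand-rolls a Pearson correlation so it needs nothing beyond the standard library; the variable names and numbers are invented for illustration, and Pearson only picks up straight-line relationships, so it complements a scatter plot rather than replacing it.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers.

    Assumes neither list is constant (a constant list would divide by zero).
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Hypothetical variables, one list per column of the data set.
data = {
    "ad_spend": [1, 2, 3, 4, 5],
    "sales": [2, 4, 6, 8, 10],
    "returns": [5, 3, 4, 1, 2],
}

# Correlation for every pair of variables, keyed by the pair of names.
correlations = {
    (a, b): pearson(data[a], data[b]) for a, b in combinations(data, 2)
}

for (a, b), r in correlations.items():
    print(f"{a} vs {b}: {r:+.2f}")
```

Values near +1 or -1 suggest two variables are moving together (or in opposite directions), which is a hint that one of them may be redundant or that the relationship deserves a graph of its own.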
This part of the analytics process is very similar to an archaeological expedition – sometimes you have to dig a few holes to find the treasure – X does not always mark the spot. As the possible range of combinations and permutations to explore can be seemingly endless, coming up with a number of questions or hypotheses to test will help you navigate your way through the data.
So grab your fedora, whip & satchel and go exploring …