Select Page

Here I would like to share one of the tricks I used in different project implementations to simplify aggregation, avoid manual coding and leave room for scaling.

I’ll explain the process using the following example. Lets say we work for a big online shop and would like to analyse reviews that buyers leave for different products and track overall performance metrics in real time. The database collects statistics about each event happening on the shop website. Therefore we are able to track user activity if they post a review, rate a product, comment on an existing review, subscribe to product updates, like or dislike it. Instead of dealing with individual events we pre-aggregate statistics every 1 hour.

The following is an example table to store the aggregate data.

CREATE TABLE user_activity (
stats_date TIMESTAMP, 
product_id VARCHAR(100), 
activity_type_id VARCHAR(30),
value integer 
); 

Where

activity_type ={like, dislike,comment ={negative, positive}, post, follow}.

Picture 1 User activity example

After analyzing the data we came up with several metrics to track the product performance. One of them is Overall Product Score.

 overall_product_score=a1*likes+a2*dislikes+a3*negative+a4*positive; 

Where

  a1,a2,a3,a4 – weights for each component in the score formula. likes, dislikes, negative (comments) and positive(comments) – overall number of respective events per 1 hour.

At this point we can apply the following technique. Instead of manually putting the above formula into aggregation query as a sequence of CASE statements  we will create a couple of tables – metrics table with the list of the metrics and metrics formula. Here we assumed that metric value is just a summary of activity_type components with the assigned weights to each of them.

CREATE TABLE metrics (
id SERIAL PRIMARY KEY,
name VARCHAR(30), 
description VARCHAR(50), 
is_active BOOLEAN ); 

CREATE TABLE metrics_formula (
id SERIAL PRIMARY KEY, 
activity_type_id VARCHAR(30),
metric_id VARCHAR(30),
weight NUMERIC,
is_active BOOLEAN ); 

Picture 2 Metrics example

 

Picture 3 Metrics formula example

Then aggregation query looks to be very short and simple. Just use JOIN.

SELECT
m.stats_date,
f.metric_id,
m.product_id,
sum(f.weight * m.value) AS overall_product_score
FROM user_activity m
JOIN metrics_formula f
ON i.attribute_id = f.attribute_id
GROUP BY m.stats_date, m.product_id,
f.metric_name;

This solution reserves the place for flexibility in case If we would like to add new metrics or retire old ones, introduce dependency on the product category, etc. Just simply add data into metrics and metrics_formula tables and there won’t be any need to re-writing the aggregate queries.

Anastasia

Anastasia blogs about Data Vault and Business Intelligence for both technical and non-technical people.
Did you know we run courses on Agile Data Warehouse Design and Data Vault?
%d bloggers like this: