Grouping Sets, Rollup, and Cube

By
Steven Ensslen
May 1, 2013

As BI professionals we spend a lot of time making information visible through reports. One of the main technologies behind those reports is Structured Query Language (SQL).
SQL is an industry standard that has come a long way since its beginnings in the 1980’s. I’ve recently had the opportunity to show some of these “new” features to seasoned developers who were unfamiliar with SQL’s recent advances.
SQL-99 added GROUPING SETS, ROLLUP, and CUBE as options for SELECT statements. These options allow SQL statements to create subtotals. Here is a quick example:
A SQL query has always been able to calculate one level of totals using aggregate functions and a GROUP BY.

select table_name, data_type, count(*) cnt
from INFORMATION_SCHEMA.COLUMNS
group by table_name, data_type
order by table_name, data_type

The basic extension is to allow multiple groupings. A GROUPING SET is a series of collections of columns, each of which is a valid group by clause.
The result set a grouping set creates resembles the result set that the least aggregated grouping creates. Most of the rows in this example are
the same as the rows from the previous query as they both GROUP BY table_name and data_type.

select table_name, data_type, count(*) cnt
from INFORMATION_SCHEMA.COLUMNS
group by grouping sets ((table_name),(data_type),(table_name, data_type))
order by table_name desc, data_type desc

The difference is that in addition to the rows from the previous query we also have total rows by table_name and total rows by data_type.
This GROUPING SET QUERY returns the same result as the following query:

select table_name, data_type, count(*) cnt
from INFORMATION_SCHEMA.COLUMNS
group by table_name, data_type
union all
select null, data_type, count(*)
from INFORMATION_SCHEMA.COLUMNS
group by data_type
union all
select table_name, null, count(*)
from INFORMATION_SCHEMA.COLUMNS
group by table_name

A key sign that you need a GROUPING SET is a query that UNIONS the same sources for different aggregations. The UNION ALL query, of course, incurs the cost of reading the table three times. In more complex examples with many joins there are significant performance gains that can be made by switching from a UNION of different groupings to a GROUPING SET. Also code maintenance is simplified and the equivalence of the different groupings is guaranteed.
The ROLLUP and CUBE are shorthands for different grouping sets.

select table_name, data_type, count(*) cnt
from INFORMATION_SCHEMA.COLUMNS
group by rollup (table_name, data_type)
order by table_name desc, data_type desc

This ROLLUP is equivalent to a GROUPING SET with the same columns as the ROLLUP, and a series of GROUPING SETS each without the last column of the previous set.

select table_name, data_type, count(*) cnt
from INFORMATION_SCHEMA.COLUMNS
group by grouping sets ((table_name, data_type),(table_name),())
order by table_name desc, data_type desc

Note the empty set “()”, which gives a single grand total for the entire result set.
CUBE is the aggregate of all of the GROUPING SETS that can be created with the columns:

select table_name, data_type, count(*) cnt
from INFORMATION_SCHEMA.COLUMNS
group by cube (table_name, data_type)
order by table_name desc, data_type desc

is the same as:

select table_name, data_type, count(*) cnt
from INFORMATION_SCHEMA.COLUMNS
group by grouping sets ((),(table_name),(data_type),(table_name, data_type))
order by table_name desc, data_type desc

GROUPING SETS are supported by the major databases:

  • Oracle since 8i
  • Microsoft SQL Server since 2008
  • DB2 since version 9
  • PostgreSQL since 8.1

Note that MySQL does not support GROUPING SETS.
We run regular Agile courses with a business intelligence slant in both Wellington and Auckland. Find out more here.

Copyright © 2019 OptimalBI LTD.