Comparing sets in SQL

by | Feb 24, 2015

Steven-Ensslen-Orange
I was shown the following technique for comparing tables when I was a junior programmer many years ago.

WITH A as (SELECT something)
, B as (SELECT another_similar_thing)
SELECT * FROM A MINUS SELECT * FROM B
UNION ALL
SELECT * FROM B MINUS SELECT * FROM A;

This technique is simple and powerful. I use it whenever I am tuning SQL statements. The old slow statement goes in one side, or into a table that is queried. The newly tuned statement goes in the other side.
This technique is powerful. I’ve taught it to many people.
So I was quite surprised to learn that it doesn’t quite work. MINUS removes duplicates. The following statements are equivalent:

WITH Q1 as (SELECT * FROM A)
, Q2 as (SELECT 1 * FROM B)
SELECT * FROM Q1 EXCEPT SELECT * FROM Q2
UNION ALL
SELECT * FROM Q2 EXCEPT SELECT * FROM Q1;

WITH Q1 as (SELECT DISTINCT * FROM A)
, Q2 as (SELECT DISTINCT * FROM B)
SELECT * FROM Q1 EXCEPT SELECT * FROM Q2
UNION ALL
SELECT * FROM Q2 EXCEPT SELECT * FROM Q1;
SQL-92 has the keyword EXCEPT, which Oracle implemented with the name MINUS.
The DISTINCT keyword makes no difference to the query. Duplicates are removed either way.
Why does this matter? It means that the technique doesn’t prove that the two sets have the same number of rows, as the following example demonstrates.

WITH Q1 as (SELECT 1 x1 UNION ALL SELECT 1)
, Q2 as (SELECT 1 x1)
SELECT * FROM Q1 EXCEPT SELECT * FROM Q2
UNION ALL
SELECT * FROM Q2 EXCEPT SELECT * FROM Q1;

Returns no rows even though the inputs are clearly different.
So what to do? You could write a statement that compared every column, but that’s too much error-prone code for my liking. Instead combine the MINUS technique with a simple Count, as follows:

WITH Q1 as (SELECT * FROM A)
, Q2 as (SELECT 1 * FROM B)
SELECT * FROM Q1 EXCEPT SELECT * FROM Q2
UNION ALL
SELECT * FROM Q2 EXCEPT SELECT * FROM Q1;

SELECT count(*) FROM A;
SELECT count(*) FROM B;
Good hunting,
Steven

0 Comments

Trackbacks/Pingbacks

  1. Comparing sets in SQL – Steven Ensslen - […] Read the rest on the Optimal BI blog. […]
Submit a Comment

Your email address will not be published. Required fields are marked *