Nic Cox used to be a Data Warehouse and ETL developer here at OptimalBI. Nic used to regularly battle it out whenever we did an exercise challenge. He has now spread his wings and is now living in London where is enjoying life on the Kiwi OE. You can read the original post of this blog at Nic Cox Business Intelligence here.
In this blog, I take a look at common table expressions (CTE) in SQL Server, the benefits they provide and some of the reasons you might choose to use one.
What is a Common Table Expression?
Common table expressions are a useful way of creating a temporary result set within a query. A CTE is similar to a view, except the result set is only stored for the duration of the query and it is only able to be referenced within the scope of that query. This is particularly useful if you need to use the same result set multiple times within a query as the result set is stored in memory instead of you having to query the same table or tables over and over again.
A CTE can be thought of as a variation of a derived table or subquery, however, I believe it is much more readable. This is because a CTE is read from top to bottom, left to right like you read most books, articles or even comics. The final select statement, or the result set you are looking to achieve, is at the end of the query as opposed to the inner most query of a derived table or subquery. For those, it is located in the middle, with the outermost query located at both the top and the bottom of the query unlike any book, article or comic I’ve ever read.
A CTE can also be used instead of a temporary table or table variable, as they all provide similar yet different functionality. However, out of the three, a CTE will likely provide you the least amount of performance gains against large data sets. Without going into too much detail about the alternatives, a CTE is like a view except only available within the query itself; a table variable is like other declared variables and only available for the scope of a procedure for instance; and a temporary table is much like a normal table except it is stored in tempdb and is available until it is either dropped, or the creating connection is terminated. If you are working with large data sets, then a temporary table is definitely the way to go as you can create indexes on them before using them further down your process.
Keep an eye out for future blogs of mine that delve deeper into the previously mentioned alternatives to a common table expression.
How is a CTE structured?
A CTE always starts with a WITH statement, followed by AS and the chosen name of the result set. After that, you can optionally include the column names of the CTE, but this is only mandatory when your CTE query will return identical column names. You will notice all syntax examples I provide do not include columns in their definition. Subsequent CTE’s in your query follow a comma, denoting the end of the previous CTE and the start of the next. To call the result set from a CTE, it must always be associated with a SELECT, INSERT, UPDATE, DELETE or MERGE. If your CTE is part of a greater procedure, it must be terminated with a semi-colon (;). I prefer to prefix my common table expression names with “cte”, so others reading will instantly recognize it for what it is but the choice is up to you.
A common table expression follows the following structure:
WITH cte_dataset AS ( SELECT Id, Name FROM Schema.Table), cte_dateset2 AS ( SELECT Name,max(RandomDa
te) AS ImportantDate FROM Schema.Table2) SELECT t1.Id, t1.Name, t2.ImportantDate FROM cte_dataset t1 INNER JOIN cte_dataset2 t2 ON t1.Name = t2.Name
While this gives you an idea of the basic structure of a CTE, in fact, two CTE’s, it, of course, doesn’t really make too much sense. In this situation, you could skip the common table expression altogether but I just wanted to provide some basic syntax with a couple of CTE’s to give you an idea of what they look like. You will see firstly the WITH statement, the only one allowed in the query. The CTE name follows, after that, I have intentionally skipped defining the optional column names. Next is the AS statement and an opening bracket and the first table query. As you can see, a closed bracket and comma separate the start of the next CTE. After the second CTE, you will finally see the actual query we are using to present the end result. This queries both common table expressions and is not given a name like those CTEs.
One of the main benefits of a CTE from my point of view is that they allow you to gradually build up your query using logical blocks. This makes writing and reading your queries much easier as there is a logical story path from top to bottom. This also makes it easier for others to read your queries as well. The top down writing approach to a WITH statement means you can test each CTE along the way and watch what happens to your data throughout the query. You can reference an already defined CTE as many times as you want in your query, but you are unable to reference a CTE further down the query that is yet to be defined.
A CTE is perfect when used in conjunction with some functions, such as RANK or ROW_NUMBER. Window functions cannot be used in a WHERE clause as they may create ambiguity in your result set. Instead, you can use a CTE to first use the window function and then filter on it in the final select statement.
WITH cte_func AS ( SELECT Id, Name, ROW_NUMBER() OVER (PARTITION BY Name ORDER BY Dob) AS rn) SELECT Id, Name FROM cte_func WHERE rn = 1
This example will return you the Id and Name of the oldest person of each distinct name.
Recursive Common Table Expressions
A CTE is also able to reference itself. This is known as a recursive common table expression. A recursive CTE allows a query to perform logic against itself. This can be useful for tasks involving a hierarchy. A recursive CTE must have at least two query definitions, one as the anchor, or starting result set, and one as the recursive set. You can define as many anchor and recursive sets as you want, but the anchors must always come first. It is useful for hierarchies because it keeps on referencing itself until it has reached the end of its own hierarchy. As with all recursive operations, you can find yourself in a loop, so make sure you define it carefully.
The basic structure of a recursive CTE is as follows:
WITH cte_recursive AS ( SELECT Name, 0 AS Level FROM Schema.Table UNION ALL SELECT t.Name, r.Level + 1 FROM Schema.Table t INNER JOIN cte_recursive r ON t.Name = r.Name WHERE r.Level < 5) SELECT * FROM cte_recursive
You can see in the recursive CTE that there is first an anchor query, this provides the initial result set, then the recursive set works off of that adding one more to the level. The recursive set keeps on processing until a condition is met to end it, in this case, while the level remains less than 5, the recursion will keep on running. If no condition to terminate is specified, a continuous loop can occur.
Common table expressions are much like a standard view, except they are only available within the scope of the query. If readability or recursion are what you need, then a common table expression may be the perfect candidate for you. If it is significant performance enhancements you are after, then you may be better off looking at other options. I hope this blog gave you a quick insight into common table expressions.
As always, thanks for reading,
We will be running regular guest posts from Nic. You can read his blog, SQL Server Series: What is an Index? or visit his full site here. You can also read Nic’s blogs from his time with OptimalBI here.
OptimalBI run regular Data Vault course for business analysts, data architects, and business intelligence developers in Wellington and Auckland. Find out more here.