Fun With Views and Common Table Expressions
originally published 2/1/2018 on the Hashrocket blog
A view is a stored query the results of which can be treated like a table. Note that it is the query that is saved and not the results of the query. Each time you use a view, its associated query is executed.
A related concept is that of the common table expression or CTE. A CTE can be thought of as a temporary view; you can only use it within the query in which it appears (you can refer to it multiple times, however).
Let’s say that you’re the SQL-savvy owner of a store that sells kits for building robots. On your site, customers are guided through the process of selecting all of the components needed to build a quadrupedal robot, i.e. getting the right motors, microcontrollers, sensors, batteries, grippers, etc.
Robots have lots of component parts and you have to make sure that the correct components for each order arrive or your customers will leave mean comments on social media.
Right now you have the following tables:
And those tables contain the following data:
You want to see a list of all the purchases made and the components in those purchases. You might write a query like the following:
select p.id as purchase_id,
array_agg(concat('[ ', cp.component_id::text, ', ', c.name, ', ', cp.num_components::text, ' ]')) as component_info,
sum(cp.num_components * c.price) as purchase_total_price,
customer_id,
created_at
from
purchases as p
inner join component_purchases as cp on p.id = cp.purchase_id
inner join components as c on c.id = cp.component_id
group by p.id
order by p.created_at
Just a quick explanation in case anything in the query is unclear. We’re joining the tables purchases
, component_purchases
, and components
and then grouping the rows by purchase_id
. SQLFiddle here.
For each of the components in a purchase, the sum
function adds the multiple of the number of units of a component ordered and the price of a component. The array_agg
function composes an array of the details of the purchase. Notice that the sum
and array_agg
functions act across rows, but the concat
function acts across columns.
The result of our query is thus:
Check out the SQLFiddle here.
It’s not an exceptionally long query, but you suspect that you’ll be using it quite a lot and you’d prefer not to have to write the whole thing out all of the time.
That’s where views come in:
create view purchase_information as
select p.id as purchase_id,
array_agg(concat('[ ', cp.component_id::text, ', ', c.name, ', ', cp.num_components::text, ' ]')) as component_info,
sum(cp.num_components * c.price) as purchase_total_price,
customer_id,
created_at
from
purchases as p
inner join component_purchases as cp on p.id = cp.purchase_id
inner join components as c on c.id = cp.component_id
group by p.id
order by p.created_at;
Now you have a simple way to reuse this query. Let’s say that you want to know all of the purchases made within the last week. Now you can just write the following:
select * from
purchase_information
where
created_at > now() - interval '7 months';
SQLFiddle here
From within the PostgreSQL interactive terminal, you can use the command \dv
to see a list of all the views in the database.
Dropping a view is as simple as dropping a table:
drop view purchase_information;
Note that if you drop a table using cascade
, any views dependent on that table will be dropped as well.
It’s also possible to perform operations on the data in the underlying tables of some views. Updatable views are views where data can be edited, inserted, or deleted. To be updatable, a view must be really basic. Think something along the lines of:
create view customer_info as
select first_name, last_name, email
from
customers
where last_name > 'F';
Updatable views can only select rows from a single table (or updatable view). They also cannot contain ‘group_by’, ‘limit’, or ‘with’ clauses.
You can see all of the relevant restrictions in the Postgres docs here.
If you do have an updatable view, you can do something like
insert into customer_info values
('Anne', 'Droid', 'anne.droid@example.com');
A new row for ‘Anne Droid’ has been inserted into the customer table. Note that the customer_info
view only includes customers whose names begin with letters after ‘E’. If you were to run the following query:
select * from customer_info;
Anne’s name would not appear in the results because her last name begins with a ‘D’. A new row has been inserted into the customers
table using the customer_info
view, but that row is not accessible via the view. SQLFiddle here
Note that you cannot insert values for columns that are present in the underlying table, but not in the view. The following query would result in an error:
insert into customer_info (first_name, last_name, email, phone) values
('Simon', 'Borg', 'simon.borg@example.com', '555-777-9999');
So, that’s views in a nutshell. Let’s move on to CTEs.
To start with, CTE stands for ‘common table expression’. They’re quite similar to views except that once the query is executed, a CTE ceases to be.
In the view purchase_information
, we were able to get all of the components in a purchase, as well as the price of the purchase. Let’s say I wanted a query that would provide me with purchase information, but also the customers to whom the purchases belonged. This can be achieved pretty easily using a common table expression. You start a CTE using the keyword ‘with’ followed by the name of the CTE and the query itself.
with
purchases as
(select p.id as purchase_id,
array_agg(concat('[ ', cp.component_id::text, ', ', c.name, ', ', cp.num_components::text, ' ]')) as component_info,
sum(cp.num_components * c.price) as purchase_total_price,
customer_id
from
purchases as p
inner join component_purchases as cp on p.id = cp.purchase_id
inner join components as c on c.id = cp.component_id
group by p.id
order by p.created_at)
select
purchases.purchase_id,
purchases.purchase_total_price,
concat(cu.last_name, ', ', cu.first_name) as customer_name,
concat(cu.email, ', ', cu.phone) as customer_contact_info,
concat(sa.street1, ', ', sa.street2, ', ', sa.postal_code, ', ', sa.city, ', ', sa.region, ', ', sa.country) as customer_address,
component_info
from purchases
inner join customers as cu on purchases.customer_id = cu.id
inner join shipping_addresses as sa on sa.customer_id = cu.id;
SQLFiddle here
The purchases
CTE can be referred to like a table. Of course, once the query is executed, purchases
doesn’t exist.
The results of this query are:
Phew! Now you can send the correct robot parts to your customers.
But wait! There’s some complexity to the orders that you’ve forgotten. Some components need to be combined with other components in order to work. For the sake of simplicity, let’s say that any given component requires one or fewer other components to function. Drawing out the relationship between our components, we see that we get a tree graph.
To describe this in the database, we’ll need to add another column to the component
table. The subcomponent_id
field will point to the component upon which the current component depends.
Let’s say that a customer orders the component with id 4. In order to get all of the parts the customer needs, we’ll need traverse the tree all the way down.
Luckily, Postgres has just the solution for this - recursive common table expressions. Recursive CTEs provide a way to allow queries to perform some action with a table, and then to perform the same action on the results of the previous action.
That’s probably kind of confusing.
Below is the structure of a recursive common table expression:
with recursive cte_name as (
query to produce initial working table
union/union all
recursive query
)
do something with cte_name
The recursive CTE begins much like a non-recursive CTE. You’ll notice that the modifier recursive
is required after with
. Immediately after recursive
is the name of the CTE.
Above the union
operator is a query that initializes the working table. The working table is the result set upon which the recursive term is currently acting. For instance, you might put select id, name from robots where name='Cindi Mayweather'
. Since there is one and only one of the illustrious Cindi Mayweather, this query will return one row.
Note that this query is not executed again.
We’ll come back to what purpose the union
operator serves in a bit. For now, let’s talk about the recursive query. The recursive query will make a reference to the CTE name. Wherever you see the CTE name in the query, that is a reference to the working table.
Given a single component that needs to be shipped, we need to fetch the ids of all of its subcomponents, and their subcomponents, etc. To get all of the subcomponents for the component with an id of 4, we use the following query:
with recursive purchase_subcomponents as (
select id, subcomponent_id from components
where id = 4
union
select components.id, components.subcomponent_id from
components
inner join purchase_subcomponents
on purchase_subcomponents.subcomponent_id = components.id
)
select id from purchase_subcomponents;
That’s pretty gross-looking, so let’s go through what’s happening step by step.
Execution of the cte starts with the non-recursive query. Running that query returns the following:
This is the first working table.
We see that an inner join of the working table and the components table produces the following:
From this, we get the next working table.
The working table is substituted for purchase_components
in the recursive query, with the following result:
And so on…
The recursion ends when the working table has no rows, i.e. when the values for the rows are all null.
Now we come to what union
is doing. The union
operator acts on all of the working tables that have been generated, combining them all into one table. To ensure that there are no duplicate rows, you use union all
.
Hence the result of running the query is
SQLFiddle here.
Boom! Nailed it.
Kinda.
If you recall, up top we have a list of purchases with the associated components and customer information. Well, that component list isn’t complete; we have to include subcomponents as well.
with recursive
purchases_made as
(
select p.id as purchase_id,
c.id as component_id,
c.name as component_name,
subcomponent_id,
num_components
from
purchases as p
inner join component_purchases as cp on p.id = cp.purchase_id
inner join components as c on c.id = cp.component_id
order by p.created_at
),
purchase_components as (
select * from purchases_made
union
select
purchase_id,
components.id,
components.name,
components.subcomponent_id,
num_components
from components
inner join purchase_components
on purchase_components.subcomponent_id = components.id
)
select
purchase_id,
array_agg(concat('[', num_components::text, ' x cid:', component_id, ', ', component_name, ']')) as purchase_items,
sum(num_components * components.price) as purchase_total_price,
concat(cu.last_name, ', ', cu.first_name) as customer_name,
concat(cu.email, ', ', cu.phone) as customer_contact_info,
concat(sa.street1, ', ', sa.street2, ', ', sa.postal_code, ', ', sa.city, ', ', sa.region, ', ', sa.country) as customer_address
from purchase_components
inner join components on purchase_components.component_id = components.id
inner join purchases on purchases.id = purchase_components.purchase_id
inner join customers as cu on purchases.customer_id = cu.id
inner join shipping_addresses as sa on sa.customer_id = cu.id
group by purchase_id,
cu.last_name,
cu.first_name,
cu.email,
cu.phone,
sa.street1,
sa.street2,
sa.postal_code,
sa.city,
sa.region,
sa.country;
SQLFiddle here
In this query, we have two CTEs. When you want to use multiple CTEs in a query, you place with
at the start of the group and separate each CTE with a comma. Notice that, even though only one of the CTEs is recursive, the recursive
modifier is placed directly after with
, as opposed to directly before the recursive CTE. Each CTE is executed normally. Another nifty thing about using recursive
is that its presence means your CTEs can be in any order, i.e. one CTE can make reference to a CTE that is defined later. If you’re wondering how the component_id
alias showed up in the final results, remember that aliases are preserved in the results of a union.
To understand how this query works, we first need to understand what the results of the purchases_made
query are.
SQLFiddle here
This is our first working table.
Following a process similar to the one in the earlier recursive query, the first inner join results in:
The results of the purchase_components
CTE can be found in this SQLFiddle.
I would recommend drawing out tables in a notebook to really hammer home some of the ideas presented in this blog post. And feel free to play around in the SQLFiddle examples provided. It’s a pretty great tool.
Related articles:
Photo by Tess Nebula on Unsplash