Always qualify your SQL columns
Consider the following PostgreSQL query. There’s something wrong with it.
UPDATE users
SET passwordHash = '...'
WHERE email = (
SELECT email
FROM resets
WHERE resetSecret = '...'
);
You can’t tell what’s wrong with this query unless I give you the table schema, of course. Let’s say the users
table has the columns id
, email
, passwordHash
, and some other fields. Let’s say the resets
table has two columns – id
and resetSecret
. When a user asks to reset their password, we populate resets
table with a tuple with id
as their email’s ID and resetSecret
as some random string.
You happily run the above query and it goes horribly wrong and everybody’s passwordHash
got reset. Now that you have context, what’s wrong with the above query?
Answer: There’s no email
column in the resets
table1. We should be using the id
column to select the row to update, not the email
column.
And do you want to know the worst part? Postgres (and basically every SQL flavour) will silently accept this query and you’ll never see a damned error. But what’s really happening?
Postgres knows that the resets
table does not have an email
column. So it assumes the email
on line 4 refers to the column in the users
table. Here’s what Postgres sees:
UPDATE users
SET users.passwordHash = '...'
WHERE users.email = (
SELECT users.email
FROM resets
WHERE resets.resetSecret = '...'
);
Let’s say there’s a row in resets
that matches the resets.resetSecret = '...'
condition in the subquery. In the outer query, when going through each row in users
, the subquery will return that row’s email
column value. This returned email
is checked with itself in line 3, which obviously returns true. As a result, the update is applied to every row in the users
table.
Why did this mistake happen? Human error and oversight. If we had qualified our query as we had originally intended, it would look like this:
UPDATE users
SET users.passwordHash = '...'
WHERE users.email IN (
SELECT resets.email
FROM resets
WHERE resets.resetSecret = '...'
);
The key difference is in line 4. While we had intended to write resets.email
, omitting the table qualifier allowed Postgres to assume wrongly. The above query would properly return an error at execution because there is no email
column in the resets
table.
This is a real story that happened to me (thankfully, the impact wasn’t too massive). I know it’s easier to just drop the table.
prefix in the interest of brevity and clarity. I know human error isn’t completely eliminated even if you qualify your columns. But it gives us a much better chance of not committing such errors, and honestly, we could use any help we can get in the war against bugs.
-
If you answered correctly, would you still be able to do the same at 2:00am in the night when you’re dead tired? ↩