Top 7 SQL interview questions for tech professionals

Doing an SQL interview as part of your job application?
Get some advance practice with our free sample questions.

Show Me Question #1

SQL, a programming language used to communicate with databases, is one of the most useful skills to know for recruiting in the tech industry. Pronounced "ess-cue-ell" and short for Structured Query Language, this incredible tool is a must-have for analyzing large data sets. In particular, it shines when applied to relational databases — unique tables of data that are all related to each other in some way.

Because SQL is so ubiquitous in the tech world, many companies conduct SQL interviews before extending job offers. This helps ensure that job applicants — particularly for roles in project management, analytics, business intelligence, and software engineering — are comfortable using SQL on the job.

If you’ve got an upcoming SQL interview, you’re probably wondering what sorts of questions you might get. Recruiters are generally vague on details, and it can be scary to walk into a conversation like this one blind.

So, we’ve provided our list of the 7 most common SQL interview questions so that you can get some practice in before you exam. With a little bit of advance preparation, you’ll feel prepared and confident on interview day.

Show Me Question #1

 

1. What is a relational database, and what is SQL? Please provide examples as part of your explanation.

Even though you probably know what relational databases are — and what SQL itself is — it can be tough to come up with coherent, simple explanations during a live interview. To that end, be sure that you've prepared in advance to answer this simple question. Here are some tips:

A relational database is a set of data tables that are somehow linked to — or related to — each other. It is used to store different types of information that can be pulled together to answer specific analytical questions. It's a useful way to minimize the amount of data stored on a server without losing any critical information.

That's a bit of a vague definition, so let's take a look at a relational database in practice. A simple version of a relational database for an online retailer might contain two separate data tables:

  • Customers. A list of customer information, including customer names, contact information, and shipping preferences. Each record in this database contains a unique customer_id field by which the customer can be identified.
  • Orders. A list of orders purchased at the retailer's website. Each order listing also contains a customer_id field, which is used to link that order's details with the specific customer who placed the order.

Of course, we wouldn't need a multi-table database if we simply included customer information on the Orders table. But that wouldn't be particularly efficient: if a single customer placed multiple orders, his or her name, contact information, and shipping preferences would be listed on multiple lines of the Orders table — leading to unnecessary duplication and an unmanageably large database. Instead, we create a relational database to save space and show how different pieces of data are linked together.

SQL, then, is simply the language used to communicate with this relational database. Databases don't yet understand human languages like English — it's simply too syntactically complex — so we use a standardized language to communicate with them that we know will be understood.

2. What are the different types of SQL JOIN clauses, and how are they used?

In SQL, a JOIN clause is used to return a table that merges the contents of two or more other tables together. For example, if we had two tables — one containing information on Customers and another containing information on the Orders various customers have placed — we could use a JOIN clause to bring them together to create a new table: a complete list of orders by customer, with all necessary information to make shipments.

There are multiple types of JOIN clauses, and they all serve slightly different functions:

  • INNER JOIN returns a list of rows for which there is a match in both tables specified. It's the default join type, so if you just type JOIN without specifying any other conditions, an INNER JOIN will be used.
  • LEFT JOIN will return all results from the left table in your statement, matched against rows in the right table when possible. If a row in the left table does not contain a corresponding match in the right table, it will still be listed — with NULL values in columns for the right table.
  • RIGHT JOIN will return all results from the right table in your statement, matched against rows in the left table when possible. If a row in the right table does not contain a corresponding match in the left table, it will still be listed — with NULL values in columns for the left table.
  • FULL JOIN will return all results from both the left and the right tables in your statement. If there are instances in which rows from the left table do not match the right table or vice versa, all data will still be pulled in — but SQL will output NULL values in all columns that are not matched.
  • CROSS JOIN returns the Cartesian product of two tables — in other words, each individual row of the left table matched with each individual row of the right table.

3. Why is this query not returning the expected results?

We have 1000 total rows in the orders table:

SELECT * FROM orders;
-- 1000 rows in set (0.05 sec)

And 23 of those orders are from the user with customer_id = 45:

SELECT * FROM orders WHERE customer_id = 45;
-- 23 rows in set (0.10 sec)

Yet, when we SELECT the number of orders that are not from customer_id = 45, we only get 973 results:

SELECT * FROM orders WHERE customer_id <> 45;
-- 973 rows in set (0.11 sec)

973 + 23 = 996. But shouldn't the number of orders with customer_id equal to 45 plus the number of orders with customer_id not equal to 45 equal 1000? Why is this query not returning the expected results?

The answer: this data set most likely contains order values with a NULL customer_id. When using the SELECT clause with conditions, rows with the NULL value will not match against either the = or the <> operator.

Our second query above could be modified as follows to produce the expected results:

SELECT * FROM orders WHERE (customer_id <> 45 OR customer_id IS NULL);
-- 977 rows in set (0.11 sec)

4. Why does one of these queries work while the other does not?

Consider the following query, which returns the expected results:

SELECT CASE WHEN (3 IN (1, 2, 3, NULL)) THEN 'Three is here!' ELSE "Three isn't here!" END AS result;

/*
	+----------------+
	| result         |
	+----------------+
	| Three is here! |
	+----------------+
	1 row in set (0.00 sec)
*/

The output "Three is here!" is shown, because the value 3 is included in the IN clause. But what about the following query?

SELECT CASE WHEN (3 NOT IN (1, 2, NULL)) THEN "Three isn't here!" ELSE 'Three is here!' END AS result;

/*
	+----------------+
	| result         |
	+----------------+
	| Three is here! |
	+----------------+
	1 row in set (0.00 sec)
*/

Three is not included in the second set — so why does our query mistakenly deliver the output, "Three is here!"?

The answer, once again, has to do with the way MYSQL handles NULL values. Let's take a closer look. In our first query, we ask whether the value 3 is included in the set (1, 2, 3, NULL). Our statement is functionally equivalent to the following:

SELECT CASE WHEN ((3 = 1) OR (3 = 2) OR (3 = 3) OR (3 = NULL)) THEN 'Three is here!' ELSE "Three isn't here!" END AS result;

/*
	+----------------+
	| result         |
	+----------------+
	| Three is here! |
	+----------------+
	1 row in set (0.00 sec)
*/

Since 3 is definitely equal to 3, one of our OR conditions is met, and the statement outputs, "Three is here!". Our second statement, on the other hand, asks whether the value 3 is NOT included in the set (1, 2, NULL). This statement is functionally equivalent to the following:

SELECT CASE WHEN ((3 <> 1) AND (3 <> 2) AND (3 <> NULL)) THEN "Three isn't here!" ELSE "Three is here!" END AS result;

/*
	+----------------+
	| result         |
	+----------------+
	| Three is here! |
	+----------------+
	1 row in set (0.00 sec)
*/

In this case, the conditional check 3 <> NULL fails, because in ANSI-standard SQL, we need to use the IS NULL statement rather than the <> operator.

5. Can you construct a basic INNER JOIN?

Consider our customers and orders tables, with the following respective schema:

CREATE TABLE `customers` (
  `customer_id` int(11) NOT NULL AUTO_INCREMENT,
  `first_name` varchar(255) NOT NULL,
  `last_name` varchar(255) NOT NULL,
  `email` varchar(255) NOT NULL,
  `address` varchar(255) DEFAULT NULL,
  `city` varchar(255) DEFAULT NULL,
  `state` varchar(2) DEFAULT NULL,
  `zip_code` varchar(5) DEFAULT NULL,
  PRIMARY KEY (`customer_id`)
);

CREATE TABLE `orders` (
  `order_id` int(11) NOT NULL AUTO_INCREMENT,
  `customer_id` int(11) NOT NULL,
  `order_placed_date` date NOT NULL,
  PRIMARY KEY (`order_id`),
  KEY `customer_id` (`customer_id`),
  FOREIGN KEY (`customer_id`) REFERENCES `customers` (`customer_id`)
);

Can you construct a simple SELECT statement that uses an INNER JOIN to combine all information from both the customers and orders tables?

The answer here is really simple. Here's how we'd do it:

SELECT * FROM orders INNER JOIN customers on orders.customer_id = customers.customer_id;

6. Working with the AS statement

We've written a query based on the orders table above to select all orders from the year 2016. But something is wrong with our query. Can you figure out what it is?

SELECT order_id, customer_id, YEAR(order_placed_date) AS order_year FROM orders WHERE order_year = 2016;

Here's the answer: order_year is an alias, meaning that it's being used as another name for a more complex reference: YEAR(order_placed_date). It turns out that in SQL, aliases can only be referenced in GROUP BY, ORDER BY, and HAVING clauses — they can't be used in WHERE clauses. Running the above code will produce the following result:

--ERROR 1054 (42S22): Unknown column 'order_year' in 'where clause'

To fix this problem, we need to reiterate the definition of the order_year alias in the WHERE clause like so:

SELECT order_id, customer_id, YEAR(order_placed_date) AS order_year FROM orders WHERE YEAR(order_placed_date) = 2016;
--498 rows in set (0.00 sec)

7. Using the SUM function

Consider the following database schema:

CREATE TABLE `products` (
  `product_id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(255) NOT NULL,
  `price` decimal(19,4) NOT NULL,
  PRIMARY KEY (`product_id`)
);

CREATE TABLE `order_products` (
  `order_product_id` int(11) NOT NULL AUTO_INCREMENT,
  `order_id` int(11) NOT NULL,
  `product_id` int(11) NOT NULL,
  PRIMARY KEY (`order_product_id`),
  KEY `order_id` (`order_id`),
  KEY `product_id` (`product_id`),
  FOREIGN KEY (`order_id`) REFERENCES `orders` (`order_id`),
  FOREIGN KEY (`product_id`) REFERENCES `products` (`product_id`)
)

Can you write a query that finds the total order price (e.g., the sum of product.price from each order) for all order_ids?

This question is a bit tough, as we'll have to use both the SUM function and the GROUP BY clause to aggregate orders by order_id. Here's how we do it:

SELECT order_id, SUM(price) AS total_order_price FROM order_products INNER JOIN products ON order_products.product_id = products.product_id GROUP BY order_id;
--1000 rows in set (0.01 sec)

Looking for more SQL prep? Don't forget to check out our resources page!