LeetCode SQL Hack | Zax's Blog

Open Table of contents

Checklist
Local Test

Checklist

4-Step Framework to Solve SQL Problems during an Interview

Explore Underlying Data
Identify Required Columns
Visualize The Output
Build Solution Step-By-Step And Test

Debug your Code

Follow the 4 steps to debug your code for most common mistakes you may have:

DISTINCT or not?
- Do you need to return distinct results?
AGG what? Data Types?
- Which column should you aggregate?
  - e.g., in some cases, COUNT(*) != COUNT(col)
- What Data Type should the result be? INT, BOOL, or NUMERIC?
NULL values & Missing values?
- How to handle NULL values? Pay attention to the Primary Keys.
- How to handle the missing values, should we leave 0 or remove the records?
WHERE, HAVING, or CASE?
- While they are all filters, think clear before applying the filter.
- Especially when LEFT JOIN, HAVING (on the right table) is completely different than applying WHERE clause on the entire tables.
  - When you don’t need the records, apply WHERE
  - When you need the records from the left table and leave NULL or 0 on the results, apply HAVING

Complete logical execution order of SQL

FROM, JOIN
WHERE
GROUP BY
HAVING
Window functions
SELECT
DISTINCT
ORDER BY
LIMIT/OFFSET

That’s why you can use alias when ORDER the results since ORDER BY is at the end of the execution sequence. While HAVING is before SELECT, in some database like Postgresql and MySQL, you can still use alias when HAVING.

1. FROM/JOIN       ← No aliases exist yet
2. WHERE           ← No aliases exist yet ❌
3. GROUP BY        ← Aliases may work (DB-dependent)
4. HAVING          ← Aliases often work ✅
5. SELECT          ← Aliases are CREATED here
6. DISTINCT        ← Can use aliases (DB-dependent)
7. ORDER BY        ← Aliases always work ✅
8. LIMIT/OFFSET    ← Works on final result

The rule: You can only use aliases in clauses that execute AFTER SELECT (or at the same time).

The Rules by Clause

Clause	Executes	Name Resolution Priority
WHERE	Before SELECT	Original column only (aliases don’t exist)
GROUP BY	Before SELECT	Original column first, then alias
HAVING	After GROUP BY	Alias first, then original column
SELECT	Creating aliases	N/A (this is where aliases are born)
ORDER BY	After SELECT	Alias first, then original column

While Postgresql can use alias when GROUP BY, HAVING, ORDER BY, you should NOT to use it when GROUP BY since it use original column first! Just explicitly specify the columns for GROUP BY!

An example of GROUP BY - Before SELECT - Original column first, then alias: This code is only standard SQL but runs on MySQL/Postgresql/SQLit

SELECT
	CASE
		WHEN income < 20000 THEN 'Low Salary'
		WHEN income < 50000 THEN 'Average Salary'
		ELSE 'High Salary'
	END AS category,
	COUNT(*) AS accounts_count
FROM Accounts
GROUP BY category

Local Test

VSCode Extensions

PostgreSQL: Postgresql DB connecter
SQL Formatter VSCode: format SQL code.
Data Wrangler: for handling tabular data in Jupyter.

Set up Your Local Test ENV

For better learning LeetCode SQL, you can set up your local SQL ENV.

For example, open your MySQL Workbench and connect to your local SQL server. Run:

create database leetcode;
use leetcode;

For the LeetCode SQL problem, click on SQL Schema under the question name. Then copy the SQL schema from the right window.

Example DB Schema

Open VSCode or any text editor that supports Regular Exression. Paste the SQL schema to any new file.

Like this:

Searching on text editor

Press Command + F for OS X users or Control + F for Windows users.

Click on the button to turn on User Regular Expression

Replace by regular expression

For the area Find, input \n For the area Replace, input ;\n This command is for replacing every line break with a ; then add a line break. You must have ; for each SQL statement to make it valid.

Then click on Replace All

Replace All

Then you will get the valid SQL INSERT statements.

Complete the INSERT statement

Copy and paste them to your MySQL workbench for local testing.

Don’t forget to drop your test table if you have a table with the same name. Otherwise you cannot run this SQL INSERT statements.

For example, LeetCode usually use table Order to store order data. You may have multiple questions with the same table name but different table schema. So you need to drop it first.

Tools

str_to_sql can convert LeetCode SQL testcase to SQL INSERT statemens for better debugging in your local SQL env.

Source Cdoe:

def str_to_sql(tb_string: str, tb_name: str, data_types=None) -> str:
    """
    Convert a string representation of a table to an SQL INSERT statement.

    Args:
        tb_string: Multiline string with the first line as headers and the rest as data rows.
        tb_name: Name of the SQL table to insert data into.
        data_types(data type, optional): A set of data types to map each column.

    Returns: SQL INSERT statement as a string.
    """
    lines = tb_string.strip().split("\n")
    headers = lines[0].split("|")[
        1:-1
    ]  # Skip the first and last empty splits from the borders
    headers = [header.strip() for header in headers]

    if data_types and len(data_types) != len(headers):
        raise ValueError(
            f"Expected {len(headers)} data types, received {len(data_types)} data types!"
        )

    values = []
    # apply str to each column
    if not data_types:
        for line in lines[2:]:  # Skip headers and '----'
            row = line.split("|")[
                1:-1
            ]  # Skip the first and last empty splits from the borders
            row = [i.strip() for i in row]
            values.append(f"({', '.join(map(repr, row))})")
    # map desired data type to each column
    else:
        for line in lines[2:]:  # Skip headers and '----'
            row = line.split("|")[
                1:-1
            ]  # Skip the first and last empty splits from the borders
            row = [i.strip() for i in row]
            i = 0
            while i < len(data_types):
                row[i] = data_types[i](row[i])
                i += 1
            values.append(f"({', '.join(map(repr, row))})")

    columns = ", ".join(headers)
    # multiline output
    # values_string = ',\n'.join(values)
    # single line output
    values_string = ", ".join(values)

    # multiline output
    # return f"INSERT INTO {tb_name} ({columns}) VALUES\n{values_string};"
    # single line output
    return f"INSERT INTO {tb_name} ({columns}) VALUES {values_string};"

How get your failed testcase and use this program?

Under the Input area of your Testcase, select the table and copy it.
Paste the table string to your notebook and assign the value to variable tb_string. Assign tb_name with the same table name as your Testcase. Then run funciton str_to_sql.
Notice you can also pass date types in a tuple to the parameter data_types.

Example 1:

tb_string = """
| player_id | device_id | event_date | games_played |
| --------- | --------- | ---------- | ------------ |
| 1         | 2         | 2016-03-01 | 5            |
| 1         | 2         | 2016-05-02 | 6            |
| 2         | 3         | 2017-06-25 | 1            |
| 3         | 1         | 2016-03-02 | 0            |
| 3         | 4         | 2018-07-03 | 5            |
"""

str_to_sql(tb_string, 'Customer')

Output:

'INSERT INTO Customer (customer_id, product_key) VALUES ('1', '5'), ('2', '6'), ('3', '5'), ('3', '6'), ('1', '6');'

Example 2:

Use print to skip the '' Pass data types to the parameter data_types to assign data type for each column. If you skip the para data_types, the default data type for each column is string.

print(str_to_sql(tb_string, 'Activity', (int, int, str, int)))

INSERT INTO Activity (player_id, device_id, event_date, games_played) VALUES (1, 2, '2016-03-01', 5), (1, 2, '2016-05-02', 6), (2, 3, '2017-06-25', 1), (3, 1, '2016-03-02', 0), (3, 4, '2018-07-03', 5);