Table of contents
Open Table of contents
Regular Expression
Basic Operators
PostgreSQL offers several operators for regex matching:
~- Matches regular expression, case sensitive~*- Matches regular expression, case insensitive!~- Does not match regular expression, case sensitive!~*- Does not match regular expression, case insensitive
Simple Examples
-- Find emails containing 'gmail'
SELECT email FROM users WHERE email ~ 'gmail';
-- Case-insensitive search for names starting with 'john'
SELECT name FROM customers WHERE name ~* '^john';
-- Find phone numbers NOT matching a pattern
SELECT phone FROM contacts WHERE phone !~ '^\+1';
Common Pattern Elements
^- Start of string$- End of string.- Any single character*- Zero or more of preceding element+- One or more of preceding element?- Zero or one of preceding element[abc]- Any character in brackets[^abc]- Any character NOT in brackets|- OR operator{n}- Exactly n occurrences{n,m}- Between n and m occurrences
Useful Functions
regexp_match() - Returns first match as an array:
SELECT regexp_match('My email is test@example.com', '[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}');
regexp_matches() - Returns all matches:
SELECT regexp_matches('abc123def456', '\d+', 'g');
regexp_replace() - Replace matched patterns:
SELECT regexp_replace('Phone: 123-456-7890', '\d', 'X', 'g');
-- Result: 'Phone: XXX-XXX-XXXX'
regexp_split_to_table() - Split string by pattern:
SELECT regexp_split_to_table('apple,banana;orange', '[,;]');
regexp_split_to_array() - Split into array:
SELECT regexp_split_to_array('one:two:three', ':');
Practical Examples
Validate email format:
SELECT email
FROM users
WHERE email ~ '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$';
Extract domain from email:
SELECT regexp_replace(email, '^.*@', '') AS domain
FROM users;
Find US phone numbers:
SELECT phone
FROM contacts
WHERE phone ~ '^\+?1?[-.\s]?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$';
Extract all numbers from text:
SELECT regexp_matches(description, '\d+', 'g') AS numbers
FROM products;
Flags
You can add flags as a third parameter to functions:
g- Global (find all matches, not just first)i- Case insensitiven- Newline-sensitive matchingp- Partial newline-sensitive matching
SELECT regexp_replace('Hello World', 'o', '0', 'gi');
-- Result: 'Hell0 W0rld'
Regular expressions in PostgreSQL follow POSIX standards and are incredibly versatile for data validation, cleaning, and extraction tasks!
JOIN vs. WHERE
- INNER JOIN - WHERE and ON are Often Equivalent
- LEFT JOIN - WHERE vs ON Produce DIFFERENT Results! ⚠️
The Golden Rule
JOIN conditions → Control MATCHING behavior
WHERE conditions → Control FINAL OUTPUT
A LEFT JOIN always preserves all rows from the left table, but:
- JOIN conditions determine which right-table rows get matched
- WHERE conditions filter the final result set (can remove left-table rows)
Conditions in JOIN Clause
LEFT JOIN right_table r ON left_table.id = right_table.id
AND <condition>
| Condition on | Effect | Use Case |
|---|---|---|
| Right table columns | ✅ Filters which right rows to match | Filter related records only |
| Left table columns | ⚠️ Acts like a filter, may exclude left rows | ==Rarely useful, usually wrong== |
Result: All left rows appear, but only matching right rows are joined
Conditions in WHERE Clause
LEFT JOIN right_table r ON left_table.id = right_table.id
WHERE <condition>
| Condition on | Effect | Use Case |
|---|---|---|
| Left table columns | ✅ Filters left rows in final output | Filter main dataset |
| Right table columns | ⚠️ Converts to INNER JOIN behavior | Eliminates unmatched left rows |
Result: Filters the complete result set after joining
Example 1
Example:
-- orders
order_id | customer_id | amount
---------|-------------|-------
1 | 101 | 100
2 | 102 | 200
3 | 103 | 150
-- customers
customer_id | name | country
------------|-------|--------
101 | Alice | USA
102 | Bob | UK
103 | Carol | USA
Using WHERE (Filters AFTER join):
SELECT *
FROM orders o
LEFT JOIN customers c ON o.customer_id = c.customer_id
WHERE c.country = 'USA';
Result:
order_id | customer_id | amount | customer_id | name | country
---------|-------------|--------|-------------|-------|--------
1 | 101 | 100 | 101 | Alice | USA
3 | 103 | 150 | 103 | Carol | USA
What happened:
- LEFT JOIN keeps all orders
- WHERE filters out Bob (UK customer)
- Effectively becomes an INNER JOIN! ❌
Using ON (Filters DURING join):
SELECT *
FROM orders o
LEFT JOIN customers c ON o.customer_id = c.customer_id
AND c.country = 'USA';
Result:
order_id | customer_id | amount | customer_id | name | country
---------|-------------|--------|-------------|-------|--------
1 | 101 | 100 | 101 | Alice | USA
2 | 102 | 200 | NULL | NULL | NULL
3 | 103 | 150 | 103 | Carol | USA
What happened:
- LEFT JOIN keeps all orders
- ON condition filters customers during the join
- Bob’s order still appears, but with NULL customer data ✅
The Critical Difference Visualized
LEFT JOIN with filter in ON:
orders ← customers (filtered)
✅ All orders kept, some have NULL customer data
LEFT JOIN with filter in WHERE:
(orders ← customers) → filter
❌ Orders without matching filtered customers are removed
❌ Acts like INNER JOIN!
Example 2: E-commerce Orders
❌ WRONG - Breaks LEFT JOIN:
SELECT o.*, c.name
FROM orders o
LEFT JOIN customers c ON o.customer_id = c.id
WHERE c.country = 'USA'; -- Removes all non-USA orders!
✅ CORRECT - Preserves all orders:
SELECT o.*, c.name
FROM orders o
LEFT JOIN customers c ON o.customer_id = c.id AND c.country = 'USA'
WHERE o.status = 'completed'; -- Filters orders, not join
Example 3: Product Inventory
Find products with optional supplier info, only active suppliers:
SELECT p.*, s.name AS supplier_name
FROM products p
LEFT JOIN suppliers s
ON p.supplier_id = s.id
AND s.status = 'active' -- Filter suppliers, keep all products
WHERE p.in_stock = true; -- Filter products
Example 4: User Activity Logs
Get all users, with optional last login (only in 2024):
SELECT u.*, l.login_time
FROM users u
LEFT JOIN (
SELECT user_id, MAX(login_time) AS login_time
FROM logins
WHERE EXTRACT(YEAR FROM login_time) = 2024
GROUP BY user_id
) l ON u.user_id = l.user_id
WHERE u.active = true;
Bottom Line
Simple mental model:
ON = "How do these tables connect?"
WHERE = "What data do I want to see?"
For INNER JOIN: Technically interchangeable, but use WHERE for filters (convention)
For OUTER JOIN: Critical difference - use ON to filter the “optional” side
Master this and you’ll write clearer, more maintainable SQL! 🚀
LIKE & Regular Expression
1527. Patients With a Condition
Regular Expression
SELECT *
FROM Patients
WHERE conditions ~ '(^|[\ ])DIAB1'
LIKE
SELECT *
FROM Patients
WHERE
conditions LIKE 'DIAB1%'
OR conditions LIKE '% DIAB1%'
1873. Calculate Special Bonus
Regular Expression
SELECT
employee_id,
CASE
WHEN employee_id % 2 = 1 AND "name" !~ '^M' THEN salary
ELSE 0
END AS bonus
FROM Employees
ORDER BY employee_id
LIKE
SELECT
employee_id,
CASE
WHEN employee_id % 2 = 1 AND "name" NOT LIKE 'M%' THEN salary
ELSE 0
END AS bonus
FROM Employees
ORDER BY employee_id
1517. Find Users With Valid E-Mails
Regular Expression
SELECT *
FROM Users
WHERE mail ~ '^[[:alpha:]][[:alnum:]_.-]*(@leetcode\.com)$'
-- WHERE mail ~ '^[A-Za-z][A-Za-z0-9_.-]*@leetcode\.com$'
Pandas
import pandas as pd
def valid_emails(users: pd.DataFrame) -> pd.DataFrame:
filter = users.mail.str.contains(r"^[A-Za-z][A-Za-z0-9_.-]*(@leetcode\.com)$", regex=True)
return users[filter]
JOIN
1158. Market Analysis I
1083. Sales Analysis II
1084. Sales Analysis III
NOT IN
1607. Sellers With No Sales
See LeetCode SQL Analysis - 1607. Sellers With No Sales