This advanced SQL lesson dives deep into powerful techniques crucial for People Analytics: Window Functions and Common Table Expressions (CTEs). You will learn how to perform sophisticated calculations, analyze trends, and structure complex queries efficiently for insightful workforce analysis.
Window functions are powerful tools in SQL that allow you to perform calculations across a set of table rows that are related to the current row. Unlike aggregate functions (e.g., SUM, AVG) that collapse multiple rows into a single output row, window functions return a value for each row, based on calculations across the defined window (partition) of rows. This capability is essential for trends analysis, ranking, and comparisons within specific groups.
Syntax:
function_name(expression) OVER (
PARTITION BY partition_expression, ...
ORDER BY order_expression [ASC | DESC], ...
[frame_clause]
)
function_name
: The window function to apply (e.g., SUM
, AVG
, RANK
, ROW_NUMBER
).expression
: The column or calculation to perform on the window.PARTITION BY
: Divides the rows into partitions to which the window function is applied separately. This is optional; if omitted, the entire table is treated as one partition.ORDER BY
: Specifies the order within each partition. Crucial for ranking and calculating running totals.frame_clause
: (Optional) Further refines the window, e.g., to calculate a moving average or sum for a specific number of rows before and after the current row.Example: Calculating the salary rank within each department.
SELECT
employee_id,
department,
salary,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS salary_rank
FROM
employees;
Let's explore common window functions with examples relevant to People Analytics:
Ranking Functions:
RANK()
: Assigns a rank to each row within a partition based on the order. Skips ranks if there are ties.DENSE_RANK()
: Similar to RANK()
, but doesn't skip ranks for ties (e.g., 1, 2, 2, 3).ROW_NUMBER()
: Assigns a unique, sequential integer to each row. Useful for numbering rows within a partition.Example (Top Performers):
sql
SELECT
employee_id,
performance_score,
RANK() OVER (ORDER BY performance_score DESC) AS performance_rank
FROM
performance_reviews
WHERE
review_year = 2023;
Lead and Lag Functions: Access data from preceding or subsequent rows within the partition.
LAG(column, offset, default)
: Retrieves the value from a preceding row. offset
specifies how many rows back (default is 1). default
is the value to return if there is no preceding row.LEAD(column, offset, default)
: Retrieves the value from a subsequent row.Example (Churn Analysis):
sql
SELECT
employee_id,
hire_date,
termination_date,
LAG(termination_date, 1, hire_date) OVER (PARTITION BY employee_id ORDER BY hire_date) AS previous_termination_date
FROM
employee_history;
Aggregate Functions with Windowing: SUM()
, AVG()
, COUNT()
, etc., used in conjunction with OVER()
to compute running totals, moving averages, etc.
Example (Salary Progression):
sql
SELECT
employee_id,
salary,
salary_date,
SUM(salary) OVER (PARTITION BY employee_id ORDER BY salary_date) AS cumulative_salary
FROM
salary_history;
CTEs are temporary result sets defined within the scope of a single SQL statement. They improve query readability, organization, and reusability, especially when dealing with complex logic. Think of them as named subqueries that you can reference within a larger query.
Syntax:
WITH cte_name AS (
SELECT statement
)
SELECT statement
FROM cte_name;
You can chain multiple CTEs using commas:
WITH cte1 AS (...
), cte2 AS (...
)
SELECT ...
FROM cte1
JOIN cte2 ON ...;
Benefits:
* Readability: Break down complex queries into logical blocks.
* Organization: Easier to understand and maintain.
* Reusability: Avoid repeating the same subquery multiple times within a single query.
* Optimization: The query optimizer can often optimize queries involving CTEs effectively.
Example (Calculating Bonus Based on Performance):
WITH performance_summary AS (
SELECT
employee_id,
AVG(performance_score) AS avg_performance
FROM
performance_reviews
GROUP BY
employee_id
),
bonus_calculation AS (
SELECT
employee_id,
CASE
WHEN avg_performance >= 90 THEN salary * 0.10 -- 10% bonus
WHEN avg_performance >= 70 THEN salary * 0.05 -- 5% bonus
ELSE 0 -- No bonus
END AS bonus_amount
FROM
performance_summary
JOIN employees ON performance_summary.employee_id = employees.employee_id
)
SELECT
employees.employee_id,
employees.salary,
COALESCE(bonus_calculation.bonus_amount,0) AS bonus_amount
FROM employees
LEFT JOIN bonus_calculation ON employees.employee_id = bonus_calculation.employee_id;
CTEs and window functions are extremely powerful when combined. CTEs can be used to prepare data for window function calculations, making complex analyses manageable. This is a common pattern in People Analytics for things like identifying top performers and calculating compensation changes.
Example (Identify Top 10% Paid Employees):
WITH employee_salaries_ranked AS (
SELECT
employee_id,
salary,
RANK() OVER (ORDER BY salary DESC) as salary_rank,
COUNT(*) OVER() AS total_employees
FROM
employees
),
top_10_percent AS (
SELECT
employee_id,
salary
FROM
employee_salaries_ranked
WHERE
salary_rank <= (total_employees * 0.10) -- Top 10%
)
SELECT
* -- Consider explicitly listing columns for better readability
FROM
top_10_percent;
Explore advanced insights, examples, and bonus exercises to deepen understanding.
Welcome back! You've already conquered the fundamentals of Window Functions and CTEs. Now, let's elevate your SQL prowess and explore how to apply these powerful techniques for even more sophisticated People Analytics tasks. This session goes beyond simple calculations and dives into query optimization, advanced scenarios, and real-world applications that will truly set you apart.
While Window Functions and CTEs improve query readability, optimization is crucial for handling large datasets. Let's explore some optimization techniques and an advanced CTE application:
Recursive CTEs allow you to query hierarchical data, such as organizational charts or reporting structures. This is invaluable in People Analytics for tasks like:
Example (Simplified Organizational Chart): Imagine a table called `employees` with columns like `employee_id`, `employee_name`, `manager_id` (representing the employee's direct manager). A Recursive CTE might look like this (database-specific syntax may vary):
WITH RECURSIVE employee_hierarchy AS (
-- Anchor Member: Select the initial set of employees (e.g., the CEO)
SELECT employee_id, employee_name, manager_id, 1 AS level
FROM employees
WHERE employee_id = 'CEO_ID'
UNION ALL
-- Recursive Member: Select employees based on the previous result set
SELECT e.employee_id, e.employee_name, e.manager_id, eh.level + 1
FROM employees e
JOIN employee_hierarchy eh ON e.manager_id = eh.employee_id
)
SELECT employee_name, level
FROM employee_hierarchy
ORDER BY level;
In this example, the anchor member selects the CEO. The recursive member joins the `employees` table with the `employee_hierarchy` CTE itself, following the `manager_id` to determine the reporting structure level by level.
Let's put your enhanced skills to the test:
Build upon your existing knowledge of Churn Prediction. Assume you have a `employee_activity` table with columns like `employee_id`, `activity_date`, and `activity_type` (e.g., 'login', 'logout', 'performance_review'). Use window functions to calculate:
Using the `employees` table (with `employee_id`, `employee_name`, `salary`, `start_date`, and `manager_id`), write a recursive CTE to:
These advanced techniques are directly applicable in the real world:
For an extra challenge:
Keep exploring these areas:
Write a SQL query using window functions to rank employees within their respective departments based on their performance scores (from a `performance_reviews` table). The ranking should be in descending order of performance score. Your result should include employee ID, department, performance score, and their rank within the department.
Using the `employee_history` table (with columns like `employee_id`, `hire_date`, and `termination_date`), write a query to identify employees who were re-hired after a period. Use the `LAG()` function to compare each employee's termination date with their subsequent hire date (if any). The result should show employee ID, hire date, termination date, and any re-hire date (if applicable). Handle nulls appropriately, and only show employees where their previous termination date occurred *before* their current hire date.
Create a query to calculate an employee's cumulative salary over time using the `salary_history` table. Use a CTE to first order your salary records. Then, in the main query use the `SUM()` window function and `ORDER BY salary_date` within the `OVER()` clause to derive the cumulative salary for each employee. Your result set should show each employee's `employee_id`, `salary`, `salary_date`, and `cumulative_salary`.
Combine CTEs and window functions to create a query that calculates bonuses for employees based on multiple criteria: (1) average performance score, (2) years of service, and (3) a market adjustment factor. * Use a CTE to calculate each employee's average performance score and years of service. * Use a second CTE to apply bonus logic, considering both performance and years of service using CASE statements. * Join the results with a market adjustment factor table (e.g., `market_adjustments`, with columns like `employee_id`, `adjustment_factor`). * Apply the market adjustment factor to the bonus amount.
Develop a churn prediction model using historical employee data (e.g., hire date, termination date, salary, performance reviews, demographics). Use CTEs and window functions to calculate features like tenure, average performance over time, and time since last promotion. Then, use these features to identify factors that contribute to employee turnover. You might then create a dashboard showing key churn risk indicators.
Prepare for the next lesson on data aggregation and summarization using SQL, including aggregate functions, grouping, and filtering (HAVING clause). You should be comfortable with using join queries, and subqueries.
We're automatically tracking your progress. Sign up for free to keep your learning paths forever and unlock advanced features like detailed analytics and personalized recommendations.